QIIME 2 and Probe Design

JMJ_Lapage · October 9, 2020, 7:50am

Hello all, I am new here and new to QIIME 2. My -omics knowledge is very rusty and out-of-date but I have been getting back on the horse with the excellently written tutorials.

My new project will involve taking 16S rRNA-Seq data and designing CLASI-FISH probes against the species found within the population, which will then be applied to confocal microscopy. Probes need to have high affinity for their target species and minimal affinity for non-target species.

The process of probe design would probably involve making a multiple alignment of all the 16S sequences, going through each target species/OTU iteratively to find regions of n bases that are dissimilar to the other species/OTUs, and BLASTing these against whole-genome sequences from a database to score their likelihood of off-target binding.
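As a rough illustration of the window search described above, here is a minimal Python sketch. It assumes the sequences are already aligned to equal length, uses made-up toy sequences rather than real 16S data, and the function names and the `min_mismatches` cutoff are hypothetical. It reports target-site windows only; an actual probe would be the reverse complement, and thermodynamic checks are left out entirely.

```python
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def candidate_windows(
    aligned: Dict[str, str], target: str, n: int, min_mismatches: int
) -> List[Tuple[int, str, int]]:
    """Return (start column, target window, fewest mismatches to any non-target).

    A window is kept only if every non-target sequence differs from the
    target in at least `min_mismatches` positions within that window.
    """
    target_seq = aligned[target]
    others = [seq for name, seq in aligned.items() if name != target]
    hits = []
    for start in range(len(target_seq) - n + 1):
        window = target_seq[start:start + n]
        if "-" in window:  # skip windows with alignment gaps in the target
            continue
        # The closest non-target decides whether the window is usable.
        closest = min(hamming(window, o[start:start + n]) for o in others)
        if closest >= min_mismatches:
            hits.append((start, window, closest))
    return hits


# Toy alignment (invented sequences, for illustration only).
alignment = {
    "sp_A": "ACGTACGTTTGA",
    "sp_B": "ACGTACGAAAGA",
    "sp_C": "ACGTACGTCCGA",
}
probes = candidate_windows(alignment, "sp_A", n=4, min_mismatches=2)
print(probes)
```

Running the same function once per target species/OTU gives the iterative pass described above; the surviving windows would then go on to the off-target screen.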

I have three questions at this stage:

1. Is there already a plugin that has this feature, or a way to replicate this functionality with a bit of a hack?
2. Is it silly to consider doing this within QIIME 2, and if so, should I just be exporting the data to another tool?
3. If there's no way to do it, but it seems like a worthwhile module to make, will this be a particularly difficult workflow for me (a fairly experienced programmer) to craft into a new module?

Nicholas_Bokulich · October 9, 2020, 8:56am

Welcome @JMJ_Lapage!

I think existing plugins have some of this functionality, but you would need to add a plugin/method to fill in the gaps. Here's what I think:

Multiple alignment: q2-alignment supports this, via MAFFT.

Finding regions unique to each target: this is the part that I do not think is available in existing plugins. Wrapping an existing software package for probe design would be an opportunity here.

Off-target screening: q2-feature-classifier has BLAST- and VSEARCH-based taxonomy classification methods that would probably work for this. On the other hand, qiime quality-control exclude-seqs is probably exactly what you are looking for, and it even has a primer alignment mode (close enough to a probe?). Basically, you'd give it a list of probes (as a FeatureData[Sequence] artifact, either a FASTA file imported into QIIME 2 or the output of your novel primer finder method), and it would output those that hit the reference genomes and those that miss.
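For readers who prefer the Python Artifact API to the command line, the exclude-seqs step might look roughly like the sketch below. The wrapper function name, the file paths passed to it, and the 0.97 thresholds are placeholders chosen for illustration, not recommendations for probe work. It also assumes a working QIIME 2 environment with q2-quality-control installed; exclude-seqs documents a `blastn-short` method for short queries as well, so check the parameter list in your installed version before relying on any of this.

```python
def split_probes_by_reference_hits(probes_fasta, reference_qza, perc_identity=0.97):
    """Return (hits, misses) FeatureData[Sequence] artifacts for a probe set.

    `probes_fasta` is a FASTA file of candidate probes and `reference_qza`
    is a FeatureData[Sequence] artifact of reference sequences; both paths
    are supplied by the caller (hypothetical inputs).
    """
    # Imported lazily so this file loads even without a QIIME 2 environment.
    from qiime2 import Artifact
    from qiime2.plugins import quality_control

    probes = Artifact.import_data("FeatureData[Sequence]", probes_fasta)
    reference = Artifact.load(reference_qza)

    result = quality_control.methods.exclude_seqs(
        query_sequences=probes,
        reference_sequences=reference,
        method="blast",              # placeholder choice of aligner
        perc_identity=perc_identity,  # placeholder threshold
        perc_query_aligned=0.97,      # placeholder threshold
    )
    # Probes that align to the reference, and probes that do not.
    return result.sequence_hits, result.sequence_misses
```

If the reference holds non-target genomes, the "misses" artifact is the set of probes with no detected off-target match at the chosen thresholds.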

By the way (shameless plug alert), those non-target reference genomes (and/or reference 16S rRNA genes from your target species) could be assembled using RESCRIPt so that everything in your workflow is preserved in provenance.

No, I do not think it's silly. This would be a somewhat convoluted workflow, and the advantage of doing it in QIIME 2 is that you can leverage existing functionality, as well as preserve all workflow steps in provenance.

On the other hand, other probe design methods exist outside QIIME 2 and so could probably be used without bothering to write a new one (though, as I said above, you could just wrap one in a plugin).

QIIME 2 is designed to be extensible, and we have written extensive documentation to make it easy for other programmers to write QIIME 2 plugins. Creating the plugin is the easy part; writing the methods inside should be the more difficult part. If you are confident in that part, then creating a QIIME 2 plugin for it should be straightforward.

You can see the developer documentation here: https://dev.qiime2.org/latest/

And you are more than welcome to ask development questions on this forum. We even have a "developer support" channel for it, so we can help you along the way.

JMJ_Lapage · October 12, 2020, 9:50am

Hi @Nicholas_Bokulich, thank you for your comprehensive response. I think you are right; there might be some merit for me to develop a probe design plugin that wraps a pre-existing method. I'm fairly confident that I can handle that, particularly since there's a community to support it here. Even better if I design it in such a way that the method of probe design is modular as well, so that I can try a few different methods and leave the door open for people to add further approaches.

You are right that primer design is basically the same as probe design: it's all about getting specific matches. The very slight differences are that in CLASI-FISH, we don't really mind off-target matches within the same species we are probing for, so long as they are not present in other species (that's an edge case though), and obviously, I need to balance a whole matrix of potential conflicts rather than just a pairwise relationship.
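To make the "matrix of potential conflicts" concrete, here is a small Python sketch under stated assumptions: every name, the toy sequences, and the mismatch threshold are invented for illustration, and a plain mismatch count stands in for a real binding-affinity estimate. Each probe is scored against every species, and matches within the probe's own target species are ignored, as described above.

```python
from typing import Dict, List, Tuple


def min_mismatches(probe: str, seq: str) -> int:
    """Fewest mismatches between the probe and any same-length window of seq."""
    k = len(probe)
    return min(
        sum(p != s for p, s in zip(probe, seq[i:i + k]))
        for i in range(len(seq) - k + 1)
    )


def conflict_matrix(
    probes: Dict[str, str], species: Dict[str, str]
) -> Dict[str, Dict[str, int]]:
    """Map probe id -> species id -> best (lowest) mismatch count."""
    return {
        pid: {sp: min_mismatches(p, s) for sp, s in species.items()}
        for pid, p in probes.items()
    }


def find_conflicts(
    matrix: Dict[str, Dict[str, int]],
    probe_target: Dict[str, str],
    threshold: int,
) -> List[Tuple[str, str]]:
    """List (probe, species) pairs where a NON-target species is too similar."""
    return [
        (pid, sp)
        for pid, row in matrix.items()
        for sp, mm in row.items()
        if sp != probe_target[pid] and mm <= threshold
    ]


# Toy data (invented sequences, for illustration only).
species = {
    "sp_A": "ACGTACGTTTGA",
    "sp_B": "ACGTACGAAAGA",
    "sp_C": "ACGTACGTCCGA",
}
probes = {"pA": "GTTT", "pB": "GAAA", "pC": "ACGT"}
probe_target = {"pA": "sp_A", "pB": "sp_B", "pC": "sp_C"}

matrix = conflict_matrix(probes, species)
bad_pairs = find_conflicts(matrix, probe_target, threshold=1)
print(bad_pairs)
```

In this toy set, the probe taken from the conserved region is flagged against both non-target species, while the two probes drawn from the variable region pass.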

Thanks again, I will look into what it will take to make a wrapper plugin, and read up on the methods you mentioned.