Finding a target organism

Natalia_Bednarska · October 3, 2019, 7:17am

Hi Nicholas,
Thanks for the reply. Indeed, the 99% OTUs from Greengenes worked well after all.
Another question: can I train the classifier to search for one particular species of bacteria, given its sequence? Or what's the best/most efficient way to do a targeted search on a feature table?

colinbrislawn · October 3, 2019, 1:29pm

Hello Natalia,

I think using an untargeted classifier and then filtering your table to just the taxa of interest is a good way to do it. In case you don't find the exact species you want, you could find the taxa that match at the family and/or genus level.
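As a rough illustration of that filtering step, here is a minimal Python sketch. It assumes the taxonomy assignments and the feature table have already been exported into plain dictionaries; the feature IDs, taxonomy strings, counts, and function names are all made up for the example. Within QIIME 2 itself, the `qiime taxa filter-table` action with its `--p-include` option should do the equivalent job.

```python
# Hypothetical example data: feature IDs, taxonomy strings and counts are invented.
taxonomy = {
    "asv1": "k__Bacteria; p__Firmicutes; f__Lactobacillaceae; g__Lactobacillus; s__reuteri",
    "asv2": "k__Bacteria; p__Firmicutes; f__Lactobacillaceae; g__Lactobacillus; s__",
    "asv3": "k__Bacteria; p__Proteobacteria; f__Enterobacteriaceae; g__Escherichia; s__coli",
}
table = {
    "asv1": {"sample1": 12, "sample2": 0},
    "asv2": {"sample1": 3, "sample2": 7},
    "asv3": {"sample1": 0, "sample2": 40},
}


def features_matching(taxonomy, query):
    """Return feature IDs whose taxonomy string contains `query` (case-insensitive)."""
    query = query.lower()
    return [fid for fid, tax in taxonomy.items() if query in tax.lower()]


def filter_table(table, keep):
    """Keep only the rows of the feature table whose IDs are in `keep`."""
    keep = set(keep)
    return {fid: counts for fid, counts in table.items() if fid in keep}


# Try the exact species first, then fall back to the genus level.
species_hits = features_matching(taxonomy, "g__Lactobacillus; s__reuteri")
genus_hits = features_matching(taxonomy, "g__Lactobacillus")
genus_table = filter_table(table, genus_hits)
```

Matching on a substring of the taxonomy string is the simplest option; it relies on the rank prefixes (`g__`, `s__`) to avoid accidental hits at the wrong level.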

If you are only interested in one taxon, you could use qiime vsearch cluster-features-closed-reference to align all your reads against just the sequence of that one taxon.

Colin

Nicholas_Bokulich · October 3, 2019, 2:08pm

Honestly I would just classify everything in an untargeted fashion and pick through the results.

For a targeted search you could use classify-consensus-blast against a small database of sequences of interest, or even use qiime quality-control exclude-seqs. The latter really just runs a BLAST search under the hood to find query sequences that are within some similarity threshold to a set of reference sequences. You can use a single reference sequence to find all queries that are within some % similarity to that sequence, and then try to classify those with a separate method to verify identity.

Perhaps an even simpler/quicker method would be to use qiime feature-classifier extract-reads on your reference sequence of interest (the species you are looking for) to extract a sequence of the exact same length and target as your query sequences and get the md5 hash of that sequence... then you can just find that ASV in your feature table without classification.
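To make the hash lookup concrete, here is a minimal Python sketch. It assumes the feature IDs in your table are md5 hashes of the exact ASV sequences, and that the reference read has already been trimmed to the same region and length as your queries (for example with extract-reads). The sequences, counts, and names below are hypothetical.

```python
import hashlib


def md5_feature_id(sequence):
    """Return the md5 hex digest of a sequence, ignoring whitespace and case."""
    cleaned = "".join(sequence.split()).upper()
    return hashlib.md5(cleaned.encode("ascii")).hexdigest()


# Hypothetical trimmed reference read for the species of interest.
reference_read = "ACGTACGTTAGC"

# Hypothetical feature table keyed by md5 feature ID (assumed ID scheme).
table = {
    md5_feature_id("ACGTACGTTAGC"): {"sample1": 5, "sample2": 0},
    md5_feature_id("TTTTACGTTAGC"): {"sample1": 1, "sample2": 9},
}

target_id = md5_feature_id(reference_read)
hits = table.get(target_id)  # None if the exact sequence is absent

if hits is None:
    print("Reference sequence not found in the feature table")
else:
    print(f"Found feature {target_id}: {hits}")
```

This is an exact-match lookup: a single mismatched base, or a read trimmed to a different length, gives a different hash and will not be found.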

The problem with all of these targeted approaches is that you are assuming that the sequence is unique to that species, and hence that finding the sequence indicates the presence of your species. That may be valid under many conditions (e.g., synthetic organisms or well-defined systems), but in most environments it is a dangerous assumption to make, which is why an untargeted classification against a full reference database is a safer, better route in my opinion.

(But I am guessing that approach does not hit the species you are looking for, in which case I think it is okay to follow up with a targeted approach and say something like "we did not find species X in our samples, but sequences classified to family Y in this study closely resemble those of X, indicating the possibility that these belong to X; sequence similarity within family Y is too high to fully differentiate X from closely related species".)
