Taxonomic assignments

When assigning taxonomy with sklearn and a fitted classifier, it is easy to end up with inconsistencies. For example, two very similar sequences that probably belong to the same taxon can be classified as different taxa if one closely matches a well-annotated taxon while the other is slightly more similar to a database entry labelled "uncultured cyanobacterium", or to a sequence that was misassigned. They are then reported as different taxa even though they are the same one. This could be improved if the classification were based not only on similarity to the database but also on a phylogenetic tree that groups similar ASVs.
Another thing that helps me assign taxonomy is comparing the ASV (or OTU) sequences against a list of sequences obtained from cultures isolated from the study site. Is it possible to do this with any of the plugins implemented in QIIME 2? If not, it would be a useful addition.

Hi @mamunnoz,

Have you looked at the QIIME 2 library? Specifically q2-clawback and q2-fragment-insertion? Have you also looked at the other feature classification methods, like consensus BLAST?



So in other words, your classifications are only as good as your reference database. Junk in, junk out, as they say. You could filter the reference database to remove these ambiguous and inaccurate classifications. Misannotations do exist in the reference databases, and some have even suggested quite high rates of misannotation. Of course, cleaning that up is a monumental task, but that is where the main problem lies...
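As a rough illustration of that filtering step, here is a minimal sketch in plain Python. It assumes the reference taxonomy has already been loaded into a dictionary mapping reference IDs to taxonomy strings; the keyword list, the function names (`is_ambiguous`, `filter_reference`) and the toy entries are all made up for the example and are not part of any QIIME 2 plugin.

```python
# Keywords that often mark uninformative reference labels.
# This list is an assumption for illustration; adjust it to your database.
AMBIGUOUS_TERMS = ("uncultured", "unidentified", "metagenome", "unclassified")


def is_ambiguous(taxonomy, terms=AMBIGUOUS_TERMS):
    """Return True if the taxonomy string contains any ambiguous keyword."""
    lowered = taxonomy.lower()
    return any(term in lowered for term in terms)


def filter_reference(taxonomy_by_id, terms=AMBIGUOUS_TERMS):
    """Keep only the reference entries whose label has no ambiguous keyword."""
    return {
        ref_id: taxonomy
        for ref_id, taxonomy in taxonomy_by_id.items()
        if not is_ambiguous(taxonomy, terms)
    }


# Toy reference taxonomy (hypothetical IDs and labels).
reference = {
    "ref1": "k__Bacteria; p__Cyanobacteria; g__Nostoc; s__Nostoc commune",
    "ref2": "k__Bacteria; p__Cyanobacteria; g__; s__uncultured cyanobacterium",
    "ref3": "k__Bacteria; p__Proteobacteria; g__Escherichia; s__Escherichia coli",
}

cleaned = filter_reference(reference)
print(sorted(cleaned))
```

The IDs that survive would then be used to subset the reference sequences as well, before retraining the classifier. Note that this only catches labels that are ambiguous by keyword; it does nothing about sequences that carry a confident but wrong annotation.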

Great idea! Maybe you should develop a novel method. However, a method along these lines already exists in QIIME 2: see the experimental classifier in q2-fragment-insertion. The problem is that this classifier does a really poor job based on our benchmarks.

Check out q2-clawback as @jwdebelius suggested. It would be possible to compile taxonomic frequencies based on culture data and feed that to q2-clawback/q2-feature-classifier to perform the type of analysis you mention — but basing this on culture data alone would be error prone. We should discuss some more if you want to attempt to validate a new method for q2-clawback using culture data!

Hope that helps!


Thank you for your help. Of course, assigning taxonomy based on culture data alone would be error prone, because not all microorganisms can be cultured and many of them are very difficult to isolate. However, this approach is useful for assigning part of the features very accurately, and it would be interesting to use it in combination with another method. At the moment I do it manually (by comparing sequences), but that is time consuming and means making the taxa plots manually as well. Anyway, I will try q2-clawback.
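For anyone wanting to script the manual comparison described above, here is a minimal sketch in plain Python. It assumes the ASVs and the isolate sequences are already loaded as dictionaries of ID to sequence, and it uses the strictest possible criterion: an ASV only counts as a hit if it occurs exactly (in either orientation) inside an isolate sequence. The function names and the toy sequences are hypothetical, and a real workflow would probably use an alignment tool with an identity threshold instead of exact matching.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


def match_asvs_to_isolates(asvs, isolates):
    """Map each ASV ID to the isolates whose sequence contains it exactly.

    Both orientations of the ASV are checked. ASVs without any exact hit
    are left out of the result.
    """
    hits = {}
    for asv_id, asv_seq in asvs.items():
        query = asv_seq.upper()
        query_rc = reverse_complement(query)
        matched = [
            name
            for name, iso_seq in isolates.items()
            if query in iso_seq.upper() or query_rc in iso_seq.upper()
        ]
        if matched:
            hits[asv_id] = sorted(matched)
    return hits


# Toy data: short made-up sequences, not real 16S fragments.
asvs = {
    "asv1": "ACGTTGCA",
    "asv2": "GGGGCCCC",
    "asv3": "TTTTAAAC",
}
isolates = {
    "isolate_A": "TTACGTTGCAGG",
    "isolate_B": "CCGTTTAAAAGG",
}

hits = match_asvs_to_isolates(asvs, isolates)
for asv_id, names in sorted(hits.items()):
    print(asv_id, names)
```

Features with an exact hit could then be given the isolate's taxonomy, leaving the rest to the usual classifier.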