I have been looking at the relative abundance of taxa in the taxa bar plots (classified using the Silva database). I’m only really interested in levels up to level 4 (order level). However, I’ve noticed incorrect assignments at level 4 and above (a mix of things that should not be there and families placed in an order they don’t belong to). I’d like to use BLAST to confirm the identity of the orders I’m interested in, but how can I do this without BLASTing every single representative sequence?
And even if I did that, I don’t understand how to see how the features have been classified (for taxonomy), and which features are present in each sequence.
Any help appreciated.
Seeing families in an order they don’t belong to indicates an issue with your reference database.
QIIME 2 does have alignment-based classifiers. You can use classify-consensus-blast to align against a reference database.
Note that NCBI BLAST will not necessarily give you the “right” answer either. But it seems like the best approach for you is to figure out which sequences do not “belong” (the ones you suspect are misclassified), and then use NCBI BLAST to confirm the order-level identity of just those sequences.
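To avoid BLASTing every representative sequence, one option is to pull out only the sequences assigned to the order you are suspicious of. Below is a minimal Python sketch of that idea, not an official QIIME 2 method. It assumes you have exported the taxonomy to a tab-separated file with "Feature ID" and "Taxon" columns and the representative sequences to FASTA (for example with qiime tools export), and that taxon strings are semicolon-separated with rank prefixes such as "o__" or "D_3__". The function names and the example data are made up for illustration.

```python
import csv
import io


def order_of(taxon, level=4):
    """Return the bare name at the 1-based `level`, or None if absent."""
    ranks = [r.strip() for r in taxon.split(";")]
    if len(ranks) < level:
        return None
    # Drop rank prefixes such as "o__" or "D_3__" (assumed format).
    return ranks[level - 1].split("__")[-1] or None


def ids_for_order(taxonomy_tsv, order_name):
    """Feature IDs whose level-4 assignment equals `order_name`."""
    rows = csv.DictReader(io.StringIO(taxonomy_tsv), delimiter="\t")
    return [r["Feature ID"] for r in rows if order_of(r["Taxon"]) == order_name]


def filter_fasta(fasta_text, keep_ids):
    """Keep only FASTA records whose header ID is in `keep_ids`."""
    keep, out, writing = set(keep_ids), [], False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].split()
            writing = bool(header) and header[0] in keep
        if writing:
            out.append(line)
    return "\n".join(out)


# Made-up example data, for illustration only.
TAXONOMY = (
    "Feature ID\tTaxon\tConfidence\n"
    "f1\td__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales\t0.98\n"
    "f2\td__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales\t0.91\n"
    "f3\td__Bacteria; p__Proteobacteria\t0.75\n"
)
FASTA = ">f1\nACGTACGT\n>f2\nTTGGCCAA\n>f3\nGGGGCCCC\n"

suspects = ids_for_order(TAXONOMY, "Enterobacterales")
print(filter_fasta(FASTA, suspects))
```

The resulting FASTA text contains only the suspect sequences, so it can be pasted into NCBI BLAST as a much smaller query.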
Hi thanks for your reply,
How would you find those sequences that “don’t belong”? There are the ones I have spotted myself: classifications from the (Silva) database that outright don’t exist, which I want to investigate. I can see how I can do that using classify-consensus-blast, so thank you.
But, I’d ideally like a general picture of how consistent the classification would be if I used a different reference. There are one or two taxa that were unexpected, but to be honest I don’t really know what I’m expecting to see.
If you assemble a list of “expected” taxa, it would be very simple to screen against this list to identify the “unexpected” taxa. But that’s the problem: this requires expert knowledge of your sample types and there is no “easy” way to do this without that knowledge.
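Once you do have such a list, the screening step itself could look something like the sketch below. This is only an illustration under assumptions: the taxonomy is an exported tab-separated table with "Feature ID" and "Taxon" columns, taxon strings are semicolon-separated with rank prefixes, and level 4 is the order. The expected list and the data are invented, and the hard part (deciding what belongs on the list) is not something code can do for you.

```python
import csv
import io


def order_of(taxon, level=4):
    """Return the bare name at the 1-based `level`, or None if absent."""
    ranks = [r.strip() for r in taxon.split(";")]
    if len(ranks) < level:
        return None
    # Drop rank prefixes such as "o__" or "D_3__" (assumed format).
    return ranks[level - 1].split("__")[-1] or None


def flag_unexpected(taxonomy_tsv, expected_orders, level=4):
    """Map each order that is NOT in `expected_orders` to its feature IDs.

    Features with no assignment at this level are reported too,
    under a placeholder label, so they are not silently dropped.
    """
    flagged = {}
    for row in csv.DictReader(io.StringIO(taxonomy_tsv), delimiter="\t"):
        name = order_of(row["Taxon"], level) or "(unclassified at this level)"
        if name not in expected_orders:
            flagged.setdefault(name, []).append(row["Feature ID"])
    return flagged


# Hypothetical expected list and made-up taxonomy rows.
EXPECTED = {"Lactobacillales", "Bacteroidales"}
TAXONOMY = (
    "Feature ID\tTaxon\tConfidence\n"
    "f1\td__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales\t0.98\n"
    "f2\td__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales\t0.91\n"
    "f3\td__Bacteria; p__Proteobacteria\t0.75\n"
    "f4\td__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales\t0.88\n"
)

for order, ids in flag_unexpected(TAXONOMY, EXPECTED).items():
    print(order, ids)
```

The flagged feature IDs are then the candidates to check by BLAST.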
Taxonomic annotations are often different between different databases, so even comparing those classifications can be a challenge. So this is actually a much more challenging question than it would seem at first thought.
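For a rough first look at how consistent two classifications are, you could compare the order-level name assigned to each feature by each reference. The sketch below is only that, a rough look: it assumes two exported taxonomy tables with "Feature ID" and "Taxon" columns, assumes order is the fourth rank in both, and compares names by exact string match. Exact matching will count renamed or differently spelled taxa as disagreements, which is exactly the difficulty described above. All names and data here are invented.

```python
import csv
import io


def order_of(taxon, level=4):
    """Return the bare name at the 1-based `level`, or None if absent."""
    ranks = [r.strip() for r in taxon.split(";")]
    if len(ranks) < level:
        return None
    # Drop rank prefixes such as "o__" or "D_3__" (assumed format).
    return ranks[level - 1].split("__")[-1] or None


def load_orders(taxonomy_tsv, level=4):
    """Map feature ID -> order-level name (or None)."""
    rows = csv.DictReader(io.StringIO(taxonomy_tsv), delimiter="\t")
    return {r["Feature ID"]: order_of(r["Taxon"], level) for r in rows}


def compare(tsv_a, tsv_b, level=4):
    """Return (fraction of shared features that agree, list of mismatches)."""
    a, b = load_orders(tsv_a, level), load_orders(tsv_b, level)
    shared = sorted(set(a) & set(b))
    mismatches = [(f, a[f], b[f]) for f in shared if a[f] != b[f]]
    if not shared:
        return None, mismatches
    return (len(shared) - len(mismatches)) / len(shared), mismatches


# Made-up classifications of the same four features against two references.
TAX_A = (
    "Feature ID\tTaxon\n"
    "f1\td__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales\n"
    "f2\td__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales\n"
    "f3\td__Bacteria; p__Bacteroidota; c__Bacteroidia; o__Bacteroidales\n"
    "f4\td__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Rhizobiales\n"
)
TAX_B = (
    "Feature ID\tTaxon\n"
    "f1\tk__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales\n"
    "f2\tk__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales\n"
    "f3\tk__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales\n"
    "f4\tk__Bacteria; p__Proteobacteria; c__Alphaproteobacteria\n"
)

agreement, mismatches = compare(TAX_A, TAX_B)
print(agreement)
for feature_id, name_a, name_b in mismatches:
    print(feature_id, name_a, name_b)
```

Any mismatch still needs a manual look to decide whether it is a real disagreement or just a naming difference between the two databases.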
Is there a way to see which feature IDs have been classified as each taxon? i.e. something like a table with taxa and feature IDs.
Yes. Have you read the tutorials, e.g., here?
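Within QIIME 2 itself, qiime metadata tabulate run on the taxonomy artifact gives a searchable table of feature IDs and their taxon strings. If you want it the other way round (one row per taxon, listing its feature IDs), a small script over the exported taxonomy table can do the grouping. The sketch below assumes a tab-separated export with "Feature ID" and "Taxon" columns and semicolon-separated taxon strings; the function name and the data are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def features_by_taxon(taxonomy_tsv, level=4):
    """Group feature IDs by their taxon string truncated to `level` ranks."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(taxonomy_tsv), delimiter="\t"):
        ranks = [r.strip() for r in row["Taxon"].split(";")]
        # Keep at most the first `level` ranks (level 4 = order, assumed).
        groups["; ".join(ranks[:level])].append(row["Feature ID"])
    return dict(groups)


# Made-up example rows, for illustration only.
TAXONOMY = (
    "Feature ID\tTaxon\tConfidence\n"
    "f1\td__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales; f__Enterobacteriaceae\t0.98\n"
    "f2\td__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales\t0.91\n"
    "f3\td__Bacteria; p__Proteobacteria\t0.75\n"
    "f4\td__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales\t0.88\n"
)

# Print a simple two-column table: taxon, then its feature IDs.
for taxon, ids in sorted(features_by_taxon(TAXONOMY).items()):
    print(f"{taxon}\t{', '.join(ids)}")
```

Features that were only classified to a shallower rank (like f3 here) end up in their own shorter group rather than being forced into an order.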