Hello fellow qiimers! This is my first post here, so I hope it follows all the community guidelines.
I'm using QIIME 2 to perform 16S metagenomic analysis. Recently I wanted to review all the available classification tools (I had been using the vsearch classifier previously), and when I tried to incorporate other commands from feature-classifier into my workflow, I received the following error message from the taxa barplot tool:
Feature IDs found in the table are missing from the taxonomy: {'b7d56027f65de48259ecde66b95a2247', '452c2a0920b0890c86a02f3079c2f756'}
Upon further inspection, I found that the number of features doesn't match between the input (1.8 MB) and the output (754.2 KB) of the classifier method. Indeed, both missing ASVs belong to species that can't be found in my curated SILVA database: they come from two separate fungal organisms, while my database targets the V3-V4 regions of the bacterial 16S gene.
Previously, I found that the vsearch classifier treats such sequences as unassigned and leaves them in place (example (646.3 KB) containing the aforementioned IDs as "Unassigned"; it is an output of the classify-consensus-blast method, the only difference being that the unmodified SILVA database was used). However, the classify-hybrid-vsearch-sklearn classifier, as well as the classify-consensus-blast method, discards them, which may later lead to consistency issues, as shown above with the taxa barplot command.
I believe there is some way around it using various filtering methods, but I haven't been able to come up with anything elegant yet.
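For illustration, here is a rough sketch of the bookkeeping such a workaround would need, in plain Python rather than through any QIIME 2 API. It assumes the feature IDs from the table and the taxonomy assignments have already been exported into a list and a dict; the function names, the "Unassigned" label argument, and the example_classified_feature entry are hypothetical and only serve the example.

```python
def find_missing_ids(table_ids, taxonomy):
    """Return feature IDs present in the table but absent from the taxonomy."""
    return set(table_ids) - set(taxonomy)


def fill_unassigned(table_ids, taxonomy, label="Unassigned"):
    """Return a copy of taxonomy where every missing table ID gets `label`."""
    filled = dict(taxonomy)  # copy, so the original mapping is not modified
    for feature_id in sorted(find_missing_ids(table_ids, taxonomy)):
        filled[feature_id] = label
    return filled


# Hypothetical data: the two IDs from the error message plus one
# made-up feature that the classifier did keep.
table_ids = [
    "b7d56027f65de48259ecde66b95a2247",
    "452c2a0920b0890c86a02f3079c2f756",
    "example_classified_feature",
]
taxonomy = {"example_classified_feature": "d__Bacteria; p__Firmicutes"}

missing = find_missing_ids(table_ids, taxonomy)
filled = fill_unassigned(table_ids, taxonomy)

print(sorted(missing))  # the IDs the classifier dropped
print(len(filled))      # one taxonomy entry per table feature
```

The same set difference could instead be used the other way round, to drop the discarded IDs from the feature table so that it matches the taxonomy; which direction is preferable depends on whether the unclassified features should still show up in the barplot.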
What I find most interesting is that running the two commands that make up the hybrid classification method separately doesn't cause this problem, but I suppose they were heavily modified in order to work in combination.