I’m posting a simple pipeline that I developed to quickly validate the taxonomic assignments of a new QIIME 2 taxonomic classifier (described at https://github.com/gavinmdouglas/taxa_sanity_check).
I wanted a quick way to run sanity checks on QIIME 2 taxonomic assignments, because I recently came across a major issue with a custom taxonomic classifier I had created with qiime2-2019.7 (see original issue). Essentially all ASVs were being classified as the genus Alteromonas, even though their 16S sequences were extremely different and matched different phyla based on BLAST.
It turns out that using these more restrictive options for qiime feature-classifier extract-reads fixed the major problem of Alteromonas popping up, which was also discussed in a similar thread:

    --p-identity 0.9 \
    --p-min-length 300 \
    --p-max-length 500 \
However, I wanted to make sure there weren’t any other issues specific to rarer lineages, since similar misclassifications affecting only a subset of ASVs would be much harder to notice. The simple pipeline I came up with orders all ASVs by the number of taxonomic label mismatches between QIIME 2 and an independent assignment approach. This lets you run a quick sanity check that there aren’t any major misclassification errors (there weren’t in my case after using the new qiime feature-classifier extract-reads options!), and I think others might find this useful too.
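To give a rough idea of the mismatch-ordering step, here is a minimal Python sketch. It is not the code from the repository linked above, which may work differently. The function names, the semicolon-separated taxonomy format, and the example ASVs and lineages are all hypothetical, and the sketch assumes both approaches use the same rank labels and only compares ranks that both assigned.

```python
from typing import Dict, List, Tuple


def split_ranks(taxonomy: str) -> List[str]:
    """Split a semicolon-separated taxonomy string into cleaned rank labels."""
    return [rank.strip() for rank in taxonomy.split(";") if rank.strip()]


def count_mismatches(tax_a: str, tax_b: str) -> int:
    """Count ranks that differ, comparing only ranks present in both strings."""
    # zip() stops at the shorter lineage, so unassigned deeper ranks are ignored.
    return sum(1 for a, b in zip(split_ranks(tax_a), split_ranks(tax_b)) if a != b)


def rank_asvs_by_mismatch(
    qiime2_tax: Dict[str, str], independent_tax: Dict[str, str]
) -> List[Tuple[str, int]]:
    """Return (ASV id, mismatch count) pairs, most mismatches first."""
    shared_ids = set(qiime2_tax) & set(independent_tax)
    scored = [
        (asv, count_mismatches(qiime2_tax[asv], independent_tax[asv]))
        for asv in shared_ids
    ]
    # Sort by descending mismatch count, then by ASV id for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical example data (not taken from any real run).
qiime2_tax = {
    "ASV1": "Bacteria; Proteobacteria; Gammaproteobacteria; Alteromonadales; "
            "Alteromonadaceae; Alteromonas",
    "ASV2": "Bacteria; Bacteroidetes; Bacteroidia; Flavobacteriales; "
            "Flavobacteriaceae; Polaribacter",
    "ASV3": "Bacteria; Proteobacteria; Alphaproteobacteria",
}
independent_tax = {
    "ASV1": "Bacteria; Cyanobacteria; Oxyphotobacteria; Synechococcales; "
            "Cyanobiaceae; Synechococcus",
    "ASV2": "Bacteria; Bacteroidetes; Bacteroidia; Flavobacteriales; "
            "Flavobacteriaceae; Polaribacter",
    "ASV3": "Bacteria; Proteobacteria; Gammaproteobacteria; Alteromonadales",
}

for asv, n_mismatch in rank_asvs_by_mismatch(qiime2_tax, independent_tax):
    print(asv, n_mismatch)
```

ASVs at the top of the list are the ones worth checking by hand first, for example with BLAST.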