Annotation Challenges with COI Amplicon Analysis in QIIME2

Hi @mengpf0409,

These are great questions!

This issue can be partially explained as outlined here, here, here and here. That is, the stricter your --p-perc-identity setting (e.g., 0.9), the fewer reference sequences will match. This leaves a smaller pool of hits from which to calculate the lowest common ancestor (LCA) consensus taxonomy, and how that pool is evaluated is tied to the --p-min-consensus parameter. With only a few hits, a single discordant hit can pull the agreement at a given rank below the --p-min-consensus threshold. The result is a consensus taxonomy that is broader, i.e., assigned at a higher taxonomic rank.
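To make this concrete, here is a minimal Python sketch of a simplified majority-rule LCA consensus. It is not QIIME 2's implementation: the hits, their percent identities, and the function `consensus_taxonomy` are hypothetical, and details such as --p-maxaccepts and BLAST scoring are ignored. It only illustrates how an identity cutoff changes the pool the consensus is computed from.

```python
from collections import Counter

# Hypothetical BLAST hits for ONE query: (identity as a fraction, lineage).
# The values are invented for illustration; lineages run from broad to specific.
HITS = [
    (0.97, ["Arthropoda", "Insecta", "Diptera", "Culicidae", "Aedes"]),
    (0.92, ["Arthropoda", "Insecta", "Diptera", "Culicidae", "Culex"]),
    (0.88, ["Arthropoda", "Insecta", "Diptera", "Culicidae", "Aedes"]),
    (0.85, ["Arthropoda", "Insecta", "Diptera", "Culicidae", "Aedes"]),
    (0.82, ["Arthropoda", "Insecta", "Diptera", "Chironomidae", "Chironomus"]),
]


def consensus_taxonomy(hits, perc_identity, min_consensus):
    """Simplified majority-rule LCA (hypothetical helper, not QIIME 2 code)."""
    # Step 1: only hits at or above the identity cutoff enter the pool.
    pool = [lineage for identity, lineage in hits if identity >= perc_identity]
    if not pool:
        return ["Unassigned"]
    consensus = []
    # Step 2: walk the ranks from broad to specific.
    for labels in zip(*pool):
        label, count = Counter(labels).most_common(1)[0]
        # Stop at the first rank where agreement falls below the threshold.
        if count / len(pool) < min_consensus:
            break
        consensus.append(label)
    return consensus


# Strict cutoff: only 2 hits remain and they split 1:1 at genus (0.5 < 0.51).
print(consensus_taxonomy(HITS, perc_identity=0.9, min_consensus=0.51))
# ['Arthropoda', 'Insecta', 'Diptera', 'Culicidae']

# Relaxed cutoff: 5 hits remain and 3 of 5 (0.6) agree on the genus.
print(consensus_taxonomy(HITS, perc_identity=0.8, min_consensus=0.51))
# ['Arthropoda', 'Insecta', 'Diptera', 'Culicidae', 'Aedes']
```

In this toy example the stricter cutoff stops at family while the relaxed cutoff reaches genus, which mirrors the behaviour described above.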

When using --p-perc-identity 0.8, you allow more hits to the reference sequences to be retained. This increases the pool of taxonomic information that can contribute to the LCA consensus taxonomy, again subject to the --p-min-consensus parameter. It's essentially a "majority-rule" approach: a label is kept at a given rank only if at least the --p-min-consensus fraction of the retained hits agree on it.
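The majority-rule vote at a single rank can be sketched as below. The helper `rank_vote` and the genus labels are hypothetical and simplified; the sketch only shows how the fraction of agreeing hits is compared against a threshold like --p-min-consensus, and how a larger pool can clear that threshold where a smaller one cannot.

```python
from collections import Counter


def rank_vote(labels, min_consensus):
    """Vote on one rank (hypothetical helper, not QIIME 2 code).

    Returns (winning_label, fraction), with winning_label set to None when
    the most common label is shared by fewer than min_consensus of the hits.
    """
    if not labels:
        return None, 0.0
    label, count = Counter(labels).most_common(1)[0]
    fraction = count / len(labels)
    return (label if fraction >= min_consensus else None), fraction


# Hypothetical genus-level labels from hits that passed the identity filter.
strict_pool = ["Aedes", "Culex"]                                   # fewer hits kept
relaxed_pool = ["Aedes", "Culex", "Aedes", "Aedes", "Chironomus"]  # more hits kept

print(rank_vote(strict_pool, 0.51))   # (None, 0.5): tie, no genus call
print(rank_vote(relaxed_pool, 0.51))  # ('Aedes', 0.6): majority reached
print(rank_vote(relaxed_pool, 0.7))   # (None, 0.6): stricter consensus fails
```

Raising the consensus threshold has the opposite effect of relaxing the identity cutoff: the same pool can fail to produce a genus-level call.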

Keep in mind that, depending on the dataset and primer-pair choice, it is not unusual for short-read amplicon data to lack the resolution needed to classify taxa to the genus level. In some cases, a more specific classification is the result of incorrect over-classification, that is, returning a more specific identification than the data can actually support.

Also, feature-classifier classify-sklearn works differently from feature-classifier classify-consensus-blast. I'd highly recommend reading the following papers:
