Parameters for training a feature classifier (16S V4-V5)

Dear Colleagues,

Thank you for all the help with my QIIME2 analyses so far. As most posts and tutorials point out, it is important to train a feature classifier on reference sequences extracted with the same primers (i.e., the same amplicon region) used for your own data. While doing this, I came across a few parameters that I could not make sense of.

Specifically, regarding the truncation length: which sequences are being truncated, and at which position? Is this parameter fixed, or should I change it for the V4-V5 region (as these are longer reads)?

Also, what is the purpose of the minimum/maximum length settings?

Please find below the code that I ran to train my feature classifier.

Kind regards,


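The truncation is optional (and probably will not impact results greatly). It is applied to the reference sequences that extract-reads pulls out of your reference database, not to your query sequences: a positive --p-trunc-len cuts each extracted read to that many bases. The idea is that you can truncate the reference reads at exactly the same position at which your query sequences were truncated (e.g., if you truncated to a set length during DADA2 denoising).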
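As a rough illustration (a sketch only, not the plugin's actual code), the function below models what a positive --p-trunc-len does to one extracted reference read: it keeps the first trunc_len bases, while 0 (the default) leaves the read untouched. The function name and the toy sequences are hypothetical. Truncating the reference to the same length as a query that was truncated during denoising makes both cover the same positions.

```python
def truncate_amplicon(amplicon: str, trunc_len: int) -> str:
    """Hypothetical model of --p-trunc-len for one extracted reference read.

    A positive trunc_len keeps only the first trunc_len bases; 0 disables
    truncation. (Simplification: reads shorter than trunc_len are returned
    unchanged here; check --help for how the real plugin treats them.)
    """
    if trunc_len > 0:
        return amplicon[:trunc_len]
    return amplicon


# Toy data (made up): one extracted reference amplicon and a query ASV
# that was truncated to 6 bases during denoising.
reference_amplicon = "ACGTACGTAA"
query_asv = "ACGTAC"

# Truncate the reference at the same position as the query.
truncated_reference = truncate_amplicon(reference_amplicon, trunc_len=len(query_asv))
# With trunc_len=0 the reference is left as it is.
untruncated_reference = truncate_amplicon(reference_amplicon, trunc_len=0)

print(truncated_reference)               # ACGTAC
print(truncated_reference == query_asv)  # True
print(untruncated_reference)             # ACGTACGTAA
```

A caveat for V4-V5: if your reads were paired-end and merged, DADA2 truncates the forward and reverse reads before merging, so the merged sequences generally do not stop at one fixed position. In that case leaving truncation off is likely the simpler choice.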

Extracted sequences shorter than --p-min-length or longer than --p-max-length are discarded. This removes outliers that are most likely false-positive primer matches or junk sequences.
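The sketch below models how the length settings interact with truncation and trimming, assuming the order described in the extract-reads help text of recent versions (--p-max-length checked on the untrimmed amplicon, then --p-trunc-len, then --p-trim-left, then --p-min-length); please confirm this ordering with --help for your installed version. The function filter_amplicon and the toy lengths are hypothetical, and 0 disables each step.

```python
from typing import Optional


def filter_amplicon(amplicon: str, trunc_len: int = 0, trim_left: int = 0,
                    min_length: int = 0, max_length: int = 0) -> Optional[str]:
    """Hypothetical model of extract-reads length handling.

    Returns the processed amplicon, or None if it would be discarded.
    Assumed order: max-length check, truncation, left trim, min-length check.
    """
    # Max length is checked on the untrimmed amplicon (assumed ordering).
    if max_length > 0 and len(amplicon) > max_length:
        return None
    if trunc_len > 0:
        amplicon = amplicon[:trunc_len]  # keep the first trunc_len bases
    if trim_left > 0:
        amplicon = amplicon[trim_left:]  # drop trim_left bases from the 5' end
    # Min length is checked after truncation and trimming (assumed ordering).
    if min_length > 0 and len(amplicon) < min_length:
        return None
    return amplicon


# Toy amplicons (made-up lengths): a typical one, a suspiciously short
# fragment and a suspiciously long one.
amplicons = {
    "typical": "A" * 370,
    "too_short": "A" * 40,
    "too_long": "A" * 900,
}

kept = {name: seq for name, seq in amplicons.items()
        if filter_amplicon(seq, min_length=300, max_length=600) is not None}
print(sorted(kept))  # ['typical']
```

One consequence of this ordering is that trimming can push an otherwise acceptable read below --p-min-length, so choose the minimum with any truncation or trimming in mind.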

I recommend checking the help documentation for some more details:

qiime feature-classifier extract-reads --help

Good luck!


For Johann, or future folks who find this post, check out the full RESCRIPt tutorial!

It includes a lot of discussion about how and why this process works.