Min and max length necessary for feature-classifier extract reads from reference database

taxonomy
feature-classifier

#1

Hello,

I am working on training feature classifiers for my data set using qiime2-2018.11 and working through the associated QIIME2 documentation.

I noticed that when extracting the reference reads, there are options to set the minimum and maximum length for the amplicon.

Is this step necessary? Or is leaving these values as default fine?

In previous versions of the QIIME 2 documentation for training feature classifiers, these parameters are omitted, so I was curious whether they are necessary for improved downstream analysis.

Thank you


(Nicholas Bokulich) #2

It is not necessary, but it is recommended.

These parameters were added in a more recent release of QIIME 2, essentially to correct an issue that we were observing with specific databases and specific primers. Some database/primer combinations resulted in simulated amplification of very short sequences that were probably due to non-target amplification (i.e., the reference sequence had many mismatches to the primers and probably would not amplify under biological conditions). This had the potential to impact classification results and so should be avoided if possible. There are two ways to avoid this: make the primer-match threshold stricter (tolerate fewer mismatches) so that those sequences are not amplified in the simulation, or remove excessively short and long amplicons that clearly indicate non-target amplification.
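To make the second approach concrete, here is a minimal Python sketch of length-based filtering of simulated amplicons. It is only an illustration of the idea, not the actual QIIME 2 implementation; the function name `filter_by_length`, the toy sequences, and the convention that `0` disables a bound are all assumptions made for this example.

```python
def filter_by_length(amplicons, min_length=50, max_length=0):
    """Keep amplicons whose length is within [min_length, max_length].

    A bound of 0 is treated as "no limit" (an assumption for this sketch).
    `amplicons` maps a sequence ID to its simulated amplicon sequence.
    """
    kept = {}
    for seq_id, seq in amplicons.items():
        if min_length and len(seq) < min_length:
            continue  # too short: likely non-target amplification
        if max_length and len(seq) > max_length:
            continue  # too long: likely non-target amplification
        kept[seq_id] = seq
    return kept


# Hypothetical toy data, not real reference sequences.
simulated = {
    "ref_a": "ACGT" * 63,   # 252 nt, plausible target amplicon
    "ref_b": "ACGT" * 5,    # 20 nt, suspiciously short
    "ref_c": "ACGT" * 300,  # 1200 nt, suspiciously long
}

kept = filter_by_length(simulated, min_length=50, max_length=350)
print(sorted(kept))
```

With these toy inputs only `ref_a` survives; the short and long outliers are dropped before they can end up in the classifier's training set.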


#3

Thank you for the response! I am currently using the 515F and 806R primers. So if I set the minimum to 50 and the maximum to 350, should that range be enough to exclude amplicons that come from mismatched primer binding?


(Nicholas Bokulich) #4

That sounds fine. For those primers you could probably use an even tighter range, but I don’t have a good threshold off the top of my head.
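One way to pick a tighter range is to look at the length distribution of the reads that were actually extracted. The sketch below assumes the extracted reads are already available as plain sequence strings (for example after exporting them to FASTA and parsing the file); the helper `length_summary` and the toy lengths are hypothetical and only illustrate the approach.

```python
from collections import Counter


def length_summary(sequences):
    """Summarize amplicon lengths to help choose min/max cutoffs."""
    lengths = sorted(len(s) for s in sequences)
    counts = Counter(lengths)
    mode_length, _ = counts.most_common(1)[0]
    return {
        "min": lengths[0],
        "max": lengths[-1],
        # Upper median: the middle element of the sorted lengths.
        "median": lengths[len(lengths) // 2],
        "mode": mode_length,
    }


# Hypothetical toy reads (lengths 20, 251, 252, 252, 253, 900), not real data.
toy_reads = ["A" * n for n in (20, 251, 252, 252, 253, 900)]

summary = length_summary(toy_reads)
print(summary)
```

If most reads cluster tightly around one length, as in this toy example, cutoffs placed a little below and above that cluster would remove the outliers while keeping the bulk of the amplicons.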


#5

Great! Thank you very much for your help!