V3V4 trim length and length parameters for extract-reads

Deni_Ribicic · June 13, 2019, 1:10pm

Hi Nicholas,

I have to say that I am puzzled as well about training the classifier. The training tutorial does not go into much detail on the different parameters, or at least I find it hard to follow.

Briefly, here is what I want to do: I'd like to train a classifier based on the pro341F and pro805R primers and the Silva-132 database. My sequences are paired-end, with each read about 300 bp long.

The total DNA stretch that should be covered by this primer pair is 464 bp. Taking 20 bp off each paired read (5'---3', quality/primer trimming) would give me about 424 bp after the reads are joined.
Does this mean that I can use --p-min-length 400 and --p-max-length 450 to extract training sequences targeting this previously calculated region (424 bp)? This is how I understand what these two parameters do.
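
To make the arithmetic above explicit, here is a minimal Python sketch. The function names are hypothetical, and the inclusive min/max comparison only reflects my assumption about how a length window would work; it is not confirmed behaviour of extract-reads.

```python
def expected_joined_length(amplicon_bp: int, trim_per_read_bp: int) -> int:
    """Length remaining after trimming the same number of bases from each read."""
    return amplicon_bp - 2 * trim_per_read_bp


def in_length_window(length_bp: int, min_length: int, max_length: int) -> bool:
    """Assumed semantics: keep a sequence if min_length <= length <= max_length."""
    return min_length <= length_bp <= max_length


# Values taken from the post: 464 bp amplicon, 20 bp trimmed from each read.
joined = expected_joined_length(amplicon_bp=464, trim_per_read_bp=20)
print(joined)                               # 424
print(in_length_window(joined, 400, 450))   # True
```

Under these assumptions, the 424 bp region sits inside the 400 to 450 bp window.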

Now, where I get puzzled is when I look at the provenance of the Silva classifier you provide (silva-132-99-515-806-nb-classifier.qza).
You used a min_length of 50 and a max_length of 0. Doesn't a max_length of 0 mean that no sequences would be extracted? As I understand it, I would have expected something like a min_length of 200 and a max_length of, say, 300, since the region covered by that primer pair is about 290 bp.

I would really appreciate it if this could be explained in more detail, since this is, imo, the most important step of the pipeline.

Best,
Deni