ITS analysis: does it make sense to use dada2 trunc-len parameters?

Nicholas_Bokulich · September 12, 2019, 7:49pm

I think this is a really good point. After all, we recommend using q2-cutadapt or q2-itsxpress to avoid read-through on short ITS amplicons. Setting truncation parameters with dada2 will then cause any trimmed reads that are shorter than the truncation length to be dropped.
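To make the interaction concrete, here is a minimal sketch of the truncation rule in plain Python. It is not dada2's code: the function name and the example reads are made up for illustration. It only mimics the documented behavior that reads shorter than the truncation length are discarded, and that a truncation length of 0 disables truncation.

```python
def truncate_like_dada2(reads, trunc_len):
    """Cut reads to trunc_len and discard reads shorter than trunc_len.

    Hypothetical helper for illustration only (not part of dada2 or QIIME 2).
    trunc_len == 0 means "no truncation", so every read is kept unchanged.
    """
    if trunc_len == 0:
        return list(reads)
    return [r[:trunc_len] for r in reads if len(r) >= trunc_len]


# Made-up reads of 200, 120, 240 and 80 nt; imagine the two short ones
# were shortened by adapter/primer trimming on short ITS amplicons.
reads = ["ACGT" * 50, "ACGT" * 30, "ACGT" * 60, "ACGT" * 20]

kept = truncate_like_dada2(reads, trunc_len=150)
print(f"kept {len(kept)} of {len(reads)} reads")  # kept 2 of 4 reads
```

In this toy case, the two reads that trimming shortened below 150 nt are exactly the ones lost.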

But as you say it depends on read length. In the case of the tutorial data (the focus of the original question), the reads are all evidently shorter than the total amplicon length and there is no read-through, judging from the cutadapt results. In cases like these, and especially when using dada2 denoise-single, setting a truncation length can be useful for simplifying quality control and downstream processing.

The minimums in those ranges seem extremely short, much shorter than I have seen reported elsewhere in the literature, and I suspect they may be errors in their simulation.

I would add a 4th option to your proposals:

1. Use q2-itsxpress and/or cutadapt, then compare read lengths before and after to assess how much trimming occurred. Examine the read length distributions to see (a) what truncation lengths are acceptable and (b) whether the length distributions make sense (e.g., very short reads could be junk!).
2. Use dada2.
2a. If denoise-paired, definitely don't use truncation unless it is needed for read quality purposes.
2b. If denoise-single, test out a reasonable truncation length and pay close attention to the dada2 stats output to make sure you are not losing too many reads during the initial filtering step. (If you are, the loss is not caused by pre-trimming as long as the truncation setting is lower than the minimum trimmed read length.)
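One rough way to do the read-length check in step 1 outside of QIIME 2 is sketched below. It assumes the reads have been exported to FASTQ files with exactly four lines per record. The helper names (`open_fastq`, `read_lengths`, `summarize`) and the candidate truncation lengths are illustrative, not part of any plugin.

```python
import gzip
import io
import statistics


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    opener = gzip.open if str(path).endswith(".gz") else open
    return opener(path, "rt")


def read_lengths(handle):
    """Yield read lengths, assuming 4-line FASTQ records (no wrapped lines)."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record is the sequence
            yield len(line.strip())


def summarize(lengths, candidate_trunc_lens):
    """Summarize lengths and the fraction of reads at least as long as
    each candidate truncation length (shorter reads would be discarded)."""
    lengths = list(lengths)
    return {
        "n_reads": len(lengths),
        "min": min(lengths),
        "median": statistics.median(lengths),
        "max": max(lengths),
        "retained": {
            t: sum(n >= t for n in lengths) / len(lengths)
            for t in candidate_trunc_lens
        },
    }


# Tiny in-memory demo with made-up reads of 120, 90, 150 and 40 nt.
demo = io.StringIO(
    "".join(f"@r{i}\n{'A' * n}\n+\n{'I' * n}\n"
            for i, n in enumerate([120, 90, 150, 40]))
)
summary = summarize(read_lengths(demo), candidate_trunc_lens=[50, 100])
print(summary)
```

On real data, something like `summarize(read_lengths(open_fastq(path)), [...])` could be run on the same sample before and after cutadapt or ITSxpress: a drop in the minimum or median length shows how much trimming occurred, and the `retained` fractions give an idea of how many reads a given truncation length would cost before dada2 is run at all.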