ITS data analysis: How to determine which DADA2 analysis is the best

ChrisKeefe · January 22, 2021, 12:54am

This is an important consideration with ITS, and it's great that you're thinking about it. How much length variation is there in your sequences?
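If it helps to put a number on that, here is a minimal sketch for summarizing sequence lengths from a FASTA file. It uses only the Python standard library; the function names and the toy sequences are made up for illustration, and it assumes plain (uncompressed) FASTA text.

```python
import statistics


def fasta_lengths(lines):
    """Yield the length of each record in an iterable of FASTA lines."""
    length = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            # Sequence lines may be wrapped, so accumulate across lines.
            length += len(line)
    if length is not None:
        yield length


def summarize_lengths(lengths):
    """Return count, min, median, and max for a collection of lengths."""
    lengths = list(lengths)
    if not lengths:
        raise ValueError("no sequences found")
    return {
        "n": len(lengths),
        "min": min(lengths),
        "median": statistics.median(lengths),
        "max": max(lengths),
    }


# Toy data (hypothetical), including one record wrapped over two lines.
toy_fasta = """>seq_a
ACGT
>seq_b
ACGTAC
>seq_c
ACGTA
CGTAC
"""

summary = summarize_lengths(fasta_lengths(toy_fasta.splitlines()))
print(summary)
```

For real data, you could pass an open file handle to `fasta_lengths` instead of the toy string. A wide gap between `min` and `max` is a hint that length-based trimming or truncation deserves a closer look.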

Length variation matters most in two situations: readthrough (where a read runs past the end of a short amplicon and captures non-target nucleotides on "the outside" of the primer), and filtering steps where sequences are kept or dropped unfairly because of their length.
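As a rough illustration of what readthrough looks like in the data, the sketch below checks whether a forward read contains the reverse complement of the reverse primer. The primer and reads are toy sequences I made up, and the check is exact-match only (no mismatches, no IUPAC ambiguity codes), so treat it as a diagnostic aid rather than a replacement for proper trimming.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def readthrough_position(read, reverse_primer):
    """Return the index where the reverse primer's reverse complement
    starts in the read, or -1 if it is not found (exact match only)."""
    target = reverse_complement(reverse_primer).upper()
    return read.upper().find(target)


# Hypothetical toy primer and reads, for illustration only.
toy_reverse_primer = "AACCGT"  # reverse complement is ACGGTT
read_with_readthrough = "TTTTGGGG" + "ACGGTT" + "CACA"
read_without = "TTTTGGGGTTTT"

pos = readthrough_position(read_with_readthrough, toy_reverse_primer)
print(pos)  # everything from this index onward is primer or non-target sequence
print(readthrough_position(read_without, toy_reverse_primer))
```

If a meaningful share of your forward reads return a position other than -1, the reads are running through the far primer, and the bases from that position onward are not part of your target region.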

Readthrough can be managed by trimming properly with cutadapt. This is discussed at some length here. DADA2 could be involved in dropping sequences, but your DADA2 results show a high rate of sequence recovery, with no prominent bottlenecks where you're losing a lot of reads.
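One way to look for bottlenecks is to compute, for each stage, the fraction of reads that survived from the previous stage. Here is a minimal sketch with made-up counts for a single sample; the stage names are only meant to resemble the stages in a DADA2 stats summary and are an assumption, so substitute whatever your own table reports.

```python
def step_retention(counts):
    """Given (stage, read_count) pairs in pipeline order, return a list of
    (stage, fraction_of_previous_stage, fraction_of_input) tuples."""
    if not counts:
        return []
    first = counts[0][1]
    prev = first
    results = []
    for name, n in counts[1:]:
        of_prev = n / prev if prev else 0.0
        of_input = n / first if first else 0.0
        results.append((name, of_prev, of_input))
        prev = n
    return results


def worst_step(counts):
    """Return the stage with the lowest retention relative to the stage before it."""
    return min(step_retention(counts), key=lambda row: row[1])


# Hypothetical counts for one sample (not real results).
toy_counts = [
    ("input", 10000),
    ("filtered", 8000),
    ("denoised", 8000),
    ("merged", 6000),
    ("non-chimeric", 6000),
]

for name, of_prev, of_input in step_retention(toy_counts):
    print(f"{name:>12}: {of_prev:.0%} of previous stage, {of_input:.0%} of input")

print("largest loss at:", worst_step(toy_counts)[0])
```

In this toy example the merge stage loses the most reads relative to the stage before it. A sharp drop at one stage in your real table would point at the parameter controlling that stage (for example, truncation length for filtering or overlap for merging).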

I agree. You seem to be getting similar results from single-end and paired-end runs, with slightly different parameters.

You haven't answered my question above, and I'm not entirely clear on what your concern is with these results. Are you just uncertain about how to select DADA2 parameters? Do you think something problematic is happening in DADA2 specifically? Or just in general? If so, what do you think might be causing the unexpected results?