This is an important consideration with ITS, and it’s great that you’re thinking about it. How much length variation is there in your sequences?
Length variation tends to matter most in two situations: readthrough (where you capture non-target nucleotides on “the outside” of the primer), and cases where sequences are kept or dropped unfairly due to length.
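If you want a quick way to answer the length question, here is a minimal sketch that summarizes sequence lengths from a FASTQ stream. It assumes plain four-line FASTQ records, and the function names and toy records are made up for illustration. Raw Illumina reads are usually all the same length, so this is likely to be more informative on trimmed reads.

```python
import io
import statistics
from collections import Counter


def read_lengths(fastq_handle):
    """Yield the length of each sequence in a four-line-per-record FASTQ stream."""
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # the sequence is the second line of each record
            yield len(line.strip())


def summarize_lengths(lengths):
    """Return basic length statistics plus a length histogram."""
    lengths = list(lengths)
    return {
        "n": len(lengths),
        "min": min(lengths),
        "median": statistics.median(lengths),
        "max": max(lengths),
        "histogram": Counter(lengths),
    }


# Toy in-memory FASTQ (invented records, not real data).
example = io.StringIO(
    "@r1\nACGTACGT\n+\nIIIIIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nACGTAC\n+\nIIIIII\n"
)
summary = summarize_lengths(read_lengths(example))
print(summary["min"], summary["median"], summary["max"])
```

For a real file you would pass an open file handle (or `gzip.open(path, "rt")` for compressed reads) in place of the `io.StringIO` object.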
Readthrough can be managed by trimming properly with cutadapt. This is discussed at some length here. DADA2 could be involved in dropping sequences, but your DADA2 results show a high rate of sequence recovery, with no prominent bottlenecks where you’re losing a lot of reads.
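To make the readthrough idea concrete, here is a toy sketch of what trimming is doing conceptually: when the amplicon is shorter than the read, the read runs into the reverse complement of the opposite primer, and everything from that point on is non-target sequence. This is not how cutadapt is implemented (cutadapt tolerates mismatches and handles ambiguous bases, while this uses exact matching only), and the primer and read below are invented placeholders.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def trim_readthrough(read, opposite_primer):
    """Cut the read where the reverse complement of the opposite primer begins.

    Exact matching only: a simplification for illustration.
    """
    target = reverse_complement(opposite_primer)
    pos = read.find(target)
    return read if pos == -1 else read[:pos]


# Hypothetical values, not real ITS primers or reads.
reverse_primer = "GGTTCC"
amplicon = "ATATATAT"
read_with_readthrough = amplicon + reverse_complement(reverse_primer) + "CCCC"

trimmed = trim_readthrough(read_with_readthrough, reverse_primer)
untouched = trim_readthrough(amplicon, reverse_primer)
print(trimmed, untouched)
```

For real data, use cutadapt itself rather than anything like this. The sketch is only meant to show why short ITS amplicons end up carrying extra bases past the primer.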
I agree. You seem to be getting similar results from single-end and paired-end runs, with slightly different parameters.
You haven’t answered my question above, and I’m not entirely clear on what your concern is with these results. Are you just uncertain about how to select DADA2 parameters? Do you think something problematic is happening in DADA2 specifically? Or just in general? If so, what do you think might be causing the unexpected results?
If you haven’t already, please spend some time with the DADA2 preprint, this walkthrough of DADA2 for ITS by DADA2’s creator, and the ITS tutorial I linked above. There are also many in-depth discussions of how to choose DADA2 parameters on this forum, as well as many great discussions about fungal ITS workflows. The search tool is your friend.
Your approach with ITS data may be very different from your approach with 16S (e.g. you may not want to truncate with ITS). Understanding the tools will help you make good choices in both contexts.
Once you have taken some time to learn about DADA2, and have developed more specific questions, please feel free to post them here.
Here are two questions that may help guide your exploration:
Why do you think your results are bad? (Not why could they be bad, but what evidence makes you think they are bad.)
Why do you suspect DADA2 is at fault?
It takes many steps to produce a taxonomic barplot. Your data has likely been preprocessed in some way, imported, trimmed, denoised, classified using a classifier built on some kind of database, etc. If your results are not what you expected, you will have to consider which of the many steps in the process might be at fault.
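One way to narrow down which step to look at is to compare how many reads survive each one. Here is a minimal sketch, assuming you have already collected a read count after each step. The step names and counts below are invented for illustration, so substitute whatever your own workflow reports.

```python
def step_retention(counts):
    """Given an ordered list of (step_name, reads_remaining) pairs,
    return (step_name, fraction_retained_from_previous_step) pairs."""
    report = []
    for (_, prev_n), (name, n) in zip(counts, counts[1:]):
        fraction = n / prev_n if prev_n else 0.0
        report.append((name, fraction))
    return report


def biggest_loss(counts):
    """Return the step that retained the smallest fraction of its input."""
    return min(step_retention(counts), key=lambda item: item[1])


# Hypothetical per-step read counts for a single sample.
example_counts = [
    ("input", 10000),
    ("filtered", 9000),
    ("denoised", 8800),
    ("merged", 4400),
    ("non-chimeric", 4300),
]

worst_step, worst_fraction = biggest_loss(example_counts)
print(worst_step, worst_fraction)
```

A large drop at one step only tells you where to look next. Keep in mind that steps such as taxonomy classification do not discard reads at all, so they need to be evaluated some other way.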