Q2-dada2 trimming recommendations for ITS data

Hey folks,
I'm just starting to work with some soil ITS2 data for the first time and am looking for some recommendations regarding pre-dada2 prepping.
Unfortunately the reads I am working with are rather poor in quality, and from my experience with 16S data, my instincts tell me to use the forward reads only and truncate around 100bp.
My first question: if this were 16S rather than ITS, would truncating at 100 bp be too conservative, or would that be fitting given the quality plots?

Secondly, the DADA2 tutorial actually recommends NOT setting a truncate parameter for ITS data:

"If using this workflow on your own data: For common ITS amplicon strategies, it is undesirable to truncate reads to a fixed length due to the large amount of length variation at that locus. That is OK, just leave out truncLen. Make sure you removed the forward and reverse primers from both the forward and reverse reads though!"

I was wondering if this recommendation stays true for the q2-dada2 implementation? Why or why not?


Hey @Mehrbod_Estaki!

Your data actually looks pretty nice to me. The median score seems pretty reasonable up to ~200 on the forward, with the distributions falling in a pretty linear way.

That is also a perfectly fine way to do it.

Yep, the recommendation to leave out the truncation length still applies to our plugin. Here is my rough understanding of the situation:

The reason to not truncate at a common length is to prevent systematic bias against certain clades which may have shorter than typical lengths for your primer-pair.

While Illumina gives you data that is a consistent length (variable-length adapters excluded), the amplicons themselves are not necessarily a consistent length. Since you must remove all non-biological sequences from your reads, including the reverse primers on your forward reads and vice versa, you will end up with sequences which match the (variable) length of your amplicon target/primer-pair. (We have a new cutadapt plugin to help with primer-trimming.)
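To illustrate the read-through described above, here is a minimal Python sketch. It is only a toy under stated assumptions: the primer sequences, the 40 bp insert, and the poly-A tail are made up for illustration (they are not real ITS primers), and exact string matching stands in for the error-tolerant matching a real trimming tool performs.

```python
# Toy illustration only: both primers and the insert are hypothetical.
FWD_PRIMER = "GCATCGATGAAG"  # hypothetical forward primer
REV_PRIMER = "TCCTCCGCTTAT"  # hypothetical reverse primer

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def strip_primers(forward_read: str) -> str:
    """Strip the forward primer from the 5' end of a forward read and, if the
    read ran through a short amplicon, the reverse-complemented reverse primer
    plus everything after it."""
    read = forward_read
    if read.startswith(FWD_PRIMER):
        read = read[len(FWD_PRIMER):]
    hit = read.find(reverse_complement(REV_PRIMER))
    if hit != -1:
        read = read[:hit]
    return read


# A short amplicon: the forward read reaches the far primer and keeps going.
short_insert = "ACGTTGCA" * 5   # 40 bp of "biological" sequence
tail = "A" * 20                 # stand-in for whatever follows the amplicon
raw_read = FWD_PRIMER + short_insert + reverse_complement(REV_PRIMER) + tail

trimmed = strip_primers(raw_read)
print(len(raw_read), len(trimmed))
```

The raw read is 84 bases long, but only 40 bases remain once the non-biological sequence is gone, so the trimmed length tracks the amplicon rather than the sequencing run.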

If a given clade has a very short length, then after stripping off the primers you might be left with X bp. If you were to set your trunc-len at anything greater than X then you would bias your analysis against that clade, as all of those reads would be dropped. I do not know enough about ITS2 to say whether 100bp would be sufficiently small to avoid this problem altogether. You can also have this problem in reverse, where the amplicon is too long for the forward and reverse reads to overlap, and so they cannot be merged.
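The length bias can be sketched in a few lines of Python. This is a toy model, not the plugin's code: the clade names and read lengths are hypothetical, and the function only mimics the documented behaviour that reads shorter than the truncation length are discarded, with 0 meaning no truncation.

```python
from collections import Counter


def truncate_or_drop(seq, trunc_len):
    """Mimic fixed-length truncation: 0 disables it, reads shorter than
    trunc_len are discarded (None), longer reads are cut to trunc_len."""
    if trunc_len == 0:
        return seq
    if len(seq) < trunc_len:
        return None
    return seq[:trunc_len]


def surviving_counts(reads, trunc_len):
    """Count how many reads per clade survive a given truncation length."""
    kept = Counter()
    for clade, seq in reads:
        if truncate_or_drop(seq, trunc_len) is not None:
            kept[clade] += 1
    return kept


# Hypothetical primer-trimmed reads: one clade is 80 bp, the other 180 bp.
reads = [("short_clade", "A" * 80)] * 3 + [("long_clade", "A" * 180)] * 3

print(surviving_counts(reads, 100))  # truncating at 100 bp
print(surviving_counts(reads, 0))    # no truncation
```

With a truncation length of 100, every read from the 80 bp clade is lost while the 180 bp clade is untouched; with 0, both clades keep all three reads.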

Here is a GitHub issue with some relevant discussion as well.

Hope that's helpful!

q2view puts the dropbox link in its url and can fetch it again, so you can share pre-baked q2view links like this!


Aha! This is the 2nd time today I'm realizing perhaps my truncating parameters have been too stringent since I've been focusing too much on the bottom whiskers of the quality plots rather than the medians. I'll have to re-adjust my approach.
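To make the median-versus-whisker difference concrete, here is a small Python sketch. Everything in it is an assumption for illustration: the quality scores are invented, the cutoff of 25 is arbitrary rather than a recommended value, and the per-position minimum is used as a crude stand-in for the bottom whisker of a quality plot.

```python
from statistics import median


def positions_passing(per_position_scores, summary, cutoff):
    """Count the leading positions whose summarised quality (e.g. the median
    or the minimum across reads) stays at or above the cutoff."""
    keep = 0
    for scores in per_position_scores:
        if summary(scores) < cutoff:
            break
        keep += 1
    return keep


# Invented quality scores: 6 read positions, 5 reads per position.
toy_scores = [
    [38, 38, 37, 36, 35],
    [38, 37, 36, 35, 30],
    [37, 36, 35, 30, 18],
    [36, 35, 32, 22, 15],
    [35, 30, 26, 18, 12],
    [30, 24, 19, 14, 10],
]
CUTOFF = 25  # hypothetical threshold, not a recommendation

by_median = positions_passing(toy_scores, median, CUTOFF)
by_lowest = positions_passing(toy_scores, min, CUTOFF)
print(by_median, by_lowest)
```

On this toy data the median stays above the cutoff for 5 positions, while the lowest scores fall below it after only 2, which is the kind of gap that makes whisker-based truncation look overly stringent.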

It took me till I read the GitHub link to understand this part! It didn't even occur to me that the forward reads could reach the reverse primers. Fungi are crazy... This makes sense now, and I guess this isn't that big of a deal with 16S, as most of those amplicons within a primer set are pretty similar in size and long enough that the reads don't reach the reverse primer.
I guess I need to figure out if there are taxa detected by our primers that have those very short full ITS2 regions, or as a precaution just try and remove any left-over primers with cutadapt.

I'll keep that in mind for the next post!

As always, thanks for all the help and insight! :pray:
