Trim/trunc length for ITS

Mehrbod_Estaki · July 17, 2018, 8:44pm

One easy way is to simply ask your sequencing facility what has been done to the reads and whether or not primers have been removed from both directions. With 16S data you could usually just use something like head/tail to look at your fastq files and check for your primers at the beginning of your reads; with ITS data, however, this is a bit trickier. Because the ITS region varies so much in length, it is common for the read length (300 in your case) to exceed the amplicon length (e.g. 150 bp), so your reads run into the reverse primer at the 3′ end. You'll want to deal with those by removing the opposing primers with cutadapt; just make sure that you use the reverse complement of your primers.
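If you want a rough check yourself, the Python sketch below counts how many reads start with the forward primer and how many contain the reverse complement of the reverse primer, which is a sign of read-through at the 3′ end. It is only a diagnostic, not a replacement for cutadapt: it uses exact regex matching (IUPAC degenerate bases expanded, no mismatches allowed). The helper names are hypothetical, and the primer sequences are placeholders; substitute the primers your facility actually used.

```python
import gzip
import re

# Complement for IUPAC nucleotide codes (degenerate bases map to their complements).
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Each IUPAC code expanded to a regex character class so degenerate primers can match.
_IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def reverse_complement(seq):
    """Reverse complement of a primer written in IUPAC codes."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def primer_regex(primer):
    """Compile a primer into a regex (exact match, degenerate bases allowed)."""
    return re.compile("".join(_IUPAC[base] for base in primer.upper()))


def fastq_sequences(path):
    """Yield the sequence line of each record in a (optionally gzipped) FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # records are 4 lines; the sequence is the 2nd
                yield line.strip()


def primer_report(sequences, fwd_primer, rev_primer):
    """Count reads starting with the forward primer and reads containing the
    reverse complement of the reverse primer (read-through at the 3' end)."""
    fwd_at_start = primer_regex(fwd_primer)
    rev_rc = primer_regex(reverse_complement(rev_primer))
    report = {"reads": 0, "fwd_primer_at_start": 0, "rev_primer_rc_found": 0}
    for seq in sequences:
        seq = seq.upper()
        report["reads"] += 1
        if fwd_at_start.match(seq):  # match() anchors at position 0
            report["fwd_primer_at_start"] += 1
        if rev_rc.search(seq):       # search() scans the whole read
            report["rev_primer_rc_found"] += 1
    return report


# Placeholder primers; replace with the ones your facility actually used.
FWD = "CTTGGTCATTTAGAGGAAGTAA"
REV = "GCTGCGTTCTTCATCGATGC"

# Toy reads: one still carries the forward primer and reads through into the
# reverse primer; the other has neither.
reads = [FWD + "ACGT" * 5 + reverse_complement(REV) + "AAAA", "TTTT" + "ACGT" * 10]
print(primer_report(reads, FWD, REV))
# {'reads': 2, 'fwd_primer_at_start': 1, 'rev_primer_rc_found': 1}
```

On real data you would pass `fastq_sequences("your_R1.fastq.gz")` (a hypothetical file name) instead of the toy list. For the reverse reads, swap the roles: the reverse primer should sit at the start of the read, and the reverse complement of the forward primer marks read-through.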
There’s a good discussion/workflow for a similar case here that might be useful, and another one here as well.

It may seem this way because quite a few factors go into this decision, which prevents a one-size-fits-all approach; at the end of the day it really depends on your data and your end goal. The dada2 tutorial and the moving pictures tutorial do provide some guidance, but ultimately you'll have to choose these parameters based on your own data. For example, a general rule for minimal quality is to truncate where the median score drops below 20 in your quality plots. See here for the definition of these Phred-like scores. An additional consideration for paired-end reads is to make sure there is still enough overlap between your reads after truncation for them to merge properly; a minimum of 20 bp of overlap is adequate.
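To make the "median drops below 20" rule concrete, here is a minimal sketch that decodes FASTQ quality strings (assuming Phred+33 encoding, the usual one for current Illumina data), takes the median score at each position, and returns the number of bases before the first position whose median falls below the threshold. The function names are hypothetical, and this only mimics what you would read off the quality plot by eye; it is not how QIIME 2 or DADA2 compute anything internally.

```python
import statistics


def phred_scores(quality_line, offset=33):
    """Decode one FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line.strip()]


def median_by_position(quality_lines, offset=33):
    """Median Phred score at each read position across all reads."""
    columns = []
    for line in quality_lines:
        for pos, score in enumerate(phred_scores(line, offset)):
            if pos == len(columns):
                columns.append([])
            columns[pos].append(score)
    return [statistics.median(col) for col in columns]


def suggest_trunc_len(medians, threshold=20):
    """Number of bases before the first position whose median drops below `threshold`."""
    for pos, median in enumerate(medians):
        if median < threshold:
            return pos
    return len(medians)  # never drops below the threshold: keep the full length


# Toy quality strings: 'I' = Q40, '5' = Q20, '+' = Q10 under Phred+33.
quals = ["IIII++", "IIII5+", "III+++"]
medians = median_by_position(quals)
print(medians)                     # [40, 40, 40, 40, 10, 10]
print(suggest_trunc_len(medians))  # 4
```

In practice you would feed it the quality lines (every 4th line, starting from the 4th) of a subsample of reads, and run it separately for forward and reverse reads, since they usually degrade at different positions.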

Remember that you don't have to use the same truncation length for forward and reverse reads. In your case, for example, I would truncate the forward reads at ~295 and the reverse reads at ~280. A drop in reverse-read quality is very typical of Illumina runs. The good news is that your quality plots look excellent! I'm very jealous! So you probably don't need to worry much about truncation and merging issues.
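The overlap check itself is simple arithmetic: the two truncated reads together cover trunc_len_f + trunc_len_r bases, so the overlap is that sum minus the amplicon length. Here is a small sketch using the ~295/~280 values and the 20 bp minimum; the helper names and the example amplicon lengths are hypothetical, and "amplicon length" here assumes the region spanned by the reads after primer removal.

```python
def expected_overlap(trunc_len_f, trunc_len_r, amplicon_len):
    """Bases shared by the truncated forward and reverse reads."""
    return trunc_len_f + trunc_len_r - amplicon_len


def max_mergeable_amplicon(trunc_len_f, trunc_len_r, min_overlap=20):
    """Longest amplicon that still leaves at least `min_overlap` bases of overlap."""
    return trunc_len_f + trunc_len_r - min_overlap


def unmergeable(amplicon_lengths, trunc_len_f, trunc_len_r, min_overlap=20):
    """Amplicon lengths that would fall short of the minimum overlap."""
    limit = max_mergeable_amplicon(trunc_len_f, trunc_len_r, min_overlap)
    return [n for n in amplicon_lengths if n > limit]


# Suggested truncation: forward ~295, reverse ~280.
print(max_mergeable_amplicon(295, 280))             # 555
print(expected_overlap(295, 280, 300))              # 275
print(unmergeable([150, 300, 560, 600], 295, 280))  # [560, 600]
```

Because ITS lengths vary so much, it is worth comparing against the longest amplicons you expect rather than a typical one. Conversely, if an amplicon is shorter than a truncated read (e.g. 150 bp), that read runs past the end of the amplicon into the opposing primer, which is the read-through case described earlier.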

Could you provide us with the links to these discussions? This is somewhat untested territory, so we'd like to look into it in a bit more detail.

There isn’t currently a separate tutorial on the feature-table summaries. Is there a particular question you have, or an area where you’d like more clarification?

Hope this answers some of your questions.