Where to trim/truncate reads in DADA2

Continuing the discussion from Why do the DADA2 default setting have such a low PHRED score?:

Hi @joaomiranda,

Upon initial inspection of your quality plot, I would recommend against using your reverse reads, based on their average quality score throughout. That being said, there are some unusual quality scores between roughly positions 40 and 60 in your sequences. Can you double-check that all of your primers, etc. have been removed from your data before moving forward with the denoising step? You shouldn't run any denoising method until all primers have been removed from your sequences.

Cheers,
Liz

Hi @lizgehret,

I checked with the facility that ran my NGS, and they confirmed that I received my sequences demultiplexed and with the barcodes removed.
I also noticed that I have 4,169,369 sequences in total, counting both forward and reverse, which is a lot of reads. If I run denoising with both forward and reverse reads and the filtering step excludes half of my sequences, I will still have many sequences left. What do you think?

In addition to where to trim and truncate, I am unsure about trunc-q and max-ee, and whether I should use those parameters in my particular case.

Thanks

Hi @joaomiranda,

Yes - I would just be mindful of the average quality score of your reads. Going back to your original question:

Since you've confirmed that there are no remaining barcodes in your sequences and that they have already been demultiplexed, we can probably assume that you are just seeing poor-quality forward reads from ~45-65nt. In that case, you could set your trim/truncate values to cut off that poor-quality segment of the forward reads and keep the region from, say, 65nt to somewhere between 160nt and 230nt (depending on the average quality score you'd like to use as your cutoff point).
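To make the effect of those two values concrete, here is a minimal sketch of the arithmetic. It assumes DADA2's documented behaviour that a read truncated at trunc-len and trimmed by trim-left ends up trunc-len minus trim-left bases long. The function name retained_length is made up for illustration, and the candidate values are simply the ones mentioned above.

```shell
# Retained read length after trimming and truncation.
# trim_left bases are removed from the 5' end and the read is cut at
# position trunc_len, so the retained length is trunc_len - trim_left.
# retained_length is a hypothetical helper name, used for illustration only.
retained_length() {
  local trim_left=$1 trunc_len=$2
  echo $(( trunc_len - trim_left ))
}

trim_left=65
for trunc_len in 160 200 230; do
  kept=$(retained_length "$trim_left" "$trunc_len")
  echo "trim-left=${trim_left} trunc-len=${trunc_len}: ${kept} nt kept per read"
done
```

Keep in mind that DADA2 discards reads shorter than the chosen trunc-len, so a longer trunc-len keeps more bases per read but may cost you some reads.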

In response to your question about trunc-q, I'd like to direct you to this forum response, which goes over trunc-q in detail and discusses whether you should use it in your analysis.

With regard to max-ee, here is a great forum response with additional reading on how it works, which should also help inform your decision on whether to use this parameter.
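As background on what max-ee measures: the expected number of errors in a read is usually computed as the sum over its bases of 10^(-Q/10), where Q is the Phred score of each base, and reads whose expected errors exceed the max-ee threshold are dropped during filtering. The sketch below only illustrates that formula. The function name expected_errors and the toy quality values are assumptions for the example, not anything taken from your data.

```shell
# Expected errors (EE) for one read: the sum over bases of 10^(-Q/10),
# where Q is the Phred quality score of each base.
# expected_errors is a hypothetical helper name, used for illustration only.
expected_errors() {
  # Arguments: one Phred score per base.
  printf '%s\n' "$@" |
    LC_ALL=C awk '{ ee += 10 ^ (-$1 / 10) } END { printf "%.3f\n", ee }'
}

# Toy reads with made-up scores (not real data):
expected_errors 30 30 30 30 30 30 30 30 30 30   # ten bases at Q30 -> 0.010
expected_errors 30 30 30 30 30 10 10 10 10 10   # five at Q30, five at Q10 -> 0.505
```

Note how a handful of low-quality bases dominates the total, which is why truncating a poor-quality tail usually lets many more reads pass the max-ee filter.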

Cheers,
Liz

Hi @lizgehret,

Following your suggestions, I ran denoising with just the forward sequences, using dada2 denoise-single. These are my results. As far as I can interpret them, I don't think I lost much, but one thing seems strange: my percentage of input filtered is around 40% in all of my samples. Can you check and give me your opinion on whether the results are useful?
rep-seqs.qzv (938.5 KB)
stats-dada2.qzv (1.2 MB)
table.qzv (1.1 MB)

Additionally, I found in this forum post, which you recommended (Dada2 denoise - why do I have so many reads filtered out - #8 by Jo_mee), that for V3-V4 2x250 runs, which is my case, the reverse reads are often not used. Why does this happen? I'm curious to try the denoising with both my forward and my reverse reads. For the dada2 denoise-single run I used:
qiime dada2 denoise-single \
  --i-demultiplexed-seqs single-end-demux.qza \
  --p-trim-left 65 \
  --p-trunc-len 230 \
  --o-representative-sequences rep-seqs-dada2.qza \
  --o-table table-dada2.qza \
  --o-denoising-stats stats-dada2.qza

I'm thinking about applying the same trim and trunc values to my reverse reads. What do you think? Would it be useful, or would it be a waste of time?

Thanks

Hi @joaomiranda,

After taking a look at your stats-dada2.qzv file, these results don't seem unreasonable given the trim/trunc lengths we discussed above.

One point I do want to bring up about my suggestions above is that it was fairly difficult to make out the actual quality scores in the original visualization you provided in your post. Something to keep in mind when trimming/truncating based on quality scores is that you want to be looking at the median score (i.e. the middle line of each box-and-whisker plot you'll see when zooming in on your quality plot) rather than the edges of each box or the whiskers. A small number of outlier reads at any given nt position can stretch the whiskers and skew the overall impression of the plot, so focusing on the median quality score at each nt, and looking for a sustained decrease in that median, can help you determine where to trim/truncate.
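To illustrate why the middle line of each box is the more robust thing to read, here is a small sketch comparing the median and the mean of the quality scores at a single position when a couple of reads are outliers. The scores are made up, and median and mean are hypothetical helper names.

```shell
# Compare median and mean for the quality scores observed at one position.
# median and mean are hypothetical helper names; the scores are made up.
median() {
  printf '%s\n' "$@" | sort -n |
    awk '{ v[NR] = $1 }
         END { if (NR % 2) print v[(NR + 1) / 2];
               else print (v[NR / 2] + v[NR / 2 + 1]) / 2 }'
}

mean() {
  printf '%s\n' "$@" |
    LC_ALL=C awk '{ s += $1 } END { printf "%.1f\n", s / NR }'
}

# Seven good reads and two very poor outliers at the same position:
echo "median: $(median 38 38 37 38 36 38 37 2 2)"   # median: 37
echo "mean:   $(mean 38 38 37 38 36 38 37 2 2)"     # mean:   29.6
```

The two outliers drag the mean down by several points while the median barely moves, which is the same reason the whiskers can look alarming even when most reads at that position are fine.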

All of that to say, you should ultimately make an informed decision as to where you want to trim/truncate your reads, since you know your data and the analysis you'll be doing on it much better than I do!

Regarding why the reverse reads are often left out for V3-V4 2x250 runs: that's a great question! This forum response goes into detail on why merging typically fails for these runs. Essentially, you are trying to cover a large amplicon (the V3-V4 region) with only 250bp each of forward and reverse read. This leaves a small margin for error in the overlap region that is needed to successfully merge your forward and reverse reads. I'll provide a short quote from @Mehrbod_Estaki, from the forum response linked above, that dives into the numbers a bit more:

With the most common V3-V4 primers you will have a ~460bp amplicon, but with a 2x250 bp run you will have a maximum of 500bp reads which means there is only 40bp of overlap. DADA2 requires a minimum 12bp overlap for proper merging, otherwise it will toss any reads (both forward and reverse) that it can’t merge. Take into consideration the natural variation of this amplicon length meaning some true taxa would need more than 12bp overlap, and the fact that we need to truncate the poor quality tails of our reads on the 3’ (where merging occurs).
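The numbers in that quote can be checked with a quick sketch. It assumes the ~460bp amplicon and 12bp minimum overlap stated in the quote; the truncation lengths in the second call (240 and 230) are hypothetical values chosen only to show how quickly the overlap disappears, and overlap and report are made-up helper names.

```shell
# Overlap left for merging = forward length + reverse length - amplicon length.
# overlap and report are hypothetical helper names; 460 bp and the 12 bp
# minimum overlap are the figures given in the quote above.
overlap() {
  local amplicon=$1 len_f=$2 len_r=$3
  echo $(( len_f + len_r - amplicon ))
}

report() {
  local amplicon=$1 len_f=$2 len_r=$3 min_overlap=12 ov status
  ov=$(overlap "$amplicon" "$len_f" "$len_r")
  if [ "$ov" -ge "$min_overlap" ]; then
    status="enough to merge"
  else
    status="too short to merge"
  fi
  echo "F=${len_f} R=${len_r}: overlap ${ov} bp (${status})"
}

report 460 250 250   # untruncated 2x250 reads: 40 bp of overlap
report 460 240 230   # hypothetical truncation of the 3' tails: 10 bp of overlap
```

Removing just 30bp of poor-quality tail in total already drops the overlap below the 12bp minimum, before accounting for any natural variation in amplicon length.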

Again, based on the information you provided above, it seems like utilizing your forward reads with denoise-single will most likely be your best bet in this situation.

Cheers,
Liz

Thanks! Grateful for your help
