Need help deciding on a truncation length for DADA2 and understanding the demultiplexed sequence counts summary output

I have paired-end sequencing data with varying read lengths, and I am unsure at what position I should truncate my reads. Since 240 is the minimum sequence length, do I need to truncate at 240 to avoid losing data? If I choose a higher truncation length of around 280, I lose many samples. I do not want to lose any information. The maximum read length is 300.
Please suggest the best forward and reverse truncation lengths.

Hi @Anoop_Singh

Welcome to the QIIME2 forum! :man_dancing:
Your sequence quality scores are really good over the entire length from 220 to 300. Hence, I would choose the following values:
Forward sequences: Trim at 220 (-left); Truncate at 300 (-right)
Reverse sequences: Trim at 220 (-left); Truncate at 300 (-right)

You should get a nice feature table with lots of features!! (of course it also depends on samples, sequencing depth etc.)

Let me know if you need any help,
Happy Qiime-ing!
Best.
Anirban

Hi @Anoop_Singh,

I’m going to offer slightly different advice, because I don’t think the answer is so clear-cut, and your data isn’t great toward the end of the read.

To some degree, the answer depends on your sequencing technology and hypervariable region. I think DADA2 needs about 12 nt of overlap to join reads, so you need to keep that in mind when choosing truncation lengths. You may also need to consider preprocessing: DADA2 works best on data that has been demultiplexed and had the primers and adapters removed. It doesn’t work well with data that has already been quality filtered, because it relies on the noise in the data to denoise. (Deblur makes different assumptions and can be applied to quality-filtered data.) So, a little more detective work about the variable read length might be good.

My recommendation without additional information would be to truncate the reverse reads at 240, because your quality drops off pretty badly past that point. I’d be wary with the forward reads too: certainly not past 270, but I might just truncate at 240. You may need to try it and see where you lose reads. You’re looking for a balance between quality filtering, denoising, and merging.
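
To make the overlap reasoning concrete, here is a rough sketch of the arithmetic. The amplicon length of 430 nt is only a placeholder assumption, not something taken from your data, so substitute the expected primer-free length for your region; the 12 nt minimum is the approximate figure mentioned above.

```shell
#!/bin/sh
# Rough overlap check for paired-end truncation lengths.
# ASSUMPTION: amplicon_len is a hypothetical placeholder; replace it with
# the expected primer-free amplicon length for your own region.
amplicon_len=430
min_overlap=12   # approximate overlap needed to join reads

check_overlap() {
    # $1 = forward truncation length, $2 = reverse truncation length
    overlap=$(( $1 + $2 - amplicon_len ))
    if [ "$overlap" -ge "$min_overlap" ]; then
        echo "f=$1 r=$2 overlap=$overlap ok"
    else
        echo "f=$1 r=$2 overlap=$overlap too_short"
    fi
}

check_overlap 270 240
check_overlap 240 240
check_overlap 220 200
```

If a pair of truncation lengths comes out as too_short under your real amplicon length, merging is likely to fail for most reads no matter how good the quality is.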

Best,
Justine

Hi @jwdebelius,
Thank you very much for your helpful reply. I am still confused, so I wanted to share some more information about my data to make my question clearer.
The picture given above is a zoomed-in view. For clarity, I am posting the normal view too.
-Number of samples: 40
-Illumina HiSeq paired-end sequencing, Phred64 quality scores
-16S rRNA V3-V4 region
I have demultiplexed clean data (primers and adapters removed).
My reads vary in length from a minimum of 240 bp to a maximum of 300 bp. If I choose a truncation length above 240 I lose data, and at 280 I lose many of the 40 samples.
Reverse truncation length and number of samples retained:
300 bp: gives an error
298 bp: only a few samples are left
280 bp: more than 5 samples are lost
240 bp: all samples are retained

Following your suggestion, I previously ran this command (all samples were retained, with some loss of reads):
qiime dada2 denoise-paired --i-demultiplexed-seqs demux.qza --p-trim-left-f 2 --p-trim-left-r 2 --p-trunc-len-f 270 --p-trunc-len-r 240 --o-table table.qza --o-representative-sequences rep-seqs.qza --o-denoising-stats denoising-stats.qza --p-n-threads 16

Do I need to have the same length for all the reads so that I do not lose my data?
Will it be the right choice to truncate both forward and reverse reads at 240 so that I retain all the reads?

Please also suggest trimming and truncation lengths based on the image below.

Best,
Anoop

Hi @Anoop_Singh,

It sounds like you’ve answered your own question. Your forward reads definitely drop in quality toward position 290, and you see drops around 240 in the reverse reads, so if truncating at 240 keeps the samples you’d like to keep, it’s probably a good solution for now. With denoising, you’re expected to lose reads (I think 50%ish is normal) because you’re only retaining high-quality reads that meet a set of specific criteria.
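
If you want to see why samples disappear at longer truncation lengths, keep in mind that DADA2 discards any read shorter than the truncation length. Below is a minimal sketch for counting how many reads in a FASTQ file would survive a given length cutoff. The three toy reads are invented for illustration, and it assumes uncompressed, standard four-line FASTQ records.

```shell
#!/bin/sh
# Count reads at least as long as a candidate truncation length.
# ASSUMPTION: the temporary FASTQ below holds invented toy reads
# (lengths 8, 10 and 12), not real data.
fastq=$(mktemp)
printf '@r1\nACGTACGT\n+\nIIIIIIII\n' > "$fastq"
printf '@r2\nACGTACGTAC\n+\nIIIIIIIIII\n' >> "$fastq"
printf '@r3\nACGTACGTACGT\n+\nIIIIIIIIIIII\n' >> "$fastq"

count_reads_at_least() {
    # $1 = FASTQ path, $2 = minimum length
    # The sequence is the second line of every four-line record.
    awk -v n="$2" 'NR % 4 == 2 && length($0) >= n { c++ } END { print c + 0 }' "$1"
}

for len in 8 10 12; do
    echo "trunc-len $len keeps $(count_reads_at_least "$fastq" "$len") reads"
done
```

Running the same count on each sample's real reads (decompressed first if they are gzipped) at 240, 280 and 298 should show which samples are made up mostly of shorter reads.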

Best,
Justine

Thanks for the help…:smiley:
