dada2 parameters setting problem : --p-trunc-len-

Susun · March 2, 2021, 3:48pm

Hello!
I have some confusion when I use dada2 to denoise.
My target region is V4-V5 of the 16S gene with primers 515F and 907R, and 2x300 bp PE reads.
This is the quality plot of my sequences.

I tried different truncation lengths (--p-trunc-len-f and --p-trunc-len-r) and the results confuse me.
I chose the truncation lengths according to Q20, but I got very few sequences.
When the dada2 parameters were set as follows:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired-end.qza \
  --p-trim-left-f 0 --p-trim-left-r 0 \
  --p-trunc-len-f 212 --p-trunc-len-r 208 \
  --o-table dada2-table.qza \
  --o-representative-sequences dada2-rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza \
  --p-n-threads 0
this is the result:

But the interesting thing is what happens when I set --p-trunc-len-f 200 --p-trunc-len-r 200:

The number of sequences I got increased.
When I set --p-trunc-len-f 196 --p-trunc-len-r 196, I think there should be no overlap, yet confusingly I got even more data.

I learned that there must be at least 20 nt of overlap, so I don't understand why I get more sequences with less overlap.
Thank you!

cdiener · March 3, 2021, 2:19am

DADA2 will also discard all sequences that are not at least as long as trunc-len. For those longer sequencing protocols (300bp) you often get a small fraction of reads that weren't extended completely and are shorter. A long trunc-len will discard more of those. However, that seems to be only a pretty small fraction in your case, so I would not worry about that too much.
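
To make the length filter concrete, here is a minimal Python sketch of the rule described above: a read is kept only if it is at least trunc-len long. The function name `fraction_kept` and the read lengths are made up for illustration and are not taken from the data in this thread.

```python
def fraction_kept(read_lengths, trunc_len):
    """Return the fraction of reads that are at least trunc_len long.

    Reads shorter than trunc_len are counted as discarded, which mirrors
    the rule described in the post (a simplified model, not DADA2 itself).
    """
    if not read_lengths:
        return 0.0
    kept = sum(1 for length in read_lengths if length >= trunc_len)
    return kept / len(read_lengths)


# Hypothetical read lengths (assumption, for illustration only).
lengths = [300, 300, 250, 210, 205, 198, 150]

for trunc_len in (212, 200, 196):
    print(trunc_len, round(fraction_kept(lengths, trunc_len), 2))
```

With a toy set like this, a shorter trunc-len keeps a larger fraction of reads, which is the direction of the effect described here. It says nothing about quality filtering or merging, which are separate steps.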

The merging is odd though. The easiest explanation would be that the primers are actually different ones. I haven't seen 907R before. The V4-V5 primers are usually 515F-926R. Maybe you got a dataset with only the V4 primers (515F-806R)? You can double-check by aligning a small portion of your reverse reads with the SILVA Alignment, Classification and Tree Service (ACT). This will give you the position in the matching 16S gene.

Susun · March 3, 2021, 10:27am

Thank you, I seem to have found the problem.

Mehrbod_Estaki · March 3, 2021, 7:25pm

Hi @Susun,

The minimum overlap requirement was changed from 20 nt to 12 nt in q2-dada2 several updates ago.
In addition to @cdiener's excellent points, I'll add that as long as the overlap is still sufficient to merge, truncating more almost always increases the number of output reads. This is because truncation usually discards poor-quality tails, so more reads pass the initial filtering. You can see this in the 2nd column, where the % of reads passing the filter increases. Those extra reads are then carried through to the end.
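
The overlap arithmetic behind this can be sketched in a few lines of Python. This is a rough sketch, not how DADA2 computes anything internally: it assumes the overlap is simply forward length plus reverse length minus amplicon length. The amplicon lengths used (about 410 nt for a 515F-926R V4-V5 product and about 290 nt for a 515F-806R V4 product, primers included) are approximate, illustrative assumptions, and real lengths vary between taxa. The function names are made up for this example.

```python
def expected_overlap(trunc_len_f, trunc_len_r, amplicon_len):
    """Expected overlap (nt) between truncated forward and reverse reads.

    Assumes both reads span the amplicon from opposite ends, so the
    overlap is whatever the two truncated reads cover beyond its length.
    A negative value means a gap between the reads.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def can_merge(trunc_len_f, trunc_len_r, amplicon_len, min_overlap=12):
    """True if the expected overlap reaches min_overlap (12 nt per the thread)."""
    return expected_overlap(trunc_len_f, trunc_len_r, amplicon_len) >= min_overlap


# Amplicon lengths below are rough assumptions, not measured values.
for label, amplicon_len in (("V4-V5 (assumed ~410 nt)", 410), ("V4 (assumed ~290 nt)", 290)):
    for f, r in ((212, 208), (200, 200), (196, 196)):
        print(label, f, r, expected_overlap(f, r, amplicon_len), can_merge(f, r, amplicon_len))
```

Under these assumed lengths, none of the three settings would leave enough overlap for a V4-V5 amplicon, while all of them would leave plenty for a V4 amplicon. That would fit the pattern of reads still merging at 196/196, but it is only a consistency check and not proof of which primers were used.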
