DADA2: losing a high number of reads after filtering

Leo_Opolain · September 13, 2021, 3:46pm

Have you already gotten answers to the last questions? They are also very interesting to me. Thanks for any replies.

DADA2: loss too many reads after filtering

Hi, timanix!

Thanks for your suggestions! I have two questions now after I read through some other posts.

The post below explains that --p-trunc-len discards the part of a sequence beyond the given value. So in my case, it discards sequences longer than 244 rather than sequences shorter than 244? Do I understand that correctly? I got confused because this post and your reply seem to give opposite answers.

DADA2 truncation - #7 by ek_97

DADA2 truncation

Wouldn't the --p-trunc-len discard all the sequences greater than this value since this command would be truncating the right side? So if I set the value to 150, wouldn't that get rid of all the bases above the 150 position, and not below?

You're exactly correct, good catch! I'll make sure we get that documentation updated.
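To make the two effects of truncation easier to tell apart, here is a minimal Python sketch of what happens to a single read, based on how the q2-dada2 parameter help describes --p-trunc-len: bases past the position are cut off, and reads shorter than the position are dropped entirely. The function name `truncate_read` is hypothetical and only for illustration; it is not part of QIIME 2 or DADA2.

```python
def truncate_read(seq, trunc_len):
    """Illustrative model of --p-trunc-len applied to one read.

    Hypothetical helper, not a QIIME 2 / DADA2 function.
    Returns the truncated read, or None if the read is discarded.
    """
    if trunc_len == 0:
        # A value of 0 means no truncation and no length filtering.
        return seq
    if len(seq) < trunc_len:
        # Reads shorter than the truncation position are discarded.
        return None
    # Longer reads are kept, but everything past trunc_len is cut off.
    return seq[:trunc_len]


long_read = "A" * 250
short_read = "A" * 200

print(len(truncate_read(long_read, 244)))   # the 250-base read is cut to 244
print(truncate_read(short_read, 244))       # the 200-base read is dropped
```

So both statements in the thread can be true at once: the bases beyond the position are removed from long reads, and whole reads below that length are removed from the dataset.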

I actually ran it again with lower truncation values, like

--p-trunc-len-f 230 \
--p-trunc-len-r 240 \

and

--p-trunc-len-f 240 \
--p-trunc-len-r 240 \

Both runs gave worse results, and with the lower value for the forward reads, fewer than 10% of the reads were reported as non-chimeric.

In case you also need to check the stats file, I will upload it here.
denoising-stats_230_240.qzv (1.2 MB) denoising-stats_240_240.qzv (1.2 MB) denoising-stats_no_truncation.qzv (1.2 MB)

What other parameters can I set to get rid of low-quality reads? Also, why did the results get worse after I lowered the truncation values? I expected them to improve once the low-quality ends of the reads were truncated.

Again, I really appreciate your help!

timanix · September 14, 2021, 7:14am

Hello!
In that particular case, the topic starter worked with the V3-V4 region. Those amplicons are relatively long, and it is preferable to sequence them as 300x300 paired reads. But it looks like the library was sequenced as 250x250 paired reads, so truncation left too few overlapping bases and a lot of reads failed to merge.
So, one needs to account for the overlapping region when deciding which truncation parameters are best for the dataset in question.
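The overlap reasoning above can be checked with simple arithmetic before running DADA2. The sketch below is only a rough estimate under stated assumptions: the amplicon length of 460 bases is an illustrative guess for a V3-V4 amplicon (measure your own, since real lengths vary between taxa and depend on whether primers are still in the reads), and the minimum overlap of 12 bases is the default of `mergePairs` in DADA2. The function names are hypothetical.

```python
# Assumed values for illustration only; replace with your own measurements.
ASSUMED_AMPLICON_LEN = 460  # hypothetical V3-V4 amplicon length in bases
MIN_OVERLAP = 12            # default minOverlap of DADA2's mergePairs


def expected_overlap(amplicon_len, trunc_len_f, trunc_len_r):
    """Bases shared by the forward and reverse read after truncation."""
    return trunc_len_f + trunc_len_r - amplicon_len


def can_merge(overlap, min_overlap=MIN_OVERLAP):
    """True if the estimated overlap reaches the minimum needed for merging."""
    return overlap >= min_overlap


for trunc_f, trunc_r in [(230, 240), (240, 240), (250, 250)]:
    overlap = expected_overlap(ASSUMED_AMPLICON_LEN, trunc_f, trunc_r)
    print(trunc_f, trunc_r, overlap, can_merge(overlap))
```

Because amplicon length varies, it is safer to leave a margin above the minimum rather than aim for exactly 12 bases; amplicons longer than the assumed length lose overlap one base for every extra base.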

system · October 15, 2021, 1:14pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.