Quality filtering in DADA2

steffi · January 19, 2020, 6:30am

Dear All,
I have started a QIIME 2 analysis of an HMP dataset. I checked the quality of the fastq files using FastQC. In my samples, poor-quality reads are distributed randomly, and each fastq file has a different distribution of quality scores. None of them show adapter contamination.

When I run DADA2, there are two options:
--p-trim-left m, which trims off the first m bases of each sequence, and --p-trunc-len n, which truncates each sequence at position n.
How do I remove poor-quality reads that occur randomly?
If I filter by quality score separately using "qiime quality-filter q-score" and then run DADA2, the sequence counts drop even more drastically.
How do I resolve this?

Thank you for your time and consideration

Nicholas_Bokulich · January 22, 2020, 5:31pm

That is sort of the point of DADA2: to find and correct likely errors within the sequences. But it only does so after trimming away the noisy sections of the reads. So truncating the 3' ends and trimming the 5' ends to remove those sections is the recommended approach. Errors will of course be scattered randomly throughout the reads, but the goal is just to remove the positions where errors are most likely to occur, based on your quality profile.
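
To make the two parameters concrete, here is a minimal Python sketch of what trimming and truncating do to a single read. It is only an illustration, not the DADA2 code itself; the function name `trim_and_truncate` is hypothetical. It assumes the behaviour described in the DADA2 documentation: reads shorter than the truncation length are discarded, and a truncation length of 0 disables truncation.

```python
from typing import List, Optional, Tuple


def trim_and_truncate(
    seq: str, quals: List[int], trim_left: int, trunc_len: int
) -> Optional[Tuple[str, List[int]]]:
    """Illustrative stand-in for --p-trim-left / --p-trunc-len on one read.

    Hypothetical helper, not part of QIIME 2 or DADA2.
    Returns None when the read would be discarded.
    """
    if trunc_len > 0:
        # Reads shorter than the truncation length are dropped entirely.
        if len(seq) < trunc_len:
            return None
        # Cut the noisy 3' end: keep only the first trunc_len positions.
        seq, quals = seq[:trunc_len], quals[:trunc_len]
    # Remove the first trim_left positions from the 5' end.
    return seq[trim_left:], quals[trim_left:]


kept = trim_and_truncate("ACGTACGTAC", [30] * 10, trim_left=2, trunc_len=8)
dropped = trim_and_truncate("ACGT", [30] * 4, trim_left=2, trunc_len=8)
```

Note that every position is treated the same way in every read, so this removes regions of the read where errors concentrate; it does not pick out individual low-quality bases scattered elsewhere.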

Good luck!

steffi · February 3, 2020, 10:24am

@Nicholas_Bokulich
Thank you for the reply.

Based on the fastq file mentioned above, what would be the most suitable trimming parameters?

Nicholas_Bokulich · February 3, 2020, 9:52pm

Hi @steffi,

Those FastQC screenshots are a bit difficult to read and appear to be sample-specific, not the average quality plots that QIIME 2 produces, so that muddies the waters. More importantly, the forum contains lots of past discussion about choosing trimming lengths for DADA2, so I recommend reading through it to get an idea of what others are doing.
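
As a rough illustration of how an aggregate quality profile can inform a truncation length, the Python sketch below takes per-read quality scores, computes the median at each position across reads, and reports the first position where that median falls below a cutoff. The function name `suggest_trunc_len` and the cutoff of 25 are assumptions made for the example, not a QIIME 2 feature or a recommended threshold.

```python
from statistics import median
from typing import List


def suggest_trunc_len(quality_rows: List[List[int]], min_median_q: int = 25) -> int:
    """Return how many leading positions keep a median quality >= min_median_q.

    Hypothetical helper; min_median_q=25 is an arbitrary example cutoff.
    """
    max_len = max(len(row) for row in quality_rows)
    for pos in range(max_len):
        # Only reads long enough to cover this position contribute to it.
        column = [row[pos] for row in quality_rows if len(row) > pos]
        if median(column) < min_median_q:
            return pos
    return max_len


# Three toy reads whose quality decays toward the 3' end.
toy_quals = [
    [35, 35, 30, 20, 10],
    [36, 34, 28, 22, 12],
    [34, 33, 31, 18, 8],
]
suggested = suggest_trunc_len(toy_quals)  # 3
```

For paired-end data, any value chosen this way would still need to leave enough overlap for the reads to merge, which this sketch does not check.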

Good luck!
