Dear All,
I have started a QIIME 2 analysis of an HMP dataset. I checked the quality of the FASTQ files using FastQC. In my samples, the poor-quality reads are distributed randomly, and every FASTQ file has a different distribution of quality scores. None of them show adapter contamination.
When I run the DADA2 analysis, there are two options: --p-trim-left m, which trims off the first m bases of each sequence, and --p-trunc-len n, which truncates each sequence at position n.
How do I remove poor-quality reads that occur randomly?
If I filter by quality score separately using "qiime quality-filter q-score" and then run DADA2, the sequence counts will drop more drastically.
How do I resolve this?
That is sort of the point of DADA2: to find and correct likely errors within the sequences. But it only does so after the noisy sections of the reads have been trimmed away, so truncating the 3' ends and trimming the 5' ends to remove those sections is the recommended approach. Errors will of course be scattered randomly throughout the reads, but the goal is just to remove the positions where errors are most likely to occur, based on your quality profile.
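To make the idea of picking positions from a quality profile concrete, here is a rough Python sketch. It is not part of QIIME 2 or DADA2: the function names, the toy quality scores, and the median-quality threshold of 25 are all assumptions made up for illustration. The first function suggests a truncation position as the first base where the median quality across reads drops below the threshold. The second mimics, as far as I understand the behaviour, what --p-trim-left and --p-trunc-len do to a single read, including discarding reads shorter than the truncation length.

```python
from statistics import median


def suggest_trunc_len(quals_by_read, min_median_q=25):
    """Return how many leading positions keep a median quality >= min_median_q.

    quals_by_read is a list of per-read Phred score lists (hypothetical input).
    The threshold is an arbitrary illustration, not a QIIME 2 default.
    """
    max_len = max(len(q) for q in quals_by_read)
    for pos in range(max_len):
        # Only reads long enough to cover this position contribute.
        column = [q[pos] for q in quals_by_read if len(q) > pos]
        if median(column) < min_median_q:
            return pos
    return max_len


def trim_and_truncate(seq, trim_left, trunc_len):
    """Approximate the effect of trimming and truncating one read.

    Assumes trunc_len > 0. Reads shorter than trunc_len are dropped (None);
    otherwise the read is cut at trunc_len and its first trim_left bases
    are removed.
    """
    if len(seq) < trunc_len:
        return None
    return seq[trim_left:trunc_len]


# Toy quality profile: three reads whose quality decays toward the 3' end.
quals = [
    [38, 38, 37, 36, 30, 22, 15],
    [37, 38, 36, 35, 28, 20, 12],
    [38, 37, 37, 34, 31, 24, 18],
]

trunc_len = suggest_trunc_len(quals)              # 5 for this toy data
kept = trim_and_truncate("ACGTACG", 1, trunc_len)  # "CGTA"
dropped = trim_and_truncate("ACG", 1, trunc_len)   # None, read is too short
print(trunc_len, kept, dropped)
```

In practice you would read the positions off the interactive quality plot and sanity-check them against the denoising stats rather than rely on a fixed threshold like this one.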
Those FastQC screenshots are a bit difficult to read and appear to be sample-specific rather than the average quality plots that QIIME 2 produces, which muddies the waters. More importantly, the forum contains lots of past discussion about choosing trimming lengths for DADA2, so I recommend reading through it to get an idea of what others are doing.