Losing reads to filtering in DADA2

saatkinson · April 25, 2022, 6:43pm

Hello,

I'm using Q2 v. 2022.2 to analyze a 16S dataset from fecal samples, sequencing the V3/V4 region with 2x300 paired-end reads. My problem is that DADA2 discards 55-60% of my reads at the filtering step.

I first looked for primers/adapters using the grep commands from a previous forum post and found greater than 25% of reads had primers. Therefore, I trimmed with cutadapt:

qiime cutadapt trim-paired --i-demultiplexed-sequences demux_paired_end.qza --p-cores 10 --p-front-f CCTACGGGNGGCWGCAG --p-front-r GACTACHVGGGTATCTAATCC --output-dir cutadapt_out

Here is the resulting summary file showing quality scores:
vis_trimmed_seqs.qzv (320.1 KB)

I used 284 as the forward cutoff and 221 as the reverse cutoff. If I mathed correctly that should leave a 41bp overlap for DADA2 (284 + 221 - 464 = 41).
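The overlap arithmetic above can be written as a small shell helper, which makes it easy to try other truncation values. This is only a sketch: the function name `overlap` is made up for illustration, and the 464 bp amplicon length is the figure assumed in this post.

```shell
# overlap = forward truncation + reverse truncation - amplicon length
# (hypothetical helper; 464 bp is the amplicon length assumed in the post)
overlap() {
  echo $(( $1 + $2 - $3 ))
}

overlap 284 221 464   # prints 41
```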

qiime dada2 denoise-paired --i-demultiplexed-seqs cutadapt_out/trimmed_sequences.qza --p-trunc-len-f 284 --p-trunc-len-r 221 --p-n-threads 10 --output-dir dada2_out_trimmed --verbose

What I'm finding is that 55-60% of the reads are being kicked out during the filtering step. I have read some of the previous forum posts (such as this and this), but their issues seemed to stem from poor quality of the reverse reads/poor truncation values. From the vis_trimmed_seqs it looks to me that the read quality is ok in general. According to the summary length section 98% of my forward reads are 285bp and 98% of reverse reads are 301bp. This leads me to believe my cutoffs are ok, so I'm not sure my case fits into the previous forum posts.
Here are my denoising stats: vis_denoising_stats.qzv (1.2 MB)

Is this just as good as I'm going to get and the reads just aren't great quality or have I done something wrong?

Any help would be appreciated!
Thanks,
Samantha

llenzi · April 26, 2022, 8:41am

Hi @saatkinson,

The denoising stats show that your big drop happens at the filtering step, which is the first step of the DADA2 denoising process. Looking at your trimmed quality plot, it may be due to the drop in quality that starts somewhat earlier than the 284bp position you set for the forward reads. Given that your overlap region is long, I would try truncating the forward reads more aggressively, with something like '--p-trunc-len-f 260'. An overlap of 20bp is still good enough for the latest DADA2, and this should rescue lots of reads for you.
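A sketch of what that re-run might look like, assuming the same inputs as the original command and changing only the forward truncation length. The output directory name `dada2_out_trunc260` is made up for illustration, and the 464 bp amplicon length is the figure used in the first post. The `command -v` guard is only there so the snippet does nothing harmful on a machine without QIIME 2.

```shell
# Forward truncation suggested above; reverse truncation unchanged.
trunc_f=260
trunc_r=221
amplicon_len=464   # assumption carried over from the first post

overlap=$(( trunc_f + trunc_r - amplicon_len ))
echo "expected overlap: ${overlap} bp"

# Hypothetical output directory, named differently so earlier results are kept.
if command -v qiime >/dev/null 2>&1; then
  qiime dada2 denoise-paired \
    --i-demultiplexed-seqs cutadapt_out/trimmed_sequences.qza \
    --p-trunc-len-f "$trunc_f" \
    --p-trunc-len-r "$trunc_r" \
    --p-n-threads 10 \
    --output-dir dada2_out_trunc260 \
    --verbose
fi
```

With these numbers the arithmetic from the first post gives 17bp rather than 20bp, which is still above DADA2's default minimum merge overlap of 12bp.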
Let us know,
Cheers
Luca
