I'm using QIIME 2 (v2022.2) to analyze a 16S dataset from fecal samples, sequenced over the V3/V4 region with 2x300 paired-end reads. My problem is that DADA2 discards 55-60% of my reads at its filtering step.
I first looked for primers/adapters using the grep commands from a previous forum post and found that more than 25% of reads contained primers. I therefore trimmed them with cutadapt:
qiime cutadapt trim-paired --i-demultiplexed-sequences demux_paired_end.qza --p-cores 10 --p-front-f CCTACGGGNGGCWGCAG --p-front-r GACTACHVGGGTATCTAATCC --output-dir cutadapt_out
Here is the resulting summary file showing quality scores:
vis_trimmed_seqs.qzv (320.1 KB)
I used 284 as the forward truncation length and 221 as the reverse truncation length. If my math is right, that should leave a 41 bp overlap for DADA2 (284 + 221 - 464 = 41).
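For anyone who wants to check the arithmetic, here is a minimal sketch of that overlap calculation. It assumes 464 is the amplicon length in bp (the post uses the number without saying whether it includes the primers), and the function name `expected_overlap` is made up for illustration; it is not part of QIIME 2 or DADA2.

```python
def expected_overlap(trunc_len_f: int, trunc_len_r: int, amplicon_len: int) -> int:
    """Number of bases shared by the truncated forward and reverse reads.

    A result of zero or less means the truncated reads no longer
    reach each other, so they cannot be merged.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


# Values from the post: truncation lengths 284 (forward) and 221 (reverse).
# ASSUMPTION: 464 is the amplicon length in bp.
overlap = expected_overlap(284, 221, 464)
print(overlap)  # 41
```

If 464 includes the primers that cutadapt removed, the remaining amplicon would be shorter and the real overlap larger than 41, so the figure is worth re-checking against the trimmed read lengths.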
qiime dada2 denoise-paired --i-demultiplexed-seqs cutadapt_out/trimmed_sequences.qza --p-trunc-len-f 284 --p-trunc-len-r 221 --p-n-threads 10 --output-dir dada2_out_trimmed --verbose
What I'm finding is that 55-60% of the reads are thrown out during the filtering step. I have read some of the previous forum posts (such as this and this), but their issues seemed to stem from poor-quality reverse reads or poorly chosen truncation values. From vis_trimmed_seqs, the read quality looks OK to me in general. According to the length summary, 98% of my forward reads are 285 bp and 98% of my reverse reads are 301 bp. This leads me to believe my truncation lengths are OK, so I'm not sure my case matches those earlier posts.
Here are my denoising stats: vis_denoising_stats.qzv (1.2 MB)
Is this as good as it's going to get because the reads simply aren't great quality, or have I done something wrong?
Any help would be appreciated!