Hi QIIME 2 team and community,
I am processing paired-end 16S amplicon data in QIIME 2 (version 2025.4, conda installation on Ubuntu/WSL). After demultiplexing and trimming primers with cutadapt trim-paired, I ran DADA2 with the following parameters:
qiime dada2 denoise-paired \
--i-demultiplexed-seqs demux_trimmed.qza \
--p-trim-left-f X --p-trim-left-r Y \
--p-trunc-len-f 240 --p-trunc-len-r 245 \
--p-n-threads 0 \
--o-representative-sequences rep-seqs.qza \
--o-table table.qza \
--o-denoising-stats denoise-stats.qza
I summarized the results in denoise-stats.qzv (file attached here).
From the exported TSV, I noticed:
- Input reads per sample: ~40k–65k
- Filtered reads: ~87–91% of input (looks good)
- Merged reads: ~70–80% of input (also looks reasonable)
- Non-chimeric reads: only ~25–38% of input remain
So, the step with the largest drop is chimera removal.
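Because the percentages above are all relative to input, the loss at each individual step is easier to see when every count is divided by the count from the previous step. Below is a minimal Python sketch of that calculation. It assumes the exported TSV has the columns `sample-id`, `input`, `filtered`, `merged`, and `non-chimeric`, plus a `#q2:types` directive row. The function name `stepwise_retention` and the sample row are made up for illustration and are not taken from the attached file.

```python
import csv
import io

# Made-up example row (NOT from the attached stats). The column names are an
# assumption about the layout of the exported DADA2 stats TSV.
EXAMPLE_TSV = "\n".join([
    "\t".join(["sample-id", "input", "filtered", "merged", "non-chimeric"]),
    "\t".join(["#q2:types", "numeric", "numeric", "numeric", "numeric"]),
    "\t".join(["sample-1", "50000", "45000", "37500", "15000"]),
])


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1) if whole else 0.0


def stepwise_retention(tsv_text):
    """Return, per sample, the percent of reads kept relative to the previous step."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    result = {}
    for row in reader:
        sample_id = row["sample-id"]
        if sample_id.startswith("#"):
            # Skip the "#q2:types" directive row.
            continue
        n_input = float(row["input"])
        n_filtered = float(row["filtered"])
        n_merged = float(row["merged"])
        n_nonchim = float(row["non-chimeric"])
        result[sample_id] = {
            "filtered/input": pct(n_filtered, n_input),
            "merged/filtered": pct(n_merged, n_filtered),
            "non-chimeric/merged": pct(n_nonchim, n_merged),
        }
    return result


for sample, steps in stepwise_retention(EXAMPLE_TSV).items():
    print(sample, steps)
```

In the made-up row, 30% of input reads are non-chimeric, but the more telling number is that only 40% of merged reads survive chimera removal. Applied to my ranges (merged ~70–80% of input, non-chimeric ~25–38% of input), roughly one third to one half of merged reads are retained at that step.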
My questions:
- Is it normal that the percentage of input non-chimeric is only ~25–38%?
- Are there recommended thresholds or “typical ranges” for this metric in QIIME 2 workflows?
- Could this indicate that my truncation lengths (240/245) are too permissive, retaining low-quality tails that lead to false chimeras? Or is it more likely due to inherent properties of the dataset (PCR artifacts, etc.)?
- Would adjusting the truncation parameters (e.g., truncating more aggressively at the 3′ ends) usually improve non-chimeric retention without sacrificing merging efficiency?
Any guidance or examples from your experience would be greatly appreciated!
Thanks a lot in advance,