Losing high percentage of reads in dada2 denoise-paired

imanamoeba · April 21, 2022, 12:05pm

Hi there,

I'm experiencing a similar problem with qiime2-2021.11 while working through my first QIIME 2 dataset. My amplicon is the V4-V5 region of the 16S rRNA gene (primers 515F/926R), sequenced as 2×300 bp paired-end reads.
The reads appear to be in Casava 1.8 paired-end demultiplexed format, so I imported them as such.
When I examine the DADA2 results, the percentage of input reads that passed the filter is extremely low (<5%), and the percentages merged and non-chimeric are correspondingly below 1% for all samples (see attached). This is despite the quality plots in the demux summary looking acceptable and typical (also attached).
I have tried denoising both a primer-trimmed and an untrimmed version of the reads (I wasn't involved in sequencing this data, so I wasn't sure whether adapters and primers had already been removed), but trimming makes no difference to the loss of reads at the denoising stage.

demux-paired-end.qzv (322.9 KB)
16s_denoising_stats.qzv (1.2 MB)
My pipeline looks like this:

# Import the demultiplexed Casava 1.8 paired-end FASTQ files
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path raw_data \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path analysis/seqs/demux-paired-end.qza

# Summarize read quality (for the primer-trimmed run, use
# analysis/seqs_trimmed/trimmed_sequences.qza as input here and below)
qiime demux summarize \
  --i-data analysis/seqs/demux-paired-end.qza \
  --o-visualization analysis/visualisations/demux-paired-end.qzv

# View read quality
qiime tools view analysis/visualisations/demux-paired-end.qzv

# Denoise: 300 is the full forward read length (no truncation; shorter reads
# are discarded), 220 truncates the reverse read. --p-n-threads 0 uses all cores.
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs analysis/seqs/demux-paired-end.qza \
  --p-trunc-len-f 300 \
  --p-trunc-len-r 220 \
  --p-n-threads 0 \
  --output-dir analysis/dada2out \
  --verbose

# Inspect denoising results
qiime metadata tabulate \
  --m-input-file analysis/dada2out/denoising_stats.qza \
  --o-visualization analysis/visualisations/16s_denoising_stats.qzv \
  --verbose

qiime tools view analysis/visualisations/16s_denoising_stats.qzv

I hope that covers the key information. Any troubleshooting tips or thoughts are appreciated.
Thanks!
Bonnie

colinbrislawn · April 21, 2022, 10:42pm

Welcome to the forums!

Let's start here: when running DADA2, --p-max-ee-f and --p-max-ee-r (the maximum expected errors for the forward and reverse reads) both default to 2. This filter discards any read whose quality scores imply that it contains more than 2 incorrect bases in total. Trimming off the low-quality ends of your reads reduces their cumulative Expected Errors (EE), so more reads pass the filter.
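The Expected Error arithmetic can be sketched in a few lines of shell. This is a minimal illustration, not DADA2's own code: it assumes Phred+33 quality encoding (standard for Casava 1.8 output), converts each score Q to an error probability 10^(-Q/10), and sums those, optionally after truncating to the first N bases as --p-trunc-len-f / --p-trunc-len-r would. The function name expected_errors is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: expected errors (EE) of one FASTQ quality string.
# Assumes Phred+33 encoding. EE = sum over bases of 10^(-Q/10).
# Optional 2nd argument: truncate to the first N bases before summing,
# mimicking --p-trunc-len-f / --p-trunc-len-r.
expected_errors() {
  printf '%s\n' "$1" | LC_ALL=C awk -v n="${2:-0}" '
    BEGIN {
      # awk has no ord(), so build a character -> ASCII code lookup table
      for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
    }
    {
      len = length($0)
      if (n > 0 && n < len) len = n     # apply truncation, if requested
      ee = 0
      for (j = 1; j <= len; j++) {
        q = ord[substr($0, j, 1)] - 33   # Phred quality score of base j
        ee += 10 ^ (-q / 10)             # error probability of base j
      }
      printf "%.4f\n", ee
    }'
}
```

For example, expected_errors 'IIII' gives 0.0004 (four Q40 bases). Hypothetically, a 300-base read whose last 80 bases sit at Q10 picks up 8 expected errors from that tail alone (80 × 0.1), so it fails a max-ee of 2 however clean its first 220 bases are; truncating before that tail removes those errors.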

Because most of your reads are being removed at the filtering step, let's see if we can find truncation settings that keep more of them.

Try these settings for dada2:
--p-trunc-len-f 260
--p-trunc-len-r 180

Those are pretty short (especially the reverse read), but they will show whether we can trim off enough low-quality bases from the ends for the remaining reads to pass the Expected Error filter.
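To compare truncation lengths before re-running DADA2, one could estimate from a raw FASTQ file what fraction of reads would survive a given truncation length followed by the max-ee filter. The sketch below (hypothetical function frac_passing) assumes Phred+33 qualities and drops reads shorter than the truncation length, as DADA2 does. As a simplification it ignores --p-trunc-q (default 2), which truncates each read at the first base with Q ≤ 2 and can therefore discard additional reads, so real pass rates may be somewhat lower.

```shell
#!/bin/sh
# Hypothetical helper: fraction of reads in a FASTQ (read from stdin) that
# would survive truncation to LEN bases plus a max-ee filter (default 2).
# Reads shorter than LEN count as discarded; --p-trunc-q is NOT modelled.
frac_passing() {
  LC_ALL=C awk -v len="$1" -v maxee="${2:-2}" '
    BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }
    NR % 4 == 0 {                        # 4th line of each record = qualities
      total++
      if (length($0) < len) next         # too short: dropped by truncation
      ee = 0
      for (j = 1; j <= len; j++)
        ee += 10 ^ (-(ord[substr($0, j, 1)] - 33) / 10)
      if (ee <= maxee) kept++            # DADA2 discards reads with EE > maxEE
    }
    END { printf "%.2f\n", (total ? kept / total : 0) }'
}
```

For example, gzip -cd sample_R2.fastq.gz | frac_passing 180 estimates the pass rate for a reverse truncation length of 180 (the file name is hypothetical); repeating this for a few lengths shows where the pass rate starts to drop.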

Let us know what you find!

imanamoeba · April 22, 2022, 7:54am

Hi Colin,
Thanks a lot for the helpful post. Changing the truncation lengths as suggested did help: the percentage of reads retained is much higher (around 80% for most samples, yay!). See attached FYI.
16s_denoising_stats3.qzv (1.2 MB)
I'm going to experiment with slightly less aggressive truncation to see whether I can increase the number of successfully merged reads a little without losing much of the percentage that passes the filter. My expected amplicon length is ~450 bp, so a little more read length would be nice.
Cheers,
Bonnie
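Raising the truncation lengths only helps merging if the truncated reads still overlap. The overlap is trunc-len-f + trunc-len-r minus the length of the region the reads span, and DADA2's merging step requires a minimum overlap (12 bases by default in DADA2's mergePairs; this sketch assumes the QIIME 2 wrapper keeps that default). The functions overlap and check_pairs are hypothetical, and the candidate lengths they scan are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: overlap (bases) between truncated forward and reverse
# reads. $1 = trunc-len-f, $2 = trunc-len-r, $3 = length of region spanned.
overlap() {
  echo $(( $1 + $2 - $3 ))
}

# Print each illustrative (f, r) pair, its overlap, and whether it meets the
# minimum overlap ($2, assumed 12 by default as in DADA2's mergePairs).
check_pairs() {
  amplicon=$1
  min_overlap=${2:-12}
  for f in 260 270 280; do
    for r in 180 200 220; do
      ov=$(overlap "$f" "$r" "$amplicon")
      if [ "$ov" -ge "$min_overlap" ]; then status=ok; else status=too-short; fi
      echo "$f $r $ov $status"
    done
  done
}
```

check_pairs 450 lists each pair with its overlap; taking 450 at face value, 260/180 overlaps by -10 bases. If reads nonetheless merged at those settings, the region the reads cover is probably shorter than 450 bp (for instance if that figure includes primers), so it is worth confirming that length before picking final values. Amplicon length also varies between taxa, so leaving some margin above the minimum overlap is prudent.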

system · May 23, 2022, 1:55pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.