Massive loss of reads during denoising?

We have recently been learning to use QIIME 2 to replace our QIIME 1 workflow. To compare the two, my supervisor ran his QIIME 1 OTU clustering method on the same data set that I am processing with QIIME 2. For the most part, the methods produce comparable results. Our samples are low-biomass environmental samples taken from wells, with lots of singletons and doubletons.

My reverse reads showed very poor quality for this data set, so I opted to use only forward reads.

After importing demultiplexed sequences, I used cutadapt to trim primers, resulting in the following reads per sample:

I passed the forward reads through DADA2 using the following command:

qiime dada2 denoise-single \
   --i-demultiplexed-seqs reads_qza/reads_trimmed.qza \
   --p-trunc-len 285 \
   --o-representative-sequences dada2_single_output/dada2_rep_seqs.qza \
   --o-table dada2_single_output/dada2_table.qza \
   --o-denoising-stats dada2_single_output/dada2_stats.qza

I expect these samples to have high diversity, but these are the resulting features:

I also tried a method more similar to our QIIME 1 workflow: I dereplicated my sequences and clustered them using VSEARCH, which results in many more features.

Could this be due to the quality of the reads? It's not clear to me why DADA2 is discarding so many reads. I apologize if I've used any incorrect language in this post. I am still new to bioinformatics.

It's a bit confusing, but DADA2 will actually discard any read shorter than the trunc-len parameter. In your case, it looks like most of your forward reads are a bit shorter than the 285 you set. Setting this parameter to somewhere in the 260-280 range should give you many more reads passing the filter and more ASVs.
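If you want to check this before re-running, here is a rough sketch of the length filter on its own. It only counts how many reads are at least as long as a given trunc-len; it does not reproduce DADA2's other filters (expected errors, chimera removal, etc.), so treat the numbers as an upper bound on what survives. The function name count_reads_passing is made up for this example, and it assumes a plain 4-line-per-record FASTQ on standard input.

```shell
# Count reads that are at least TRUNC_LEN bases long.
# Hypothetical helper, not part of QIIME 2 or DADA2.
# Assumes unwrapped FASTQ (4 lines per record) on stdin.
count_reads_passing() {
  local trunc_len="$1"
  awk -v t="$trunc_len" '
    NR % 4 == 2 {                       # sequence line of each record
      total++
      if (length($0) >= t) kept++       # shorter reads would be discarded
    }
    END {
      printf "total=%d kept=%d discarded=%d\n", total, kept, total - kept
    }
  '
}
```

You would run it on one of the per-sample FASTQ files from your trimmed reads (for example after exporting the artifact with qiime tools export), with something like: gzip -dc sample_R1.fastq.gz | count_reads_passing 285. The file name here is a placeholder. Trying a few values such as 285, 284 and 280 should show where the reads start getting dropped.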


Hi @April_Oliver ,
I think @cdiener has got it exactly right here. If you do run into additional problems with the DADA2 output, however, it would be very useful if you could provide the DADA2 stats summary as well. You can get a visualization of it by running:

qiime metadata tabulate \
  --m-input-file stats-dada2.qza \
  --o-visualization stats-dada2.qzv

Thanks so much! I just re-ran it, and I now have way more features, as I was expecting. Looking at the demux sequence length summary, the majority of my reads are 284 bp, which is why they were discarded with a trunc-len of 285. I'll keep that in mind for the future.


Awesome, glad to hear it worked :tada:

