Massive loss of reads during denoising?

April_Oliver · February 10, 2022, 7:08pm

We have recently been learning to use QIIME 2 to replace our QIIME 1 workflow. To compare the two methods, my supervisor ran his QIIME 1 OTU clustering method on the same data set that I am processing with QIIME 2. For the most part, the methods produce comparable results. Our samples are low-biomass environmental samples taken from wells, so there are lots of singletons and doubletons.

My reverse reads showed very poor quality for this data set, so I opted to use only forward reads.

After importing demultiplexed sequences, I used cutadapt to trim primers, resulting in the following reads per sample:

I passed the forward reads through DADA2 using the following command:

qiime dada2 denoise-single \
   --i-demultiplexed-seqs reads_qza/reads_trimmed.qza \
   --p-trunc-len 285 \
   --o-representative-sequences dada2_single_output/dada2_rep_seqs.qza \
   --o-table dada2_single_output/dada2_table.qza \
   --o-denoising-stats dada2_single_output/dada2_stats.qza

I expect these samples to have high diversity, but these are the resulting features:

I also tried an approach closer to our QIIME 1 method: I dereplicated my sequences and clustered them using VSEARCH, which resulted in many more features.

Could this be due to the quality of the reads? It's not clear to me why DADA2 is discarding so many reads. I apologize if I've used any incorrect language in this post. I am still new to bioinformatics.

cdiener · February 10, 2022, 8:02pm

It's a bit confusing, but DADA2 will actually discard any read shorter than the value of the trunc-len parameter. In your case, it looks like most of your forward reads are a bit shorter than 285. Setting this parameter to somewhere in the 260-280 range should give you many more reads passing the filter and more ASVs.
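
One way to check this before choosing a truncation length is to count how many reads fall below a candidate value. The following is a minimal sketch, assuming FASTQ input on stdin with exactly four lines per record; the function name `count_short_reads` and the file name in the usage comment are hypothetical and not part of QIIME 2 or DADA2.

```shell
# Count reads shorter than a candidate truncation length.
# Reads FASTQ from stdin; assumes 4 lines per record (no wrapped sequences).
count_short_reads() {
  awk -v t="$1" '
    NR % 4 == 2 { total++; if (length($0) < t) short++ }
    END { printf "%d of %d reads are shorter than %d\n", short, total, t }
  '
}

# Hypothetical usage on one sample's gzipped forward reads:
# gzip -dc sample_R1.fastq.gz | count_short_reads 285
```

If most reads are reported as shorter than the candidate value, that value would cause most of the sample to be discarded at the filtering step.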

Mehrbod_Estaki · February 10, 2022, 8:17pm

Hi @April_Oliver ,
I think @cdiener has got it exactly right here. If you do run into additional problems with the DADA2 output, however, it would be very useful if you could provide the DADA2 stats summary as well. You can get a visualization of it by running:

qiime metadata tabulate \
  --m-input-file stats-dada2.qza \
  --o-visualization stats-dada2.qzv

April_Oliver · February 10, 2022, 8:52pm

Thanks so much! I just re-ran it, and I now have many more features, as I was expecting. Looking at the demux sequence length summary, the majority of my reads are 284 bp, one base short of the 285 truncation length I had set, which is why they were discarded. I'll keep that in mind for the future.
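
For reference, the check and the re-run described in this thread might look roughly like the sketch below. It reuses the input and output paths from the original command. The truncation length of 280 is an assumption taken from the suggested 260-280 range (the value actually used in the re-run was not stated), and the name of the summary visualization is hypothetical.

```shell
# Assumption: 280 is only an example. Any value at or below the dominant
# read length (284 bp in this data set) keeps those reads.
TRUNC_LEN=280

if command -v qiime >/dev/null 2>&1; then
  # Inspect the read-length distribution first (output name is hypothetical).
  qiime demux summarize \
    --i-data reads_qza/reads_trimmed.qza \
    --o-visualization reads_qza/reads_trimmed_summary.qzv

  # Re-run denoising with the shorter truncation length.
  qiime dada2 denoise-single \
    --i-demultiplexed-seqs reads_qza/reads_trimmed.qza \
    --p-trunc-len "$TRUNC_LEN" \
    --o-representative-sequences dada2_single_output/dada2_rep_seqs.qza \
    --o-table dada2_single_output/dada2_table.qza \
    --o-denoising-stats dada2_single_output/dada2_stats.qza
else
  echo "qiime not found; activate a QIIME 2 environment first" >&2
fi
```

Reads shorter than the chosen value are still dropped, so a lower setting trades a few bases per read for more reads retained.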

cdiener · February 10, 2022, 9:44pm

Awesome, glad to hear it worked.
