Questions about performing quality control with DADA2 and generating a feature table

When using DADA2 for quality control and feature table construction with paired-end sequences, I run the following command:

qiime dada2 denoise-paired --i-demultiplexed-seqs paired-end-demux.qza --p-trunc-len-f XXX --p-trunc-len-r XXX --o-table table.qza --o-representative-sequences rep-seqs.qza --o-denoising-stats denoising-stats.qza

Sometimes the quality control results are quite good, with over 85% of the reads retained after removing chimeras, contaminants, and other artifacts. In other runs, however, only a fraction of a percent of the reads is retained, and many of the resulting ASVs are singletons.

I think the main issue is how to set the --p-trunc-len-f and --p-trunc-len-r parameters based on the quality plots for this paired-end-demux, but I really don't know how to interpret those plots. Sometimes I get the settings right, and sometimes I don't.


My QIIME 2 version is 2024.2. Thanks a lot!

Fu Yang

Hello!

It is a trade-off between how much you truncate and how much overlapping region remains!

Based on your screenshot, the values you set for truncation allowed most of your reads to pass quality filtering, but at the cost of a smaller overlapping region (low merging rate). Your options are:

  • increase the truncation values, so that longer reads leave a larger overlap.
  • decrease the minimum overlap size (from the default of 12 to a lower value).

You can play with both parameters to find the combination that will recover the highest number of reads.
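
As a rough way to narrow down the combinations before running DADA2, the expected overlap can be estimated as forward truncation + reverse truncation − amplicon length. The sketch below is only an illustration: the amplicon length (460) and the candidate truncation values are hypothetical placeholders, not values taken from your data, and `expected_overlap` is a helper name made up for this example. Real amplicons vary in length, so it is safer to leave some margin above the minimum overlap.

```shell
#!/usr/bin/env bash
# Rough overlap estimate for candidate truncation lengths.
# All numbers below are hypothetical placeholders; use your own values.
amplicon_len=460   # ASSUMPTION: amplicon length after primer removal
min_overlap=12     # minimum overlap mentioned above (default 12)

# expected_overlap TRUNC_F TRUNC_R AMPLICON_LEN
# Prints the estimated overlap in bases (negative means the reads do not meet).
expected_overlap() {
  echo $(( $1 + $2 - $3 ))
}

for trunc_f in 240 260 280; do
  for trunc_r in 200 220 240; do
    ov=$(expected_overlap "$trunc_f" "$trunc_r" "$amplicon_len")
    if [ "$ov" -ge "$min_overlap" ]; then
      status="enough overlap"
    else
      status="overlap too short"
    fi
    echo "trunc-len-f=$trunc_f trunc-len-r=$trunc_r overlap=$ov ($status)"
  done
done
```

Combinations flagged as having enough overlap are the ones worth trying in `qiime dada2 denoise-paired`, as long as the quality plots still look acceptable at those positions. Keep in mind that reads shorter than the truncation length are discarded, so very large values cost reads at the filtering step. In recent QIIME 2 releases the minimum overlap should be available as `--p-min-overlap`; please confirm with `qiime dada2 denoise-paired --help` for your version.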

Thank you so much, dear professor. I'll reset my parameters.

Let me give it a try. Many thanks.