I am unable to figure out why 90% of the amplicon sequencing reads are getting removed while running dada2 denoise-paired.
I ran the following command:
qiime dada2 denoise-paired --i-demultiplexed-seqs reads-qza/demux.qza --p-trunc-len-f 200 --p-trunc-len-r 170 --p-trim-left-f 1 --p-trim-left-r 0 --p-min-overlap 10 --p-min-fold-parent-over-abundance 4 --output-dir dada2-output --verbose
I have also tried changing the truncation and trim parameters but wasn't able to recover most of the reads. Is there any option other than analyzing only the forward reads, as suggested here: DADA2 denoise paired result - 90% loss in reads - #2 by codea?
For your information, I am using the V3-V4 region of the 16S rRNA gene. Here are my denoising stats:
stats.tsv (559 Bytes)
According to your stats, most reads were lost at the merging step. That does not surprise me, since you targeted the V3-V4 region, which is very long. Sequencing it with 2x300 reads is recommended to get a good overlapping region.
As a workaround, I can suggest:
Option 1. Increase the truncation values, or disable truncation (set them to 0), and decrease the minimum overlap. Even if the percentage of merged reads increases, the data may still be biased toward reads with a shorter V3-V4 region (this region varies in length). Compare the number of reads that passed the filter with the number of merged reads to judge whether the resulting data are good enough.
Option 2. Use only the forward (V3) or the reverse (V4) reads. This avoids the bias caused by the length of the region, but gives poorer taxonomy annotations and (probably) generally lower alpha diversity metrics.
I see. I will try increasing the truncation parameters. Otherwise I will have to proceed with either the forward or the reverse reads only.
So I increased the truncation values and decreased the minimum overlap. I ran the following command:
qiime dada2 denoise-paired --i-demultiplexed-seqs read-qza/demux.qza --p-trunc-len-f 240 --p-trunc-len-r 240 --p-trim-left-f 6 --p-trim-left-r 6 --p-min-overlap 8 --p-min-fold-parent-over-abundance 4 --p-n-threads 4 --output-dir dada2-output --verbose
Now I am getting enough reads at the merging step. But how can I tell from the stats whether the resulting data are good enough?
Here's my stats file -
stats.tsv (938 Bytes)
Now the stats look better!
First, check whether the chosen lengths are sufficient. Calculate, or find in the literature, the expected maximum length of the targeted rRNA region. Then compare it with the sum of the forward and reverse read lengths minus the minimum overlap used in the command (240 + 240 - 8 = 472). If the targeted region is shorter than that number, you should be safe.
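The length check above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, the value 465 is a hypothetical placeholder for the expected maximum amplicon length (substitute the figure for your own primers), and the amplicon length is assumed to be counted from the first sequenced base of each read.

```python
def max_mergeable_length(trunc_len_f: int, trunc_len_r: int, min_overlap: int) -> int:
    """Longest amplicon that the truncated reads can span while still
    overlapping by at least min_overlap bases."""
    return trunc_len_f + trunc_len_r - min_overlap


def covers(expected_max_amplicon: int, trunc_len_f: int, trunc_len_r: int, min_overlap: int) -> bool:
    """True if an amplicon of the given length can still be merged."""
    return expected_max_amplicon <= max_mergeable_length(trunc_len_f, trunc_len_r, min_overlap)


# Hypothetical placeholder, NOT taken from this thread: replace with the
# expected maximum length of your targeted region.
expected_max_amplicon = 465

first_run = max_mergeable_length(200, 170, 10)   # settings from the first command
second_run = max_mergeable_length(240, 240, 8)   # settings from the second command

print(first_run, covers(expected_max_amplicon, 200, 170, 10))   # 360 False
print(second_run, covers(expected_max_amplicon, 240, 240, 8))   # 472 True
```

With the placeholder value, the first set of parameters cannot span the region at all, while the second set leaves a small margin.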
Also, compare the number of reads that passed the filter with the number that merged, taking the quality of the reads into account. For example, if quality drops toward the ends of the reads (the ends being set by the trunc parameters), you can expect some portion of the reads to fail to merge. The worse the quality, the more reads will fail to merge.
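A rough way to make that comparison is to compute, per sample, the fraction of filtered reads that merged. The sketch below assumes the exported stats.tsv has columns named sample-id, filtered and merged, and a second header row starting with #q2:types; check these against your own file. The numbers in the example table are invented for illustration and are not from the attached stats.

```python
import csv
import io


def merge_retention(tsv_text: str) -> dict:
    """Return {sample_id: merged / filtered} for every sample row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    retention = {}
    for row in reader:
        sample_id = row["sample-id"]
        # Skip the type-annotation row (assumed to start with "#q2:types").
        if sample_id.startswith("#q2:types"):
            continue
        filtered = float(row["filtered"])
        merged = float(row["merged"])
        retention[sample_id] = merged / filtered if filtered else 0.0
    return retention


# Invented example data, only to show the expected layout.
example_tsv = (
    "sample-id\tinput\tfiltered\tdenoised\tmerged\tnon-chimeric\n"
    "#q2:types\tnumeric\tnumeric\tnumeric\tnumeric\tnumeric\n"
    "S1\t10000\t8000\t7800\t6000\t5500\n"
    "S2\t12000\t9000\t8800\t1800\t1700\n"
)

for sample, fraction in merge_retention(example_tsv).items():
    print(f"{sample}: {fraction:.0%} of filtered reads merged")
```

To use it on a real file, read the file contents into a string and pass that to merge_retention. A low fraction for a sample points to the merging step as the place where its reads are lost.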