DADA2: fewer than 10 features from paired-end reads

wyvl · May 26, 2019, 3:44am

Hi,

I'm trying to perform quality control on my 16S (V3-V4 region) paired-end reads using DADA2. I am getting only 6-7 features (amplicon sequence variants). That is abnormally low compared with what others have reported on this forum (usually hundreds or thousands?). I have tested different truncation lengths, but I either got fewer than 10 features or an error during the "Remove chimeras" step of DADA2. My dataset consists of 99 samples, and 90% of them contain >1000 reads.

Below is the command I ran:

qiime dada2 denoise-paired \
--i-demultiplexed-seqs demux.qza \
--p-trim-left-f 0 \
--p-trim-left-r 0 \
--p-trunc-len-f 240 \
--p-trunc-len-r 230 \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats denoising-stats.qza

A few truncation lengths I tried are as follows:

--p-trunc-len-f 190 --p-trunc-len-r 220 - ERROR
--p-trunc-len-f 200 --p-trunc-len-r 202 - ERROR
--p-trunc-len-f 240 --p-trunc-len-r 230 - 6 FEATURES
--p-trunc-len-f 260 --p-trunc-len-r 230 - 3 FEATURES

This error occurred when I set shorter truncation lengths. As mentioned in a few other posts, it may be because the shortened forward and reverse reads can no longer merge.

Remove chimeras (method = consensus)
Error in isBimeraDenovoTable(unqs[[i]], ..., verbose = verbose) :
Input must be a valid sequence table.
Calls: removeBimeraDenovo -> isBimeraDenovoTable
Execution halted

I'm aware that increasing the truncation length keeps more low-quality bases near the ends of the reads, which results in fewer features from ASV inference. However, shortening the truncation length kept giving me an error when running "qiime dada2 denoise-paired", so I'm not sure how to proceed with this issue.
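DADA2's expected-error filter is what drives this trade-off. Each base with Phred score Q contributes an error probability of 10^(-Q/10). A read is discarded if the sum over the bases it keeps exceeds a maximum (maxEE). Here is a minimal sketch that assumes a threshold of 2 and a made-up quality profile. The names expected_errors and passes_filter are hypothetical helpers, not DADA2 functions. The sketch shows how the same read can pass when truncated early and fail once its poor tail is kept.

```python
# Minimal sketch of an expected-error (EE) read filter like the one DADA2
# applies before denoising. Assumptions (not from the original post): the
# max-EE threshold of 2 and the hypothetical quality profile below.

def expected_errors(quals):
    """Sum of per-base error probabilities 10^(-Q/10) for a list of Phred scores."""
    return sum(10 ** (-q / 10) for q in quals)


def passes_filter(quals, trunc_len, max_ee=2.0):
    """Truncate the read to trunc_len bases, then keep it only if EE <= max_ee.

    Reads shorter than trunc_len are discarded, as with DADA2's truncation.
    """
    if len(quals) < trunc_len:
        return False
    return expected_errors(quals[:trunc_len]) <= max_ee


# Hypothetical 250-base read: Q35 for 180 bases, then a Q12 tail of 70 bases.
read = [35] * 180 + [12] * 70

for trunc in (180, 200, 240):
    ee = expected_errors(read[:trunc])
    print(trunc, round(ee, 2), passes_filter(read, trunc))
```

With this profile the read passes at 180 and 200 bases (EE of about 0.06 and 1.32) but fails at 240 bases (EE of about 3.84). That is why longer truncation lengths can leave far fewer reads to denoise.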

PS: I have also tried processing my data with Deblur, which gave me 3 features.

Any help would be much appreciated!

A few visualization files are included for reference:

demux.qzv (299.5 KB)
denoising-stats.qzv (1.2 MB)
rep-seqs.qzv (201.1 KB)
table.qzv (405.5 KB)

Mehrbod_Estaki · May 26, 2019, 8:05am

Hi @wyvl,
Welcome to the forum, and thanks for providing such detailed info about your issue.
I believe you have circled around the answer(s) yourself.

With shorter truncation lengths, your most likely issue is insufficient overlap.
With the most common V3-V4 primers you can expect a ~460 bp amplicon. On a 2x300 run, that means the untruncated reads overlap by about 140 bp (600 - 460). It is recommended that you keep at least 20 bp of overlap after truncation, plus a little extra to account for natural variation in amplicon length. So you can truncate at most ~120 bp in total across both reads. In your first two scenarios you simply don't have enough overlap, so your guess is correct. In the latter two scenarios you have just enough for an overlap, though it may still be insufficient to capture longer features.

However, truncating less of your poor-quality tails causes most of your reads to be filtered out right at the start, before the denoising and merging steps even happen. This is because your reads drop in quality very early on. In essence you are stuck between a rock (needing to truncate less) and a hard place (losing more reads if you truncate less).

I would recommend discarding either your forward or your reverse reads and working with single-end data. Your reverse reads actually look better than your forward reads, but you should still truncate them before the quality scores start to drop, say at 180-200 bp. You will lose a little resolution, but you will retain many more of your reads and won't need to worry about merging.
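The overlap arithmetic above can be checked with a few lines of Python. This sketch assumes the ~460 bp amplicon, 2x300 reads and 20 bp minimum overlap quoted here. Real amplicon lengths vary by primer pair and taxon, and the helper names are hypothetical.

```python
# Back-of-the-envelope overlap check for paired-end merging.
AMPLICON_LEN = 460  # assumed V3-V4 amplicon length (bp); varies with primers
READ_LEN = 300      # read length on a 2x300 run
MIN_OVERLAP = 20    # recommended minimum overlap after truncation (bp)


def overlap(trunc_f, trunc_r, amplicon_len=AMPLICON_LEN):
    """Overlap (bp) between truncated forward and reverse reads; negative means a gap."""
    return trunc_f + trunc_r - amplicon_len


def truncation_budget(read_len=READ_LEN, amplicon_len=AMPLICON_LEN,
                      min_overlap=MIN_OVERLAP):
    """Total bases removable from both reads combined while keeping min_overlap."""
    return 2 * read_len - amplicon_len - min_overlap


def classify(ov, min_overlap=MIN_OVERLAP):
    """Label an overlap value relative to the minimum-overlap guideline."""
    if ov <= 0:
        return "no overlap"
    if ov < min_overlap:
        return "overlap below guideline"
    return "meets guideline"


print("untruncated overlap:", overlap(READ_LEN, READ_LEN))
print("truncation budget:", truncation_budget())

# Truncation settings (forward, reverse) tried in the original post.
for f, r in [(190, 220), (200, 202), (240, 230), (260, 230)]:
    ov = overlap(f, r)
    print(f, r, ov, classify(ov))
```

Under these assumptions the first two settings leave a 50-58 bp gap between the reads. The 240/230 setting gives only ~10 bp of overlap, below the 20 bp guideline, and 260/230 gives ~30 bp.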
As for the low number of features you are observing, this may or may not be related to the above. What are the sample sources? Are you expecting high diversity?
Let's start with re-running with the reverse reads only and see if the problem persists, then we can troubleshoot further.
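The point where "the quality scores start to drop" is usually picked by eye from the interactive quality plot in demux.qzv. The rule of thumb can also be written as a small helper. The sketch below rests on clearly stated assumptions: the per-position median qualities are made up, and the Q25 threshold and 5-base window are arbitrary choices, not DADA2 or QIIME 2 defaults. pick_trunc_len is a hypothetical name.

```python
def pick_trunc_len(median_quals, min_q=25, window=5):
    """Suggest a truncation length from per-position median quality scores.

    Returns the index of the first position that starts a run of `window`
    consecutive medians all below `min_q`, i.e. the number of bases to keep.
    If quality never drops that far, the full read length is returned.
    Requiring several low positions in a row avoids truncating at a single
    noisy dip. min_q and window are illustrative assumptions.
    """
    for i in range(len(median_quals) - window + 1):
        if all(q < min_q for q in median_quals[i:i + window]):
            return i
    return len(median_quals)


# Hypothetical median-quality profile for 300-base reverse reads.
reverse_medians = [36] * 150 + [32] * 40 + [24] * 60 + [15] * 50
print(pick_trunc_len(reverse_medians))
```

For this hypothetical profile the helper suggests keeping 190 bases, which falls inside the 180-200 bp range mentioned above. With real data the threshold matters more than the exact code.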

wyvl · May 27, 2019, 6:35am

Hi @Mehrbod_Estaki,

Thanks for suggesting single-end reads instead; this solved the low feature count issue. I ran dada2 denoise-single separately on the forward reads (truncated at 170 bp) and the reverse reads (truncated at 190 bp), which yielded 259 and 444 features, respectively. That looks much more reasonable than just 6 features for an arthropod gut microbiome dataset!
