I'm currently trying to process my paired-end demultiplexed MiSeq libraries (primers 341F & 785R, targeting the V3-V4 16S region) with DADA2 and I've been getting low frequency counts: I started with 11,077,765 sequences but was left with only 740,243 after DADA2 (~7%). My quality plots are shown below:
Thanks for the details about your issue!
You're right that the output does seem rather low, but this may just be the nature of your data, with DADA2 doing its job properly.
With the 341F/785R primer pair we expect an amplicon size of ~444 bp (785 - 341), so with a 2x300 run you should have an overlap of roughly 600 - 444 = 156 bp. We therefore want to make sure the total length we truncate (forward + reverse combined) doesn't exceed 156 - (20 bp minimum overlap required + 20 bp buffer for natural length variation) = 116 bp. Based on that calculation, both of your scenarios have sensible truncating parameters. What I suspect is happening, however, is that the quality of your reverse reads is dropping low enough for DADA2 to discard them during the initial quality filter. This makes sense considering your second attempt kept more of the 3' tail of the reverse reads, which let in more poor-quality bases and made each read more likely to be dropped.
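Laying that arithmetic out in one place:

```bash
# amplicon length   ~ 785 - 341 = 444 bp
# combined reads    = 2 x 300   = 600 bp
# expected overlap  ~ 600 - 444 = 156 bp
# truncation budget = 156 - (20 bp min overlap + 20 bp length-variation buffer)
#                   = 116 bp total across forward + reverse
```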
Can you share the result of your denoising-stats.qza? This should tell us a bit more about what is happening.
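In case it's useful, you can turn the stats artifact into a viewable visualization with metadata tabulate (assuming the file is named denoising-stats.qza):

```bash
# Render the denoising stats artifact as a viewable .qzv
qiime metadata tabulate \
  --m-input-file denoising-stats.qza \
  --o-visualization denoising-stats.qzv
```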
An easy solution would be to discard the reverse reads and denoise just the forward reads, since they are in pretty good shape. This should retain far more reads, though at the cost of a shorter final amplicon.
Thank you so much @Mehrbod_Estaki for your help and detailed explanations!
Please find attached my denoising stats for the first run (--p-trim-left-f 23, --p-trim-left-r 40, --p-trunc-len-f 300, --p-trunc-len-r 193): denoising-stats.qzv (1.2 MB)
And for the second run (--p-trim-left-f 22, --p-trim-left-r 39, --p-trunc-len-f 300, --p-trunc-len-r 276): denoising-stats2.qzv (1.2 MB)
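For reference, the full command for the first run looked roughly like this (the artifact names here are placeholders):

```bash
# First run: truncate reverse reads at 193 to drop the low-quality tail
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 23 \
  --p-trim-left-r 40 \
  --p-trunc-len-f 300 \
  --p-trunc-len-r 193 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza
```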
I actually went ahead and ran DADA2 a third time, but ended up with only 17,801 sequences! The parameters for that run were set to:
If the quality of my reverse reads is too low, I will follow your suggestion of denoising only the forward reads and continue my analysis as single-end data.
Thanks for sharing those stats! These support our suspicion that the reverse reads are causing lots of reads to be filtered out initially. In the first scenario you merge many more reads because you trimmed away much of the poor-quality tail of the reverse reads. In scenario 2, far more reads are discarded up front, since quality starts to dip quite a bit by position 276, so your initial pool of reads to denoise/merge is small. Scenario 3 brings the highest number of reads through the initial filter since the truncating parameters are quite stringent, but truncating that much of course leaves insufficient overlap, so most reads can't merge properly.
If you absolutely must keep paired ends, then one last attempt would be to relax the maxEE parameter to, let's say, 5 (as suggested in the DADA2 tutorial). This should increase the number of reads that make it through the initial filter. You can probably also improve the error model a bit by increasing --p-n-reads-learn, though I believe the benefits would be limited. See the sketch below.
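A minimal sketch of that run (the exact flag spelling depends on your QIIME 2 version, as newer releases split --p-max-ee into --p-max-ee-f and --p-max-ee-r; the reads-learn value is just an illustrative bump over the default):

```bash
# Relax maxEE to 5 and give the error model more reads to learn from
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 23 \
  --p-trim-left-r 40 \
  --p-trunc-len-f 300 \
  --p-trunc-len-r 193 \
  --p-max-ee 5 \
  --p-n-reads-learn 2000000 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats-maxee5.qza
```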
Otherwise, I'm willing to bet that using the forward reads only, trimming the first 40 bp and truncating at 280, would give you a much higher output than all the other scenarios.
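Something like this (again, demux.qza is a placeholder name; when given paired-end data, denoise-single uses only the forward reads):

```bash
# Forward-reads-only denoising: trim the first 40 bp, truncate at 280
qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left 40 \
  --p-trunc-len 280 \
  --o-table table-single.qza \
  --o-representative-sequences rep-seqs-single.qza \
  --o-denoising-stats denoising-stats-single.qza
```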
I'm happy things seem to have worked out! I do have a concern I want to clarify with you, though. In the final result you posted, you show over 100,000 unique features in your 6.6 million reads! This seems rather high to me, unless you really are looking at a very diverse range of samples?
The other possibilities we want to check: a) have all non-biological sequences, like your primers, adapters, and barcodes, been removed from your reads prior to running DADA2? We often see inflated feature counts when these are not removed. And b) if you are running the forward reads only, you should revert --p-max-ee back to its default, since we are no longer worried about how many reads make it through in this scenario and can afford to be very strict about read quality.
Perhaps this is a non-issue but we wanted to make sure before moving forward!