I'm currently trying to process my paired-end demultiplexed MiSeq libraries (primers 341F & 785R, targeting the 16S V3-V4 region) with DADA2 and I've been getting a low frequency count - I started with 11,077,765 sequences but obtained only 740,243 sequences after DADA2 (~7%). My quality plots are shown below:
The following parameters for DADA2 were set:
qiime dada2 denoise-paired \
  --p-trim-left-f 23 \
  --p-trim-left-r 40 \
  --p-trunc-len-f 300 \
  --p-trunc-len-r 193 \
  ...
After the run, I thought maybe I was trimming too much and the sequences were failing to overlap, so I re-ran DADA2 using these parameters:
qiime dada2 denoise-paired \
  --p-trim-left-f 22 \
  --p-trim-left-r 39 \
  --p-trunc-len-f 300 \
  --p-trunc-len-r 276 \
  ...
However, my total frequency dropped to 143,100!
I am not sure whether my first set of parameters is reasonable given the quality plots above. Are there any suggestions on how to obtain a higher sequence count?
Thank you so much for your help! Any guidance would be greatly appreciated!!
Thanks for the details about your issue!
You’re right that the output does seem rather low, but this may just be the nature of your data, with DADA2 doing its job properly.
With the 341F/785R primers we expect an amplicon of ~444 bp (785 − 341), so with a 2x300 cycle run you should have an overlap of roughly 600 − 444 = 156 bp. We therefore want to make sure the total length truncated from both reads doesn’t exceed 156 − (20 bp minimum overlap required + 20 bp natural length variation to be safe) = 116 bp. Based on that calculation, both of your scenarios have sensible truncation parameters. What I suspect is happening, however, is that the quality of your reverse reads is dropping low enough for DADA2 to discard them during quality filtering. This makes sense considering your second attempt kept more of the 3’ tail of the reverse reads, which let through more poor-quality bases, making each read more likely to be dropped.
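To make the arithmetic above concrete, here is a small back-of-the-envelope sketch of the truncation budget. The primer positions, read length, and 20 bp allowances are taken from this thread; the true amplicon length varies by taxon, so treat this as a rough guide only:

```shell
# Rough overlap budget for 341F/785R amplicons sequenced with 2x300 reads.
amplicon=$((785 - 341))         # expected amplicon length, ~444 bp
total=$((2 * 300))              # bases sequenced across both reads: 600 bp
overlap=$((total - amplicon))   # theoretical read overlap: 156 bp
budget=$((overlap - 20 - 20))   # minus 20 bp required overlap + 20 bp safety margin
echo "Up to ${budget} bp may be truncated across both reads"
```

Any pair of --p-trunc-len-f/--p-trunc-len-r values that together remove more than this budget risks leaving too little overlap for merging.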
Can you share the result of your denoising-stats.qza? This should tell us a bit more about what is happening.
An easy solution would be to discard the reverse reads and denoise only the forward reads, since they are in pretty good shape. This should yield far more reads, though at the cost of shorter sequences.
Thank you so much @Mehrbod_Estaki for your help and detailed explanations!
Please find attached my denoising stats for the first run (p-trim-left-f 23, p-trim-left-r 40, p-trunc-len-f 300, p-trunc-len-r 193):
denoising-stats.qzv (1.2 MB)
And for the second run (p-trim-left-f 22, p-trim left-r 39, p-trunc-len-f 300, p-trunc-len-r 276):
denoising-stats2.qzv (1.2 MB)
I actually made a third attempt at running DADA2, but ended up with only 17,801 sequences! The parameters for the run were set to:
qiime dada2 denoise-paired
My table4.qzv and denoising-stats4.qzv:
If the quality of my reverse reads is too low, I will follow your suggestion of denoising only the forward reads and continue my analysis as single-end data.
Thanks once again for your help and suggestions!!
Thanks for sharing those stats! These support our suspicion that the reverse reads are causing many reads to be filtered out at the initial quality-filtering step. In the first scenario you merge far more reads because you trimmed off much of the poor-quality tail of the reverse reads. In scenario 2, many more reads are discarded up front, since quality starts to dip well before position 276, so your initial pool of reads to denoise/merge is small. Scenario 3 brings the most reads through the filter because the truncation parameters are quite stringent, but trimming that much of course leaves insufficient overlap, so most reads fail to merge.
If you absolutely must keep paired ends, then one last attempt would be to relax the maxEE parameter to, let’s say, 5 (as suggested in the DADA2 tutorial). This should increase the number of reads that initially make it through. You can probably improve the error-rate estimates a bit by increasing --p-n-reads-learn, though I believe the benefits would be limited.
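Assuming the same trim/truncation settings as the first run, relaxing maxEE might look roughly like this. The file names are placeholders, and note that --p-max-ee was a single flag in QIIME 2 releases of this era (later versions split it into --p-max-ee-f and --p-max-ee-r):

```shell
# Sketch only: relax the per-read expected-error threshold to 5.
# Trim/trunc values are those from the first run; paths are hypothetical.
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 23 \
  --p-trim-left-r 40 \
  --p-trunc-len-f 300 \
  --p-trunc-len-r 193 \
  --p-max-ee 5 \
  --o-table table-maxee5.qza \
  --o-representative-sequences rep-seqs-maxee5.qza \
  --o-denoising-stats stats-maxee5.qza
```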
Otherwise, I’m willing to bet that using the forward reads only, trimming at 40 and truncating at 280, would give you a much higher output than any of the other scenarios.
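For reference, the forward-only run suggested above could look something like this. The trim 40 / truncate 280 values come from this thread; the input/output paths are placeholders, and note that denoise-single takes single --p-trim-left/--p-trunc-len flags rather than the -f/-r pairs:

```shell
# Sketch: denoise forward reads only, per the suggestion above.
# Parameter values are from the thread; file names are hypothetical.
qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux-forward.qza \
  --p-trim-left 40 \
  --p-trunc-len 280 \
  --o-table table-fwd.qza \
  --o-representative-sequences rep-seqs-fwd.qza \
  --o-denoising-stats stats-fwd.qza
```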
Thank you for your suggestions! I just wanted to update you on my runs. I tried running DADA2 with a relaxed maxEE using the parameters below; although it increased the number of sequences, it was still not quite enough.
qiime dada2 denoise-paired
Thus, I ran it with forward reads only using the parameters you suggested - trimming at 40 and truncating at 280 - and I got enough reads (~60%)!
Thanks once again for all of your help @Mehrbod_Estaki!
I’m happy things seem to have worked out! I do have a concern I want to clarify with you, though. In the final result you posted you show over 100,000 unique features in your 6.6 million reads! That seems rather high to me, unless you are actually looking at a very diverse range of samples?
The other possibilities we want to check are: a) Have all your non-biological sequences - primers, adapters, and barcodes - been removed from your reads prior to running DADA2? We often see inflated feature counts when these are not removed. b) If you are running the forward reads only, you should revert --p-max-ee back to its default: we aren’t worried about the number of reads brought forward in this scenario, so we want to be very strict about read quality.
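If the primers might still be on the reads, removing them with q2-cutadapt before DADA2 would look roughly like this. The primer sequences shown are the standard Klindworth et al. 341F/785R pair - confirm them against your own protocol - and the file names are placeholders:

```shell
# Sketch: strip primer sequences from paired reads before denoising.
# Primer sequences assume the standard Klindworth 341F/785R pair.
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux.qza \
  --p-front-f CCTACGGGNGGCWGCAG \
  --p-front-r GACTACHVGGGTATCTAATCC \
  --o-trimmed-sequences demux-trimmed.qza
```

If the primers are removed this way, the --p-trim-left-f/--p-trim-left-r values in the subsequent DADA2 call could then be set to 0.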
Perhaps this is a non-issue but we wanted to make sure before moving forward!