Dada2 denoise-paired removing most sequences during filtering

Hi there!

Apologies - I found past discussions on this topic, but they were closed and the suggestions did not work for me, so I am hoping someone can help me find a solution. We had a fantastic QIIME2 workshop in Melbourne 2 weeks ago, and I was finally able to give it a try last week, first with 3 samples and then with a larger sample set. Mostly it seemed to work okay. However, after I ran dada2 with the default settings the number of sequences dropped dramatically, and I ended up with only about 25-30% of the original reads. I tried running it again without truncating the F/R sequences (as per below), but it didn't change the results. I have added some of the files, including the unprocessed F and R sequences for the samples, the demux.qzv (quality) and the denoised.qzv (number of reads), at Example QIIME2 - Google Drive.

Briefly, I follow the EMP protocol to amplify the V4-V5 region for sequencing on the MiSeq (the fragment is ~450 bp, sequenced 2x300 bp), so I have forward and reverse sequences that need to be joined.

I am using QIIME2 v11.18 installed on a server, and I used the following commands:

To import: qiime tools import --input-path ~/xxxxx --output-path demux.qza --input-format CasavaOneEightSingleLanePerSampleDirFmt --type 'SampleData[PairedEndSequencesWithQuality]'

To summarise demultiplexing results: qiime demux summarize --i-data demux.qza --o-visualization demux.qzv

To join F/R: qiime dada2 denoise-paired --i-demultiplexed-seqs demux.qza --p-trunc-len-f 0 --p-trunc-len-r 0 --o-representative-sequences rep-seqs2.qza --o-table table2.qza --o-denoising-stats denoised2.qza

This was followed by metadata tabulate and feature table.

Based on my experience with QIIME 1 and based on what others have written this is not expected. I was wondering if there is anything else I can try to do to reduce the waste of reads?

This has now happened with 2 datasets produced by different MiSeq runs, so I don't think it is an issue with the run itself either (and the protocol is exactly the one on the EMP website).

Thank you so much! Any help will be much appreciated!

Cheers,

Fran

Hi @FrancineMarques,
Thanks for searching through the forum first and providing us with in-depth detail about your situation. It really helps with the troubleshooting!
This is a common enough scenario, so let's take a closer look. Your dada2 stats summary shows that your major loss occurs at the initial filtering step. Denoising, merging, etc. all look good! So, moving backwards one step and looking at your demux.qzv quality plots, it looks as though your reads are unfortunately not high in quality. There is a significant drop in quality in both directions starting at around the 150-160 bp point. What is most likely happening is that DADA2 is dropping many of your reads because it considers them poor in quality. This is especially true when you don't truncate any of those poor ends. It may sound counter-intuitive, but truncating poor-quality tails actually increases the number of reads that pass that initial filtering step, because of how dada2's filtering parameters work. Have a look here at how you can play around with these if you think you want to force more lenient filter parameters (see the Filter and Trim section). I personally don't like relaxing filtering parameters, so I avoid it if I can.
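To make the counter-intuitive part concrete, here is a rough sketch. It assumes the initial filter is DADA2's expected-error threshold (maxEE, which I believe defaults to 2 in the QIIME 2 plugin), where a read's expected errors are the sum of 10^(-Q/10) over its bases. The quality values below are made up for illustration and are not taken from your demux.qzv.

```shell
# Hypothetical 300 bp read: 160 bases at Q35 followed by 140 bases at Q15.
# Expected errors (EE) = sum over bases of 10^(-Q/10).
max_ee=2   # assumed filter threshold

# EE of the full-length, untruncated read (roughly 4.5)
full_ee=$(awk 'BEGIN { printf "%.2f", 160 * 10^(-35/10) + 140 * 10^(-15/10) }')

# EE of the same read truncated to its first 160 bases (roughly 0.05)
trunc_ee=$(awk 'BEGIN { printf "%.2f", 160 * 10^(-35/10) }')

echo "untruncated EE: $full_ee (threshold $max_ee)"
echo "truncated EE:   $trunc_ee (threshold $max_ee)"
```

With these made-up numbers the untruncated read is over the threshold and would be discarded, while the truncated version of the very same read passes easily.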
Your best bet, then, is to truncate as much of the poor-quality tails of your reads as you can using DADA2's truncation parameters (--p-trunc-len-f and --p-trunc-len-r). In your case you have 2x300 bp reads and ~450 bp amplicons, meaning there is about 150 bp of overlap (300 + 300 - 450). DADA2 requires a minimum 20 bp overlap for merging, so you want to truncate no more than a total of ~130 bp between your forward and reverse reads. If this is what you did initially and you didn't get good results, then not truncating at all is probably going to give you even poorer results.
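The arithmetic above can be sketched as a quick check before launching a run. The truncation lengths (270/200) and the output file names here are hypothetical placeholders, not a recommendation; pick your own values from the quality plots. The command is only printed, not executed.

```shell
read_len=300       # 2x300 bp MiSeq run
amplicon_len=450   # approximate V4-V5 fragment length
min_overlap=20     # minimum overlap needed for merging

overlap=$(( 2 * read_len - amplicon_len ))   # bases shared by F and R reads
max_trunc=$(( overlap - min_overlap ))       # total bases you can afford to cut

trunc_len_f=270    # hypothetical forward truncation length
trunc_len_r=200    # hypothetical reverse truncation length
remaining=$(( trunc_len_f + trunc_len_r - amplicon_len ))

echo "overlap: $overlap bp, truncation budget: $max_trunc bp"
if [ "$remaining" -ge "$min_overlap" ]; then
  echo "remaining overlap $remaining bp: reads should still merge"
else
  echo "remaining overlap $remaining bp: too short, truncate less"
fi

# Print the command that would be run (output names are placeholders).
echo "qiime dada2 denoise-paired --i-demultiplexed-seqs demux.qza" \
     "--p-trunc-len-f $trunc_len_f --p-trunc-len-r $trunc_len_r" \
     "--o-representative-sequences rep-seqs-trunc.qza" \
     "--o-table table-trunc.qza --o-denoising-stats denoised-trunc.qza"
```

Note that these example values sit right at the 20 bp limit, so in practice you would want to leave a little more room than that.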
At that point, you may consider ditching your reverse reads altogether and using only your forward reads. You will obviously lose resolution by not having longer reads, but you will be able to retain more of your reads. That being said, your current table is still in good shape. The lowest sample still has 9,000+ sequences, which is plenty for many sample types, so this may not be that big of a loss to begin with.
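If you do go forward-only, a minimal sketch would look something like the following. It assumes that denoise-single accepts your existing paired-end demux.qza and simply ignores the reverse reads, which is my understanding of the plugin. The truncation length of 150 and the output names are hypothetical. The command is printed rather than run; paste it into your terminal once you are happy with the values.

```shell
trunc_len=150   # hypothetical; roughly where the quality starts to drop

# Build the forward-reads-only command (output names are placeholders).
cmd="qiime dada2 denoise-single \
--i-demultiplexed-seqs demux.qza \
--p-trunc-len $trunc_len \
--o-representative-sequences rep-seqs-single.qza \
--o-table table-single.qza \
--o-denoising-stats denoised-single.qza"

echo "$cmd"
```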
Hope this helps a bit!
