Dada2 denoise-paired removing most sequences during filtering

Mehrbod_Estaki · November 27, 2018, 4:03am

Hi @FrancineMarques,
Thanks for searching through the forum first and providing us in depth detail about your situation. Really helps with the troubleshooting!
This is a common enough scenario so let's take a closer look. Your dada2 stats summary show that your major loss occurs at the initial filtering step. Denoising/merging etc all look good! So, moving backwards one step, looking at your demux.qzv quality plots, it looks as though your reads are unfortunately not high in quality. There is a significant drop in quality in both direction starting at around the 150-160bp point. What is most likely happening is that DADA2 is dropping much of your reads because it considers them poor in quality. This is especially true when you don't truncate any of those poor ends. It may sound counter-intuitive but truncating poor quality tails actually increases the number of reads that pass that initial filtering step because of the filtering parameters of dada2. Have a look here at how you can play around with these if you think you want to force more lenient filter parameters (See Filter and Trim section). I personally don't like relaxing filtering parameters so if I can I will avoid them.
Your best bet then is that you truncate as much as of the poor quality tails of your reads before running DADA2. In your case you have 2x300 reads and 450bp amplicons, meaning there is about 150bp overlap. DADA2 requires a min 20bp overlap for merging so you want to truncate no more than a total of ~ 130 bp between your forward and reverse reads. If this is what you initially have done and didn't get good results, not truncating any at all is probably going to give you even poorer results.
At that point, you may consider ditching you reverse reads all together and only use your forward reads. You obviously will lose resolution in not having longer reads but you will be able to retain more of your reads.That being said, your current table is still considered in good shape. The lowest sample still has 9,000+ sequences which is considered plenty enough in many samples, so this may not be that big of a loss to begin with.
Hope this helps a bit!