Dada2 denoising clarifications

That looks much better. Based on what you posted, with ~50% of reads now making it through the full pipeline (filtering + denoising + merging + chimera removal), I think the results are likely reasonable. You can try relaxing parameters further to get more reads through, but that is quite likely to be counter-productive, as the additional data will be of lower quality. (One small note: the trunc-len values for the forward and reverse reads do not have to be the same.)

Since I imagine a number of people will run into the same initial issue you hit, it's worth reiterating the two key points that caused your initial denoising run to lose most reads.

  1. For ASV methods, removing the primers is critical. The ambiguous nucleotides in primer regions are seen as real biological variation by ASV methods (whereas you could mostly get away with leaving them in when clustering into fuzzier OTUs). Failure to remove primers causes many, even most, reads to be lost at the chimera removal step, because apparent chimeras are formed between alternate primer versions and the true sequence.
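As a sketch of what primer removal can look like in QIIME 2, using the cutadapt plugin (file names are placeholders, and the primer sequences shown are the common 515F/806R 16S pair as an example; substitute the primers actually used for your run):

```shell
# Trim primers off both reads before denoising; --p-discard-untrimmed
# drops read pairs in which the primer was not found.
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux.qza \
  --p-front-f GTGYCAGCMGCCGCGGTAA \
  --p-front-r GGACTACNVGGGTWTCTAAT \
  --p-discard-untrimmed \
  --o-trimmed-sequences demux-trimmed.qza
```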

  2. For ASV methods, it is almost always advisable to truncate reads at the point where quality scores crash. This was a good idea with OTU methods too, but it is even more critical for ASV methods, which rely on repeated observations of the complete, error-free sequence. The more post-quality-crash tail that is included, the lower the fraction of error-free reads, which in turn hurts sensitivity to lower-frequency variants.
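Putting this into a command, a sketch of a DADA2 call with different forward and reverse truncation positions (the file names are placeholders and the truncation lengths are illustrative; pick yours from your own quality-profile plots):

```shell
# Truncate forward reads at 240 bp and reverse reads at 180 bp,
# since reverse reads typically degrade earlier.
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-trimmed.qza \
  --p-trunc-len-f 240 \
  --p-trunc-len-r 180 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats stats.qza
```

One caveat when choosing these values: the truncated reads still need to overlap enough for merging, so don't truncate so aggressively that the forward and reverse reads no longer span the amplicon.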
