Lost of data with dada2

Nicholas_Bokulich · October 11, 2017, 4:57pm

Hi @blau,
I have encountered similar problems myself with some datasets. Unfortunately, sometimes the read quality (on reverse reads in particular) can be too low to permit merging forward/reverse reads. Currently available Illumina read lengths are typically long enough to cover some common 16S rRNA gene domains fully, e.g., V4 domain is fully covered by 300 nt reads, so merging may not always be necessary.

Some of the mockrobiota data sets are a few years old and the read quality in some runs is not great as you have noticed; I have always just used the forward reads so cannot offer much insight on using paired-end reads with these data.

That is going to depend on the quality of the data sets and where you do the trimming. If large sections of the read are erroneous, it may be better to set a lower --p-trunc-len-r; if the read quality drops off below the point at which reads overlap, however, you may be out of luck and these reads cannot be merged. @benjjneb may be able to comment on the issue of dada2 sensitivity more.

Absolutely (and results suggest that is a good thing). Qiime1 did not have a method for denoising Illumina data. The quality filtering methods used for Illumina data relied mostly on trimming and filtering reads based on PHRED scores and length. So dada2 will discard many more reads than qiime1 did because it offers functionality not available in qiime1. In that respect, similar parameters is beside the point. The dada2 paper illustrates that qiime1 with default parameters is much more permissive, but whether this would change with, e.g., more stringent quality filtering parameters (as described here) and chimera filtering methods, is untested to my knowledge.

I hope that answers your questions — please let me know if you have any more questions or concerns.