Dada2 error with grinder simulated sequence errors

Hi @Jeremie_Auger,

This occurs when dada2 encounters artificial quality scores, as you have discovered, and as described here:

How many errors are you inserting, and how? What does the quality score distribution look like on your sequences? I strongly suspect dada2 is still failing because you are giving it artificial-looking quality scores, which still break the assumptions of this method.
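To answer the quality score question quickly, something like the following could be used. It is only a minimal sketch, assuming an uncompressed FASTQ with four lines per record and Phred+33 encoding; `quality_histogram` is a made-up helper name, not part of dada2 or QIIME 2.

```python
from collections import Counter
import io


def quality_histogram(handle, offset=33):
    """Count Phred scores over all bases in a 4-line-per-record FASTQ stream.

    Assumes Phred+33 encoding unless a different offset is given.
    """
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 3:  # the quality string is the 4th line of each record
            counts.update(ord(ch) - offset for ch in line.rstrip("\n"))
    return counts


# Toy input standing in for a simulated FASTQ file (hypothetical reads).
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\nII#I\n")
hist = quality_histogram(example)
for score in sorted(hist):
    print(score, hist[score])
```

For a real file, pass `open("reads.fastq")` instead of the toy `StringIO` (the file name is a placeholder). If almost every base falls on one or two score values, the scores are probably too artificial for dada2's error model.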

I think the bottom line is that if you want to use dada2, you need to give it the form of data it was designed for: sequences with realistic quality scores, at the very least.

That is the purpose: deblur identifies sequences that look noisy and throws them away. dada2 does the same if the reads look too noisy, and the fact that dada2 filters out no sequences is probably a good indication that your quality scores are not "real" enough. The fact that you do not want reads discarded leads me back to my earlier question: do you really want to denoise these data? Maybe you do not want to "noise" them to begin with.

EDIT: I see now that you declared the purpose earlier on:

So I don't think you want to generate noisy data to begin with, if you want to compare the sequenced compositions to the expected compositions. You can just compare the real to the "perfect" composition downstream as a feature table. I have a little example of how to do it in this tutorial (ignore the quirks related to fungi, skip to the part about q2-quality-control and the various methods in there that you can use):
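Separately from that tutorial, here is a rough idea of what comparing an observed composition to an expected one can look like. This is a minimal sketch in plain Python, not the q2-quality-control implementation: the taxon names and counts are invented, and Bray-Curtis dissimilarity is just one metric I picked for illustration.

```python
def relative(counts):
    """Convert raw counts per taxon into relative abundances."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance dicts (0 = identical)."""
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    den = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return num / den


# Hypothetical "perfect" (expected) and sequenced (observed) compositions.
expected = relative({"taxonA": 50, "taxonB": 30, "taxonC": 20})
observed = relative({"taxonA": 45, "taxonB": 35, "taxonD": 20})

d = bray_curtis(expected, observed)
print(f"Bray-Curtis dissimilarity: {d:.2f}")
```

The point is that this comparison only needs the two feature tables, so no simulated sequencing noise is required to make it.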

I hope that helps!
