DADA2: Deciding on the number of reads for training the error model

I’m wondering if there are any guidelines or suggestions for picking n-reads-learn in DADA2, e.g. using a percentage of your total reads. The default is set to 1,000,000, but I’m curious as to what that is based on and how much increasing this number helps, considering this seems to be a very computationally heavy step.

Hi @Mehrbod_Estaki, this is a great question! There is a bit of discussion about this in the DADA2 docs. Unfortunately, q2-dada2 doesn’t expose the error model plots. We have an open issue to expose some of the substeps currently performed inside dada2 denoise-*, which would potentially allow us to create a visualizer for interacting with these error model plots, but we don’t currently have an ETA for when this will land in QIIME 2. Your options right now are to experiment with n-reads-learn through q2-dada2 and see how it impacts your sequences (essentially black-box testing), or to load your sequences into DADA2 directly in R (not through QIIME 2) and follow the steps in the tutorial linked above to generate those plots. Sorry for the hassle! We will update this thread whenever that feature becomes available. Thanks! :t_rex:
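
For the black-box route, here is a minimal sketch of how you might set up a sweep over n-reads-learn. It only builds and prints one `qiime dada2 denoise-single` command per value; it does not run anything. The input file name (`demux.qza`), the truncation length, the list of values, and the output directory prefix are all hypothetical placeholders, and the helper `build_sweep_commands` is just an illustration, not part of QIIME 2. It also assumes single-end data; adjust the action and its parameters for your own setup.

```python
import shlex


def build_sweep_commands(demux_qza, trunc_len, n_reads_values,
                         out_prefix="dada2-nlearn"):
    """Return one `qiime dada2 denoise-single` argv list per n-reads-learn value.

    All argument values are placeholders supplied by the caller; nothing is executed.
    """
    commands = []
    for n in n_reads_values:
        commands.append([
            "qiime", "dada2", "denoise-single",
            "--i-demultiplexed-seqs", demux_qza,
            "--p-trunc-len", str(trunc_len),
            "--p-n-reads-learn", str(n),
            # One output directory per run so results can be compared side by side.
            "--output-dir", f"{out_prefix}-{n}",
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical values bracketing the 1,000,000 default.
    for cmd in build_sweep_commands("demux.qza", 120,
                                    [250_000, 1_000_000, 4_000_000]):
        print(shlex.join(cmd))
```

You could then run each printed command and compare the resulting feature tables and representative sequences across runs to see whether the number of training reads makes a noticeable difference for your data.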

Ah, yes, those error plots would certainly be useful to have access to! Thanks for the link. I’ll play around with it in R and report back if I discover anything worth sharing.

