Accuracy of dada2 using mock

Hi

I previously analysed my data using closed-reference OTU picking (default parameters, QIIME v. 1.9.1). I am now reanalysing the same data with DADA2 (QIIME 2 2018.06) and assigning taxonomy with the following command:

qiime feature-classifier classify-sklearn --i-classifier gg-13-8-99-515-806-nb-classifier.qza --i-reads rep-seqs.qza --o-classification taxonomy.qza

With DADA2, the relative abundances in the mock community differ considerably from the expected composition. This was not the case with closed-reference OTU picking.
In addition, members of the Enterobacteriaceae are not identified in the mock community after DADA2 processing (although Enterobacteriaceae are identified in the samples). Both taxonomy assignments were made against the Greengenes database.
I would appreciate it if you could tell me whether something has gone wrong and, if so, what.

I did not have exactly this problem, but DADA2 did change my results. Here are a few things that helped me troubleshoot and that may help you:

  1. Look at the bar charts and at the distribution of annotations (e.g., your mock should contain the same taxa regardless of whether you use QIIME 1 or QIIME 2).
  2. Did you do a quality filter step?
  3. You can do closed-reference OTU picking on the DADA2 output table against the 97% OTU Greengenes database (q2-vsearch command).
  4. You may want to train your own feature classifier (although if you're using the recommended primers for the same region, this shouldn't change your taxonomy).
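
Suggestions 1 and 3 could look roughly like the sketch below. The file names (table.qza, 97_otus.qza, metadata.tsv and the output names) are placeholders, and the Greengenes 97% reference is assumed to have been imported as a FeatureData[Sequence] artifact already. The small `run` helper is not part of QIIME 2; it only prints each command and executes it when `qiime` is on the PATH.

```shell
#!/bin/sh
# Helper (not part of QIIME 2): print the command, run it only if qiime is installed.
run() {
  printf '+ %s\n' "$*"
  if command -v qiime >/dev/null 2>&1; then
    "$@"
  fi
}

# Placeholder artifact names; adjust to your own files.
TABLE=table.qza            # DADA2 feature table
REP_SEQS=rep-seqs.qza      # DADA2 representative sequences
REFERENCE=97_otus.qza      # Greengenes 97% OTUs (assumed already imported)
PERC_IDENTITY=0.97         # chosen to mirror a 97% closed-reference analysis

# Closed-reference clustering of the DADA2 features with q2-vsearch.
run qiime vsearch cluster-features-closed-reference \
  --i-table "$TABLE" \
  --i-sequences "$REP_SEQS" \
  --i-reference-sequences "$REFERENCE" \
  --p-perc-identity "$PERC_IDENTITY" \
  --o-clustered-table table-cr-97.qza \
  --o-clustered-sequences rep-seqs-cr-97.qza \
  --o-unmatched-sequences unmatched-cr-97.qza

# Classify the clustered sequences with the classifier from the original post.
run qiime feature-classifier classify-sklearn \
  --i-classifier gg-13-8-99-515-806-nb-classifier.qza \
  --i-reads rep-seqs-cr-97.qza \
  --o-classification taxonomy-cr-97.qza

# Bar plot for comparing the mock community against its expected composition.
run qiime taxa barplot \
  --i-table table-cr-97.qza \
  --i-taxonomy taxonomy-cr-97.qza \
  --m-metadata-file metadata.tsv \
  --o-visualization taxa-bar-plots-cr-97.qzv
```

Features that do not match the reference end up in the unmatched-sequences output and are dropped from the clustered table, so it is worth checking how many mock reads are lost at this step.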

My issue (my mock sample was correctly annotated using DADA2/closed OTU picking):

Quality control/filter step:

I ended up doing closed-reference OTU picking with vsearch plus a quality-filtering step, although someone mentioned that this step was unnecessary.

Ben

edit: I kept the DADA2 denoising step and ran closed-reference OTU picking on its output. The classifier assigns taxonomy to whatever features DADA2 produces, so, as I understand it, without closed-reference OTU picking these counts won't be grouped together. With the closed-reference step, sequences are matched to the reference at whatever percent identity you set, so I matched what was done in QIIME 1.

In addition to @ben's advice (thanks @ben!) I have a few more thoughts to add:

Take a careful look at how you are trimming/truncating reads with dada2. We have lots of other posts on this forum discussing truncation parameters — I suspect that is the issue here.
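
As a rough illustration of where those parameters go, here is a sketch that assumes paired-end data in a placeholder artifact demux.qza. The truncation lengths are made-up values and should be chosen from the quality plots in demux.qzv; for single-end data, `denoise-single` takes `--p-trim-left` and `--p-trunc-len` instead. The `run` helper is not part of QIIME 2; it only prints each command and executes it when `qiime` is on the PATH.

```shell
#!/bin/sh
# Helper (not part of QIIME 2): print the command, run it only if qiime is installed.
run() {
  printf '+ %s\n' "$*"
  if command -v qiime >/dev/null 2>&1; then
    "$@"
  fi
}

# Hypothetical values; pick them from the quality plots of your own run.
TRUNC_LEN_F=240
TRUNC_LEN_R=200

# Inspect per-position quality before choosing truncation lengths.
run qiime demux summarize \
  --i-data demux.qza \
  --o-visualization demux.qzv

# Denoise; truncated forward and reverse reads must still overlap to be merged.
run qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f "$TRUNC_LEN_F" \
  --p-trunc-len-r "$TRUNC_LEN_R" \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

# Tabulate how many reads survive filtering, denoising, merging and chimera removal.
run qiime metadata tabulate \
  --m-input-file denoising-stats.qza \
  --o-visualization denoising-stats.qzv
```

If the mock sample loses most of its reads at the filtering or merging stage in the denoising stats, that points to the truncation settings rather than to taxonomy assignment.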

Are you using QIIME 2 to classify the QIIME 1 sequences as well? If not, that's another source of variation, and you will need to make sure you are using the correct classifier (are you using the 515f + 806r primers for bacterial 16S rRNA?)
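
If the pre-trained classifier turns out not to match your primers or read length, training one on the amplified region could look roughly like this. It is a sketch only: 99_otus.qza and ref-taxonomy.qza are placeholder names for Greengenes reference sequences and taxonomy that are assumed to be imported already, the primer sequences are the commonly used 515F/806R pair and should be checked against your own protocol, and the truncation length is a made-up value. The `run` helper is not part of QIIME 2; it only prints each command and executes it when `qiime` is on the PATH.

```shell
#!/bin/sh
# Helper (not part of QIIME 2): print the command, run it only if qiime is installed.
run() {
  printf '+ %s\n' "$*"
  if command -v qiime >/dev/null 2>&1; then
    "$@"
  fi
}

# Assumed primers (515F/806R); confirm against your sequencing protocol.
F_PRIMER=GTGCCAGCMGCCGCGGTAA
R_PRIMER=GGACTACHVGGGTWTCTAAT
TRUNC_LEN=120   # hypothetical; match it to the length of your denoised reads

# Trim the reference sequences to the region amplified by the primers.
run qiime feature-classifier extract-reads \
  --i-sequences 99_otus.qza \
  --p-f-primer "$F_PRIMER" \
  --p-r-primer "$R_PRIMER" \
  --p-trunc-len "$TRUNC_LEN" \
  --o-reads ref-seqs.qza

# Train a naive Bayes classifier on the extracted region.
run qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref-seqs.qza \
  --i-reference-taxonomy ref-taxonomy.qza \
  --o-classifier classifier.qza

# Classify the DADA2 representative sequences with the new classifier.
run qiime feature-classifier classify-sklearn \
  --i-classifier classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza
```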

I hope that helps!
