DADA2 giving less observed ASV and taxonomy classifier

Keegan-Evans · November 15, 2022, 6:41pm

With regard to not obtaining species-level classification: this is pretty common, and it comes from a combination of the length of the input reads and the classifier. The machine learning model used to assign the taxonomy has to reach a certain confidence level before it will make an assignment at a particular taxonomic level, and with short-read sequencing it is often difficult to reach that level of confidence.
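To make the confidence cutoff concrete, here is a minimal Python sketch of the idea. It is a simplified illustration only, not the actual classifier code: the function name, the example lineage, and the per-rank confidence values are all made up, and the 0.7 threshold is just an assumed example value.

```python
def truncate_by_confidence(lineage, confidences, threshold=0.7):
    """Keep taxonomic ranks until the confidence first drops below the threshold.

    Hypothetical helper for illustration; real classifiers compute the
    per-rank confidences internally.
    """
    kept = []
    for name, conf in zip(lineage, confidences):
        if conf < threshold:
            break  # stop at the first rank that is not confident enough
        kept.append(name)
    return kept


# Made-up example: confident down to genus, but not at species level.
lineage = [
    "Bacteria", "Firmicutes", "Bacilli", "Lactobacillales",
    "Lactobacillaceae", "Lactobacillus", "Lactobacillus casei",
]
confidences = [1.00, 0.99, 0.99, 0.98, 0.95, 0.90, 0.45]

assignment = truncate_by_confidence(lineage, confidences)
print("; ".join(assignment))
```

In this toy case the assignment stops at genus, which is the pattern you are describing: the read is placed confidently at the higher ranks, but the species call does not clear the threshold.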

Reads from the commonly used V3V4 region in particular are simply not quite long enough to consistently produce confident species-level identifications. This is a function of the statistical power you can generate with a limited number of base pairs in each read, rather than an issue with the database itself.

That is not to say that you absolutely will not be able to produce more species-level matches using a different classifier. However, rather than a "better" general classifier, you may be able to produce somewhat better results by training a classifier tailored to your environment or particular experiment; see the link to RESCRIPt.

DADA2 drops singletons because they are far more likely to indicate a sequencing error than a relevant biological distinction. Thus, the ASVs produced by DADA2 should have Shannon/Simpson indexes that are essentially the same as those produced by clustering methods. In fact, in the "Moving Pictures" tutorial, the ASV output from DADA2 is used to calculate all of the diversity indexes, so you should be good to go.
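If you want to convince yourself that dropping singletons barely moves these indexes, here is a small Python sketch. The counts are invented toy data, not from any real run, and the helper functions are plain textbook formulas written out by hand (Shannon in base 2, Simpson as 1 minus the sum of squared proportions), so treat it as an illustration rather than what QIIME 2 does internally.

```python
import math


def shannon(counts, base=2):
    """Shannon index: -sum(p * log(p)) over features with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


def simpson(counts):
    """Simpson index: 1 - sum(p^2) over features with nonzero counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts if c > 0)


# Toy sample: four abundant features plus five singletons (made-up numbers).
with_singletons = [500, 300, 150, 50] + [1] * 5
# The same sample after discarding every feature observed only once.
without_singletons = [c for c in with_singletons if c > 1]

shannon_diff = abs(shannon(with_singletons) - shannon(without_singletons))
simpson_diff = abs(simpson(with_singletons) - simpson(without_singletons))

print(f"Shannon difference: {shannon_diff:.4f}")
print(f"Simpson difference: {simpson_diff:.4f}")
```

Both indexes weight features by relative abundance, so a handful of singletons contributes very little. Observed feature counts are a different story: in this toy sample they go from 9 to 4, which is why you see fewer observed ASVs even though Shannon and Simpson hardly change.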