Hello, apologies in advance if this is the wrong tag.
I've noticed that some papers, and previous students in my lab, use the DADA2 plugin followed by taxonomic classification with the Silva 138 99% OTUs from 515F/806R region of sequences classifier from the QIIME2 website (v2021.8). My understanding, however, is that DADA2 produces ASV-level sequencing data, whereas we are classifying with an OTU-based classifier.
My questions are then:
In simplified terms, what is the difference between an ASV and an OTU? My understanding is that ASVs are resolved by finer differences in the sequence, whereas OTUs are created by clustering similar sequences together.
If we can indeed use the SILVA OTU classifier, is there no discrepancy in using an ASV-level pipeline (DADA2) but classifying the taxa with an OTU-based reference?
In the end, the classifier simply compares sequences, whether the query is an ASV or an OTU.
Of course, you may notice that when working with 97% OTUs you get fewer OTUs assigned to a given taxon than you would get ASVs, and some alpha diversity metrics will be lower, but that is only because highly similar sequences were clustered together.
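To make that concrete, here is a toy sketch in Python of why clustering lowers the feature count. It is not how DADA2 or any real OTU picker works: the sequences are made up, identity is just the fraction of matching positions between equal-length strings, and `greedy_cluster` is a hypothetical helper written only for this illustration.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy example expects equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold):
    """Keep a sequence as a new centroid only if it matches no existing
    centroid at >= threshold identity (simplified greedy clustering)."""
    centroids = []
    for s in seqs:
        if not any(identity(s, c) >= threshold for c in centroids):
            centroids.append(s)
    return centroids


# Made-up 100 nt sequences (assumption: for illustration only).
base = "ACGT" * 25
one_mismatch = "T" + base[1:]        # differs from base at 1 position
two_mismatch = "TT" + base[2:]       # differs from base at 2 positions
distant = "G" * 10 + base[10:]       # differs from base at 8 positions

seqs = [base, one_mismatch, two_mismatch, distant]

n_exact_variants = len(set(seqs))             # every distinct sequence counts
n_otus_97 = len(greedy_cluster(seqs, 0.97))   # near-identical ones collapse

print(n_exact_variants, n_otus_97)
```

The three near-identical sequences count as separate exact variants but fall into a single 97% cluster, so any richness-type metric computed on the clusters comes out lower.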
It is possible to make a dereplicated (but not clustered) reference database with the plugin RESCRIPt. This was actually one motivation for RESCRIPt: to make it easier to build databases, but also to build and test dereplicated (rather than clustered) databases.
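As a rough picture of what "dereplicated but not clustered" means for a reference database, here is a minimal Python sketch. It is not RESCRIPt's implementation: the record IDs, sequences and taxonomy strings are hypothetical, and the rule for handling conflicting labels (keep only the ranks they share) is an assumption chosen to keep the example short.

```python
from collections import defaultdict


def shared_ranks(taxa):
    """Return the leading ranks that all taxonomy strings have in common."""
    common = []
    for ranks in zip(*(t.split(";") for t in taxa)):
        if len(set(ranks)) != 1:
            break
        common.append(ranks[0])
    return ";".join(common)


def dereplicate(records):
    """Merge records whose sequences are exactly identical.

    Sequences that differ by even one base stay separate, which is the
    difference from clustering at 99% or 97%.
    """
    taxa_by_seq = defaultdict(list)
    for _seq_id, seq, taxon in records:
        taxa_by_seq[seq].append(taxon)
    return {seq: shared_ranks(taxa) for seq, taxa in taxa_by_seq.items()}


# Hypothetical reference records: (id, sequence, taxonomy).
records = [
    ("ref1", "ACGTACGT", "Bacteria;GenusA"),
    ("ref2", "ACGTACGT", "Bacteria;GenusB"),  # same sequence, different label
    ("ref3", "ACGTACGA", "Bacteria;GenusA"),  # one base different from ref1
]

derep = dereplicate(records)
for seq, taxon in derep.items():
    print(seq, taxon)
```

Here `ref1` and `ref2` merge into one entry because their sequences are identical, while `ref3` is kept even though it differs by a single base.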
In practice, the difference between a 100% dereplicated and a 99% clustered database is not all that big. The clustered database is somewhat more efficient, but if you do want a dereplicated database, see the instructions for using RESCRIPt here (for SILVA and NCBI databases):