A better ASV's database for Qiime2

Nicholas_Bokulich · May 28, 2021, 5:34am

Hi @fellora ,
From the description there, it sounds like this is just a version of the SILVA 138 bacterial sequences that is formatted for use with dada2 in R.

We have similar pre-formatted databases for SILVA 138 here:
https://docs.qiime2.org/2021.4/data-resources/

These will not be exactly the same as what dada2 and SILVA provide: we perform some additional QC steps as described in this tutorial:

Yes, the NR99 sequences are clustered at 99% to reduce redundancy. It is possible to create a 100% unique (ASV) database using RESCRIPt in QIIME 2, as described in that tutorial above.
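
To make the difference between 99% clustering and a 100% unique database concrete, here is a minimal Python sketch of dereplication on made-up data. This is not RESCRIPt's implementation: the `dereplicate` function, the record layout, and the toy sequences are all hypothetical. RESCRIPt applies its own rules when identical sequences carry different taxonomy, which this toy does not reproduce.

```python
from collections import defaultdict


def dereplicate(records):
    """Collapse exact-duplicate sequences (100% identity, same length).

    records: list of (seq_id, sequence, taxonomy) tuples (hypothetical layout).
    Returns one (representative_id, sequence, sorted_taxa) tuple per unique sequence.
    """
    groups = defaultdict(list)
    for seq_id, sequence, taxonomy in records:
        # Compare case-insensitively so "acgt" and "ACGT" count as identical.
        groups[sequence.upper()].append((seq_id, taxonomy))

    unique = []
    for sequence, members in groups.items():
        representative_id = members[0][0]  # first record seen is kept
        taxa = sorted({taxonomy for _, taxonomy in members})
        unique.append((representative_id, sequence, taxa))
    return unique


# Toy reference records, invented for illustration only.
toy_records = [
    ("ref1", "ACGTACGT", "g__Escherichia"),
    ("ref2", "ACGTACGT", "g__Escherichia"),
    ("ref3", "ACGTACGA", "g__Shigella"),
    ("ref4", "acgtacga", "g__Escherichia"),
]

unique_records = dereplicate(toy_records)
for representative_id, sequence, taxa in unique_records:
    print(representative_id, sequence, taxa)
```

In this toy, the second unique sequence ends up with two different genus labels, which is the kind of conflict a real dereplication step has to resolve in some way.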

However, that level of resolution might not matter for taxonomy classification. The idea of ASVs is to allow mapping of unique sequence variants that theoretically represent subspecies-level variation. Taxonomic classification smooths over this variation to some degree by mapping ASVs or OTUs to the nearest known reference taxonomy. An ASV database would most likely be redundant unless the subspecies-level variants are annotated as such (e.g., with a strain ID). Even then it would not be practically useful, since 16S (even full-length) does not fully resolve at the subspecies level. This is why I refer to ASVs as "theoretically" subspecies variants: they are, but you cannot use 16S to distinguish true strains.
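
As a rough illustration of that smoothing, the sketch below counts how many distinct ASVs end up under each taxonomic label. The ASV IDs and assignments are invented for the example and do not come from any real classifier output.

```python
from collections import Counter

# Hypothetical classifier output: ASV ID -> assigned taxonomy (made-up data).
asv_taxonomy = {
    "asv_001": "g__Bacteroides; s__fragilis",
    "asv_002": "g__Bacteroides; s__fragilis",
    "asv_003": "g__Bacteroides; s__fragilis",
    "asv_004": "g__Lactobacillus; s__",
    "asv_005": "g__Lactobacillus; s__",
}

# Several distinct sequence variants collapse onto the same label.
asvs_per_taxon = Counter(asv_taxonomy.values())

for taxon, n_asvs in asvs_per_taxon.items():
    print(f"{taxon}: {n_asvs} ASVs")
```

Five sequence variants go in and only two labels come out, so any extra resolution in the reference sequences is invisible unless the labels themselves distinguish the variants.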

But you could certainly build a SILVA ASV database following the tutorial above, and test the level of resolution you get on your own data.

Good luck!