Also, I want to know if I’m right in thinking that the SILVA 138.1 database is OTU-based rather than ASV-based. If so, my taxonomic assignment could be better if I used an ASV-based database instead. In theory, OTUs are groups of biological sequences, so Dr. Callahan released the sequences within the OTUs. Am I right?
These will not be exactly the same as what dada2 and SILVA provide: we perform some additional QC steps as described in this tutorial:
Yes, the NR99 sequences are clustered at 99% to reduce redundancy. It is possible to create a 100% unique (ASV) database using RESCRIPt in QIIME 2, as described in that tutorial above.
That level of resolution might not matter for taxonomy classification. The idea of ASVs is to allow mapping of unique sequence variants that theoretically represent subspecies-level variation. Taxonomic classification smooths over this variation to some degree by mapping ASVs or OTUs to the nearest known reference taxonomy. An ASV database would most likely be redundant unless the subspecies-level variants are annotated as such (e.g., strain ID), though this would not be practically useful either, since 16S (even full-length) does not fully resolve at the subspecies level. This is why I refer to ASVs as “theoretically” subspecies variants: they are, but you cannot use 16S to distinguish true strains.
But you could certainly build a SILVA ASV database following the tutorial above, and test the level of resolution you get on your own data.
Really awesome!! My small Peruvian group and I are going to work with RESCRIPt to create a customized database. Our weakness is that we don’t have a server, so we are working on Google Cloud. Recently, my collaborator created a RESCRIPt Docker container (GitHub - gadgrandez/qiime2-rescript), and we are going to compare it with other methods. Thanks a lot!!
Awesome!! So if we improve the converted ASV-based SILVA database with curated species-level NCBI RefSeq sequences using RESCRIPt, and then add ASVs from other 16S SRA studies along with their taxonomy assignments, we could improve the taxonomic resolution and assignment for my specific ecosystem!!
Thanks a lot for the idea. I’ve read the RESCRIPt paper and taken a look at q2-clawback, and I definitely want to use it. However, I’m stuck trying to convert the OTU-based SILVA database. You mentioned that it is possible to do this in RESCRIPt, but I don’t have any idea how to start (which commands should I run?), because the paper describes the whole procedure in terms of OTUs. Could you please give me some advice?
Hi @fellora,
The tutorial above shows how to make such an ASV database (dereplicate but do not cluster the sequences). Likewise, this is how the sequences and pre-trained classifiers shared on the QIIME 2 data-resources page are created.
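To make the "dereplicate but do not cluster" distinction concrete, here is a minimal Python sketch of what exact (100% identity) dereplication means. This is only a toy illustration and not RESCRIPt's implementation; the `dereplicate` function, the record layout, and the example sequences and labels are all hypothetical. In practice you would run the dereplication step from the tutorial in QIIME 2.

```python
def dereplicate(records):
    """Collapse exact duplicate sequences, keeping one entry per unique sequence.

    `records` is an iterable of (seq_id, sequence, taxonomy) tuples
    (a hypothetical layout chosen for this sketch). Sequences that differ
    by even one base stay separate, which is what distinguishes this from
    clustering at 99% identity.
    """
    unique = {}
    for seq_id, sequence, taxonomy in records:
        key = sequence.upper()
        if key not in unique:
            # First time we see this exact sequence: keep its ID and label.
            unique[key] = {"id": seq_id, "taxa": [taxonomy]}
        elif taxonomy not in unique[key]["taxa"]:
            # Same sequence with a different label: record the conflict.
            unique[key]["taxa"].append(taxonomy)
    return unique


# Made-up reference records, for illustration only.
records = [
    ("ref1", "ACGTACGTAA", "g__Escherichia; s__coli"),
    ("ref2", "ACGTACGTAA", "g__Escherichia; s__coli"),   # exact duplicate of ref1
    ("ref3", "ACGTACGTAT", "g__Escherichia; s__coli"),   # one mismatch: kept as its own variant
    ("ref4", "ACGTACGTAT", "g__Shigella; s__flexneri"),  # same sequence as ref3, different label
]

derep = dereplicate(records)
for sequence, info in derep.items():
    print(info["id"], sequence, info["taxa"])
```

Note that the last two records share a sequence but carry different labels. A real dereplication tool needs a policy for resolving such conflicts; this sketch only collects the labels so the conflict is visible.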