Hi @Rob_DNA, let's see if we can get you sorted.
This depends on which classifier you are using. The pre-made SILVA classifiers were constructed from the NR99 SSU version of the database, after some additional quality control via RESCRIPt; you can check out the tutorial for details. If you prefer, you can use the RESCRIPt plugin to download the full, raw SILVA database instead of the NR99 version.
Correct. The main downside is a practical one: the file size and memory footprint of the full-length classifiers can be quite large, which prevents their use on machines that lack the appropriate resources. The amplicon-specific versions are much smaller. Depending on your taxa of interest, you might lose a tiny bit of classification accuracy with the full-length classifier compared to the amplicon-specific versions, but this has been minimal for the data sets that I've worked with. Your mileage may vary, though, and you can always compare the outputs.
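If it helps, here is a minimal sketch of what "comparing the outputs" could look like once you have exported two taxonomy assignments for the same features. Everything in it is an assumption for illustration: the `rank_agreement` helper is hypothetical, the feature IDs and three-rank lineages are toy data, and it is not how RESCRIPt or QIIME 2 do their own evaluations.

```python
def rank_agreement(tax_a, tax_b, sep=";"):
    """Count, per rank depth, the features whose lineages match down to that depth."""
    # Only features classified by both classifiers can be compared.
    shared_ids = sorted(set(tax_a) & set(tax_b))
    agree = {}
    for fid in shared_ids:
        a = [rank.strip() for rank in tax_a[fid].split(sep)]
        b = [rank.strip() for rank in tax_b[fid].split(sep)]
        for depth in range(1, max(len(a), len(b)) + 1):
            # A lineage that stops short of this depth counts as a disagreement.
            same = len(a) >= depth and len(b) >= depth and a[:depth] == b[:depth]
            agree[depth] = agree.get(depth, 0) + int(same)
    return len(shared_ids), agree


# Toy assignments (hypothetical) from a full-length and an amplicon-specific classifier.
full_length = {
    "f1": "d__Bacteria; p__Firmicutes; g__Lactobacillus",
    "f2": "d__Bacteria; p__Bacteroidota; g__Bacteroides",
    "f3": "d__Bacteria; p__Proteobacteria",
}
amplicon = {
    "f1": "d__Bacteria; p__Firmicutes; g__Lactobacillus",
    "f2": "d__Bacteria; p__Bacteroidota; g__Prevotella",
    "f3": "d__Bacteria; p__Proteobacteria; g__Escherichia-Shigella",
}

n_shared, agreement = rank_agreement(full_length, amplicon)
for depth, count in sorted(agreement.items()):
    print(f"depth {depth}: {count}/{n_shared} features agree")
```

A table like this only tells you where the two classifiers differ, not which one is right, so it is a starting point for a closer look at the taxa you care about.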
Great questions! I refer you to the papers listed below. Several of these also discuss the benefit of constructing amplicon-specific classifiers. But in a nutshell, there are many cases in which several different organisms have an identical DNA sequence over the amplicon region, making it hard to disambiguate between taxa. That is, BLAST, for example, might return all of the equivalent hits, which may not be helpful. We would prefer that a consensus taxonomy, or lowest common ancestor (LCA), be returned for our query sequence.
- Using the RDP Classifier to Predict Taxonomic Novelty and Reduce the Search Space for Finding Novel Organisms
- Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2’s q2-feature-classifier plugin (Microbiome)
- Impact of training sets on classification of high-throughput bacterial 16s rRNA gene surveys (The ISME Journal)
- http://dx.doi.org/10.3389/fmicb.2021.644487
- RESCRIPt: Reproducible sequence taxonomy reference database management
- Species abundance information improves sequence taxonomy classification accuracy (Nature Communications)
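
To make the LCA idea a bit more concrete, here is a minimal sketch of a strict lowest common ancestor over a set of equally good hits. The `lowest_common_ancestor` helper and the three hits are hypothetical and only for illustration; consensus classifiers in practice may use a more tolerant rule (for example, a majority threshold) rather than requiring every hit to agree.

```python
def lowest_common_ancestor(taxonomies, sep=";"):
    """Return the deepest lineage prefix shared by every input taxonomy string."""
    # Split each lineage into its ranks, trimming stray whitespace.
    split = [[rank.strip() for rank in t.split(sep)] for t in taxonomies]
    if not split:
        return "Unassigned"
    shared = []
    # zip stops at the shortest lineage, so we never read past its end.
    for ranks in zip(*split):
        if len(set(ranks)) != 1:
            break  # the hits disagree at this rank, so stop here
        shared.append(ranks[0])
    return (sep + " ").join(shared) if shared else "Unassigned"


# Three hypothetical, equally good hits for a single query sequence.
hits = [
    "d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
    "o__Enterobacterales; f__Enterobacteriaceae; g__Escherichia-Shigella",
    "d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
    "o__Enterobacterales; f__Enterobacteriaceae; g__Salmonella",
    "d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
    "o__Enterobacterales; f__Enterobacteriaceae; g__Klebsiella",
]

consensus = lowest_common_ancestor(hits)
print(consensus)
```

Here the hits agree down to family and disagree at genus, so the family-level lineage is what gets reported instead of an arbitrary pick among the three genera.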
Yes. Although we provide the raw files that were used to make the classifiers, as well as the classifiers themselves (for the full-length and V4 region of the SSU gene), you can use RESCRIPt (linked above) to choose among several versions of the SILVA database and curate them as you'd like. Many on the forum have made their own V3V4 classifier, for example.
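For intuition about what "amplicon-specific" means, here is a rough sketch of trimming a reference sequence down to the region between a primer pair. The primers and reference sequence are made up, the helper names are hypothetical, and the matching is exact apart from IUPAC degenerate bases. This is not how q2-feature-classifier does it; its `extract-reads` action is the real tool for this job and is more forgiving of mismatches.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a sequence, including degenerate IUPAC bases."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regex pattern."""
    return "".join(IUPAC[base] for base in primer)


def extract_amplicon(seq, fwd_primer, rev_primer):
    """Return the region between the two primers (primers excluded), or None."""
    # The reverse primer binds the opposite strand, so on the reference
    # strand we look for its reverse complement.
    pattern = (primer_regex(fwd_primer) + "(.*?)"
               + primer_regex(reverse_complement(rev_primer)))
    match = re.search(pattern, seq)
    return match.group(1) if match else None


# Toy reference and primers (hypothetical, far shorter than real ones).
reference = "TTTT" + "ACGTA" + "GGGCCCAAA" + "GGCAA" + "CCCC"
amplicon_region = extract_amplicon(reference, "ACGYA", "TTRCC")
print(amplicon_region)
```

Doing this across the whole reference database, and then training on the trimmed sequences, is the basic idea behind a region-specific classifier such as the V3V4 ones people have built.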
Yep.
Finally, RESCRIPt provides some tools to help you compare the various reference databases and classifiers you generate.
-Cheers!
-Mike