Hi @vetalinesantana,
It sounds like a super cool project! There are a couple of issues to consider here when we talk about species accuracy. I've made an expandable list because this is a relatively big post.
1. The limits of the biology
Species are kind of a weird theoretical concept in organisms that can undergo sexual reproduction, asexual reproduction, and just kind of randomly pick up DNA from friends, family and strangers. (Seriously, bacterial reproduction and sex is kinda weird, and HGT is a pain.) So, even the fundamental idea of a "species" is challenging in bacteria. I'm going to link you to a wiki rabbit hole on the topic. This is further complicated by the fact that our artificial naming conventions don't actually relate to true molecular phylogeny. ( are
and somehow not entirely terrifying.)
Related to this evolution/classification/biology problem is that there are "species" that are sometimes very interesting to researchers but can't actually be identified by 16S rRNA sequences. Sometimes, we can't even tell organisms apart at higher taxonomic levels, the Shigella/Escherichia problem being a classic point of frustration for fecal researchers.
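To make that concrete, here's a tiny sketch of what happens when two reference sequences are identical over your amplicon region but carry different labels: the only label you can honestly report is the part they share. The lineage strings below are made up for illustration and aren't copied from any real database, and `shared_lineage` is just a helper name I picked.

```python
def shared_lineage(lineages):
    """Return the ranks that every lineage agrees on, from the top down."""
    split = [[rank.strip() for rank in lineage.split(";")] for lineage in lineages]
    shared = []
    for ranks in zip(*split):
        # Stop at the first rank where the references disagree.
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return "; ".join(shared)


# Made-up labels for two references with an identical amplicon region.
hits = [
    "f__Enterobacteriaceae; g__Escherichia; s__Escherichia coli",
    "f__Enterobacteriaceae; g__Shigella; s__Shigella flexneri",
]

print(shared_lineage(hits))  # f__Enterobacteriaceae
```

In this toy case the sequence can only be named to family, no matter how good the classifier is.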
But... let's say that you want a name for your organisms because it sparks joy.
2. The limits of the database
I think it's probably worth reading the Species caveat in the RESCRIPt tutorial, but I'll reiterate a key point in terms of species and Silva: We don't know if Silva curates their species. I tried going through the Silva readme to see if I could find anything, but alas.
Related to this issue: species assignments are notoriously database dependent, and taxonomy gets renamed at random. (This is separate from the whole taxonomy ≠ phylogeny problem.)
3. Limits of your algorithm
My go-to paper on classification and quality is the Wang et al. 2007 RDP classifier paper, which is a key read for naive Bayesian classification. Specifically, I think it's worth looking at Figure 1, which describes the classification accuracy based on sequence length and taxonomic level. You'll note that they don't even describe species. Updated papers like Bokulich et al. about optimizing feature classifiers are worth a read. (It's the q2-feature-classifier paper!) The short version is that curation, or knowledge about your environment, can improve performance. You may want to look at clawback for this.
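As a rough illustration of the naive Bayesian idea itself (a toy sketch, not the RDP, q2-feature-classifier, or clawback code): each taxon is scored by how many of the query's k-mers it has seen in training. The sequences, taxon names, word size, and smoothing here are all simplifying assumptions on my part, and the real classifiers use longer words and a bootstrap confidence step.

```python
import math
from collections import Counter, defaultdict

K = 4  # toy word size, chosen only to keep the example small


def kmers(seq, k=K):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(references):
    """Count, per taxon, how many reference sequences contain each k-mer."""
    counts = defaultdict(Counter)
    totals = Counter()
    for taxon, seq in references:
        totals[taxon] += 1
        counts[taxon].update(kmers(seq))
    return counts, totals


def classify(query, counts, totals):
    """Return the best-scoring taxon and the log-scores for every taxon."""
    scores = {}
    for taxon, n in totals.items():
        # Simplified smoothing so an unseen k-mer does not give log(0).
        scores[taxon] = sum(
            math.log((counts[taxon][word] + 0.5) / (n + 1))
            for word in kmers(query)
        )
    return max(scores, key=scores.get), scores


# Hypothetical references that differ only in their last base.
references = [
    ("Genus_a species_1", "ACGTACGTTTGACCA"),
    ("Genus_a species_2", "ACGTACGTTTGACCG"),
]
counts, totals = train(references)

# A read covering the variable base separates the two species...
best, _ = classify("TTGACCG", counts, totals)
# ...but a read from the shared region scores both species the same.
_, tied = classify("ACGTACGTTTGA", counts, totals)
```

Here `best` comes out as `Genus_a species_2`, while the two scores in `tied` are equal: the classifier has nothing to go on when the region you sequenced doesn't contain the difference.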
4. Implications
My last question is a largely theoretical one. Would Roseae rosa called Roseae ASV a4bc smell as sweet? Is there a distinct benefit in your analysis or your ecosystem in having the specific taxonomic label? Is enough known about your ecosystem for there to be specificity? (I have no idea about skin.)
It's probably also worth a search of the forum on this topic, because there's been a fair bit of discussion in the past:
https://forum.qiime2.org/search?q=species
Best,
Justine