Hello David,
Great questions!
Let's start with the taxonomy.
Correct! The ASV matches this specific entry in SILVA down to the species level, but SILVA does not yet list a species name for this entry.
Probably, though I'm not sure how SILVA makes this decision.
That's right. There's a big bucket of D_5__unknowns, which is why it can be helpful to summarize these results at the D_4__ Family level.
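If you're curious what summarizing at the family level does mechanically, here's a minimal Python sketch. It truncates each semicolon-delimited taxonomy string to its first five ranks (D_0__ through D_4__) and counts how many ASVs land in each family. The function names and example assignments are made up for illustration. In QIIME 2 itself you'd do this on a feature table with `qiime taxa collapse --p-level 5`, which sums abundances per sample rather than just counting ASVs.

```python
from collections import Counter

def collapse_to_level(taxonomy: str, level: int) -> str:
    """Keep only the first `level` ranks of a semicolon-delimited taxonomy string.

    With SILVA-style prefixes (D_0__ ... D_6__), level=5 keeps D_0__ through
    D_4__, i.e. everything down to family.
    """
    ranks = [r.strip() for r in taxonomy.split(";")]
    return ";".join(ranks[:level])

def count_per_taxon(assignments, level):
    """Count how many features (ASVs) fall into each taxon after truncation."""
    return Counter(collapse_to_level(t, level) for t in assignments)

if __name__ == "__main__":
    # Hypothetical assignments, made up for illustration
    assignments = [
        "D_0__Bacteria;D_1__Firmicutes;D_2__Clostridia;D_3__Clostridiales;D_4__Lachnospiraceae;D_5__uncultured",
        "D_0__Bacteria;D_1__Firmicutes;D_2__Clostridia;D_3__Clostridiales;D_4__Lachnospiraceae;D_5__Blautia",
        "D_0__Bacteria;D_1__Bacteroidetes;D_2__Bacteroidia;D_3__Bacteroidales;D_4__Bacteroidaceae;D_5__Bacteroides",
    ]
    for taxon, n in count_per_taxon(assignments, level=5).most_common():
        print(n, taxon)
```

The two Lachnospiraceae ASVs have different genus-level labels, but after truncation they end up in the same family bucket.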
(If another sequence has the same ASV id, a9016c5734d00d83a3741982ceb49c44, it should have the exact same DNA sequence.)
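That 32-character ID has the shape of an MD5 digest. By default, q2-dada2 names each ASV with the MD5 hash of its sequence, so identical sequences always get identical IDs. Assuming your IDs were generated that way, this small sketch shows why a matching ID implies a matching sequence. The sequences are made up.

```python
import hashlib

def feature_id(sequence: str) -> str:
    """MD5 hex digest of a DNA sequence.

    Assumes feature IDs are MD5 hashes of the sequence (q2-dada2's default).
    """
    return hashlib.md5(sequence.encode("ascii")).hexdigest()

if __name__ == "__main__":
    a = feature_id("TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGG")
    b = feature_id("TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGG")
    c = feature_id("TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGA")
    print(a == b)  # True: identical sequences always give identical IDs
    print(a == c)  # False: changing one base gives a different ID
```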
Now let's talk about the magic of a Naive Bayes k-mer Classifier.
Naive Bayes is an old-school (i.e. 1960s) supervised machine learning classification method.
k-mers are all of the substrings of length k in a sequence. Together they act like a summary of the sequence, and similar sequences will have similar k-mer compositions.
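Here's a small Python sketch of k-mer counting, plus a crude similarity score: the fraction of one sequence's k-mers that also show up in another. That score is just an illustration for this reply, not what any classifier actually uses, and the sequences are toy examples.

```python
from collections import Counter

def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def shared_kmer_fraction(a: str, b: str, k: int) -> float:
    """Fraction of a's k-mers (counting repeats) that also occur in b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    shared = sum((ca & cb).values())  # '&' keeps the smaller count of each shared k-mer
    return shared / max(sum(ca.values()), 1)

if __name__ == "__main__":
    ref = "ACGTACGTTG"
    print(kmer_counts("ACGTAC", 3))  # ACG, CGT, GTA, TAC once each
    print(shared_kmer_fraction("ACGTACCTTG", ref, 3))  # one substitution: 0.625
    print(shared_kmer_fraction("TTTTGGGGCC", ref, 3))  # unrelated: 0.125
```

Note that a single substitution knocks out up to k overlapping k-mers (three here), but a closely related sequence still shares far more k-mers with the reference than an unrelated one does.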
This method takes each sequence, counts its k-mers, and then does the Naive Bayes thing to classify it, based on the k-mers and taxonomy from the database on which it was trained.
In other words, it compares the sequence's k-mer composition to the k-mers in the database, which is really close to, but not quite, what you had described.
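To make "the Naive Bayes thing" a bit more concrete, here's a toy Python sketch of a multinomial Naive Bayes classifier on k-mer counts. It is not the q2-feature-classifier code: classify-sklearn is built on scikit-learn and has its own k-mer settings and confidence handling. The class name, k=4, the smoothing value, and the tiny reference set are all made up. The "naive" part is that every k-mer is treated as independent given the taxon, so a taxon's score is just its prior plus a sum of per-k-mer log probabilities.

```python
import math
from collections import Counter, defaultdict

def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

class KmerNaiveBayes:
    """Toy multinomial Naive Bayes over k-mer counts (illustration only)."""

    def __init__(self, k=4, alpha=1.0):
        self.k = k
        self.alpha = alpha  # additive (Laplace) smoothing so one missing k-mer can't zero out a taxon

    def fit(self, seqs, labels):
        self.class_counts = Counter(labels)      # taxon -> number of reference sequences
        self.kmer_counts = defaultdict(Counter)  # taxon -> k-mer counts
        self.vocab = set()
        for seq, label in zip(seqs, labels):
            ks = kmers(seq, self.k)
            self.kmer_counts[label].update(ks)
            self.vocab.update(ks)
        self.totals = {t: sum(c.values()) for t, c in self.kmer_counts.items()}
        return self

    def log_scores(self, seq):
        """log P(taxon) + sum of log P(k-mer | taxon), treating k-mers as independent."""
        n = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for taxon, count in self.class_counts.items():
            score = math.log(count / n)
            denom = self.totals[taxon] + self.alpha * v
            for km in kmers(seq, self.k):
                if km in self.vocab:  # ignore k-mers never seen in training
                    score += math.log((self.kmer_counts[taxon][km] + self.alpha) / denom)
            scores[taxon] = score
        return scores

    def predict(self, seq):
        """Return (best taxon, its posterior probability)."""
        scores = self.log_scores(seq)
        best = max(scores, key=scores.get)
        top = max(scores.values())
        z = sum(math.exp(s - top) for s in scores.values())  # normalize in log space
        return best, math.exp(scores[best] - top) / z

if __name__ == "__main__":
    # Made-up reference "database": sequences labelled with hypothetical taxa
    refs = ["ACGTACGTACGTAAAA", "ACGTACGTACGTAAAT", "GGGCCCGGGCCCTTTT", "GGGCCCGGGCCCTTTA"]
    taxa = ["TaxonA", "TaxonA", "TaxonB", "TaxonB"]
    clf = KmerNaiveBayes(k=4).fit(refs, taxa)
    print(clf.predict("ACGTACGTACGTAAAC"))
```

The query is never aligned to anything; it wins TaxonA purely because its k-mers are common in TaxonA's training sequences and absent from TaxonB's.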
What you described is pretty much what classify-consensus-vsearch does, if you want to try that for comparison!
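For comparison, classify-consensus-vsearch aligns each query against the reference sequences with VSEARCH, keeps the top hits, and reports the taxonomy those hits agree on. Here's a rough Python sketch of just that last consensus step, with the hits' taxonomy strings given as input. The 0.51 majority threshold is modeled on the plugin's min-consensus default, but the exact rules (how many hits are kept, ties, how "Unassigned" is reported) are assumptions here, so check the plugin docs before relying on them.

```python
from collections import Counter

def consensus_taxonomy(hit_taxonomies, min_consensus=0.51):
    """Deepest lineage shared by at least `min_consensus` of the hits.

    min_consensus=0.51 is an assumption modeled on the plugin's default.
    Works rank by rank: at each depth, find the most common lineage prefix
    among the hits and stop once it falls below the threshold.
    """
    if not hit_taxonomies:
        return "Unassigned"
    split_hits = [[r.strip() for r in t.split(";")] for t in hit_taxonomies]
    n = len(split_hits)
    consensus = []
    for depth in range(max(len(h) for h in split_hits)):
        # count whole prefixes so agreement at a rank requires agreement above it
        prefixes = Counter(tuple(h[:depth + 1]) for h in split_hits if len(h) > depth)
        prefix, count = prefixes.most_common(1)[0]
        if count / n < min_consensus:
            break
        consensus = list(prefix)
    return ";".join(consensus) if consensus else "Unassigned"

if __name__ == "__main__":
    # Hypothetical taxonomies of three top hits
    hits = [
        "D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli;D_3__Lactobacillales;D_4__Lactobacillaceae;D_5__Lactobacillus",
        "D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli;D_3__Lactobacillales;D_4__Lactobacillaceae;D_5__Lactobacillus",
        "D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli;D_3__Lactobacillales;D_4__Streptococcaceae;D_5__Streptococcus",
    ]
    print(consensus_taxonomy(hits))                     # 2 of 3 agree all the way down
    print(consensus_taxonomy(hits, min_consensus=0.9))  # stops at D_3__Lactobacillales
```

The key difference from the Naive Bayes approach is that the evidence here comes from alignments to individual reference sequences, and the reported depth depends on how well the top hits agree.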
Colin