Dear QIIME2 Community,
I wanted to double-check that I understand how a sequence is assigned a taxonomic identity. I used the following command:
qiime feature-classifier classify-sklearn --i-classifier silva-132-99-515-806-nb-classifier.qza --i-reads example.qza --o-classification taxonomy --p-reads-per-batch 10000 --verbose
Does this take each representative sequence and compare it to the SILVA database one taxonomic level at a time, assigning a Kingdom, then a Phylum, and so on, for as long as it is at least 70% confident at each level? Or does it simply search the database for the single reference sequence that matches most closely? If it is the latter, how does the 70% confidence threshold come into play?
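To make the first interpretation concrete, here is a rough Python sketch of what I mean by a level-by-level assignment with a 70% cutoff. It assumes the classifier produces one probability per full reference taxonomy string. The function name, the genus labels (GenusA, GenusB) and the probabilities are all made up for illustration, and I am not claiming this is how classify-sklearn works internally.

```python
from collections import defaultdict


def assign_top_down(class_probs, threshold=0.7, sep="; "):
    """Hypothetical helper: keep the best label at each rank while its
    summed probability stays at or above the threshold."""
    assigned = []      # ranks accepted so far
    confidence = None  # summed probability of the deepest accepted rank
    depth = max(len(t.split(sep)) for t in class_probs)
    for level in range(1, depth + 1):
        # Sum the probabilities of all taxonomies that agree with the
        # ranks accepted so far, grouped by their label at this level.
        totals = defaultdict(float)
        for taxonomy, p in class_probs.items():
            ranks = taxonomy.split(sep)
            if len(ranks) >= level and ranks[:level - 1] == assigned:
                totals[tuple(ranks[:level])] += p
        if not totals:
            break
        best, best_p = max(totals.items(), key=lambda kv: kv[1])
        if best_p < threshold:
            break  # not confident enough to go any deeper
        assigned, confidence = list(best), best_p
    return sep.join(assigned), confidence


FAMILY = ("D_0__Bacteria; D_1__Bacteroidetes; D_2__Bacteroidia; "
          "D_3__Flavobacteriales; D_4__Flavobacteriaceae")

# Made-up probabilities for one query sequence (they sum to 1.0).
example_probs = {
    FAMILY + "; D_5__GenusA": 0.5,
    FAMILY + "; D_5__GenusB": 0.25,
    "D_0__Bacteria; D_1__Proteobacteria": 0.25,
}

label, conf = assign_top_down(example_probs)
print(label, conf)
```

With these made-up numbers the sketch stops at D_4__Flavobacteriaceae with a confidence of 0.75, because the best genus only reaches 0.5. Is that the kind of behaviour the 70% setting controls?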
If a sequence is identified as the following:
a9016c5734d00d83a3741982ceb49c44: D_0__Bacteria; D_1__Bacteroidetes; D_2__Bacteroidia; D_3__Flavobacteriales; D_4__Flavobacteriaceae; D_5__unknown; D_6__unknown
Does that mean that whoever assigned the taxonomy to that reference sequence was unable to determine which genus it belongs to?
If the label were "uncultured" instead of "unknown", would that mean the sequence was similar enough to other Flavobacteriaceae sequences to be assigned to that family, but not similar enough to any one genus to be assigned at the genus level?
If another sequence has the same taxonomic ID, that does not necessarily mean it belongs to a different genus, correct? Just that it could not be matched to a reference sequence that has a taxonomy at the genus level?
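In case it helps frame the question, this is a small Python sketch of one way to read such a string and find the deepest rank that carries a real name. The function name is hypothetical, and treating both "unknown" and "uncultured" as placeholders is my assumption; whether they actually mean the same thing is exactly what I am asking.

```python
def deepest_resolved_rank(taxonomy, sep="; ",
                          placeholders=("unknown", "uncultured")):
    """Hypothetical helper: return (prefix, name) of the last rank before
    the first placeholder label, or None if no rank is resolved."""
    last = None
    for rank in taxonomy.split(sep):
        # "D_4__Flavobacteriaceae" -> prefix "D_4", name "Flavobacteriaceae"
        prefix, _, name = rank.strip().partition("__")
        if not name or name.lower() in placeholders:
            break  # assumption: everything from here on is unresolved
        last = (prefix, name)
    return last


example = ("D_0__Bacteria; D_1__Bacteroidetes; D_2__Bacteroidia; "
           "D_3__Flavobacteriales; D_4__Flavobacteriaceae; "
           "D_5__unknown; D_6__unknown")

resolved = deepest_resolved_rank(example)
print(resolved)
```

For the example above this reports the family level (D_4, Flavobacteriaceae) as the deepest resolved rank, which is how I am currently interpreting the result.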
Thank you for your time and help. Sorry for all the questions; I am just trying to understand the process, and a literature review was not helpful.