Hi there,
For my diet study I used the Midori reference database, which contains metazoan sequences from GenBank. The database was trimmed using my forward and reverse primers. I used BLAST+ for classification with the following parameters:
qiime feature-classifier classify-consensus-blast \
  --i-query dada2-rep-seqs.qza \
  --i-reference-reads midori-ref-seqs.qza \
  --i-reference-taxonomy midori-ref-taxa.qza \
  --p-maxaccepts 1000 \
  --p-perc-identity 0.97 \
  --p-query-cov 0.89 \
  --p-strand both \
  --o-classification EMR-diet-taxonomy-blast.qza \
  --verbose
After all of my filtering steps, I took some of the sequences classified to species level and ran them through NCBI BLAST. For example, this sequence:
TTTATCCAGAAACATTGCGCATGCTGGACCCTCTGTAGATCTAGCAATTTTCTCTCTTCATTTAGCTGGAGCATCATCAATTCTTGGTGCCATTAACTTTATTACAACAGTTATTAATATACGATGAAGGGGCCTACGTCTAGAACGTATTCCCTTATTTGTATGAGCAGTATTAATTACTGTAGTGTTACTTCTTCTCTCTTTACCAGTTCTTGCTGGTGCAATTACTATACTTCTTACAGACCGAAACCTAAACACCTCATTCTTTGATCCTGCAGGGGGCGGAGACCCAATTCTATATCAGCATTTATTC
was classified as Dendrodrilus rubidus in my output with a consensus of 1, but NCBI BLAST matched it to Dendrodrilus sp., Lumbricidae sp., and Bimastos rubidus, all at 100% identity.
My question is: how does the BLAST+ consensus classifier assign taxonomy when identical sequences belong to multiple taxa? I'm a little concerned about trusting a species-level classification when NCBI BLAST returns several species at 100% identity, and I'm wondering whether I should collapse such sequences to a coarser taxonomic level.
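To make the question concrete, here is my understanding of the consensus step, written as a simplified Python sketch (my approximation, not the plugin's actual source). Every hit that passes --p-perc-identity and --p-query-cov (up to --p-maxaccepts) is kept, and the lineage is truncated at the first rank where no single label is shared by at least --p-min-consensus of those hits (0.51 by default). The function name `consensus_taxonomy` and the lineage strings are hypothetical and do not follow Midori's exact taxonomy format.

```python
from collections import Counter


def consensus_taxonomy(hit_taxonomies, min_consensus=0.51):
    """Hypothetical, simplified majority consensus over accepted BLAST hits.

    hit_taxonomies: one semicolon-delimited lineage per hit that passed the
    identity/coverage filters. Returns (lineage, consensus_fraction).
    """
    if not hit_taxonomies:
        return "Unassigned", 0.0
    ranks = [[t.strip() for t in tax.split(";")] for tax in hit_taxonomies]
    n_hits = len(ranks)
    kept = []
    fraction = 1.0
    # Walk from the root toward species; zip stops at the shortest lineage.
    for level in zip(*ranks):
        label, count = Counter(level).most_common(1)[0]
        level_fraction = count / n_hits
        if level_fraction < min_consensus:
            break  # no label is shared by enough hits at this rank
        kept.append(label)
        fraction = level_fraction
    if not kept:
        return "Unassigned", 0.0
    return ";".join(kept), fraction


# Made-up lineages for illustration only.
D = "p__Annelida;f__Lumbricidae;g__Dendrodrilus;s__Dendrodrilus rubidus"
B = "p__Annelida;f__Lumbricidae;g__Bimastos;s__Bimastos rubidus"

print(consensus_taxonomy([D, D, D]))  # every hit agrees -> species, fraction 1.0
print(consensus_taxonomy([D, B]))     # genus split 1/2 < 0.51 -> stops at family
print(consensus_taxonomy([D, D, B]))  # 2/3 >= 0.51 -> species, fraction ~0.67
```

If this reading is right, a consensus of 1 means every accepted hit in my trimmed Midori reference carried the same species label, so the identical sequences that NCBI lists under other names were presumably either absent from the reference or not among the accepted hits. Had they been present under different names, the assignment would have been pulled back to the rank they share.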
Please let me know if any more information is required. Thank you!