I have been using the bioinformatics services provided by Novogene. We use the most cost-efficient option, which still uses QIIME and the SILVA 138.1 database.
I have made my own classifier using the Greengenes2 2022.10 backbone files, extracting the V3-V4 region. It seems to identify sequences well on the tutorial datasets.
I reclassified the OTUs provided by Novogene with this classifier. I was expecting some differences, but most of the OTUs have different identifications. Many of these are genuine conflicts, not just cases where the two classifiers agree but stop at different taxonomic levels.
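One way to quantify this kind of disagreement is to compare the two assignments rank by rank and record the highest rank at which they actually conflict. The following is a minimal Python sketch, not a QIIME 2 feature. It assumes both taxonomies are semicolon-separated lineage strings with rank prefixes such as `p__`. The OTU IDs and lineages are made-up illustrations, and `parse_lineage` and `first_conflict` are hypothetical helper names. It does not reconcile names that differ only because the two databases follow different naming conventions, so such cases are counted as conflicts.

```python
from collections import Counter

RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def parse_lineage(lineage):
    """Split a lineage string into bare names, dropping prefixes like 'p__'."""
    names = []
    for field in lineage.split(";"):
        field = field.strip()
        if "__" in field:
            field = field.split("__", 1)[1]
        names.append(field)
    # Trailing empty ranks (e.g. 's__') carry no information.
    while names and not names[-1]:
        names.pop()
    return names[:len(RANKS)]


def first_conflict(lineage_a, lineage_b):
    """Return the highest rank where both lineages are named but differ.

    Returns None when one lineage is simply a shallower version of the other.
    """
    a, b = parse_lineage(lineage_a), parse_lineage(lineage_b)
    for rank, name_a, name_b in zip(RANKS, a, b):
        if name_a and name_b and name_a != name_b:
            return rank
    return None


# Hypothetical example data: OTU ID -> lineage from each classifier.
silva = {
    "OTU_1": "d__Bacteria; p__Firmicutes; c__Bacilli",
    "OTU_2": "d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria",
    "OTU_3": "d__Bacteria; p__Bacteroidota",
}
gg2 = {
    "OTU_1": "d__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales",
    "OTU_2": "d__Bacteria; p__Proteobacteria; c__Alphaproteobacteria",
    "OTU_3": "d__Bacteria; p__Actinobacteriota; c__Actinomycetia",
}

conflicts = {otu: first_conflict(silva[otu], gg2[otu]) for otu in silva}
summary = Counter(conflicts.values())

for otu, rank in conflicts.items():
    print(otu, "consistent" if rank is None else f"conflict at {rank}")
```

Tallying `summary` over the real OTU table would show how much of the disagreement sits at phylum or class level versus lower ranks.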
This is not surprising, and it has much to do with how the reference databases are curated. There are many approaches and issues to deal with, as described here:
It may come down to which of these classifications makes sense given your study system. This is why we provide multiple avenues to classify sequences. You can also try the SILVA weighted classifiers.
In most cases, you are lucky to obtain a true species-level designation with short read data. See here:
I was mostly surprised by how many were assigned to different phyla and classes. I assumed these high-level classifications would be pretty similar between databases.
I understand that the species classifications are not 100% accurate, but my boss really prioritizes them, so I want to maximize the species identifications.