Hello QIIME team,
Thanks for developing such a useful analysis package.
While running feature-classifier with several full-length SILVA classifiers, I noticed that the new silva-nb classifier gives classification output that differs from the others.
My dataset was generated on a PacBio machine and processed with the HiFi_16S nf workflow. I believe that workflow uses VSEARCH to classify the sequences.
Taxa plots of VSEARCH output:
taxonomy_barplot_vsearch.qzv (630.9 KB)
I was more interested in NB classifiers, so I took the DADA2 output from that workflow and classified it with a self-trained full-length classifier.
Taxa plots with self-trained classifier:
3_taxa_plots.qzv (1.3 MB)
Since that classifier was trained with the 2023.05 version, I decided to use the latest classifiers from the resources page. As expected, there was a scikit-learn version error, so I installed the recent QIIME 2 2024.10 release and reran the classification. The results with the recent 2024.5 classifier were quite different.
Taxa plots with 2024.5 classifier:
3_taxa_plots_202405.qzv (1.0 MB)
I then tried the environment-specific classifier, which gave output similar to that of the self-trained classifier.
Taxa plots with 2024.5 weighted classifier:
3_taxa_plots_202405_env.qzv (1.5 MB)
Suspecting an issue with my commands, I also tried the 2021.4 classifier from the resources page. Its output was similar to that of the self-trained classifier.
Taxa plots with 2021.4 classifier:
3_taxa_plots_2021.qzv (1.3 MB)
When I checked the file size of each classifier, the 2024.5 classifier was only ~220 MB, whereas the others were ~500 MB. I ran shasum on the 2024.5 file, and the SHA-256 checksum matched the one listed on the page.
$ shasum -a 256 silva-138-99-nb-classifier_202405.qza
# c08a1aa4d56b449b511f7215543a43249ae9c54b57491428a7e5548a62613616 silva-138-99-nb-classifier_202405.qza
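For anyone who wants to repeat the check without comparing the digests by eye, here is a minimal sketch. The helper names `sha256_of` and `verify_sha256` are made up for illustration, and it assumes that either `sha256sum` or `shasum` is available on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of QIIME 2): compare a file's SHA-256
# digest against an expected value copied from a download page.

# Print only the hex digest of the given file.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

# Print OK (exit 0) when the digest matches, MISMATCH (exit 1) otherwise.
verify_sha256() {
  local file="$1" expected="$2" actual
  actual="$(sha256_of "$file")"
  if [ "$actual" = "$expected" ]; then
    echo "OK"
  else
    echo "MISMATCH"
    return 1
  fi
}
```

After sourcing this, `verify_sha256 silva-138-99-nb-classifier_202405.qza <expected-digest>` reports whether the download matches the published value. A match only shows the file was downloaded intact; it says nothing about how the classifier was trained.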
So, is there an issue with that file?
Regards,
Anwesh