Hi Mike!
Thank you for sharing your pipeline and these classifiers! I also took the full-length sequences with species labels (ver0.02) and trained a V4-V5 (515f and 926r) classifier on them.
A closer look at the taxa barplots (taxa-bar-plots_16S-V4V5-35.qzv) shows that I have many weirdly named phyla in my samples (10bav-F6, LCP-89, AncK6, WPS-2, DTB120, FCPU426, MBNT15, NKB15, Sva0485, WS1, CK-2C2-2, RCP2-54, TA06, WS2, GN01, PAUC34f, NB1-j, SAR324_clade(Marine_group_B), WS4 and others).
Have I done something wrong, or does it just need some additional filtering? I will filter out chloroplast and mitochondria sequences.
The phyla names you see are normal. You can investigate this on the SILVA Taxonomy browser. If you click on Bacteria you'll see all sorts of odd names. Many of these groups (at the phylum level and lower) are still being defined and are considered to be at candidate (i.e. Candidatus) status.
The field of bacterial taxonomy is undergoing many changes as genomic data are increasingly used to aid taxonomic identification. Many of these taxa have no culture-type specimen and are only defined by genome sequence (or other) data. This has caused quite a bit of debate in the field of bacterial taxonomy. Anyway, this has resulted in many Candidate Phyla and other proposed groupings.
Thanks for uploading this classifier here. I am trying to make my own classifier, but it seems very memory-intensive and is not possible at the moment. In the meantime I tried to use your pre-trained classifier for the 341F-805R primer set, but I get an error which I couldn't solve based on the previous discussions in the forum. I would appreciate any help.
Here is the error:
qiime feature-classifier classify-sklearn --i-classifier classifier-consensus.qza --i-reads repseqs.qza --o-classification taxonomy
Plugin error from feature-classifier:
The scikit-learn version (0.21.2) used to generate this artifact does not match the current version of scikit-learn installed (0.23.1). Please retrain your classifier for your current deployment to prevent data-corruption errors.
Debug info has been saved to /tmp/qiime2-q2cli-err-w3yust9p.log
I am using qiime2-2020.6, and it seems that the version of the plugin I am using is newer than the one that was used to train this classifier?
Sadly, if you would like to run the classifier in 2020.6, then you'll have to re-train it for that version of QIIME 2, as the scikit-learn version bundled with QIIME 2 can change between releases. The best option in your case might be to make use of qiime2-2020.2 for the classification step.
See this post for more details:
Finally, check out the initial post of this thread; it has been updated to redirect you to a new plugin. This should make it much easier for you to make a SILVA classifier.
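To make the version mismatch above concrete, here is a minimal Python sketch that compares the scikit-learn version a classifier was trained with (the first number in the error message, 0.21.2) against whatever is installed in the active environment. The helper names are hypothetical, and the sketch assumes the check is a plain equality comparison of version strings; it is an illustration, not QIIME 2's own implementation.

```python
from importlib import metadata


def installed_version(package="scikit-learn"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_training_version(trained, installed):
    """True only when the installed version equals the training version.

    Assumption: an exact string match is required, as the error message suggests.
    """
    return installed is not None and installed == trained


# Version reported in the error message as the one used to train the classifier.
trained_with = "0.21.2"
current = installed_version()

if matches_training_version(trained_with, current):
    print("scikit-learn matches; the pre-trained classifier should load.")
else:
    print(f"trained with {trained_with}, environment has {current}: "
          "retrain, or switch to an environment with the matching version.")
```

Running this inside the activated QIIME 2 environment before classification may save a failed run; if the versions differ, the two options are the ones described above (retrain, or use the release the classifier was built with).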
When training the classifier, should one use the Moving Pictures data, or their own data? And if their own, should it be a subset of their own data or the entire data set?
You train the classifiers with standard reference data, typically from a curated database, e.g. SILVA, or of your own making. For example, you can use the input files here to construct your own classifier. For more details on how to do this, take a look at the RESCRIPt tutorial linked at the top of this thread.
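As a rough sketch of what training on reference data looks like, the Python snippet below only assembles the two q2-feature-classifier commands (extract-reads, then fit-classifier-naive-bayes) and prints them; it does not run QIIME 2. The file names are hypothetical placeholders for your reference artifacts, and the primer sequences shown are the commonly cited 515F/926R pair, included here as an assumption, so please check them against your own protocol. Note that nothing from your study samples is passed in.

```python
import shlex


def build_training_commands(ref_seqs, ref_tax, f_primer, r_primer,
                            reads_out="ref-seqs-region.qza",
                            classifier_out="classifier.qza"):
    """Return the two QIIME 2 command lines as argument lists (nothing is executed)."""
    # Step 1: trim the reference sequences to the amplified region.
    extract = [
        "qiime", "feature-classifier", "extract-reads",
        "--i-sequences", ref_seqs,
        "--p-f-primer", f_primer,
        "--p-r-primer", r_primer,
        "--o-reads", reads_out,
    ]
    # Step 2: fit a naive Bayes classifier on the trimmed reads and their taxonomy.
    fit = [
        "qiime", "feature-classifier", "fit-classifier-naive-bayes",
        "--i-reference-reads", reads_out,
        "--i-reference-taxonomy", ref_tax,
        "--o-classifier", classifier_out,
    ]
    return [extract, fit]


# Hypothetical reference artifacts (e.g. built from SILVA); primers are an assumption.
commands = build_training_commands(
    ref_seqs="silva-ref-seqs.qza",
    ref_tax="silva-ref-tax.qza",
    f_primer="GTGYCAGCMGCCGCGGTAA",
    r_primer="CCGYCAATTYMTTTRAGTTT",
)
for cmd in commands:
    print(shlex.join(cmd))
```

The printed lines can be pasted into a shell with the QIIME 2 environment active. The resulting classifier is then applied to your own representative sequences with classify-sklearn, which is the only step where your data enter.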