This is my first dive into microbiome work and QIIME 2.
Things had been going great until I got to the "Training feature classifiers" tutorial. I have tried to follow it, but I think I am missing something about the reference sequences and the corresponding taxonomic classifications. The tutorial provides these files for training, but which data/files do I use as reference sequences for my own data, and what would be the corresponding taxonomic classifications? I downloaded "silva-138-99-nb-classifier.qza" and tried to run:
qiime feature-classifier classify-sklearn \
I get the error:
Plugin error from feature-classifier:
The scikit-learn version (0.23.1) used to generate this artifact does not match the current version of
scikit-learn installed (0.24.1). Please retrain your classifier for your current deployment to prevent
Debug info has been saved to /tmp/qiime2-q2cli-err-o7s8rvhl.log
This error led me to the training tutorial, where I am unclear on which reference data to use for my own data.
We are looking at the V3-V4 region of the 16S rRNA gene for hundreds of patients from dozens of body sites (hands, nose, throat, etc.).
Thank you in advance for any guidance.
- QIIME 2 version: 2021.8, native Conda install, on Ubuntu 20
Welcome to the forum! I'm re-classifying this as user support, since this isn't a technical problem with the software. It sounds like you're working on a super cool project!
You can find reference files for the Silva and Greengenes databases on our data resources page. You'll need a representative sequence file and a matching taxonomy file. Then you can follow the tutorial for training your own classifier (using your own primers rather than the EMP 515F/806R pair).
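A minimal sketch of that training workflow: trim the full-length reference to the region your primers amplify, then fit a naive Bayes classifier on the trimmed reads and their taxonomy. It assumes the Silva files from the data resources page were saved as `silva-138-99-seqs.qza` and `silva-138-99-tax.qza`, and it assumes the widely used 341F/805R V3-V4 primer pair; substitute the primers actually used in your library prep. The function name and the intermediate file name are hypothetical.

```shell
# Hypothetical helper (not part of QIIME 2): extract the amplicon region, then train.
# Arguments (assumed file names, adjust to your own):
#   $1  reference sequences artifact, e.g. silva-138-99-seqs.qza
#   $2  matching taxonomy artifact,   e.g. silva-138-99-tax.qza
#   $3  output classifier path
# The ( ... ) body runs in a subshell so `set -euo pipefail` stays local to it.
train_v3v4_classifier() (
  set -euo pipefail
  local ref_seqs="$1" ref_tax="$2" out_classifier="$3"

  # ASSUMPTION: 341F / 805R primers. Replace with your own primer sequences.
  local f_primer="CCTACGGGNGGCWGCAG"
  local r_primer="GACTACHVGGGTATCTAATCC"

  # Step 1: keep only the region between the primers in each reference sequence.
  qiime feature-classifier extract-reads \
    --i-sequences "$ref_seqs" \
    --p-f-primer "$f_primer" \
    --p-r-primer "$r_primer" \
    --o-reads ref-seqs-v3v4.qza

  # Step 2: fit a naive Bayes classifier on the trimmed reads and their taxonomy.
  qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads ref-seqs-v3v4.qza \
    --i-reference-taxonomy "$ref_tax" \
    --o-classifier "$out_classifier"
)
```

Usage, inside the activated QIIME 2 environment: `train_v3v4_classifier silva-138-99-seqs.qza silva-138-99-tax.qza silva-138-99-v3v4-classifier.qza`. Training on the full Silva reference can be slow and memory-hungry, so run it on a server rather than a laptop if you can.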
You can also follow the RESCRIPt tutorial to download and format your own database.
The error is due to a problem with the conda environment and a newer version of scikit-learn. To get rid of it:
- Activate the env with QIIME 2 installed
- Run: conda install scikit-learn=0.23.1
Thanks @crusher083! I think that suggestion might not work in this case, though: q2-feature-classifier 2021.8 has a hard pin on scikit-learn 0.24.1, so force-installing an older version of scikit-learn will cause conda to uninstall q2-feature-classifier.
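One way to check this before touching the environment is to ask conda for its plan without applying it. This is a sketch: the environment name `qiime2-2021.8` is assumed from the install described above, the helper name is hypothetical, and the exact solver output depends on your channels and environment.

```shell
# Hypothetical helper: show the installed scikit-learn, then preview (without
# applying) what forcing an older version would change.
preview_sklearn_downgrade() {
  local env_name="${1:-qiime2-2021.8}"   # ASSUMPTION: environment name
  local version="${2:-0.23.1}"

  # Currently installed scikit-learn in that environment.
  conda list --name "$env_name" scikit-learn

  # --dry-run only reports the planned changes; look for q2-feature-classifier
  # among the packages that would be removed or changed.
  conda install --dry-run --name "$env_name" "scikit-learn=$version"
}
```

If the plan lists q2-feature-classifier for removal, that confirms the pin; the safer route is a classifier built with the scikit-learn version already in the environment.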
Thank you all! I have been away from my servers for a few days, but will be connecting tomorrow and will follow up on the suggestions above. @jwdebelius, if I understand you correctly, I do not use my own data for training; instead I should use the provided reference sequences and corresponding taxonomic classifications? May I assume that, since these are publicly available, the training has already been done and up-to-date trained classifiers exist? Or is there some reason users need to do the training themselves? Probably dumb questions.
Here you go! Data resources — QIIME 2 2021.8.0 documentation
I ran the feature classifier using the downloaded "silva-138-99-nb-weighted-classifier.qza" and it worked:
#!/bin/bash
qiime feature-classifier classify-sklearn \
  --i-reads deblur_output/representative_sequences.qza \
  --i-classifier taxa_classifiers/silva-138-99-nb-weighted-classifier.qza \
  --p-n-jobs 32 \
  --output-dir taxa
It completed in a reasonable amount of time. The tutorial says something about trusting the person who trained, but I am just starting so I do not really trust myself. I will also try to follow the training tutorial using my primers, but they are standard V3 and V4 primers.
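As a first sanity check on the pre-trained classifier's output, one option is to tabulate the assignments and skim the taxa and confidence values. A sketch, assuming the `--output-dir taxa` run above wrote its result as `taxa/classification.qza`; the helper name is hypothetical.

```shell
# Hypothetical helper: turn the taxonomy assignments into a browsable table.
# ASSUMPTION: classify-sklearn was run with --output-dir taxa, which should
# write the assignments to taxa/classification.qza.
tabulate_taxonomy() {
  local out_dir="${1:-taxa}"
  qiime metadata tabulate \
    --m-input-file "$out_dir/classification.qza" \
    --o-visualization "$out_dir/classification.qzv"
}
```

Run `tabulate_taxonomy taxa` and open `taxa/classification.qzv` with `qiime tools view` or at view.qiime2.org; each feature is listed with its assigned taxon and confidence.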
Thanks again for all the amazing and quick support!