Obtaining and importing reference data sets

Hello,
This is my first dive into microbiome work and qiime.
Things have been going great until I got to the Training feature classifiers bit. I have tried to follow the tutorial on training, but I think I am missing something about reference sequences and the corresponding taxonomic classifications. The tutorial provides these files for training, but which data/files do I use for my own reference sequences and what would be the corresponding taxonomic classifications for my reference data? I downloaded the "silva-138-99-nb-classifier.qza" and tried to run:

qiime feature-classifier classify-sklearn \
  --i-reads deblur_output/representative_sequences.qza \
  --i-classifier taxa_classifiers/silva-138-99-nb-classifier.qza \
  --p-n-jobs 32 \
  --output-dir taxa

I get the error:

Plugin error from feature-classifier:

The scikit-learn version (0.23.1) used to generate this artifact does not match the current version of 
scikit-learn installed (0.24.1). Please retrain your classifier for your current deployment to prevent 
data-corruption errors.

Debug info has been saved to /tmp/qiime2-q2cli-err-o7s8rvhl.log

This error led me to the training tutorial, where I am unclear on which reference data to use for my own data.

We are looking at V3-V4 of 16S for hundreds of patients from dozens of locations (hands, nose, throat, etc).

thank you in advance for any guidance.

  • Version of QIIME 2: Conda native install (qiime2-2021.8) on Ubuntu 20

Hi @Simey,

Welcome to the :qiime2: forum! I'm re-classifying this as user support, since this isn't a technical problem with the software. It sounds like you're working on a super cool project!

You can find reference files for the Silva and Greengenes databases on our data resources page. You'll need a reference sequence file and the matching taxonomy file. Then, you can follow the tutorial for training your own classifier (use your own primers rather than the EMP 515-806 pair used in the tutorial).
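A minimal sketch of that training route, assuming the Silva 138 reference files were downloaded as silva-138-99-seqs.qza and silva-138-99-tax.qza, and assuming the commonly used 341F/805R V3-V4 primer pair. The file names, output names, and primer sequences below are illustrative assumptions, so substitute the primers actually used for sequencing:

```shell
#!/bin/sh
# Assumed V3-V4 primers (341F / 805R); replace with your own primer sequences.
F_PRIMER="CCTACGGGNGGCWGCAG"
R_PRIMER="GACTACHVGGGTATCTAATCC"

# Assumed names for the downloaded reference sequences and taxonomy.
REF_SEQS="silva-138-99-seqs.qza"
REF_TAX="silva-138-99-tax.qza"

# Hypothetical output names for the trimmed reference reads and the classifier.
TRIMMED="ref-seqs-v3v4.qza"
CLASSIFIER="silva-138-99-v3v4-classifier.qza"

if command -v qiime >/dev/null 2>&1 && [ -f "$REF_SEQS" ] && [ -f "$REF_TAX" ]; then
  # Trim the reference sequences to the region amplified by the primers.
  qiime feature-classifier extract-reads \
    --i-sequences "$REF_SEQS" \
    --p-f-primer "$F_PRIMER" \
    --p-r-primer "$R_PRIMER" \
    --o-reads "$TRIMMED"

  # Train a naive Bayes classifier on the trimmed reads and their taxonomy.
  qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads "$TRIMMED" \
    --i-reference-taxonomy "$REF_TAX" \
    --o-classifier "$CLASSIFIER"
else
  echo "qiime or the reference files were not found; nothing was run." >&2
fi
```

Because the classifier is trained inside the active environment, it is built with the scikit-learn version installed there, which avoids the version mismatch reported above.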

You can also follow the RESCRIPt tutorial to download and format your own database.

Best,
Justine

Hi, Brian!

The error is due to a problem with a conda environment and a newer version of scikit-learn. To get rid of it:

  1. Activate the env with QIIME2 installed
  2. Run: conda install scikit-learn=0.23.1

Cheers!

Thanks @crusher083! I think that suggestion might not work in this case, though: q2-feature-classifier 2021.8 has a hard pin on scikit-learn 0.24.1, so force-installing an older version of scikit-learn will cause conda to uninstall q2-feature-classifier.
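To confirm which scikit-learn version the active environment actually has before choosing a classifier, a quick check along these lines may help. This is a sketch: the TRAINED_WITH value is copied from the error message above, and the script assumes `python` on the PATH is the one from the QIIME 2 environment:

```shell
#!/bin/sh
# scikit-learn version the downloaded classifier was trained with,
# taken from the error message in the original post.
TRAINED_WITH="0.23.1"

# Ask the active environment which scikit-learn version is installed.
if command -v python >/dev/null 2>&1 && python -c 'import sklearn' >/dev/null 2>&1; then
  INSTALLED=$(python -c 'import sklearn; print(sklearn.__version__)')
else
  INSTALLED="unknown"
fi

# Compare the two versions as plain strings.
if [ "$INSTALLED" = "$TRAINED_WITH" ]; then
  STATUS="match"
else
  STATUS="mismatch"
fi

echo "classifier trained with: $TRAINED_WITH, installed: $INSTALLED ($STATUS)"
```

On a mismatch, the options discussed in this thread are to use a classifier built for the installed release or to retrain one in the current environment, rather than downgrading scikit-learn.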

Thank you all! I have been away from my servers for a few days, but will be connecting tomorrow and will follow up on the suggestions above. @jwdebelius, if I understand you correctly, I do not use my own data for training; I should use the provided reference sequences and the corresponding taxonomic classifications? May I assume that since these are publicly available, the training is already done and up-to-date trained classifiers exist? Or is there some reason users need to do the training themselves? Prolly dumb questions :stuck_out_tongue:

Here you go! :point_right: Data resources — QIIME 2 2021.8.0 documentation

:sparkles: :qiime2: :sparkles:

I ran the feature classifier using the downloaded "silva-138-99-nb-weighted-classifier.qza" and it worked.
I ran:
#!/bin/bash
qiime feature-classifier classify-sklearn \
  --i-reads deblur_output/representative_sequences.qza \
  --i-classifier taxa_classifiers/silva-138-99-nb-weighted-classifier.qza \
  --p-n-jobs 32 \
  --output-dir taxa

It completed in a reasonable amount of time. The tutorial says something about trusting the person who trained, but I am just starting so I do not really trust myself. I will also try to follow the training tutorial using my primers, but they are standard V3 and V4 primers.

Thanks again for all the amazing and quick support!
