update scikit-learn version

rparadiso · July 22, 2020, 9:23am

Good morning to all,
I get this error message when I run the classification:

qiime feature-classifier classify-sklearn \
  --i-classifier silva-138-99-nb-classifier.qza \
  --i-reads rep_seqs_150.qza \
  --o-classification taxonomy.qza

Plugin error from feature-classifier:

The scikit-learn version (0.23.1) used to generate this artifact does not match the current version of scikit-learn installed (0.22.2.post1). Please retrain your classifier for your current deployment to prevent data-corruption errors.

Can someone help me understand how to update the classifier to the required version?

Thanks in advance for any help.

Rubina

the_dummy · July 22, 2020, 11:27am

Hello,

Which q2 version did you use to train your classifier, and which q2 version are you using to run this command?

If they differ, and the q2 developers updated scikit-learn in the newer q2 release, this error can come up.

You can either retrain your classifier with the new q2 version or run your analysis with the q2 version you trained your classifier with.

Regards.

devonorourke · July 22, 2020, 3:07pm

I just ran into this issue also. Will sklearn fail just because the versions are different? It's not a big deal to rerun code with an updated qiime deployment, but these classifier trainings can take several days.

Any chance this kind of error message can just be a warning, but not kill the process?

thermokarst · July 22, 2020, 3:33pm

Hey @rparadiso!

I will address this by answering some of the questions raised by @the_dummy & @devonorourke:

Check this part of the error message to see which scikit-learn version the classifier was trained with: "The scikit-learn version (0.23.1) used to generate this artifact".
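If you want to pull both versions out of the message programmatically, here is a minimal Python sketch. It only parses the error text quoted in the first post; the function name `sklearn_versions` is made up for illustration and is not part of QIIME 2 or scikit-learn.

```python
import re

# Error text copied from the first post in this thread.
ERROR = (
    "The scikit-learn version (0.23.1) used to generate this artifact does "
    "not match the current version of scikit-learn installed (0.22.2.post1)."
)


def sklearn_versions(message):
    """Return (trained_with, installed) scikit-learn versions from the error text."""
    pattern = (
        r"scikit-learn version \(([^)]+)\) used to generate this artifact"
        r".*?scikit-learn installed \(([^)]+)\)"
    )
    match = re.search(pattern, message, flags=re.DOTALL)
    if match is None:
        raise ValueError("message does not look like the version-mismatch error")
    return match.group(1), match.group(2)


trained_with, installed = sklearn_versions(ERROR)
print(f"classifier trained with scikit-learn {trained_with}, installed: {installed}")
```

The first value is the version the classifier was trained with, the second is what your current environment has installed.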

@the_dummy's suggestion (retrain with the new q2 version, or run the analysis with the q2 version the classifier was trained with) is the right answer: if you're using an old version of QIIME 2 (which you might be, @rparadiso), then you just need to use the version of the pretrained classifier that was released with that version of QIIME 2. The easiest way to do that is to go to docs.qiime2.org, select the version of QIIME 2 you are using in the upper left, and then navigate to the "Data resources" link.

On whether sklearn fails just because the versions differ: no, this error is generated (deliberately) by q2-feature-classifier, not scikit-learn.

On whether this could be a warning instead of an error: no, because the underlying sklearn model can change between versions. In the worst case, if we didn't let q2-feature-classifier error out, you could get classification results that appear to have run successfully but are in fact incorrect (we call this a data-integrity bug - think "uh oh, I need to retract a paper" sort of issue). That scenario isn't super likely IMO; a more likely situation is that sklearn would just crash, potentially with a less clear error message.
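To illustrate the idea of a deliberate version guard, here is a rough Python sketch. It is not the actual q2-feature-classifier code: `save_model` and `load_model` are hypothetical helpers, and a plain dict stands in for a trained sklearn model. The point is only that the version is recorded next to the pickled bytes and checked before anything is unpickled.

```python
import pickle


def save_model(model, library_version):
    """Pickle a model and record the library version that produced it."""
    return {"version": library_version, "pickle": pickle.dumps(model)}


def load_model(saved, installed_version):
    """Unpickle a model only if it was saved with the installed version."""
    if saved["version"] != installed_version:
        # Fail loudly instead of risking silently wrong results.
        raise ValueError(
            f"model was saved with version {saved['version']} but "
            f"version {installed_version} is installed; retrain the model"
        )
    return pickle.loads(saved["pickle"])


# A dict stands in for a trained classifier (assumption for illustration).
saved = save_model({"weights": [0.1, 0.9]}, "0.23.1")
print(load_model(saved, "0.23.1"))  # same version: the model is returned

try:
    load_model(saved, "0.22.2.post1")  # mismatch: loading is refused
except ValueError as err:
    print(err)
```

Turning the `raise` into a warning would let the unpickled object through even when its internals no longer match the installed library, which is exactly the case the error is meant to prevent.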

We hope that one day scikit-learn will have portable serializations of their classifier models, which would make this process a little bit smoother, but no clue if that is even on their radar.

BTW, this is a useful read relevant to this discussion - scikit-learn: machine learning in Python

devonorourke · July 22, 2020, 3:46pm

Thanks @thermokarst, appreciate the insider knowledge.

This feels like a really stupid question... the error message shows which sklearn version was used to generate the --i-classifier object, but how do I go about identifying which QIIME version was used to create that same object? If I run qiime tools peek..., I don't get that specific info:

UUID:        50e48dcb-e40c-4757-b918-2bacfbf0afc6
Type:        TaxonomicClassifier
Data format: TaxonomicClassiferTemporaryPickleDirFmt

but maybe there's something in the UUID that will be useful? I thought maybe I'm supposed to view it in view.qiime2.org and get other provenance info, but the example doesn't show which QIIME version was used in the screenshot in that example.

Related stupid question / feature request: would it be possible to add the QIIME version associated with any artifact as an output in that qiime tools peek... command?

Turns out I was trying to run @Nicholas_Bokulich's hybrid classifier, so I ended up needing not only to figure out which QIIME version I used to generate that classifier object, but also to work out when the hybrid classifier became an option. I'm going to have to recreate the classifier object, so at the moment on Monsoon I'm eating up something like 1 TB of RAM. Apologies to anyone in Flagstaff...

thermokarst · July 22, 2020, 4:15pm

Not a silly question at all - we don't have a great solution here - maybe we need to publish a simple table or something with this information. Otherwise, the options are scrolling through different versions of the docs or checking out the commit history of qiime2/q2-feature-classifier on GitHub (cc @Oddant1 - let's chat about this, I could use a hand putting together this table, if you're available).

Ah bummer, too bad screenshots can't scroll! The framework version is literally just below where that image is cut off.

Great question - it's on our radar! See "Provide brief information about an artifacts provenance from `peek`", Issue #423 in qiime2/qiime2 on GitHub.
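In the meantime, a .qza is a zip archive, so one way to get at the framework version without opening view.qiime2.org is to read it straight out of the file. The Python sketch below assumes the archive has a VERSION file directly under its top-level UUID directory, with a "framework:" line in it. The helper name `framework_version` is made up, and the demo archive (including the version string in it) is fabricated for illustration; it is not the classifier from this thread.

```python
import io
import zipfile


def framework_version(qza):
    """Return the 'framework:' value from the VERSION file inside a .qza archive."""
    with zipfile.ZipFile(qza) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Assumed layout: VERSION sits directly under the UUID directory.
            if len(parts) == 2 and parts[1] == "VERSION":
                text = archive.read(name).decode("utf-8")
                for line in text.splitlines():
                    if line.startswith("framework:"):
                        return line.split(":", 1)[1].strip()
    raise ValueError("no framework version found in archive")


# Demo on an in-memory stand-in archive (contents are made up).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr(
        "50e48dcb-e40c-4757-b918-2bacfbf0afc6/VERSION",
        "QIIME 2\narchive: 5\nframework: 2020.6.0\n",
    )
demo.seek(0)
print(framework_version(demo))
```

On a real artifact you would pass the path instead, e.g. `framework_version("silva-138-99-nb-classifier.qza")`.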

lol, all is forgiven
