Save a feature classifier properly

Hi Qiime2 team,

I am using the Artifact API to train a feature classifier. I managed to save the trained classifier as an artifact file, but I could not load it back to classify my sequences.

Here is my code to train and save the classifier:

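from qiime2 import Artifact
from qiime2.plugins.feature_classifier.methods import (
    extract_reads, fit_classifier_naive_bayes)

ref_taxo_path = './our_taxonomy.qza'
ref_otus_path = './our_otus.qza'

f_primer = 'GTGCCAGCMGCCGCGGTAA'
r_primer = 'GGACTACHVGGGTWTCTAAT'
trunc_len = 500

ref_taxo = Artifact.load(ref_taxo_path)
ref_otus = Artifact.load(ref_otus_path)
# Extract the region between the primers from the reference sequences
target_otus = extract_reads(ref_otus,
                            f_primer=f_primer,
                            r_primer=r_primer,
                            trunc_len=trunc_len)
# Train a naive Bayes classifier on the extracted reads and their taxonomy
clr = fit_classifier_naive_bayes(target_otus.reads,
                                 ref_taxo)
clr.classifier.save('./my_classifier.qza')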

When I ran the following command to load it back:

clr = Artifact.load('./my_classifier.qza')

I got the following error:

~/anaconda3/envs/qiime2-2017.7/lib/python3.5/site-packages/qiime2/sdk/result.py in load(cls, filepath)
     59     def load(cls, filepath):
     60         """Factory for loading Artifacts and Visualizations."""
---> 61         archiver = archive.Archiver.load(filepath)
     62 
     63         if Artifact._is_valid_type(archiver.type):

~/anaconda3/envs/qiime2-2017.7/lib/python3.5/site-packages/qiime2/core/archive/archiver.py in load(cls, filepath)
    290         Format = cls.get_format_class(archive.version)
    291         if Format is None:
--> 292             cls._futuristic_archive_error(filepath, archive)
    293 
    294         path = cls._make_temp_path()

TypeError: _futuristic_archive_error() takes 2 positional arguments but 3 were given

Could you show me the correct way to train and save the classifier so that I can use it in production? Thanks!

Hi @spencerimp!

:rocket:

It looks to me like you may have loaded the wrong environment: you are running this in 2017.7 (which, ironically, had a bug that prevented it from explaining the problem correctly).

I think you probably trained the classifier in 2017.9 or later. There were a couple of minor changes to the provenance format, which meant we bumped the internal archive version of .qza/.qzv files. Your 2017.7 install recognizes that it doesn't have the code to read that newer archive version, so it errors (or tries to: the TypeError above is the bug that broke the error message).
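To check which release wrote a given .qza before loading it, you can peek inside the archive. The sketch below assumes the archive layout as I understand it: a zip file with one top-level directory (named after the artifact's UUID) that contains a plain-text VERSION file with `archive:` and `framework:` lines. `read_archive_version` and `check_readable` are hypothetical helpers, not part of the QIIME 2 API, and the set of supported archive versions you pass in is your own assumption; this is a simplified model of the check, not QIIME 2's actual code. It avoids f-strings so it also runs on the Python 3.5 used by 2017.7.

```python
import zipfile


def read_archive_version(qza_path):
    """Return the key/value pairs from the VERSION file inside a .qza/.qzv.

    Assumes a zip whose single top-level directory contains a VERSION file
    shaped like:
        QIIME 2
        archive: <n>
        framework: <release>
    """
    with zipfile.ZipFile(qza_path) as zf:
        # Find '<uuid>/VERSION' without knowing the UUID in advance.
        candidates = [name for name in zf.namelist()
                      if name.count('/') == 1 and name.endswith('/VERSION')]
        if not candidates:
            raise ValueError('{}: no VERSION file found'.format(qza_path))
        text = zf.read(candidates[0]).decode('utf-8')
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(':')
        if sep:  # skip the 'QIIME 2' header line, which has no colon
            info[key.strip()] = value.strip()
    return info


def check_readable(qza_path, supported_archive_versions):
    """Raise if the archive version is not in the (hypothetical) supported set."""
    info = read_archive_version(qza_path)
    version = info.get('archive')
    if version not in supported_archive_versions:
        raise RuntimeError(
            '{} uses archive version {} (written by framework {}); this '
            'install only reads {}. Load it with a newer QIIME 2.'.format(
                qza_path, version, info.get('framework'),
                sorted(supported_archive_versions)))
    return info
```

For example, `read_archive_version('./my_classifier.qza')['framework']` should tell you which QIIME 2 release trained the classifier, so you can compare it with the environment you are loading it in.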

I would recommend upgrading QIIME 2 and retraining your classifier.

Let me know if you have any other questions!

Note:
There is another storage problem, specific to the feature-classifier plugin, that is probably also relevant to you: a trained classifier can only be loaded with exactly the same version of scikit-learn that was used to train it. So in general, saving classifiers for long-term use isn't very robust (though you'll be fine within a given release).

The reason is that the format records the version of scikit-learn used and errors if it doesn't match the installed version (it is basically a Python object dump, which isn't a stable format). This relates to a larger problem in the machine learning community: there aren't really any well-defined file formats for saving this kind of data, so an object dump is about the best we have for now.
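To make the mechanism concrete, here is a minimal sketch of a version-stamped object dump built on Python's `pickle`. It is not the feature-classifier plugin's actual storage code; `dump_with_version`, `load_with_version`, and `VersionMismatchError` are hypothetical names, and the version strings are passed in by hand (in a real setting the installed one would come from `sklearn.__version__`).

```python
import pickle


class VersionMismatchError(Exception):
    """Raised when a dump is loaded under a different library version."""


def dump_with_version(obj, library_version):
    # Store the library version next to the pickled bytes, because pickle
    # records only references to classes, not their code or internal layout.
    return pickle.dumps({'library_version': library_version,
                         'payload': pickle.dumps(obj)})


def load_with_version(blob, installed_version):
    record = pickle.loads(blob)
    if record['library_version'] != installed_version:
        # Refuse rather than unpickle into classes that may have changed.
        raise VersionMismatchError(
            'dump was made with version {}, but {} is installed'.format(
                record['library_version'], installed_version))
    return pickle.loads(record['payload'])
```

Refusing to load is the conservative choice: since pickle restores objects by looking up their classes in whatever library is installed, a newer release with different class internals could fail obscurely or appear to succeed with a subtly broken object.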


Hi ebolyen,

Thanks for pointing out the mismatch between my training and testing environments. After switching the testing environment to 2017.9, I can now load the saved classifier.

Thanks!

