How to convert a trained taxonomy classifier to a .gz file?

Hi everyone,

I recently decided to switch my pipeline to R for my own convenience. However, I couldn’t find a method in R to train a region-specific classifier (based on my primers) from the SILVA 138 database. I had already trained my own classifier for my primer set in QIIME 2 through this tutorial, and its format is .qza, while the assignTaxonomy() function in R expects a .gz file as the classifier library input. Is there any way to convert my silva138-classifier-341f-805r.qza to silva138-classifier-341f-805r.gz?
Much appreciated in advance.

Perhaps because this method and workflow are quite specific to QIIME 2. There are other taxonomy classifiers available in R, but they have their own workflows and input formats.

No. That file does not contain a gzipped set of DNA sequences; it contains a trained scikit-learn classifier, which would be unreadable by anything in R. So there is no way to export it and use that classifier in R.
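
To see what a given .qza actually holds, you can open it as a zip archive and read its `metadata.yaml`. This is only a minimal sketch: the helper name `describe_qza` is made up for illustration, and it assumes the usual artifact layout of a single top-level UUID directory containing `metadata.yaml` and a `data/` folder.

```python
import zipfile


def describe_qza(path):
    """Return (semantic type, format, data file names) for a QIIME 2 artifact."""
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
        # Assumption: all members live under one top-level UUID directory.
        root = names[0].split("/")[0]
        info = {}
        # metadata.yaml is a short "key: value" file, so a plain split is enough here.
        for line in archive.read(f"{root}/metadata.yaml").decode().splitlines():
            key, _, value = line.partition(":")
            info[key.strip()] = value.strip().strip("'\"")
        data_files = [
            name for name in names
            if name.startswith(f"{root}/data/") and not name.endswith("/")
        ]
    return info.get("type"), info.get("format"), data_files
```

Running it on the trained classifier should report a classifier type rather than a sequence type, which is a quick way to confirm that there are no FASTA sequences inside to hand to R.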

note: I edited the title to make it more specific to your question. QZA is a vague extension (just as gz can contain any gzipped contents, a QZA can contain any QIIME 2 results).

Good luck!

Thanks Nicholas,

I already downloaded the silva138 classifier, but it covers the whole gene, while I want one that is region-specific to my primer set. Since I couldn’t find a workflow to make the classifier region-specific to my primers, do you think it would be alright if I just go with the whole-gene classifier?

Kinds

Hi @farhad1990,

Earlier in this thread you said you worked through the RESCRIPt tutorial. Did you not try this part of the tutorial?

-Mike

Ah thanks for clarifying @SoilRotifer — I think I understand your question now @farhad1990

Are you asking rather “can RESCRIPt be used to create a classifier that can be exported and used in R?”

You could use RESCRIPt (following that tutorial) to compile and format a custom reference sequence database, then export those formatted sequences (prior to training the classifier) to fasta format and use them in R (e.g., for taxonomy classification). However, you cannot export a trained classifier and use it in R, because it is stored as a serialized scikit-learn object rather than as a set of sequences.
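
The normal way to export is `qiime tools export`, but if QIIME 2 is not installed on the machine where you run R, the sequence and taxonomy artifacts can also be unpacked directly, since a .qza is a zip archive. A rough sketch, assuming the standard `<uuid>/data/` layout; the function name `extract_data_files` is hypothetical:

```python
import zipfile
from pathlib import Path


def extract_data_files(qza_path, out_dir):
    """Copy every file under <uuid>/data/ in a .qza into out_dir; return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(qza_path) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Keep only <uuid>/data/...; skip metadata, provenance and directory entries.
            if len(parts) < 3 or parts[1] != "data" or name.endswith("/"):
                continue
            # Note: nested folders inside data/ are flattened to the bare file name.
            target = out_dir / parts[-1]
            target.write_bytes(archive.read(name))
            written.append(target)
    return written
```

For a `FeatureData[Sequence]` artifact this should yield `dna-sequences.fasta`, and for a `FeatureData[Taxonomy]` artifact `taxonomy.tsv`, the same files `qiime tools export` produces.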

Hi Mike,
Yes, I did this part for sure and trained my classifier. However, I would now like to use this classifier, which I trained specifically for my primers, in my R workflow. Since I couldn’t find a way in R to train an amplicon-specific classifier, I was wondering if there is any way to convert my QIIME 2-trained classifier for use in R.

Kinds,
Farhad

Okay, then you can proceed as @Nicholas_Bokulich suggested above. Just take the formatted taxonomy and sequence files (the ones you’d input into the classifier) and import them into R instead. Then use your favorite R tools to train on that reference database and classify. For example, you can likely use the approach from this pipeline.
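
If the R tool is DADA2's assignTaxonomy(), the exported files still need to be merged into one gzipped FASTA whose headers are semicolon-delimited taxonomy strings. Here is a hedged sketch of that step in Python. It assumes an exported `dna-sequences.fasta` and a `taxonomy.tsv` with a `Feature ID<TAB>Taxon` header line, and it assumes ranks are written like `d__Bacteria; p__Firmicutes`. The function names are made up for illustration, and you should check the header style against the DADA2 documentation before relying on it.

```python
import csv
import gzip


def read_fasta(path):
    """Yield (record_id, sequence) pairs from a FASTA file."""
    record_id, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if record_id is not None:
                    yield record_id, "".join(chunks)
                # The ID is the first whitespace-delimited token after ">".
                record_id, chunks = line[1:].split()[0], []
            elif line:
                chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def write_dada2_training_set(fasta_path, taxonomy_path, out_path):
    """Write a gzipped FASTA with taxonomy strings as headers; return the record count."""
    with open(taxonomy_path, newline="") as handle:
        rows = csv.reader(handle, delimiter="\t")
        next(rows)  # assumption: first line is the "Feature ID<TAB>Taxon" header
        taxonomy = {row[0]: row[1] for row in rows if len(row) >= 2}
    written = 0
    with gzip.open(out_path, "wt", newline="\n") as out:
        for record_id, sequence in read_fasta(fasta_path):
            if record_id not in taxonomy:
                continue  # sequences without a taxonomy entry are skipped
            # Assumption: ranks carry prefixes such as "d__" or "p__"; strip them.
            ranks = [r.strip().split("__", 1)[-1] for r in taxonomy[record_id].split(";")]
            ranks = [r for r in ranks if r]
            out.write(">" + ";".join(ranks) + ";\n" + sequence + "\n")
            written += 1
    return written
```

Something like `write_dada2_training_set("dna-sequences.fasta", "taxonomy.tsv", "silva138-341f-805r.fa.gz")` would then produce the .gz file, which is a reference database for R to train on, not a converted classifier. Whether to keep the species rank in the headers is a separate choice that depends on how you plan to call assignTaxonomy().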

Thanks Mike,
I will go for it :)

Kinds,
Farhad

That was the exact question I asked and thanks for the answer. I am currently working on it.