Training an ITS classifier

Hi everyone!

I'm a bit puzzled about training the classifier, and I'd really appreciate your help.
I followed the "Training feature classifiers with q2-feature-classifier" tutorial to train an ITS classifier.
I should note that I chose the resource file from the UNITE (fungal ITS) section of the QIIME release.
Did I do the training correctly, or should I be using the "Fungal ITS analysis tutorial"?

Hello @Beh_Yaad,

There's an even newer tutorial for building an ITS classifier using RESCRIPt.

This is how I do it:

If you are looking for full-length UNITE ITS classifiers, I have pre-trained files that you can download here: colinbrislawn/unite-train on GitHub (🍄 QIIME 2 ITS classifiers for the UNITE database)

I'm interested in the UNITE database, so let me know if there is some other way you wanted to use these resources.

Hi Colin,
Thanks a lot for your solution.
When I ran the command 'Evaluate the classifier', I got the following error:
Plugin error from rescript:
A plugin named 'longitudinal' could not be found.

First, I'd appreciate it if you could demonstrate how to resolve the error.
Next, should I run the command using the 'qiime2-shotgun-2023.9' or 'rescript' environment?
Third, if I intend to use your pre-trained ITS classifier, should I apply and follow the 'Training feature classifiers with q2-feature-classifier' tutorial?

Thanks ahead of time!

Sure, I'm happy to help.

Can you post the full command you ran, along with the conda environment you ran it in?

Uh... I run these right in qiime2-amplicon-2023.9, so maybe the shotgun env is the issue?

The classifiers I distribute on GitHub are pre-trained, so you can skip directly to using them with
qiime feature-classifier classify-sklearn
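
For reference, here is a minimal sketch of that classification step. The file names (unite-classifier.qza, rep-seqs.qza, taxonomy.qza) are placeholders I made up for illustration, not files from the repository, so substitute your own. The function only calls qiime when it is on your PATH and the inputs exist.

```shell
#!/usr/bin/env bash
# Placeholder file names (assumptions, not from this thread): replace with your own.
CLASSIFIER="unite-classifier.qza"   # pre-trained classifier you downloaded
READS="rep-seqs.qza"                # your representative sequences
OUTPUT="taxonomy.qza"               # where the taxonomy assignments are written

classify_reads() {
    # Stop early when QIIME 2 is not on PATH.
    if ! command -v qiime >/dev/null 2>&1; then
        echo "qiime not found: activate your QIIME 2 environment first"
        return 1
    fi
    # Stop early when either input artifact is missing.
    if [ ! -f "$CLASSIFIER" ] || [ ! -f "$READS" ]; then
        echo "input files not found: $CLASSIFIER / $READS"
        return 1
    fi
    qiime feature-classifier classify-sklearn \
        --i-classifier "$CLASSIFIER" \
        --i-reads "$READS" \
        --o-classification "$OUTPUT"
}

classify_reads || true
```

The classifier should come from the same QIIME 2 release you are running, since classifiers trained under a different scikit-learn version may be rejected.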

(Just wanted to let you know that I've installed the latest Conda update.
==> WARNING: A newer version of conda exists. <==
  current version: 23.7.3
  latest version: 24.1.0)

1- Sure, here is the full command I ran:
(qiime2-shotgun-2023.9) uw-user@V74RMK29HK qiime2-shotgun-2023.9 % qiime rescript evaluate-fit-classifier \
    --i-sequences uniteDB/sequences-filtered.qza \
    --i-taxonomy uniteDB/taxonomy-no-SH.qza \
    --p-n-jobs 2 \
    --o-classifier uniteDB/classifier.qza \
    --o-evaluation uniteDB/classifier-evaluation.qzv \
    --o-observed-taxonomy uniteDB/predicted-taxonomy.qza

qiime rescript evaluate-taxonomy \
    --i-taxonomies uniteDB/taxonomy-no-SH.qza uniteDB/predicted-taxonomy.qza \
    --p-labels ref-taxonomy predicted-taxonomy \
    --o-taxonomy-stats uniteDB/both-taxonomy-evaluation.qzv
Plugin error from rescript:

A plugin named 'longitudinal' could not be found.

Debug info has been saved to /var/folders/d1/l587ml155pq01sd8nm1f9d2r0000gr/T/qiime2-q2cli-err-4crrzeki.log

2- So, what's your opinion? Should I go with the "QIIME 2 Amplicon Distribution" when updating QIIME 2, or could I achieve what I need using the "QIIME 2 Shotgun Distribution"?

I think installing a new conda env with the amplicon distribution is probably the easiest option. Let's try that first!

We can also reach out to the rescript devs if that does not solve the problem

I installed 'QIIME 2 Amplicon Distribution' and ran the first command based on your tutorial 'How to train a UNITE ITS classifier using RESCRIPt':
(qiime2-amplicon-2023.9) uw-user@V74RMK29HK qiime2-amplicon-2023.9 % qiime rescript get-unite-data \
    --p-version 9.0 \
    --p-taxon-group eukaryotes \
    --p-cluster-id dynamic \
    --p-no-singletons \
    --verbose \
    --output-dir uniteDB

and I got the following error:
Error: QIIME 2 has no plugin/command named 'rescript'.

But if I'm not mistaken, the 'QIIME 2 Shotgun Distribution' environment already comes with RESCRIPt installed, doesn't it?

You are doing everything right! It looks like there are dependency conflicts so q2-longitudinal is not installed by default in all distributions.
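
If it helps, here is a rough way to check which plugins an environment actually provides before running anything. It is a sketch that assumes `qiime info` prints installed plugins one per line as `name: version`; the helper names plugin_listed and check_plugins are my own, not part of QIIME 2.

```shell
#!/usr/bin/env bash
# plugin_listed NAME: succeeds if NAME appears as "name: version" in the
# `qiime info`-style text read from stdin (assumed layout, see above).
plugin_listed() {
    grep -Eq "^[[:space:]]*$1:[[:space:]]"
}

# check_plugins NAME...: report which plugins the active environment provides.
check_plugins() {
    local info name
    info="$(qiime info 2>/dev/null)" || {
        echo "qiime not available in this environment"
        return 1
    }
    for name in "$@"; do
        if printf '%s\n' "$info" | plugin_listed "$name"; then
            echo "$name: installed"
        else
            echo "$name: MISSING"
        fi
    done
}

# The two plugins involved in the error reported in this thread.
check_plugins rescript longitudinal || true
```

Run it once in each conda environment you have; an environment reporting either plugin as missing would explain the errors seen so far.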

Try installing RESCRIPt in the amplicon distro following the instructions on GitHub:

Thanks for using RESCRIPt, especially while we work to streamline the install process.

I installed RESCRIPt using the link you provided and following Nicholas's tutorial, as shown in the command below:

conda create -y -n rescript
conda activate rescript

conda install \
    -c https://packages.qiime2.org/qiime2/{ENV_VERSION}/shotgun/passed/ \
    -c https://packages.qiime2.org/qiime2/{ENV_VERSION}/amplicon/passed/ \
    -c conda-forge -c bioconda -c qiime2 -c defaults \
    qiime2 q2cli q2templates q2-types q2-types-genomics q2-longitudinal q2-feature-classifier \
    "pandas>=0.25.3" xmltodict ncbi-datasets-pylib rescript

qiime rescript --help

This time, I ran into this error:
Error: QIIME 2 plugin 'rescript' has no action 'get-unite-data'. Did you mean 'get-ncbi-data'?

When I set up RESCRIPt, I noticed these commands got installed and showed up:

get-gtdb-data          Download, parse, and import GTDB reference data.
get-ncbi-data          Download, parse, and import NCBI sequences and taxonomies
get-ncbi-data-protein  Download, parse, and import NCBI protein sequences and taxonomies
get-ncbi-genomes       Fetch entire genomes and associated taxonomies and metadata using NCBI Datasets.
get-silva-data         Download, parse, and import SILVA database.

Hi @Beh_Yaad,

I think you skipped a very important part of the instructions:

Note: update {ENV_VERSION} in the commands below to match the QIIME 2 release.

That is, replace {ENV_VERSION} with 2023.9.
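
For install variants that use the placeholder, one way to avoid leaving a literal {ENV_VERSION} in the channel URLs is to let the shell expand it. This is only a sketch: the variable names SHOTGUN_CHANNEL and AMPLICON_CHANNEL are mine, and the URL pattern is copied from the install command quoted above.

```shell
#!/usr/bin/env bash
# Release to target; 2023.9 is the one discussed in this thread.
ENV_VERSION="2023.9"

# Build the channel URLs by expanding the variable instead of editing by hand.
SHOTGUN_CHANNEL="https://packages.qiime2.org/qiime2/${ENV_VERSION}/shotgun/passed/"
AMPLICON_CHANNEL="https://packages.qiime2.org/qiime2/${ENV_VERSION}/amplicon/passed/"

# Print them so you can confirm no placeholder is left before installing.
echo "$SHOTGUN_CHANNEL"
echo "$AMPLICON_CHANNEL"
```

You could then pass them to conda as `-c "$SHOTGUN_CHANNEL" -c "$AMPLICON_CHANNEL"` in place of the hand-edited URLs.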

Hi Mike, big thanks for your instructions!

I followed the steps in your GitHub tutorial precisely, yet I'm still missing the "get-unite-data" action used in Colin's tutorial 'How to train a UNITE ITS classifier using RESCRIPt'.

How can I go about installing the "get-unite-data" plugin?
Alternatively, is there a specific location within my conda environments where I can find this plugin?

Did you run qiime dev refresh-cache and then qiime rescript --help? What do you see after running those commands?

For the sake of clarity, can you please paste all the commands you ran in your reply, from activating your environment through installing RESCRIPt?

That is, you should be running the commands stated here for 2023.9.

Also, I realized my comment about replacing {ENV_VERSION} with 2023.9 was incorrect. That was for other RESCRIPt install options. 🤦

"Did you run qiime dev refresh-cache and then qiime rescript --help?"

No, but I have now run it and got the same output as from the following commands:

conda activate qiime2-amplicon-2023.9 

conda install -c conda-forge -c bioconda -c qiime2 \
    -c https://packages.qiime2.org/qiime2/2023.9/shotgun/passed/  \
    -c defaults   xmltodict 'q2-types-genomics>2023.5' ncbi-datasets-pylib rescript

qiime rescript --help

the commands:
  cull-seqs                    
  degap-seqs                   
  dereplicate                  
  edit-taxonomy                
  evaluate-classifications     
  evaluate-cross-validate      
  evaluate-fit-classifier      
  evaluate-seqs                
  evaluate-taxonomy            
  extract-seq-segments         
  filter-seqs-length           
  filter-seqs-length-by-taxon
  filter-taxa                  
  get-gtdb-data                
  get-ncbi-data                                      
  get-ncbi-data-protein                   
  get-ncbi-genomes             
  get-silva-data               
  merge-taxa                                         
  orient-seqs                  
  parse-silva-taxonomy         
  reverse-transcribe           
  subsample-fasta              
  trim-alignment               
*******************************************

activating my environment:

# conda environments:
#
base                  *  /Users/uw-user/miniconda3
qiime2-2023.7            /Users/uw-user/miniconda3/envs/qiime2-2023.7
qiime2-amplicon-2023.9     /Users/uw-user/miniconda3/envs/qiime2-amplicon-2023.9
qiime2-shotgun-2023.9     /Users/uw-user/miniconda3/envs/qiime2-shotgun-2023.9
rescript                 /Users/uw-user/miniconda3/envs/rescript
-------------------------------------------------------------------------------------
(base) uw-user@V74RMK29HK ~ % conda activate qiime2-amplicon-2023.9 

conda install -c conda-forge -c bioconda -c qiime2 \
    -c https://packages.qiime2.org/qiime2/2023.9/shotgun/passed/  \
    -c defaults   xmltodict 'q2-types-genomics>2023.5' ncbi-datasets-pylib rescript

qiime rescript --help

(qiime2-amplicon-2023.9) uw-user@V74RMK29HK ~ %

My apologies if things seem unclear! I was a bit mixed up myself. Should you require any additional information, please provide specifics, and I'll make sure to get it to you.

Thank you @Beh_Yaad.

Note, I edited your post using some simple markdown to make the commands stand out more clearly.

Your commands should look like this:

(base) user %   conda activate qiime2-amplicon-2023.9

(qiime2-amplicon-2023.9) user %   conda install -c conda-forge -c bioconda -c qiime2 \
    -c https://packages.qiime2.org/qiime2/2023.9/shotgun/passed/  \
    -c defaults   xmltodict 'q2-types-genomics>2023.5' ncbi-datasets-pylib rescript

(qiime2-amplicon-2023.9) user %   qiime dev refresh-cache
QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.

(qiime2-amplicon-2023.9) user %   qiime rescript --help

Do you see get-unite-data in the RESCRIPt output? If not, I'd suggest removing qiime2-amplicon-2023.9, reinstalling that version of QIIME 2, and then installing RESCRIPt. Or you can wait until the next QIIME 2 release (2024.2) comes out, as RESCRIPt will be part of both the amplicon and shotgun releases.
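
Rather than scanning the help output by eye, the check can be scripted. This is a sketch under the assumption that `qiime rescript --help` lists one action per line with the action name as the first word, as in the output pasted earlier in this thread; action_listed is a helper name I made up.

```shell
#!/usr/bin/env bash
# action_listed NAME: succeeds if NAME is the first word on some line of the
# help text read from stdin. The trailing group stops get-ncbi-data from
# matching get-ncbi-data-protein.
action_listed() {
    grep -Eq "^[[:space:]]*$1([[:space:]]|\$)"
}

if command -v qiime >/dev/null 2>&1; then
    if qiime rescript --help 2>/dev/null | action_listed get-unite-data; then
        echo "get-unite-data is available"
    else
        echo "get-unite-data is not available in this environment"
    fi
else
    echo "qiime not found: activate your QIIME 2 environment first"
fi
```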

Great, thank you!
As I mentioned earlier, I need to train a UNITE ITS classifier with RESCRIPt. The tutorial starts off with this command:

qiime rescript get-unite-data \
    --p-version 9.0 \
    --p-taxon-group eukaryotes \
    --p-cluster-id dynamic \
    --p-no-singletons \
    --verbose \
    --output-dir uniteDB

And Colin said that he ran this command within the qiime2-amplicon-2023.9 environment.

I'm wondering, if 'qiime2-shotgun-2023.9' comes with RESCRIPt pre-installed, does that mean I can use the shotgun version without having to switch between different environments?

Hi @Beh_Yaad,

I realized, with the help of the other devs, that the get-unite-data action is slated for the next QIIME 2 "amplicon" release. Therefore, the only way to obtain the get-unite-data action right now is with the developer version, that is, by running the following command:

pip install git+https://github.com/bokulich-lab/RESCRIPt.git

after running the conda install command for the 'amplicon' distribution, which you've already done.

Once 2024.2 is released, you won't need to run the pip install command.

Let us know if this works.
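
Putting the pieces from this thread in order, the whole sequence might look like the sketch below. The run wrapper and the DRY_RUN variable are my own additions so the steps can be previewed safely; by default the script only prints each command, and setting DRY_RUN=0 executes them. The conda and pip commands themselves are the ones quoted above.

```shell
#!/usr/bin/env bash
# Print each step by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"    # preview only (shell quoting is not shown)
    else
        "$@"
    fi
}

install_dev_rescript() {
    # 1. RESCRIPt and its dependencies from conda.
    run conda install -c conda-forge -c bioconda -c qiime2 \
        -c https://packages.qiime2.org/qiime2/2023.9/shotgun/passed/ \
        -c defaults xmltodict 'q2-types-genomics>2023.5' ncbi-datasets-pylib rescript
    # 2. Development version on top, which carries get-unite-data.
    run pip install git+https://github.com/bokulich-lab/RESCRIPt.git
    # 3. Rebuild the plugin cache, then list the available actions.
    run qiime dev refresh-cache
    run qiime rescript --help
}

# Run this inside the activated qiime2-amplicon-2023.9 environment.
install_dev_rescript
```

Keeping the pip step after the conda step matters here, because the conda package would otherwise replace the development version.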
