Importing downloaded latest UNITE database for ITS classifier training

Can anyone please help me import the latest UNITE QIIME 2 ITS training files?
I have been struggling with the file, which is in a .gz format:

qiime tools import \
--type 'FeatureData[Sequence]' \
--input-path "C:\Users\User\Desktop\qiime2-ITS1\C5547B97AAA979E45F79DC4C8C4B12113389343D7588716B5AD330F8BDB300C9\C5547B97AAA979E45F79DC4C8C4B12113389343D7588716B5AD330F8BDB300C9.tar" \
--output-path unite.qza

qiime tools import \
--type 'FeatureData[Taxonomy]' \
--input-format HeaderlessTSVTaxonomyFormat \
--input-path sh_taxonomy_qiime_ver7_dynamic_01.12.2017.txt \
--output-path unite-taxonomy.qza

I used the above commands and got an error. Can anyone walk me through this?

Regards.

Betty

Hello Betty,

It's good to see you on the forums again. Thanks for posting the commands you ran.

It looks like the UNITE team distributes files in the .tgz format. That's a tar archive compressed with gzip, so we will have to decompress and extract it to see the files inside.

:package: :soon: :card_file_box:

After downloading this file, try extracting it with a command like this:

cd path/to/the/downloaded/file
tar -xf C5547B97AAA979E45F79DC4C8C4B12113389343D7588716B5AD330F8BDB300C9.tgz

When I run this command, it extracts the .tgz file and makes a new directory called sh_qiime_release_10.05.2021.
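`tar` can also list an archive's contents without extracting it, which is a quick way to confirm that the download holds the `sh_refs_*.fasta` and `sh_taxonomy_*.txt` files. A minimal sketch as a bash function, assuming GNU or BSD `tar`; the function name `extract_unite` and its destination argument are hypothetical:

```shell
# Hypothetical helper: check, extract, and summarize a UNITE .tgz release.
# $1 = path to the .tgz file, $2 = destination directory (default: current)
extract_unite() {
  local archive=$1 dest=${2:-.}
  # -t lists the contents, -z handles gzip compression, -f names the file.
  if ! tar -tzf "$archive" > /dev/null; then
    echo "Cannot read $archive as a gzipped tar archive" >&2
    return 1
  fi
  mkdir -p "$dest"
  # -x extracts; -C unpacks into the destination directory.
  tar -xzf "$archive" -C "$dest" || return 1
  # List the reference sequences and taxonomy tables that were unpacked.
  find "$dest" -type f \( -name 'sh_refs_qiime_*.fasta' -o -name 'sh_taxonomy_qiime_*.txt' \) | sort
}
```

Usage: call `extract_unite` with the `.tgz` name from the `tar` command above and a destination folder; if it prints nothing, the archive does not contain the expected QIIME files.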

Were you able to extract that .tgz file and see that directory?

I can see this directory when I run ls -alht:

(base) cbrisl@MacBook-Air Downloads % ls -alht
total 1947200
drwxr-xr-x+  84 cbrisl  staff   2.6K Dec 20 00:36 ..
drwx------@ 161 cbrisl  staff   5.0K Dec 20 00:35 .
-rw-r--r--@   1 cbrisl  staff    48M Dec 20 00:32 C5547B97AAA979E45F79DC4C8C4B12113389343D7588716B5AD330F8BDB300C9.tgz
drwxr-xr-x@  10 cbrisl  staff   320B May 10  2021 sh_qiime_release_10.05.2021
...

Inside that directory are the files you can import into Qiime2:

(base) cbrisl@MacBook-Air Downloads % ls -alht sh_qiime_release_10.05.2021 
total 213456
drwx------@ 161 cbrisl  staff   5.0K Dec 20 00:35 ..
drwxr-xr-x@  10 cbrisl  staff   320B May 10  2021 .
drwxr-xr-x@   8 cbrisl  staff   256B May 10  2021 developer
-rw-r--r--@   1 cbrisl  staff    33K May 10  2021 QIIME_ITS_readme_10.05.2021.pdf
-rw-r--r--@   1 cbrisl  staff   7.9M May 10  2021 sh_taxonomy_qiime_ver8_dynamic_10.05.2021.txt
-rw-r--r--@   1 cbrisl  staff   8.2M May 10  2021 sh_taxonomy_qiime_ver8_99_10.05.2021.txt
-rw-r--r--@   1 cbrisl  staff   5.6M May 10  2021 sh_taxonomy_qiime_ver8_97_10.05.2021.txt
-rw-r--r--@   1 cbrisl  staff    30M May 10  2021 sh_refs_qiime_ver8_dynamic_10.05.2021.fasta
-rw-r--r--@   1 cbrisl  staff    31M May 10  2021 sh_refs_qiime_ver8_99_10.05.2021.fasta
-rw-r--r--@   1 cbrisl  staff    22M May 10  2021 sh_refs_qiime_ver8_97_10.05.2021.fasta
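The `.fasta` and `.txt` files come in matching pairs (dynamic, 99, 97), and each member of a pair is imported with the same `qiime tools import` flags used earlier in this thread. A minimal sketch that picks one pair and imports both; `import_unite_pair` and `QIIME_CMD` (which only lets another command stand in for `qiime` during a dry run) are hypothetical names, and the file-name pattern is assumed from the listing above:

```shell
# Command used to call QIIME 2; override only for dry runs or tests.
QIIME_CMD=${QIIME_CMD:-qiime}

# Hypothetical helper: import one matching UNITE sequence/taxonomy pair.
# $1 = extracted release directory, $2 = dynamic | 99 | 97
import_unite_pair() {
  local dir=$1 level=$2
  local fasta taxonomy
  # Assumes names like sh_refs_qiime_ver8_<level>_<date>.fasta
  fasta=$(find "$dir" -maxdepth 1 -name "sh_refs_qiime_*_${level}_*.fasta" | head -n 1)
  taxonomy=$(find "$dir" -maxdepth 1 -name "sh_taxonomy_qiime_*_${level}_*.txt" | head -n 1)
  if [ -z "$fasta" ] || [ -z "$taxonomy" ]; then
    echo "No '${level}' pair found in $dir" >&2
    return 1
  fi
  # Reference sequences become FeatureData[Sequence].
  "$QIIME_CMD" tools import \
    --type 'FeatureData[Sequence]' \
    --input-path "$fasta" \
    --output-path unite.qza || return 1
  # Header-less, tab-separated ID/taxonomy table becomes FeatureData[Taxonomy].
  "$QIIME_CMD" tools import \
    --type 'FeatureData[Taxonomy]' \
    --input-format HeaderlessTSVTaxonomyFormat \
    --input-path "$taxonomy" \
    --output-path unite-taxonomy.qza
}
```

For example, `import_unite_pair sh_qiime_release_10.05.2021 99` would produce `unite.qza` and `unite-taxonomy.qza` from the two 99% files.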

Let me know if you were able to extract that .tgz file and could see the Qiime2 compatible files inside.

Colin :whale2:


Hi Colin

Thanks a bunch. That worked.
Another issue I have is that I am struggling to train the classifier. I used this command line:

qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads unite-taxonomy.qza \
--i-reference-taxonomy unite.qza \
--o-classifier unite-ver8_99_10.05.2021.qza
Output
There were some problems with the command:
(1/2) Invalid value for '--i-reference-reads': Expected an artifact of at
least type FeatureData[Sequence]. An artifact of type FeatureData[Taxonomy]
was provided.
(2/2) Invalid value for '--i-reference-taxonomy': Expected an artifact of at
least type FeatureData[Taxonomy]. An artifact of type FeatureData[Sequence]
was provided.
My question is: which of these files should I use for the training?
I have these 2 files imported already to qiime2: unite-taxonomy.qza and unite.qza

I am now working on ITS1 and ITS2 data on my VirtualBox VM.

Please advise.

I'm glad this worked out, and you got these files imported!

I think this next error should be pretty quick to address:

Looks like these two input files got switched. Try this:

--i-reference-reads unite.qza \
--i-reference-taxonomy unite-taxonomy.qza
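Because these two artifacts are easy to mix up, one option is to check each artifact's semantic type with `qiime tools peek` before starting the long training run. A minimal sketch as bash functions, assuming `qiime tools peek` prints a line starting with `Type:`; `artifact_type`, `train_unite_classifier` and `QIIME_CMD` are hypothetical names:

```shell
# Command used to call QIIME 2; override only for dry runs or tests.
QIIME_CMD=${QIIME_CMD:-qiime}

# Print the semantic type of an artifact, e.g. FeatureData[Sequence].
artifact_type() {
  "$QIIME_CMD" tools peek "$1" | sed -n 's/^Type:[[:space:]]*//p'
}

# Hypothetical wrapper: refuse to train if the inputs look swapped.
train_unite_classifier() {
  local reads=$1 taxonomy=$2 out=$3
  if [ "$(artifact_type "$reads")" != 'FeatureData[Sequence]' ]; then
    echo "$reads is not FeatureData[Sequence]; are the inputs swapped?" >&2
    return 1
  fi
  if [ "$(artifact_type "$taxonomy")" != 'FeatureData[Taxonomy]' ]; then
    echo "$taxonomy is not FeatureData[Taxonomy]; are the inputs swapped?" >&2
    return 1
  fi
  "$QIIME_CMD" feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads "$reads" \
    --i-reference-taxonomy "$taxonomy" \
    --o-classifier "$out"
}
```

With the file names from this thread, `train_unite_classifier unite.qza unite-taxonomy.qza unite-ver8_99_10.05.2021.qza` runs the training, while the swapped order stops immediately with a message instead of a type error.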

Hi Colin

Thank you for the correction. I changed it just like you directed and the command ran but did not complete. This is the command I used:

qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads unite.qza \
--i-reference-taxonomy unite-taxonomy.qza \
--o-classifier unite-ver8_99_10.05.2021.qza

It returned as "Killed"

After searching the forum, I realised that quite a number of people have trouble training classifiers.
Can you please advise what this means?
Also, if you could share an already trained classifier with me, I would appreciate it.
I must submit the fungal results when work resumes in January 2022.

Regards.

Ah, OK. Thanks for telling me more.

The status of 'Killed' means that the computer or server you are running this on intentionally canceled the command. Often this is because it ran out of RAM / memory. Training a classifier takes a lot of memory, so this is probably what happened.
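In bash, a process stopped by the kernel this way shows up as `Killed`, and its exit status is 137 (128 + 9, where 9 is SIGKILL). A minimal sketch of a wrapper that reports this; `run_reporting_kill` is a hypothetical name, and a SIGKILL can also come from something other than running out of memory:

```shell
# Hypothetical wrapper: run a command and explain a SIGKILL exit status.
run_reporting_kill() {
  local status=0
  "$@" || status=$?
  # The shell reports a process ended by signal N as status 128 + N;
  # SIGKILL is signal 9, so the status is 137.
  if [ "$status" -eq 137 ]; then
    echo "Command was killed with SIGKILL (status 137); on Linux this is often the out-of-memory killer." >&2
  fi
  return "$status"
}
```

For example, `run_reporting_kill qiime feature-classifier fit-classifier-naive-bayes ...` with the flags above. On Linux, `dmesg | grep -i -E 'out of memory|killed process'` (which may need `sudo`) usually confirms whether the out-of-memory killer was responsible.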

How much memory does your computer or VM have? Could you run this command on a server or computer with more memory so that the command will be able to complete?
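Inside the Linux VM, `free -h` answers this directly. The sketch below reads the same numbers from `/proc/meminfo` (Linux only); the file path is a parameter only so the function can be tried on a saved copy, and `report_memory` is a hypothetical name:

```shell
# Hypothetical helper: print total and available RAM in GiB.
# /proc/meminfo reports values in kB (kibibytes); 1 GiB = 1048576 kB.
report_memory() {
  local meminfo=${1:-/proc/meminfo}
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    END {
      printf "total: %.1f GiB\navailable: %.1f GiB\n", total / 1048576, avail / 1048576
    }
  ' "$meminfo"
}
```

Running `report_memory` inside the VM shows how much of the RAM allocated in VirtualBox is actually free for QIIME 2.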

EDIT: I forgot to mention this! If you are OK with using an older version of UNITE and Qiime2, then you could use the pretrained UNITE v8.0 classifiers. That way you can skip the memory-hungry fit-classifier-naive-bayes step and move on to classifying your ASVs! Let me know if you would like to try that.

Hi
I downloaded the dynamic version of the classifier and I used this command:

qiime feature-classifier classify-sklearn \
--i-classifier unite-ver8-dynamic-classifier.qza \
--i-reads dada2-single-end-rep-seqs.qza \
--o-classification taxonomy.qza

I got this error:

Plugin error from feature-classifier:

The scikit-learn version (0.20.2) used to generate this artifact does not match the current version of scikit-learn installed (0.23.1). Please retrain your classifier for your current deployment to prevent data-corruption errors.

Debug info has been saved to /tmp/qiime2-q2cli-err-09bfhzi1.log

Verbose error log:
Traceback (most recent call last):
  File "/home/qiime2/miniconda/envs/qiime2-2020.11/lib/python3.6/site-packages/q2cli/commands.py", line 329, in __call__
    results = action(**arguments)
  File "", line 2, in classify_sklearn
  File "/home/qiime2/miniconda/envs/qiime2-2020.11/lib/python3.6/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
    spec.view_type, recorder)
  File "/home/qiime2/miniconda/envs/qiime2-2020.11/lib/python3.6/site-packages/qiime2/sdk/result.py", line 289, in _view
    result = transformation(self._archiver.data_dir)
  File "/home/qiime2/miniconda/envs/qiime2-2020.11/lib/python3.6/site-packages/qiime2/core/transform.py", line 70, in transformation
    new_view = transformer(view)
  File "/home/qiime2/miniconda/envs/qiime2-2020.11/lib/python3.6/site-packages/q2_feature_classifier/_taxonomic_classifier.py", line 64, in _1
    % (sklearn_version, sklearn.__version__))
ValueError: The scikit-learn version (0.20.2) used to generate this artifact does not match the current version of scikit-learn installed (0.23.1). Please retrain your classifier for your current deployment to prevent data-corruption errors.

How do I resolve this error?
Kindly assist.

Thanks and regards.

The UNITE classifiers I linked above were trained with qiime2 2019.1, so you will have to install that older version of Qiime in a conda environment to use it.
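A `.qza` file is a zip archive, and each one contains a `VERSION` file whose `framework:` line records the QIIME 2 release that created it; a pretrained classifier needs a QIIME 2 environment from that same release. A minimal sketch that reads that line with Python's standard `zipfile` module (so `unzip` is not needed), assuming the archive layout `<uuid>/VERSION`; `qza_framework_version` is a hypothetical name:

```shell
# Hypothetical helper: print the QIIME 2 release that wrote an artifact.
# Assumes the layout <uuid>/VERSION containing a "framework: X" line.
qza_framework_version() {
  python3 - "$1" <<'PY'
import sys
import zipfile

with zipfile.ZipFile(sys.argv[1]) as zf:
    for name in zf.namelist():
        parts = name.split("/")
        # Only the top-level VERSION file inside the <uuid>/ directory.
        if len(parts) == 2 and parts[1] == "VERSION":
            for line in zf.read(name).decode().splitlines():
                if line.startswith("framework:"):
                    print(line.split(":", 1)[1].strip())
                    sys.exit(0)
sys.exit("No VERSION file with a framework line found")
PY
}
```

For example, `qza_framework_version unite-ver8-dynamic-classifier.qza` should print a 2019.1 release number for the classifiers linked above, which is the environment to run them in.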

Let me know if you need a hand installing qiime2-2019.1 or if you run into any more issues getting the classifier to run.

Hi

Thank you for the tip.

I would like to let you know that I am using a virtual machine with the latest QIIME 2 installed on my Windows laptop.
How do I install qiime2-2019.1 in a conda environment?
Do you know of any tutorial I can follow to install qiime2-2019.1 on VirtualBox?

Thank you for your help.

Betty


You can download the qiime2-2019.1 image here!
https://data.qiime2.org/distro/core/virtualbox-images.txt

(You did mention you were using a VM, I should have remembered that.)

Hi Colin
Thanks for the support. Merry Xmas to you!

I downloaded the qiime2 2019.1 version and ran the classifier and I got this error:

qiime feature-classifier classify-sklearn \
--i-classifier unite-ver8-dynamic-classifier.qza \
--i-reads dada2-single-end-rep-seqs.qza \
--o-classification taxonomy.qza \
--verbose

Traceback (most recent call last):
  File "/home/qiime2/miniconda/envs/qiime2-2019.1/lib/python3.6/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "</home/qiime2/miniconda/envs/qiime2-2019.1/lib/python3.6/site-packages/decorator.py:decorator-gen-338>", line 2, in classify_sklearn
  File "/home/qiime2/miniconda/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/action.py", line 225, in bound_callable
    spec.view_type, recorder)
  File "/home/qiime2/miniconda/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/result.py", line 287, in _view
    result = transformation(self._archiver.data_dir)
  File "/home/qiime2/miniconda/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/core/transform.py", line 70, in transformation
    new_view = transformer(view)
  File "/home/qiime2/miniconda/envs/qiime2-2019.1/lib/python3.6/site-packages/q2_feature_classifier/_taxonomic_classifier.py", line 72, in _1
    pipeline = joblib.load(os.path.join(dirname, 'sklearn_pipeline.pkl'))
  File "/home/qiime2/miniconda/envs/qiime2-2019.1/lib/python3.6/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 598, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/home/qiime2/miniconda/envs/qiime2-2019.1/lib/python3.6/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 526, in _unpickle
    obj = unpickler.load()
  File "/home/qiime2/miniconda/envs/qiime2-2019.1/lib/python3.6/pickle.py", line 1050, in load
    dispatch[key[0]](self)
  File "/home/qiime2/miniconda/envs/qiime2-2019.1/lib/python3.6/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 352, in load_build
    self.stack.append(array_wrapper.read(self))
  File "/home/qiime2/miniconda/envs/qiime2-2019.1/lib/python3.6/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 195, in read
    array = self.read_array(unpickler)
  File "/home/qiime2/miniconda/envs/qiime2-2019.1/lib/python3.6/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 141, in read_array
    array = unpickler.np.empty(count, dtype=self.dtype)
MemoryError

Plugin error from feature-classifier:

See above for debug info.
Kindly help with this.

Regards.

Hello again Betty,

Did you check your errors for clues? :mag_right:

I saw it says MemoryError, which is probably because there's not enough memory on this machine.

How much memory / RAM do you have on your host Windows machine?
How much memory have you allocated to your VM?

You could also try reducing the number of reads per batch, for example with --p-reads-per-batch 1000, and see if that reduces the memory needed enough for the command to finish.
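Smaller batches mean fewer reads are processed at once. Note, though, that in the traceback above the `MemoryError` is raised inside `joblib.load`, i.e. while the classifier itself is being loaded and before any reads are classified, so smaller batches may not be enough on this machine. A minimal sketch that retries with progressively smaller batches anyway; `classify_with_backoff` and `QIIME_CMD` are hypothetical names, and the batch sizes are arbitrary examples:

```shell
# Command used to call QIIME 2; override only for dry runs or tests.
QIIME_CMD=${QIIME_CMD:-qiime}

# Hypothetical helper: retry classify-sklearn with smaller batch sizes.
# Usage: classify_with_backoff CLASSIFIER READS OUTPUT SIZE [SIZE ...]
# Prints the batch size that succeeded; returns 1 if every size failed.
classify_with_backoff() {
  local classifier=$1 reads=$2 out=$3
  shift 3
  local batch
  for batch in "$@"; do
    echo "Trying --p-reads-per-batch ${batch}" >&2
    # QIIME 2's own messages go to stderr so stdout carries only the result.
    if "$QIIME_CMD" feature-classifier classify-sklearn \
        --i-classifier "$classifier" \
        --i-reads "$reads" \
        --o-classification "$out" \
        --p-reads-per-batch "$batch" >&2; then
      echo "$batch"
      return 0
    fi
  done
  return 1
}
```

For example: `classify_with_backoff unite-ver8-dynamic-classifier.qza dada2-single-end-rep-seqs.qza taxonomy.qza 1000 500 250`.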

Hi Colin

Thanks for the information.

Yes, I read the debug info, I just want to double-check with you.
I am unable to run the command even after reducing the reads per batch to 500.
Is there any other way out that you can advise?

My laptop is already overstretched; I think I just need to get a better machine.

Thank you for your assistance.

Much regards.


Huh... I wonder what the minimum RAM needed for this classifier is.

How much memory / RAM do you have on your host Windows machine?
How much memory have you allocated to your VM?

Hi Colin
Thank you for your help.
I was trying to use another machine to run the classifier but I am yet to do it.
I have an 8GB machine and I allocated 2084 MB for the VM.

Regards.

Hello Betty,

Thanks for telling me more.

Having 8 GB on the host machine and 2 GB for the VM is pretty limiting. Try closing all other programs on the host machine and increasing the memory allocated to the VM to 6 GB (or 6000 MB).
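VirtualBox sets a VM's base memory in MB (strictly, MiB), so 6 GB is 6 × 1024 = 6144 MB. A minimal sketch that computes a value from the host's RAM while keeping some for Windows itself, and prints (rather than runs) the matching `VBoxManage modifyvm` command; the 2 GB reserve and the VM name are assumptions, and `vm_memory_mb` and `print_vbox_command` are hypothetical names:

```shell
# Hypothetical helper: MB to give the VM after reserving RAM for the host.
# $1 = host RAM in GB, $2 = GB kept for the host OS (assumed default: 2)
vm_memory_mb() {
  local host_gb=$1 reserve_gb=${2:-2}
  if [ "$host_gb" -le "$reserve_gb" ]; then
    echo "Not enough host RAM to reserve ${reserve_gb} GB" >&2
    return 1
  fi
  echo $(( (host_gb - reserve_gb) * 1024 ))
}

# Print the VirtualBox command; the VM must be powered off when it is run.
print_vbox_command() {
  local vm_name=$1 mb=$2
  printf 'VBoxManage modifyvm "%s" --memory %s\n' "$vm_name" "$mb"
}
```

For an 8 GB host, `vm_memory_mb 8` gives 6144, and `print_vbox_command "QIIME 2 VM" 6144` prints the command to run on the Windows host (with your VM's actual name), where `VBoxManage` is typically installed alongside VirtualBox. The same value can also be set in the VM's System settings.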

Let me know how it goes. I think the best solution will be to find a machine with more memory to run this step, so let me know if you find that too!


Dear Colin
Compliments of the new year to you, again.

I waited for resumption at my workplace and I tried the classifier again but I am still unable to do it.
I allocated 7 GB to the VM, but it is still not working. I do not know why, because I read that about 6 GB of RAM is needed to run the classifier command.
I actually do not know what to do as I have to submit this result before the end of the month.
I now know that machine resources are a vital part of data analysis.

Any suggestion or help from your side?

Much thanks.

Betty

Hi @bettya - as @colinbrislawn has mentioned a few times above, the machine you're trying to run this analysis on might not be capable of supporting these commands. Do you have access to an institutional HPC? Or a department server? Many funding agencies also provide free or low-cost compute infrastructure for awardees.
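On a shared cluster, this step is usually submitted as a batch job that requests enough memory up front. A minimal sketch for a SLURM cluster, assuming SLURM is the scheduler and a conda environment with QIIME 2 exists there; the environment name `qiime2-env`, the 32G memory request and the time limit are placeholders, not measured requirements, and `write_training_job` is a hypothetical name:

```shell
# Hypothetical helper: write a SLURM job script that trains the classifier.
# $1 = output script path, $2 = memory request (placeholder default: 32G)
write_training_job() {
  local script=$1 mem=${2:-32G}
  cat > "$script" <<EOF
#!/usr/bin/env bash
#SBATCH --job-name=unite-classifier
#SBATCH --mem=${mem}
#SBATCH --cpus-per-task=1
#SBATCH --time=24:00:00

# Placeholder environment name; use the QIIME 2 env installed on the cluster.
eval "\$(conda shell.bash hook)"
conda activate qiime2-env

qiime feature-classifier fit-classifier-naive-bayes \\
  --i-reference-reads unite.qza \\
  --i-reference-taxonomy unite-taxonomy.qza \\
  --o-classifier unite-ver8_99_10.05.2021.qza
EOF
}
```

For example, `write_training_job train_unite.sh 32G` followed by `sbatch train_unite.sh`; the cluster's documentation will say which memory limits and module or conda setup apply there.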
