help: training ITS classifier

Hi!
I am trying to train a classifier for ITS sequences using UNITE version 9 (PlutoF DOI) with qiime2-2022.11, on a computer with a Ryzen 9 3900X processor and 64 GB of RAM.

I followed the instructions in the tutorial "Training feature classifiers with q2-feature-classifier".

Fixing formatting errors (converting the sequences to uppercase and removing spaces):

awk '/^>/ {print($0)}; /^[^>]/ {print(toupper($0))}' sh_refs_qiime_ver9_99_s_25.07.2023_dev.fasta | tr -d ' ' > sh_refs_qiime_ver9_99_s_25.07.2023_dev.fasta_uppercase.fasta
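
A quick sanity check for this step, as a minimal sketch: count the sequence lines that still contain lowercase letters or spaces after the fix. The helper name `count_bad_seq_lines` and the tiny demo FASTA are made up for illustration; only the awk/tr pipeline comes from the post above.

```shell
# Count sequence (non-header) lines that still contain lowercase letters or spaces.
# count_bad_seq_lines is a hypothetical helper name, not part of QIIME 2.
count_bad_seq_lines() {
    grep -v '^>' "$1" | grep -c '[[:lower:] ]' || true
}

# Made-up two-record FASTA, used only to demonstrate the check.
demo=$(mktemp)
printf '>seq1\nACGTacgt\n>seq2\nACGT\n' > "$demo"
echo "before: $(count_bad_seq_lines "$demo")"

# Same awk/tr fix as in the post, applied to the demo file.
fixed=$(mktemp)
awk '/^>/ {print($0)}; /^[^>]/ {print(toupper($0))}' "$demo" | tr -d ' ' > "$fixed"
echo "after: $(count_bad_seq_lines "$fixed")"
```

On the demo file this reports 1 before and 0 after. Running `count_bad_seq_lines` on the real uppercase FASTA should likewise give 0 if the fix worked.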

Importing the FASTA file into QIIME 2:

qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path sh_refs_qiime_ver9_99_s_25.07.2023_dev.fasta_uppercase.fasta \
  --output-path unite-ver9_sequence.qza

Importing the taxonomy file:

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-path sh_taxonomy_qiime_ver9_99_s_25.07.2023_dev.txt \
  --output-path unite-ver9_taxonomy.qza \
  --input-format HeaderlessTSVTaxonomyFormat

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads developer/unite-ver9_sequence.qza \
  --i-reference-taxonomy developer/unite-ver9_taxonomy.qza \
  --o-classifier unite-ver9-classifier.qza
Killed

All of the commands ran fine except the last one, which was terminated several times
with the message "Killed" after about 30 minutes of running.
I would appreciate help identifying the cause and fixing it.
Thank you.

Hi @bishnu,

Can you run this command with the verbose flag?
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads developer/unite-ver9_sequence.qza \
  --i-reference-taxonomy developer/unite-ver9_taxonomy.qza \
  --o-classifier unite-ver9-classifier.qza \
  --verbose

My suspicion is that you are running out of memory during this last step. How much memory do you have on your machine?
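
In case it helps, here is a minimal sketch for checking total and currently available RAM on Linux by reading /proc/meminfo. The function name `mem_gib` is made up, and this assumes a Linux system; `free -h` reports similar numbers where it is installed.

```shell
# Print total and currently available RAM in GiB (Linux only).
# mem_gib is a hypothetical helper; it reads /proc/meminfo unless a path is given.
mem_gib() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { printf "total=%.1fGiB available=%.1fGiB\n", total / 1048576, avail / 1048576 }' \
        "${1:-/proc/meminfo}"
}

# Only call it where /proc/meminfo exists.
if [ -r /proc/meminfo ]; then
    mem_gib
fi
```

The "available" figure is the more relevant one here, since other running programs reduce what the training step can actually use.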

:turtle:

Hi, thank you so much for the suggestion. I will run the command you suggested. I am running the analysis on a computer with 64 GB of memory.

5 off-topic replies have been merged into an existing topic: memory usage for skl-nb classifier

Please keep replies on-topic in the future.

Hi, here is the output of running the command with --verbose:
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads unite-ver9_sequence.qza \
  --i-reference-taxonomy unite-ver9_taxonomy.qza \
  --o-classifier unite-ver9-classifier.qza \
  --verbose
/home/bishnu/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_feature_classifier/classifier.py:102: UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 0.24.1. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)
warnings.warn(warning, UserWarning)
Killed
This sounds like an incompatibility between the classifier and the QIIME 2 version I am using. Is there any way I can check compatibility beforehand?

Hey @bishnu,

The last line of the output ("Killed") confirms that you did in fact run out of memory on your last command. The scikit-learn UserWarning above it is only a notice about version pinning and is not what stopped the run. Unless any other mods are able to chime in with suggestions here, I would say you'll most likely need to find a way to run this on a machine with more memory. If you have access to an HPC, this would be my next recommendation.
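
For anyone who wants to confirm this on their own machine: a process stopped by SIGKILL, which is the signal the kernel's out-of-memory killer sends, leaves exit status 137 (128 + 9) in the shell. The sketch below shows that with a harmless stand-in command rather than the real training run; `describe_exit` is a made-up helper name. On Linux, the kernel log (for example via `dmesg`, which may need root) usually records out-of-memory kills as well.

```shell
# Translate a shell exit status into a short description.
# describe_exit is a hypothetical helper, not part of QIIME 2.
describe_exit() {
    case "$1" in
        0)   echo "finished normally" ;;
        137) echo "killed by SIGKILL (128 + 9), possibly the out-of-memory killer" ;;
        *)   echo "exit status $1" ;;
    esac
}

# Stand-in for a killed job: a child shell that sends SIGKILL to itself.
status=0
sh -c 'kill -9 $$' || status=$?
describe_exit "$status"
```

After a real run, the same check would be `echo $?` (or `describe_exit $?`) typed immediately after the qiime command returns.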

Cheers :lizard:

Hello Bishnu,

One other option is to use a version of the database trained by another person or team.

In this case, I have this combination of Unite v9 and qiime2-2022.11 already built, if you would like to use it:

I also had to adjust the uppercase letters, just like you did.
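
Since the warning earlier in this thread says a classifier only works with the scikit-learn version it was trained with, it may be worth checking which version your environment has before using a pre-trained classifier. A minimal sketch, meant to be run inside the activated QIIME 2 conda environment; `sklearn_version` is a made-up helper name:

```shell
# Print the scikit-learn version visible to the current Python, if any.
# sklearn_version is a hypothetical helper, not part of QIIME 2.
sklearn_version() {
    python -c 'import sklearn; print(sklearn.__version__)' 2>/dev/null \
        || echo "scikit-learn is not importable here"
}

sklearn_version
```

In the qiime2-2022.11 environment the result should match the 0.24.1 reported in the warning above.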

Hi @colinbrislawn,
Thank you so much for suggesting the pretrained classifier; as I do not have sufficient memory to train the classifier, I will go for it.

Hi @lizgehret,
Thank you so much for helping identify the problem and suggesting the alternative. I do not have access to an HPC, so I will go with the pretrained alternative.
