UNITE+INSDC Dataset for ITS fungi

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads unite+insdc_seqs_dynamic.2023.qza \
  --i-reference-taxonomy unite+insdc_taxonomy_dynamic.2023.qza \
  --o-classifier unite+insdc_dynamic_all_classifier.2023.qza \
  --p-classify--chunk-size 10000 \
  --verbose
/programs/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/q2_feature_classifier/classifier.py:102: UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 0.24.1. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)
warnings.warn(warning, UserWarning)

I noticed that the --p-classify--chunk-size parameter helped divide the imported dataset into smaller chunks. However, after running the command for over 48 hours, I have seen neither results nor any sign of progress. Given that the UNITE+INSDC reference sequences are considerably more abundant than those in UNITE alone, is such an extended running time to be expected?

Hi @Shuai_Man, This step can take a long time. For example, training the Silva classifier takes us about four days.

Have you tried running htop or top (if htop isn't available) on the machine where the job is running? That will let you see if the job is using a lot of CPU. If you see one of the CPUs spiked at about 95-100%, the command should be working as expected.
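A scripted version of that check can be handy for jobs you revisit over several days. This is a minimal sketch, not QIIME tooling: it assumes the training command is still named fit-classifier-naive-bayes in the process table, and the pgrep pattern may need adjusting on your machine.

```shell
# Look up the training job by its command line and report its CPU usage.
# A %CPU near 100 on one core suggests the job is still computing.
pid=$(pgrep -f "fit-classifier-naive-bayes" | head -n 1)
if [ -n "$pid" ]; then
    ps -p "$pid" -o pid,%cpu,%mem,etime,comm
else
    echo "no matching process found"
fi
```

Unlike an interactive top session, this can be dropped into a cron job or a loop with `sleep` to log progress over time.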

Dear Dr. Caporaso, thank you for your advice! I am currently using 'top' to monitor the machine, and I do see the spike around 95-100%. I hope this approach works well.


qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads unite+insdc_seqs_dynamic.2023.qza \
  --i-reference-taxonomy unite+insdc_taxonomy_dynamic.2023.qza \
  --o-classifier unite+insdc_dynamic_all_classifier.2023.qza \
  --verbose
/programs/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/q2_feature_classifier/classifier.py:102: UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 0.24.1. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)
warnings.warn(warning, UserWarning)
Killed

It appears that the process was killed with no further error messages. Could this be a RAM limitation? Do you think using the --p-classify--chunk-size parameter might help, or should I switch to a machine with more memory?
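When a long-running job ends with only "Killed" and no traceback, the usual suspect on Linux is the kernel's out-of-memory (OOM) killer, and the kernel log will usually confirm it. This is a generic diagnostic sketch, not a QIIME command; reading dmesg may require elevated permissions, and on systemd machines `journalctl -k` is an alternative.

```shell
# Search recent kernel messages for OOM-killer activity.
# A line like "Out of memory: Killed process <pid> (python)" confirms
# that the kernel, not QIIME, terminated the job.
dmesg | grep -i -E "killed process|out of memory" | tail -n 5
```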

Thank you.

Hi @Shuai_Man, I suspect this is a RAM issue. Decreasing the chunk size can potentially help, but given how long this step takes (and how frustrating it is to get an error after the job has been running for a few days; I imagine you're experiencing that frustration now), moving to a machine with more memory is probably the best option.

Oh, thank you! It does strike a nerve because I've been stuck on this step for two weeks. Currently, I've been using a large memory Gen2 machine with 80-112 cores and 512GB of RAM. The only option left is the extra-large memory machine with 96-112 cores and 1024GB of RAM. I will go and try that one. Thank you for the advice!

That seems pretty excessive in terms of memory requirements, but I haven't built an ITS classifier in a while and memory does increase with the complexity of the database.

@SoilRotifer, @BenKaehler, @Nicholas_Bokulich - does anything jump out to you here? Would you expect that it could take >0.5TB to train an ITS classifier?

To specify the resource I used:

Abarenkov, Kessy; Zirk, Allan; Piirmann, Timo; Pöhönen, Raivo; Ivanov, Filipp; Nilsson, R. Henrik; Kõljalg, Urmas (2023): Full UNITE+INSD dataset for eukaryotes. Version 18.07.2023. UNITE Community. PlutoF DOI.

No, this does not make sense. It has been a while, but I normally train a UNITE+INSDC classifier on my laptop (16 GB) without issue. @Shuai_Man, there is a lot of advice on the forum about reducing memory demands while training the classifier; that should work in your case as well.
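The most common memory-reduction advice on the forum is to lower the chunk size so fewer sequences are vectorized at once. As an illustration only (1000 is an arbitrary example value, not a tested recommendation; smaller chunks trade memory for runtime):

```shell
# Rerun training with a smaller chunk size to reduce peak memory use.
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads unite+insdc_seqs_dynamic.2023.qza \
  --i-reference-taxonomy unite+insdc_taxonomy_dynamic.2023.qza \
  --o-classifier unite+insdc_dynamic_all_classifier.2023.qza \
  --p-classify--chunk-size 1000 \
  --verbose
```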

@colinbrislawn released some pre-trained UNITE classifiers a short time ago. I suggest seeing if you can use one of these:


This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.