fit-classifier-naive-bayes error

Hello, I downloaded the CO1 database from MIDORI 2 (http://www.reference-midori.info/download.php).
I unzipped the files and imported them into QIIME 2 2023.2. I ran the command below on my Mac, and it did not work.

qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads MIDORI2_UNIQ_NUC_SP_GB255_CO1_QIIME.qza \
--i-reference-taxonomy MIDORI2_UNIQ_NUC_SP_GB255_CO1_QIIME.taxon.qza \
--verbose \
--o-classifier CO1_classifier.qza

/miniconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/q2_feature_classifier/classifier.py:102: UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 0.24.1. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)

warnings.warn(warning, UserWarning)

zsh: killed qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads

I also ran the same command on a server and got the error message below:
Plugin error from feature-classifier:
Unable to allocate 120. GiB for an array with shape (20000, 807796) and data type int64
****Debug info has been saved to /tmp/qiime2-q2cli-err-djk90mzh.log

I was wondering what the warning message means and how I can train the classifier properly.

Hi @eDNA,
This error says that you do not have enough memory to run this command (the `zsh: killed` message on your Mac is the same problem: the OS killed the process when it ran out of memory). It takes a lot of memory to train a classifier!
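The 120 GiB figure in the server error matches the array shape in the traceback: 20000 × 807796 int64 values at 8 bytes each. A quick shell check, using the numbers from the error message above:

```shell
# 20000 x 807796 int64 entries, 8 bytes each, converted to GiB (integer division)
echo $(( 20000 * 807796 * 8 / 1024 / 1024 / 1024 ))
# prints 120
```

So that single intermediate array alone fills the 120 G you requested, before counting anything else the job needs.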

Do you know how much RAM is available on your Mac? Do you have access to an HPC? That might be necessary to run this command.

My Mac has 16 GB of memory.

The MIDORI2_UNIQ_NUC_SP_GB255_CO1_QIIME.qza file contains 2,809,132 sequences.

How much memory is needed? Yes, I have access to an HPC and a server.
When I ran the command on the server, I set:

#SBATCH --mem=120G
#SBATCH --cpus-per-task=32

With those settings I got the error I mentioned above.

Hi @eDNA,
It looks like you will need more than 120 GB of memory!

I can't predict the exact amount of memory you will need.

However, fit-classifier-naive-bayes doesn't support multi-threading, so the `--cpus-per-task=32` setting isn't doing anything helpful here.

Hope that helps!
:turtle:

Hi again @eDNA,
Another moderator chimed in with some helpful advice!
Have you dereplicated your reference database? Here is the rescript tutorial that may be helpful with that.
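A minimal dereplication sketch with RESCRIPt might look like the following, assuming the RESCRIPt plugin is installed in your QIIME 2 environment (the output file names are placeholders; see `qiime rescript dereplicate --help` for all options):

```shell
# Collapse identical sequence/taxonomy pairs to shrink the reference
# before training. --p-mode 'uniq' retains unique records.
qiime rescript dereplicate \
  --i-sequences MIDORI2_UNIQ_NUC_SP_GB255_CO1_QIIME.qza \
  --i-taxa MIDORI2_UNIQ_NUC_SP_GB255_CO1_QIIME.taxon.qza \
  --p-mode 'uniq' \
  --o-dereplicated-sequences derep-seqs.qza \
  --o-dereplicated-taxa derep-taxa.qza
```

A smaller dereplicated reference directly reduces the size of the arrays the naive Bayes trainer has to hold in memory.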

You could also skip training the classifier and run feature-classifier classify-consensus-blast or feature-classifier classify-consensus-vsearch. For these you may want to set either --p-maxaccepts 32 --p-maxrejects 128 or --p-maxaccepts 64 --p-maxrejects 256 rather than performing an exhaustive search.
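As a hedged sketch of the vsearch route (the query file name here is a placeholder for your own representative sequences; check `qiime feature-classifier classify-consensus-vsearch --help` in your QIIME 2 version for the exact outputs):

```shell
# Alignment-based classification against the MIDORI reference,
# without training a naive Bayes classifier first.
qiime feature-classifier classify-consensus-vsearch \
  --i-query rep-seqs.qza \
  --i-reference-reads MIDORI2_UNIQ_NUC_SP_GB255_CO1_QIIME.qza \
  --i-reference-taxonomy MIDORI2_UNIQ_NUC_SP_GB255_CO1_QIIME.taxon.qza \
  --p-maxaccepts 32 \
  --p-maxrejects 128 \
  --p-threads 32 \
  --output-dir vsearch-taxonomy
```

Unlike fit-classifier-naive-bayes, this action is multi-threaded, so `--p-threads 32` would actually use the CPUs you requested with `--cpus-per-task=32`.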

Hope that helps!
:turtle:

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.