Training Silva Classifier "Killed" (qiime2-2022.8)

I used this command to extract reads in preparation for training a SILVA classifier for paired-end 18S sequencing:
qiime feature-classifier extract-reads --i-sequences silva-138-99-seqs.qza --p-f-primer TTAAARVGYTCGTAGTYG --p-r-primer CCGTCAATTHCTTYAART --o-reads ref-seqs.qza

I am then training the SILVA classifier on the extracted 18S reads with:
qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads ref-seqs.qza --i-reference-taxonomy silva-138-99-tax.qza --o-classifier classifier.qza --verbose

The following messages are displayed:
UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 0.24.1. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)
warnings.warn(warning, UserWarning)

...and the program is subsequently killed after several hours.

I am using the appropriate scikit-learn version (0.24.1) for qiime2-2022.8 and running my program on a remote server (ssh), so memory should not be the issue here.

Please help!!

Hello Margo,

Welcome to the forums! :qiime2:

It still could be... fit-classifier-naive-bayes takes a lot of memory. :scream_cat:

How much memory does that server have? Can you use a second ssh session or screen to monitor memory usage while that script is running?
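
For example, from a second ssh session (or a screen/tmux window), standard Linux tools will do the job, assuming watch and free are installed on the server:

watch -n 5 free -m    # refresh overall memory stats every 5 seconds while training runs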

Remote servers and supercomputer clusters can have other settings that can cancel (kill) jobs, like limits on run time. If you submit this with a slurm script, can you show us the settings for that script too?
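
For reference, those settings usually live in #SBATCH lines at the top of the submission script. A minimal sketch, assuming slurm and using placeholder values (your actual script will differ):

#!/bin/bash
#SBATCH --job-name=train-classifier   # placeholder job name
#SBATCH --time=48:00:00               # wall-time limit; slurm kills jobs that exceed it
#SBATCH --ntasks=1                    # number of tasks requested for the job

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref-seqs.qza \
  --i-reference-taxonomy silva-138-99-tax.qza \
  --o-classifier classifier.qza \
  --verbose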

Thanks!

Any clues you can share will be helpful. :mag: :male_detective:

Here's information about the server while the naive-bayes script is running:
              total        used        free      shared  buff/cache   available
Mem:          32116       27939        3284           0         893        3714
Swap:           975         311         664

Here is the command I submitted:
sbatch -n 10 qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads ref-seqs.qza --i-reference-taxonomy silva-138-99-tax.qza --o-classifier classifier.qza

Can you re-run that with free -h or something similar, so the stats come out in human-readable units (MB or GB)?
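
For example, free has a human-readable flag:

free -h    # prints totals in human-readable units (MiB/GiB) instead of raw numbers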

While -n 10 would help with highly parallel commands, training is mostly single-threaded. You may have better luck without that option!
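
If you do resubmit through slurm, here is a sketch of a single-task submission; --wrap just turns the quoted command into a batch script, and the file names are taken from your command above:

sbatch --ntasks=1 --wrap="qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads ref-seqs.qza --i-reference-taxonomy silva-138-99-tax.qza --o-classifier classifier.qza --verbose"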

Perfect, thanks!

32GB is a good amount of RAM, but depending on the size :sparkles: and complexity :sparkles: of your input database, it may not be enough.

Also, slurm settings can change memory allocation.

-n, --ntasks=<number>
sbatch does not launch tasks, it requests an allocation of resources and submits a batch script. This option advises the Slurm controller that job steps run within the allocation will launch a maximum of number tasks and to provide for sufficient resources. The default is one task per node, but note that the --cpus-per-task option will change this default.

I always forget exactly how this works, but I have run into an issue where I asked for 10x threads only to realize that slurm had given me 1/10 of a node, because it thought I would be running 10x things on it and was trying to be clever.

If slurm is allocating 3.2 GB to your job, that's the problem :upside_down_face:

I'm sorry I'm not much help here. Now is the perfect time to reach out to your HPC support team because they will know exactly how to request all 32 GB of memory for your job.
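
As a hedged example (exact flags depend on how your cluster is configured, so your HPC team has the final word), memory is usually requested explicitly rather than inferred from --ntasks, and you can inspect what a running job was actually granted:

sbatch --mem=30G --ntasks=1 train_classifier.sh    # script name and memory value are placeholders
scontrol show job 123456                           # job ID is a placeholder; shows the job's allocation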

So I just received some updated information about the server: it is a single, standalone box, not a node in an HPC cluster.

With that in mind, how should I proceed?

Understood!

One of your commands mentioned sbatch, which is a slurm submission command. Is this machine running slurm? Who else is submitting jobs to it?

I ended up not running the slurm command, as I needed administrator permission to install the plugin.

Are there other ways to bypass this issue?

You could try running the command again while using the Linux top command to monitor memory usage as it runs.

We want to confirm if memory is the limitation, or something else is killing the script.
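
If it gets killed again, the kernel log is also worth a look, since the Linux out-of-memory killer usually leaves a message there. These are standard commands; dmesg may need sudo on some systems:

top -o %MEM                                            # in a second terminal, sort processes by memory use
dmesg -T | grep -i -E "out of memory|killed process"   # after a kill, search for OOM messages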

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.