Classify-sklearn "Killed: 9" output

Hi all -

I am running feature-classifier classify-sklearn on a dataset of 143 samples using the provided SILVA classifier. After ~10 minutes, the process stops with the output "Killed: 9" (this also happens when I use the 515/806 pre-trained SILVA classifier).
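
For reference, the invocation is along these lines (the classifier is one of the pre-trained SILVA artifacts from the Data resources; the file names are reconstructed from context):

$ qiime feature-classifier classify-sklearn --i-classifier silva-119-99-515-806-nb-classifier.qza --i-reads rep-seqs.qza --o-classification taxonomy.qza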

I tried searching for kill codes and couldn't find anything.

Any ideas?

thank you!!

Hi @Jenna_Shelton! This sounds like an out-of-memory error, but to help diagnose, can you please attach the detailed error log (it should be at the filepath mentioned just past the bottom of your screenshot)? Alternatively, you can re-run with --verbose. If it is a memory error, you could try adding the pre_dispatch parameter and setting it to 1 (by default it is 2*n_jobs), or you could try reducing the default chunk_size, which might help. Thanks!
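
Concretely, that would look something like this (the classifier and file names are placeholders for your own artifacts, and 1000 is just an example value well below the default chunk size):

$ qiime feature-classifier classify-sklearn --i-classifier silva-119-99-nb-classifier.qza --i-reads rep-seqs.qza --o-classification taxonomy.qza --p-pre-dispatch 1 --p-chunk-size 1000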

Hi @thermokarst - there was no error log associated with this “Killed: 9” message, which further stumped me! Also, this kill message does not pop up when I use the greengenes database with classify-sklearn instead of the silva database, if that helps at all? I’ll try re-running with the suggested memory-reducing parameters and get back to you, thanks!


Hi @thermokarst

Unfortunately, those options did not solve my problem. Here are some screenshots:

The --verbose option gave me no additional output, and the runs with the --p-pre-dispatch and --p-chunk-size parameters also failed.

Any additional advice?

Thanks!!

Hey @Jenna_Shelton,

What are you running this on, and what kind of memory and disk space is available?

The above parameters should have helped, but maybe there is still too little memory available.

Hi @ebolyen,

I am running this on a MacBook Pro with a 1 TB drive (~900 GB available) and 16 GB of 2133 MHz LPDDR3 memory.

Hi @Jenna_Shelton, thanks for posting your machine specs. It sounds like you are in fact running out of memory, which is a bummer. We have seen this with the SILVA database, it seems to need quite a bit more memory when compared to something like greengenes (FWIW, we develop QIIME 2 on machines spec’ed similarly to what you posted, and we have had the same problems crop up).

Moving forward, it sounds like you might need to run this step on a machine with more available memory. If you have an institutional cluster, you could see if they could install QIIME 2 there for you (it is pretty straightforward). Another option, which might be quicker, is to launch one of our QIIME 2 Amazon Web Services instances (you will want to read up on AWS EC2 instances if you are unfamiliar with cloud computing, but basically you can rent some hardware from Amazon pretty cheaply, and we have ready-to-roll installations of QIIME 2 available there).
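
For the cluster route, a native install is a conda environment created from a release file; a minimal sketch, assuming a Linux cluster and the 2017.7 release (the exact URL depends on your platform and release, so check the install docs):

$ conda env create -n qiime2-2017.7 --file https://data.qiime2.org/distro/core/qiime2-2017.7-py35-linux-conda.yml
$ source activate qiime2-2017.7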

As a last-ditch effort, I am pinging @BenKaehler, the brains behind q2-feature-classifier, just to make sure there isn’t some other trick available. Sorry I don’t have better news for you :frowning:.


Hi @Jenna_Shelton, thanks @thermokarst.

I’ve just tried running these classifiers with the 2017.7 build on my MacBook Pro, which has the same amount of memory as yours.

I used the rep-seqs.qza from the tutorial and the classifiers from the Data resources.

$ qiime feature-classifier classify-sklearn --i-classifier silva-119-99-515-806-nb-classifier.qza --i-reads rep-seqs.qza --o-classification blah.qza

and

$ qiime feature-classifier classify-sklearn --i-classifier silva-119-99-nb-classifier.qza --i-reads rep-seqs.qza --o-classification blah.qza --verbose

both ran to completion (with a few deprecation warnings). They both peaked at less than 11 GB of memory.

The rep-seqs.qza that I used only contains 776 sequences. Is yours much larger than that? If so, you could try reducing the chunk size further to, say, 776. If you get it running and performance is important, you can increase it again later.
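
That is, something along these lines (same classifier and file names as the commands above; 776 matches the size of the tutorial’s rep-seqs.qza):

$ qiime feature-classifier classify-sklearn --i-classifier silva-119-99-nb-classifier.qza --i-reads rep-seqs.qza --o-classification blah.qza --p-chunk-size 776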

The only other thing I can think of is that there might be something else on your system that is causing you to run out of memory. This seems unlikely, though, because my machine will happily use up ~40GB of memory (by using swap) before bad things start happening, and before that point you would observe that your machine becomes unresponsive (which is the first of the bad things).

So I guess: check the version of QIIME 2 that you’re using, try reducing the chunk size again, try a different machine (it wouldn’t have to be a very impressive machine if my laptop can handle it), and let us know how you go.
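
Checking the installed version is quick; qiime info reports the QIIME 2 release along with the installed plugin versions:

$ qiime info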


Hi @BenKaehler and @thermokarst,

Thanks for all of your help and suggestions!! My rep-seqs.qza file is 1.6 MB, so a lot larger than the one in the tutorial (145 samples with 10,000 - 90,000 sequences each).

I tried re-running classify-sklearn with a much lower chunk size (1,000), and it has been running for a half hour without “killing” itself, so I’ll check back in and see if that runs to completion. If not, I’ll try AWS or the cluster at my institution.

An off-topic reply has been split into a new topic: Splitting datasets for processing

Please keep replies on-topic in the future.

Hi, I got the same error (KILLED) when trying to build my own taxonomic classifier. There were no errors using Greengenes (13_8_99). With SILVA, it was impossible to create the classifier on my computer using 99_otu_fasta or 97_otu_fasta (using only 16S). However, I was able to create a classifier using 94_otu_fasta.

Here, I’m using VirtualBox with 8 GB of memory.
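
For context, the training step looks roughly like this (a sketch that assumes the SILVA reference reads and taxonomy have already been imported as artifacts; the file names here are illustrative):

$ qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads 99_otus.qza --i-reference-taxonomy ref-taxonomy.qza --o-classifier silva-99-classifier.qza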

Thanks,
FS.

Hi @fstudart, please take a peek at the suggestions posed by myself & @BenKaehler in this thread for strategies to move forward. Thanks!

