Classify-sklearn "Killed: 9" output

Jenna_Shelton · August 23, 2017, 4:55pm

Hi all -

I am running feature-classifier classify-sklearn on a dataset of 143 samples using the provided SILVA classifier. After ~10 minutes, the command stops with the output "Killed: 9" (this also happens when I use the 515/806 pre-trained SILVA classifier).

I tried searching for kill codes and couldn't find anything.

Any ideas?

thank you!!

thermokarst · August 24, 2017, 3:21am

Hi @Jenna_Shelton! This sounds like an out-of-memory error, but to help diagnose, can you please attach the detailed error log? (It should be at the filepath mentioned just past the bottom of your screenshot.) Alternatively, you can re-run with --verbose. If it is a memory error, you could try adding the pre_dispatch parameter and setting it to 1 (by default it is 2*n_jobs), or you could try reducing the default chunk_size, which might help. Thanks!
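
For context on the message itself: "Killed: 9" is how the shell on macOS reports that a process was terminated by signal 9, which is SIGKILL. A process cannot catch SIGKILL, so it never gets the chance to write an error log of its own. The short Python sketch here uses only the standard library, and the sleeping child process is just a stand-in (an assumption for illustration, not anything QIIME 2 runs). It shows the signal name and how a killed child looks to its parent.

```python
import signal
import subprocess
import sys

# Signal number 9 is SIGKILL on macOS and Linux.
print(signal.Signals(9).name)  # SIGKILL

# Start a stand-in child process that would otherwise sleep for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# Send the same signal an operating system typically uses when it
# forcibly stops a process, for example under memory pressure.
child.send_signal(signal.SIGKILL)
child.wait()

# subprocess reports death-by-signal as a negative return code.
print(child.returncode)  # -9
```

Seeing signal 9 does not prove the cause was memory, but a long-running, memory-hungry job that dies this way without any traceback is consistent with that diagnosis.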

Jenna_Shelton · August 24, 2017, 12:38pm

Hi @thermokarst - there was no error log associated with this "Killed: 9" message, which further stumped me! Also, this kill message does not pop up when I use the greengenes database with classify-sklearn instead of the silva database, if that helps at all? I'll try re-running with the suggested memory-reducing parameters and get back to you, thanks!

Jenna_Shelton · August 25, 2017, 4:34pm

Hi @thermokarst

Unfortunately, those options did not solve my problem. Here are some screenshots:

The --verbose option gave me no additional text, while the --p-pre-dispatch and the --p-chunk-size parameters also failed.

Any additional advice?

Thanks!!

ebolyen · August 25, 2017, 5:35pm

Hey @Jenna_Shelton,

What are you running this on, and what kind of memory and disk space is available?

The above parameters should have helped, but maybe there is still too little memory available to use.

Jenna_Shelton · August 28, 2017, 4:37pm

Hi @ebolyen,

I am running this on a MacBook Pro with a 1 TB disk (~900 GB available) and 16 GB of 2133 MHz LPDDR3 memory.

thermokarst · August 30, 2017, 11:12pm

Hi @Jenna_Shelton, thanks for posting your machine specs. It sounds like you are in fact running out of memory, which is a bummer. We have seen this with the SILVA database; it seems to need quite a bit more memory than something like greengenes (FWIW, we develop QIIME 2 on machines spec'ed similarly to what you posted, and we have had the same problems crop up).

Moving forward, it sounds like you might need to run this step on a machine with more available memory. If you have an institutional cluster, you could see if they could install QIIME 2 there for you (it is pretty straightforward). Another option, which might be quicker, is to launch one of our QIIME 2 Amazon Web Services instances (you will want to read up on AWS EC2 instances if you are unfamiliar with cloud computing, but basically you can rent some hardware from Amazon for pretty cheap, and we have ready-to-roll installations of QIIME 2 available there).

As a last-ditch effort, I am pinging @BenKaehler, the brains behind q2-feature-classifier, just to make sure there isn't some other trick available. Sorry I don't have better news for you.

BenKaehler · August 31, 2017, 1:16am

Hi @Jenna_Shelton, thanks @thermokarst.

I've just tried running these classifiers with the 2017.7 build on my MacBook Pro, which has the same amount of memory as yours.

I used the rep-seqs.qza from the tutorial and the classifiers from the Data resources.

$ qiime feature-classifier classify-sklearn --i-classifier silva-119-99-515-806-nb-classifier.qza --i-reads rep-seqs.qza --o-classification blah.qza

and

$ qiime feature-classifier classify-sklearn --i-classifier silva-119-99-nb-classifier.qza --i-reads rep-seqs.qza --o-classification blah.qza --verbose

both ran to completion (with a few deprecation warnings). They both peaked at less than 11 GB of memory.

The rep-seqs.qza that I used only contains 776 sequences. Is yours much larger than that? If so, you could try reducing chunk size further to, say, 776. If you get it running and performance is important, you could increase that later.
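
A rough illustration of why a smaller chunk size can lower peak memory: when reads are processed in batches, only the batch currently being worked on needs to be held at once, instead of every read together. This is a plain-Python sketch of the general idea, not q2-feature-classifier's actual implementation; `batches`, `classify_batch`, and the made-up read IDs are hypothetical names introduced for illustration.

```python
from itertools import islice


def batches(reads, chunk_size):
    """Yield successive lists of at most chunk_size reads."""
    iterator = iter(reads)
    while True:
        batch = list(islice(iterator, chunk_size))
        if not batch:
            return
        yield batch


def classify_batch(batch):
    """Hypothetical stand-in for the real classifier: label each read."""
    return [(read_id, "Unassigned") for read_id in batch]


# 2,000 made-up read IDs, processed 776 at a time.
read_ids = (f"read{i}" for i in range(2000))
largest_batch = 0
results = []
for batch in batches(read_ids, chunk_size=776):
    largest_batch = max(largest_batch, len(batch))
    results.extend(classify_batch(batch))

print(len(results), largest_batch)  # 2000 776
```

If several batches are dispatched ahead of time (the pre_dispatch setting suggested earlier in the thread), the number of reads held at once is presumably closer to pre_dispatch times chunk_size, which would explain why lowering either value can help.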

The only other thing I can think of is that there might be something else on your system that is causing you to run out of memory. This seems unlikely, though, because my machine will happily use up ~40GB of memory (by using swap) before bad things start happening, and before that point you would observe that your machine becomes unresponsive (which is the first of the bad things).
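
One way to check whether the classifier itself is what consumes the memory is to record the peak memory of the command. The sketch below uses Python's standard `resource` module on macOS or Linux. The child command is a hypothetical stand-in that allocates about 50 MB; in practice it would be replaced with the real qiime invocation as a list of arguments.

```python
import resource
import subprocess
import sys

# Hypothetical stand-in command; replace with the real qiime invocation.
command = [sys.executable, "-c", "data = b'x' * (50 * 1024 * 1024)"]

# check=False so that a killed or failed command does not raise here.
subprocess.run(command, check=False)

# Peak resident set size over the child processes waited for so far.
peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss

# ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
bytes_per_unit = 1 if sys.platform == "darwin" else 1024
peak_mib = peak * bytes_per_unit / (1024 * 1024)
print(f"peak child memory: {peak_mib:.0f} MiB")
```

Because the figure is collected when the child is waited for, it should still be available when the command was killed, which makes it possible to see how close the run came to the machine's physical memory.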

So I guess check the version of QIIME 2 that you're using, try reducing the chunk size again, try a different machine (it wouldn't have to be a very impressive machine if my laptop can handle it), and let us know how you go.

Jenna_Shelton · August 31, 2017, 10:58am

Hi @BenKaehler and @thermokarst,

Thanks for all of your help and suggestions!! My rep-seqs.qza file is 1.6 MB, so a lot larger than the one in the tutorial (145 samples with 10,000 - 90,000 sequences each).
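
File size and per-sample read depth do not directly give the number that matters for the comparison with the tutorial's 776 sequences, which is the count of representative sequences in the artifact. A minimal sketch for counting them, assuming the .qza file is a zip archive that stores the sequences at `<uuid>/data/dna-sequences.fasta`; `count_sequences` is a hypothetical helper, not part of QIIME 2.

```python
import zipfile


def count_sequences(qza_path):
    """Count FASTA records in a FeatureData[Sequence] artifact."""
    with zipfile.ZipFile(qza_path) as archive:
        # Assumption: the sequences live at <uuid>/data/dna-sequences.fasta.
        fasta_names = [
            name for name in archive.namelist()
            if name.endswith("data/dna-sequences.fasta")
        ]
        if not fasta_names:
            raise ValueError("no dna-sequences.fasta found in archive")
        with archive.open(fasta_names[0]) as fasta:
            # Each FASTA record starts with a '>' header line.
            return sum(1 for line in fasta if line.startswith(b">"))
```

Calling `count_sequences("rep-seqs.qza")` on the real artifact would return the number of sequences to classify, which is also a sensible upper bound when choosing a chunk size.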

I tried re-running classify-sklearn with a much lower chunk size (1,000), and it has been running for a half hour without "killing" itself, so I'll check back in and see if that runs to completion. If not, I'll try AWS or the cluster at my institution.

fstudart · September 5, 2017, 6:04pm

Hi, I got the same error (KILLED) when trying to build my own taxonomic classifier. No errors using greengenes (13_8_99). With SILVA, it was impossible to create the classifier using 99_otu_fasta or 97_otu_fasta on my computer (using only 16S). However, I was able to create a classifier using 94_otu_fasta.

Here, I'm using VirtualBox with 8 GB of memory.

Thanks,
FS.

thermokarst · September 5, 2017, 6:07pm

Hi @fstudart, please take a peek at the suggestions posed by myself & @BenKaehler in this thread for strategies to move forward. Thanks!