"Killed" Error When Running classify-sklearn in QIIME 2

Hi everyone,
I’ve been encountering an issue with QIIME 2 while running the classify-sklearn command for taxonomy classification, and I’m hoping someone can help me figure it out. Thank you in advance for your kindness in helping me with this.

1. My setup:
CPU: Intel i9-13900H (14 cores, 20 threads)
Memory: 32 GB RAM
Disk: over 100 GB free on the Windows disk (D:)
Environment: WSL2 running Ubuntu 20.04 (with Python and QIIME 2 installed)

2. The command I ran for taxonomy classification:

qiime feature-classifier classify-sklearn \
  --i-classifier /mnt/d/unite_ver10_dynamic_s_all_04.04.2024-Q2-2024.5.qza \
  --i-reads /mnt/d/rep-seqs.qza \
  --o-classification /mnt/d/taxonomy-ITS.qza \
  --verbose

However, the process is abruptly terminated with the message "Killed" (this has happened four times).

3. Observations:

  1. My D: disk (used for WSL data) lost ~30 GB of space after the failure, but I can't figure out what files were created or where the disk space went.
  2. I don't know why the process was killed.

I’d sincerely appreciate any insights you can provide into why this issue might be occurring and how to resolve it. If there are logs or troubleshooting steps I can follow to identify the cause, I’d be happy to provide any additional information needed.

Thank you so much for your time and help!

Best regards,
A student trapped in QIIME 2


I think you need to set WSL to have access to more memory. By default it only has access to half the system memory. See here.
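
As a sketch (the values below are just examples; tune them to your machine), you can create a `.wslconfig` file in your Windows user profile folder:

# %UserProfile%\.wslconfig  (e.g. C:\Users\<you>\.wslconfig)
[wsl2]
memory=24GB   # let WSL2 use up to 24 of your 32 GB (the default is ~50%)
swap=16GB     # optional: swap can absorb peaks above physical RAM

Then run `wsl --shutdown` from PowerShell and reopen Ubuntu for the new limits to take effect. You can also confirm from inside WSL that the Linux out-of-memory killer is what terminated the job:

# check the kernel log right after a "Killed" message
sudo dmesg -T | grep -i -E 'out of memory|killed process'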


Thanks for your quick reply; I'll try your advice immediately.


I apologize for interrupting you once again, but I continue to experience difficulties with my classifier workflow, and I would be extremely grateful for any further guidance you can offer.
I am running the following command on a rented Ubuntu cloud server (32 GB RAM, 16 cores):

qiime feature-classifier classify-sklearn \
  --i-classifier /root/unite_ver10_dynamic_s_all_19.02.2025-Q2-2024.10.qza \
  --i-reads /root/rep-seqs.qza \
  --o-classification /root/taxonomy-ITS.qza \
  --verbose

Despite ample system resources, the process is abruptly terminated with the message “Killed.”

Any insights into why this might be happening, or suggestions for additional troubleshooting steps, would be greatly appreciated. Thank you very much for your patience and continued assistance.
Kind regards,


Hello Percy,

Thank you for bringing your issue to the forums. :qiime2:

I built unite-train as an open-source example project and I'm pleased to see folks making use of it. I'm sorry this isn't working for you.

I also think it could be a RAM issue, so I've attempted to replicate it.

That's the largest version of that database: it includes singletons (`_s_`) and all eukaryotes (`_all_`), not just fungi!

I've been running it for about 5 minutes, and it's already using, uh... :mag_right:
44 GB of memory

It used 52 GB when I ran it:
(base) [cbrislawn@c0709a-s8 unite-train]$ /usr/bin/time -v \
>   qiime feature-classifier classify-sklearn \
>   --i-reads benchmarks/dada2-single-end-rep-seqs.qza \
>   --p-n-jobs 1 \
>   --i-classifier     results/${testfile}.qza \
>   --o-classification results/test/${testfile}.qza


Saved FeatureData[Taxonomy] to: results/test/unite_ver10_dynamic_s_all_19.02.2025-Q2-2024.10.qza
        Command being timed: "qiime feature-classifier classify-sklearn --i-reads benchmarks/dada2-single-end-rep-seqs.qza --p-n-jobs 1 --i-classifier results/unite_ver10_dynamic_s_all_19.02.2025-Q2-2024.10.qza --o-classification results/test/unite_ver10_dynamic_s_all_19.02.2025-Q2-2024.10.qza"
        User time (seconds): 457.35
        System time (seconds): 45.99
        Percent of CPU this job got: 97%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 8:34.67
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 51945816
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 2050
        Minor (reclaiming a frame) page faults: 494404
        Voluntary context switches: 23605
        Involuntary context switches: 7572
        Swaps: 0
        File system inputs: 1670808
        File system outputs: 137627880
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

"We are going to need a bigger boat" :motor_boat: :shark:

Do you have access to an HPC cluster? Those typically have machines with >64 GB of RAM.
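
If you can get onto a cluster, the job request would look something like this (a sketch using SLURM; the scheduler, limits, and environment name will vary by site):

#!/bin/bash
#SBATCH --job-name=classify-ITS
#SBATCH --mem=80G            # comfortably above the ~52 GB peak measured above
#SBATCH --cpus-per-task=1    # higher --p-n-jobs can multiply memory use, so start with 1
#SBATCH --time=02:00:00

# activate your QIIME 2 environment (the name depends on your install)
conda activate qiime2-amplicon-2024.10

qiime feature-classifier classify-sklearn \
  --i-classifier unite_ver10_dynamic_s_all_19.02.2025-Q2-2024.10.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy-ITS.qza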

The version without singletons is worth a try!
`unite_ver10_dynamic_all_19.02.2025-Q2-2024.10` from the unite-train GitHub releases

`unite_ver10_dynamic_all_19.02.2025-Q2-2024.10.qza` uses only 30 GB for me:
(base) [cbrislawn@c0709a-s8 unite-train]$ /usr/bin/time -v \
>   qiime feature-classifier classify-sklearn \
>   --i-reads benchmarks/dada2-single-end-rep-seqs.qza \
>   --p-n-jobs 4 \
>   --i-classifier     results/${testfile}.qza \
>   --o-classification results/test/${testfile}.qza
Saved FeatureData[Taxonomy] to: results/test/unite_ver10_dynamic_all_19.02.2025-Q2-2024.10.qza
        Command being timed: "qiime feature-classifier classify-sklearn --i-reads benchmarks/dada2-single-end-rep-seqs.qza --p-n-jobs 4 --i-classifier results/unite_ver10_dynamic_all_19.02.2025-Q2-2024.10.qza --o-classification results/test/unite_ver10_dynamic_all_19.02.2025-Q2-2024.10.qza"
        User time (seconds): 530.85
        System time (seconds): 36.95
        Percent of CPU this job got: 168%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 5:37.01
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 30634632
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 1729
        Minor (reclaiming a frame) page faults: 1246023
        Voluntary context switches: 58360
        Involuntary context switches: 3267
        Swaps: 0
        File system inputs: 4639032
        File system outputs: 80959848
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Thank you so much for your suggestion. Unfortunately, I don't have access to an HPC system or a machine with >64 GB of RAM at the moment; I'm limited to a cloud server (perhaps renting a larger one will be necessary).

However, when I tried the version without singletons (unite_ver10_dynamic_all_19.02.2025-Q2-2024.10.qza), it still failed on my 32 GB server; perhaps more RAM is necessary.

Additionally, how can I perform taxonomy classification on rare sequences without consuming too much RAM? Retaining these rare sequences is important for my analysis, so I'm in a bit of a bind.

Do you have any additional recommendations?

By the way, I explored the BLAST+ based method for taxonomy classification of the rare sequences. Its memory usage fit within my available resources, making it suitable for my setup.
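
For reference, the command shape is roughly the following (a sketch with placeholder file names; this method takes reference reads and a reference taxonomy from the UNITE release rather than a pre-trained classifier, and recent QIIME 2 versions also emit the raw hits):

qiime feature-classifier classify-consensus-blast \
  --i-query rep-seqs.qza \
  --i-reference-reads unite-ref-seqs.qza \
  --i-reference-taxonomy unite-ref-taxonomy.qza \
  --o-classification taxonomy-ITS-blast.qza \
  --o-search-results blast-hits.qza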

Thank you again for your support and insights.

Best regards,
Percy

That's what I was going to suggest next! I find the search-based classifiers like blastn and vsearch work okay for my needs.

UNITE with and without singletons changes what's in the reference database, and I suppose how well it matches your input data. But all sequences from your input features will be retained with any QIIME 2 classifier.
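
The vsearch-based classifier looks almost identical (again a sketch with placeholder file names; `--p-threads` parallelizes the search without loading extra copies of the reference):

qiime feature-classifier classify-consensus-vsearch \
  --i-query rep-seqs.qza \
  --i-reference-reads unite-ref-seqs.qza \
  --i-reference-taxonomy unite-ref-taxonomy.qza \
  --p-threads 4 \
  --o-classification taxonomy-ITS-vsearch.qza \
  --o-search-results vsearch-hits.qza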


Thank you, sir! I have also borrowed a server with sufficient capacity and will let you know whether the classifier works there. Thanks again for your kind help!
