problem with feature-classifier classify-consensus-vsearch

Ely · February 1, 2023, 9:56am

Hi everyone!
I recently started working with QIIME 2. I use version qiime2-2022.8 in a conda environment on a computer with 64 GB RAM and a 4 TB SSD.

Over the past few days, I tried to analyze a dataset that combines two sets of raw sequences that were sequenced separately (at two different times).
After denoising with DADA2, I ran into a problem with taxonomy assignment using feature-classifier classify-consensus-vsearch and SILVA 138.

Here is the command:
qiime feature-classifier classify-consensus-vsearch \
  --i-query denoised_sequences.qza \
  --i-reference-reads silva-138-99-seqs.qza \
  --i-reference-taxonomy silva-138-99-tax.qza \
  --p-threads 18 \
  --o-classification taxonomy.qza \
  --o-search-results tophits.qza \
  --verbose
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --usearch_global /tmp/qiime2/adminlinux/data/aeb7935e-88a0-4436-998d-80bf2b1c78c6/data/dna-sequences.fasta --id 0.8 --query_cov 0.8 --strand both --maxaccepts 10 --maxrejects 0 --db /tmp/qiime2/adminlinux/data/a7432d0f-b5f7-409f-9daf-cd33db5de53f/data/dna-sequences.fasta --threads 18 --output_no_hits --blast6out /tmp/q2-BLAST6Format-qgbkc739

vsearch v2.21.1_linux_x86_64, 62.6GB RAM, 24 cores

I ran this command several times but always got the same result:

636194718 nt in 436680 seqs, min 900, max 3983, avg 1457
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Searching 100%
Matching unique query sequences: 28760 of 28837 (99.73%)
Killed

Can someone help me understand what the problem with this command is and why the session was "Killed"?
Is there an error message that could help me understand?

Thank you!!

crusher083 · February 1, 2023, 10:06am

Hello, welcome to the QIIME 2 Forum!

To classify sequences, the SILVA database has to be loaded into RAM, and the query sequences are then searched against it. That takes roughly 6 GB of RAM per instance, though I don't know how it scales.
Your command requests 18 simultaneous threads, so the available RAM might be insufficient.

Reduce the number of threads to 10 and monitor the RAM load with htop or any other utility.
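
If htop is not at hand, a minimal sketch like the one below can report how much memory is available and check the kernel log for out-of-memory kills. It assumes a Linux system that exposes /proc/meminfo; the function names are hypothetical, and reading the kernel log with dmesg may require sudo on some systems. A bare "Killed" line usually means the process received SIGKILL, which on Linux is often (but not always) the out-of-memory killer.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, Linux is assumed.

# Print the memory currently available to new processes, in GiB.
# MemAvailable in /proc/meminfo is reported in kiB.
mem_available_gib() {
    awk '/^MemAvailable:/ { printf "%.1f\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Search the kernel log for out-of-memory kills.
# Prints a fallback message if nothing matches or dmesg is not readable.
check_oom_kills() {
    dmesg 2>/dev/null | grep -i -E 'out of memory|killed process|oom-kill' \
        || echo "No OOM kills found (or dmesg is not readable without sudo)."
}

echo "Available memory (GiB): $(mem_available_gib)"
check_oom_kills
```

Running this once before the job and again right after it is killed should show whether memory was the limiting factor.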

Cheers
Valentyn

Ely · February 1, 2023, 10:49am

Thanks Valentyn,
I will try to use only 10 threads.
cheers,
Eliana!

Ely · February 7, 2023, 8:06am

I tried the same step with 5 and with 10 threads, but unfortunately it didn't work.
Do you have any other suggestions?

Nicholas_Bokulich · February 14, 2023, 7:32am

Hi @Ely ,
Maybe just try 1-4 threads? One thread will take longer but consume much less RAM (running out of RAM is why your job is being killed when you use multiple threads).
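
One way to work through those thread counts without babysitting each run is sketched below. It reuses the file names from the command earlier in this thread; the function names `classify` and `try_thread_counts` are hypothetical, and the sketch assumes that a killed run exits with a non-zero status.

```shell
#!/usr/bin/env bash
# Sketch only: retry the classification with progressively fewer threads.
# File names come from the command posted above; function names are hypothetical.

# Run the classifier with the given number of threads.
classify() {
    local threads="$1"
    qiime feature-classifier classify-consensus-vsearch \
        --i-query denoised_sequences.qza \
        --i-reference-reads silva-138-99-seqs.qza \
        --i-reference-taxonomy silva-138-99-tax.qza \
        --p-threads "$threads" \
        --o-classification taxonomy.qza \
        --o-search-results tophits.qza \
        --verbose
}

# Try each thread count in the order given and stop at the first success.
# Prints the thread count that worked; returns 1 if every attempt failed.
try_thread_counts() {
    local t
    for t in "$@"; do
        echo "Trying with $t thread(s)..." >&2
        if classify "$t"; then
            echo "$t"
            return 0
        fi
    done
    return 1
}

# Usage (not run automatically):
#   try_thread_counts 4 2 1
```

Listing the larger counts first means the fastest configuration that fits in memory is the one that gets used.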

Ely · February 14, 2023, 7:57am

Hi Nicholas,

I tried with 1 thread, and after 3 days the process was killed.
I solved the problem by switching to QIIME 2 version 2022.2 and using 10 threads.
I still don't know why 2022.8 gave me this problem, but at least I managed to finish this set of analyses.

Thanks for your help!!!
Eliana.

Nicholas_Bokulich · February 14, 2023, 9:19am

Hi @Ely ,

Thanks for following up with these details. It's very interesting that you only observed this issue with 2022.8 and not with 2022.2. We will investigate further.
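
For anyone comparing the two releases, recording the tool versions inside each conda environment makes a report easier to act on. The sketch below is one possible way to do that; the function name `report_versions` is hypothetical, and it needs to be run once in each activated environment.

```shell
#!/usr/bin/env bash
# Sketch only: print the versions of the tools involved in this thread.
# Run it inside each activated conda environment and compare the output.

report_versions() {
    local tool
    for tool in qiime vsearch; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "== $tool =="
            # Some tools print version details to stderr, so capture both streams.
            "$tool" --version 2>&1 | head -n 3
        else
            echo "== $tool: not found in this environment =="
        fi
    done
}

report_versions
```

The vsearch version matters here because the log above shows v2.21.1 running under 2022.8, and the bundled version may differ between releases.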
