feature-classifier extremely slow

James · October 27, 2022, 5:00am

Hello,
I am using a cluster server to run the qiime2 feature-classifier.
Cores: 56
Memory: 184G

Command:
qiime2_exec bash -c \
"qiime feature-classifier classify-consensus-vsearch \
--i-query uniquesToFasta.qza \
--i-reference-reads /work/{whoami}/silva-138-99-seqs.qza \
--i-reference-taxonomy /work/{whoami}/silva-138-99-tax.qza \
--p-perc-identity 0.99 \
--p-query-cov 0.8 \
--p-threads 56 \
--o-classification taxonomy-vsearch-SILVA_138.qza \
--o-search-results top-vsearch-SILVA_138.qza \
--verbose"

But it took 8 hours to classify 1450 unique sequences.
Is that normal for a regular analysis? Is there anything I can do to speed up the process?
I would truly appreciate any suggestions or help.
Thank you.

Nicholas_Bokulich · October 27, 2022, 6:25am

Hello @James ,
This is indeed very unusual. 1450 sequences is not a large amount and should take under an hour to classify. You can see some runtime benchmarks in this paper:

The issue is this:

Even though you have 56 cores, the amount of memory per core is not very high (~3G/core), so you are overloading the RAM available to each job, which makes the individual jobs take a very long time.

You would probably complete the job much faster on, say, 10-20 cores, to give enough RAM to individual jobs.
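As a rough illustration of the memory-per-core reasoning above, here is a minimal sketch that divides the 184G from the original post by a few thread counts. It assumes memory is shared evenly across threads, which is a simplification, and the helper name `mem_per_thread` is made up for this example.

```shell
#!/usr/bin/env bash
# Rough memory-per-thread estimate for the node described in the question
# (184 GB total). Assumes an even split across threads, which is a simplification.
total_mem_gb=184

mem_per_thread() {
  # $1 = number of threads; prints GB per thread with one decimal place.
  awk -v m="$total_mem_gb" -v t="$1" 'BEGIN { printf "%.1f\n", m / t }'
}

for threads in 56 20 10; do
  echo "threads=$threads  approx GB per thread=$(mem_per_thread "$threads")"
done
```

Under that assumption, dropping from 56 threads to somewhere in the 10-20 range raises the share per thread from roughly 3G to roughly 9-18G.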

classify-consensus-vsearch can also take a little longer than other classification methods, as it performs global alignment. You can adjust the maxaccepts and maxrejects parameters (--p-maxaccepts and --p-maxrejects) to reduce the number of alignments that are performed. The classify-sklearn classifiers might be a little bit faster...
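Putting the two suggestions together, a dry-run sketch of the adjusted call might look like the following. The thread count and the `--p-maxaccepts` / `--p-maxrejects` values are illustrative assumptions, not tested recommendations. The file names and paths are copied from the original command, and the `qiime2_exec bash -c` wrapper is left out for brevity. The script only prints the command and does not run it.

```shell
#!/usr/bin/env bash
# Dry-run sketch: build the classify-consensus-vsearch call with fewer threads
# and capped alignments, then print it. The numeric values are assumptions.
threads=16       # assumption: a value inside the suggested 10-20 range
maxaccepts=5     # assumption: illustrative cap on accepted hits per query
maxrejects=100   # assumption: illustrative cap on rejected hits per query

cmd=(
  qiime feature-classifier classify-consensus-vsearch
  --i-query uniquesToFasta.qza
  --i-reference-reads "/work/{whoami}/silva-138-99-seqs.qza"
  --i-reference-taxonomy "/work/{whoami}/silva-138-99-tax.qza"
  --p-perc-identity 0.99
  --p-query-cov 0.8
  --p-maxaccepts "$maxaccepts"
  --p-maxrejects "$maxrejects"
  --p-threads "$threads"
  --o-classification taxonomy-vsearch-SILVA_138.qza
  --o-search-results top-vsearch-SILVA_138.qza
  --verbose
)

# Print the command for review; replace this line with "${cmd[@]}" to execute it.
printf '%s ' "${cmd[@]}"; echo
```

Lower maxaccepts and maxrejects values mean fewer alignments per query, but they also change which hits feed the consensus, so it is worth comparing the resulting taxonomy against a run with the default settings.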

Good luck!
