But it took 8 hours to classify 1450 unique sequences.
Is that normal for a regular analysis? Is there anything I can do to speed up the process?
I would truly appreciate any suggestions and help.
Thank you.
Hello @James ,
This is indeed very unusual. 1450 sequences is not a large amount and should take under an hour to classify. You can see some runtime benchmarks in this paper:
The issue is this:
Even though you have 56 cores, the amount of memory per core is not very high (~3 GB/core), so each parallel job is short on RAM, and the individual jobs take a very long time.
You would probably complete the job much faster on, say, 10-20 cores, to give enough RAM to individual jobs.
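To make the reasoning above concrete, here is a rough back-of-the-envelope sketch in Python. The 56 cores and ~3 GB/core figures come from this thread; the per-job memory values (8, 12, and 16 GB) are purely hypothetical placeholders, since the real requirement depends on your reference database, and `max_parallel_jobs` is just an illustrative helper, not part of QIIME 2.

```python
import math


def max_parallel_jobs(total_ram_gb: float, ram_per_job_gb: float, n_cores: int) -> int:
    """Largest number of simultaneous jobs that fits in RAM, capped by the core count."""
    if ram_per_job_gb <= 0:
        raise ValueError("ram_per_job_gb must be positive")
    by_memory = math.floor(total_ram_gb / ram_per_job_gb)
    # Always allow at least one job, and never more jobs than cores.
    return max(1, min(n_cores, by_memory))


n_cores = 56
total_ram_gb = n_cores * 3  # ~3 GB/core as described in this thread, so ~168 GB total

# Hypothetical per-job memory needs; measure your own job to get a real number.
for ram_per_job_gb in (8, 12, 16):
    jobs = max_parallel_jobs(total_ram_gb, ram_per_job_gb, n_cores)
    print(f"{ram_per_job_gb} GB per job: run at most {jobs} jobs in parallel")
```

With these assumed figures the helper suggests 21, 14, and 10 parallel jobs respectively, which is in the same range as the 10-20 cores mentioned above.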
classify-consensus-vsearch can also take a little longer than other classification methods, as it is performing global alignment. You can adjust the max-accepts and max-rejects parameters to reduce the number of alignments that are performed. The classify-sklearn classifiers might be a little bit faster...
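As a rough illustration of the parameter adjustment, the sketch below only assembles a command line as a Python list, so you can inspect it before running anything. The file names, the output directory, and the numeric values are hypothetical placeholders, and `build_vsearch_command` is an illustrative helper. I am assuming the option spellings `--p-maxaccepts`, `--p-maxrejects`, and `--p-threads`; please confirm them for your release with `qiime feature-classifier classify-consensus-vsearch --help`.

```python
from typing import List


def build_vsearch_command(
    query: str,
    reference_reads: str,
    reference_taxonomy: str,
    output_dir: str,
    maxaccepts: int,
    maxrejects: int,
    threads: int,
) -> List[str]:
    """Assemble (but do not run) a classify-consensus-vsearch command line."""
    return [
        "qiime", "feature-classifier", "classify-consensus-vsearch",
        "--i-query", query,
        "--i-reference-reads", reference_reads,
        "--i-reference-taxonomy", reference_taxonomy,
        # Fewer accepted/rejected hits per query means fewer alignments.
        "--p-maxaccepts", str(maxaccepts),
        "--p-maxrejects", str(maxrejects),
        # Keep the thread count modest so each job has enough RAM.
        "--p-threads", str(threads),
        "--output-dir", output_dir,
    ]


# All values below are placeholders for illustration only.
cmd = build_vsearch_command(
    query="rep-seqs.qza",
    reference_reads="ref-seqs.qza",
    reference_taxonomy="ref-taxonomy.qza",
    output_dir="vsearch-classification",
    maxaccepts=5,
    maxrejects=50,
    threads=16,
)
print(" ".join(cmd))
```

Once the printed command looks right, you can paste it into your shell or pass `cmd` to `subprocess.run(cmd, check=True)`.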