I'm trying to run a taxonomic classification with feature-classifier classify-sklearn using the SILVA database, and I'm running into very long runtimes.
For example, I'm currently testing on a subset of the first 8 sequences from the "Moving Pictures" tutorial (from the file rep-seqs.qza), and it has been running for more than 3 hours without finishing. I have 8 CPUs and 32 GB of RAM available, and all resources are used non-stop without any errors.
It sounds like the job is running but just taking a long time. It is hard to give an exact estimate of how long a particular job will take on a given machine.
Setting --p-n-jobs -1 seems like the best way to ensure that the job runs as quickly as possible, but it can actually cause things to run slower. You may want to try setting --p-n-jobs to -2 or -3 and see if that helps.
You could also experiment with --p-reads-per-batch. By default it is set to 20000 or #-of-query-sequences / n-jobs, whichever is smaller, but you can set it manually.
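The default batch-size rule described above (the smaller of 20000 and the number of query sequences divided by n-jobs) can be sketched in shell arithmetic; the sequence counts here are just the numbers from this thread, not values read from any QIIME 2 internals:

```shell
# Sketch of the default reads-per-batch rule:
# min(20000, number_of_query_sequences / n_jobs)
n_seqs=8    # query sequences in the test subset from this thread
n_jobs=8    # one job per CPU
default_batch=$(( n_seqs / n_jobs < 20000 ? n_seqs / n_jobs : 20000 ))
echo "default reads-per-batch: $default_batch"
```

With 8 sequences split across 8 jobs, each batch ends up holding a single sequence, which is one reason tuning --p-reads-per-batch manually can be worthwhile for tiny inputs.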
Long runtimes are unfortunately part of using classify-sklearn, and it sounds like that is what you are encountering.
Unfortunately I had already tested several combinations of --p-n-jobs and --p-reads-per-batch without much success.
My run for analysing 8 sequences has now been going for more than 24 hours, and I'm not sure whether I should expect it to end at some point.
I have tested with only 2 sequences, which took 10 minutes; with 4 sequences it took 4 hours. It seems like there is some sort of unexpected exponential runtime explosion with the number of query sequences.
I don't know if there is any relation to this problem, but previously I had a disk space error, which was solved by doing:
export TMPDIR='/home/qiime2/tmp'
export JOBLIB_TEMP_FOLDER='/home/qiime2/tmp'
(I'm working on an AWS instance with the latest QIIME 2 AMI.)
Can you try running it again with --p-n-jobs 1? Each job takes 4-8 GB of RAM just to load the SILVA classifier, so with all 8 (or even 7) cores each running a job, you are going to end up with a lot of RAM being used and likely some kind of memory error occurring. Though it is strange that the process is not being killed and the error reported.
If that doesn't help, try running it locally instead of on AWS; it should not be taking that long.
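For reference, a single-job run might look like the sketch below. The classifier and output filenames are assumptions for illustration (the original post only names rep-seqs.qza), so substitute your own artifact names:

```shell
# Hypothetical single-job invocation; classifier/output names are assumed.
qiime feature-classifier classify-sklearn \
  --i-classifier silva-classifier.qza \
  --i-reads rep-seqs.qza \
  --p-n-jobs 1 \
  --o-classification taxonomy.qza
```

Running a single job keeps only one copy of the SILVA classifier in memory, which is the point of the suggestion above.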
Sorry for my late response.
I've tried with --p-n-jobs 1 and it seems to work. Indeed, it looks like it was a RAM problem when too many CPUs were used.
Thanks a lot for your help!