multithreading for deblur denoise-16S

splaisan · November 24, 2020, 8:17am

Could the team please develop multithreaded versions of the slow steps of a regular 16S analysis?
Yesterday I launched 'deblur denoise-16S' on a 2.7 GB qza file, and it has now been running for 18 h on part of a single CPU.
These data are only about one tenth of the real dataset. At this rate the QIIME pipeline will run for days on a single core, while I have 88 cores available.
I know the dev team is small and time is limited, but the deblur GitHub documentation says it can run multithreaded, and I suspect other plugins offer the same.
Thanks in advance for considering these improvements, which would be a game changer for large datasets.

Nicholas_Bokulich · November 24, 2020, 8:46am

Hi @splaisan,

So is the QIIME 2 plugin: see the "--p-jobs-to-start" parameter of deblur denoise-16S.

In general, whenever an action can be parallelized, the parallelization option is exposed in QIIME 2 to capitalize on those benefits. But please by all means ask if in doubt, or alert us to actions that could be parallelized but are not.
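To make the idea of a "jobs" parameter concrete, here is a minimal sketch of per-sample parallelism, assuming (as an illustration, not as a description of q2-deblur's internals) that each job denoises one whole sample. The function names `denoise_sample` and `denoise_all` are hypothetical, and threads stand in for the separate worker processes a real tool would likely use.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def denoise_sample(sample_id: str) -> str:
    """Hypothetical stand-in for denoising the reads of one sample."""
    return f"{sample_id}: denoised"


def denoise_all(sample_ids: List[str], jobs_to_start: int) -> List[str]:
    """Denoise samples in parallel, one sample per task.

    Because the unit of work is a whole sample, no more than
    len(sample_ids) workers can ever be busy, however many are requested.
    """
    workers = max(1, min(jobs_to_start, len(sample_ids)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(denoise_sample, sample_ids))


samples = [f"sample{i:02d}" for i in range(1, 13)]  # 12 samples
results = denoise_all(samples, jobs_to_start=88)
print(len(results))  # 12
```

Under this assumption, asking for 88 jobs on 12 samples still yields only 12 units of work, so the extra workers would have nothing to do.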

splaisan · November 24, 2020, 1:27pm

Thanks Nicholas,
I overlooked this parameter because I was looking for one with 'threads' in its name.
Does the number of jobs refer to the number of samples present in the qza (so that it is capped at that number), or to parallel threads across all samples in the qza?
I.e., I have 12 samples in my object, each with 1M read pairs.
Can I specify '--p-jobs-to-start 88' to use all my cores, or will it stop at 12, the number of input fastq samples?

wasade · November 30, 2020, 4:56pm

Hi @splaisan,

It will max out at the number of samples. However, if you run deblur directly, you can control both the number of jobs and the number of threads per job (see 'deblur workflow --help').
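When running deblur directly, one way to reason about the two knobs is to cap jobs at the sample count and give each job a share of the remaining cores. The helper below, `plan_parallelism`, is hypothetical arithmetic only; it does not call deblur, does not name its actual command-line flags, and assumes that extra threads per job are actually useful, which this thread does not establish.

```python
from typing import Tuple


def plan_parallelism(cores: int, n_samples: int) -> Tuple[int, int]:
    """Hypothetical helper: split a core budget into (jobs, threads_per_job).

    Jobs are capped at the number of samples; the cores left over are
    shared out evenly as extra threads within each job.
    """
    if cores < 1 or n_samples < 1:
        raise ValueError("cores and n_samples must both be at least 1")
    jobs = min(cores, n_samples)
    threads_per_job = cores // jobs  # integer division; remainder stays idle
    return jobs, threads_per_job


jobs, threads = plan_parallelism(cores=88, n_samples=12)
print(jobs, threads, jobs * threads)  # 12 7 84
```

With 12 samples and 88 cores this plan uses 84 cores (12 jobs × 7 threads) and leaves 4 idle; whether 7 threads per sample speeds things up depends on how well deblur's per-sample steps scale.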

Best,
Daniel

system · January 1, 2021, 12:03am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.