multithreading for deblur denoise-16S

Could the team please develop multithreaded versions of the slow steps of a regular 16S analysis?
I launched 'deblur denoise-16S' yesterday on a 2.7 GB qza file, and it has now been running for 18 h on part of a single CPU.
This is only about 1/10th of the real data. It seems the QIIME pipeline will run for days and use only one core while I have 88 available.
I know that the dev team is small and its time is limited, but deblur's GitHub documentation says it can run multithreaded, and I suspect that other plugins offer the same.
Thanks in advance for considering these improvements, which would be a game changer for large datasets.

Hi @splaisan,

So can the QIIME 2 plugin: see the "--p-jobs-to-start" parameter for deblur denoise-16S.

In general, whenever an action can be parallelized, the parallelization option is exposed in QIIME 2 to capitalize on those benefits. But please by all means ask if in doubt, or alert us to actions that could be parallelized but are not.

Thanks Nicholas,
I overlooked this parameter because I was looking for one with 'threads' in its name.
Is the number of jobs capped at the number of samples in the qza, or does it refer to parallel threads across all samples in the qza?
i.e., I have 12 samples in my object, each with 1M read pairs.
Can I specify '--p-jobs-to-start 88' to use all my cores, or will it stop at 12, the number of input fastq samples?

Hi @splaisan,

It will max out on the number of samples. However, you can control the number of jobs, and the number of threads per job, if running deblur directly (see deblur workflow --help).
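
As a rough illustration of the arithmetic for the 88-core, 12-sample case, here is a minimal Python sketch. It assumes that jobs beyond the number of samples bring no benefit and that the remaining cores are split evenly as threads per job. The file names, directory names, and trim length are hypothetical placeholders, and the standalone flags (--jobs-to-start, --threads-per-sample) should be checked against deblur workflow --help for your installed version. The script only prints the commands; it does not run them.

```python
import shlex


def plan_parallelism(cores: int, samples: int) -> tuple[int, int]:
    """Return (jobs, threads_per_job) for a machine with `cores` cores.

    Assumption: one job per sample at most, so jobs is capped at `samples`.
    """
    if cores < 1 or samples < 1:
        raise ValueError("cores and samples must be positive")
    jobs = min(cores, samples)
    threads_per_job = cores // jobs  # any leftover cores stay idle
    return jobs, threads_per_job


jobs, threads = plan_parallelism(cores=88, samples=12)

# QIIME 2 plugin: only the number of jobs is exposed.
qiime_cmd = [
    "qiime", "deblur", "denoise-16S",
    "--i-demultiplexed-seqs", "demux.qza",          # hypothetical input
    "--p-trim-length", "150",                       # hypothetical value
    "--p-jobs-to-start", str(jobs),
    "--o-table", "table.qza",                       # hypothetical outputs
    "--o-representative-sequences", "rep-seqs.qza",
    "--o-stats", "deblur-stats.qza",
]

# Standalone deblur: jobs and threads per job can both be set.
deblur_cmd = [
    "deblur", "workflow",
    "--seqs-fp", "per_sample_fastq_dir",            # hypothetical directory
    "--output-dir", "deblur_out",                   # hypothetical directory
    "--trim-length", "150",                         # hypothetical value
    "--jobs-to-start", str(jobs),
    "--threads-per-sample", str(threads),
]

# Print the commands for inspection instead of executing them.
print(shlex.join(qiime_cmd))
print(shlex.join(deblur_cmd))
```

With 88 cores and 12 samples this plans 12 jobs with 7 threads each, which uses 84 of the 88 cores.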

Best,
Daniel
