Specifying how many threads to run on with qiime2 feature-classifier extract-reads

aalex · June 5, 2019, 10:08pm

Hello!

I've been trying to run feature-classifier extract-reads so I can prepare to train my classifier on some COI reference sequences. The problem I ran into is that it ran for over a week without finishing. With previous versions of QIIME 2, I noticed that some people could add an extra parameter (--p-n-threads) to specify how many threads should be used. However, when I try that with my current version (qiime2-2019.1), I get an error stating: no such option: --p-n-threads.

Is this option no longer available in my version? How can I work around this?

colinbrislawn · June 5, 2019, 11:49pm

Hello Andrea,

Great question!

When I look at the documentation for feature-classifier extract-reads, I can see that this particular action does not have a --p-n-threads option (you didn't miss it!), so I guess it doesn't support multithreading.

Maybe you could parallelize this yourself by dividing your FeatureData[Sequence] into several parts, then running extract-reads on all of those parts at the same time?
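A minimal sketch of the splitting step, assuming the FeatureData[Sequence] artifact has already been exported to a plain FASTA file (for example with qiime tools export). The names read_fasta and split_fasta are hypothetical helpers written for illustration; they are not part of QIIME 2.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line and header is not None:
                # Sequences may be wrapped over several lines; join them.
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(fasta_path, n_parts, out_dir):
    """Distribute records round-robin into n_parts FASTA files.

    Returns the list of part paths. File names are an arbitrary choice.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"part_{i}.fasta" for i in range(n_parts)]
    handles = [open(p, "w") for p in paths]
    try:
        for index, (header, seq) in enumerate(read_fasta(fasta_path)):
            handles[index % n_parts].write(f">{header}\n{seq}\n")
    finally:
        for handle in handles:
            handle.close()
    return paths
```

Each part would then need to be imported back as FeatureData[Sequence] (qiime tools import) and passed to its own extract-reads process, one per CPU core. Round-robin assignment keeps the parts roughly equal in size, so the processes should finish at about the same time.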

Colin

ebolyen · June 6, 2019, 3:00am

And then make sure to merge again!
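A rough sketch of the merge step, assuming each extract-reads result has been exported to a FASTA file. merge_fasta is a hypothetical helper for illustration, not a QIIME 2 function. It concatenates the parts and refuses to continue if the same sequence ID appears twice, since duplicate IDs would be a sign that the split went wrong.

```python
def merge_fasta(part_paths, merged_path):
    """Concatenate FASTA files into merged_path and return the record count.

    Raises ValueError if a sequence ID (first word of the header) repeats.
    """
    seen = set()
    count = 0
    with open(merged_path, "w") as out:
        for part in part_paths:
            with open(part) as handle:
                for line in handle:
                    if line.startswith(">"):
                        fields = line[1:].split()
                        seq_id = fields[0] if fields else ""
                        if seq_id in seen:
                            raise ValueError(f"duplicate sequence ID: {seq_id!r}")
                        seen.add(seq_id)
                        count += 1
                    # Guard against a part file that lacks a final newline.
                    out.write(line if line.endswith("\n") else line + "\n")
    return count
```

If the parts are kept as artifacts instead, QIIME 2 may be able to merge them directly; qiime feature-table merge-seqs looks like the relevant action, but check its --help output in your installed version first.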

That said, I suspect the reason this doesn't have the n-threads is that it's mostly IO-bound, so adding CPUs doesn't make it read from the hard drive any faster, so I don't know if splitting and merging is worth the effort here.

aalex · June 11, 2019, 5:12pm

I ran a quick check to see whether this operation is IO-bound (to the best of my ability).

Running
> sudo iotop
shows no significant disk activity, while
> top
shows that nearly 100% of the CPU is being used. I'm taking this as an indication that it is CPU-bound. Is there somewhere I can confirm this in the documentation before I try splitting and then later merging my files?
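The same check can be roughly scripted by comparing the CPU time a command consumes with the wall-clock time it takes. This is only a sketch: cpu_fraction is a hypothetical helper name, and it assumes a Unix-like system, where os.times() reports CPU time for finished child processes.

```python
import os
import subprocess
import time


def cpu_fraction(command):
    """Run command and return (child CPU seconds) / (wall-clock seconds).

    A value near 1.0 suggests a single-threaded, CPU-bound process.
    A value near 0.0 suggests the process mostly waits (for disk, for example).
    """
    before = os.times()
    start = time.perf_counter()
    subprocess.run(command, check=True)
    wall = time.perf_counter() - start
    after = os.times()
    # children_user/children_system accumulate CPU time of waited-for children.
    cpu = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    return cpu / wall if wall > 0 else 0.0
```

To try it, pass the command as a list of arguments, ideally running extract-reads on a small subset of the reference sequences so the measurement finishes quickly. A multi-threaded command can report values above 1.0.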

colinbrislawn · June 11, 2019, 5:56pm

I think this is solid evidence that this step is CPU bound. Good detective work!

Estimating bottlenecks is hard, which is why we don't usually mention how much RAM, CPU, or IO is needed for a specific step. So your first-hand observation is better than the best documentation.

ebolyen · June 11, 2019, 6:42pm

I created an issue for this. No real timeline whatsoever, but it looks like something we should be able to do.
