Beta-phylogenetic-alt errors when run in parallel

dansmith01 · July 3, 2018, 8:35pm

I'm getting an error when I try to run unweighted UniFrac in parallel.

The environment I'm using is a c3.2xlarge (8 CPU) EC2 instance running AMI 482e6430 (QIIME 2 release 2018.6). I've confirmed that there are 8 processor entries in /proc/cpuinfo. The following commands, run on a fresh AMI, produced the error message.

wget https://s3.amazonaws.com/jplab/projects/hmp/hmp500.json
wget https://s3.amazonaws.com/jplab/projects/hmp/hmp500.tre
qiime tools import --input-path hmp500.json --output-path hmp500.biom --type FeatureTable[Frequency] --source-format BIOMV100Format
qiime tools import --input-path hmp500.tre --output-path hmp500.tre --type Phylogeny[Rooted] --source-format NewickFormat
qiime diversity beta-phylogenetic-alt --i-table hmp500.biom.qza --i-phylogeny hmp500.tre.qza --p-metric unweighted_unifrac --output-dir qiime_out --p-n-jobs 8

Error Message:

Plugin error from diversity:

  The value of n_jobs cannot exceed the number of processors (1) available in this system.

Debug info has been saved to /tmp/qiime2-q2cli-err-exchymn0.log

Contents of /tmp/qiime2-q2cli-err-exchymn0.log

Traceback (most recent call last):
  File "/home/qiime2/miniconda/envs/qiime2-2018.6/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "<decorator-gen-312>", line 2, in beta_phylogenetic_alt
  File "/home/qiime2/miniconda/envs/qiime2-2018.6/lib/python3.5/site-packages/qiime2/sdk/action.py", line 232, in bound_callable
    output_types, provenance)
  File "/home/qiime2/miniconda/envs/qiime2-2018.6/lib/python3.5/site-packages/qiime2/sdk/action.py", line 367, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/home/qiime2/miniconda/envs/qiime2-2018.6/lib/python3.5/site-packages/q2_diversity/_beta/_method.py", line 100, in beta_phylogenetic_alt
    'processors (%d) available in this system.' % cpus)
ValueError: The value of n_jobs cannot exceed the number of processors (1) available in this system.

The command works if I replace beta-phylogenetic-alt with beta-phylogenetic and I can see it using multiple cores. However, for the larger data sets I'll be working with, I'd much rather use the faster -alt version if at all possible.

If you have any suggestions please let me know. Thanks!

ebolyen · July 3, 2018, 8:39pm

Hey @dansmith01,

@wasade, correct me if I'm wrong, but I think I recall that unifrac used the number of physical cores as its max.

With Intel processors you get hyperthreading, which doubles the number of logical cores. So for an i7 processor (almost certainly what the c3.2xlarge is using) you'll have 4 physical cores which behave like 8 logical CPUs.
So in your case, you need to set --p-n-jobs to 4 instead of 8.
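To make the logical versus physical distinction concrete, here is a minimal Python sketch. It assumes psutil is installed (it is importable in the QIIME 2 environment used in this thread); `core_counts` and `safe_n_jobs` are hypothetical helper names, not part of QIIME 2 or q2-diversity.

```python
import os

try:
    import psutil  # third-party package
except ImportError:
    psutil = None


def core_counts():
    """Return (logical, physical) CPU counts; either may be None if unknown."""
    logical = os.cpu_count()
    physical = psutil.cpu_count(logical=False) if psutil is not None else None
    return logical, physical


def safe_n_jobs(requested, physical, logical):
    """Clamp a requested job count to the physical core count.

    Hypothetical helper: falls back to the logical count, then to 1,
    when a count could not be determined.
    """
    limit = physical or logical or 1
    return max(1, min(requested, limit))


if __name__ == "__main__":
    logical, physical = core_counts()
    print("logical CPUs:", logical)
    print("physical cores:", physical)
    print("n_jobs to try:", safe_n_jobs(8, physical, logical))
```

On a machine with 4 physical cores and hyperthreading, the two counts would be expected to differ by a factor of two, and the clamped value would be the one to pass to --p-n-jobs.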

dansmith01 · July 3, 2018, 9:02pm

Hi @ebolyen,

I tried reducing --p-n-jobs to 4, and am still getting the same error. If I understand the error message correctly, the script seems to only see 1 core on the system.

The value of n_jobs cannot exceed the number of processors (1) available in this system.
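For readers following along, the behaviour described here can be reproduced with a small sketch. The error text and the variable name `cpus` come from the traceback above; the function `check_n_jobs` and the exact comparison are assumptions for illustration, not the actual q2-diversity source.

```python
def check_n_jobs(n_jobs, cpus):
    """Reject n_jobs values larger than the detected processor count.

    Assumed logic, modelled on the error message in the traceback.
    """
    if n_jobs > cpus:
        raise ValueError('The value of n_jobs cannot exceed the number of '
                         'processors (%d) available in this system.' % cpus)
    return n_jobs


# The error message reports 1 detected processor, so try a few requests.
for requested in (8, 4, 1):
    try:
        check_n_jobs(requested, cpus=1)
        print(requested, "accepted")
    except ValueError as err:
        print(requested, "rejected:", err)
```

Under this assumption, any value above 1 is rejected while the detected count stays at 1, which matches what happened when --p-n-jobs was lowered from 8 to 4.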

wasade · July 3, 2018, 9:03pm

@ebolyen, it does detect hyperthreading correctly on our bare-metal machines. My guess is that the number of available processors is not being detected properly, and it looks like that detection is done using psutil.

@dansmith01, would it be possible to send the output from the following command?

$ python -c "import psutil; print(psutil.cpu_count(logical=False))"

Best,
Daniel

dansmith01 · July 3, 2018, 9:11pm

Sure thing, @wasade.

(qiime2-2018.6) qiime2@ip-172-31-43-115:~$ python -c "import psutil; print(psutil.cpu_count(logical=False))"
1
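A result of 1 alongside 8 processor entries in /proc/cpuinfo can be examined further with a sketch like the one below. It counts processor entries and distinct (physical id, core id) pairs in cpuinfo-style text. One plausible explanation, which this thread does not confirm, is that a virtualized host reports the same identifiers for every processor. `summarize_cpuinfo` is a hypothetical helper, the sample text is made up for illustration, and this is not a claim about how psutil computes its value.

```python
import os


def summarize_cpuinfo(text):
    """Return (processor entries, distinct (physical id, core id) pairs)."""
    logical = 0
    cores = set()
    # Processor blocks in /proc/cpuinfo are separated by blank lines.
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" in fields:
            logical += 1
            cores.add((fields.get("physical id"), fields.get("core id")))
    return logical, len(cores)


# Made-up sample (not real EC2 output): two entries sharing the same ids.
SAMPLE_CPUINFO = """\
processor\t: 0
physical id\t: 0
core id\t\t: 0

processor\t: 1
physical id\t: 0
core id\t\t: 0
"""

if __name__ == "__main__":
    print("sample:", summarize_cpuinfo(SAMPLE_CPUINFO))
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as handle:
            print("this host:", summarize_cpuinfo(handle.read()))
```

If the two numbers differ on the affected instance, that would point at the contents of /proc/cpuinfo rather than at QIIME 2 itself.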

wasade · July 5, 2018, 6:45pm

@dansmith01, well, that seems to be the problem, and it is unexpected. I've opened an issue with q2-diversity. In the near term, if you're using fewer than 10,000 samples, a single thread should return results in under an hour, depending on the complexity of the tree. One caveat with that performance guess is that it is based on binaries that were compiled on the same hardware they ran on, so there may have been some compile-time, hardware-specific optimizations. That said, a single core will still be much faster than beta-phylogenetic, irrespective of hardware-specific optimizations.

Best,
Daniel

dansmith01 · July 6, 2018, 2:48am

Thanks for the update, @wasade. I got multithreaded beta-phylogenetic-alt working on a local host, so I'll use that for now. I've also subscribed to the GitHub issue so I'll know when to give EC2 another go.
