I’m getting an error when I try to run unweighted UniFrac in parallel.
The environment I’m using is a c3.2xlarge (8 vCPU) EC2 instance launched from AMI 482e6430 (QIIME 2 release 2018.6). I’ve confirmed that there are 8 processor entries in /proc/cpuinfo. Running the following commands on a fresh instance produced the error message.
wget https://s3.amazonaws.com/jplab/projects/hmp/hmp500.json
wget https://s3.amazonaws.com/jplab/projects/hmp/hmp500.tre
qiime tools import --input-path hmp500.json --output-path hmp500.biom --type FeatureTable[Frequency] --source-format BIOMV100Format
qiime tools import --input-path hmp500.tre --output-path hmp500.tre --type Phylogeny[Rooted] --source-format NewickFormat
qiime diversity beta-phylogenetic-alt --i-table hmp500.biom.qza --i-phylogeny hmp500.tre.qza --p-metric unweighted_unifrac --output-dir qiime_out --p-n-jobs 8
Error Message:
Plugin error from diversity:
The value of n_jobs cannot exceed the number of processors (1) available in this system.
Debug info has been saved to /tmp/qiime2-q2cli-err-exchymn0.log
Contents of /tmp/qiime2-q2cli-err-exchymn0.log:
Traceback (most recent call last):
  File "/home/qiime2/miniconda/envs/qiime2-2018.6/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "<decorator-gen-312>", line 2, in beta_phylogenetic_alt
  File "/home/qiime2/miniconda/envs/qiime2-2018.6/lib/python3.5/site-packages/qiime2/sdk/action.py", line 232, in bound_callable
    output_types, provenance)
  File "/home/qiime2/miniconda/envs/qiime2-2018.6/lib/python3.5/site-packages/qiime2/sdk/action.py", line 367, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/home/qiime2/miniconda/envs/qiime2-2018.6/lib/python3.5/site-packages/q2_diversity/_beta/_method.py", line 100, in beta_phylogenetic_alt
    'processors (%d) available in this system.' % cpus)
ValueError: The value of n_jobs cannot exceed the number of processors (1) available in this system.
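In case it helps with debugging, here is a minimal sketch for comparing the CPU counts that Python reports from inside the qiime2-2018.6 environment. I don't know which call the plugin actually uses to get its count of 1, so the sources below are only guesses at where a mismatch with /proc/cpuinfo could come from. The helper names `count_cpuinfo_processors` and `cpu_counts` are my own, and psutil is treated as optional because I'm only assuming it may be installed.

```python
import os


def count_cpuinfo_processors(path="/proc/cpuinfo"):
    """Count 'processor' entries in a cpuinfo-style file; None if unreadable."""
    try:
        with open(path) as handle:
            return sum(1 for line in handle if line.startswith("processor"))
    except OSError:
        return None


def cpu_counts():
    """Collect CPU counts from several sources for side-by-side comparison."""
    counts = {
        "os.cpu_count()": os.cpu_count(),
        "/proc/cpuinfo processor entries": count_cpuinfo_processors(),
    }
    # sched_getaffinity is not available on every platform; where it exists,
    # it reports the CPUs this particular process is allowed to run on.
    if hasattr(os, "sched_getaffinity"):
        counts["len(os.sched_getaffinity(0))"] = len(os.sched_getaffinity(0))
    # Assumption: psutil may or may not be installed. Its physical-core count
    # (logical=False) can differ from the logical count on virtualized hosts.
    try:
        import psutil
        counts["psutil.cpu_count(logical=True)"] = psutil.cpu_count(logical=True)
        counts["psutil.cpu_count(logical=False)"] = psutil.cpu_count(logical=False)
    except ImportError:
        pass
    return counts


if __name__ == "__main__":
    for source, value in cpu_counts().items():
        print("%-36s %s" % (source, value))
```

If one of these sources reports 1 while the others report 8, that would at least narrow down where the limit in the error message is coming from.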
The command works if I replace beta-phylogenetic-alt with beta-phylogenetic, and I can see it using multiple cores. However, for the larger data sets I’ll be working with, I’d much rather use the faster -alt version if at all possible.
If you have any suggestions please let me know. Thanks!