q2-phylogenize error: BURST failed with error code

Hi Everyone!
I'm also having some issues with this plugin. I was able to install the plugin with no problems. However, when I try to run it, I get the following output:

```
looking for file: /tmp/qiime2-temp-8rktczvl/input.biom
located biom file: /tmp/qiime2-temp-8rktczvl/input.biom
Calling BURST with arguments: -r /home/fstudart/anaconda3/envs/qiime2-2019.10/lib/R/library/phylogenize/extdata/16s_renamed.frn -fr -q /tmp/qiime2-temp-8rktczvl/input_seqs.txt -i 0.985 -o /tmp/qiime2-temp-8rktczvl/output_assignments.txt
sh: 1: /usr/local/bin/burst12: not found
Quitting from lines 40-110 (phylogenize-report.Rmd)
Error in pz.error(paste0("BURST failed with error code ", r)) :
  BURST failed with error code 127
Calls: ... read.abd.metadata -> process.16s -> run.burst -> pz.error
In addition: Warning messages:
1: In dir.create(pz.options("out_dir")) :
  '/tmp/qiime2-temp-8rktczvl' already exists
2: In dir.create(pz.options("out_dir")) :
  '/tmp/qiime2-temp-8rktczvl' already exists
3: In dir.create(pz.options("out_dir")) :
  '/tmp/qiime2-temp-8rktczvl' already exists
4: In strsplit(conditionMessage(e), "\n") :
  input string 1 is invalid in this locale
5: In pz.warning(paste0("sample column not found: ", opts("sample_column"), :
  sample column not found: SampleID; assuming row names are sample IDs
6: In system2(file.path(opts("burst_dir"), opts("burst_bin")), args = burst_args) :
  error in running command

Execution halted
Traceback (most recent call last):
  File "/home/fstudart/anaconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_phylogenize-0+untagged.17.g4354584.dirty-py3.6.egg/q2_phylogenize/_phylogenize.py", line 150, in _run
    Rcmd], check=True)
  File "/home/fstudart/anaconda3/envs/qiime2-2019.10/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['R', '-e', 'sapply(c("phylogenize", "graphics", "stats", "methods","grDevices", "biomformat"), function(.) library(character.only=TRUE, .));phylogenize::set_data_internal(); phylogenize::render.report(output_file="index.html", report_input="phylogenize-report.Rmd", input_format="biom", biom_file="input.biom", burst_dir="/usr/local/bin", in_dir="/tmp/qiime2-temp-8rktczvl", out_dir="/tmp/qiime2-temp-8rktczvl", ncl=1, type="16S", which_phenotype="prevalence", which_envir="Saliva", dset_column="dataset", env_column="Quality", sample_column="SampleID", burst_cutoff=0.985, assume_below_LOD=TRUE, single_dset=TRUE, minimum=3, treemin=5, relative_out_dir=".", working_dir="/tmp/qiime2-temp-8rktczvl")']' returned non-zero exit status 1.
```

In my case, I'm using Linux 19.10 through VirtualBox (host OS: Windows 10).
I'm also including the output of library(phylogenize):

```
library(phylogenize)
Loading required package: phylolm
Loading required package: ape
Loading required package: settings
Loading required package: Matrix
Loading required package: tidyverse
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
✔ ggplot2 3.2.1     ✔ purrr   0.3.3
✔ tibble  2.1.3     ✔ dplyr   0.8.3
✔ tidyr   1.0.0     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.4.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ tidyr::expand() masks Matrix::expand()
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
✖ tidyr::pack()   masks Matrix::pack()
✖ tidyr::unpack() masks Matrix::unpack()
Loading required package: ggtree
ggtree v1.14.6 For help: ggtree

If you use ggtree in published research, please cite the most appropriate paper(s):

  • Guangchuang Yu, David Smith, Huachen Zhu, Yi Guan, Tommy Tsan-Yuk Lam. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution 2017, 8(1):28-36, doi:10.1111/2041-210X.12628

  • Guangchuang Yu, Tommy Tsan-Yuk Lam, Huachen Zhu, Yi Guan. Two methods for mapping and visualizing associated data on phylogeny using ggtree. Molecular Biology and Evolution 2018, accepted. doi: 10.1093/molbev/msy194

Attaching package: ‘ggtree’

The following object is masked from ‘package:tidyr’:

    expand

The following object is masked from ‘package:Matrix’:

    expand

The following object is masked from ‘package:ape’:

    rotate

Loading required package: biomformat
Loading required package: functional
Loading required package: future
Loading required package: furrr
```

I'm not an expert, but it seems that I'm missing the BURST dependencies, right? How can I download them?

Thanks in advance,
FS


Hi @fstudart,
I am pinging the developer @pbradz to see if he can offer guidance.
Thanks!

Thanks, Nicholas, and thanks, Fernando, for trying out phylogenize!

You’re correct: that error (exit code 127 is the shell’s “command not found”) typically means you need to install BURST. You can get it from the Knights lab’s GitHub here: BURST. After downloading the binaries, you can either put them in the directory phylogenize looked in (in your log, it expected /usr/local/bin/burst12) or pass a different directory containing the BURST binaries as an option to the plugin. Let me know whether this solves your problem.
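Before rerunning, it can help to confirm that the binary phylogenize tries to call is really there. When `sh` reports “not found” it exits with status 127 (a file that exists but lacks execute permission gives 126 instead). The helper below is a hypothetical sketch, not part of phylogenize or q2-phylogenize; it assumes the `/usr/local/bin` directory and `burst12` binary name from the log above, joined the same way the `system2(file.path(opts("burst_dir"), opts("burst_bin")), ...)` call in the warnings does.

```python
import os
import shutil
from typing import Optional


def find_burst(burst_dir: str, burst_bin: str = "burst12") -> Optional[str]:
    """Return burst_dir/burst_bin if it is an executable file, else None.

    Hypothetical helper: mirrors the file.path(burst_dir, burst_bin) join
    that appears in the phylogenize warning message.
    """
    candidate = os.path.join(burst_dir, burst_bin)
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return None


def diagnose(burst_dir: str, burst_bin: str = "burst12") -> str:
    """Describe whether the BURST binary phylogenize will call is usable."""
    path = find_burst(burst_dir, burst_bin)
    if path is not None:
        return f"OK: {path} exists and is executable"
    message = f"MISSING: {os.path.join(burst_dir, burst_bin)} is absent or not executable"
    elsewhere = shutil.which(burst_bin)  # searches the directories on PATH
    if elsewhere:
        message += f"; found {elsewhere}, so its directory could be used instead"
    return message


if __name__ == "__main__":
    # Directory and binary name taken from the error log above.
    print(diagnose("/usr/local/bin", "burst12"))
```

If the file is present but reported as not executable, marking it executable (`chmod +x`) is usually enough.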

Best, Patrick


Hi Patrick!
I was able to install the BURST binaries, and now it seems to be running; it has been going for almost an hour.
So far, the output shows this:

```
firmicutes
removing zero-variance genes (1): 0.00% removed
actinobacteria
removing zero-variance genes (39): 0.09% removed
```

I know this is not entirely related to my original post, so I apologize in advance, but I have a follow-up question:
Since I have two environments, BAL (bronchoalveolar lavage) and bronchial brushings, would I have to run phylogenize twice? The Quality column has two categories: BAL and Brushings. I would first run it with:

```
--p-which-envir BAL --p-env-column Quality
```

and then run it once more with:

```
--p-which-envir Brushings --p-env-column Quality
```

Thanks very much for all your support,
FS

Hi Fernando, it looks like phylogenize is working as intended! To get the genes associated with each environment, you will need to run it twice, as you mentioned. If you have enough memory, you can potentially speed up the run by allocating more cores (by default it uses one, so that users are less likely to run out of memory). Good luck!
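Since each environment needs its own run, the list of runs follows directly from the distinct values in the environment column. The sketch below is a hypothetical helper, not part of q2-phylogenize: it reads tab-separated metadata and returns the `--p-which-envir`/`--p-env-column` flags used in this thread for each value. The rest of the `qiime` command is not shown here, so you would append these flags to your own invocation. It assumes QIIME 2-style metadata, where rows whose first cell starts with `#` are comments or `#q2:types` directives; the sample IDs in the example are made up.

```python
import csv
from typing import Dict, Iterable, List


def environment_flags(lines: Iterable[str], env_column: str) -> Dict[str, List[str]]:
    """Return one list of phylogenize flags per distinct value of env_column.

    `lines` can be an open metadata file or a list of tab-separated strings.
    Rows whose first cell starts with '#' are skipped.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    id_column = reader.fieldnames[0]  # reading fieldnames consumes the header row
    envs = set()
    for row in reader:
        if (row[id_column] or "").startswith("#"):
            continue
        value = (row.get(env_column) or "").strip()
        if value:
            envs.add(value)
    return {
        env: ["--p-which-envir", env, "--p-env-column", env_column]
        for env in sorted(envs)
    }


if __name__ == "__main__":
    # Made-up metadata with the two categories described above.
    example = [
        "sample-id\tQuality",
        "S1\tBAL",
        "S2\tBrushings",
        "S3\tBAL",
    ]
    for env, flags in environment_flags(example, "Quality").items():
        print(env, " ".join(flags))
```

With a real file you would pass the open handle, e.g. `environment_flags(open(path, newline=""), "Quality")`, and launch one run per returned entry.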


Hi Patrick,

Thanks for developing this great plugin. Very useful output.
Sorry to bother you again. One more question: the genes associated with each environment are based on the ASV table and the rep-seqs file, with the metadata file used to identify which samples belong to each environment.
Since there is no direct comparison between environments, a specific gene may be significantly associated with more than one environment, right?

Thanks,
FS

Hi Fernando, great, very glad to hear it was useful!

As far as the associated genes go, yes, you’re correct: a given gene can be associated with more than one environment. If you’re using the “prevalence” phenotype, then there is no direct comparison between environments, as you said. If you’re using “specificity”, you might still expect to see this when there are more than two environments, since specificity compares one environment against all others; if some environments are similar, their scores should also be similar.

If there are only two environments in the sample, I would not expect to see many of the same genes associated with specificity – but it is technically possible because of the regularization phylogenize uses. Because phylogenize adds a sparsity constraint that is optimized separately per environment, different taxa may effectively get “shrunk” to the baseline. This means specificity for environment A may not be perfectly anticorrelated with specificity for environment not-A, in which case you could definitely get some genes that were an okay match to both specificity phenotypes.
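To see how separately tuned sparsity can break the mirror-image relationship between the two specificity phenotypes, here is a toy illustration using soft-thresholding, the shrinkage step behind lasso-type penalties. This is an assumed stand-in, not phylogenize’s actual regularization: the per-taxon values and both thresholds are made up, and using a different threshold for each environment plays the role of sparsity optimized separately per environment.

```python
from math import copysign, sqrt
from typing import List


def soft_threshold(values: List[float], t: float) -> List[float]:
    """Shrink each value toward 0 by t; values within t of 0 become 0."""
    return [copysign(max(abs(v) - t, 0.0), v) for v in values]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


if __name__ == "__main__":
    # Toy per-taxon specificity for environment A; with only two
    # environments, the unshrunk values for not-A are the exact negation.
    spec_a = [2.0, 1.0, -1.0, -2.0, 0.5]
    spec_not_a = [-v for v in spec_a]
    print(round(pearson(spec_a, spec_not_a), 3))  # prints -1.0

    # Different (made-up) thresholds stand in for per-environment tuning.
    shrunk_a = soft_threshold(spec_a, 0.8)
    shrunk_not_a = soft_threshold(spec_not_a, 1.5)
    print(shrunk_a, shrunk_not_a)
    print(round(pearson(shrunk_a, shrunk_not_a), 3))
```

Before shrinkage the two vectors are perfectly anticorrelated. Afterwards, the taxa with moderate values (the second and third entries) are zeroed for not-A but kept for A, so the correlation moves away from -1, which is the kind of imperfect anticorrelation described above.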

I would expect to see this most often when sample sizes are low and the differences between environments are subtle, which is when it will be the hardest to detect a signal and when sparsity would be the highest. A useful diagnostic might be to look at your phenotype distributions on the tree. If you go to the “only mapped/observed taxa” tab and, mousing over the taxa, you see that only a few of them have different values from the rest, then I would be more cautious about interpreting the results. You can also get a sense for what types of results you’re getting by looking at the tree heatmaps.
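The tree check above is visual, but if you have noted down per-taxon phenotype values from the report, a rough numeric version is to ask what fraction of taxa differ noticeably from the typical value. This is a hypothetical heuristic, not a phylogenize feature: it assumes a plain mapping from taxon to phenotype value, uses the median as the baseline and an arbitrary tolerance `tol`, and the taxon names are placeholders. What counts as “only a few” remains a judgment call.

```python
from statistics import median
from typing import Dict


def fraction_departing(phenotype: Dict[str, float], tol: float) -> float:
    """Fraction of taxa whose value differs from the median by more than tol."""
    if not phenotype:
        raise ValueError("no taxa supplied")
    baseline = median(phenotype.values())
    departing = [taxon for taxon, v in phenotype.items() if abs(v - baseline) > tol]
    return len(departing) / len(phenotype)


if __name__ == "__main__":
    # Placeholder taxa: four sit at the baseline and one stands out.
    values = {"taxon_a": 0.1, "taxon_b": 0.1, "taxon_c": 0.1, "taxon_d": 0.1, "taxon_e": 0.9}
    print(fraction_departing(values, 0.05))  # prints 0.2
```

A small fraction corresponds to the situation described above, where only a handful of taxa carry the signal and the results deserve extra caution.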

Hope this is helpful!

Hi Patrick,

Thanks very much!

FS
