qiime2R error GreenGenes2 tree

liekekuiper · October 25, 2023, 2:41pm

When running the following script in R:

library("qiime2R")
tree = read_qza("2022.10.taxonomy.asv.nwk.qza") # greengenes2 tree

I keep getting this error. How could we avoid this?

Error in scan(file = file, what = "", sep = "\n", quiet = TRUE, skip = skip, :
could not allocate memory (2048 Mb) in C function 'R_AllocStringBuffer'

colinbrislawn · October 25, 2023, 4:24pm

Good afternoon,

This sounds like a memory / RAM limitation.

How much memory does your computer or VM have?
Do you have access to any computers or HPC with more memory?

liekekuiper · October 25, 2023, 6:07pm

I have assigned 64GB to this, so that shouldn't cause any issue... But you haven't encountered this before?

colinbrislawn · October 25, 2023, 6:10pm

Something strange is going on, because...

64GB is plenty, but it's clearly not being found.

Are you running this on a VM or an HPC slurm queue? Do any other clues come to mind?

liekekuiper · October 25, 2023, 7:03pm

An HPC Slurm queue. The only thing I can imagine is an issue with the artifact, but that doesn't make sense, as 1) it is the designated tree and 2) I was able to use the artifact in all steps so far...

colinbrislawn · October 26, 2023, 1:27pm

Ah, are you submitting jobs to the slurm queue or running with a workflow manager like Snakemake or NextFlow?

Can you post the code / template you are using to submit jobs?

(I ask because I myself have carefully crafted submission scripts to run 64 threads in parallel, only to realize that I was actually getting 1/64th of the CPU I wanted. Slurm is tricky!)

liekekuiper · October 27, 2023, 11:41am

Yes, you are right, slurm can be full of surprises...
I used the following scripts:

sbatch physeq.sbatch

With physeq.sbatch being:

#!/bin/bash
#SBATCH -N 1
#SBATCH -c 1
#SBATCH --time 1:00:00
#SBATCH --mem=64G
#SBATCH -J physeq

set -x
set -e

Rscript --vanilla physeq.R

And physeq.R being:

library("phyloseq")
library("qiime2R")

# read in phyloseq objects from qiime
physeq <- qza_to_phyloseq(
  features = "features.gg2.qza",
  tree = "2022.10.phylogeny.asv.nwk.qza",
  taxonomy = "gg2.taxonomy.qza",
  metadata = "../Metadata.txt")

write_phyloseq(physeq, type = "all", path = getwd())

Resulting in:

set -e
Rscript --vanilla physeq.R

Error in scan(file = file, what = "", sep = "\n", quiet = TRUE, skip = skip, :
could not allocate memory (2048 Mb) in C function 'R_AllocStringBuffer'
Calls: qza_to_phyloseq -> read_qza -> read.tree -> scan
Execution halted

And as shown above, I get the same error when calling read_qza("2022.10.taxonomy.asv.nwk.qza") directly.

Do you have any clue what I am doing wrong?
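
One check that needs almost no memory is to look at how large the Newick file inside the artifact actually is. The following is a minimal sketch, assuming the .qza is an ordinary zip archive with the tree stored at <uuid>/data/tree.nwk and that unzip and awk are available on the cluster. The helper names are made up for illustration and are not part of QIIME 2 or qiime2R. The threshold used is R's documented cap of 2^31 - 1 bytes for a single character string; whether that cap is what triggers the error here is an assumption, not something confirmed in this thread, so treat the comparison as a rough guide only.

```shell
#!/bin/bash
# Hypothetical helpers, not part of QIIME 2 or qiime2R.

# Largest number of bytes R allows in one character string (2^31 - 1).
R_MAX_STRING_BYTES=2147483647

# Print the uncompressed size in bytes of data/tree.nwk inside a .qza.
# Assumes the artifact is a zip archive laid out as <uuid>/data/tree.nwk.
nwk_size_bytes() {
    unzip -l "$1" | awk '$NF ~ /data\/tree\.nwk$/ { print $1 }'
}

# Succeed (exit 0) if a byte count is larger than R's single-string limit.
exceeds_r_string_limit() {
    [ "$1" -gt "$R_MAX_STRING_BYTES" ]
}

# Example use; skipped if the artifact is not in the current directory.
qza="2022.10.phylogeny.asv.nwk.qza"
if [ -f "$qza" ]; then
    size=$(nwk_size_bytes "$qza")
    if [ -n "$size" ]; then
        echo "tree.nwk is ${size} bytes uncompressed"
        if exceeds_r_string_limit "$size"; then
            echo "larger than R's single-string limit"
        fi
    else
        echo "no data/tree.nwk entry found in ${qza}" >&2
    fi
fi
```

If the tree is written as one very long line and its size is in this range, adding RAM to the job may not change the outcome.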

colinbrislawn · October 29, 2023, 4:01am

Your slurm setup looks okay!

#SBATCH -N 1
#SBATCH -c 1

With one node (-N 1) and one core (-c 1), this job runs on a single CPU core of a single node, which is fine for simple R.

#SBATCH --mem=64G

And this gives you 64 Gigs. Plenty!

You are doing everything right.

Have you asked the HPC team about this? They would have more tools to troubleshoot...

That tree has >20M vertices. You may need more RAM.
(The HPC could help with that too!)
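
Before escalating, it can help to confirm what the job itself sees. This is a minimal sketch, assuming a Linux compute node; the function name report_job_memory is made up for illustration. It could be pasted just before the Rscript line in physeq.sbatch.

```shell
#!/bin/bash
# Hypothetical helper: print the memory limits visible from inside the job.

report_job_memory() {
    # Set by Slurm (normally in MB) when --mem is requested; unset outside a job.
    echo "SLURM_MEM_PER_NODE=${SLURM_MEM_PER_NODE:-unset}"
    # Per-process virtual memory cap in kB, or "unlimited".
    echo "ulimit -v: $(ulimit -v)"
    # Node-wide memory in GB; 'free' is only present on Linux.
    if command -v free >/dev/null 2>&1; then
        free -g
    fi
}

report_job_memory
```

After the job ends, sacct -j <jobid> --format=JobID,ReqMem,MaxRSS,State should show requested versus peak memory, provided job accounting is enabled on the cluster.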

wasade · October 30, 2023, 11:12pm

Hi @liekekuiper,

What actions in phyloseq would you like to perform?

It's plausible that its underlying tree data structure isn't suited to the size of the Greengenes2 tree. If that's the case, subsetting the phylogeny to the feature table before loading it into R may work. qiime phylogeny filter-tree could work for that, though I haven't evaluated it directly for use with the current tree.

Best,
Daniel
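
The subsetting step suggested above could look roughly like this. It is a sketch only: the input file names are taken from physeq.R earlier in the thread, the output name phylogeny.filtered.qza is made up, and, as noted, this has not been evaluated against the current Greengenes2 tree. The guard simply skips the call when qiime or the inputs are not available.

```shell
#!/bin/bash
# Input names come from physeq.R in this thread; the output name is hypothetical.
tree="2022.10.phylogeny.asv.nwk.qza"
table="features.gg2.qza"
filtered="phylogeny.filtered.qza"

if command -v qiime >/dev/null 2>&1 && [ -f "$tree" ] && [ -f "$table" ]; then
    # Keep only the tips whose IDs are present in the feature table.
    qiime phylogeny filter-tree \
        --i-tree "$tree" \
        --i-table "$table" \
        --o-filtered-tree "$filtered"
else
    echo "qiime or input files not found; nothing filtered" >&2
fi
```

If this succeeds, the tree argument of qza_to_phyloseq would then point at the filtered artifact instead of the full tree.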