Memory problem using DADA2 on an Amazon Web Services AMI

ese · May 8, 2020, 2:03pm

I am trying to use QIIME 2 as installed on AWS (version qiime2-2019.10; I also tried version 2020, which gives the same error). Whenever I run DADA2, I get a message saying that it cannot allocate memory, even with just a few test samples (6 samples, 200000 16S rRNA seqs). The same command ran without problems on a Mac. Does this mean that DADA2 cannot be used with QIIME 2 on AWS? Deblur also fails, with a different error (‘Cannot index database file %s’ % db), so denoising does not seem possible with the community AMI on AWS. Command and verbose message:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs /home/qiime2/seqs.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 240 \
  --p-trunc-len-r 220 \
  --p-max-ee-f 2 \
  --p-max-ee-r 2 \
  --p-trunc-q 2 \
  --p-chimera-method consensus \
  --p-min-fold-parent-over-abundance 1 \
  --p-n-threads 0 \
  --p-n-reads-learn 10000000 \
  --p-hashed-feature-ids \
  --o-table /home/qiime2/data/red_table.qza \
  --o-representative-sequences /home/qiime2/repseqs.qza \
  --o-denoising-stats /home/qiime2/denoisestats.qza \
  --verbose
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /tmp/tmpq0_zr8zd/forward /tmp/tmpq0_zr8zd/reverse /tmp/tmpq0_zr8zd/output.tsv.biom /tmp/tmpq0_zr8zd/track.tsv /tmp/tmpq0_zr8zd/filt_f /tmp/tmpq0_zr8zd/filt_r 240 220 0 0 2.0 2.0 2 consensus 1.0 0 10000000

R version 3.5.1 (2018-07-02) 
Loading required package: Rcpp
DADA2: 1.10.0 / Rcpp: 1.0.2 / RcppParallel: 4.4.4 
1) Filtering Error: cannot allocate vector of size 95.4 Mb
Execution halted
Warning message:
system call failed: Cannot allocate memory 
Traceback (most recent call last):
  File "/home/qiime2/miniconda/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 257, in denoise_paired
    run_commands([cmd])
  File "/home/qiime2/miniconda/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 36, in run_commands
    subprocess.run(cmd, check=True)
  File "/home/qiime2/miniconda/envs/qiime2-2019.10/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['run_dada_paired.R', '/tmp/tmpq0_zr8zd/forward', '/tmp/tmpq0_zr8zd/reverse', '/tmp/tmpq0_zr8zd/output.tsv.biom', '/tmp/tmpq0_zr8zd/track.tsv', '/tmp/tmpq0_zr8zd/filt_f', '/tmp/tmpq0_zr8zd/filt_r', '240', '220', '0', '0', '2.0', '2.0', '2', 'consensus', '1.0', '0', '10000000']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/qiime2/miniconda/envs/qiime2-2019.10/lib/python3.6/site-packages/q2cli/commands.py", line 328, in __call__
    results = action(**arguments)
  File "</home/qiime2/miniconda/envs/qiime2-2019.10/lib/python3.6/site-packages/decorator.py:decorator-gen-459>", line 2, in denoise_paired
  File "/home/qiime2/miniconda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/sdk/action.py", line 240, in bound_callable
    output_types, provenance)
  File "/home/qiime2/miniconda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/sdk/action.py", line 383, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/home/qiime2/miniconda/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 272, in denoise_paired
    " and stderr to learn more." % e.returncode)
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

Plugin error from dada2:

  An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

See above for debug info.

thermokarst · May 8, 2020, 2:17pm

Hi @ese!

What "Instance Type" did you allocate for these EC2 instances? You can find that info on the AWS Console -> EC2 -> Instances.

No, it doesn't mean that DADA2 can't be used with QIIME 2 on AWS. We use AWS to run these steps pretty often. The important thing is to allocate an EC2 instance with sufficient capacity.

It sounds to me like perhaps you allocated an EC2 instance with not nearly enough memory to do anything "useful" with it - I will provide additional guidance once you respond with the "Instance Type" you have used.
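
If you want to double-check from inside the instance how much memory it actually has, here is a minimal Python sketch. It assumes a Linux AMI where `/proc/meminfo` exists; the helper name `mem_total_gib` is just illustrative and is not part of QIIME 2 or any AWS tooling.

```python
import os


def mem_total_gib(meminfo_text):
    """Return the MemTotal value from /proc/meminfo-style text, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The line looks like "MemTotal:  1014552 kB"; the number is in KiB.
            kib = int(line.split()[1])
            return kib / 1024 ** 2
    raise ValueError("no MemTotal line found")


# Only read the real file where it exists (Linux); skipped elsewhere.
if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as fh:
        total = mem_total_gib(fh.read())
    print(f"Memory: {total:.1f} GiB, CPUs: {os.cpu_count()}")
```

The printed total will usually be slightly below the advertised figure, because the kernel reserves some memory for itself.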

Thanks!

ese · May 8, 2020, 2:33pm

Hi Matthew. This is my first experience with AWS, so I might have done something wrong. I am using the free tier offered by the company, and the instance type is t2.micro, which was the one offered for that tier. What instance type is recommended?
Thanks

thermokarst · May 8, 2020, 2:39pm

Hi @ese!

Cool! It is a pretty neat service, and they have a lot of different features.

According to the docs, a t2.micro has 1 CPU, and 1 GB of memory, which is pretty low. By comparison, my phone (which is at least 3 years old) has 8 CPUs and 6 GB of memory.

At minimum, a "Moving Pictures" style workflow will need 4 GB of memory, but realistically you will want 8 GB or 16 GB. One thing to keep in mind: those levels of service cost more per hour, but because they run faster, you don't have to run them for as long, so there is usually a sweet spot that you can identify. Also worth noting: none of the money that you pay to AWS goes to us (this question has come up before). Renting hardware from Amazon is a transaction strictly between you and Amazon.
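
To make the sweet-spot idea concrete, here is a rough Python sketch that drops any option below the 4 GB minimum and then picks the lowest total cost (hourly rate times estimated runtime). Everything in the table is a made-up placeholder: the option names are not real instance types, and the rates and runtimes are not real AWS prices or benchmarks, so substitute your own numbers.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str            # hypothetical label, not a real AWS instance type
    memory_gib: float
    usd_per_hour: float  # placeholder rate, not a real price
    est_hours: float     # placeholder runtime estimate

    @property
    def total_usd(self):
        # Total cost of one run: hourly rate times hours needed.
        return self.usd_per_hour * self.est_hours


def cheapest_viable(options, min_memory_gib=4.0):
    """Return the lowest-total-cost option that meets the memory minimum."""
    viable = [o for o in options if o.memory_gib >= min_memory_gib]
    if not viable:
        raise ValueError("no option meets the memory minimum")
    return min(viable, key=lambda o: o.total_usd)


# Illustrative numbers only.
options = [
    Option("tiny", 1, 0.01, 1.0),     # cheapest, but too little memory
    Option("small", 4, 0.05, 10.0),
    Option("medium", 8, 0.10, 4.0),
    Option("large", 16, 0.20, 3.0),
]

best = cheapest_viable(options)
print(f"{best.name}: about ${best.total_usd:.2f} per run")
```

With these placeholder values, the middle option wins even though its hourly rate is twice that of the smallest viable one, because it finishes sooner.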

Keep us posted!

ese · May 8, 2020, 2:55pm

Thanks, I’ll do that
