DADA2: QIIME 2 virtual machine vs RStudio performance

Hi dear users and developers,
I’ve noticed a performance problem in the QIIME 2 VM.

It is especially noticeable with DADA2 on both paired-end and single-end Illumina reads (8 samples x ~200k reads per sample). This step took about 3-5 days when testing paired-end reads with different parameters and about 2 days when testing single-end reads, which is far too long. The host has 8 GB of RAM, of which 6 GB is dedicated to the VM.

The same analysis was tested under Windows (4 GB RAM) in RStudio and took a few hours.

I looked for an answer in this forum, but only found that some users have similar issues (analyses taking far too long on the VM).

I wonder if it is possible to run DADA2 outside the VM (in RStudio on Windows, for instance), then move the DADA2 results into QIIME 2 and run the remaining steps there (especially the classifier, etc.)?

edit:
Tested QIIME 2 VMs: 2017.10, 2017.11, 2017.12. Each of them gave similar results.

edit2:
Maybe the problem is not connected with the QIIME 2 VM instance itself, but rather with a) Windows settings, b) VirtualBox settings, or c) RAM allocation?

Hi @Jaroslaw_Grzadziel! There are a number of factors that could be contributing to the differences in performance you’re seeing:

  • You’re running QIIME 2 in a VM, which slows things down quite a bit compared with running the analyses in a native environment (that’s a general drawback of using VMs). You might try checking that the guest operating system has enough RAM and CPU allocated, but things will still be noticeably slower than in a native installation.

  • In the 2017.12 release of QIIME 2, some performance enhancements were made, but the version of DADA2 available in QIIME 2 is still 2x-10x slower than running DADA2 directly from R or RStudio (see these issues for details: 1, 2).

Here are a couple of ideas to speed things up:

  • Run QIIME 2 in a native installation on Linux or macOS (i.e. not in a VM). If that’s not an option, you could use the QIIME 2 Docker image to run QIIME 2 on Windows with minimal performance overhead.

  • Use the --p-n-threads option with the qiime dada2 denoise-* methods to run things in parallel.

  • Use DADA2 from within R or RStudio (i.e. outside of QIIME 2), save the results, and import the feature table to continue analyses in QIIME 2.
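
For the last option, here is a minimal sketch of one way to reshape a DADA2 result for import. It assumes the sequence table was saved from R as a CSV with samples as rows and sequences as columns (for example with `write.csv()`); the function name and the example data below are made up for illustration. The script transposes the table into a tab-separated file with features as rows and an `#OTU ID` header, which is the classic text layout that `biom convert` reads. The resulting BIOM file can then be imported with `qiime tools import` as a `FeatureTable[Frequency]`; the exact flags differ between releases, so check the documentation for your version.

```python
import csv
import io


def dada2_csv_to_feature_tsv(csv_text: str) -> str:
    """Transpose a samples-by-sequences CSV into a features-by-samples TSV.

    Assumes the CSV layout produced by R's write.csv() on a matrix with
    row names: an empty first header cell, then one column per sequence.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    sequences = header[1:]  # skip the empty row-name header cell
    samples = [row[0] for row in body]

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["#OTU ID", *samples])
    for col, seq in enumerate(sequences, start=1):
        # One output row per sequence, with its count in each sample
        writer.writerow([seq, *(row[col] for row in body)])
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical two-sample, two-sequence table for illustration only
    example = (
        '"","ACGT","TTGA"\n'
        '"sample1",10,0\n'
        '"sample2",3,7\n'
    )
    print(dada2_csv_to_feature_tsv(example), end="")
```

Note that the classifier step also needs the representative sequences, which would have to be written out separately (for example as a FASTA file) and imported as `FeatureData[Sequence]`.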

Hope this helps!


Thank you @jairideout,

^ That issue explains everything, I was not aware of that.

^ This is the ideal solution, I will definitely do this (when the new PC arrives).

^ This is the best solution for me. Thank you very much for the explanations and the very helpful support.

