How to Choose among the Methods?

VHertzb · April 26, 2017, 3:48pm

For my own microbiome projects I’ve been using R packages, specifically dada2 and phyloseq, and I am now starting to venture into DESeq2 territory and beyond. I chose R over the other options, QIIME2 and mothur, for two reasons: 1. I am more comfortable with R than with Python or the command line (and I ❤ RStudio); 2. in the dada2 paper, the dada2 pipeline performs better than the alternatives. I’ve been teaching our nursing doctoral students in R, and I am wary of switching environments. That said, I do go in and play with QIIME2 occasionally.

Now I notice that QIIME2 has incorporated a dada2 step, so I’m thinking that QIIME2 should be getting close to dada2’s performance in identifying true sequence variants rather than “mere” OTUs.
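To make the sequence-variant vs. OTU distinction concrete, here is a toy Python sketch (all function names are hypothetical). It does not reproduce DADA2's algorithm, which uses an error model to decide whether a rare sequence is a real variant or a sequencing error; it only shows the difference in resolution. Counting exact sequences keeps two reads that differ at one position out of 40 as separate variants, while greedy clustering at a 97% identity threshold (assuming equal-length, pre-aligned reads) merges them into a single OTU.

```python
from collections import Counter


def exact_variants(reads):
    """Count identical reads: each distinct sequence is its own variant."""
    return Counter(reads)


def identity(a, b):
    """Fraction of matching positions (toy: assumes equal-length, aligned reads)."""
    if len(a) != len(b):
        raise ValueError("toy identity assumes equal-length reads")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, threshold=0.97):
    """Greedy, abundance-ordered clustering: a sequence joins the first centroid
    it matches at >= threshold identity, otherwise it founds a new OTU."""
    centroids = []
    counts = Counter()
    for seq, n in exact_variants(reads).most_common():
        for c in centroids:
            if identity(seq, c) >= threshold:
                counts[c] += n
                break
        else:
            centroids.append(seq)
            counts[seq] += n
    return counts


base = "ACGT" * 10               # 40 bp
variant = base[:-1] + "A"        # one mismatch: 39/40 = 97.5% identity
reads = [base] * 5 + [variant] * 2

print(len(exact_variants(reads)), len(greedy_otus(reads)))  # 2 variants vs 1 OTU
```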

We have several other studies here that are in the process of generating a lot of microbiome data. Although I am not the statistician on some of them, I am on one big one, and so what I recommend carries weight. So far, these studies have relied on grad students in genetics for some of the processing, and they have been trying out different pipelines. However, we are now at a point where we must choose a direction and be able to justify that choice.

Anyway, I am now in a position to recommend a direction. I have chosen the dada2/phyloseq/etc. route for my own work, but I am not the only one with considerations here. Given all of that, are there any other considerations in choosing one pipeline over the other? For instance, dada2 uses Illumina’s quality scores (Q-scores) in its error model; do the other methods use them too (although I imagine that QIIME2 does so now as well)?

gregcaporaso · April 27, 2017, 4:14pm

Hi @VHertzb,
Replies to your questions are below.

Regarding the DADA2 step in QIIME 2: the qiime dada2 denoise-single and qiime dada2 denoise-paired methods (both of which can be run in multi-threaded mode) literally call the DADA2 R package. Its developer (@benjjneb) wrote the R scripts that are included in the q2-dada2 plugin, so it should perform exactly the same as the DADA2 R package. Let us know if you're observing something different, as that would be unexpected.
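As a sketch of what a multi-threaded run looks like, the helper below (hypothetical, not part of QIIME 2) only assembles the command line. The flag names are the ones I believe current q2-dada2 releases use, and passing 0 threads asks DADA2 to use all available cores; check qiime dada2 denoise-single --help for your installed version, since early releases differed. Outputs are sent to a directory with --output-dir rather than named one by one.

```python
import shlex


def denoise_single_command(demux_qza, trunc_len, trim_left=0, n_threads=0,
                           output_dir="dada2-output"):
    """Build (but do not run) a 'qiime dada2 denoise-single' call.

    Hypothetical helper; flag names assume a recent QIIME 2 release.
    n_threads=0 asks DADA2 to use all available cores.
    """
    return [
        "qiime", "dada2", "denoise-single",
        "--i-demultiplexed-seqs", demux_qza,
        "--p-trim-left", str(trim_left),
        "--p-trunc-len", str(trunc_len),
        "--p-n-threads", str(n_threads),
        "--output-dir", output_dir,
    ]


if __name__ == "__main__":
    # trunc_len=150 is only an illustrative value; pick it from your quality plots.
    cmd = denoise_single_command("demux.qza", trunc_len=150)
    print(shlex.join(cmd))
```

On a machine with QIIME 2 installed and a demux.qza artifact present, passing the returned list to subprocess.run(cmd, check=True) would execute it.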

This was one of our first QIIME 2 plugins, so the functionality has been available for a while. We used the denoise-single method at our workshop in November, though at that time we didn't have paired-end support yet, so the method was just called denoise.

You're obviously going to get my biased opinion here, but QIIME 2 offers some pretty great advantages.

First, because it is interface agnostic, different users can work with whichever interface type they use most effectively. For example, biologist end users can use the QIIME Studio GUI, power users can use the command line, and data scientists can use the API in Python terminals or Jupyter notebooks.

Next, the integrated provenance tracking is going to make it much easier for users (and their bosses) to track what they did and report their methods, and that will help with reproducibility of microbiome bioinformatics. To refresh yourself on this, take a look at this taxonomy plot and click the 'Provenance' tab on the top right. Click on the boxes and the circles within the boxes in that network diagram, and you'll see that every QIIME 2 step, beginning with importing of fastq files, has been automatically tracked.
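The idea behind provenance tracking can be illustrated with a toy Python sketch: every result keeps the action name, the parameters, and the inputs that produced it, so its full history can be walked back to the imported data. This is not QIIME 2's implementation (QIIME 2 stores provenance inside each .qza/.qzv archive); Result, run, and lineage are hypothetical names.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Result:
    """Hypothetical result object that carries its own provenance record."""
    data: object
    action: str
    params: dict
    parents: tuple = ()
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))


def run(action, func, *inputs, **params):
    """Apply func to the inputs' data and record how the output was made."""
    out = func(*(r.data for r in inputs), **params)
    return Result(out, action, params, tuple(inputs))


def lineage(result, depth=0):
    """Walk the provenance graph from a result back to the imported data."""
    lines = ["  " * depth + f"{result.action} {result.params}"]
    for parent in result.parents:
        lines.extend(lineage(parent, depth + 1))
    return lines


# Toy pipeline: import -> trim -> dereplicate.
raw = Result(["ACGT", "ACGA", "ACGT"], "import", {"format": "fastq"})
trimmed = run("trim", lambda seqs, n: [s[:n] for s in seqs], raw, n=3)
unique = run("dereplicate", lambda seqs: sorted(set(seqs)), trimmed)

print("\n".join(lineage(unique)))
```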

Finally, we're integrating the latest and greatest methods (e.g., DADA2) and building interactive visualizations using the latest web technologies (e.g., that taxonomy plot that I linked you to). The idea here is that this can become a platform where methods developers can make their new methods and visualizations accessible as QIIME 2 plugins, to get the latest and greatest tools to users as quickly as possible, and not have to spend time worrying about building interfaces, tracking provenance, etc., as QIIME 2 takes care of that for you. Our plugin developers are starting to see the power in this as we progress through our alpha release stage.

And, regarding our alpha release stage, you might be interested in this post: "Should I be using QIIME 2 while it's in alpha?"

All of this said, we do still expect that users are going to want to get their data out of QIIME 2 for customized analysis in R or with tools that are not available as QIIME 2 plugins. We always want to support that, and have recently added a new tutorial on exporting data that facilitates it. So, I don't want to give the impression that there will be one "master pipeline" that will replace all of the others. Because every microbiome study is different, there are probably always going to be some custom steps in the process of getting an analysis to publication.
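QIIME 2 artifacts (.qza files) are ZIP archives whose top-level directory, named after the artifact's UUID, holds a data/ directory with the actual files and a provenance/ directory with the history. The supported route out is qiime tools export, but as a minimal sketch, assuming that layout, Python's standard zipfile module is enough to list and pull out the data files (function names are hypothetical):

```python
import zipfile


def list_artifact_contents(qza_path):
    """Return (data_files, provenance_files) inside a .qza archive.

    Assumes the usual layout: <uuid>/data/... and <uuid>/provenance/...
    """
    data, prov = [], []
    with zipfile.ZipFile(qza_path) as zf:
        for name in zf.namelist():
            parts = name.split("/")
            if len(parts) < 3 or not parts[-1]:
                continue  # skip top-level files and bare directory entries
            if parts[1] == "data":
                data.append("/".join(parts[2:]))
            elif parts[1] == "provenance":
                prov.append("/".join(parts[2:]))
    return data, prov


def extract_data_file(qza_path, filename, dest_path):
    """Copy one file from the artifact's data/ directory to dest_path."""
    with zipfile.ZipFile(qza_path) as zf:
        for name in zf.namelist():
            parts = name.split("/")
            if (len(parts) >= 3 and parts[1] == "data"
                    and "/".join(parts[2:]) == filename):
                with zf.open(name) as src, open(dest_path, "wb") as dst:
                    dst.write(src.read())
                return dest_path
    raise FileNotFoundError(filename)
```

For a feature table the data file is typically feature-table.biom, which can then be loaded into R (for example with phyloseq's import_biom) for the kind of downstream work described in the question.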

Regarding quality scores: yep, QIIME 2 uses them, since it's calling DADA2 directly.
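For reference, the Q-scores in question are Phred scores stored in each FASTQ record's quality line, with Q = -10 log10(P), where P is the probability that the base call is wrong. DADA2's error model learns error rates as a function of these scores, and its maxEE filter discards reads whose expected number of errors (the sum of these probabilities) exceeds a threshold. A minimal decoding sketch, assuming the Sanger/Illumina 1.8+ ASCII offset of 33 (function names are hypothetical):

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores (offset 33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q):
    """Invert Q = -10 * log10(P) to get the per-base error probability."""
    return 10 ** (-q / 10)


quality = "II5+#"
scores = phred_scores(quality)            # [40, 40, 20, 10, 2]
probs = [error_probability(q) for q in scores]
expected_errors = sum(probs)              # the quantity a maxEE-style filter compares
print(scores, round(expected_errors, 3))
```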
