For my own microbiome projects I’ve been using R packages, specifically dada2 and phyloseq, and I am now starting to venture into DESeq2 territory and beyond. I chose R over QIIME2 and mothur for two reasons: 1. I am more comfortable with R than with Python or the command line (and I love RStudio); 2. in the dada2 paper, the dada2 pipeline performs better than the alternatives it was compared with. I’ve been teaching our nursing doctoral students in R, and I am wary of switching environments. That said, I do go in and play with QIIME2 occasionally.
Now I notice that QIIME2 has incorporated a dada2 step, so I’m thinking that QIIME2 should be getting close to dada2’s performance at identifying true sequence variants rather than “mere” OTUs.
We have several other studies here that are generating a lot of microbiome data. I am not the statistician on some of them, but I am on one big one, so what I recommend carries weight. So far these studies have relied on genetics grad students to do some of the processing, and they have been trying out different pipelines. However, we are now at a point where we must choose a direction and be able to justify that choice.
So I am now in the position of recommending a direction. For my own work I have chosen the dada2/phyloseq/etc. route, but I am not the only one with considerations here. Given all of that, are there any other considerations for choosing one pipeline over another? For instance, dada2 uses the Illumina Q scores in its error model; do the other methods use them too? (I imagine that QIIME2 does so now as well.)