Hello everyone,
Background
I'm analyzing microbiome sequencing data from gastric cancer patients in South Korea. Basically, we have sequencing data from four samples (normal tissue, tumor tissue, gastric fluid, and stool) for each patient (N=17). Thanks to QIIME 2, I have been able to make many interesting (and biologically sound) observations. For example, when I made taxonomy bar plots, I observed that certain bacteria are clearly differentially abundant between normal and tumor tissue (sorry, I can't share the plots since this work has not been published yet).
The issue
Encouraged by this observation, I performed a differential abundance analysis between normal and tumor tissue using ANCOM in QIIME 2. Contrary to my expectation, however, ANCOM returned no significant hits, which surprised me because my taxonomy bar plots were telling a very different story. Before anyone asks: I did filter the ASV table so that it only contains normal and tumor tissue samples. I also added a pseudocount to the ASV table and tried collapsing it at different taxonomic ranks (e.g. genus).
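For anyone who wants to check my preprocessing, the three steps above (filtering to normal and tumor samples, collapsing ASVs to a taxonomic rank, then adding a pseudocount) can be sketched in plain Python. This is only an illustration under my own assumptions: the table is a dict of {sample: {ASV: count}} in which every sample lists every ASV, and all sample IDs, ASV IDs, and genus labels are hypothetical. In QIIME 2 itself these steps correspond to actions such as `qiime feature-table filter-samples`, `qiime taxa collapse`, and `qiime composition add-pseudocount`.

```python
from collections import defaultdict


def filter_samples(table, sample_type, keep):
    """Keep only samples whose type (from metadata) is in `keep`."""
    return {s: counts for s, counts in table.items() if sample_type[s] in keep}


def collapse(table, taxonomy):
    """Sum the counts of ASVs that share a taxon label (e.g. the same genus)."""
    collapsed = {}
    for sample, counts in table.items():
        sums = defaultdict(int)
        for asv, n in counts.items():
            sums[taxonomy[asv]] += n
        collapsed[sample] = dict(sums)
    return collapsed


def add_pseudocount(table, pseudocount=1):
    """Add a constant to every cell so that log-ratios are defined for zeros."""
    return {s: {f: n + pseudocount for f, n in c.items()} for s, c in table.items()}


# Hypothetical table ({sample: {ASV: count}}), metadata, and ASV-to-genus taxonomy.
table = {
    "P01_normal": {"asv1": 3, "asv2": 0, "asv3": 5},
    "P01_tumor": {"asv1": 0, "asv2": 7, "asv3": 1},
    "P01_stool": {"asv1": 9, "asv2": 2, "asv3": 0},
}
sample_type = {"P01_normal": "normal", "P01_tumor": "tumor", "P01_stool": "stool"}
taxonomy = {"asv1": "g__A", "asv2": "g__A", "asv3": "g__B"}

ready = add_pseudocount(
    collapse(filter_samples(table, sample_type, {"normal", "tumor"}), taxonomy)
)
print(ready)  # {'P01_normal': {'g__A': 4, 'g__B': 6}, 'P01_tumor': {'g__A': 8, 'g__B': 2}}
```

Collapsing before adding the pseudocount means each taxon receives the pseudocount once, rather than once per ASV it contains.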
Even though the ANCOM result was disappointing, I was so convinced by the taxonomy bar plots that there was differential abundance between normal and tumor tissue that I decided to manually perform paired testing (i.e. the Wilcoxon signed-rank test) on the five most abundant taxa in the samples. This time, the two most abundant taxa came back statistically significant (p-value = 0.000839 and p-value = 0.003845). Of note, those p-values are not adjusted for multiple testing.
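For reference, the manual paired test can be sketched in plain Python as below; in practice a library routine such as `scipy.stats.wilcoxon` would normally be used instead. This sketch rests on my own assumptions: one paired value per patient (e.g. the relative abundance of one taxon in normal and in tumor tissue), zero differences dropped, tied absolute differences given average ranks, and an exact two-sided p-value computed by counting all 2^n sign flips. The Benjamini-Hochberg helper shows one common way to adjust a handful of such p-values for multiple testing. The function names and data are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 1.0
    # Average ranks are multiples of 0.5, so doubling them gives integers.
    doubled = [round(2 * r) for r in average_ranks([abs(d) for d in diffs])]
    total = sum(doubled)
    # counts[s] = number of the 2**n sign assignments whose positive-rank sum is s
    counts = [1] + [0] * total
    for r in doubled:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    observed = sum(r for r, d in zip(doubled, diffs) if d > 0)
    center = total / 2  # the null distribution is symmetric around this value
    extreme = sum(c for s, c in enumerate(counts)
                  if abs(s - center) >= abs(observed - center))
    return extreme / 2 ** n


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical values for one taxon in 5 patients (tumor is higher in every patient).
normal = [0.10, 0.05, 0.20, 0.08, 0.12]
tumor = [0.30, 0.07, 0.25, 0.20, 0.40]
print(wilcoxon_signed_rank_exact(tumor, normal))  # 0.0625 (= 2/32, the minimum for n = 5)
```

Exact enumeration stays cheap at N=17 because the dynamic program works over rank sums rather than listing every sign pattern.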
My current hypothesis for why ANCOM did not produce any significant hits is that it does not use paired testing. In other words, I think that if I could somehow run ANCOM with paired testing, the increased statistical power would give me significant hits. This is where my train of questions leaves the station.
The questions
Q1. Is it possible to perform ANCOM with paired testing?
I've been searching this forum for the answer, but I'm getting what appear to be mixed signals (probably because some posts are older than others). Here is the list of relevant posts I found in the forum:
- Pairwise ANCOM and Gneiss, filtering and interpretation
- Pairwise testing of ANCOM results
- Taxa abundance analysis
- Compare more taxa simultaneously using pairwise-difference
So far, my general impression is that:
- Currently, it is NOT possible to perform ANCOM with paired testing within QIIME 2. For example, in the post Taxa abundance analysis, @mortonjt wrote (I can't seem to quote his remark directly for some reason):
@jairideout is right - ANCOM doesn't support pvalues or pairwise comparisons. But you can get the W-statistic, which is a proxy for the statistical significance.
- Some say paired testing is supported by ANCOM-II in R, but I could not find any reference or documentation confirming this. For example, @Nicholas_Bokulich wrote in the post Compare more taxa simultaneously using pairwise-difference:
Instead, use ANCOM. The ANCOM action currently in QIIME 2 does not allow paired testing, but you can use ANCOM or DESeq2 in R to perform paired tests.
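To make Q1 concrete for myself, here is a rough sketch of what a "paired" analogue of ANCOM's W statistic might look like. As I understand ANCOM, for each taxon i it tests every log-ratio log(x_i / x_j) between groups and W counts how many of those tests are rejected. Everything below is my own hypothetical illustration, not the QIIME 2 or ANCOM-II implementation: it swaps in an exact sign test on each patient's change in log-ratio (so the pairing is used), and it applies a Bonferroni correction within each taxon purely for brevity; the published method uses its own choice of tests and correction.

```python
import math


def sign_test(diffs):
    """Exact two-sided sign test on paired differences (zeros dropped)."""
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    k = sum(1 for d in nonzero if d > 0)
    tail = sum(math.comb(n, i) for i in range(min(k, n - k) + 1))
    return min(1.0, 2 * tail / 2 ** n)


def paired_w(normal, tumor, alpha=0.05):
    """Hypothetical paired analogue of ANCOM's W statistic.

    normal[p][t] and tumor[p][t] hold pseudocounted abundances of taxon t
    for patient p, with patients in the same order in both lists.
    """
    n_taxa = len(normal[0])
    w = []
    for i in range(n_taxa):
        pvals = []
        for j in range(n_taxa):
            if j == i:
                continue
            # Per-patient change in the log-ratio of taxon i to taxon j.
            diffs = [math.log(t[i] / t[j]) - math.log(nm[i] / nm[j])
                     for nm, t in zip(normal, tumor)]
            pvals.append(sign_test(diffs))
        m = len(pvals)
        # Bonferroni within taxon i; W counts the rejected log-ratio tests.
        w.append(sum(1 for p in pvals if p * m < alpha))
    return w


# Hypothetical data: 8 patients, 3 taxa; only taxon 0 rises (10x) in tumor.
normal = [[10 + p, 20 + p, 30] for p in range(8)]
tumor = [[10 * (10 + p), 20 + p, 30] for p in range(8)]
print(paired_w(normal, tumor))  # [2, 1, 1]
```

The taxon that truly changes gets the largest W, while the unchanged taxa only pick up the log-ratios they form with it.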
Q2. Are there alternative tools for differential abundance analysis that support paired testing?
While looking for answers to Q1, I came across a number of tools that could potentially be used instead of ANCOM. Of course, I understand that these tools do not necessarily produce the same type of differential abundance results as ANCOM.
Here is the list of alternative tools I found so far:
- There is the `qiime longitudinal pairwise-differences` command, which I didn't know about before; it apparently runs a paired test on one specific taxon at a time. I reckon it runs the Wilcoxon signed-rank test under the hood, similar to what I did manually above. However, it seems you're not supposed to abuse this command by running it for every taxon in your dataset, because @Nicholas_Bokulich wrote in the post Compare more taxa simultaneously using pairwise-difference:
Not possible in QIIME 2. We do not make this more convenient because the Wilcoxon test used in that action is not really appropriate for compositional microbiome data (has a high false-positive error rate).
- There is a QIIME 2 plugin called Gneiss that performs differential abundance analysis using what are known as "balances". From my understanding, this is fundamentally different from ANCOM but is still useful for exploring differential abundance between two or more groups. However, according to the post Pairwise ANCOM and Gneiss, filtering and interpretation, Gneiss does not support pairwise comparisons either. In that post, @mortonjt wrote:
With Gneiss, if you have n groups, you can run n-1 tests (by keeping one category as a reference) - but not pairwise comparisons (see explanation here). I believe this is also the case with ANCOM2 since it is also using linear models underneath the hood.
- Some other tools were mentioned along my journey, such as DESeq2 and LEfSe. However, I wanted to ask the QIIME 2 community before digging deeper into those tools. Plus, DESeq2 seems to be primarily designed for differential expression analysis of RNA-seq data, and LEfSe is not a QIIME 2 plugin.
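The compositionality concern behind Nicholas's comment on the Wilcoxon test, and the reason Gneiss works with balances, can be illustrated with a small sketch. It uses made-up absolute abundances in which only taxon 0 changes. The helper names are hypothetical, and `balance` follows my understanding of the Gneiss balance formula, sqrt(r*s/(r+s)) * ln(g(x_R)/g(x_S)), where g is the geometric mean and r and s are the sizes of the two taxon groups R and S being contrasted.

```python
import math


def closure(counts):
    """Convert counts to relative abundances that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balance(x, num, den):
    """Log-ratio balance between the taxa indexed by `num` (R) and `den` (S)."""
    r, s = len(num), len(den)
    ratio = geometric_mean([x[i] for i in num]) / geometric_mean([x[i] for i in den])
    return math.sqrt(r * s / (r + s)) * math.log(ratio)


# Hypothetical absolute abundances: only taxon 0 changes (100 -> 400).
normal = [100, 100, 100]
tumor = [400, 100, 100]

print(closure(normal))  # taxa 1 and 2 each make up a third of the sample
print(closure(tumor))   # taxa 1 and 2 drop to a sixth each, although unchanged
print(balance(closure(normal), [1], [2]), balance(closure(tumor), [1], [2]))  # 0.0 0.0
print(balance(tumor, [0], [1, 2]), balance(closure(tumor), [0], [1, 2]))  # equal up to rounding
```

Converting to proportions makes taxa 1 and 2 look depleted even though they never changed, which is the kind of false positive a per-taxon Wilcoxon test on relative abundances can produce. The balance between taxa 1 and 2 stays at zero, and every balance is unaffected by the conversion, because scaling a whole sample by a constant cancels in the ratio of geometric means.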
Conclusion
If you read this far, I really appreciate you taking the time. I hope I'm not the only one who's curious about differential abundance analysis with paired testing. Looking forward to hearing everyone's thoughts!