Dear all,
I am comparing the microbiome of a species at different stages of life and I am trying to find out which features are differentially abundant between the different stages (=groups). I have been trying both ANCOM and Gneiss in Qiime2 and I have some doubts about the approach to these analyses.
Should I compare all the groups together or filter the table for 2 groups at a time (stage 1 and 2, then stage 2 and 3, and so on…) and run the differential abundance testing on each pairwise comparison? From the forum I understood that ANCOM cannot do pairwise comparisons (unless using ANCOM2, which I haven’t tried yet), while Gneiss can. Am I right? Or should I also run the Gneiss analysis on each pair of groups?
For both ANCOM and Gneiss, I first filtered my table to keep features that are present at least 10 times or present in more than one sample. Is that “enough”? The ANCOM paper says that “to avoid sparsely observed OTUs, which tend to introduce noise, we investigated only those OTUs that were prevalent in at least 25% of the sample”. Therefore, for the ANCOM analysis only, as an alternative to the other filter, I filtered the table for features present in at least 25% of the samples. Is this filter necessary for ANCOM?
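To make the two filters described above concrete, here is a minimal sketch in plain Python. The feature names and counts are made up, the function `filter_features` is hypothetical (it is not a QIIME 2 command), and it assumes that the total-count condition and the sample-count condition must both hold; if you meant "or", the last `if` would need to change accordingly.

```python
# Toy feature table: feature ID -> counts in each of 10 samples (made-up data).
table = {
    "featA": [12, 0, 5, 9, 0, 3, 0, 7, 0, 4],
    "featB": [0, 0, 0, 15, 0, 0, 0, 0, 0, 0],
    "featC": [1, 0, 0, 0, 2, 0, 0, 0, 0, 0],
    "featD": [6, 5, 0, 0, 0, 0, 0, 0, 0, 0],
}


def filter_features(table, min_total=10, min_samples=2, min_prevalence=None):
    """Keep features that pass a total-count filter and a presence filter.

    If min_prevalence is given (a fraction such as 0.25), it is used
    instead of min_samples for the presence filter.
    """
    kept = {}
    for feature, counts in table.items():
        n_present = sum(1 for c in counts if c > 0)
        if min_prevalence is not None:
            present_enough = n_present / len(counts) >= min_prevalence
        else:
            present_enough = n_present >= min_samples
        # Assumption: both conditions are required.
        if sum(counts) >= min_total and present_enough:
            kept[feature] = counts
    return kept


basic = filter_features(table)                        # >= 10 counts, >= 2 samples
strict = filter_features(table, min_prevalence=0.25)  # >= 10 counts, >= 25% of samples
print(sorted(basic), sorted(strict))
```

In this toy table the 25% prevalence filter drops one more feature (featD, present in only 2 of 10 samples) than the "more than one sample" filter does.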
It is not clear to me how to choose which balance (y0, y1, y2…) is important. From the tutorial and the forum, I understand that this can be judged from the heatmap and the tree of the regression summary, from the dendrogram heatmap, and also from the coefficient pvalues, but which is the best way to choose?
In the analyses I have performed, there seems to be no overlap between the genera identified by ANCOM and those identified by Gneiss. Isn’t that worrying?
When doing ANCOM analysis with multiple groups, you can run something like Kruskal-Wallis, which will test whether all of the groups are equal. If you want to do pairwise comparisons, you are right: you'll need to test each pair manually. With Gneiss, if you have n groups, you can run n-1 tests (by keeping one category as a reference), but not pairwise comparisons (see explanation here). I believe this is also the case with ANCOM2, since it also uses linear models under the hood.
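As a rough illustration of the difference between the two designs, the sketch below lists the comparisons each one implies. The stage labels are hypothetical. The Bonferroni helper is not something either tool requires; it is only one simple, generic way to account for running several separate tests.

```python
from itertools import combinations

stages = ["stage1", "stage2", "stage3", "stage4"]  # hypothetical group labels

# Manual pairwise design: every pair gets its own filtered table and test.
pairwise = list(combinations(stages, 2))

# Reference-coded design, as in a linear model: n - 1 contrasts
# against a single baseline category.
reference = stages[0]
reference_contrasts = [(reference, s) for s in stages if s != reference]


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


print(len(pairwise))             # 6 pairwise comparisons for 4 groups
print(len(reference_contrasts))  # 3 contrasts against the reference
```

With n groups the pairwise design grows as n(n-1)/2, while the reference design stays at n-1, which is why the choice of reference category matters for what you can conclude.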
There is not going to be a clear-cut answer here, because no one knows how to properly answer this question. We recommend a minimum of 10 samples because you cannot fit a good straight line with fewer than 10 observed samples, leading to the errors observed in previous posts.
If you have multiple covariates, then the filtering threshold will likely need to be higher.
Did you look at the coefficients / pvalues from the regression summary?
Gneiss and ANCOM have very different assumptions / interpretations. Gneiss is more of an exploratory analysis that tries to find meaningful clusterings. ANCOM attempts to find individual driving taxa, assuming that few taxa are actually changing.
So no, we don't always expect these tools to agree.
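To see why the two tools can point at different taxa, it may help to look at what a single balance (y0, y1, …) measures: a scaled log ratio between the geometric means of two groups of taxa. The sketch below uses the standard isometric log-ratio balance formula; the counts are made up, and the pseudocount of 1 used to avoid taking the log of zero is an assumption, not necessarily what your pipeline used.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balance(numerator, denominator, pseudocount=1.0):
    """Balance between two groups of taxa within one sample.

    numerator / denominator: counts of the taxa on each side of a tree node.
    pseudocount: added to every count so zeros can be log-transformed
    (assumed value, adjust to match your own preprocessing).
    """
    r, s = len(numerator), len(denominator)
    num = geometric_mean([v + pseudocount for v in numerator])
    den = geometric_mean([v + pseudocount for v in denominator])
    return math.sqrt(r * s / (r + s)) * math.log(num / den)


# Made-up counts for one sample: two taxa on each side of the node.
print(balance([9, 9], [0, 0]))  # positive: numerator taxa dominate
print(balance([0, 0], [9, 9]))  # negative: denominator taxa dominate
```

A shift in a balance only says that one group of taxa changed relative to the other; it does not say which individual taxa moved, which is a different question from the one ANCOM asks.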
Here I have filtered my table for the two groups I want to compare and run Gneiss on that filtered table, so there are only two groups to compare (the other ones present in the metadata appear, but without pvalues). Just to understand if I have got the right approach:
Do you think I was right to choose the y2 balance based on the coefficient pvalues?
Do the pvalues for a specific balance need to be significant in both groups at the same time in order to consider that balance as important?
Do you think that, based on the taxa summary, I can say that there are differentially abundant taxa in this comparison?
I think that is reasonable: the coefficient is quite large (5.8 log fold change) and your corrected pvalue is sufficiently small. The boxplots show pretty clear-cut separation.
The last thing I would double-check is that there aren’t too many zeros among those nine taxa. Those could be interesting.
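One quick way to do that check is to compute the fraction of zero counts per taxon. This is a minimal sketch with made-up taxa and counts; the 0.5 cutoff for flagging a taxon is an arbitrary assumption, not a recommended threshold.

```python
# Made-up counts per sample for two of the taxa in the balance.
taxa = {
    "taxon1": [0, 0, 0, 4, 0, 0],
    "taxon2": [5, 3, 0, 7, 2, 6],
}


def zero_fraction(table):
    """Fraction of samples in which each taxon has a count of zero."""
    return {
        taxon: sum(1 for c in counts if c == 0) / len(counts)
        for taxon, counts in table.items()
    }


fractions = zero_fraction(taxa)

# Arbitrary cutoff (assumption): flag taxa that are zero in more than half the samples.
mostly_zero = sorted(t for t, f in fractions.items() if f > 0.5)
print(fractions)
print(mostly_zero)
```

Taxa flagged this way are the ones whose contribution to the balance depends most on how zeros were handled, so they deserve a closer look before being reported.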