Specifically the abundances of Firmicutes. I can't work out if it's higher in T3 or T4. I drew a bar chart using the percentiles given in the hope that it would help but I'm still none the wiser (apologies for messiness - I'm not a fan of rulers).
From this, it doesn't look like there is any difference between the two treatments - but maybe I'm missing something in the way that ANCOM calculates this?
Hi @xchromosome,
You are correct to have concerns about this. This isn't the first time we've seen ANCOM identify features as significantly different with very low W values. See this post for more details. However, I should also mention that ANCOM is much more powerful for larger datasets, meaning that if you were to compare your data at the genus level instead of the phylum level you would get much more meaningful results. You are only comparing 7 phyla, which means the max W value you can get is 6, which doesn't allow for a lot of certainty. This may be why you are seeing differences that may not actually be true.
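To make the ceiling on W concrete: for each feature, W counts how many of the other features give a "different between groups" result when the log-ratio against them is tested, so with n features W can never exceed n - 1. Below is a minimal sketch of that counting step only. The counts are made up, the names `max_w`, `toy_w` and `crude_differs` are hypothetical, and `crude_differs` is a deliberately crude stand-in for the real significance test that ANCOM runs.

```python
import math


def max_w(n_features):
    """Upper bound on W: each feature is compared against every other feature."""
    return n_features - 1


def crude_differs(a, b, threshold=1.0):
    """Stand-in for a real statistical test (assumption, NOT what ANCOM does):
    call two groups different if their mean log-ratios differ by more than threshold."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return abs(mean_a - mean_b) > threshold


def toy_w(counts, groups, differs=crude_differs):
    """counts: dict of feature -> per-sample counts (all > 0).
    groups: one label per sample, exactly two distinct labels.
    Returns dict of feature -> W."""
    first, second = sorted(set(groups))
    w = {feature: 0 for feature in counts}
    for i in counts:
        for j in counts:
            if i == j:
                continue
            # Log-ratio of feature i to feature j in every sample.
            log_ratios = [math.log(x / y) for x, y in zip(counts[i], counts[j])]
            a = [lr for lr, g in zip(log_ratios, groups) if g == first]
            b = [lr for lr, g in zip(log_ratios, groups) if g == second]
            if differs(a, b):
                w[i] += 1
    return w


# Made-up toy table: 3 features, 2 samples per treatment.
groups = ["T3", "T3", "T4", "T4"]
counts = {
    "A": [10, 12, 100, 120],  # changes a lot between treatments
    "B": [50, 55, 50, 52],    # roughly stable
    "C": [30, 28, 31, 29],    # roughly stable
}

print("max W for 7 phyla:", max_w(7))
print("toy W values:", toy_w(counts, groups))
```

Note that in this toy table the stable features still pick up W = 1, simply because they are compared against the one feature that moved. With only a handful of features there is very little room between "changed" and "unchanged", which is the concern at phylum level.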
Thank you to you and @colinbrislawn for the replies. I wasn't sure which taxonomic level to look at for my analyses, so I have done all of them at levels 2, 5, 6 and 7. I read the post you linked to - so I understand why ANCOM isn't really suitable for phylum level comparisons. What about level 5/family?
Sorry, this is probably a silly question but how exactly would you recommend I do this? I was under the impression that t-tests couldn't be used for microbiome data!
...um, stats tests don't care about where the data came from (microbiome or otherwise) but they do have assumptions that need to be met for the result to be valid. For example, the t-test assumes a normal distribution, and if your data distribution isn't normal (microbiome or not) then you have to pick a different test.
I'm not a card-carrying statistician, so I think I'll let the experts answer your stats questions!
Oh, I know! Sorry, maybe I didn't explain that right! What I meant was that I thought t-tests weren't really suitable for comparing relative abundances like you would generate from microbiome analysis, and that that was the advantage that ANCOM had over them. I've always struggled with stats though.
So, given that a W value of 2 (as in my result above) is invalid, can I ask what you think of this result, which is the same table and treatment comparison but at family level instead, with W scores of 30 - 38? ANCOM-D37-T3-vs-T4-collapsed-level-5.qzv (455.6 KB)
Good point! Compositional data has its own issues that can violate expectations of some tests. So let's dive into these new ANCOM results.
Here's how Jamie describes the ANCOM volcano plot:
You have no points in your top right corner... meaning that nothing is super different between groups.
Have you tried this test using all your features without grouping them at a higher taxonomic level? Grouping can hide cool trends of specific microbes, so performing this test yet a third time using all features might work well.
Thanks for that. I've just had a read at that thread where Jamie explained the volcano plot and noticed that the example in question looked a bit different to mine. It seems that poster was comparing multiple groups, so there is an F-score on the x axis in that case:
In this post here, Jamie said that for the example given with only 2 groups,
"Considering the volcano plot, it looks like there isn't an F-statistic being run. When there are only 2 categories, just the clr mean difference is calculated (which is essentially a log fold change). If you have negative log fold change, that is indicative of decrease (since log(x) < 0 for 0 < x < 1), whereas a positive log fold change is indicative of increase (since log(x) > 0 for x > 1)."
(Sorry, can't find how to link the quote from that post!)
So from what I can gather, it seems like features appearing in the top left of the graph are just as significant/valid/important as those in the top right, in this case where there are only two groups? Is that correct?
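For anyone following along, here is a minimal sketch of what a "clr mean difference" looks like for two groups, going by the description quoted above. The counts are made up, the function names are hypothetical, and the direction of the subtraction (second group minus first) is an assumption; I have not checked which way round the plugin does it. Zeros would need a pseudocount before taking logs, which this toy data avoids.

```python
import math


def clr(sample):
    """Centered log-ratio: log count minus the sample's mean log count."""
    logs = [math.log(x) for x in sample]
    centre = sum(logs) / len(logs)
    return [value - centre for value in logs]


def clr_mean_difference(group_a, group_b):
    """Per-feature mean clr in group_b minus mean clr in group_a."""
    def feature_means(group):
        transformed = [clr(sample) for sample in group]
        return [sum(column) / len(transformed) for column in zip(*transformed)]

    means_a = feature_means(group_a)
    means_b = feature_means(group_b)
    return [b - a for a, b in zip(means_a, means_b)]


# Made-up counts: rows are samples, columns are three taxa.
control = [[10, 50, 40], [12, 48, 40]]
treated = [[40, 50, 10], [44, 46, 10]]

diff = clr_mean_difference(control, treated)
for name, value in zip(["taxon_1", "taxon_2", "taxon_3"], diff):
    print(f"{name}: clr mean difference = {value:+.3f}")
```

In this sketch the sign only records direction (positive for taxon_1, negative for taxon_3), and the differences sum to zero because each clr-transformed sample is centred on its own mean log count.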
Haha, I had a similar issue when trying to Google gneiss for the first time
Thanks - I will check that out. I'm struggling to get my head around the concept of an "average microbe" and how the increasing/decreasing relative to that links in with treatment groups. Maybe that paper will help it click!
Haha no problem! I really appreciate you taking the time to help. I will wait patiently for an expert to arrive.
Hi @xchromosome, part of the problem has to do with the nature of relative data: it is not possible to infer absolute differences from relative data. This means it isn't possible to infer increase / decrease from relative data. But it may be possible to infer which microbes increase / decrease the most, and simulations show that ANCOM may be able to do this. See our paper here: https://www.nature.com/articles/s41467-019-10656-5
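A small made-up illustration of why relative data cannot tell you about absolute increase or decrease: the two scenarios below are very different in absolute terms but give exactly the same proportions. All numbers and names here are hypothetical.

```python
def proportions(counts):
    """Convert absolute counts to relative abundances."""
    total = sum(counts)
    return [c / total for c in counts]


baseline = [100, 100, 100]          # absolute cells per taxon before treatment

scenario_growth = [200, 100, 100]   # taxon 1 doubles, the others do not change
scenario_decline = [100, 50, 50]    # taxon 1 does not change, the others halve

print("baseline:        ", proportions(baseline))
print("taxon 1 grows:   ", proportions(scenario_growth))
print("others decline:  ", proportions(scenario_decline))
```

Sequencing only sees the proportions, so it cannot distinguish "taxon 1 went up" from "everything else went down".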
Are you seeing separation in beta diversity? If PERMANOVA doesn't give you significant results, maybe differential abundance won't give you the insights you want.
Thanks for your input. Yes, I had highly significant differences in beta diversity, although both the PERMANOVA and PERMDISP tests showed significance (the PERMDISP less often and less strongly though). There were large differences in alpha diversity as well. I'm looking at antibiotic treatment vs control so I am expecting some taxa to be different between the groups. I've used gneiss as well but I found it less useful.
Thanks for the link to the paper. I've read it and tried to understand as much as I can from it, although I'm struggling a bit to be honest! Is the basic principle the concept of finding out the true abundance of one species (say x) and using that to estimate the true abundances of all the others based on their ratio to species x?
Can I just ask one more thing? One of the taxa that keeps popping up in all my different comparisons between various groups is "Azospirillum sp. 47_25". It shows up when looking at genera and species level. Species fine, but why is it listed as a genus? Why not just Azospirillum?
Not quite. If you don't have the proper experimental setup (i.e. flow cytometry, ...), and only have relative abundances from sequencing, then no, you cannot estimate the absolute abundances. The point of the paper is that you can't infer absolute differences, but you can infer relative differences and rank the microbes according to how much they have changed across conditions.
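Here is a rough sketch of the ranking idea with made-up numbers (the absolute counts are hypothetical and would not be known in a real sequencing experiment). The log fold changes computed from proportions are all shifted by the same unknown constant, the change in total microbial load, so their values are off but their ordering is preserved.

```python
import math


def proportions(counts):
    total = sum(counts)
    return [c / total for c in counts]


def log2_fold_changes(before, after):
    return [math.log2(a / b) for b, a in zip(before, after)]


def rank_order(values):
    """Indices of taxa sorted from largest to smallest change."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


# Hypothetical absolute abundances of four taxa.
before_abs = [100, 200, 400, 300]
after_abs = [400, 300, 100, 600]

true_lfc = log2_fold_changes(before_abs, after_abs)
observed_lfc = log2_fold_changes(proportions(before_abs), proportions(after_abs))

# Every taxon is off by the same amount: log2 of the change in total load.
offsets = [t - o for t, o in zip(true_lfc, observed_lfc)]

print("true log2 fold changes:    ", [round(v, 3) for v in true_lfc])
print("observed (from proportions):", [round(v, 3) for v in observed_lfc])
print("offsets:                    ", [round(v, 3) for v in offsets])
print("same ranking:", rank_order(true_lfc) == rank_order(observed_lfc))
```

So the size and even the sign of an observed change can be misleading, but which taxa changed the most relative to the others is still recoverable.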
That I'm not as sure of - maybe @Nicholas_Bokulich would have a better idea.
That must be a peculiarity with your database, not something QIIME 2 is doing. I agree, it should not be listed as a genus. You should check, and even possibly amend, your database to confirm that this is where that genus name is slipping in.