Statistical analysis - looking for core microbiome

Mehrbod_Estaki · April 26, 2018, 5:43am

First off, my apologies to you and @colinbrislawn. I didn't mean to repeat his post; I think we posted our replies at the exact same time, because I never saw his while I was writing mine. But I'm glad we had the same idea, and even the same thread, in mind.

I'm not entirely sure I understand what you mean by "shared" bacteriology, and especially how that's different from your second question about the core microbiome. That being said, and please correct me if I'm wrong, I think you're referring to the (dis)similarity of the community between the two infection groups, which the PERMANOVA test you referred to can certainly help with. In your case I would first compute a distance matrix with qiime diversity beta, ordinate it with qiime diversity pcoa, and visualize the resulting PCoA with the qiime emperor plot tool. The choice of distance metric really depends on the experiment and the question being asked. Here's a good brief summary of the different ones available in qiime2 and what they measure. You can use the same distance matrices to run the PERMANOVA test you referred to (qiime diversity beta-group-significance), which would complement the PCoA figure nicely. Here's a great explanation of how PERMANOVA tests work in general.

The key thing to remember with these types of analyses is that they depend on the whole community in each sample; they are multivariate, not univariate, tests. In other words, they are not going to tell you which individual microbes differ between the two groups. For that type of testing you want to look at something like the ANCOM or gneiss tools available in qiime2.
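To make the PERMANOVA idea a bit more concrete, here is a minimal pure-Python sketch of what the test does: build a Bray-Curtis distance matrix, compute a pseudo-F statistic from within-group versus among-group distances, then shuffle the group labels to see how often a random labelling does as well. This is only an illustration and not QIIME 2's own code. The counts, the infectionA/infectionB labels, and the function names are all made up for the example.

```python
import random


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def distance_matrix(samples):
    """Square matrix of pairwise Bray-Curtis dissimilarities."""
    n = len(samples)
    return [[bray_curtis(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]


def pseudo_f(dm, groups):
    """PERMANOVA pseudo-F computed from squared pairwise distances."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares: squared distances over all pairs, divided by n.
    ss_total = sum(dm[i][j] ** 2
                   for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: same idea, restricted to each group.
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        ss_within += sum(dm[i][j] ** 2
                         for k, i in enumerate(idx)
                         for j in idx[k + 1:]) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dm, groups, permutations=999, seed=0):
    """Return (pseudo-F, permutation p-value) for the given grouping."""
    rng = random.Random(seed)
    observed = pseudo_f(dm, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # break any real sample-to-group link
        if pseudo_f(dm, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical counts for three taxa in ten samples (illustration only).
counts = [
    [30, 5, 1], [28, 6, 2], [32, 4, 1], [29, 5, 3], [31, 6, 2],    # infectionA
    [2, 25, 10], [3, 27, 9], [1, 24, 12], [2, 26, 11], [4, 25, 9],  # infectionB
]
groups = ["infectionA"] * 5 + ["infectionB"] * 5

dm = distance_matrix(counts)
f_stat, p_value = permanova(dm, groups)
print(f"pseudo-F = {f_stat:.2f}, p = {p_value:.3f}")
```

Notice that the result is a single statistic and p-value for the whole community. Nothing in it points at a particular taxon, which is why you need a separate tool for that question.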
Another interesting approach, though I don't think it is necessarily what you are looking for, nor is it available in qiime2 currently, is to use a decision-tree-based machine learning method such as random forest. In this scenario you train a model on a subset of your data and then see whether it can correctly categorize unseen samples as coming from either the infectionA or the infectionB site. If the model categorizes accurately, you can then identify the key bacteria that were important in deciding whether a sample belongs to infectionA or infectionB. A great tutorial on this method, if you fancy giving it a try, is available here. Though as I mentioned, it sounds like you can get what you want with a simple PCoA + PERMANOVA test.

Let us know if that helps!

Edit: Actually, I just discovered that qiime2 does indeed have a type of supervised classification like the random forest approach linked above. It's the qiime sample-classifier classify-samples action. Neat!