Note: Sample A is (generally) expected to have significantly fewer bacteria (both in number of species and in abundance) than samples B and C.
My interpretation of the unweighted UniFrac boxplot is that, based on the p-values, there is no significant difference as far as presence/absence is concerned and all samples have a similar number of different species of bacteria (without regard to their respective abundance).
My understanding of the weighted UniFrac boxplot is that there is a significant difference in abundances between B-A and B-C (based on the p-values).
Am I on the right track?
In both plots, sample A has the highest median distance. Does that mean samples within A are significantly different from one another?
Almost! Unweighted and Weighted UniFrac don't look at a "similar number of different species of bacteria"; that sounds like alpha diversity or community richness. They investigate whether the microbiomes have the same microbes or phylogenetically similar microbes, not the number of different species of bacteria. Given that Unweighted UniFrac is a phylogenetic measure that uses presence/absence data and Weighted UniFrac is a phylogenetic measure that uses abundance data, comparing the two sets of findings suggests that abundance, and not "who is there" (presence/absence), is a driver of the differences between these microbiomes. I think this is a really smart approach to better understanding your data!
I also think this finding makes sense in the context of your note about abundances, but investigating alpha diversity, not beta diversity, is the way to look at differences in how many bacteria are there.
Yes, you are on the right track! If you wanted to investigate this further, you could look at Jaccard and Bray-Curtis to see how phylogenetic similarity might be affecting your findings! And if you want to investigate community richness, try running alpha-group-significance.
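To make the Jaccard / Bray-Curtis suggestion concrete, here is a minimal sketch in plain Python. Jaccard is the presence/absence measure and Bray-Curtis the abundance measure, and neither uses the phylogenetic tree. The taxon names and counts are made up for illustration, and this is not how QIIME 2 computes these metrics internally.

```python
def jaccard_distance(a, b):
    """Presence/absence only: 1 - (shared taxa / taxa seen in either sample)."""
    present_a = {t for t, n in a.items() if n > 0}
    present_b = {t for t, n in b.items() if n > 0}
    return 1 - len(present_a & present_b) / len(present_a | present_b)


def bray_curtis(a, b):
    """Abundance-aware: summed count differences over summed counts."""
    taxa = set(a) | set(b)
    diff = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    total = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return diff / total


# Hypothetical samples: same two taxa, very different abundances.
sample_x = {"taxon1": 90, "taxon2": 10}
sample_y = {"taxon1": 10, "taxon2": 90}

print(jaccard_distance(sample_x, sample_y))  # same members, so no distance
print(bray_curtis(sample_x, sample_y))       # large, driven by abundance
```

If the non-phylogenetic pair shows the same pattern as the UniFrac pair (presence/absence not significant, abundance significant), that would point to abundance rather than phylogeny as the driver.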
From the box plots, you can see that A has a lot of intra-variation. Although there is not a statistical test for intra-variation, it could be part of the reason there is not a lot of significance when comparing distances. It's easier to tell that groups are different when there isn't a lot of intra-variation causing overlap between the distance boxplots. For example, the A and C comparison is not significant, probably because the intra-variation in A is as big as the inter-variation between A and C.
But looking more at your data, I find it odd that the median distance from A to A is significantly higher than the median distance from B to A, i.e. the intra-variation of A is higher than the inter-variation between A and B. The way I am reading your data, A and B come out as significantly different microbiomes because B is more similar to A than A is to itself, which doesn't make a lot of sense to me. I can think of a couple of reasons this might happen. Do you know why this would be the case with your data? If not, can you give me a little info on these 3 groups and what you are comparing, so that I can help debug?
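The intra- versus inter-variation comparison can be sketched as below. The sample IDs, group labels and distances are invented for illustration; with real data the distances would come from your distance matrix.

```python
from itertools import combinations
from statistics import median


def group_medians(distances, groups):
    """Median pairwise distance for every within- and between-group pairing.

    distances: {frozenset({id1, id2}): distance}
    groups:    {sample_id: group_label}
    """
    buckets = {}
    for s1, s2 in combinations(sorted(groups), 2):
        key = tuple(sorted((groups[s1], groups[s2])))
        buckets.setdefault(key, []).append(distances[frozenset((s1, s2))])
    return {key: median(values) for key, values in buckets.items()}


# Hypothetical data: A is spread out, B is tight, A-to-B sits in between.
groups = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}
distances = {
    frozenset(("a1", "a2")): 0.8,
    frozenset(("a1", "a3")): 0.9,
    frozenset(("a2", "a3")): 0.7,
    frozenset(("b1", "b2")): 0.2,
}
for a in ("a1", "a2", "a3"):
    for b in ("b1", "b2"):
        distances[frozenset((a, b))] = 0.5

medians = group_medians(distances, groups)
print(medians)
```

In this toy case the A-to-A median is larger than the A-to-B median, which is the same odd pattern described above.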
I am trying to correlate translocation of bacteria from feces (B) and intestines (C) to tumors (A). I am trying to identify trends and any information I can get to find out what's happening in this phenomenon. What I am understanding from your comment is that there is no particular trend in terms of this translocation and potentially that there is similarity between fecal bacteria (B) to tumor microbiome (A).
...abundance seems to be a driver in the differences between these microbiomes and not the "who is there"
Sorry, just to clarify so I understand correctly: are you saying this is happening in this case (i.e. abundance seems to be driving these differences), or is this something about UniFrac in general?
Sorry for the incredibly long answer! I hope it is helpful
This is so interesting!
Are there samples in tumor, intestines and feces groups that are sampled from one individual?
If they are samples from the same subject, that would probably violate PERMANOVA's assumption of independence. I say "probably" because it is debated whether 2 separate microbiomes located on the same individual are independent, but if you expect there to be a transfer of microbes between the microbiomes, then they are definitely dependent and PERMANOVA is not the correct tool for this data.
There might be a trend, but I think there is another factor, not explained by these microbiome sites, that is causing similarities between your microbiome sites. If one subject was sampled for all of your groups, the tumor microbiome of that subject might be more similar to the same subject's fecal microbiome than it is to other tumor microbiomes. This could explain what we are seeing in your plots, where fecal seems to be more similar to tumor in some cases than tumor is to itself.
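One rough way to look for this subject effect is to compare each tumor's distance to the same subject's fecal sample against the tumor-to-tumor distances. The sketch below uses invented subject IDs, sample IDs and distances; it is only a descriptive check, not a statistical test.

```python
from statistics import median


def paired_vs_unpaired(dist, tumor_of, feces_of):
    """Return (median tumor-to-own-feces distance,
               median tumor-to-other-tumor distance)."""
    subjects = sorted(tumor_of)
    paired = [dist[frozenset((tumor_of[s], feces_of[s]))] for s in subjects]
    unpaired = [
        dist[frozenset((tumor_of[s], tumor_of[other]))]
        for i, s in enumerate(subjects)
        for other in subjects[i + 1:]
    ]
    return median(paired), median(unpaired)


# Hypothetical data for three subjects.
tumor_of = {"m1": "t1", "m2": "t2", "m3": "t3"}
feces_of = {"m1": "f1", "m2": "f2", "m3": "f3"}
dist = {
    frozenset(("t1", "f1")): 0.3,
    frozenset(("t2", "f2")): 0.2,
    frozenset(("t3", "f3")): 0.4,
    frozenset(("t1", "t2")): 0.7,
    frozenset(("t1", "t3")): 0.8,
    frozenset(("t2", "t3")): 0.9,
}

within_subject, between_tumors = paired_vs_unpaired(dist, tumor_of, feces_of)
print(within_subject, between_tumors)
```

If the within-subject median is clearly smaller than the between-tumor median, that is consistent with the subject driving similarity more than the body site does.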
I think this possibly supports your hypothesis of microbes transferring: if microbes are moving between these microbiome sites, that would make the sites more similar to each other, and you would expect that to happen within a subject. This reminds me of a really interesting paper about PDAC that found transfer from the gut microbiome to the tumor microbiome: Tumor Microbiome Diversity and Composition Influence Pancreatic Cancer Outcomes - PubMed. You might already be familiar with this paper, but I thought I would share anyway!
Also a little shameless plug, I develop a qiime2 plugin called q2-FMT, which is currently in alpha release, that helps assess engraftment of microbes in the recipient after a fecal microbiota transplant. I think with some minor tweaks this could work to track translocation of bacteria between different sites.
Basically, you would have to assign one or more of the microbiome sites as your "donor" and the others as the recipients. This could help you track which microbes are being transferred between your sites, and how similar those sites are to each other.
By looking at the differences between the Unweighted UniFrac and Weighted UniFrac results, we can see that abundance is a driver in finding significant differences. Because Unweighted UniFrac is not significant and Weighted UniFrac is, and because Weighted UniFrac factors in abundances while Unweighted UniFrac doesn't, we can likely point to abundance as a main reason why weighted is significant. If you want to understand this more, I would look into the differences between Weighted and Unweighted UniFrac; there are a lot of sources out there that explain them more thoroughly.
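A toy illustration of that difference, in plain Python. The tree, taxa and counts are invented, and the weighted version is the simple unnormalized form; this is a sketch for intuition, not the implementation QIIME 2 uses.

```python
# Toy tree: each branch is (branch length, set of taxa below that branch).
BRANCHES = [
    (1.0, {"t1"}),
    (1.0, {"t2"}),
    (0.5, {"t1", "t2"}),
    (2.0, {"t3"}),
]


def unweighted_unifrac(a, b):
    """Fraction of observed branch length found in only one of the samples."""
    unique = shared = 0.0
    for length, taxa in BRANCHES:
        in_a = any(a.get(t, 0) > 0 for t in taxa)
        in_b = any(b.get(t, 0) > 0 for t in taxa)
        if in_a and in_b:
            shared += length
        elif in_a or in_b:
            unique += length
    return unique / (unique + shared)


def weighted_unifrac(a, b):
    """Branch length times the difference in relative abundance below it."""
    total_a, total_b = sum(a.values()), sum(b.values())
    dist = 0.0
    for length, taxa in BRANCHES:
        frac_a = sum(a.get(t, 0) for t in taxa) / total_a
        frac_b = sum(b.get(t, 0) for t in taxa) / total_b
        dist += length * abs(frac_a - frac_b)
    return dist


# Same three taxa in both samples, but dominated by different ones.
sample_x = {"t1": 90, "t2": 5, "t3": 5}
sample_y = {"t1": 5, "t2": 5, "t3": 90}

print(unweighted_unifrac(sample_x, sample_y))
print(weighted_unifrac(sample_x, sample_y))
```

Both samples contain every taxon, so the unweighted distance is zero, while the weighted distance is large because the dominant taxon differs. That is the "abundance, not who is there" pattern in miniature.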
Thank you @cherman2 for your incredibly insightful response!
This particular dataset takes samples from 8 different mice, so we essentially had 8 tumors, 8 fecal samples, and 8 intestinal samples. Can we apply PERMANOVA in this case? Of course, we were under the assumption that the fecal microbiome would be pretty similar across mice, but the study was conducted in two phases (n=4 mice per phase) more than a year apart. Is it possible that there are intra-sample similarities, i.e. mouse 1's tumor, feces and intestines may correlate well, but when we incorporate different mice the analysis shows them as not similar?
That sounds exactly like what I have been looking for. I will try running q2-FMT and get back to you on how it goes.
Since your data is matched by subject, i.e. one mouse was sampled for all three of these groups, PERMANOVA cannot be used. A good question to ask when thinking about dependence is: is there a mouse that is in more than one of the groups I am comparing? If the answer is yes, your samples are not independent and PERMANOVA is not a good option. For your study, it seems like each mouse was sampled for all 3 of the groups you are comparing, and therefore your data is not independent.
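That question can be checked mechanically against the sample metadata. A minimal sketch, assuming each sample has a subject (mouse) and a group (site); the field names `subject` and `group` and the rows are hypothetical.

```python
from collections import defaultdict


def subjects_in_multiple_groups(metadata):
    """Return {subject: [groups]} for subjects appearing in more than one group."""
    groups_by_subject = defaultdict(set)
    for row in metadata:
        groups_by_subject[row["subject"]].add(row["group"])
    return {
        subject: sorted(groups)
        for subject, groups in groups_by_subject.items()
        if len(groups) > 1
    }


# Hypothetical metadata: mouse1 and mouse2 were each sampled at three sites.
metadata = [
    {"sample": "s1", "subject": "mouse1", "group": "tumor"},
    {"sample": "s2", "subject": "mouse1", "group": "feces"},
    {"sample": "s3", "subject": "mouse1", "group": "intestine"},
    {"sample": "s4", "subject": "mouse2", "group": "tumor"},
    {"sample": "s5", "subject": "mouse2", "group": "feces"},
    {"sample": "s6", "subject": "mouse2", "group": "intestine"},
]

repeated = subjects_in_multiple_groups(metadata)
print(repeated)  # a non-empty result means the groups are not independent
```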
It is possible that the 2 different sequencing runs, or the year between samplings, may also be a confounding variable.
I am not sure I understand this question. It sounds like you are wondering whether a mouse has good correlations between its microbiomes, but this is not really the question that PERMANOVA asks. PERMANOVA tests whether the microbiomes are different, not whether they are the same (that's a lot harder to prove).
The intestines samples in the metadata file are supposed to be the donor while the tumor samples are supposed to be the recipient. Is there a problem with how the metadata file columns are set up? I put the SampleID of the donors (Intestinal samples) in the InitialDonorSampleID column next to their respective recipients. That is why the InitialDonorSampleID next to the intestine samples is empty.
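A quick sanity check on the donor column can be sketched as below. It only verifies that every non-empty InitialDonorSampleID points at a SampleID that exists in the same file; whether this layout is what q2-FMT expects is not something this check can confirm. The rows and the `SampleType` field are hypothetical.

```python
def unresolved_donor_links(rows):
    """Return SampleIDs whose InitialDonorSampleID is set but matches no SampleID."""
    sample_ids = {row["SampleID"] for row in rows}
    return [
        row["SampleID"]
        for row in rows
        if row["InitialDonorSampleID"]
        and row["InitialDonorSampleID"] not in sample_ids
    ]


# Hypothetical rows: donors have an empty donor column, recipients point at a donor.
rows = [
    {"SampleID": "int1", "SampleType": "intestine", "InitialDonorSampleID": ""},
    {"SampleID": "tum1", "SampleType": "tumor", "InitialDonorSampleID": "int1"},
    {"SampleID": "tum2", "SampleType": "tumor", "InitialDonorSampleID": "int9"},
]

bad_links = unresolved_donor_links(rows)
print(bad_links)  # recipients whose donor ID is not present in the file
```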