Kruskal-Wallis: all groups vs. pairwise

arar · September 20, 2019, 7:04am

Hello
Thank you for this great forum.
I need an explanation. I calculated Shannon, Simpson, ACE, and Simpson evenness alpha diversity metrics. The Kruskal-Wallis (all groups) test was significant (p < 0.05) for all of them, but the Kruskal-Wallis (pairwise) test was non-significant for all of them. I am confused about how to interpret my results: should I consider them significant or not?
I read another post, so I know that Kruskal-Wallis (all groups) runs the test across all groups at once, while Kruskal-Wallis (pairwise) runs it between each pair of groups.

jwdebelius · September 20, 2019, 7:45am

Hi @arar,

This is puzzling. What do the trends look like when you look at the distribution of your data?

My suspicion is that it has to do with statistical power and the balance between group size and the number of groups. Your H-values/p-values suggest to me that you may have a lot of groups with only a few samples in each of them. If this is the case, there may be a reason the total ranks would be different but the individual ranks don't quite shake out that way. Is there some way you can further group or nest your data to give you more samples in fewer groups? What you lose in resolution you may gain in power, and you may be able to confirm the group result.

Best,
Justine

arar · September 20, 2019, 8:12am

What do the trends look like when you look at the distribution of your data?
I don't understand this question, but if you mean the distribution of beta diversity, this is the graph:

Each sample represents a different sea location.

Yes, right, I have 9 groups, each containing 2 samples. I can group the samples based on a metadata column to get 3 groups, each containing 8 samples.
These are for Simpson evenness:

These are for Simpson richness:

But I still need to interpret the results without grouping (9 groups each contain 8 samples). Should I consider them significant or not?

Note: when I ran ANCOM, only 9 taxa out of 6000 were found to be differentially abundant between groups (9 groups, each containing 2 samples), but 100 taxa out of 6000 were found to be differentially abundant with 3 groups, each containing 8 samples.

jwdebelius · September 20, 2019, 8:35am

Hi @arar,

Where did the extra 6 samples come from? 9 x 2 = 18; 3 x 8 = 24.
But, also, is this group of 3 a cruder grouping of your 9 sites? So, like, maybe I have sites in the Baltic at locations A-I, but I know that A, B, and C are in the north off the Swedish coast, D, E, and F are in the middle by the Finnish coast, and G, H, and I are by Denmark, so I'd group them into a superset of "Swedish", "Finnish", and "Danish" or something equivalent.

A test on an unrelated metadata group isn't going to answer your question.

I was hoping you could share the boxplots of the alpha diversity to help understand what that data looks like.

It's a third option: too underpowered to answer this question. A Kruskal-Wallis test typically wants at least 5 samples per group to be functional. You could try a different model, but this one isn't appropriate.
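
To make the power problem concrete, here is a minimal sketch (not QIIME 2's own code) of the smallest p-value the usual chi-square approximation of Kruskal-Wallis can return when the groups do not overlap at all. The helper names `kruskal_h`, `chi2_sf`, and `best_case_p` are made up for illustration, the values are plain integers standing in for perfectly separated samples, and tied values are assumed absent.

```python
import math

def kruskal_h(groups):
    """Kruskal-Wallis H statistic; assumes no tied values across groups."""
    pooled = sorted(value for group in groups for value in group)
    rank = {value: i + 1 for i, value in enumerate(pooled)}  # ranks 1..N
    n_total = len(pooled)
    rank_term = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

def chi2_sf(x, df):
    """Upper-tail chi-square probability; this sketch handles df == 1 or even df."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        half = x / 2.0
        terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * terms
    raise ValueError("this sketch only handles df == 1 or even df")

def best_case_p(n_groups, n_per_group):
    """Asymptotic p-value when every group is completely separated from the others."""
    values = list(range(n_groups * n_per_group))
    groups = [values[i * n_per_group:(i + 1) * n_per_group]
              for i in range(n_groups)]
    return chi2_sf(kruskal_h(groups), n_groups - 1)

omnibus_p = best_case_p(9, 2)   # all 9 groups of 2 tested at once
pairwise_p = best_case_p(2, 2)  # any single pair of groups, 2 vs. 2 samples
print(f"best-case all-group p = {omnibus_p:.3f}")
print(f"best-case pairwise p  = {pairwise_p:.3f}")
```

Under these assumptions, the all-group test on 9 groups of 2 can still dip below 0.05 (roughly 0.03 in the best case), while a single 2 vs. 2 comparison cannot get below roughly 0.12, even before any multiple-comparison correction. That is the same pattern as a significant all-group result with no significant pairs.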

This would support a community-wide difference. (Do you also see this in beta diversity? Did you filter your samples to exclude low-abundance ASVs that are present in only one sample?) But, again, there are probably not enough samples for comparison.

Best,
Justine

arar · September 20, 2019, 9:26am

Sorry.
My samples were collected from 3 marine water bodies (the 3 groups). We collected 2 samples from 11 locations along the 3 water bodies; these water bodies are connected with each other.

These plots are for evenness:

These plots are for richness:

When I ran beta group significance, I didn't get significance between any 2 groups in the case of the 11 locations, but I got a significant result in the case of the 3 groups.

jwdebelius · September 20, 2019, 9:48am

Hi @arar,

It looks like you have very clear trends in terms of your body of water, but that location is a lost cause. (If you squint, there are a couple, but it's really just not appropriate for what you're doing.) I would stick with your larger grouping of three bodies of water or treat location (if it is a distance) as a continuous covariate and maybe look at a regression.
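
As a rough sketch of the regression idea, assuming each location can be assigned a distance along the transect, the snippet below fits an ordinary least-squares line of alpha diversity against distance in plain Python. The function name `fit_line` and all of the numbers are made up for illustration; a real analysis would use a statistics package to get p-values and confidence intervals.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2: share of the variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared

# Hypothetical data: distance of each sample's location (km) and its Shannon diversity.
distance_km = [0, 0, 15, 15, 30, 30, 45, 45]
shannon = [6.1, 6.3, 5.8, 5.9, 5.2, 5.5, 4.9, 5.0]

slope, intercept, r_squared = fit_line(distance_km, shannon)
print(f"slope = {slope:.4f} per km, intercept = {intercept:.2f}, R^2 = {r_squared:.2f}")
```

Treating location as a number this way uses every sample in one model instead of splitting them into many tiny groups, which is why it can work where the per-location comparison cannot.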

Best,
Justine

arar · September 20, 2019, 10:15am

Please, what do you mean by

Do you mean I need to find the differentially abundant taxa with q2-gneiss?

If I am not able to increase the number of replicates, that means I can't say whether there is a significant difference or not.
Before the analysis, we thought that there would not be a big difference between locations.

jwdebelius · September 20, 2019, 11:25am

You see very clear differences in alpha diversity based on your sampling site (And probably in your PCoA). So, I think that's clear and easy to interpret.

No, I mean that you can do a regression for alpha diversity with q2-longitudinal or your favorite regression software. (There's an example of this in the Parkinson's Mice tutorial. In particular, you may be able to leverage that and an adonis for your data.)

No, you're not. But, I also can't easily say off two timepoints whether or not there's a significant difference between my microbiome and yours. We'd need a bigger sample size to make that comparison.

Best,
Justine

arar · September 20, 2019, 11:38am

As I understand it, q2-longitudinal is for data that change over time, but that isn't the case for my data: all samples were collected at the same time but from different locations.

So can I rely on the box plot to say there is a significant difference between the 11 locations?

jwdebelius · September 20, 2019, 12:00pm

Please look through the diversity section of the tutorial to see an example of an alpha diversity regression on categorical data.

No. You can say there's a difference between the 3 big groups. You can say nothing about your 11 groups.

arar · September 20, 2019, 12:13pm

OK, thank you for your effort. I will look at the tutorial.

arar · September 21, 2019, 5:39am

Hello
I came back 🙂

When I looked at the PCoA graph again, I found that PCoA1 accounts for only 18.12% of the variability and PCoA2 for 11.93%.
Is that enough to consider that there is dissimilarity between samples?

Another question: is ANCOM a statistical test that also needs more than 2 replicates?

jwdebelius · September 21, 2019, 11:00am

If you can explain 20+% of your variation with a PCoA, it's not a bad day. Remember, the PCoA is a data compression technique that takes n dimensional data and turns it into something our two and three dimensionally challenged brains can handle. It's linked to, but not entirely linked to, your permanova results which tell you about the statistical significance.

Best,
Justine

Nicholas_Bokulich · September 22, 2019, 2:08pm

2 posts were split to a new topic: does ancom test need more than 2 replicates?
