Group significance changes and interpretation

Matilda_H-D · January 3, 2018, 6:33am

Hi everyone,

Happy new year!

I have a couple of fairly general (sets of) questions or issues I've been mulling over. I'll post them as separate topics to make the forum easier to search. Here is the first one:

I have recently been running qiime diversity beta-group-significance (using both permanova and anosim methods) on a dataset and have been thinking about how to decide whether a given category is 'important' or has enough explanatory power to be discussed.

I also came across this thread on the QIIME 1 forum, which discussed the difference between the adonis and permanova tests, both of which were included in the equivalent QIIME 1 command (compare_categories.py). At the time, @jairideout confirmed that the two tests are very similar but that adonis was preferred in QIIME 1 because:

Adonis is a more robust version of PERMANOVA because it can handle numeric variables (i.e. mapping file categories/columns) in addition to categorical variables. I've also found that Adonis results are easier to interpret because an R^2 value is given as part of the output, whereas with PERMANOVA you only get a pseudo-F statistic.
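For reference, both statistics Jai mentions can be written in terms of the same sums of squared distances, using the standard PERMANOVA formulation (shown here for possibly unequal group sizes). With N samples split into a groups, n_g samples in group g, and d_ij the distance between samples i and j:

$$
SS_T = \frac{1}{N}\sum_{i<j} d_{ij}^2, \qquad
SS_W = \sum_{g=1}^{a} \frac{1}{n_g} \sum_{\substack{i<j \\ i,j \in g}} d_{ij}^2, \qquad
SS_A = SS_T - SS_W
$$

$$
F = \frac{SS_A / (a-1)}{SS_W / (N-a)}, \qquad R^2 = \frac{SS_A}{SS_T}
$$

So R-squared is the fraction of the total squared-distance variation attributable to the grouping, while pseudo-F takes the same two pieces (among-group and within-group) and divides each by its degrees of freedom before taking their ratio.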

These are my questions:

1. I don't have much experience with statistics, so I feel out of my depth, and I'm not sure I understand the difference between pseudo-F and R-squared that Jai referenced in the linked thread. My understanding is that R-squared in adonis corresponds to the percentage of variation in the distance matrix explained by the variable, and that pseudo-F is something else, but I'm not sure what, so I don't know how to interpret my pseudo-F results. Is pseudo-F a measure of the difference in variance between the groups? If so, I agree that it seems less useful for interpretation than R-squared. Could we calculate R-squared ourselves, or could anyone explain what information the pseudo-F statistic carries, to help me interpret my data better?
2. Given the advantages mentioned in the linked thread, why was adonis dropped in favour of PERMANOVA in QIIME 2?
3. Are we, as a community, any closer to a consensus on how large an effect size needs to be before a variable is considered potentially important in shaping community composition? For example, I work on oral microbiomes (dental plaque and calculus samples), and with adonis in QIIME 1 I usually treated a variable that explained 5% or more of the variation as potentially interesting, because we rarely saw any variable explain more than about 20% of the variation. I'd love to know how others are approaching this in QIIME 2. I'm waiting until I understand the pseudo-F statistic better before deciding!

Sorry for the long post and all the questions! I've been thinking about these things a lot, and I hope the answers will be helpful to other users as well.

Thanks!

Nicholas_Bokulich · January 3, 2018, 5:42pm

Hi @Matilda_H-D,
Happy new year!

The Wikipedia page on PERMANOVA has a good description of how the pseudo-F is obtained. It is an F-type statistic comparing between-group to within-group variance. As far as I know, whether a given F-value is significant depends on the distribution it is compared against (for PERMANOVA, a distribution generated by permuting the group labels), and that depends on sample size etc., so the F-value on its own is not very useful for interpretation. Instead, the p-value and R-value will be key for interpretation (but you may want to consult a statistician for more input on interpreting the pseudo-F value!)
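To answer the "can we calculate R-squared ourselves" part concretely, here is a minimal sketch that computes pseudo-F, R-squared and a permutation p-value from one set of sums of squares. It assumes a square, symmetric distance matrix as a NumPy array and one group label per sample; the function names are hypothetical, and this is meant to show what the numbers mean rather than reproduce QIIME 2's own implementation. Its pseudo-F should agree with what beta-group-significance reports for the permanova method, but treat that as an assumption worth checking against your own output.

```python
import numpy as np

def permanova_stats(dm, groups):
    """Return (pseudo_F, R2) for a square distance matrix and per-sample labels."""
    dm = np.asarray(dm, dtype=float)
    groups = np.asarray(groups)
    n = dm.shape[0]
    labels = np.unique(groups)
    a = len(labels)
    iu = np.triu_indices(n, k=1)
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_t = (dm[iu] ** 2).sum() / n
    # Within-group sum of squares: per group, squared distances divided by group size.
    ss_w = 0.0
    for g in labels:
        idx = np.flatnonzero(groups == g)
        sub = dm[np.ix_(idx, idx)]
        ss_w += (sub[np.triu_indices(len(idx), k=1)] ** 2).sum() / len(idx)
    ss_a = ss_t - ss_w  # among-group sum of squares
    pseudo_f = (ss_a / (a - 1)) / (ss_w / (n - a))
    return pseudo_f, ss_a / ss_t

def permutation_p(dm, groups, permutations=999, seed=0):
    """p-value: share of label shuffles with pseudo-F at least as large as observed."""
    rng = np.random.default_rng(seed)
    f_obs, _ = permanova_stats(dm, groups)
    hits = sum(
        permanova_stats(dm, rng.permutation(groups))[0] >= f_obs
        for _ in range(permutations)
    )
    return (hits + 1) / (permutations + 1)

# Toy data (made up): four points on a line at 0, 1, 10, 11, split into two pairs.
pts = np.array([0.0, 1.0, 10.0, 11.0])
dm = np.abs(pts[:, None] - pts[None, :])
labels = ["A", "A", "B", "B"]
f, r2 = permanova_stats(dm, labels)
p = permutation_p(dm, labels)
```

In this toy case pseudo-F is 200 and R-squared is 100/101 (about 0.99): almost all of the squared-distance variation lies between the two pairs. With only four samples there are very few distinct ways to split them into two pairs, so the p-value cannot get small here, which is one reason F and p need to be read together with sample size.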

This issue is on our radar and we plan to add adonis in the near future. "Dropped" is not the right metaphor, since QIIME 2 is being rebuilt from the ground up! Rather, building other core functions has taken priority, but 2018 will bring many new surprises...

I am not sure there is a good answer to this. I think many factors will affect it, e.g., sample size, experimental variables, sample types, and experimental questions, so what counts as a sufficient R-value may depend largely on the individual experiment, and there probably isn't a good rule of thumb. Let's see what others think!

I hope that helps!

Matilda_H-D · January 8, 2018, 3:31am

Thanks for the response @Nicholas_Bokulich!

I looked at the Wikipedia page on F-tests. What I understand now is that F will be larger when the sum of distances between groups is greater than the sum of distances within groups. Is that right? In other words, if I see a high F-value I should be thinking, 'there is a greater distance between the groups than within the groups'.
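One way to see this intuition at work: for a single variable with Euclidean distances, PERMANOVA's pseudo-F reduces to the ordinary one-way ANOVA F, so a one-dimensional toy example (values made up for illustration, function name hypothetical) is enough. Both configurations below have exactly the same within-group spread; only the distance between the group centres changes.

```python
import numpy as np

def one_way_f(values, groups):
    """Classic one-way ANOVA F. For Euclidean distances on this single variable
    it equals PERMANOVA's pseudo-F."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n, a = len(values), len(labels)
    grand_mean = values.mean()
    # Among-group spread: group sizes times squared offsets of group means.
    ss_a = sum(
        (groups == g).sum() * (values[groups == g].mean() - grand_mean) ** 2
        for g in labels
    )
    # Within-group spread: squared deviations from each group's own mean.
    ss_w = sum(
        ((values[groups == g] - values[groups == g].mean()) ** 2).sum()
        for g in labels
    )
    return (ss_a / (a - 1)) / (ss_w / (n - a))

groups = ["A", "A", "A", "B", "B", "B"]
f_close = one_way_f([0, 1, 2, 1.5, 2.5, 3.5], groups)  # groups overlap
f_far = one_way_f([0, 1, 2, 10, 11, 12], groups)       # same spread, groups far apart
print(f_close, f_far)  # 3.375 150.0
```

Strictly, F compares squared spread and divides each part by its degrees of freedom, so it is a ratio of average squared variation between versus within groups rather than a comparison of raw sums of distances; the direction of the intuition is the same, though.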

I look forward to seeing new stats methods in future QIIME 2 updates!

Nicholas_Bokulich · January 8, 2018, 6:17pm

Roughly speaking, yes. I think your takeaway message makes sense (at least for comparing pseudo-F between tests of different factors on the same data; or, better yet, in a multi-way adonis test as we discussed above), but SSw (within-group) and SSa (among-group) are divided by different degrees of freedom in the pseudo-F equation (N - a and a - 1, for N samples and a groups), so the number of samples and the number of groups will influence the size of F. It looks like, for a given proportion of explained variation, the number of samples will increase this value and the number of groups will decrease it.
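Because SS_T = SS_A + SS_W, pseudo-F and R-squared carry the same information once the number of samples N and the number of groups a are known: F = [R^2 / (1 - R^2)] * (N - a) / (a - 1), and conversely R^2 = F(a - 1) / (F(a - 1) + N - a). The sketch below (hypothetical function names) implements both directions for a single grouping factor, as in beta-group-significance, and reproduces the pattern described above for a fixed effect size.

```python
def r2_from_pseudo_f(f, n_samples, n_groups):
    """R^2 implied by a one-factor pseudo-F, given N samples and a groups."""
    df_among = n_groups - 1
    df_within = n_samples - n_groups
    return f * df_among / (f * df_among + df_within)

def pseudo_f_from_r2(r2, n_samples, n_groups):
    """One-factor pseudo-F implied by R^2, given N samples and a groups."""
    return (r2 / (1 - r2)) * (n_samples - n_groups) / (n_groups - 1)

# Same effect size (5% of variation explained) under different designs.
for n, a in [(20, 2), (200, 2), (200, 5)]:
    print(n, a, round(pseudo_f_from_r2(0.05, n, a), 2))
# 20 2 0.95
# 200 2 10.42
# 200 5 2.57
```

Going the other way, a reported pseudo-F can be passed to r2_from_pseudo_f together with the sample size and number of groups (which, as far as I know, are listed alongside the test statistic in the beta-group-significance output) to get a single-factor R-squared that can be compared against a threshold like the 5% mentioned earlier in the thread.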

I hope that helps!

system · February 9, 2018, 12:17am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.