Is there a difference between the inference drawn from true M^2 value from procrustes analysis in QIIME2 and Procrustes Sum of Squares (m12 squared) from PROTEST (vegan)?
Thank you,
Shrez
Is there a difference between the inference drawn from true M^2 value from procrustes analysis in QIIME2 and Procrustes Sum of Squares (m12 squared) from PROTEST (vegan)?
Thank you,
Shrez
Does anyone have any comment on this question?
Thank you,
Shrez
Hello, very nice question!
Let's investigate!
Peres-Neto & Jackson (2001) don't make things easy, as they give no definition, but here's what they refer to as m_{12}:
vegan.protest.R
applies procrustes
test multiple times with permutations and extracts goodness-of-fit == sum of squares
, although using a trick to avoid computational overhead
Then it selects a number of M^2 from permutations that are MORE OR EQUAL than \sqrt{(1 - m_{12})} and calculates p-value.
Output from vegan.protest()
Procrustes Sum of Squares (m12 squared): 0.624
Correlation in a symmetric Procrustes rotation: 0.6132
Significance: 0.084
From vegan
documentation:
Function
protest
performs symmetric Procrustes analysis repeatedly to estimate the significance of the Procrustes statistic. Functionprotest
uses a correlation-like statistic derived from the symmetric Procrustes sum of squaresss
as r=\sqrt{(1−ss)}, and also prints the sum of squares of the symmetric analysis, sometimes called m_{12}^2.
From:
Significance of the M^2 statistic is often tested by permutation(see Jackson, 1995), whereby the row assignments in one matrix are randomly permuted a large number of times to create the null distribution.
Let's now look at qiime2.diversity.procrustes_analysis
.
From scipy.spatial.procrustes
documentation:
Procrustes ([1], [2]) then applies the optimal transform to the second matrix (including scaling/dilation, rotations, and reflections) to minimize M^2=\sum(data1−data2)^2, or the sum of the squares of the pointwise differences between the two input datasets.
QIIME does the same, but calculates a number of M^2 from permutations that are LESS than original M^2.
Qiime2
true M^2 value 0.162607866953173
p-value for true M^2 value 0.16666666666666666
number of Monte Carlo permutations 5
library(lattice)
library(permute)
library(vegan)
X = rbind(c(2, 3, 3, 5, 6, 2),c(7, 1, 3, 6, 7, 7), c(2, 2, 5, 6, 4, 2))
Y = rbind(c(2, 2, 3, 5, 3, 2),c(1, 6, 7, 2, 4, 1), c(7, 1, 3, 6, 7, 7))
protest(X, Y, permutations = 6)
Call:
protest(X = X, Y = Y, permutations = 6)
Procrustes Sum of Squares (m12 squared): 0.1626
Correlation in a symmetric Procrustes rotation: 0.9151
Significance: 0.5
Permutation: free
Number of permutations: 5
M^2 (QIIME2) = m_{12}^2 (vegan). So it is a naming inconsistency (duh, I hate them, they're especially common in stats).
There are differences in permutational testing, and I do understand QIIME2
code logic in testing, but I don't understand vegan
logic in testing.
It would be cool to have vegan devs to elaborate, so I opened an issue [QUESTION] PROTEST implementation · Issue #559 · vegandevs/vegan · GitHub
Cheers,
Valentyn