Hello, very nice question!
Let's investigate!
References
Peres-Neto & Jackson (2001) don't make things easy, as they give no definition, but here's what they refer to as m_{12}:
vegan
vegan.protest.R
applies procrustes
test multiple times with permutations and extracts goodness-of-fit == sum of squares
, although using a trick to avoid computational overhead
Then it selects a number of M^2 from permutations that are MORE OR EQUAL than \sqrt{(1 - m_{12})} and calculates p-value.
Output from vegan.protest()
Procrustes Sum of Squares (m12 squared): 0.624
Correlation in a symmetric Procrustes rotation: 0.6132
Significance: 0.084
From vegan
documentation:
Function
protest
performs symmetric Procrustes analysis repeatedly to estimate the significance of the Procrustes statistic. Functionprotest
uses a correlation-like statistic derived from the symmetric Procrustes sum of squaresss
as r=\sqrt{(1−ss)}​, and also prints the sum of squares of the symmetric analysis, sometimes called m_{12}^2.
From:
Significance of the M^2 statistic is often tested by permutation(see Jackson, 1995), whereby the row assignments in one matrix are randomly permuted a large number of times to create the null distribution.
QIIME2
Let's now look at qiime2.diversity.procrustes_analysis
.
From scipy.spatial.procrustes
documentation:
Procrustes ([1], [2]) then applies the optimal transform to the second matrix (including scaling/dilation, rotations, and reflections) to minimize M^2=\sum(data1−data2)^2, or the sum of the squares of the pointwise differences between the two input datasets.
QIIME does the same, but calculates a number of M^2 from permutations that are LESS than original M^2.
Implementation differences
- I created a quick notebook, which visualizes procrustes procedure and runs permutational testing as per
Qiime2
true M^2 value 0.162607866953173
p-value for true M^2 value 0.16666666666666666
number of Monte Carlo permutations 5
- I run an R code snippet with the same values:
library(lattice)
library(permute)
library(vegan)
X = rbind(c(2, 3, 3, 5, 6, 2),c(7, 1, 3, 6, 7, 7), c(2, 2, 5, 6, 4, 2))
Y = rbind(c(2, 2, 3, 5, 3, 2),c(1, 6, 7, 2, 4, 1), c(7, 1, 3, 6, 7, 7))
protest(X, Y, permutations = 6)
Call:
protest(X = X, Y = Y, permutations = 6)
Procrustes Sum of Squares (m12 squared): 0.1626
Correlation in a symmetric Procrustes rotation: 0.9151
Significance: 0.5
Permutation: free
Number of permutations: 5
Results
M^2 (QIIME2) = m_{12}^2 (vegan). So it is a naming inconsistency (duh, I hate them, they're especially common in stats).
There are differences in permutational testing, and I do understand QIIME2
code logic in testing, but I don't understand vegan
logic in testing.
It would be cool to have vegan devs to elaborate, so I opened an issue [QUESTION] PROTEST implementation · Issue #559 · vegandevs/vegan · GitHub
Cheers,
Valentyn