Procrustes and protest analysis

Is there a difference between the inference drawn from true M^2 value from procrustes analysis in QIIME2 and Procrustes Sum of Squares (m12 squared) from PROTEST (vegan)?

Thank you,
Shrez

2 Likes

Does anyone have any comment on this question?
Thank you,
Shrez

Hello, very nice question!
Let's investigate!

References

Peres-Neto & Jackson (2001) don't make things easy, as they give no definition, but here's what they refer to as m_{12}:
image

vegan

vegan.protest.R applies procrustes test multiple times with permutations and extracts goodness-of-fit == sum of squares, although using a trick to avoid computational overhead :slight_smile:
Then it selects a number of M^2 from permutations that are MORE OR EQUAL than \sqrt{(1 - m_{12})} and calculates p-value.

Output from vegan.protest()

Procrustes Sum of Squares (m12 squared):        0.624 
Correlation in a symmetric Procrustes rotation: 0.6132 
Significance:  0.084 

From vegan documentation:

Function protest performs symmetric Procrustes analysis repeatedly to estimate the significance of the Procrustes statistic. Function protest uses a correlation-like statistic derived from the symmetric Procrustes sum of squares ss as r=\sqrt{(1−ss)}​, and also prints the sum of squares of the symmetric analysis, sometimes called m_{12}^2.
From:

Significance of the M^2 statistic is often tested by permutation(see Jackson, 1995), whereby the row assignments in one matrix are randomly permuted a large number of times to create the null distribution.

QIIME2

Let's now look at qiime2.diversity.procrustes_analysis.
From scipy.spatial.procrustes documentation:

Procrustes ([1], [2]) then applies the optimal transform to the second matrix (including scaling/dilation, rotations, and reflections) to minimize M^2=\sum(data1−data2)^2, or the sum of the squares of the pointwise differences between the two input datasets.

QIIME does the same, but calculates a number of M^2 from permutations that are LESS than original M^2.

Implementation differences

  • I created a quick notebook, which visualizes procrustes procedure and runs permutational testing as per Qiime2
true M^2 value 0.162607866953173
p-value for true M^2 value 0.16666666666666666
number of Monte Carlo permutations 5
  • I run an R code snippet with the same values:
library(lattice)
library(permute)
library(vegan)

X = rbind(c(2, 3, 3, 5, 6, 2),c(7, 1, 3, 6, 7, 7), c(2, 2, 5, 6, 4, 2))
Y = rbind(c(2, 2, 3, 5, 3, 2),c(1, 6, 7, 2, 4, 1), c(7, 1, 3, 6, 7, 7))
protest(X, Y, permutations = 6)
Call:
protest(X = X, Y = Y, permutations = 6) 

Procrustes Sum of Squares (m12 squared):        0.1626 
Correlation in a symmetric Procrustes rotation: 0.9151 
Significance:  0.5 

Permutation: free
Number of permutations: 5

Results

M^2 (QIIME2) = m_{12}^2 (vegan). So it is a naming inconsistency (duh, I hate them, they're especially common in stats).

There are differences in permutational testing, and I do understand QIIME2 code logic in testing, but I don't understand vegan logic in testing.
It would be cool to have vegan devs to elaborate, so I opened an issue [QUESTION] PROTEST implementation · Issue #559 · vegandevs/vegan · GitHub

Cheers,
Valentyn

3 Likes