Hello  I’ve been exploring the q2longitudinal
plugin to see if I can find a way to avoid stratifying before running ANCOM to identify differentially abundant features. I’m a bit confused about a couple of outputs. I’m probably pushing some of these techniques to their limits re sample size. Apologies in advance if this borders on interpretation of results vs interpretation of techniques  but hopefully might be helpful to others working through q2longitudinal
with questionable sample sizes.
For context  this is a followup from this post where @jwdebelius suggests trying LME models to deal with timepoints as nested variables.
I’m working in a system that doesn’t allow for many replicates  the best we can do is return several times, however, the source community isn’t stable enough for temporal replicates to be treated as true replicates. I’m testing the effects of flow rate (fast vs slow) and substrate (rock vs glass) on community structure. There are 4 timepoints (deployments), each time point consists of a single ‘source’ community sample, and a combination of flow rate and substrate factors. The experimental designs are unbalanced due to technical challenges:
Factors:
Deployment (4 levels)
Source (2 levels; fast, slow)
Fill (2 levels; rock, glass)
Design
deployment 1: 1 source; fast (2x glass, 4x rock); slow (2x glass, 2x rock)
deployment 2: 1 source; fast (2x glass, 4x rock); slow (2x glass, 2x rock)
deployment 3: 1 source; fast (2x glass, 4x rock); slow (2x glass, 2x rock)
deployment 4: 1 source; fast (1x glass, 2x rock); slow (1x glass, 2x rock)
The data I’m using for discrimination are the Bray Curtis distances calculated on the 4th root of relative abundance. Following this study showing that Bray Curtis calculated on proportions performs well at discriminating between different communities at a variety of effect sizes. I also calculated the Bray Curtis distances on the 4th root of relative abundance for only the 0.01% most abundant features. A Mantel test suggested a strong correlation between the distance matrices calculated on both the full and thresholded communities (Spearman Rho >0.98, p = 0.001), and so I proceeded with the thresholded data to minimize the effect of rare features most likely to be influenced by uneven library sizes.
A pairwise PERMANOVA indicated there was a significant difference between deployments (all paired deployments p <0.05), PERMDISP did not indicate a significant difference in dispersion between samples. ADONIS (deployment * source * fill), indicated a significant effect of deployment (R2 = 0.20, p=0.001), source (R2 = 0.15, p=0.001), and fill (R2 = 0.08, p=0.001) as well as a higher order significant interaction between deployment:source (R2 = 0.06, p = 0.002). Taken together, my understanding is that I should really stratify by deployment before proceeding with differential abundance testing. However, ANCOM only controls FDR <5% for sample sizes >10, and I was concerned about a high FDR penalty, and low statistical power if I stratified.
(sorry for the length of the preamble…) So I decided to use the q2longitudinal plugin to identify features predictive of state (deployment), remove these from my original distance matrix, and find out if there was still a significant difference between deployments, and a significant effect of deployment to justify not stratifying by deployment.

I ran
firstdistances
to visualize the change in distances between each deployment for each ‘subject’ (I defined a subject as a specific combination of fill and source repeated each deployment with 3 subjects missing from each source in the final deployment). Fed this intoqiime longitudinal linearmixedeffects
I received the following error:
Linear model will not compute due to singular matrix error. This may occur if input variables correlate closely or exhibit zero variance. Please check your input variables. Removing potential covariates may resolve this issue.
I’m pretty sure this is a result of the fact that I have no replicates for the source community at each time point as suggested here. I’ll rerun without the source samples, but I’m worried about the low number of replicates and the fidelity of the model. 
Using
qiime longitudinal featurevolatility
without the source samples, I identified features important to predicting state (deployment) using a Random Forest Regressor. The model had reasonable accuracy (r2 = 0.823, p = 0.002), so I have some confidence in the features identified. I then filtered the 30 most ‘important’ features from my distance matrix.
Here’s where I don’t understand:
The follow up pairwise PERMANOVA not only showed that there was still a significant difference between deployments, but in all pairwise comparisons between deployments, the teststatistic increased, and p value decreased. Doesn’t this suggest that there is a greater difference between the deployments as suggested by BC after removing the features most predictive of state? Similarly, ADONIS was the same as before (actually, exactly the same, which strikes me as odd…). The same new distance matrix with the 30 ‘most important’ samples removed was used for both PERMANOVA and ADONIS, and the PERMANOVA results were not what I expected, but different, so I don’t think I somehow confused the input tables (I also made sure I didn’t accidentally keep the important features, and that they were excluded during filtering). Am I completely misinterpreting the results of featurevolatility
and the effect of the ‘important features’ on contributing to the differences between states?
Thanks for all the insightful discussions!