tempering volatility by removing important features

Hello - I’ve been exploring the q2-longitudinal plugin to see if I can find a way to avoid stratifying before running ANCOM to identify differentially abundant features. I’m a bit confused about a couple of outputs. I’m probably pushing some of these techniques to their limits re sample size. Apologies in advance if this borders on interpretation of results vs interpretation of techniques - but hopefully might be helpful to others working through q2-longitudinal with questionable sample sizes.

For context - this is a follow-up from this post where @jwdebelius suggests trying LME models to deal with timepoints as nested variables.

I’m working in a system that doesn’t allow for many replicates - the best we can do is return several times, however, the source community isn’t stable enough for temporal replicates to be treated as true replicates. I’m testing the effects of flow rate (fast vs slow) and substrate (rock vs glass) on community structure. There are 4 timepoints (deployments), each time point consists of a single ‘source’ community sample, and a combination of flow rate and substrate factors. The experimental designs are unbalanced due to technical challenges:
Factors:
Deployment (4 levels)
Source (2 levels; fast, slow)
Fill (2 levels; rock, glass)
Design
deployment 1: 1 source; fast (2x glass, 4x rock); slow (2x glass, 2x rock)
deployment 2: 1 source; fast (2x glass, 4x rock); slow (2x glass, 2x rock)
deployment 3: 1 source; fast (2x glass, 4x rock); slow (2x glass, 2x rock)
deployment 4: 1 source; fast (1x glass, 2x rock); slow (1x glass, 2x rock)

The data I’m using for discrimination are the Bray Curtis distances calculated on the 4th root of relative abundance. Following this study showing that Bray Curtis calculated on proportions performs well at discriminating between different communities at a variety of effect sizes. I also calculated the Bray Curtis distances on the 4th root of relative abundance for only the 0.01% most abundant features. A Mantel test suggested a strong correlation between the distance matrices calculated on both the full and thresholded communities (Spearman Rho >0.98, p = 0.001), and so I proceeded with the thresholded data to minimize the effect of rare features most likely to be influenced by uneven library sizes.

A pairwise PERMANOVA indicated there was a significant difference between deployments (all paired deployments p <0.05), PERMDISP did not indicate a significant difference in dispersion between samples. ADONIS (deployment * source * fill), indicated a significant effect of deployment (R2 = 0.20, p=0.001), source (R2 = 0.15, p=0.001), and fill (R2 = 0.08, p=0.001) as well as a higher order significant interaction between deployment:source (R2 = 0.06, p = 0.002). Taken together, my understanding is that I should really stratify by deployment before proceeding with differential abundance testing. However, ANCOM only controls FDR <5% for sample sizes >10, and I was concerned about a high FDR penalty, and low statistical power if I stratified.

(sorry for the length of the preamble…) So I decided to use the q2-longitudinal plug-in to identify features predictive of state (deployment), remove these from my original distance matrix, and find out if there was still a significant difference between deployments, and a significant effect of deployment to justify not stratifying by deployment.

  1. I ran first-distances to visualize the change in distances between each deployment for each ‘subject’ (I defined a subject as a specific combination of fill and source repeated each deployment with 3 subjects missing from each source in the final deployment). Fed this into qiime longitudinal linear-mixed-effects I received the following error:
    Linear model will not compute due to singular matrix error. This may occur if input variables correlate closely or exhibit zero variance. Please check your input variables. Removing potential covariates may resolve this issue.
    I’m pretty sure this is a result of the fact that I have no replicates for the source community at each time point as suggested here. I’ll re-run without the source samples, but I’m worried about the low number of replicates and the fidelity of the model.

  2. Using qiime longitudinal feature-volatility without the source samples, I identified features important to predicting state (deployment) using a Random Forest Regressor. The model had reasonable accuracy (r2 = 0.823, p = 0.002), so I have some confidence in the features identified. I then filtered the 30 most ‘important’ features from my distance matrix.

Here’s where I don’t understand:
The follow up pairwise PERMANOVA not only showed that there was still a significant difference between deployments, but in all pairwise comparisons between deployments, the test-statistic increased, and p value decreased. Doesn’t this suggest that there is a greater difference between the deployments as suggested by BC after removing the features most predictive of state? Similarly, ADONIS was the same as before (actually, exactly the same, which strikes me as odd…). The same new distance matrix with the 30 ‘most important’ samples removed was used for both PERMANOVA and ADONIS, and the PERMANOVA results were not what I expected, but different, so I don’t think I somehow confused the input tables (I also made sure I didn’t accidentally keep the important features, and that they were excluded during filtering). Am I completely misinterpreting the results of feature-volatility and the effect of the ‘important features’ on contributing to the differences between states?

Thanks for all the insightful discussions!

1 Like

Hi @hsapers ,

Just some quick ideas:

I would discourage this approach. first-distances is usually used with unique individuals… I see your point re: fillxsource combinations, but this cannot handle replicates (i.e., you would need to arbitrarily assign how replicates relate to each other). It is a bit of a moot point because you do not have replicates for the source samples:

exactly

how do you filter from the distance matrix? Do you mean feature table?

did you re-normalize (e.g., rarefy) after removing these features? removing the most abundant/informative features will likely leave a latent signal, i.e., because those deployments will have massive gaps that any abundance-sensitive metric could not fail to notice. Renormalize (e.g., re-sample) after filtering.

I hope that helps!

2 Likes

Thanks @Nicholas_Bokulich !

how do you filter from the distance matrix? Do you mean feature table?

Yes - sorry - filtered from the feature table and then re-calculated Bray-Curtis distances:

    qiime feature-table filter-samples \
        --i-table longitudinal/ASV_fourth_2-5_man_car_01.qza \
        --m-metadata-file final_tables/importance_ASVs.tsv \
        --p-exclude-ids \
        --o-filtered-table final_tables/ASV_fourth_2-5_man_car_01_not_important.qza

But I failed to consider the impact of removing samples on the transformed data I was using to calculate BC. Of course I can’t just remove those samples from a table of fourth root of relative abundance data without recalculating :woman_facepalming:

1 Like

did you re-normalize (e.g., rarefy) after removing these features? removing the most abundant/informative features will likely leave a latent signal, i.e., because those deployments will have massive gaps that any abundance-sensitive metric could not fail to notice. Renormalize (e.g., re-sample) after filtering.

Re-normalized (4th root rel-abund) after filtering and re-ran all analyses with the same outcome - increased test statistic, all pairwise comparisons significant. I’m wondering if feature-volatility may be biased to identifying more prevalent features as these would have more impact individually on state prediction where the differences between states may be driven by rare features. If filtering out the ‘important features’ identified by feature-volatility removes a few highly variable, prevalent features, this could increase the effect of the numerous, low abundance, variable features - actually increasing the apparent differences between state (driven by numerous, low abundance features that PERMANOVA would be more sensitive to)?

I could try working with a more severely thresholded set of features (current is 0.01% most abundant), or iteratively run through feature-volatility. I think the latter would just keep pulling out the more prevalent features and increasing the noise of the low abundant features. Former feels a bit like throwing out valid data…

PERMANOVA will depend on the beta diversity metric you are using. Bray-Curtis is obviously sensitive to abundance but for other metrics this statement is definitely true.

Importance scores can be difficult to interpret — abundance will impact these, but not entirely. Another possibility is that the most important features are primarily in a single deployment, and so removing these only exaggerates differences with other groups.

For these reasons I think that this approach might be a bit flawed. Definitely throwing out data as you say, but also a bit unusual (particularly as random forest classifiers and bray curtis dissimilarity are really quite tangential and sensitive to different features).