pairwise differences on Axis 3

Hello,
I have samples of two types (A/B), before and after a treatment (pre/post).

(There is no reason to hypothesize A would have changed more than B, or in a different direction; both were subject to the same treatment and likely had different starting communities)

I ran pairwise-differences on my PCoA axes to see if there were any predictable directional differences in pre/post. Axis 3 was significant.

I generated a biplot and there are clues as to which taxa may be driving this, but as I found in the forum, the most important taxa are determined by their vector magnitude on PC1.

I ran ANCOM-BC with pre/post and A/B as independent variables, both additively and with an interaction term, and have a list of significant taxa. I can run pairwise-differences on the abundances or relative abundances of these taxa and generate boxplots. (It would probably be better to run ANCOM-BC2 in R, where a paired analysis is possible. I may try this, but it is intimidating for a beginner.)

I was hoping there may be a more straightforward way to find which taxa are driving the difference between pre/post, and possibly contributing to interactions between pre/post and A/B.

I was thinking it may be a good idea to run Spearman on Axis 3 vs taxonomic (relative) abundances. Is there a good way to do that in QIIME?
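For illustration, the correlation described above can be sketched in plain Python. This is not a QIIME 2 command; it assumes the Axis 3 sample coordinates and the per-taxon relative abundances have already been exported into ordinary lists in the same sample order. The taxon names and numbers are hypothetical toy values, and the sketch computes only the Spearman coefficient (no p-values or multiple-testing correction).

```python
import math
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN if either vector is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_taxa_by_axis(axis_scores, rel_abund):
    """Return (taxon, rho) pairs sorted by |rho|, skipping undefined correlations."""
    results = [(taxon, spearman(axis_scores, vals)) for taxon, vals in rel_abund.items()]
    defined = [pair for pair in results if not math.isnan(pair[1])]
    return sorted(defined, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical toy data: five samples, three taxa (not real results).
axis3 = [-0.30, -0.10, 0.05, 0.20, 0.40]
rel_abund = {
    "taxon_A": [0.01, 0.02, 0.05, 0.08, 0.20],
    "taxon_B": [0.30, 0.25, 0.10, 0.12, 0.02],
    "taxon_C": [0.05, 0.01, 0.07, 0.02, 0.04],
}

for taxon, rho in rank_taxa_by_axis(axis3, rel_abund):
    print(f"{taxon}\t{rho:+.2f}")
```

Taxa with a strongly positive or negative coefficient sit toward opposite ends of the axis. Because relative abundances are compositional, these coefficients are best treated as a rough screen rather than a formal test.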

Is there a way to generate a biplot (and list of important taxa) using Axis 3 as the axis of importance?

Do you have any other suggestions for how to find the significantly important taxa and visualize their impact? The sample classifier heatmaps are interesting and may be of use, but I would imagine RandomForest methods aren't as robust as ANCOM-BC and pairwise differences.

*Edit:
I also tried longitudinal feature-volatility, and only one taxon was in the important features result, with importance = 1. When I set feature-count to 10, the result was a different single taxon, again with importance = 1. I'm not sure if I am doing something wrong here:

qiime longitudinal feature-volatility \
  --i-table filtered_table.qza \
  --m-metadata-file QIIME_map.txt \
  --p-state-column pre_post \
  --p-individual-id-column pair \
  --p-feature-count 'all' \
  --output-dir volatility_032223

Any suggestions are greatly appreciated! Thank you, Nate


Hi @nathaniel_hubert,

I think rather than the biplot and correlation you might want to look into either compositional tensor factorization (CTF) in Gemelli or RPCA in DEICODE. The advantage of these over traditional metrics is that the features are embedded in the ordination, so you can use them to figure out which side of the ordination features are associated with... or even to build ALRs if you're into amalgamated microbial statistics. I happen to really like amalgamated values.
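One way to read the amalgamated log-ratio idea is sketched below in plain Python: sum the counts of the taxa at one end of an axis, sum those at the other end, take the log of the ratio per sample, and then compare post to pre within each pair. This is an illustrative assumption, not Gemelli's or DEICODE's implementation. The numerator and denominator lists, the pseudocount of 1, and all counts are hypothetical; the `pair` and `pre_post` keys mirror the metadata column names used in the command above.

```python
import math


def amalgamated_log_ratio(counts, numerator, denominator, pseudocount=1.0):
    """log(sum of numerator taxa / sum of denominator taxa) for one sample.

    The pseudocount is an assumption added to avoid log(0) on sparse data.
    """
    top = sum(counts.get(t, 0) for t in numerator) + pseudocount
    bottom = sum(counts.get(t, 0) for t in denominator) + pseudocount
    return math.log(top / bottom)


def paired_differences(samples, numerator, denominator):
    """Post-minus-pre log ratio for each individual with both time points."""
    by_pair = {}
    for s in samples:
        lr = amalgamated_log_ratio(s["counts"], numerator, denominator)
        by_pair.setdefault(s["pair"], {})[s["pre_post"]] = lr
    return {
        pid: states["post"] - states["pre"]
        for pid, states in by_pair.items()
        if "pre" in states and "post" in states
    }


# Hypothetical toy data: two individuals, each sampled pre and post.
samples = [
    {"pair": "p1", "pre_post": "pre", "counts": {"taxon_A": 9, "taxon_B": 99}},
    {"pair": "p1", "pre_post": "post", "counts": {"taxon_A": 99, "taxon_B": 9}},
    {"pair": "p2", "pre_post": "pre", "counts": {"taxon_A": 49, "taxon_B": 49}},
    {"pair": "p2", "pre_post": "post", "counts": {"taxon_A": 49, "taxon_B": 49}},
]

diffs = paired_differences(samples, numerator=["taxon_A"], denominator=["taxon_B"])
for pid, d in sorted(diffs.items()):
    print(f"{pid}\t{d:+.3f}")
```

A consistently positive or negative difference across pairs would suggest the numerator group shifts relative to the denominator group after treatment; which taxa belong in each group would come from the feature loadings of the ordination.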

If it's helpful or interesting, I'll mention that I recently did something similar and our preprint is out (the manuscript is still under review). I used Gemelli to look for directional changes between tissue types between two survival groups. Substitute tissue for time point and survival group for treatment, and I think it's similar.

Best,
Justine


Thank you @jwdebelius !
I am looking into those approaches now.
Very cool paper, and cool approach.

I haven't heard of these methods, but this tutorial looks very straightforward:

Just curious, is the method I proposed appropriate? Can it be done in QIIME (i.e., determine which taxa correlate with Axis 3)?

I am also wondering if there is an error in the feature-volatility command I shared. I tested it with states that differ predictably, and again there was only one taxon in the resulting important features.

Thank you very much! Nate

Hi @nathaniel_hubert,

I never want to say "never do this" because there are reasons you might end up doing this. However, the canonical and most recommended solution is a biplot, which places features and samples in the same space. You essentially get a feature loading out, which tells you about each feature's position in PCoA space. There should be a biplot method in q2-diversity that you could apply, so that would also be an option.
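As a rough sketch of how feature loadings could be used to focus on a single axis: assuming the feature coordinates from a biplot have been exported into a plain mapping of feature name to per-axis coordinates, the features can be ranked by the absolute value of their coordinate on Axis 3. The function name, the feature names, and the numbers below are all hypothetical.

```python
def top_features_on_axis(loadings, axis, n=5):
    """Return the n features with the largest absolute loading on a 1-based axis."""
    idx = axis - 1
    ranked = sorted(loadings.items(), key=lambda kv: abs(kv[1][idx]), reverse=True)
    return [(name, coords[idx]) for name, coords in ranked[:n]]


# Hypothetical feature coordinates on Axes 1-3 (not real results).
loadings = {
    "taxon_A": [0.50, 0.10, -0.02],
    "taxon_B": [0.05, -0.20, 0.40],
    "taxon_C": [-0.10, 0.30, -0.25],
    "taxon_D": [0.20, 0.00, 0.10],
}

for name, value in top_features_on_axis(loadings, axis=3, n=2):
    print(f"{name}\t{value:+.2f}")
```

The sign of each coordinate indicates which end of the axis a feature points toward, so it can be compared with where the pre and post samples fall on that axis.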

I've not run feature-volatility, so I'm not sure. But there are lots of reasons you might only get 1 organism:

  1. You might not have enough statistical power.
  2. Your difference might be due to a variety of organisms across samples and you need to look at an aggregate statistic rather than trying to find a single bug to blame.
  3. You might need to filter your data more stringently because you're paying too much of a correction penalty for sparse data.
  4. There might only be 1 interesting organism.

Best,
Justine


Thank you, Justine!
I really appreciate your time and guidance.
Will let you know how it goes.
Nate
