Weight loss study - using QIIME2 for microbiome analysis

Dear QIIME2 Team,

My name is Tim, and I am a physician/researcher at the NIH. I am currently analyzing microbiome data from our weight loss study in QIIME2. I am using the newest QIIME2 version, installed natively on my MacBook Pro with miniconda. I have already completed the Moving Pictures tutorial. However, I am not sure whether QIIME2 can do the analyses I would like to do.

Study design:

  • 15 participants underwent 6 weeks of in-patient caloric restriction with a liquid diet
  • Stool samples were taken at Baseline, weekly from Week 1 to 6 during the weight loss period, and at the post-weight loss period (in total 8 time points)
  • Metadata include weight lost and stool calories

Research question:

  • Do certain bacteria/features at baseline predict weight loss?
  • Do changes in bacteria/features during caloric restriction correlate with concomitant changes in stool calories?

Question to you as QIIME2 experts:

  • Is there a way to do these analyses in QIIME2? I already found the longitudinal analysis tool, but I am not sure whether I can also use it for this kind of analysis.

Thanks so much for helping out :)
Tim

Hi @TimNIH,
QIIME 2 can do anything if you put in a bit of elbow grease. (Take that statement with a pinch of salt: it has yet to achieve enduring world peace or clean my car.)

Predicting weight loss from baseline features is easy peasy:

  1. Filter your feature table to contain only baseline samples.
  2. Add a metadata column for those samples that indicates the degree of weight loss, or even several columns, e.g., % weight loss at each time point. You may also want to take the log of those values, depending on what the distribution looks like.
  3. Use qiime sample-classifier regress-samples-ncv or regress-samples to predict each weight loss metric of interest, one at a time. See the online tutorials for some examples.
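As a rough illustration of step 2, here is a minimal Python sketch that derives a percent weight loss column and its log from per-subject weights. The subject IDs, weights, and column names are made up for the example and are not from the study; the output is just tab-separated text that you could adapt to your own metadata file.

```python
import math

# Hypothetical per-subject weights in kg (illustrative values, not study data).
weights = {
    "S01": {"baseline": 100.0, "week6": 92.0},
    "S02": {"baseline": 120.0, "week6": 114.0},
    "S03": {"baseline": 95.0, "week6": 85.5},
}

def percent_weight_loss(baseline, final):
    """Weight lost as a percentage of baseline weight."""
    return 100.0 * (baseline - final) / baseline

rows = []
for subject, w in sorted(weights.items()):
    pct = percent_weight_loss(w["baseline"], w["week6"])
    # The log is only defined for positive values, so use NaN otherwise.
    log_pct = math.log(pct) if pct > 0 else float("nan")
    rows.append((subject, pct, log_pct))

# Print tab-separated columns (assumed column names) for a metadata file.
print("subject\tpct_weight_loss\tlog_pct_weight_loss")
for subject, pct, log_pct in rows:
    print(f"{subject}\t{pct:.2f}\t{log_pct:.3f}")
```

Whether the log is the better target depends on the distribution of the values, as noted above, so it is worth plotting both before choosing.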

The main issue is that you have only 15 samples, which is far too few for robust predictions, but it is worth a shot.

Alternatives:

  1. Use regress-samples-ncv to predict subject weight as a function of the microbiome at each time point (instead of comparing the baseline microbiome to terminal weight).
  2. Correlate weight loss with the baseline microbiome using songbird (see below).

Correlating changes in features with changes in stool calories is a bit more challenging: the bacterial measurements you have are relative abundances, so standard correlation metrics (e.g., Spearman, Pearson) are not valid. You could try songbird to regress stool calories against the microbiome.
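One way to see the problem with relative abundances is a toy example (made-up numbers, not real data): two taxa whose absolute abundances are uncorrelated by construction can look strongly correlated once every sample is rescaled to proportions, simply because a third taxon changes the total. This sketch uses a hand-written Pearson correlation and is only meant to illustrate the effect; it is not what songbird does.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up absolute abundances for three taxa across four samples.
taxon_a = [10, 20, 10, 20]
taxon_b = [10, 10, 20, 20]        # uncorrelated with taxon_a by construction
taxon_c = [10, 100, 1000, 10000]  # a third taxon that blooms

# Convert each sample to relative abundances (proportions of the total).
totals = [a + b + c for a, b, c in zip(taxon_a, taxon_b, taxon_c)]
rel_a = [a / t for a, t in zip(taxon_a, totals)]
rel_b = [b / t for b, t in zip(taxon_b, totals)]

print("absolute abundances:", round(pearson(taxon_a, taxon_b), 3))
print("relative abundances:", round(pearson(rel_a, rel_b), 3))
```

In this toy case the correlation between the absolute abundances is zero, while the correlation between the proportions is strongly positive, even though nothing links the two taxa biologically.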

As for the longitudinal tools, q2-longitudinal has two potentially useful features, given your experimental design:

  1. Use feature-volatility to discover which features change most as a function of time.
  2. Select the most important features, and run linear-mixed-effects to evaluate the change in abundance of these features over time and optionally in relation to different treatments.

I hope that helps!

Hey Nicholas, thanks soooo much for your quick help. I will try this out!

Hey Nicholas, I started the analysis with qiime sample-classifier regress-samples, using weight loss rate as the variable of interest. However, I don’t really understand the accuracy_results.qzv file (attached). The plot shows only three of the 15 subjects. All subjects have data, but not everybody appears on this plot. Do you know why?

I also don’t really know how to select the correct values for
--p-n-estimators
--p-random-state
I used 20 and 25 now, respectively. But the regression line changes a lot when I change those values. Do you have a good idea how to select the right values for this analysis? I couldn’t really find anything on Google…

Thanks so much for helping me out :)
Tim

accuracy_results.qzv (272.2 KB)

You used regress-samples instead of regress-samples-ncv. The former splits your dataset into a training set and a test set (4:1 by default) and only reports predictions for the test samples. With 15 samples, a 20% test set is three samples, which is why only three subjects appear in your plot. The latter does the same but K times over, such that each sample is in the test set exactly once. Try the latter and you will get a prediction for each sample.
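To make the difference concrete, here is a small Python sketch of the two splitting schemes on 15 hypothetical sample IDs. It is not QIIME 2's or scikit-learn's actual implementation (their shuffling and rounding may differ); the function names and the choice of 5 folds are assumptions for illustration.

```python
import random

# 15 hypothetical sample IDs, one per participant.
samples = [f"subject_{i:02d}" for i in range(1, 16)]

def single_split(ids, test_fraction=0.2, seed=0):
    """Shuffle once and hold out a fraction of the samples for testing."""
    rng = random.Random(seed)
    shuffled = ids[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def k_fold_test_sets(ids, k=5, seed=0):
    """Shuffle once, then deal the samples into k disjoint test folds."""
    rng = random.Random(seed)
    shuffled = ids[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

# One train/test split: only the held-out samples get a prediction.
train, test = single_split(samples)
print(len(train), "training samples,", len(test), "test samples")

# K-fold: every sample lands in exactly one test fold.
folds = k_fold_test_sets(samples)
tested = [s for fold in folds for s in fold]
print(len(tested), "samples receive a prediction across", len(folds), "folds")
```

With a single split, only the held-out samples are predicted; with the fold-based scheme, the union of the test folds covers all 15 samples.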

For --p-n-estimators: more is better to an extent, but it increases runtime. Start with the default, and if all works reasonably well, try increasing to maybe 500 and see if that helps.

For --p-random-state: pick a number, any number. This just sets a random seed so that results are repeatable.

Setting too few estimators will result in high variability. Which random state you select should not impact results too much unless you have too few samples, leading to weak results, which is probably the case here (as I warned you, 15 samples is really too low for this type of analysis, but it is still worth a look).
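A toy simulation can illustrate both points. This is not a random forest: each "estimator" here is assumed to be just the true value plus Gaussian noise, and the ensemble prediction is their average. All names and numbers are hypothetical.

```python
import random
import statistics

def toy_ensemble_prediction(n_estimators, seed):
    """Average n noisy 'estimators'; each is the true value plus Gaussian noise."""
    rng = random.Random(seed)
    true_value = 5.0  # hypothetical target, e.g. a weight loss rate
    draws = [rng.gauss(true_value, 1.0) for _ in range(n_estimators)]
    return statistics.fmean(draws)

def spread_across_seeds(n_estimators, seeds=range(30)):
    """Standard deviation of the ensemble prediction as the seed changes."""
    predictions = [toy_ensemble_prediction(n_estimators, s) for s in seeds]
    return statistics.stdev(predictions)

# The same seed always reproduces the same prediction.
assert toy_ensemble_prediction(20, seed=25) == toy_ensemble_prediction(20, seed=25)

# More estimators: the prediction depends less on which seed was picked.
for n in (20, 100, 500):
    print(n, "estimators: spread across seeds =", round(spread_across_seeds(n), 3))
```

In this simplified setting the seed only makes a run repeatable, while the number of estimators controls how much the result moves when the seed changes. With real data and few samples, the train/test assignment adds further seed-dependent variability that more estimators cannot remove.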
