Which method do you recommend for regression analysis of OTU abundance on a continuous variable?

Wang_cs001632 · January 17, 2020, 2:04pm

We plan to find species associated with the disease. We can either divide patients into mild and severe groups and look for differential species, or use disease indicators as continuous variables for regression. Is regression inherently better than looking for differential species? Can you recommend a regression method suitable for OTU data?

Nicholas_Bokulich · January 17, 2020, 8:40pm

Hi @Wang_cs001632,
There are a few things you can do with QIIME 2.

Check out q2-sample-classifier (see tutorials on qiime2.org) — the regression methods there can be useful for indicating how predictive the microbiome is of your continuous variable, and rank features by their predictive power. The classifiers can do the same if you group your samples into mild and severe groups (and presumably there is a control group as well?).
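To make the idea concrete, here is a minimal sketch of random forest regression on an OTU table, written directly against scikit-learn rather than through q2-sample-classifier. The data are simulated, and the names `disease_score`, `counts` and `rel_freq` are hypothetical; the plugin's actual pipeline may differ in its preprocessing and parameter choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)

# Hypothetical data: 60 samples x 30 OTUs (simulated, not real measurements).
n_samples, n_otus = 60, 30
disease_score = rng.uniform(0, 10, size=n_samples)  # continuous disease indicator

# Every OTU has the same expected count except OTU 0, which rises with the score.
expected = np.full((n_samples, n_otus), 20.0)
expected[:, 0] = 5.0 + 10.0 * disease_score
counts = rng.poisson(expected)

# Convert counts to per-sample relative frequencies.
rel_freq = counts / counts.sum(axis=1, keepdims=True)

model = RandomForestRegressor(n_estimators=200, random_state=0)

# Cross-validated predictions: each sample is predicted by a model
# that was trained without it.
predicted = cross_val_predict(model, rel_freq, disease_score, cv=5)
r2 = r2_score(disease_score, predicted)
print(f"cross-validated R^2: {r2:.2f}")

# Refit on all samples to rank OTUs by their importance to the model.
model.fit(rel_freq, disease_score)
ranking = np.argsort(model.feature_importances_)[::-1]
print("top 5 OTUs by importance:", ranking[:5])
```

The cross-validated score indicates how predictive the community is of the variable, and the importance ranking shows which features drive that prediction. Importance is not a significance test.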

See also q2-songbird (at https://library.qiime2.org/) to find features that are significantly associated with your continuous variable. One strength of songbird over the current capacity of sample-classifier is that you can use multi-variate models… e.g., maybe you want to control for sex and age when testing for microbial associations with disease?
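As a rough illustration of what controlling for covariates means, the sketch below fits one linear model per OTU with disease score, sex and age in the same design matrix. This is not songbird's multinomial regression: it swaps in a centered log-ratio (CLR) transform followed by ordinary least squares, purely to show the idea. All data are simulated and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_otus = 80, 10

# Hypothetical sample metadata (simulated).
disease_score = rng.uniform(0, 10, size=n_samples)
age = rng.uniform(20, 70, size=n_samples)
sex = rng.integers(0, 2, size=n_samples)  # coded 0/1

# Simulated counts: OTU 0 rises with disease, OTU 1 rises with age only.
log_mean = np.full((n_samples, n_otus), 3.0)
log_mean[:, 0] += 0.3 * disease_score
log_mean[:, 1] += 0.04 * age
counts = rng.poisson(np.exp(log_mean))

# CLR transform: log counts (with a pseudocount of 1) centered per sample.
log_counts = np.log(counts + 1.0)
clr = log_counts - log_counts.mean(axis=1, keepdims=True)

# Design matrix: intercept, disease score, sex, age.
design = np.column_stack([np.ones(n_samples), disease_score, sex, age])

# Least-squares fit for all OTUs at once; coef has shape (4, n_otus).
coef, *_ = np.linalg.lstsq(design, clr, rcond=None)
disease_coef = coef[1]  # disease effect per OTU, adjusted for sex and age
age_coef = coef[3]      # age effect per OTU, adjusted for disease and sex

print("OTU with largest disease coefficient:", int(np.argmax(disease_coef)))
print("OTU with largest age coefficient:", int(np.argmax(age_coef)))
```

Because CLR values are relative to the per-sample geometric mean, OTUs that do not change can still receive small nonzero coefficients, so the coefficients are best read as a ranking rather than as absolute effects.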

You could also use q2-gneiss (also described in the qiime2.org tutorials) — same benefits as q2-songbird but songbird is the newer of the two and more highly recommended (by the developer of both).

Don’t do simple correlation though: Pearson/Spearman correlations are not appropriate for compositional data (like microbiome data) and should not be used to correlate microbial counts or relative frequencies with a continuous variable.
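A small simulation can show one reason why. In the sketch below (simulated data, hypothetical names), taxon B's absolute abundance is unrelated to the disease score, but because taxon A grows with the score, B's relative abundance shrinks and appears strongly correlated with it.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


random.seed(0)
n_samples = 100
disease_score = [random.uniform(0, 10) for _ in range(n_samples)]

# Absolute abundances: taxon A increases with the score,
# taxon B is generated independently of it.
abs_a = [100 + 100 * s + random.gauss(0, 20) for s in disease_score]
abs_b = [500 + random.gauss(0, 20) for _ in range(n_samples)]

# Sequencing data only carry proportions, so B is observed relative to A + B.
rel_b = [b / (a + b) for a, b in zip(abs_a, abs_b)]

r_abs = pearson(disease_score, abs_b)  # B's true (absolute) abundance
r_rel = pearson(disease_score, rel_b)  # B's relative abundance

print(f"correlation with absolute abundance of B: {r_abs:.2f}")
print(f"correlation with relative abundance of B: {r_rel:.2f}")
```

The second correlation is strongly negative even though taxon B itself never responds to the disease score. It is an artifact of the proportions having to sum to one.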

Good luck!

Wang_cs001632 · January 18, 2020, 2:27am

Thanks for your timely reply!

Wang_cs001632 · January 18, 2020, 12:02pm

Since it is a new method, songbird has no formally published citation right now. As a plan B, is q2-sample-classifier with linear regression the best choice? Or ridge regression?

Nicholas_Bokulich · January 18, 2020, 3:45pm

It is published; their online docs may just need to be updated:
https://www.nature.com/articles/s41467-019-10656-5

Stick with the default (random forest regression).