Why does taxa data not follow a normal distribution?

11114 · May 14, 2019, 10:58am

I want to use genus-level taxa data as a variable in a regression analysis or ANOVA.
(The number of samples (N) is almost 700.)
However, the data at every taxonomic level seem to follow a Poisson distribution and, naturally, normality tests reject the null hypothesis of normality.

Has anyone else run into the same problem?

jwdebelius · May 14, 2019, 11:05am

Hi @11114,

Welcome! I'm guessing this is your first time working with microbiome data. Have you gone through the tutorials which address feature-based testing?

The inherent sparsity of microbiome data has been widely discussed (I suggest McMurdie et al. and the later response by Weiss et al. as starting points). There was also a recent post that gave a bunch of links to other posts about the structure of microbiome data and normalization/rarefaction.

I'd also check out q2-deicode, q2-composition, and q2-gneiss as starting points for your analysis.
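
To give a rough feel for why these tools work with log-ratios rather than raw counts, here is a minimal sketch of a centered log-ratio (CLR) transform on a single made-up sample. It is not how any of the plugins above are implemented: the function name `clr`, the example counts, and the pseudocount of 1 used to get around zeros are all assumptions for illustration only.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's count vector.

    The pseudocount is an illustrative assumption: log(0) is undefined,
    so zeros must be handled somehow before taking logs.
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [x - mean_log for x in logs]


# Hypothetical genus-level counts for one sample; note the many zeros.
sample = [0, 0, 3, 120, 0, 15, 1]

# Fraction of taxa that were not observed in this sample.
sparsity = sum(1 for c in sample if c == 0) / len(sample)

transformed = clr(sample)

print(f"fraction of zeros: {sparsity:.2f}")
print("CLR values:", [round(v, 3) for v in transformed])
```

The transformed values sum to zero within a sample, so each one reads as "abundance relative to the sample's geometric mean" rather than as a raw count. The choice of pseudocount does affect the result, which is one reason to rely on the dedicated tools rather than a hand-rolled transform.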

Best,
Justine

P.S. I moved this over to "General Discussion" because it's a better fit.