Adjustment for sequencing depth vs. Rarefaction

I know rarefaction (normalization) is a common practice in 16S microbiome analysis. But sometimes the rarefied data looks really strange and confusing to me. For example, suppose that in most samples, let’s say 60%, taxon A accounts for 98% of the total original reads in each sample. Suppose they were sequenced at the same depth. If we need to find out whether an exposure is related to the abundance of taxon A, how can we differentiate the samples with 50000 clean reads from the samples with 500 clean reads after rarefaction?
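To make the concern concrete, here is a minimal sketch with made-up numbers. The `rarefy` helper is a hypothetical stand-in written for illustration (plain subsampling without replacement), not the routine of any particular tool, and the two samples are invented.

```python
import random

def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a dict of taxon -> count."""
    # Expand the count table into one entry per read.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in the sample")
    picked = random.Random(seed).sample(reads, depth)
    return {taxon: picked.count(taxon) for taxon in counts}

# Made-up samples: both are 98% taxon A, but library sizes differ 100-fold.
deep = {"A": 49000, "other": 1000}   # 50000 clean reads
shallow = {"A": 490, "other": 10}    # 500 clean reads

rare_deep = rarefy(deep, 500)
rare_shallow = rarefy(shallow, 500)
print(rare_deep)
print(rare_shallow)
```

After rarefying both samples to 500 reads, the taxon A counts land at or near 490 in each, so the 100-fold difference in library size is no longer visible in the table.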

I recently read a paper in which the authors fit a negative binomial model to the rarefied read counts of taxa (abundance, rather than relative abundance). The negative binomial model is commonly used for count data in microbiome studies. I think they did not encounter this situation, because here most samples would have similar abundances of taxon A after rarefaction, which makes the variance of the taxon A counts really small.

I was wondering whether it is feasible to investigate the association between an exposure and the abundance of a given taxon without rarefying the data, and instead adjust for the original sequencing depth in the regression to mitigate the effect of library size on that association?

Or is there any alternative solution for this situation?
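For concreteness, the depth-adjustment idea can be sketched as a count model with log(sequencing depth) as an offset. To keep the example self-contained, a Poisson model is swapped in for the negative binomial: with a single binary exposure its maximum-likelihood estimate has a closed form. The function name and all numbers are assumptions made up for illustration.

```python
import math

def poisson_offset_log_rate_ratio(counts, depths, exposed):
    """Log rate ratio for a binary exposure under a Poisson model
    with a log(depth) offset.

    With only an intercept and one binary exposure, the maximum-likelihood
    estimate is the log of the ratio of pooled rates
    (total taxon reads / total reads) between the two groups.
    """
    y1 = sum(c for c, e in zip(counts, exposed) if e)
    n1 = sum(d for d, e in zip(depths, exposed) if e)
    y0 = sum(c for c, e in zip(counts, exposed) if not e)
    n0 = sum(d for d, e in zip(depths, exposed) if not e)
    return math.log((y1 / n1) / (y0 / n0))

# Made-up unrarefied data: taxon A reads and library size per sample.
taxon_a = [49000, 490, 24500, 245]
depth = [50000, 500, 50000, 500]
exposed = [True, True, False, False]

beta = poisson_offset_log_rate_ratio(taxon_a, depth, exposed)
print(round(beta, 4))
```

Here the deep and shallow samples both contribute without being subsampled, and the estimate reflects the per-read rate of taxon A rather than the raw count. A negative binomial model would use the offset in the same way but add a dispersion parameter, and a continuous exposure or extra covariates would need an iterative fit from a GLM library rather than this closed form.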

Thanks in advance!

Hi @fanwayne,

I think the two best resources I can offer that address this question are from @mortonjt (who I’ll also mention here in case he wants to weigh in specifically). His reference framework paper set up the idea for me in a way that I (mostly) understand (it is also available as q2-songbird). He also wrote a blog post that breaks it down nicely. So maybe those could help as resources to start with?


Hi @jwdebelius. Comparing ratios of taxa looks like a good way to circumvent this issue. Many thanks for your advice.
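As a minimal sketch of why a ratio of taxa sidesteps the library-size problem, the example below uses made-up counts and a hypothetical reference taxon "B". It is not how q2-songbird works internally; it only illustrates that a log-ratio is unchanged when every count in a sample is scaled by the same factor.

```python
import math

def log_ratio(counts, numerator, denominator):
    """Natural log of the ratio between two taxa within one sample."""
    if counts[numerator] <= 0 or counts[denominator] <= 0:
        raise ValueError("both taxa need non-zero counts for a log-ratio")
    return math.log(counts[numerator] / counts[denominator])

# Made-up samples with the same composition but a 100-fold depth difference.
deep = {"A": 49000, "B": 1000}
shallow = {"A": 490, "B": 10}

lr_deep = log_ratio(deep, "A", "B")
lr_shallow = log_ratio(shallow, "A", "B")
print(lr_deep, lr_shallow)
```

Both samples give the same log-ratio without any rarefaction, so the exposure can be tested against that value directly. Zero counts need separate handling (for example a pseudocount), which this sketch leaves out.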
