How to normalize bacterial abundance for unequal sample size?

jwdebelius · December 10, 2018, 9:25am

First, which analyses did you perform, and at what level?

Second, is there a reason to believe that what you're seeing is untrue and that Group C isn't dominated by single OTUs? If you run a PCoA based on weighted metrics, does it make sense that Group C clusters more tightly (that is, its samples are more similar to one another under weighted metrics)? You could see this in boxplots or in a PCoA, depending on other factors in the data set.
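As a rough way to put a number on "clusters more tightly", here is a minimal sketch that computes the mean pairwise distance within a group. It assumes each sample is a plain dict of taxon counts, and it uses Bray-Curtis on relative abundances as a stand-in for whichever weighted metric you actually ran (weighted UniFrac would need a tree). The function names are made up for illustration and are not QIIME 2 API.

```python
from itertools import combinations


def to_relative(counts):
    """Convert raw counts to relative abundances (returns {} for an empty sample)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: c / total for taxon, c in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    den = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return num / den if den else 0.0


def mean_within_group_distance(samples):
    """Average pairwise Bray-Curtis distance among the samples of one group."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compare")
    profiles = [to_relative(s) for s in samples]
    dists = [bray_curtis(a, b) for a, b in combinations(profiles, 2)]
    return sum(dists) / len(dists)
```

A smaller value for Group C than for the other groups would be consistent with the tighter clustering you would expect if the same OTUs dominate across its samples.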

Third, if you randomly select a subset of Group C, does the pattern hold? If 10 samples from Group C still show that the taxon is dominant, that might suggest an actual effect, adjusted for sample size, rather than a group size effect.
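The subsampling check could be sketched roughly like this. It assumes each sample is a dict of taxon counts and treats a taxon as "dominant" when it exceeds a relative-abundance cutoff; the 0.5 threshold, the 1000 draws, and the function names are all illustrative choices, not anything prescribed by QIIME 2.

```python
import random


def dominant_fraction(samples, taxon, threshold=0.5):
    """Fraction of samples in which `taxon` exceeds `threshold` relative abundance."""
    hits = 0
    for counts in samples:
        total = sum(counts.values())
        if total > 0 and counts.get(taxon, 0) / total > threshold:
            hits += 1
    return hits / len(samples)


def subsample_check(samples, taxon, n=10, draws=1000, threshold=0.5, seed=0):
    """Repeatedly draw `n` samples without replacement and record how often
    `taxon` is dominant within each random subset."""
    if n > len(samples):
        raise ValueError("subset size is larger than the group")
    rng = random.Random(seed)  # fixed seed so the check is reproducible
    return [
        dominant_fraction(rng.sample(samples, n), taxon, threshold)
        for _ in range(draws)
    ]
```

If the fractions stay high across most draws, the dominance is less likely to be an artifact of Group C simply having more samples. Drawing many subsets rather than one also shows how much the answer varies by chance.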

I also want to add some background on normalization methods in microbiome analysis, because it's complicated (what isn't?) and it's been a big topic of discussion over the last couple of years. "Waste not, want not: why rarefying microbiome data is inadmissible" (PMID: 24699258) was one of the first papers to address issues with rarefaction for taxonomy-based analyses, and it also proposed one of the first widely accepted models. However, the importance of rarefaction for diversity metrics was reinforced in "Normalization and microbial differential abundance strategies depend upon data characteristics" (PMID: 28253908), which is also worth a read. Finally, I recommend "Microbiome Datasets Are Compositional: And This Is Not Optional" (doi: https://doi.org/10.3389/fmicb.2017.02224). I think QIIME2 has tried to address the issue of compositionality in the tests it uses.
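To make the two ideas in those papers concrete, here is a minimal sketch of rarefying a single sample to a fixed depth and of a centered log-ratio (CLR) transform, one common compositional approach. It assumes a sample is a dict of integer taxon counts. The pseudocount of 1 used to handle zeros is an assumption for illustration, and neither function is meant to mirror how QIIME 2 or any specific package implements these steps.

```python
import math
import random


def rarefy(counts, depth, seed=0):
    """Subsample reads without replacement down to `depth`.
    Returns None when the sample has fewer than `depth` reads."""
    pool = [taxon for taxon, c in counts.items() for _ in range(c)]
    if len(pool) < depth:
        return None  # such samples are normally dropped from the analysis
    rng = random.Random(seed)
    rarefied = {taxon: 0 for taxon in counts}
    for taxon in rng.sample(pool, depth):
        rarefied[taxon] += 1
    return rarefied


def clr(counts, pseudocount=1.0):
    """Centered log-ratio: log of each (count + pseudocount) minus the mean log.
    The pseudocount is an illustrative way of avoiding log(0)."""
    logs = {taxon: math.log(c + pseudocount) for taxon, c in counts.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {taxon: value - mean_log for taxon, value in logs.items()}
```

Rarefying equalizes sequencing depth by discarding reads, which is what the first paper objects to and the second defends for diversity metrics. CLR values describe each taxon relative to the sample's geometric mean and always sum to zero, which is the compositional view taken in the third.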

Best,
Justine