The way I understand things, normalization (such as in DESeq2, edgeR, etc.) serves two purposes: 1) to model the “real” abundance in the original samples from the read counts, and 2) to make the abundance distributions conform to the needs of statistical analysis by removing heteroskedasticity, dependence, dispersion, etc.
It has been stated many times here that it is very difficult to reproduce the fold change you get from DESeq2 by extracting the normalized counts, but you can come close. Taken at face value, “fold change” sounds like it should refer to the ratio of “real” abundances, that is, the fold change of “actual” expression or “actual” community representation.
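To make the "you can come close" point concrete, here is a minimal sketch in Python (NumPy only) of median-of-ratios size factors, the normalization scheme DESeq2 uses, followed by a naive log2 fold change taken as the ratio of mean normalized counts. The function names, the pseudocount, and the toy count matrix are my own assumptions for illustration, not anything from DESeq2's API. DESeq2 reports fold changes from the coefficients of a fitted negative binomial GLM (optionally shrunk), so this ratio should only be expected to approximate its output.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors for a features x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Only features with a nonzero count in every sample have a usable
    # geometric mean, so the rest are left out of the estimate.
    positive = (counts > 0).all(axis=1)
    log_counts = np.log(counts[positive])
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Per sample: median over features of (count / geometric mean).
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

def naive_log2_fold_change(counts, is_treated, pseudocount=0.5):
    """log2 ratio of mean normalized counts (treated / control).

    The pseudocount is an illustrative assumption to avoid division by zero.
    """
    counts = np.asarray(counts, dtype=float)
    is_treated = np.asarray(is_treated, dtype=bool)
    normalized = counts / size_factors(counts)
    treated_mean = normalized[:, is_treated].mean(axis=1)
    control_mean = normalized[:, ~is_treated].mean(axis=1)
    return np.log2((treated_mean + pseudocount) / (control_mean + pseudocount))

# Hypothetical toy data: 3 features x 4 samples (2 control, 2 treated),
# with the second sample of each group sequenced twice as deeply.
counts = np.array([
    [10, 20, 40, 80],
    [100, 200, 100, 200],
    [50, 100, 50, 100],
])
is_treated = [False, False, True, True]

print(size_factors(counts))
print(naive_log2_fold_change(counts, is_treated))
```

In this toy case the size factors absorb the difference in sequencing depth, so only the first feature shows a nonzero fold change. On real data, comparing these values against DESeq2's reported `log2FoldChange` is one way to see how far the two diverge.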
So if the normalized (or normalized + VST, or normalized + MLE) abundance better represents the “real” abundance, then shouldn’t I use the normalized counts for ALL of my analysis steps:
- Alpha diversity
- Beta diversity
- Firmicutes-to-Bacteroidetes (F2B) ratio
- IgA sorting analysis
- Other regression analysis
NOTE: Cross-posted from here and [here](https://bioinformatics.stackexchange.com/questions/8846/normalization-for-microbiome-16s-sequence-analysis) due to no action.