The way I understand things, normalization (such as in DESeq2, edgeR, etc.) serves two purposes: 1) to model the “real” abundance in the original samples from the read counts, and 2) to make the abundance distributions conform to the needs of statistical analysis by removing heteroskedasticity, dependence, dispersion, etc.
It has been stated many times here that it is very difficult to reproduce the fold change you get from DESeq2 by extracting the normalized counts, but you can come close. Taken at face value, “fold change” sounds like it should refer to the ratio of the “real” abundances: the fold change of “actual” expression or “actual” community representation.
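To make the "you can come close" point concrete, here is a minimal sketch of median-of-ratios size-factor normalization (the approach DESeq2 uses for its size factors) followed by a naive fold change computed from the normalized counts. The function names, the toy table, and the `pseudocount` parameter are assumptions for illustration only. DESeq2 itself estimates log fold changes from a fitted negative binomial model (optionally with shrinkage), which is one reason a ratio of mean normalized counts only approximates its output.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows (one per feature), each a list of per-sample counts.
    Features containing a zero are skipped because their geometric mean is zero.
    Raises statistics.StatisticsError if every feature contains a zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


def naive_log2_fold_change(norm_row, group_a, group_b, pseudocount=0.5):
    """log2 ratio of mean normalized counts (group_b over group_a).

    This is NOT DESeq2's estimator, only a rough approximation of it.
    """
    mean_a = sum(norm_row[j] for j in group_a) / len(group_a)
    mean_b = sum(norm_row[j] for j in group_b) / len(group_b)
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))


# Hypothetical toy table: samples 1 and 3 were sequenced twice as deeply as
# samples 0 and 2; the third feature is four-fold higher in samples 2 and 3.
counts = [
    [10, 20, 10, 20],
    [30, 60, 30, 60],
    [50, 100, 200, 400],
    [0, 4, 2, 8],  # contains a zero, so it is ignored when estimating size factors
]
factors = size_factors(counts)
norm = normalize(counts, factors)
lfc = naive_log2_fold_change(norm[2], group_a=[0, 1], group_b=[2, 3], pseudocount=0.0)
print(factors)
print(lfc)
```

On real data with many zeros (typical for 16S tables), few features may survive the zero filter, so the size factors can rest on a small and unrepresentative set of features.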
So if the normalized (or normalized + VST, or normalized + MLE) abundance better represents the “real” abundance, then shouldn’t I use the normalized counts for ALL of my analysis steps:
IgA sorting analysis
Other regression analysis
NOTE: Cross-posted from here and [here](https://bioinformatics.stackexchange.com/questions/8846/normalization-for-microbiome-16s-sequence-analysis) due to no response.
@mortonjt Thank you so much for replying. As I said, I’ve gotten no response to this question. And congratulations for trying to tackle the issue of compositional data head-on. I look forward to digesting the article and trying out your ratio methods.
In the short term, would you say that running the standard routine (alpha diversity, beta diversity, IgA-seq, regression, etc.) on some sort of normalized data rather than raw data would be an improvement?
Some sort of normalization is definitely required for most analyses, but no single normalization is necessarily appropriate for every method.
In QIIME 2 we handle this by having each method (mostly) perform the normalization that it requires. A couple of examples:
alpha/beta diversity methods have their own normalization (rarefaction in the core-metrics pipelines; see q2-breakaway for a more sophisticated method for attempting to estimate the true alpha diversity if rarefaction is upsetting)
differential abundance methods have their own normalization procedures on-board (e.g., see ANCOM or @mortonjt’s methods)
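As a rough illustration of the rarefaction step mentioned above, the following sketch subsamples one sample's feature counts to a fixed depth without replacement and drops samples that fall below that depth. It is a plain-Python illustration under those assumptions, not the QIIME 2 implementation; the `rarefy` function, the feature IDs, and the depth are hypothetical.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample a sample's reads to `depth` without replacement.

    counts: dict mapping feature ID -> read count for one sample.
    Returns a new dict of counts summing to `depth`, or None if the sample
    has fewer than `depth` reads (such samples are dropped).
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand the counts into one entry per read, then draw `depth` reads.
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    picked = rng.sample(pool, depth)
    rarefied = {feature: 0 for feature in counts}
    for feature in picked:
        rarefied[feature] += 1
    return rarefied


# Hypothetical samples for illustration.
sample = {"ASV_1": 500, "ASV_2": 300, "ASV_3": 200}
shallow = {"ASV_1": 20, "ASV_2": 5, "ASV_3": 0}

rarefied = rarefy(sample, depth=100)
dropped = rarefy(shallow, depth=100)
print(rarefied)
print(dropped)
```

The trade-off is visible here: every retained sample ends up at the same depth, at the cost of discarding reads from deep samples and discarding shallow samples entirely.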
It would be awesome to see other normalization methods implemented in QIIME 2 and we have some open issues — if you are interested in getting involved please let us know!
I tried googling for QIIME 2 normalization and found the normalize_table.py and differential_abundance.py scripts. Both have options to use DESeq2 normalization. There are also some third-party scripts implementing percentile normalization:
I couldn’t find one that does ANCOM.
Is ANCOM normalization implemented in QIIME 2?
I’m not fully versed, but it seems to me that the more biological/clinical microbiome literature lags well behind the improved methods being developed.
I did a small amount of Google sleuthing just now and found that there are quite a few papers developing percentile methods or other ways to better normalize compositional data.
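For readers unfamiliar with what "normalizing compositional data" can look like in practice, here is a minimal sketch of the centered log-ratio (CLR) transform, one widely used log-ratio approach. It is shown only as an example of the general idea and is not necessarily what any of the papers mentioned above propose. The `clr` function and the `pseudocount` parameter are assumptions for illustration; how zeros are handled is a separate modeling choice that affects results.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's counts.

    Each value becomes log(count) minus the mean of the log counts in that
    sample. The pseudocount is an assumed way to avoid log(0); with
    pseudocount=0 the input must not contain zeros.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Two hypothetical samples with the same composition but a 10x depth difference.
shallow = [10, 30, 60]
deep = [100, 300, 600]

clr_shallow = clr(shallow, pseudocount=0.0)
clr_deep = clr(deep, pseudocount=0.0)
print(clr_shallow)
print(clr_deep)
```

Without a pseudocount the two transformed samples agree up to floating-point error, because CLR depends only on ratios between features and not on sequencing depth. Adding a pseudocount breaks that invariance slightly, more so for shallow samples.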