I read that for differential abundance analysis, it's recommended to rarefy to a specific sequencing depth (feature counts). Is this true / always the case?
I'm asking because after browsing around on this forum, I got the impression that many/most people use the abundance table without rarefaction.
What about normalization instead of rarefaction?
Did the reviewer have any comments regarding applying normalization with tools such as DESeq2 before using ANCOM/LEfSe?
According to this source:
"Please note, data normalization is mainly used for visual data exploration such as beta-diversity and clustering analysis. It is also used for comparative analysis using statistical methods without known normalization procedures that work best (univariate statistics and LEfSe)."
"When should I opt to rarefy my data?
Whenever the sequencing depth of your samples differ too much (i.e. >10X), it is recommended to perform rarefaction before normalizing your data. Note, users should also consider to remove the shallow sequenced samples as such gross difference could be due to experimental failure. For more details, please refer to the paper by Weiss, S et al"
In general, rarefaction is avoided before differential abundance. I would avoid LEfSe because I tend to think of Kruskal-Wallis as prone to a high false positive rate. (I also hate their figures.) I feel like Weiss et al. and McMurdie and Holmes showed pretty clearly that rarefaction isn't great.
In general, my rule of thumb for rarefaction is that I (currently) apply it before doing diversity analysis (although if my sequencing depth variation is big, sometimes I'll also throw a depth term into my model for richness), unless I'm using Aitchison distance or DEICODE, which do their own normalization.
For ANCOM, DEICODE, Songbird, PhILR, Phylofactor, and Gneiss, I don't rarefy. Usually there's a normalization either built directly into the method, or there's a normalization step in the pipeline (most frequently some kind of log transform).
The one other thing I'll note is that although it affects my composition, I often pre-filter my data before doing differential abundance. I assume that if a feature shows up in only one sample, I don't have enough of a distribution to perform statistical testing, so I tend to drop low abundance/low prevalence stuff.
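For anyone unsure what rarefying actually does to a sample, here is a minimal sketch in plain Python. It is not how QIIME 2 implements it; the function name `rarefy`, the feature IDs, and the depth of 100 are all made up for illustration. The idea is just to draw a fixed number of reads without replacement and to drop samples that are too shallow.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample a {feature: count} dict to exactly `depth` reads.

    Reads are drawn without replacement. Returns None when the sample
    has fewer than `depth` reads (such a sample would be dropped).
    """
    total = sum(counts.values())
    if total < depth:
        return None
    rng = random.Random(seed)
    # Expand the counts into one list entry per read, then draw `depth` of them.
    reads = [feat for feat, n in counts.items() for _ in range(n)]
    picked = rng.sample(reads, depth)
    out = {feat: 0 for feat in counts}
    for feat in picked:
        out[feat] += 1
    return out


# Hypothetical sample with 805 reads across three features.
sample = {"ASV1": 500, "ASV2": 300, "ASV3": 5}
rarefied = rarefy(sample, depth=100)

# Observed richness after rarefying (rare features may drop out by chance).
richness = sum(1 for n in rarefied.values() if n > 0)
print(rarefied, richness)
```

The random loss of rare features like `ASV3` is part of why rarefied richness depends on the chosen depth.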
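As an illustration of the "some kind of log transform" idea, here is a rough sketch of a centered log-ratio (CLR) transform in plain Python. This is not the exact normalization used inside any of the tools listed above, and each of them handles zeros in its own way; the pseudocount of 1 and the toy counts are assumptions for the example.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio: log of each count minus the mean log count.

    The pseudocount is an assumption made here so that zeros can be logged.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Two hypothetical samples with the same composition at 10x different depth.
shallow = [10, 20, 70]
deep = [100, 200, 700]

# With no zeros and no pseudocount, CLR removes the depth difference entirely.
print(clr(shallow, pseudocount=0.0))
print(clr(deep, pseudocount=0.0))
```

Because only ratios between features survive the transform, total read depth drops out, which is the reason these methods can skip rarefaction.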
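The kind of pre-filtering described above could look something like this sketch. The thresholds (present in at least 2 samples, at least 10 reads in total) are arbitrary values picked for the example, not a recommendation, and `filter_features` is a hypothetical helper rather than a QIIME 2 function.

```python
def filter_features(table, min_prevalence=2, min_total=10):
    """Keep features seen in >= min_prevalence samples with >= min_total reads.

    `table` maps each feature ID to its list of per-sample counts.
    """
    kept = {}
    for feat, counts in table.items():
        prevalence = sum(1 for c in counts if c > 0)
        if prevalence >= min_prevalence and sum(counts) >= min_total:
            kept[feat] = counts
    return kept


# Hypothetical feature table: four features across four samples.
table = {
    "ASV1": [120, 80, 95, 110],  # abundant and present everywhere
    "ASV2": [0, 0, 3, 0],        # present in a single sample
    "ASV3": [1, 0, 2, 0],        # two samples, but only 3 reads in total
    "ASV4": [0, 15, 0, 22],      # two samples, 37 reads
}
filtered = filter_features(table)
print(sorted(filtered))
```

Here `ASV2` fails the prevalence threshold and `ASV3` fails the abundance threshold, so only `ASV1` and `ASV4` go on to testing.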
I see... by any chance, do you know if it's recommended to use any prior normalization method before LEfSe? (I heard about DESeq2's variance stabilization but I haven't actually tried it... would it be a good option?)
How / where in QIIME2 do you include this depth information during your analysis?
I've also been doing rarefaction before diversity analysis, but sometimes my sequencing depth variation is huge and I'm worried about that, especially after reading papers like that of McMurdie and Holmes (as you mentioned).
I don't know. Like I said, I tend to stay away from LEfSe (and actually DESeq2 for most data). I like other methods more.
I tend to use tools outside of QIIME because I have a tendency to run multiple models. But I think there's an anova function in q2-longitudinal that serves this purpose. I find richness tends to be asymptotically normal, so linear models work pretty well.
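Outside of QIIME, adding a depth term to a richness model might look roughly like the sketch below. It fits an ordinary least squares model with NumPy; the data are invented, and using log10 of depth as the covariate is an assumption for the example, not something prescribed above. A stats package would be needed to get standard errors and p-values.

```python
import numpy as np

# Hypothetical data: observed richness, sequencing depth, and a 0/1 group label.
richness = np.array([52, 61, 58, 70, 85, 90, 88, 101], dtype=float)
depth = np.array([5000, 12000, 8000, 30000, 6000, 15000, 9000, 40000], dtype=float)
group = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

# Design matrix: intercept, group effect, and log10(depth) as a covariate.
X = np.column_stack([np.ones_like(richness), group, np.log10(depth)])

# Least squares fit of richness ~ group + log10(depth).
coef, _, rank, _ = np.linalg.lstsq(X, richness, rcond=None)
intercept, group_effect, depth_effect = coef
fitted = X @ coef

print(f"group effect adjusted for depth: {group_effect:.2f}")
print(f"richness change per 10x depth:   {depth_effect:.2f}")
```

The group coefficient is then the difference in richness between groups at the same sequencing depth, which is the point of including the depth term.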