Hi, I have a question regarding the use of rarefied tables. I know that such tables are commonly used only for diversity analysis and not as a basis for other analyses such as differential abundance testing or network construction. From the paper by Weiss and colleagues (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5335496/), it seems that using rarefied tables is not a bad choice, because it does not lead to an increase in the FDR, and that rarefaction should in fact be adopted when groups with large (~10×) differences in average library size are present.
I would like to hear some thoughts from the community.
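For readers unfamiliar with the term, rarefying a sample means subsampling its reads without replacement down to a common depth. The following is only a minimal sketch of that idea in NumPy, not the implementation used by any particular tool; the function name `rarefy`, the toy counts, and the depth of 100 are all assumptions chosen to mimic a ~10× difference in library size.

```python
import numpy as np


def rarefy(counts, depth, seed=0):
    """Subsample one sample's feature counts to `depth` reads without replacement.

    Hypothetical helper for illustration only.
    """
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() < depth:
        raise ValueError("sample has fewer reads than the requested depth")
    rng = np.random.default_rng(seed)
    # Expand the count vector to one entry per read, labelled by feature index.
    reads = np.repeat(np.arange(counts.size), counts)
    # Draw `depth` reads without replacement and count them per feature.
    kept = rng.choice(reads, size=depth, replace=False)
    return np.bincount(kept, minlength=counts.size)


# Toy samples (assumed data) with a 10x difference in library size.
small_library = np.array([50, 30, 20])     # 100 reads
large_library = np.array([500, 300, 200])  # 1000 reads

rarefied_small = rarefy(small_library, depth=100)
rarefied_large = rarefy(large_library, depth=100)

print(rarefied_small, rarefied_small.sum())
print(rarefied_large, rarefied_large.sum())
```

After rarefying, both samples have the same total number of reads, at the cost of discarding 90% of the reads in the larger sample.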
There's been a fair bit of discussion around rarefaction here (or maybe I'm just involved in a lot of it). It's probably worth looking into those threads, since they'll give you a wide variety of viewpoints. Here are a few to start with that I found pretty quickly.
Hi Justine,
thanks for your reply. I read the posts you suggested, but my doubts remain. In particular, I didn't find a clear explanation of why it should not be admissible to use a rarefied table as a basis for downstream analyses such as differential abundance testing or network construction. The paper published by Weiss only reports that rarefaction can lead to a loss of sensitivity, not to an increased probability of false positive results. Searching the forum, I found a post (Filter Feature Table) in which @Nicholas_Bokulich strongly discourages performing rarefaction before differential abundance testing, but no reasons were provided. Furthermore, I ran my latest analysis on both rarefied and non-rarefied tables and obtained similar results. What do you think?
McMurdie et al. and Weiss et al. clearly demonstrate an increased false positive rate with Kruskal-Wallis testing on rarefied data. I'd suggest reading both papers more closely. Furthermore, as discussed in several of the posts I linked, microbiome data is compositional (I suggest reading the links in the first answer). Analyses that don't account for compositionality are methodologically inappropriate and should be avoided where possible. You may have gotten lucky with incorrect methods (a stopped clock is right at least twice a day), but it's still not an appropriate method. I'd recommend q2-composition as a good starting place for correct differential abundance analysis.
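To make the compositionality point concrete, here is a small sketch with made-up numbers. It assumes three taxa whose absolute abundances are known, which is never the case with real sequencing data, and it uses the centered log-ratio (CLR) transform only as one common example of a compositionally aware transform. It is not meant to reproduce what q2-composition does internally; the function name `clr` and the pseudocount are assumptions.

```python
import numpy as np


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample (illustrative helper).

    The pseudocount is an assumption used to avoid log(0).
    """
    logx = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logx - logx.mean()


# Hypothetical absolute abundances: only taxon 0 changes between conditions.
absolute_before = np.array([100.0, 100.0, 100.0])
absolute_after = np.array([400.0, 100.0, 100.0])

# Sequencing only observes proportions, so every taxon appears to change.
rel_before = absolute_before / absolute_before.sum()
rel_after = absolute_after / absolute_after.sum()
print("relative before:", rel_before)
print("relative after: ", rel_after)

# Ratios between the unchanged taxa are preserved, which is what
# log-ratio based methods rely on.
print("taxon1/taxon2 before:", rel_before[1] / rel_before[2])
print("taxon1/taxon2 after: ", rel_after[1] / rel_after[2])

clr_before = clr(absolute_before)
clr_after = clr(absolute_after)
print("CLR difference taxon1 - taxon2 before:", clr_before[1] - clr_before[2])
print("CLR difference taxon1 - taxon2 after: ", clr_after[1] - clr_after[2])
```

In this toy case, a test applied to the relative abundances of taxa 1 and 2 would see a drop even though neither taxon changed in absolute terms, while the ratio between them stays the same.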