I am struggling to choose an optimal rarefaction depth. Even rarefying to 10,000 reads (the first red line), 169 samples would be removed; but if I select a shallower depth, diversity would be underestimated. I wonder whether there is an alternative way to normalize the data, rather than rarefying, so that I can keep all samples, or at least most of them, while computing alpha diversity and beta diversity.
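For reference, here is my understanding of what rarefying actually does, as a minimal numpy sketch (the counts and depths are made up, and this only illustrates the subsample-or-drop behavior, not any particular tool's implementation):

```python
import numpy as np

rng = np.random.default_rng(42)

def rarefy(counts, depth):
    """Subsample a vector of per-taxon read counts to a fixed depth
    without replacement; return None if the sample is too shallow."""
    counts = np.asarray(counts)
    if counts.sum() < depth:
        return None  # this sample would be dropped at this depth
    reads = np.repeat(np.arange(counts.size), counts)  # one entry per read
    picked = rng.choice(reads, size=depth, replace=False)
    return np.bincount(picked, minlength=counts.size)

print(rarefy([500, 300, 150, 50], 200))   # kept, subsampled to 200 reads
print(rarefy([80, 60, 40, 20], 10000))    # dropped: fewer than 10000 reads
```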
You are correct: there is a tradeoff between keeping more samples and keeping more sequencing depth, which has been discussed here and here. I don't think there's a perfect solution.
I have not tried this, but you could try the SRS tool!
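Again, I haven't used it myself, so take this with a grain of salt, but my rough understanding of the idea behind SRS (scaling with ranked subsampling) is: linearly scale each sample down to the target depth, keep the integer parts, and give the leftover reads to the taxa with the largest fractional parts. A toy sketch of that idea, not the published implementation:

```python
import numpy as np

def srs_like(counts, cmin, rng=np.random.default_rng(0)):
    """Toy sketch of the SRS idea: scale counts so they sum to cmin,
    keep the integer parts, then assign the remaining reads to the
    taxa with the largest fractional parts (ties broken randomly)."""
    counts = np.asarray(counts, dtype=float)
    scaled = counts * cmin / counts.sum()
    floors = np.floor(scaled).astype(int)
    leftover = cmin - floors.sum()          # reads still to distribute
    frac = scaled - floors
    # rank fractional parts descending; pre-shuffle to break ties randomly
    order = rng.permutation(len(frac))
    ranked = order[np.argsort(-frac[order], kind="stable")]
    floors[ranked[:leftover]] += 1
    return floors

print(srs_like([501, 299, 151, 49], 200))  # sums to exactly 200
```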
Thank you for the suggestion. There is indeed no perfect solution for this issue. However, I checked the alpha diversity, especially the Shannon index, and found that Shannon does not change after rarefaction (figure below). This suggests that the Shannon diversity has reached a plateau, so I can compare samples after rarefaction. Besides, I have no idea how to deal with beta diversity. Does rarefying have a large impact on it?
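Roughly, this is how I checked the plateau; the sample and the depths here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def shannon(counts):
    """Shannon index (natural log) from a vector of taxon counts."""
    p = counts[counts > 0] / counts.sum()
    return -(p * np.log(p)).sum()

sample = rng.integers(0, 400, size=200)  # one hypothetical sample (~40k reads)

# average Shannon over repeated rarefactions at increasing depths;
# if the values stop changing, the index has plateaued
for depth in [1000, 2000, 5000, 10000]:
    vals = [shannon(rng.multivariate_hypergeometric(sample, depth))
            for _ in range(10)]
    print(depth, round(float(np.mean(vals)), 3))
```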
Thank you for the reply. I have read these papers carefully, and here is a brief summary.
Regarding alpha diversity: if the diversity does not change with library size, samples can be compared after rarefaction.
Regarding beta diversity: it seems that proportion normalization consistently outperforms the other methods, so proportions are recommended (see the sketch after this summary for what I mean). On the other hand, if I'd like to investigate the influence of other factors on the microbial community, instead of comparing sample-wise distances, does proportion normalization still work well?
Regarding differential taxa: there are numerous methods for comparing them (DOI: 10.1038/s41467-022-28034-z), and no single method is optimal for all datasets.
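Coming back to the beta-diversity point above: by "proportion" I mean total-sum scaling followed by a distance such as Bray-Curtis, as in this toy sketch (made-up data; braycurtis is one of the metrics scipy's pdist supports):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(1)
table = rng.integers(0, 500, size=(6, 40))     # 6 toy samples x 40 taxa

# total-sum scaling: each sample becomes a vector of proportions,
# so uneven library sizes no longer dominate the distances
props = table / table.sum(axis=1, keepdims=True)

# Bray-Curtis distance matrix between the proportion profiles
dm = squareform(pdist(props, metric="braycurtis"))
print(np.round(dm, 2))
```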