Hi everyone,
I’ve rarefied my fungal community dataset to even sequencing depth and I’m preparing for beta diversity analysis (PERMANOVA, PCoA). Before running it, I used betadisper in R to calculate each sample’s distance to its group centroid (by Treatment) and flagged potential outliers as those with a distance > mean + 2×SD. Some of these samples look visually isolated on PCoA plots or have unusual taxonomic profiles.
Is this a reasonable approach for detecting outliers in beta diversity analysis? Is it recommended to remove such outliers even from the rarefied dataset? Are there best-practice guidelines in QIIME2 for handling outliers reproducibly?
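For concreteness, the flagging rule described above can be sketched in plain Python. This is only an illustration, not vegan's betadisper: it assumes you already have ordination coordinates (for example, PCoA axes) for each sample, uses the arithmetic-mean centroid, and applies the mean + 2×SD cutoff within each group using the sample SD. The function name `flag_centroid_outliers` and the `n_sd` parameter are hypothetical.

```python
from collections import defaultdict
from math import dist
from statistics import mean, stdev


def flag_centroid_outliers(coords, groups, n_sd=2.0):
    """Return (distances, flags), one entry per sample.

    coords: per-sample coordinate tuples (e.g. PCoA axes), assumed input.
    groups: group labels (e.g. Treatment), in the same order as coords.
    A sample is flagged when its distance to its own group's centroid
    exceeds mean + n_sd * SD of the distances within that group.
    """
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)

    distances = [0.0] * len(coords)
    flags = [False] * len(coords)
    for idx in members.values():
        # Arithmetic-mean centroid of this group's coordinates.
        centroid = [mean(axis) for axis in zip(*(coords[i] for i in idx))]
        for i in idx:
            distances[i] = dist(coords[i], centroid)
        if len(idx) < 2:
            continue  # SD is undefined for a single sample
        d = [distances[i] for i in idx]
        cutoff = mean(d) + n_sd * stdev(d)
        for i in idx:
            flags[i] = distances[i] > cutoff
    return distances, flags
```

Calling `flag_centroid_outliers(coords, groups)` returns the per-sample distances alongside the flags, so the flagged samples can be listed in a table and inspected rather than dropped automatically.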
I am also not a statistician, but I tend to agree with @colinbrislawn: I leave outliers in. My rationale is that unless I can show the data are wrong in some way (e.g., I mislabeled a sample), it's a real signal, and I am not sure how I would justify removing it to reviewers.
I would also say that if your outliers come from rarefying once to a single sampling depth, it may be worth trying q2-boots to see whether they persist. Instead of subsampling once, q2-boots rarefies to an even sampling depth multiple times and averages the results, which gives more robust diversity metrics that are less susceptible to outliers introduced by a single random subsample.
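To illustrate the idea (this is a conceptual sketch, not the q2-boots implementation), the snippet below rarefies two samples repeatedly and averages the resulting Bray-Curtis dissimilarity. The helper names (`rarefy`, `bray_curtis`, `averaged_bray_curtis`), the use of a plain mean, subsampling without replacement, and the choice of Bray-Curtis are all assumptions made for the example.

```python
import random
from statistics import mean


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from {taxon: count}."""
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if not 1 <= depth <= len(pool):
        raise ValueError("depth must be between 1 and the sample's read count")
    out = {}
    for taxon in rng.sample(pool, depth):
        out[taxon] = out.get(taxon, 0) + 1
    return out


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {taxon: count} samples."""
    taxa = set(a) | set(b)
    shared_diff = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    total = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return shared_diff / total


def averaged_bray_curtis(a, b, depth, n_iter=100, seed=0):
    """Mean Bray-Curtis over n_iter independent rarefactions of both samples."""
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    return mean(
        bray_curtis(rarefy(a, depth, rng), rarefy(b, depth, rng))
        for _ in range(n_iter)
    )
```

A single call to `rarefy` gives one random draw, so a distance built from it can shift between runs; averaging over many draws damps that noise. A sample that still sits far from its group after averaging is less likely to be an artifact of one unlucky subsample.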