Best practices for detecting and removing outliers in beta diversity (after rarefaction)

Hi everyone,
I’ve rarefied my fungal community dataset to even sequencing depth and I’m preparing for beta diversity analysis (PERMANOVA, PCoA). Before running it, I used betadisper in R to calculate each sample’s distance to its group centroid (by Treatment) and flagged potential outliers as those with a distance > mean + 2×SD. Some of these samples look visually isolated on PCoA plots or have unusual taxonomic profiles.
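
For anyone following along, here is a minimal sketch of the flagging rule described above, written in Python rather than R. It assumes the per-sample distances to the group centroid have already been computed and exported (for example, from the betadisper result), and that the mean and SD are taken within each Treatment group rather than across all samples. The function name `flag_outliers` and the sample IDs and values are hypothetical, for illustration only.

```python
from statistics import mean, stdev


def flag_outliers(distances, groups, k=2.0):
    """Return sample IDs whose distance to centroid exceeds
    mean + k * SD of their own group (assumption: per-group statistics)."""
    # Collect the distances belonging to each group.
    by_group = {}
    for sample, dist in distances.items():
        by_group.setdefault(groups[sample], []).append(dist)

    flagged = []
    for sample, dist in distances.items():
        values = by_group[groups[sample]]
        if len(values) < 2:
            continue  # sample SD is undefined for a single observation
        if dist > mean(values) + k * stdev(values):
            flagged.append(sample)
    return flagged


# Hypothetical distances to centroid for two treatment groups.
example_distances = {
    "A1": 0.30, "A2": 0.31, "A3": 0.29, "A4": 0.32,
    "A5": 0.30, "A6": 0.28, "A7": 0.31, "A8": 0.60,
    "B1": 0.40, "B2": 0.42, "B3": 0.41, "B4": 0.39,
    "B5": 0.43, "B6": 0.40, "B7": 0.41, "B8": 0.42,
}
example_groups = {s: s[0] for s in example_distances}

print(flag_outliers(example_distances, example_groups))
```

One caveat with this rule: with the sample SD, no value in a group of n samples can lie more than (n-1)/sqrt(n) SDs above the group mean, so a group of five or fewer samples can never produce a flag at 2×SD.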

Is this a reasonable approach for detecting outliers in beta diversity analysis? Is it recommended to remove such outliers even from the rarefied dataset? Are there best-practice guidelines in QIIME2 for handling outliers reproducibly?

Also, I would appreciate it if someone could clarify this part of the comment.
Continuing the discussion from Outliers in beta diversity analyses:

Thanks in advance for your advice!

-salma

Hello Salma,

I’m not a statistician, but usually I keep all the data and use statistical tests that are less sensitive to outliers.

I’m interested in how others approach this!

Hi @Salma_Sarker,

I am also not a statistician, but I tend to agree with @colinbrislawn: I leave outliers be. My rationale is that unless I can prove the data is wrong in some way (e.g., I mislabeled a sample), it's a real signal, and I am not sure how I would justify its removal to reviewers.

I would also say that if your outliers come from rarefying once to a single sampling depth, you might try q2-boots and see whether that resolves them. Instead of subsampling once, q2-boots subsamples to an even sampling depth x times and then averages the results. This gives more robust diversity metrics that are less susceptible to outliers caused by a single random subsampling.
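
To make the idea concrete, here is a rough Python sketch of repeated subsampling followed by averaging, as described above. This is not the q2-boots implementation or its API. It only illustrates the principle, assuming subsampling without replacement, Bray-Curtis as the metric, and the mean as the average. The function names and the counts are hypothetical.

```python
import random


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from a {taxon: count} dict."""
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("sampling depth exceeds the sample's total reads")
    subsample = {}
    for taxon in rng.sample(pool, depth):
        subsample[taxon] = subsample.get(taxon, 0) + 1
    return subsample


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {taxon: count} dicts."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.values()) + sum(b.values())
    return numerator / denominator if denominator else 0.0


def averaged_bray_curtis(x, y, depth, n_iter=100, seed=0):
    """Mean Bray-Curtis over `n_iter` independent rarefactions of both samples."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    total = 0.0
    for _ in range(n_iter):
        total += bray_curtis(rarefy(x, depth, rng), rarefy(y, depth, rng))
    return total / n_iter


# Hypothetical read counts for two samples with different sequencing depths.
sample_1 = {"Aspergillus": 120, "Penicillium": 60, "Fusarium": 20}
sample_2 = {"Aspergillus": 40, "Penicillium": 90, "Mortierella": 30}

print(averaged_bray_curtis(sample_1, sample_2, depth=100, n_iter=50))
```

A single call to `rarefy` can by chance drop or inflate a rare taxon, while the average over many draws smooths that out, which is why a sample that looks like an outlier after one rarefaction may not look like one after averaging.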

I hope this helps!
