"z-score" for sparse data (taxon distribution)

thedam · September 1, 2021, 2:34pm

Hey, is there an equivalent of a "z-score" for sparse data like our count tables? I'd like to measure how far a single sample is from a group of samples.

For example:
I have my ASV table, and let's say I take a taxon at the genus level:
Actinobacteria;Actinobacteria;Bifidobacteriales;Bifidobacteriaceae;Bifidobacterium.
In my data, many samples have a relative frequency of 0 for this taxon, but some have values like 0.002, 0.004, etc.

Now I'd like to measure how many "standard deviations" my sample is from the rest. I guess assuming a normal distribution is pointless...

Is there any good way to make such a comparison?

crusher083 · September 1, 2021, 2:45pm

Hi, Damian!

It's a broad topic because, besides normalization, it also touches on outlier detection, and both are non-trivial for microbiome data. There is more than one way to do this, depending on what you're looking for.
I'd start by reading publications on the normalization of microbiome data:
Methods for normalizing microbiome data: An ecological perspective
Normalization and microbial differential abundance strategies depend upon data characteristics
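
To make the "normalize, then look for outliers" idea concrete, here is a minimal sketch of one possible approach. It is not taken from the papers above. It assumes a centered log-ratio (CLR) transform with a pseudocount of 0.5 (an arbitrary choice) followed by a robust z-score based on the median and the median absolute deviation (MAD). The function names `clr` and `robust_z` and the toy table are hypothetical, for illustration only.

```python
import numpy as np


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a samples x taxa count matrix.

    The pseudocount (an assumed value) is added to every cell to avoid log(0).
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    log_x = np.log(x)
    # Subtract each sample's mean log value (the log of its geometric mean).
    return log_x - log_x.mean(axis=1, keepdims=True)


def robust_z(value, reference):
    """How far `value` is from the reference median, in MAD-based SD units."""
    reference = np.asarray(reference, dtype=float)
    med = np.median(reference)
    mad = np.median(np.abs(reference - med))
    if mad == 0:
        # At least half of the reference values equal the median,
        # so the scale is undefined.
        return np.nan
    # 1.4826 scales the MAD to match the SD for normally distributed data.
    return (value - med) / (1.4826 * mad)


# Hypothetical toy table: rows are samples, columns are taxa.
# The last column stands in for the sparse genus of interest.
table = np.array([
    [120, 30, 50, 0],
    [90, 45, 60, 0],
    [150, 20, 40, 1],
    [110, 35, 55, 0],
    [100, 40, 45, 2],
    [95, 25, 70, 40],  # the sample to compare against the rest
])

transformed = clr(table)
score = robust_z(transformed[-1, 3], transformed[:-1, 3])
print(f"robust z-score of the last sample: {score:.2f}")
```

On raw relative frequencies the MAD is often exactly 0 when most samples are 0, which is why the sketch transforms the data first. The result still depends heavily on the pseudocount and on the number of reference samples, so treat it as a rough ranking rather than a calibrated statistic.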

Cheers