"z-score" for sparse data (taxon distribution)

Hey, is there any kind of "z-score" for sparse data like our count tables? I'd like to measure how far a sample is from a group of samples.

For example:
I have my ASV table, and let's say I take a taxon at the genus level:
Actinobacteria;Actinobacteria;Bifidobacteriales;Bifidobacteriaceae;Bifidobacterium.
In my data, many samples have a relative frequency of 0 for this taxon, but some of them have values like 0.002, 0.004, etc.


Now I'd like to measure how many "standard deviations" (or something like them) my sample is from the rest. I guess using a normal distribution is pointless...

Is there a good way to make such a comparison?


Hi, Damian!

This is a broad topic because, besides normalization, it touches on outlier detection, and both are non-trivial for microbiome data. There is more than one way to do this, depending on what you're looking for.
I'd start by reading publications on the normalization of microbiome data:
Methods for normalizing microbiome data: An ecological perspective
Normalization and microbial differential abundance strategies depend upon data characteristics
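
To make the idea concrete, here is a minimal sketch of one possible approach, not something taken from those papers: apply a centered log-ratio (CLR) transform to the counts, then compute a "robust z-score" from the median and the median absolute deviation (MAD) instead of the mean and standard deviation. The function names `clr` and `robust_z`, the pseudocount of 0.5, and the toy count table are all my own assumptions for illustration; whether CLR with a pseudocount suits your data is exactly what the normalization literature debates.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a samples x taxa count matrix.

    The pseudocount (an assumption, 0.5 here) replaces zeros so logs are defined.
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    log_x = np.log(x)
    # Subtract each sample's mean log value (log of its geometric mean).
    return log_x - log_x.mean(axis=1, keepdims=True)

def robust_z(value, reference):
    """Distance of `value` from the reference median, in scaled-MAD units."""
    reference = np.asarray(reference, dtype=float)
    med = np.median(reference)
    mad = np.median(np.abs(reference - med))
    if mad == 0:
        # At least half of the reference values equal the median: undefined.
        return np.nan
    # 1.4826 makes the MAD comparable to a standard deviation under normality.
    return (value - med) / (1.4826 * mad)

# Toy counts (made up for illustration): rows = samples, columns = taxa.
counts = np.array([
    [0, 120, 30, 50],
    [2, 100, 40, 58],
    [0,  90, 60, 50],
    [4, 110, 35, 51],
    [0, 130, 20, 50],
    [40, 80, 30, 50],  # the sample we want to compare to the others
])

transformed = clr(counts)
taxon = 0                             # column index of the taxon of interest
reference = transformed[:-1, taxon]   # all samples except the last one
z = robust_z(transformed[-1, taxon], reference)
print(f"robust z-score for the last sample: {z:.2f}")
```

Note that on raw relative frequencies the MAD is 0 whenever more than half of the samples are 0, which is why the sketch works on transformed values and returns NaN in that case instead of dividing by zero.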

Cheers
