Normalizing Presence/Absence Data

kirti.p · May 30, 2024, 10:46pm

Hello! I'm analyzing presence/absence microbiome data (my OTU table is a series of ones and zeros). I understand rarefaction isn't appropriate for presence/absence data; what are the standard methods for standardizing/normalizing this type of data? Or do I not need to worry about standardizing/normalizing at all? I've seen a few papers that discuss it, but there's nothing I can find that the community seems to agree on.

Thank you so much for your help!

colinbrislawn · June 2, 2024, 6:59pm

Good afternoon,

I have not seen much literature about normalization methods for presence/absence data, though that's probably because I have not seen people use presence/absence data much either!

I presume this is because the data we normally get is relative abundance / compositional. You can use a threshold to turn this into 1s and 0s, but that's throwing out data for no obvious gain.
(Why do you want to use binary data?)

Some alpha and beta diversity metrics use binary data internally, converting abundances to presence/absence before any calculation; Unweighted UniFrac is the most common example. Here, we ignore relative abundance to highlight trends in rare microbes.
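As a rough illustration of what "turning abundances into 1s and 0s" means, here is a minimal sketch in plain Python. The function name `to_presence_absence` and the `threshold` parameter are made up for this example and are not taken from any particular package; the abundance values are toy numbers.

```python
def to_presence_absence(abundances, threshold=0.0):
    """Return 1 where the abundance exceeds the threshold, else 0.

    Hypothetical helper for illustration only; `threshold` is an assumed
    parameter, not a setting from any real tool.
    """
    return [1 if value > threshold else 0 for value in abundances]


# Toy relative abundances for one sample (made-up values).
sample = [0.60, 0.0, 0.39, 0.01]

# Any nonzero abundance counts as "present".
binary = to_presence_absence(sample)

# A stricter cutoff also drops the rare feature at 0.01.
binary_filtered = to_presence_absence(sample, threshold=0.05)

print(binary)
print(binary_filtered)
```

Note how the choice of cutoff changes which rare features survive, which is part of why thresholding abundance data discards information.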

I'm afraid this may not be very helpful. Can you tell us more about your data?

kirti.p · June 4, 2024, 4:15pm

Thank you for your response! Our data was run on a microarray (the Axiom Microbiome Array; see "Axiom Microbiome Array, the next generation microarray for high-throughput pathogen and microbiome analysis" on PMC), so the original data we received is binary.

For my alpha and beta diversity metrics, I'm using observed OTUs and Jaccard, respectively.
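For anyone following along, here is a minimal sketch of how these two metrics work on a 0/1 table, written in plain Python rather than with any specific microbiome package. The sample names, the toy table, and the helper names `observed_otus` and `jaccard_distance` are all made up for illustration, and returning 0.0 for two empty samples is an assumed convention.

```python
def observed_otus(sample):
    """Alpha diversity: count the features marked present (nonzero)."""
    return sum(1 for value in sample if value)


def jaccard_distance(a, b):
    """Beta diversity: 1 - (shared features / features present in either)."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    if union == 0:
        # Assumed convention: two empty samples are treated as identical.
        return 0.0
    return 1.0 - shared / union


# Toy presence/absence table (made-up data): one row per sample,
# one column per OTU.
table = {
    "sample_A": [1, 1, 0, 1, 0],
    "sample_B": [1, 0, 0, 1, 1],
    "sample_C": [0, 0, 1, 0, 0],
}

for name, row in table.items():
    print(name, "observed OTUs:", observed_otus(row))

print("A vs B:", jaccard_distance(table["sample_A"], table["sample_B"]))
print("A vs C:", jaccard_distance(table["sample_A"], table["sample_C"]))
```

In this toy table, samples A and B share 2 of the 4 OTUs found in either, giving a distance of 0.5, while A and C share none, giving 1.0. Both metrics read only the 1s and 0s, so no abundance information enters the calculation.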

colinbrislawn · June 5, 2024, 1:27pm

Cool! Unweighted UniFrac may also work for your microarray data, and its addition of phylogenetic context is super powerful if you can get it working.