Usefulness of Presence-absence analysis?

I’m curious as to whether anyone has strong opinions on doing presence/absence analyses, and if so, how to do them well.

I have a dataset in which I separated samples into several different sterile tubes, then stored them at different temperatures and storage durations. A committee member is asking me to conduct presence/absence analysis to identify whether specific taxa are blooming or dying during storage. However, how does one guarantee that blooming/dying isn’t an artifact of sequencing depth? Also, what would be a significant finding? Would it relate to a specific detection limit?

Any thoughts would be greatly appreciated!

Silly question: did you sequence the samples immediately before applying the different temp/storage conditions, so you know how much of each taxon was there to begin with? I’d think that the likelihood of identifying a blooming/dying taxon isn’t just related to sequencing depth, but also to initial cell counts and/or the proportions of other community members.

Yes, we have “baseline” data, if you will (before storage), as well as data for each temperature/storage duration combo.

Cool. What do you see in terms of the variability in relative abundances for those baselines for the taxa you’re interested in? Are they all present in your samples in any abundance, at least?
Other question: did you take multiple timepoints between start/end, or is this a “went into storage at t=0, and stopped experiment” thing?

Hello Sarah,

I think you are asking all the right questions. I’m also hesitant to use presence/absence data without answering all of these. As you hinted, presence/absence depends on validating a limit of detection and is also influenced by the normalization method.
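To make the depth dependence concrete, here is a rough back-of-the-envelope sketch, not a validated detection limit. It assumes reads are drawn independently and in proportion to true relative abundance (no PCR bias, contamination, or index hopping), which real data will violate to some degree. The function names are made up for illustration.

```python
def detection_probability(rel_abundance: float, depth: int) -> float:
    """Chance of seeing at least one read from a taxon with the given true
    relative abundance when `depth` reads are drawn independently
    (simplifying assumption: pure random sampling)."""
    return 1.0 - (1.0 - rel_abundance) ** depth


def min_detectable_abundance(depth: int, confidence: float = 0.95) -> float:
    """Smallest true relative abundance that would be detected with
    probability >= `confidence` at this depth, under the same assumption."""
    return 1.0 - (1.0 - confidence) ** (1.0 / depth)


# How likely is a taxon at 0.01% to show up at a few rarefaction depths?
for depth in (1_000, 10_000, 50_000):
    p = detection_probability(1e-4, depth)
    limit = min_detectable_abundance(depth)
    print(f"depth={depth:>6}: P(detect 0.01% taxon)={p:.3f}, 95% limit={limit:.2e}")
```

Under these assumptions, a taxon sitting near the limit for your rarefaction depth can flip between "present" and "absent" from sampling noise alone, so an apparent bloom or die-off of a rare ASV needs to clear that bar before it means much.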

identify whether specific taxa are blooming or dying during storage

Their use of -ing verbs makes me think they want to see what’s changing, not what’s here or gone. I think the ancom plugin would be a good way to answer this.

A committee member

… is not a statistician, are they. :roll_eyes:
I would diplomatically counteroffer them the ANCOM stat test, as I think that’s a good fit for their biological question.


Thanks for the reply, Colin. The general issue my committee member raises is that differential abundance testing (including ANCOM) does not take into account any taxa with a relative abundance of zero, so taxa that bloom or die off would not be captured by this method.

That said, ANCOM2 (and DESeq2 and LEfSe) all say the same thing: no taxa are differentially abundant before and after storage.

When I look at presence/absence using a rarefied feature table, I can detect only very small changes, and in only a few ASVs. I’m convinced these changes are too small to withstand any statistical testing. I’m now at the point of proving this to my committee member.
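One way this could be put to the test, sketched here as an assumption rather than anything already run in this thread, is an exact McNemar-style test per ASV on paired presence/absence calls (the same original sample before and after storage). The function name and the toy data are hypothetical.

```python
from math import comb


def mcnemar_exact(before, after):
    """Exact two-sided McNemar test for paired presence/absence calls.

    before, after: sequences of booleans, one entry per sample pair
    (same original sample, baseline vs. stored).
    Returns (lost, gained, p_value).
    """
    lost = sum(b and not a for b, a in zip(before, after))
    gained = sum(a and not b for b, a in zip(before, after))
    n = lost + gained  # only discordant pairs carry information
    if n == 0:
        return lost, gained, 1.0
    k = min(lost, gained)
    # Exact binomial tail under H0: a loss and a gain are equally likely.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return lost, gained, min(1.0, 2 * tail)


# Hypothetical ASV: present in all 10 baseline tubes, missing from 3 after storage.
before = [True] * 10
after = [True] * 7 + [False] * 3
print(mcnemar_exact(before, after))  # (3, 0, 0.25)
```

With only three discordant pairs the smallest p-value this test can return is 0.25, and that is before any multiple-testing correction across ASVs. A change in a handful of tubes cannot reach significance no matter how it is sliced.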

In terms of detection limits, is there a recommended way of doing it?

Hello Sarah,

Ah OK! Well if you started with differential abundance testing and didn’t find much changing, finishing up with presence/absence seems very reasonable to me.

Oh you are way farther along this process than I thought! I was going to suggest building rarefied tables and lists of unique ASVs, and you already have all of that. :+1:

I think you are ready to present these results. The argument could be a mix of stats and common sense: very sparse data is hard to test, and who’s really worried about this handful of dust ASVs anyway?

Thanks for launching this discussion. I think you have answered this question well and hopefully your committee agrees!



That’s a stats question! :balance_scale: :face_with_monocle:


Wow, I’m over my head here. Let’s see if we can get a card-carrying statistician to ‘qiime-in’ on this! Any recommendation? @mortonjt @Amy_Willis

As far as I can tell, neither Catherine Lozupone nor Susan Holmes has a recommended method for LOD…

Zeros are tricky, because you don’t know the process that generates them.

Before digging into differential abundance: do you see a large difference in beta diversity?
If I were doing this, I’d start with DEICODE to see if there are differences due to the observed abundances (this also corrects for sequencing depth).
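For intuition on the depth correction, here is a minimal NumPy sketch of the robust centered log-ratio (rclr) idea as I understand it. This is not DEICODE's own code: the `rclr` helper below is written for illustration, and it leaves out the matrix completion and PCA steps that follow the transform.

```python
import numpy as np


def rclr(counts):
    """Robust centered log-ratio (illustrative helper, not DEICODE's API).

    Log-transforms only the observed (non-zero) counts and centers each
    sample by the mean of its observed log counts. Zeros come back as NaN,
    i.e. they are treated as missing rather than as true absences.
    """
    counts = np.asarray(counts, dtype=float)
    logged = np.full(counts.shape, np.nan)
    observed = counts > 0
    logged[observed] = np.log(counts[observed])
    # nanmean skips the missing entries, so centering uses observed taxa only
    return logged - np.nanmean(logged, axis=1, keepdims=True)


# Toy table (rows = samples, columns = ASVs); the second row is the first
# sample sequenced ten times deeper.
table = np.array([[10, 0, 5, 85],
                  [100, 0, 50, 850]])
print(rclr(table))
```

The two rows come out identical after the transform, which is the sense in which depth drops out, and the zero stays "unknown" instead of being forced to mean "absent".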

If you don’t see differences in beta diversity - then diff abundance probably won’t help.