Dividing relative abundances in ASV table

Is there a simple way to divide relative abundances in QIIME2 (i.e. to determine fold-change in relative abundance between two timepoints)?

Our study consists of pre- and post-intervention stool samples for which we have 16S v4 sequencing data, and I am trying to compare the “pre” relative abundances to the “post” relative abundances for each subject's pair of samples. One approach I tried was to export the two ASV tables, convert the raw counts to relative abundances, and divide one table by the other. The problem I am running into is how to account for zeros: there are many instances in which the raw ASV count is zero in the “pre” sample but not in the corresponding “post” sample, so the post/pre fold change is undefined. Should I add random noise, or a constant value like 1, to the raw ASV counts? If so, is there a way to do this within QIIME2, or is this something I need to do outside of QIIME2?
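For reference, here is roughly what I was doing outside of QIIME2 (just a minimal sketch; the file names, table layout, and pandas approach are placeholders for however the exported tables actually look, with ASVs as rows and one column per subject, in the same subject order in both files):

```python
import pandas as pd

# Minimal sketch, not a QIIME2 command. Hypothetical inputs: exported ASV
# count tables "pre_counts.tsv" and "post_counts.tsv", ASVs as rows and one
# column per subject, with columns in the same subject order in both files.
pre = pd.read_csv("pre_counts.tsv", sep="\t", index_col=0)
post = pd.read_csv("post_counts.tsv", sep="\t", index_col=0)

pseudocount = 1  # the constant in question

# Add the pseudocount to the raw counts, then convert each sample (column)
# to relative abundances
pre_rel = (pre + pseudocount).div((pre + pseudocount).sum(axis=0), axis=1)
post_rel = (post + pseudocount).div((post + pseudocount).sum(axis=0), axis=1)

# Per-subject, per-ASV fold change (post / pre); .values skips column-name
# alignment, since the pre/post sample IDs differ but are paired by position
fold_change = pd.DataFrame(
    post_rel.values / pre_rel.values,
    index=pre.index,
    columns=pre.columns,  # label columns by the "pre" sample IDs (subjects)
)
```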

(The reason I'm not using a method like ANCOM here, which I’ve used in the past, is that I’m not interested in identifying which microbes are significantly differentially abundant between the two timepoints; rather, I want the fold-change values of all ASVs to use as input to other statistical models and to visualizations like NMDS. I hope this makes sense! Alternatively, if there are other methods that might work better here, please let me know! I am new to these types of analyses, so I am not sure I am going about this the right way.)

Thank you!

Good afternoon Dot, :wave:

I think this would be a great question to bring to a statistician. A colleague of yours with a background in stats could help clarify your question and the limitations of the data set you have.

I'm not a statistician, but hopefully I can provide some clarity.

So instead of getting a table of significant (alpha = 0.05) fold-changes, you want a table of all fold-changes (alpha = 1.0). This is still a differential abundance calculation, even if you are not using an alpha threshold to establish significance.

Sparsity is a problem whenever you compare two groups like this. Your suggestion of adding a 'pseudocount' of 1 is not uncommon, but it has its own problems.
See this paper

and this sassy follow-up showing that using 1 or 0.01 or 0.00001 as your pseudocount makes a big difference (quick toy illustration below the link)!
https://www.nature.com/articles/nmeth.2897.pdf?origin=ppub
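To make that concrete, here's a toy calculation (made-up numbers, not from the paper) showing how much the apparent fold change depends on the pseudocount when the "pre" count is zero:

```python
# Toy example (made-up numbers): raw "pre" count 0, raw "post" count 10.
# The apparent fold change is driven almost entirely by the pseudocount.
for pseudocount in (1, 0.01, 0.00001):
    fold_change = (10 + pseudocount) / (0 + pseudocount)
    print(pseudocount, fold_change)
# -> 11.0, then ~1001, then ~1000001
```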

Yes, see "Microbiome Datasets Are Compositional: And This Is Not Optional" and the ALDEx2 package, which may be helpful for your study as it can handle sparse data.


Hi Colin,

Thanks so much for your reply, including the links and resources! This is very helpful.

I checked out ALDEx2, and I'm not sure it will work for this particular issue: it seems to give the differential abundance of features overall between two groups (pre-intervention samples vs. post-intervention samples), but I can't see how to extract how each ASV changes between each participant's own pre and post samples (though perhaps I am missing something). So for now I think I will have to deal with pseudocounts, since I want the fold-change value of each individual ASV for each subject (e.g. for subject 1, the fold changes between their samples at timepoints 1 and 2 were: ASV1 = 1.5, ASV2 = 0.8, ASV3 = 3, etc.). If this even makes sense to do, one of the things I'm trying to use this for is a "fold change ASV table" on which I would calculate Bray-Curtis distances and then perform an NMDS ordination, so that each point on the plot represents the fold change between two samples from the same participant (rough sketch below).
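In case it helps to make this concrete, here is roughly what I'm picturing (made-up fold-change values; scipy and scikit-learn are just stand-ins for whatever Bray-Curtis/NMDS implementation I actually end up using):

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

# Hypothetical "fold change ASV table": subjects as rows, ASVs as columns,
# each value = post/pre relative abundance for that ASV in that subject
fc = pd.DataFrame(
    {"ASV1": [1.5, 0.7, 2.0], "ASV2": [0.8, 1.2, 0.5], "ASV3": [3.0, 1.0, 0.9]},
    index=["subject1", "subject2", "subject3"],
)

# Bray-Curtis distances between subjects' fold-change profiles
dist = squareform(pdist(fc.values, metric="braycurtis"))

# Non-metric MDS (NMDS); each row of coords is one subject's ordination point
nmds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
           random_state=0)
coords = nmds.fit_transform(dist)
```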

But from what you have linked, I gather there is no solid consensus on what a "good" pseudocount value is, and the best choice may differ from dataset to dataset. I will discuss this further with a statistician, but I'm wondering whether it would be sufficient to use whatever seems to work best for our dataset and then justify it accordingly?

Thank you for your help!

Correct. In general, calculating fold-change is a controversial topic right now.

OK great!

I'm not sure what you mean by this? How would you tell that it's working well for your data? Do you have controls you could use for validation?

Really good to know!

Hmm, I guess I was just thinking about "working well" in terms of not artificially inflating fold changes by using a pseudocount that is too small, and not adding a pseudocount that is too large relative to the other "real" ASV counts I have. But I don't have a solid metric for this. I'm sorry for being unclear; I am still working through how to think about this :face_with_diagonal_mouth:

Although I don't know if these will help with this specifically, we have sequenced PCR blanks and DNA extraction blanks (no-template controls).

Well said. That's a really good description of why this problem is so hard.

I don't have a solid metric either, which is why I asked. This question is surprisingly tricky!

Let's see if a statistician has more perspective and some other paths forward.

Thank you for all your time and help, I really appreciate it!
