Batch Correction - Dealing With Negative Values for Diversity Analysis

Nicholas_Bokulich · May 23, 2018, 3:58pm

Hi @Zachary_Bendiks,
Sounds like you are in a pickle!

First off — I applaud your due diligence in appropriately structuring your runs and testing for a batch effect proactively.

No. If qiime1 could handle negative values, it sounds like a bug. What does an abundance of -5 really even mean!? (note that ComBat is designed for microarray data as far as I know, not for microbiome data where negative abundance is troubling) Most downstream methods in QIIME2 are assuming that the data in a FeatureTable[Frequency] artifact are counts, not some type of normalized data, and so negative values will break some of those methods' assumptions.

I do not know enough about ComBat and whether it performs some type of count normalization in addition to batch normalization — is the output still considered "count" or is it something like mean-centered value? So the difference between -5 and -10 could actually be meaningful; converting all negative values to 0 might not be kosher, nor could adding N to all values where N = the min value.

@cduvallet might have some advice on working with ComBat data.

Depending on your experimental setup, there might be.

The brand new q2-perc-norm plugin is designed for normalization of case-control studies (and was compared to ComBat in the original paper).

Your batch effects may also be from contaminants that are unique to each run. If you have negative controls, you could try decontam (not yet a QIIME2 plugin but we hope to add soonish). See this post for some more details:

Let us know what you think! And let's see if @cduvallet and @benjjneb have any other advice.