diversity metrics and batch effects

cduvallet · December 5, 2019, 4:04am

The output of percentile-normalization is definitely not "counts", and should not be used in metrics that require counts. For example, Chao1 uses the number of singletons and doubletons (features observed exactly once or exactly twice in a sample) to calculate alpha diversity, so it would be inappropriate for use with percentile-normalized data.
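To make the Chao1 point concrete, here is a minimal sketch of the bias-corrected Chao1 formula written from scratch in plain Python. The function name `chao1` and both example vectors are made up for illustration, and the "percentile-like" values are not real output from the plugin.

```python
def chao1(values):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of features seen exactly once and exactly
    twice, so the formula only makes sense for integer counts.
    """
    observed = sum(1 for v in values if v > 0)
    singletons = sum(1 for v in values if v == 1)
    doubletons = sum(1 for v in values if v == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical raw counts for one sample: three singletons, one doubleton.
counts = [10, 1, 1, 2, 0, 5, 1]

# Hypothetical percentile-like values for the same seven features.
# Nothing equals exactly 1 or 2, so F1 = F2 = 0.
percentile_like = [87.5, 12.3, 40.1, 55.0, 3.2, 99.0, 20.4]

print(chao1(counts))           # 7.5
print(chao1(percentile_like))  # 7.0
```

With the percentile-like values the correction term vanishes and the "estimate" is just the number of non-zero entries, which is why the result is not a meaningful Chao1 value.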

Unfortunately, we haven't yet identified which of the many available metrics are appropriate to use with percentile-normalized data, so you'll have to look at how each one is calculated and check whether it makes assumptions about the data that might not hold here.

I think one of the most important things to note with the percentile-normalized output is that we add random noise to the zeros (to prevent pile-up of ranks, see more here), so any metric that uses zero as a meaningful value (for example, observed richness or presence/absence-based distances) will not work for percentile-normalized data.
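As a rough illustration of why the added noise matters, the sketch below replaces zeros with tiny random values and then counts "observed" features. The helper names, the noise range (`upper=1e-9`), and the example abundances are assumptions chosen for the demo, not the plugin's actual code or settings.

```python
import random


def add_noise_to_zeros(values, rng, upper=1e-9):
    """Replace each zero with a small random draw (illustrative range only)."""
    return [v if v != 0 else rng.uniform(0, upper) for v in values]


def observed_features(values):
    """Count features with non-zero abundance (a presence/absence metric)."""
    return sum(1 for v in values if v > 0)


rng = random.Random(0)  # fixed seed so the demo is reproducible

# Hypothetical relative abundances: three features present, three absent.
abundances = [0.5, 0.0, 0.3, 0.0, 0.2, 0.0]
noisy = add_noise_to_zeros(abundances, rng)

print(observed_features(abundances))  # 3
print(observed_features(noisy))       # 6
```

Once the zeros are gone, every feature looks "present" in every sample, so anything built on presence/absence stops carrying information.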

If you have questions about any specific metrics that you've thought about but can't figure out, feel free to post again and we'll see if we can figure it out together (make sure to tag me and/or @seangibbons so we see the post).

Another option for dealing with batch effects (beyond the ones suggested in the post linked by @Nicholas_Bokulich) is to do your analyses on a per-batch basis and then compare the results across batches, e.g. calculating beta-diversity only on samples within each batch and excluding all cross-batch calculations.
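A minimal sketch of the per-batch idea, assuming Bray-Curtis as the beta-diversity metric and a simple dict-based layout for the data. The function names, sample IDs, batch labels, and counts are all hypothetical; in practice you would probably filter your feature table by batch and run your usual diversity tooling on each subset.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


def within_batch_distances(samples, batches):
    """Compute pairwise distances only between samples in the same batch."""
    ids_by_batch = {}
    for sample_id, batch in batches.items():
        ids_by_batch.setdefault(batch, []).append(sample_id)

    distances = {}
    for batch, ids in ids_by_batch.items():
        distances[batch] = {
            (a, b): bray_curtis(samples[a], samples[b])
            for a, b in combinations(sorted(ids), 2)
        }
    return distances


# Hypothetical counts (three features) and batch labels for four samples.
samples = {
    "s1": [10, 0, 5],
    "s2": [5, 5, 5],
    "s3": [0, 10, 0],
    "s4": [2, 8, 0],
}
batches = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}

dists = within_batch_distances(samples, batches)
for batch, pairs in dists.items():
    for pair, d in pairs.items():
        print(batch, pair, round(d, 3))
```

No cross-batch pair (such as s1 vs. s3) is ever computed, so the batch effect cannot enter the distances. The results from batch A and batch B can then be compared with each other afterwards.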