Sorry, I was following the post and a question popped into my mind.
I was wondering if you could let me know whether I can use percentile-normalized data for metrics such as weighted and unweighted UniFrac. The only normalization method available in QIIME 2 is rarefaction, and I'm not sure whether it's appropriate for such metrics.
Hi @ptalebic! Unfortunately, I don’t think percentile-normalized data will work for either weighted or unweighted UniFrac.
Unweighted UniFrac uses the presence/absence of OTUs (or ASVs, or whatever your features are) to calculate beta diversity. Because percentile normalization can’t handle zeros very well, the algorithm adds non-zero noise to any OTUs with a zero count (see my blog post for more on that). That means essentially all of your values will be non-zero after percentile normalization, so the unweighted UniFrac calculation won’t be meaningful.
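To make the fuzzy-zero issue concrete, here’s a minimal sketch of percentile normalization for a single OTU (this is illustrative, not the published implementation; the function names and the noise scale are my own):

```python
import numpy as np
from scipy.stats import percentileofscore

rng = np.random.default_rng(0)

def add_zero_noise(counts, scale=1e-9):
    """Replace exact zeros with tiny positive noise, as percentile
    normalization does so that zero-count samples can be ranked."""
    counts = np.asarray(counts, dtype=float).copy()
    zeros = counts == 0
    counts[zeros] = rng.uniform(0, scale, zeros.sum())
    return counts

def percentile_normalize(case_counts, control_counts):
    """Score each case sample's (noised) value for one OTU against
    that OTU's (noised) distribution across the control samples."""
    case = add_zero_noise(case_counts)
    ctrl = add_zero_noise(control_counts)
    return np.array([percentileofscore(ctrl, x) for x in case])

controls = [0, 0, 3, 10, 25]   # one OTU's counts across controls
cases = [0, 0, 7]              # the same OTU across case samples

print(add_zero_noise(cases))   # the zero counts are now tiny positive values
print(percentile_normalize(cases, controls))
```

After the noise step there are no true zeros left, so a presence/absence metric like unweighted UniFrac sees every OTU as "present" in every sample.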
Weighted UniFrac also won’t be meaningful, since the values that come out of percentile normalization aren’t actually abundances: each one is the percentile at which an OTU’s abundance in a given sample falls relative to that OTU’s distribution across all control samples. Using that as an abundance doesn’t really make sense. (But maybe @seangibbons has additional thoughts on this? [Hi Sean!])
Finally, rarefaction also won’t work, because the percentiles that come out of the normalization aren’t discrete counts; they’re continuous values from 0 to 100. Rarefaction only works with counts, so it isn’t applicable here.
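For contrast, here’s what rarefaction actually does: subsample each sample’s reads without replacement down to an even depth. This is a minimal sketch (QIIME 2’s implementation differs in the details), and it only makes sense for integer read counts:

```python
import numpy as np

rng = np.random.default_rng(1)

def rarefy(counts, depth):
    """Subsample an integer count vector to `depth` total reads
    without replacement (a multivariate hypergeometric draw)."""
    counts = np.asarray(counts)
    if counts.sum() < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    # Expand to one entry per read, draw `depth` of them, re-tally.
    reads = np.repeat(np.arange(len(counts)), counts)
    picked = rng.choice(reads, size=depth, replace=False)
    return np.bincount(picked, minlength=len(counts))

sample = [120, 40, 0, 7]      # integer read counts per OTU
print(rarefy(sample, 50))     # rarefied counts, summing to 50
```

Percentile-normalized values are continuous scores between 0 and 100, not read counts, so there are no discrete "reads" to draw here and the procedure is undefined.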
Sorry to be a bummer! Hopefully @seangibbons can provide additional insight into what metrics might be useful, or perhaps others in the QIIME 2 community have found some good ones.
Yup, I agree completely with Claire [Hi Claire!]. Percentile normalization erases any information about how abundant one OTU is relative to another within a sample, so weighted beta-diversity metrics (like weighted UniFrac or Bray-Curtis) will be less meaningful and difficult to interpret. And the fuzzy-zero issue makes unweighted beta-diversity metrics derived from percentile-normalized data problematic, as Claire described.
I believe Rob Knight’s group currently uses rarefaction for UniFrac distances, based on the 2017 paper by Weiss et al. Rob developed UniFrac, so while rarefaction is a necessary evil, it’s the current recommendation. An adonis model would let you include sequencing depth as a term, so you could adjust for it in some of your statistics.
If you want to go rarefaction-less, you may want to look at DEICODE, which avoids rarefaction but gives up the phylogenetic information.