Sorry, I was following the post and a question popped into my mind.
I was wondering if you could let me know whether I can use percentile-normalized data for metrics such as weighted and unweighted UniFrac. The only normalization method available in QIIME 2 is rarefaction, and I'm not sure whether it's appropriate for such metrics.
Hi @ptalebic! Unfortunately, I don’t think percentile-normalized data will work for either weighted or unweighted UniFrac.
Unweighted UniFrac uses the presence/absence of OTUs (or ASVs, or whatever your features are) to calculate beta diversity. Because percentile normalization can’t handle zeros very well, the algorithm adds non-zero noise to any OTUs with a zero count (see my blog post for more on that). That means essentially all of your values will be non-zero after percentile normalization, so the unweighted UniFrac calculation won’t be meaningful.
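To make the fuzzy-zero issue concrete, here’s a minimal sketch of percentile normalization for a single OTU (this is illustrative, not the published implementation; the function names and the noise scale are my own):

```python
import numpy as np
from scipy.stats import percentileofscore

rng = np.random.default_rng(0)

def add_zero_noise(counts, scale=1e-9):
    """Replace exact zeros with tiny positive noise, as percentile
    normalization does so that zero-count samples can be ranked."""
    counts = np.asarray(counts, dtype=float).copy()
    zeros = counts == 0
    counts[zeros] = rng.uniform(0, scale, zeros.sum())
    return counts

def percentile_normalize(case_counts, control_counts):
    """Score each case sample's (noised) value for one OTU against
    that OTU's (noised) distribution across the control samples."""
    case = add_zero_noise(case_counts)
    ctrl = add_zero_noise(control_counts)
    return np.array([percentileofscore(ctrl, x) for x in case])

controls = [0, 0, 3, 10, 25]   # one OTU's counts across controls
cases = [0, 0, 7]              # the same OTU across case samples

print(add_zero_noise(cases))   # the zero counts are now tiny positive values
print(percentile_normalize(cases, controls))
```

After the noise step there are no true zeros left, so a presence/absence metric like unweighted UniFrac sees every OTU as "present" in every sample.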
Weighted UniFrac also won’t be meaningful, since the values that come out of percentile normalization aren’t actually abundances: each one is the percentile at which an OTU’s abundance in a given sample falls relative to that OTU’s distribution across all control samples. Using that as an abundance doesn’t really make sense. (But maybe @seangibbons has additional thoughts on this? [Hi Sean!])
Finally, rarefaction also won’t work, because the percentiles that come out of the normalization aren’t discrete counts; they’re continuous values from 0 to 100. Rarefaction only works with counts, so it isn’t applicable here.
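For contrast, here’s what rarefaction actually does: subsample each sample’s reads without replacement down to an even depth. This is a minimal sketch (QIIME 2’s implementation differs in the details), and it only makes sense for integer read counts:

```python
import numpy as np

rng = np.random.default_rng(1)

def rarefy(counts, depth):
    """Subsample an integer count vector to `depth` total reads
    without replacement (a multivariate hypergeometric draw)."""
    counts = np.asarray(counts)
    if counts.sum() < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    # Expand to one entry per read, draw `depth` of them, re-tally.
    reads = np.repeat(np.arange(len(counts)), counts)
    picked = rng.choice(reads, size=depth, replace=False)
    return np.bincount(picked, minlength=len(counts))

sample = [120, 40, 0, 7]      # integer read counts per OTU
print(rarefy(sample, 50))     # rarefied counts, summing to 50
```

Percentile-normalized values are continuous scores between 0 and 100, not read counts, so there are no discrete "reads" to draw here and the procedure is undefined.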
Sorry to be a bummer! Hopefully @seangibbons can provide additional insight into what metrics might be useful, or perhaps others in the QIIME 2 community have found some good ones.
Yup, I agree completely with Claire [Hi Claire!]. Percentile normalization erases any information about how abundant one OTU is relative to another within a sample, so weighted beta-diversity metrics (like weighted UniFrac or Bray-Curtis) will be less meaningful and difficult to interpret. And the fuzzy-zero issue makes unweighted beta-diversity metrics derived from percentile-normalized data problematic, as Claire described.
I believe Rob Knight’s group currently uses rarefaction for UniFrac distances, based on the 2017 paper by Weiss et al. Rob developed UniFrac, so while rarefaction is a necessary evil, it’s the current recommendation. An adonis model would let you include sequencing depth as a term, so you could adjust for it in some of your statistics.
If you want to go rarefaction-less, you may want to look at DEICODE, which avoids rarefaction but gives up the phylogenetic information.