Filtering before calculating UniFrac

Hello. I have a question about calculating UniFrac. Is it OK to prefilter some low-prevalence taxa before calculating weighted UniFrac? I have read that we should include all the taxa in the samples we are interested in. Thanks in advance.

Hi @dearmrm,

Low-abundance taxa will not dramatically change weighted UniFrac results. Each branch's contribution is weighted by the difference in relative abundance of the features below it between the two samples, so the distance is driven mainly by abundant features.
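For reference, here is weighted UniFrac in its original, unnormalized form. Implementations such as the one in QIIME 2 may also offer a normalized variant, so treat this as a sketch of the idea rather than the exact computation your tool runs:

```latex
u = \sum_{i=1}^{n} b_i \left| \frac{A_i}{A_T} - \frac{B_i}{B_T} \right|
```

Here $n$ is the number of branches in the tree, $b_i$ is the length of branch $i$, $A_i$ and $B_i$ are the numbers of reads descending from branch $i$ in samples A and B, and $A_T$ and $B_T$ are the total read counts of each sample. A feature that makes up only a tiny fraction of the reads in both samples can shift each term by only a tiny amount.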

Still, I would not recommend filtering these for two reasons:

  1. Even though the impact should be small, it is better to keep the results simple and transparent.
  2. The underlying data should be consistent with the other analyses you are doing. For example, unweighted UniFrac will be highly impacted by low-abundance features, and ideally you should use the same underlying data for all distance metrics.
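To illustrate the contrast with made-up numbers, here is a toy, from-scratch sketch of unnormalized weighted UniFrac and unweighted UniFrac on a hypothetical four-tip tree. Tip `R` is a rare feature seen once in one sample. This is not the scikit-bio or QIIME 2 implementation; the tree, counts, and function names are assumptions chosen only for illustration.

```python
# Toy, from-scratch UniFrac for illustration only (not the QIIME 2 / scikit-bio code).
# Hypothetical tree: ((A:1,B:1):1,(C:1,R:1):1); each branch is stored as
# (branch length, set of tips that descend from it).
BRANCHES = [
    (1.0, {"A"}), (1.0, {"B"}), (1.0, {"C"}), (1.0, {"R"}),
    (1.0, {"A", "B"}),  # internal branch above A and B
    (1.0, {"C", "R"}),  # internal branch above C and R
]


def descendant_count(sample, tips):
    """Total reads in `sample` that belong to the tips below a branch."""
    return sum(sample.get(t, 0) for t in tips)


def weighted_unifrac(s1, s2, branches=BRANCHES):
    """Unnormalized weighted UniFrac: sum of b_i * |A_i/A_T - B_i/B_T|."""
    t1, t2 = sum(s1.values()), sum(s2.values())
    return sum(
        length * abs(descendant_count(s1, tips) / t1 - descendant_count(s2, tips) / t2)
        for length, tips in branches
    )


def unweighted_unifrac(s1, s2, branches=BRANCHES):
    """Branch length unique to one sample / branch length observed in either."""
    unique = observed = 0.0
    for length, tips in branches:
        in1 = descendant_count(s1, tips) > 0
        in2 = descendant_count(s2, tips) > 0
        if in1 or in2:
            observed += length
            if in1 != in2:
                unique += length
    return unique / observed


# Hypothetical counts: R is a rare feature seen once, in sample 2 only.
s1 = {"A": 500, "B": 300, "C": 200, "R": 0}
s2 = {"A": 480, "B": 320, "C": 199, "R": 1}

# The same samples after filtering R out of the table.
s1_f = {k: v for k, v in s1.items() if k != "R"}
s2_f = {k: v for k, v in s2.items() if k != "R"}

print(f"weighted   with R: {weighted_unifrac(s1, s2):.4f}  without R: {weighted_unifrac(s1_f, s2_f):.4f}")
print(f"unweighted with R: {unweighted_unifrac(s1, s2):.4f}  without R: {unweighted_unifrac(s1_f, s2_f):.4f}")
```

Running it prints 0.0420 vs 0.0422 for weighted UniFrac but 0.1667 vs 0.0000 for unweighted UniFrac. Dropping a single read barely moves the weighted distance but completely changes the unweighted one.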

The exception is if you are not interested in these low-abundance features at all and filter them out before ALL downstream analyses. For example, we recommend filtering low-abundance OTUs when doing OTU picking. Those ought to be removed before any analysis, since many of them (especially singletons) are very likely to be erroneous.
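If you do filter, the key is to do it once, up front, and reuse the same filtered table everywhere. Below is a minimal sketch in plain Python. In QIIME 2 itself, `qiime feature-table filter-features` (with options such as `--p-min-frequency` and `--p-min-samples`) covers this. The table, feature IDs, and thresholds here are hypothetical and are not recommended values.

```python
# Minimal sketch: drop rare features ONCE, then reuse the filtered table for
# every downstream analysis (alpha diversity, weighted and unweighted UniFrac, ...).
# The thresholds below are hypothetical examples, not recommended values.
MIN_TOTAL_COUNT = 2  # a feature with a total count of 1 (singleton) is removed
MIN_SAMPLES = 2      # a feature must be observed in at least this many samples


def filter_features(table, min_total=MIN_TOTAL_COUNT, min_samples=MIN_SAMPLES):
    """`table` maps feature ID -> {sample ID: count}; returns a filtered copy."""
    kept = {}
    for feature, counts in table.items():
        total = sum(counts.values())
        prevalence = sum(1 for c in counts.values() if c > 0)
        if total >= min_total and prevalence >= min_samples:
            kept[feature] = dict(counts)
    return kept


# Hypothetical feature table.
table = {
    "OTU1": {"S1": 120, "S2": 95, "S3": 130},
    "OTU2": {"S1": 0, "S2": 1, "S3": 0},   # singleton
    "OTU3": {"S1": 40, "S2": 0, "S3": 0},  # abundant, but seen in one sample only
    "OTU4": {"S1": 3, "S2": 2, "S3": 0},
}

filtered = filter_features(table)
print(sorted(filtered))  # ['OTU1', 'OTU4']
```

Note that the prevalence threshold also removes `OTU3`, which is abundant in one sample. Whatever thresholds you choose, the same `filtered` table should then feed every downstream analysis.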

I hope that helps!
