OTU filtering in `qiime diversity core-metrics-phylogenetic`

nick-youngblut · November 30, 2017, 3:54pm

According to this forum post and others like it, there's no direct way to filter sequences to match a filtered feature table (e.g., a feature table from which OTUs with low prevalence among samples have been removed). So if a user doesn't figure out how to filter the sequences to match the feature table, and instead builds a phylogeny from all OTU representative sequences, will `qiime diversity core-metrics-phylogenetic` use all OTUs (tree tips), or will the OTUs that were filtered out of the feature table also be dropped before the phylogenetic metrics are calculated?

If `core-metrics-phylogenetic` doesn't drop tree tips to match the filtered feature table, then I'm guessing that a lot of QIIME 2 users will mistakenly think that its metrics are based on the filtered feature table, when in reality all sequences in the unfiltered tree are used.

colinbrislawn · November 30, 2017, 4:48pm

Hello Nick,

Here is how it worked in QIIME 1: the way the metrics were implemented, only tree tips that were present in the samples were compared, so you could pass the full Greengenes tree and still get valid UniFrac results.

Let's confirm that for QIIME 2!

Colin

thermokarst · November 30, 2017, 4:57pm

QIIME 2 uses scikit-bio under the hood for diversity metrics, as do recent versions of QIIME 1, so the behavior should be the same (@colinbrislawn, please correct me if I am mistaken!).

My understanding is that features found only in the phylogenetic tree are filtered out automatically. I will defer to @jairideout or @ebolyen, though, as they are our resident scikit-bio experts. Thanks!

jairideout · December 1, 2017, 12:29am

@colinbrislawn and @thermokarst are correct: when calculating alpha or beta diversity with phylogenetic metrics (e.g. Faith's PD, UniFrac), extra tips in the tree that are not observed in the corresponding feature table are ignored. As @colinbrislawn mentioned above, this behavior is convenient for closed-reference OTU picking, where you have a pre-built reference tree made from all reference sequences; it would be inconvenient to have to filter that tree prior to diversity analyses on a closed-reference OTU table. I believe the `unweighted_unifrac_full_tree` metric in QIIME 1's `beta_diversity.py` script considers all tips in the tree, but we don't have that functionality in QIIME 2.
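To make the "extra tips are ignored" point concrete, here is a minimal pure-Python sketch of Faith's PD on a toy tree. It is not the scikit-bio or QIIME 2 implementation; the tree layout, the names `FULL_TREE`, `faith_pd`, and `prune`, and the choice to count every branch between an observed tip and the root are all assumptions made for illustration. The idea is just that a tip absent from the feature table (here `D`) never contributes branch length, so the full tree and a tree pruned to the observed tips give the same value.

```python
# Toy tree stored as child -> (parent, branch_length); the root has no entry.
# Hypothetical data, for illustration only.
FULL_TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "n1": ("root", 0.5),
    "C": ("n2", 1.5),
    "D": ("n2", 3.0),  # tip that never appears in the feature table
    "n2": ("root", 0.25),
}


def faith_pd(tree, observed_tips):
    """Sum the lengths of all branches on paths from observed tips to the root."""
    visited = set()
    total = 0.0
    for tip in observed_tips:
        node = tip
        # Walk toward the root, counting each branch only once.
        while node in tree and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


def prune(tree, keep_tips):
    """Return a copy of the tree with only the branches leading to keep_tips."""
    kept = {}
    for tip in keep_tips:
        node = tip
        while node in tree and node not in kept:
            kept[node] = tree[node]
            node = tree[node][0]
    return kept


observed = ["A", "B", "C"]  # features present in the (filtered) table
pd_full = faith_pd(FULL_TREE, observed)
pd_pruned = faith_pd(prune(FULL_TREE, observed), observed)
print(pd_full, pd_pruned)  # the two values are equal
```

Under these assumptions, the branch leading to `D` is never visited, which mirrors why passing an unfiltered tree alongside a filtered feature table should not change the result.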
