Running 'diversity core-metrics' using taxonomy

CuriousFox · July 22, 2020, 7:15pm

Hello!

I am quite new to QIIME and microbiome analysis but have been playing around with the 'Moving Pictures' tutorial and other actual datasets. I don't believe this question was answered anywhere else on the forum (hopefully it's not a dumb question!).

I was wondering if it's possible to run the 'qiime diversity core-metrics-phylogenetic' command on a feature table with taxonomy? Do both the feature table and representative sequences need to be run through the classifier?

Whenever I run the command I get the following plugin error:

qiime diversity core-metrics-phylogenetic \
  --i-phylogeny rooted-tree.qza \
  --i-table table-level-5.qza \
  --p-sampling-depth 3545 \
  --m-metadata-file metadata.txt \
  --output-dir core-metrics-taxonomy

(The table is collapsed btw! Also, the rooted tree was constructed from the original rep-seqs.qza)

Plugin error from diversity:
The table does not appear to be completely represented by the phylogeny.

I believe some of my confusion is coming from the fact that the tutorials go through doing it with OTU tables and only assign taxonomy after looking at the alpha diversity. Is there an advantage to assigning taxonomy and then doing diversity analysis?

Thanks so much!

Nicholas_Bokulich · July 22, 2020, 10:39pm

Welcome @CuriousFox!

No such thing as a dumb question!

Short/oversimplified answer: no. After you collapse a feature table based on taxonomy, the new feature IDs (taxonomic labels) no longer correspond to the feature IDs in your phylogeny (ASV IDs). Hence the error message you received: "The table does not appear to be completely represented by the phylogeny."
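To make the mismatch concrete, here is a minimal sketch in plain Python. It is not QIIME 2's actual check; the feature IDs and the helper `missing_from_tree` are made up for illustration. The idea is only that every feature ID in the table needs a matching tip in the tree, and taxonomy labels will not match ASV IDs.

```python
# Hypothetical IDs for illustration; real ASV IDs are usually long hashes.
tree_tips = {"asv_001", "asv_002", "asv_003"}

# Feature IDs of an uncollapsed (ASV) table.
asv_table_ids = {"asv_001", "asv_002"}

# Feature IDs of a table collapsed to level 5: the IDs are now taxonomy labels.
collapsed_table_ids = {
    "k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae",
    "k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae",
}


def missing_from_tree(table_ids, tips):
    """Return the table feature IDs that have no matching tip in the tree."""
    return sorted(set(table_ids) - set(tips))


# The ASV table is fully represented by the tree; the collapsed table is not.
print(missing_from_tree(asv_table_ids, tree_tips))
print(missing_from_tree(collapsed_table_ids, tree_tips))
```

The first call finds nothing missing, while the second reports both taxonomy labels as absent from the tree, which is the situation the error message describes.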

You could attempt to create a phylogeny that has taxonomies as tip labels, but this is not something that QIIME 2 can do right now... so you would need to do that externally, then import that tree into QIIME 2.

However, that's an issue limited to phylogenetic methods. So qiime diversity core-metrics will still work with collapsed feature tables!

As for whether there is an advantage to assigning taxonomy before running diversity analyses: no, and everyone does things differently. The tutorials are just one way to do it... many users like to assign taxonomy, use that to filter their data (e.g., remove unclassified reads), and then run diversity analyses.
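As a rough sketch of what "use taxonomy to filter" means, the snippet below drops features whose taxonomy label contains an excluded term. It is plain Python on made-up data, and `drop_features` is a hypothetical helper, not a QIIME 2 function; in QIIME 2 itself this kind of filtering is handled by the taxa plugin.

```python
# Hypothetical data: feature ID -> taxonomy label, feature ID -> per-sample counts.
taxonomy = {
    "asv_001": "k__Bacteria;p__Firmicutes",
    "asv_002": "Unassigned",
    "asv_003": "k__Bacteria;p__Proteobacteria",
}
table = {
    "asv_001": {"S1": 10, "S2": 0},
    "asv_002": {"S1": 4, "S2": 1},
    "asv_003": {"S1": 0, "S2": 6},
}


def drop_features(table, taxonomy, exclude=("unassigned",)):
    """Keep only features whose taxonomy label contains none of the excluded terms.

    Matching is a case-insensitive substring search; features without any
    taxonomy entry are treated as "Unassigned" (an assumption of this sketch).
    """
    kept = {}
    for feature_id, counts in table.items():
        label = taxonomy.get(feature_id, "Unassigned").lower()
        if any(term in label for term in exclude):
            continue
        kept[feature_id] = counts
    return kept


filtered = drop_features(table, taxonomy)
print(sorted(filtered))
```

Because only rows are removed and the remaining feature IDs are still ASV IDs, a table filtered this way still lines up with the original tree.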

So at the end of the day this becomes a question of what you really want to accomplish. Do you really want to run phylogenetic diversity metrics on a feature table collapsed by taxonomy? That sort of defeats the purpose (since phylogenetic metrics and collapsing on taxonomy are in some ways two means to the same end — accounting for the genetic similarity among organisms when assessing diversity).

I'd recommend running core-metrics-phylogenetic on your non-collapsed feature table (i.e., ASVs).

Then, you could also run core-metrics on the taxonomy-collapsed table if you want to assess things like # of unique species labels observed, etc.
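For intuition about what counting unique labels on a collapsed table amounts to, here is a small plain-Python sketch on made-up data. The helpers `collapse` and `observed_labels` are hypothetical and simplified: they skip the rarefaction step that core-metrics applies, and they do not reproduce how QIIME 2 handles features with missing ranks.

```python
from collections import defaultdict

# Hypothetical data: feature ID -> taxonomy label, feature ID -> per-sample counts.
taxonomy = {
    "asv_001": "k__Bacteria; p__Firmicutes; c__Bacilli",
    "asv_002": "k__Bacteria; p__Firmicutes; c__Clostridia",
    "asv_003": "k__Bacteria; p__Bacteroidetes; c__Bacteroidia",
}
table = {
    "asv_001": {"S1": 5, "S2": 0},
    "asv_002": {"S1": 3, "S2": 2},
    "asv_003": {"S1": 0, "S2": 7},
}


def collapse(table, taxonomy, level):
    """Sum the counts of all features that share their first `level` ranks."""
    collapsed = defaultdict(lambda: defaultdict(int))
    for feature_id, counts in table.items():
        ranks = [rank.strip() for rank in taxonomy[feature_id].split(";")]
        label = ";".join(ranks[:level])
        for sample, count in counts.items():
            collapsed[label][sample] += count
    return {label: dict(counts) for label, counts in collapsed.items()}


def observed_labels(collapsed, sample):
    """Count the taxonomy labels with a non-zero count in one sample."""
    return sum(1 for counts in collapsed.values() if counts.get(sample, 0) > 0)


phylum_table = collapse(table, taxonomy, level=2)
print(phylum_table)
print(observed_labels(phylum_table, "S1"), observed_labels(phylum_table, "S2"))
```

Here the two Firmicutes ASVs merge into one row, so sample S1 has one observed phylum label and S2 has two, even though both samples contain two ASVs.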

Good luck!

CuriousFox · July 24, 2020, 2:12pm

Okay great! I did this and it all seemed to work well, thank you!

Out of curiosity, wouldn't having a taxonomy imply a phylogeny (since it is basically grouping the features)? So why couldn't we run phylogenetic diversity tests like Faith's on the feature tables with taxonomy? Is it because it's not a complete phylogeny of all the features present?

Nicholas_Bokulich · July 24, 2020, 2:42pm

Yes, in theory it implies some sort of phylogeny, but it gets more complicated:

Taxonomy != phylogeny, and the degree of evolutionary relatedness varies for different taxonomic groups, so we can't assume too much based on a taxonomic label.
How do you handle unclassified or only partially classified taxa? In those cases you cannot automatically assume any type of evolutionary distance based on taxonomic labels alone.
Besides, you need to have taxonomy labels mapped to a phylogeny... whereas a phylogeny you create with QIIME 2 or derive from a reference database will have sequence IDs mapped to that phylogeny.

So there could be ways to build a dendrogram based on hierarchical taxonomic relationships and use that for UniFrac etc... but such an object would need to be created outside of QIIME 2 and imported in. Or there may be external methods to collapse a tree based on taxonomy (to average branch lengths in a more sophisticated way), but I don't know of any.
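One way such a dendrogram could be sketched outside of QIIME 2 is shown below, in plain Python. Everything here is an assumption for illustration: `taxonomy_to_newick` is a hypothetical helper, every edge gets the same arbitrary branch length (so distances reflect rank depth, not evolutionary distance), and all labels must have the same number of ranks, which sidesteps the partially classified case rather than solving it. Whether the resulting quoted tip labels import cleanly into QIIME 2 is untested.

```python
def quote(label):
    """Quote a Newick label so that semicolons in taxonomy strings are safe."""
    return "'" + label.replace("'", "''") + "'"


def taxonomy_to_newick(labels, branch_length=1.0):
    """Build a Newick string whose tips are the full taxonomy labels.

    Assumes every label has the same number of ';'-separated ranks and
    gives every edge the same (arbitrary) branch length.
    """
    depths = {len(label.split(";")) for label in labels}
    if len(depths) != 1:
        raise ValueError("all labels must have the same number of ranks")

    # Nest the ranks into a tree of dicts: {rank: {child_rank: {...}}}.
    root = {}
    for label in labels:
        node = root
        for rank in label.split(";"):
            node = node.setdefault(rank, {})

    def render(node, path):
        children = []
        for rank in sorted(node):
            full_path = path + [rank]
            child = node[rank]
            if child:
                # Internal node: unnamed, wraps its rendered children.
                text = "(" + render(child, full_path) + ")"
            else:
                # Tip: named with the full taxonomy label.
                text = quote(";".join(full_path))
            children.append(f"{text}:{branch_length}")
        return ",".join(children)

    return "(" + render(root, []) + ");"


labels = ["k__Bacteria;p__Firmicutes", "k__Bacteria;p__Bacteroidetes"]
print(taxonomy_to_newick(labels))
```

The tip names are the full collapsed labels so that they could, in principle, match the feature IDs of a taxonomy-collapsed table. The uniform branch lengths are exactly the kind of assumption the caveats above warn about.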

Good luck!