Heatmap metric vs method

hsapers · March 12, 2021, 2:56pm

Hello - I have what's likely a pretty basic question - but can't seem to find documentation to confirm:

I'm running qiime feature-table heatmap on transformed relative abundance data that I re-imported as a FeatureTable[Frequency] artifact. I'm assuming that I wouldn't want to normalize further (or add a pseudocount), but I'm curious what the metric and method parameters operate on.

My assumption is that metric applies a distance metric to the data in the input table, that method applies a clustering method to the resulting distance matrix, and that the result is the feature dendrogram and grouping of features. Are samples clustered using the same method selection? (I'm trying to think of a case where one might want to use different clustering methods for samples and features...)

Thank you

thermokarst · March 12, 2021, 3:39pm

Hi @hsapers!

Let's take a peek at the docs for the heatmap visualizer:

  --p-metric TEXT Choices('canberra', 'correlation', 'jaccard',
    'cityblock', 'kulsinski', 'hamming', 'sokalsneath', 'mahalanobis', 'dice',
    'minkowski', 'cosine', 'matching', 'euclidean', 'sqeuclidean',
    'sokalmichener', 'rogerstanimoto', 'russellrao', 'seuclidean',
    'braycurtis', 'yule', 'chebyshev')
                         Metrics exposed by seaborn (see
                         http://seaborn.pydata.org/generated/seaborn.clusterma
                         p.html#seaborn.clustermap for more detail).
                                                        [default: 'euclidean']
  --p-method TEXT Choices('average', 'complete', 'weighted', 'centroid',
    'single', 'ward', 'median')
                         Clustering methods exposed by seaborn (see
                         http://seaborn.pydata.org/generated/seaborn.clusterma
                         p.html#seaborn.clustermap for more detail).
                                                          [default: 'average']

Okay, focusing in on the method parameter for now, let's check out that link:

http://seaborn.pydata.org/generated/seaborn.clustermap.html#seaborn.clustermap

Linkage method to use for calculating clusters. See scipy.cluster.hierarchy.linkage() documentation for more information.

Wow, this is turning into a game of telephone. Let's check out that scipy link in there:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html#scipy.cluster.hierarchy.linkage

Okay, now that we have some docs on hand:

Regarding your assumption about how metric and method operate on the table: I don't think that's always the case. Let's put it this way: this QIIME 2 plugin is not computing the metric before handing the data off to seaborn to plot; we're just exposing the seaborn function here. According to the scipy docs:

metric str or function, optional
The distance metric to use in the case that y is a collection of observation vectors; ignored otherwise. See the pdist function for a list of valid distance metrics. A custom distance function can also be used.

So the metric is only used as part of the clustering algorithm, and isn't applied to the final table.
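
To make that concrete, here is a minimal sketch using scipy directly (this is not the plugin's actual code; the toy table and variable names are made up for illustration). It shows that handing linkage() the raw observation vectors plus a metric gives the same tree as precomputing the distances with pdist() first, and that the table itself is left untouched:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Hypothetical toy table: 4 features (rows) x 3 samples (columns).
table = np.array([
    [0.0, 1.0, 2.0],
    [0.5, 1.5, 2.5],
    [9.0, 8.0, 7.0],
    [9.5, 8.5, 7.5],
])
original = table.copy()

# Case 1: pass observation vectors. linkage() computes the pairwise
# distances internally using `metric`, then clusters with `method`.
z_from_vectors = linkage(table, method="average", metric="euclidean")

# Case 2: precompute the condensed distance matrix ourselves. Here the
# `metric` argument of linkage() would be ignored, so we leave it out.
condensed = pdist(table, metric="euclidean")
z_from_distances = linkage(condensed, method="average")

# Both routes build the same tree, and neither modifies the table.
same_tree = np.allclose(z_from_vectors, z_from_distances)
table_unchanged = np.array_equal(table, original)
print(same_tree, table_unchanged)
```

In other words, the metric only decides how rows (or columns) get ordered and grouped in the dendrogram; the values drawn in the heatmap cells are not replaced by distances.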

As for whether samples are clustered the same way as features, it depends on which axes you choose to cluster:

  --p-cluster TEXT Choices('features', 'both', 'none', 'samples')
                         Specify which axes to cluster.      [default: 'both']
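
Here is a rough sketch of what that choice amounts to, again using scipy directly rather than the plugin's code. The cluster_axes helper, the toy table, and the rows-as-features orientation are all assumptions for illustration. The same metric and method are applied to whichever axes get clustered. As far as I can tell from the seaborn docs, using different settings for rows and columns would mean building each linkage yourself and passing it to clustermap via row_linkage / col_linkage, which, as far as I know, this visualizer does not expose.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage

CLUSTER_CHOICES = ("features", "both", "none", "samples")


def leaf_order(matrix, method, metric):
    """Cluster the rows of `matrix` and return the dendrogram leaf order."""
    z = linkage(matrix, method=method, metric=metric)
    return [int(i) for i in leaves_list(z)]


def cluster_axes(table, cluster="both", method="average", metric="euclidean"):
    """Hypothetical helper: return (feature_order, sample_order).

    Assumes rows are features and columns are samples. An axis that is
    not clustered keeps its original order.
    """
    if cluster not in CLUSTER_CHOICES:
        raise ValueError(f"cluster must be one of {CLUSTER_CHOICES}")
    n_features, n_samples = table.shape
    feature_order = list(range(n_features))
    sample_order = list(range(n_samples))
    if cluster in ("features", "both"):
        feature_order = leaf_order(table, method, metric)
    if cluster in ("samples", "both"):
        # Same method and metric, applied to the transposed table.
        sample_order = leaf_order(table.T, method, metric)
    return feature_order, sample_order


# Hypothetical toy table: 4 features x 3 samples.
table = np.array([
    [0.0, 1.0, 2.0],
    [9.0, 8.0, 7.0],
    [0.5, 1.5, 2.5],
    [9.5, 8.5, 7.5],
])

feature_order, sample_order = cluster_axes(table, cluster="both")
print(feature_order, sample_order)
```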

Hope that helps! Sorry for the game of telephone with the docs...


hsapers · March 12, 2021, 3:41pm

Thanks @thermokarst definitely clears up some order of operations questions

thermokarst · March 12, 2021, 3:46pm

Glad to hear it! The backstory on this visualizer is that we needed a clustered heatmap for a paper once upon a time, so I wrapped some seaborn functions in a QIIME 2 visualizer. All of the clustering logic is left to the scipy/seaborn experts. The one thing to keep in mind is that this visualizer applies log10 normalization to the input table by default.
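
For a sense of what that default does to a table that has already been transformed, here is a small numpy sketch. It assumes the normalization adds a pseudocount of 1 before taking log10 (that is my reading of the visualizer's help text, so please confirm with qiime feature-table heatmap --help, which should also show how to turn normalization off). The log10_normalize function and the toy data are hypothetical.

```python
import numpy as np


def log10_normalize(table, pseudocount=1.0):
    """Assumed form of the default normalization: log10(x + pseudocount)."""
    return np.log10(np.asarray(table, dtype=float) + pseudocount)


# Raw counts spread out nicely on a log scale.
counts = np.array([
    [0, 9, 99],
    [999, 0, 9],
])
normalized_counts = log10_normalize(counts)

# Relative abundances live in [0, 1], so after adding 1 they are squeezed
# into the narrow range [0, log10(2)], which flattens the color scale.
relative = np.array([
    [0.00, 0.25, 0.75],
    [0.90, 0.05, 0.05],
])
normalized_relative = log10_normalize(relative)

print(normalized_counts)
print(normalized_relative.max())
```

If the table is already on the scale you want to plot, skipping the built-in normalization is probably the safer choice.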
