Hi, does QIIME 2 already have plugins for ordination techniques other than PCoA, such as t-SNE or UMAP? I might have a bachelor student interested in implementing a plugin for this as part of their studies. Would that be welcome, or are there already other ongoing efforts?
Hi @Stefan,
Thanks for reaching out.
t-SNE: this has been on my mind for a while. I've been meaning to wrap it in q2-sample-classifier but have not gotten around to starting on it. I'd welcome you to grab that issue or make a new plugin for this!
UMAP: a quick googling shows me that @gwarmstrong may be working on a plugin for this — @gwarmstrong is that still in development? let us know if @Stefan can get involved!
Hi @Nicholas_Bokulich, thanks for the prompt reply. I very much like the interactive exploration via Emperor, so I was thinking of something that replaces pcoa (Principal Coordinate Analysis) with either t-SNE or UMAP (and maybe others as well). What would be the best place to add this functionality? Which hyperparameters should we expose? I figure it would be best to wrap sklearn.manifold.TSNE.
@gwarmstrong any help is very welcome. Let me know if you already made some design decisions for UMAP, we might just want to copy and paste to ensure a consistent API.
Indeed! I like your plan, and having these methods output an ordination result of some kind would allow you to use this as input for emperor or other methods. Note that q2-emperor takes a PCoAResults artifact as input, so let's get @yoshiki and @ebolyen in on this conversation: should we change emperor to accept a different sort of input, e.g., OrdinationResults? Or cheat and have t-SNE/UMAP output a PCoAResults artifact?
As I mentioned, you would be very welcome to put this in q2-sample-classifier following that open issue above, unless you want to create your own new plugin for this.
Sounds good, that's what I was planning to use in sample classifier. I think all of the options for sklearn.manifold.TSNE are worth exposing, but set useful defaults so that users don't need to fiddle too much to get something usable.
I'd recommend accepting a distance matrix as input... then any distance metric can be used, including metrics like UniFrac that aren't available in sklearn. Actually, accepting a PCoAResults artifact as input could also be useful (per the note on the sklearn TSNE documentation page that "It is highly recommended to use another dimensionality reduction method..."). So many possibilities!
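For anyone prototyping the distance-matrix route, here is a minimal sketch of what such a wrapper might look like, assuming the distances arrive as a plain square NumPy array (for example the `.data` of a scikit-bio DistanceMatrix). The function name `tsne_from_distances`, the chosen defaults and the perplexity clamp are hypothetical illustrations, not part of any QIIME 2 plugin. The key points are sklearn's `metric="precomputed"`, which uses the supplied distances (UniFrac, Bray-Curtis, ...) as-is, and `init="random"`, since PCA initialisation is not allowed with precomputed distances.

```python
import numpy as np
from sklearn.manifold import TSNE


def tsne_from_distances(distances, n_components=2, perplexity=30.0,
                        random_state=None):
    """Embed samples with t-SNE from a precomputed (n x n) distance matrix.

    Hypothetical helper for illustration; not an existing QIIME 2 API.
    """
    distances = np.asarray(distances, dtype=float)
    if distances.ndim != 2 or distances.shape[0] != distances.shape[1]:
        raise ValueError("expected a square distance matrix")
    # sklearn requires perplexity < n_samples; clamp it for small studies.
    perplexity = min(perplexity, distances.shape[0] - 1)
    tsne = TSNE(
        n_components=n_components,
        perplexity=perplexity,
        metric="precomputed",  # use the supplied distances directly
        init="random",         # "pca" init is rejected with precomputed input
        random_state=random_state,  # fix this for reproducible layouts
    )
    # Returns an (n_samples x n_components) array of coordinates.
    return tsne.fit_transform(distances)
```

Passing `random_state` is worth encouraging as a default-ish setting, since t-SNE layouts otherwise change from run to run.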
I like the idea of displaying t-SNE results using Emperor. The OrdinationResults object is rather flexible, and can probably do the job. However, I think it would also be fine to use a different format if that made more sense. In terms of the type, I think having a DimensionalityReduction parent type might make sense. Worst case, we can always have a qiime emperor plot-tsne visualization and handle a new type directly.
I am happy to help with testing, and debugging any visual artifacts that might come up on the plotting side of things.
Regarding the type: I think the sklearn term is "embedding" for the general result of any dimensionality reduction method. I don't want to break the current q2-emperor input, but to me it looks like we would make q2-emperor accept either an embedding (sklearn terminology) or an OrdinationResults (skbio terminology). Technically, the current format for Emperor should directly support t-SNE, MDS or others. I would welcome @Nicholas_Bokulich making a decision here, as you have the best overview of all the data types in q2.
@yoshiki thanks for your help! From what I saw, t-SNE and UMAP are typically used to produce 2D plots. I tried it with Emperor and it worked; however, the default spheres have too big a radius. Is there a way to default to a smaller one, maybe via the input file?

I would welcome @Nicholas_Bokulich making a decision here, as you have the best overview of all the data types in q2.
@ebolyen and I chatted out-of-loop and we think that you should just output a PCoAResults artifact for now... we can always update the method and q2-emperor later to output/input a more specific type (e.g., tSNEOrdination) if necessary.

UMAP: a quick googling shows me that @gwarmstrong may be working on a plugin for this — @gwarmstrong is that still in development? let us know if @Stefan can get involved!
I have not actively been developing the plugin since the initial prototype a few months back. I would be happy to provide input on what I have done!

@gwarmstrong any help is very welcome. Let me know if you already made some design decisions for UMAP, we might just want to copy and paste to ensure a consistent API.
I think the author's implementation and documentation of UMAP is a good place to start. IIRC, there are upwards of 20 parameters to umap.UMAP; you probably only need the basic parameters n_neighbors, min_dist and n_components to start. I would also recommend using random_state for reproducibility.
I am not sure that a consistent API with what I wrote is really necessary, AFAIK no one is using the plugin-draft I wrote.

I’d recommend accepting a distance matrix as input… then any distance metric can be used, including metrics like unifrac that aren’t available in sklearn.
Totally agree with this! In the plugin I wrote, I ended up exposing two avenues for interacting with umap.UMAP: one that uses a feature table and one that uses a distance matrix. IIRC you could actually just have one interface that accepts something like (FeatureTable, Choices([<list of metrics>])) or (DistanceMatrix, Choices(['precomputed'])) with TypeMap! lmk if you want more guidance here.
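The two avenues can be funnelled into one code path by turning both into a square distance matrix first and then always calling the embedding with `metric="precomputed"`. Below is a minimal plain-Python sketch of that branching, assuming the feature table is a samples-by-features NumPy array; the helper `to_distance_matrix` is hypothetical, and the sketch does not use QIIME 2's actual TypeMap machinery, only the equivalent branching on the metric name.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def to_distance_matrix(data, metric):
    """Normalise (feature table, metric) or (distance matrix, "precomputed")
    to a square distance matrix.

    Hypothetical helper for illustration only.
    """
    data = np.asarray(data, dtype=float)
    if metric == "precomputed":
        # Distance-matrix avenue: validate and pass through unchanged.
        if data.ndim != 2 or data.shape[0] != data.shape[1]:
            raise ValueError("'precomputed' requires a square distance matrix")
        if not (np.allclose(data, data.T) and np.allclose(np.diag(data), 0)):
            raise ValueError("distance matrix must be symmetric with a zero diagonal")
        return data
    # Feature-table avenue: rows are samples, columns are features;
    # scipy computes the pairwise distances for the named metric.
    return squareform(pdist(data, metric=metric))
```

The result can then be handed to umap.UMAP or sklearn.manifold.TSNE with `metric="precomputed"`, so the embedding call itself never has to know which avenue the user took.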

@yoshiki thanks for your help! From what I saw, t-SNE and UMAP are typically used to produce 2D plots.
Typically, in the publications I have seen, these methods are used to make 2D plots. You can use them to make 3D plots, and I was able to make some nice 3D UMAP visualizations. HOWEVER, if you make 3D plots with t-SNE or UMAP, you cannot just take the top 2 components to make a 2D plot, like you can for PCoA. My understanding is that the objective functions for these methods do not single out any particular axis, unlike PCoA, which orders axes by eigenvalue, so its leading axes stay the same regardless of how many components you compute.
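To make the PCoA side of this contrast concrete, here is a minimal NumPy sketch of classical PCoA. The `pcoa` helper is hypothetical and is not scikit-bio's implementation. Because every axis comes from a single eigendecomposition sorted by eigenvalue, computing 3 components and keeping the first 2 gives exactly the 2-component result. t-SNE and UMAP optimise all output coordinates jointly, so, per the point above, no such nesting should be expected from them.

```python
import numpy as np


def pcoa(distances, n_components):
    """Classical PCoA (metric MDS) via eigendecomposition.

    Hypothetical helper for illustration only.
    """
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # Double-centre -0.5 * D^2 to obtain the Gower-centred matrix B.
    a = -0.5 * d ** 2
    j = np.eye(n) - np.ones((n, n)) / n
    b = j @ a @ j
    eigvals, eigvecs = np.linalg.eigh(b)
    # eigh returns ascending eigenvalues; reorder so axis 1 is the largest,
    # then keep the leading n_components axes.
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale eigenvectors by sqrt(eigenvalue); clip tiny negative values.
    return eigvecs * np.sqrt(np.clip(eigvals, 0, None))
```

With `D` a square distance matrix, `pcoa(D, 3)[:, :2]` equals `pcoa(D, 2)`: asking for fewer PCoA axes just truncates the same ordination.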

I tried it with Emperor and it worked; however, the default spheres have too big a radius. Is there a way to default to a smaller one, maybe via the input file?
To do this via the interface:
- Go to the Scale tab in your Emperor plot.
- Choose a metadata variable (it doesn't matter which). Do not check "Change scale by value".
- Adjust the 'global scaling' slider.
I am not sure if there is a way to set the default while generating the plot.

@ebolyen and I chatted out-of-loop and we think that you should just output a PCoAResults artifact for now…
This is what is done in biocore/deicode, even though it is really an SVD and not a PCoA, so the precedent exists.

@yoshiki thanks for your help! From what I saw, t-SNE and UMAP are typically used to produce 2D plots. I tried it with Emperor and it worked; however, the default spheres have too big a radius. Is there a way to default to a smaller one, maybe via the input file?
Yes, defaults have been an ongoing work in progress. Happy to figure something out once you have some examples.

I think the author’s implementation and documentation of UMAP is a good place to start. IIRC, there are upwards of 20 parameters to umap.UMAP; you probably only need the basic parameters n_neighbors, min_dist and n_components to start. I would also recommend using random_state for reproducibility.
If anyone is interested, I would very much be game to try and run this in the browser with one of the JS implementations out there.
Hi, are there any updates on the implementation of UMAP and t-SNE? My master's thesis is about dimensionality reduction and I want to add both of them to the q2-diversity plugin. Are there any other efforts underway at the moment?
Hey there @twollhoewer! Cc'ing @gwarmstrong.
For the record:
We have successfully integrated t-SNE and UMAP computation into the core q2-diversity plug-in. Happy dimensionality reduction!
https://docs.qiime2.org/2021.11/plugins/available/diversity/tsne/
https://docs.qiime2.org/2021.11/plugins/available/diversity/umap/
@gwarmstrong also published a great paper reviewing UMAP for microbiome data as well as some recommendations.