Hi, does Qiime2 already have plugins for alternative ordination techniques to PCoA like t-SNE or UMAP? I might have a bachelor student interested in implementing a plugin for this as part of their studies. Would that be welcome, or are there already other ongoing efforts?
Thanks for reaching out.
t-SNE: this has been on my mind for a while. I’ve been meaning to wrap it in q2-sample-classifier but have not gotten around to starting on it. I’d welcome you to grab that issue or make a new plugin for this!
UMAP: a quick search shows me that @gwarmstrong may be working on a plugin for this. @gwarmstrong, is that still in development? Let us know if @Stefan can get involved!
Hi @Nicholas_Bokulich, thanks for the prompt reply. I very much like the interactive exploration via Emperor, so I thought of having something that replaces https://docs.qiime2.org/2020.2/plugins/available/diversity/pcoa/ with either t-SNE or UMAP (and maybe others as well). What would be the best place to add this functionality? Which hyperparameters should we expose? I figure it would be best to wrap https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html
@gwarmstrong any help is very welcome. Let me know if you already made some design decisions for UMAP, we might just want to copy and paste to ensure a consistent API.
Indeed! I like your plan, and having these methods output an ordination result of some kind would allow you to use this as input for emperor or other methods. Note that q2-emperor takes a PCoAResults artifact as input, so let’s get @yoshiki and @ebolyen in on this conversation: should we change emperor to accept a different sort of input, e.g., OrdinationResults? Or cheat and have t-SNE/UMAP output a
As I mentioned, you would be very welcome to put this in q2-sample-classifier following that open issue above, unless you want to create your own new plugin for this.
Sounds good, that’s what I was planning to use in sample classifier. I think all of the options for sklearn.manifold.TSNE are worth exposing, but set useful defaults so that users don’t need to fiddle too much to get something usable.
I’d recommend accepting a distance matrix as input… then any distance metric can be used, including metrics like unifrac that aren’t available in sklearn. Actually, accepting a PCoAResults artifact as input could also be useful (per the note on that page that “It is highly recommended to use another dimensionality reduction method…”). So many possibilities!
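For what it's worth, here is a minimal sketch of the distance-matrix route using plain scikit-learn and SciPy rather than any QIIME 2 API. The count table is made-up random data, Bray-Curtis stands in for whatever metric you would actually use, and the perplexity value is an arbitrary choice for a tiny example, not a recommended default.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import TSNE

# Made-up samples x features count table (assumption, for illustration only).
rng = np.random.default_rng(0)
table = rng.poisson(5, size=(30, 12))

# Any square distance matrix works here; Bray-Curtis is just a stand-in.
dm = squareform(pdist(table, metric="braycurtis"))

tsne = TSNE(
    n_components=2,
    metric="precomputed",  # treat the input as pairwise distances
    init="random",         # PCA init is not allowed with precomputed distances
    perplexity=5,          # must be smaller than the number of samples
    random_state=0,        # fixed seed so repeated runs agree
)
embedding = tsne.fit_transform(dm)  # array of shape (n_samples, 2)
```

A plugin would presumably take the matrix from a DistanceMatrix artifact instead of computing it inline, but the call into TSNE should look much the same.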
I like the idea of displaying t-SNE results using Emperor. The OrdinationResults object is rather flexible, and can probably do the job. However, I think it would also be fine to use a different format if that made more sense. In terms of the type, I think having a DimensionalityReduction parent type might make sense. Worst case, we can always have a qiime emperor plot-tsne visualization and handle a new type directly.
I am happy to help with testing, and debugging any visual artifacts that might come up on the plotting side of things.
Regarding the type: I think the sklearn term is “embedding” for the general result of any dimensionality reduction method. I don’t want to break the current q2-Emperor input, but to me it looks like we would make q2-Emperor accept either an embedding (sklearn parlance) or an OrdinationResult (skbio parlance). Technically, the current format for Emperor should directly support t-SNE, MDS or others. I would welcome @Nicholas_Bokulich making a decision here, as you have the best overview of all the data types in q2.
@yoshiki thanks for your help! From what I saw, t-SNE and UMAP are typically used to produce 2D plots. I tried it with Emperor and it worked, but the default sphere radius is too large. Is there a way to default to a smaller one, maybe via the input file?
@ebolyen and I chatted out-of-loop and we think that you should just output a PCoAResults artifact for now… we can always update the method and q2-emperor later on to output/input a tSNEOrdination or some other more specific type if necessary.
I have not actively been developing the plugin since the initial prototype a few months back. I would be happy to provide input on what I have done!
I think the author’s implementation and documentation of UMAP is a good place to start. IIRC, there are upwards of 20 parameters to umap.UMAP, but you probably really only need the basic parameters: min_dist to start. I would also recommend using random_state for reproducibility.
I am not sure that a consistent API with what I wrote is really necessary, AFAIK no one is using the plugin-draft I wrote.
Totally agree with this! In the plugin I wrote, I ended up exposing two avenues for interacting with umap.UMAP, one that would use a feature table and one that uses a distance matrix. IIRC you could actually just have one interface that accepts something like (FeatureTable, Choices([<list of metrics>])) or (DistanceMatrix, Choices(['precomputed'])) with TypeMap! lmk if you want more guidance here.
Typically in the publications I have seen, these methods are used to make 2D plots. You can use them to make 3D plots, and I was able to make some nice 3D UMAP visualizations. HOWEVER, if you make 3D plots with t-SNE or UMAP, you cannot really just take the top 2 components to make a 2D plot, like you can for PCoA. My understanding is that the objective functions for these methods do not enforce anything special about a particular axis (unlike PCoA, which orders axes by eigenvalue, an ordering that does not depend on the number of components requested).
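To make the PCoA half of that concrete, here is a small NumPy-only sketch of classical PCoA (not the skbio implementation) on made-up points. Because every axis comes from the same eigendecomposition, asking for 3 components and keeping the first 2 gives the same coordinates as asking for 2. As far as I understand, t-SNE and UMAP give no such guarantee, since they optimize all requested dimensions jointly.

```python
import numpy as np

def pcoa(dm, n_components):
    """Classical PCoA: double-center the squared distances, then eigendecompose."""
    n = dm.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dm ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]   # keep the largest ones first
    # Clip tiny negative eigenvalues caused by round-off before the square root.
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0, None))

# Made-up points and their Euclidean distance matrix (assumption, for illustration).
rng = np.random.default_rng(0)
points = rng.normal(size=(20, 5))
dm = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords_2d = pcoa(dm, 2)
coords_3d = pcoa(dm, 3)

# The leading two axes do not depend on how many components were requested.
same_leading_axes = np.allclose(coords_2d, coords_3d[:, :2])
```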
To do this via the interface:
- Go to the Scale tab in your emperor plot.
- Choose a metadata variable (doesn’t matter what). Do not check “Change scale by value”.
- Adjust the ‘global scaling’ slider.
I am not sure if there is a way to set the default while generating the plot.
This is what is done in biocore/deicode even though it is really an SVD and not a PCOA. So the precedent exists.
Yes, defaults have been an ongoing work in progress. Happy to figure something out once you have some examples.
If anyone is interested, I would very much be game to try and run this in the browser with one of the JS implementations out there.