[HN Gopher] How UMAP Works
___________________________________________________________________
How UMAP Works
Author : tmalsburg2
Score : 48 points
Date : 2021-07-02 12:04 UTC (1 day ago)
(HTM) web link (umap-learn.readthedocs.io)
(TXT) w3m dump (umap-learn.readthedocs.io)
| optimalsolver wrote:
| So the reduced features can be used for inference/prediction?
|
| Also, is there some rule for choosing a) the number of dimensions
| the unreduced dataset should be mapped onto, b) the number of
| neighbors?
|
| I assume the default parameters would work for most tasks.
| paulgb wrote:
| UMAP is typically used for embeddings that get displayed to a
| human, so 2 or 3 dimensions is typical.
|
| It doesn't really suit inference/prediction because you can't
| really add new data without influencing the embedding values of
| the other data. It's not like PCA where you can learn a
| projection once and then map new data points to the same
| embedding space.
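
A minimal sketch of the PCA behaviour described above, assuming NumPy and
synthetic data (the array names and sizes are illustrative, not from the
thread): the projection is learned once from the training set and then
applied to new points without touching the existing embedding.

```python
import numpy as np

# Synthetic stand-in data; shapes are arbitrary choices for illustration.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
X_new = rng.normal(size=(3, 5))

# Learn the projection once: centre the data, then take the top two
# right singular vectors as the principal axes.
mean = X_train.mean(axis=0)
_, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
components = vt[:2]  # shape (2, 5)


def project(X):
    """Map points into the 2-D space learned from X_train."""
    return (X - mean) @ components.T


train_2d = project(X_train)
new_2d = project(X_new)  # train_2d is unaffected by this call
```

Because `project` is a fixed linear map, adding new points never changes
the coordinates already computed for the training set.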
| nope_42 wrote:
| Yes you can.
| https://umap-learn.readthedocs.io/en/latest/transform.html
| nestorD wrote:
| You can but it is both costly and a hack (the resulting
| embedding will not be as good as the one you would have
| gotten restarting from scratch). So I would not recommend
| using it in an inference pipeline.
| lmcinnes wrote:
| If this is a thing you want to be able to do efficiently
| then ParametricUMAP (see the docs at
| https://umap-learn.readthedocs.io/en/latest/parametric_umap....
| and the paper at https://arxiv.org/abs/2009.12981) will be
| very effective. It uses a neural network to learn a
| mapping directly from data to embedding space using a
| UMAP loss. Pushing new data through is only slightly more
| expensive than PCA, so being part of an inference
| pipeline is fine.
| ArnoVW wrote:
| Generally I abhor videos, but to get a "simple" version of "how
| UMAP works", I found this presentation by Leland McInnes, one of
| the creators of UMAP:
|
| https://www.youtube.com/watch?v=nq6iPZVUxZU
| andersource wrote:
| There's a beautiful visualization of prime factorization using
| UMAP: https://johnhw.github.io/umap_primes/index.md.html. I
| recommend watching the video!
| cannoneyed wrote:
| Shameless self plug here, but I saw this today and figured I
| should link to an interactive overview article "Understanding
| UMAP" my colleagues and I put together:
|
| https://pair-code.github.io/understanding-umap/
|
| UMAP is a really useful piece in the modern data science toolkit,
| and despite its power it's surprisingly simple and elegant. But
| as with all dimensionality reduction techniques, there are a lot
| of ways to misread the results. High dimensional data behaves very
| counterintuitively, and any reduction in dimensionality
| fundamentally distorts the original data in some way.
| Understanding the fundamental concepts behind UMAP and exploring
| how it works is the best way to develop an intuition of what the
| technique can and can't tell you about your data.
___________________________________________________________________
(page generated 2021-07-03 23:01 UTC)