[HN Gopher] How UMAP Works
       ___________________________________________________________________
        
       How UMAP Works
        
       Author : tmalsburg2
       Score  : 48 points
       Date   : 2021-07-02 12:04 UTC (1 day ago)
        
 (HTM) web link (umap-learn.readthedocs.io)
 (TXT) w3m dump (umap-learn.readthedocs.io)
        
       | optimalsolver wrote:
       | So the reduced features can be used for inference/prediction?
       | 
        | Also, is there some rule for choosing a) the number of dimensions
        | the unreduced dataset should be mapped onto, b) the number of
        | neighbors?
       | 
       | I assume the default parameters would work for most tasks.
        
         | paulgb wrote:
         | UMAP is typically used for embeddings that get displayed to a
         | human, so 2 or 3 dimensions is typical.
         | 
          | It doesn't really suit inference/prediction, because you
          | can't add new data without influencing the embedding values
          | of the other data. It's not like PCA, where you can learn a
          | projection once and then map new data points into the same
          | embedding space.
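For contrast, here is a minimal numpy-only sketch of the PCA-style workflow described above: learn a fixed linear projection once, then map unseen points into the same space without touching the existing embeddings. The data and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 10))   # data the projection is learned from
X_new = rng.normal(size=(5, 10))       # later-arriving, unseen data

# Learn a fixed projection from the training data via SVD.
mean = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
components = Vt[:2]                    # top 2 principal directions

# New points go through the same projection; the training
# embeddings are unaffected -- unlike a joint UMAP refit.
emb_train = (X_train - mean) @ components.T
emb_new = (X_new - mean) @ components.T
print(emb_train.shape, emb_new.shape)  # (100, 2) (5, 2)
```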
        
           | nope_42 wrote:
            | Yes you can. https://umap-learn.readthedocs.io/en/latest/transform.html
        
             | nestorD wrote:
             | You can but it is both costly and a hack (the resulting
             | embedding will not be as good as the one you would have
             | gotten restarting from scratch). So I would not recommend
             | using it in an inference pipeline.
        
               | lmcinnes wrote:
               | If this is a thing you want to be able to do efficiently
                | then ParametricUMAP (see
                | [docs](https://umap-learn.readthedocs.io/en/latest/parametric_umap....)
                | and [the paper](https://arxiv.org/abs/2009.12981)) will be
               | very effective. It uses a neural network to learn a
               | mapping directly from data to embedding space using a
               | UMAP loss. Pushing new data through is only slightly more
               | expensive than PCA, so being part of an inference
               | pipeline is fine.
        
       | ArnoVW wrote:
        | Generally I abhor videos, but for a "simple" version of "how
        | UMAP works", I recommend this presentation by Leland McInnes,
        | one of the creators of UMAP.
       | 
       | https://www.youtube.com/watch?v=nq6iPZVUxZU
        
       | andersource wrote:
       | There's a beautiful visualization of prime factorization using
       | UMAP: https://johnhw.github.io/umap_primes/index.md.html. I
       | recommend watching the video!
        
       | cannoneyed wrote:
        | Shameless self-plug here, but I saw this today and figured I
        | should link to "Understanding UMAP", an interactive overview
        | article my colleagues and I put together:
       | 
       | https://pair-code.github.io/understanding-umap/
       | 
        | UMAP is a really useful piece of the modern data science
        | toolkit, and despite its power it's surprisingly simple and
        | elegant. But as with all dimensionality reduction techniques,
        | there are many ways to misread the results. High-dimensional
        | data behaves very counterintuitively, and any reduction in
        | dimensionality fundamentally distorts the original data in
        | some way. Understanding the fundamental concepts behind UMAP
        | and exploring how it works is the best way to develop an
        | intuition for what the technique can and can't tell you about
        | your data.
        
       ___________________________________________________________________
       (page generated 2021-07-03 23:01 UTC)