[HN Gopher] Contrastive Representation Learning
       ___________________________________________________________________
        
       Contrastive Representation Learning
        
       Author : yamrzou
       Score  : 64 points
       Date   : 2021-07-12 17:36 UTC (5 hours ago)
        
 (HTM) web link (lilianweng.github.io)
 (TXT) w3m dump (lilianweng.github.io)
        
       | version_five wrote:
        | I've done some work with contrastive learning, and I see pros
        | and cons. You are essentially trading off direct supervision
        | for other assumptions about invariances in the data. This
        | often works very well, but I have also seen cases where
        | contrastive learning fails to latch on to the features you
        | want it to, and you end up effectively doing feature
        | engineering / preprocessing to highlight what you want the
        | model to notice.
       | 
        | So bottom line, I think CL is a specific instance of finding a
        | simple rule or pattern that we can use to label or select
        | features that work for many tasks, but that's pretty much all
        | it is. I think it's good progress to be able to find some of
        | these simple core rules about what ML models are really
        | noticing.
        
         | jszymborski wrote:
         | I think our ability to consistently train good CL models will
         | get a lot better if/when we better understand how to
         | disentangle representations.
         | 
         | There's already been great progress, but the better we're able
         | to create meaningful latent spaces, the better we're going to
         | get at CL (maybe that's a self-evident statement :P ).
        
           | version_five wrote:
            | I think I agree, but do you think that getting a more
            | meaningful latent space will in some way just take us back
            | to classical kinds of models? (My background is image
            | processing, so that's what I'm thinking of.) If we can
            | have a semantically relevant latent space, that is
            | definitely a win, but it is also something of a step back
            | towards rules about what we expect to see, versus letting
            | training figure it out. (And the semantically relevant
            | features may still themselves be found opaquely.) I'm not
            | sure how to think about all this, but I worry about a
            | "turtles all the way down" situation where some higher-
            | level understanding is gained at the expense of lower-
            | level understanding.
        
       | maxs wrote:
       | I don't quite understand how this works in an unsupervised
       | setting.
       | 
        | The only thing that comes to mind is an embedding that
        | preserves distances, such as MDS
        | (https://en.wikipedia.org/wiki/Multidimensional_scaling#Metri...)
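        | 
        | For reference, a minimal sketch of what I mean by a distance-
        | preserving embedding, using scikit-learn's metric MDS (the
        | distance matrix here is made up for illustration):
        | 
        |   import numpy as np
        |   from sklearn.manifold import MDS
        | 
        |   # Pairwise dissimilarities between 4 items (symmetric,
        |   # zero diagonal)
        |   D = np.array([[0.0, 1.0, 2.0, 3.0],
        |                 [1.0, 0.0, 1.5, 2.5],
        |                 [2.0, 1.5, 0.0, 1.0],
        |                 [3.0, 2.5, 1.0, 0.0]])
        | 
        |   # Metric MDS finds 2-D coordinates whose pairwise distances
        |   # approximate D as closely as possible (stress minimization)
        |   coords = MDS(n_components=2,
        |                dissimilarity="precomputed").fit_transform(D)
        |   print(coords.shape)  # (4, 2)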
        
         | adw wrote:
          | One intuition is that you can generate pairs which you know
          | to be the "same thing" (a single example under heavy
          | augmentation) and ensure they're close in representation
          | space, while mismatched pairs are pushed far apart.
         | 
          | That's a label-free approach which should give you a space
          | with nice properties for e.g. nearest-neighbor approaches,
          | and it follows that there's some reason to believe it'd be a
          | generally useful feature space for downstream problems.
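          | 
          | A minimal sketch of that pull-together / push-apart idea as
          | an InfoNCE-style loss in PyTorch (the function name and
          | temperature value are illustrative, not from the article):
          | 
          |   import torch
          |   import torch.nn.functional as F
          | 
          |   def info_nce(z1, z2, temperature=0.1):
          |       # z1, z2: (N, D) embeddings of two augmented views of
          |       # the same N examples. Row i of z1 and row i of z2 are
          |       # a positive pair; all other rows act as negatives.
          |       z1 = F.normalize(z1, dim=1)
          |       z2 = F.normalize(z2, dim=1)
          |       logits = z1 @ z2.t() / temperature  # (N, N) cos sims
          |       labels = torch.arange(z1.size(0))   # positives on diag
          |       return F.cross_entropy(logits, labels)
          | 
          |   # Two batches of embeddings standing in for two random
          |   # augmentations of the same 8 inputs
          |   z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
          |   loss = info_nce(z1, z2)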
        
       | andrewtbham wrote:
        | Seems like NNCLR should be covered here as well.
       | 
       | https://arxiv.org/pdf/2104.14548.pdf
        
       | joe_the_user wrote:
       | Seems like a cool concept.
       | 
        | At the same time, it seems like one encounters a fundamental
        | problem in going from a plain deep learning paradigm to a
        | learning-to-learn paradigm.
       | 
        | For regular deep learning, you gather enough data to allow
        | massive, brute-force curve-fitting that reproduces the
        | patterns within the data. But even with this, you encounter
        | the problem of finding bogus patterns as well as useful ones,
        | and also the problem of the data changing over time.
       | 
        | Now, in adding "learning to learn" approaches on top of deep
        | learning, you are also doing brute-force curve-fitting, this
        | time to discover the transformations between data pairs, or
        | whatever similar things are involved when change or new stuff
        | arrives. But this too is dependent on a massive data set; it
        | might learn the wrong things, and the kind of change involved
        | might itself change. That's a more fundamental problem for a
        | learning-to-learn system, because these systems are the ones
        | that are expected to deal with new data.
       | 
        | I've heard one-shot/zero-shot learning still hasn't found many
        | applications for these reasons. Maybe the answer is systems
        | using truly massive datasets, like GPT-3.
        
         | BobbyJo wrote:
         | "Learning to learn" with massive amounts of data feels like it
         | might be more inline with nature. Human learning is based on
         | millions of different learning strategies that were encoded
         | into the neurons of each of our ancestors and applied to
         | billions and billions of lifetimes of data. The structure of
         | our brain, and therefore how we learn, was itself learned over
         | millions of generation of trial and error.
        
           | joe_the_user wrote:
            | Every day, people deal effectively with new and unknown
            | situations. Some are new as in never-seen-before, some
            | are new as in a variation of what came before, and some
            | are a combination.
           | 
            | Maybe it took millions of years to come up with this
            | algorithm, but it seems like the approach is more than
            | just some long incremental thing.
           | 
            | Deer are the product of millions of years of evolution
            | too. Deer never learn to look both ways before crossing a
            | highway, though they can learn a significant number of
            | other things.
        
             | space_fountain wrote:
              | I'm not sure what the point is. Are you saying that
              | because deer don't have what you might call generalized
              | intelligence, a data-driven learned approach can't or
              | won't get there? I think most people agree that humans
              | are smarter than deer, and there is probably some
              | importance to the conditions that molded us, but it
              | still seems like our intelligence is "just" the result
              | of learning to learn.
        
             | BobbyJo wrote:
              | True, but the cost function deer are optimizing for
              | diverged from our own millions of years ago. Who's to
              | say how much of our intelligence comes from the prior
              | epoch, and how much comes from the latter?
        
       ___________________________________________________________________
       (page generated 2021-07-12 23:00 UTC)