[HN Gopher] Contrastive Representation Learning
___________________________________________________________________
Contrastive Representation Learning
Author : yamrzou
Score : 64 points
Date : 2021-07-12 17:36 UTC (5 hours ago)
(HTM) web link (lilianweng.github.io)
(TXT) w3m dump (lilianweng.github.io)
| version_five wrote:
| I've done some work with contrastive learning, and I see pros and
| cons. You are effectively trading direct supervision for other
| assumptions about invariances in the data. This often works very
| well. But I have also seen cases where contrastive learning fails
| to latch on to the features you want it to, and you end up doing
| feature engineering / preprocessing anyway, just to highlight
| what you want the model to notice.
|
| So bottom line, I think CL is a specific instance of finding a
| simple rule or pattern that we can use to label or select
| features that work for many tasks, but that's pretty much all it
| is. I think it's good progress to be able to find some of these
| simple core rules about what ML models are really noticing.
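|
| To make that concrete (my own illustration, not something from
| the article): in the SimCLR-style setups I've used, the
| "assumptions about invariances" live almost entirely in the
| augmentation pipeline. Roughly:
|
|   from torchvision import transforms
|
|   # The augmentations ARE the invariance assumptions: by pulling
|   # these views of one image together in embedding space, we
|   # assert that crops, flips and color shifts don't change what
|   # the image "is".
|   augment = transforms.Compose([
|       transforms.RandomResizedCrop(224),
|       transforms.RandomHorizontalFlip(),
|       transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
|       transforms.RandomGrayscale(p=0.2),
|       transforms.ToTensor(),
|   ])
|   # Two independent draws give one "positive pair":
|   # view1, view2 = augment(img), augment(img)
|
| If the feature you care about is something these transforms
| destroy, the model has no reason to keep it, which is where the
| feature-engineering workarounds come in.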
| jszymborski wrote:
| I think our ability to consistently train good CL models will
| get a lot better if/when we better understand how to
| disentangle representations.
|
| There's already been great progress, but the better we're able
| to create meaningful latent spaces, the better we're going to
| get at CL (maybe that's a self-evident statement :P ).
| version_five wrote:
| I think I agree, but do you think that in some way getting a
| more meaningful latent space will just take us back to
| classical kinds of models (my background is image processing
| so that's what I'm thinking of). Like if we can have a
| semantically relevant latent space, it is definitely a win,
| but it also sort of is a step back towards rules about what
| we expect to see, vs letting training figure it out. (And,
| the semantically relevant features may still themselves be
| found opaquely). I'm not sure how to think about all this,
| but I worry about a "turtles all the way down" situation
| where some higher level understanding is gained at the
| expense of lower level understanding.
| maxs wrote:
| I don't quite understand how this works in an unsupervised
| setting.
|
| The only thing that comes to mind is an embedding that preserves
| distances, such as MDS
| (https://en.wikipedia.org/wiki/Multidimensional_scaling#Metri...)
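|
| (For concreteness, what I mean by that is a distance-preserving
| layout like scikit-learn's metric MDS; toy sketch, illustrative
| data only:
|
|   import numpy as np
|   from sklearn.manifold import MDS
|
|   X = np.random.rand(100, 20)      # toy high-dimensional points
|   mds = MDS(n_components=2, metric=True, random_state=0)
|   X_low = mds.fit_transform(X)     # 2-D layout that tries to
|                                    # preserve pairwise distances
|   print(X_low.shape)               # (100, 2)
|
| No labels anywhere, but also no learned encoder you could reuse
| on new data, which is where I lose the thread on the contrastive
| version.)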
| adw wrote:
| One intuition is that you can generate pairs which you know to
| be the "same thing" (a single example under heavy augmentation)
| and ensure they're close in representation space, while pushing
| mismatched pairs far apart.
|
| That's a label-free approach which should give you a space with
| nice properties for e.g. nearest-neighbor methods, and it
| follows that there's some reason to believe it'd be a generally
| useful feature space for downstream problems.
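|
| As a rough sketch (my own, not from the post), the standard
| SimCLR-style loss for that looks something like this in PyTorch,
| where z1 and z2 are the embeddings of two augmented views of the
| same batch:
|
|   import torch
|   import torch.nn.functional as F
|
|   def nt_xent(z1, z2, temperature=0.5):
|       # z1, z2: (N, D) embeddings of the same N examples under
|       # two augmentations. Matched rows are pulled together;
|       # every other row in the batch acts as a negative.
|       z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)
|       sim = z @ z.t() / temperature      # cosine similarities
|       n = z1.shape[0]
|       sim.fill_diagonal_(float('-inf'))  # drop self-similarity
|       # row i's positive sits at row i + N (and vice versa)
|       targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
|       return F.cross_entropy(sim, targets)
|
| Minimizing that cross-entropy is exactly "pull matched pairs
| together, push everything else apart", with the temperature
| controlling how hard the negatives are weighted.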
| andrewtbham wrote:
| Seems like NNCLR would be covered here too.
|
| https://arxiv.org/pdf/2104.14548.pdf
| joe_the_user wrote:
| Seems like a cool concept.
|
| At the same time, it seems like one encounters a fundamental
| problem in going from a deep learning paradigm to a
| learning-to-learn paradigm.
|
| For regular deep learning, you gather enough data to allow
| massive, brute-force curve-fitting that reproduces the patterns
| within the data. But even with this, you encounter the problem
| of finding bogus patterns as well as useful patterns in the
| data, and also the problem of the data changing over time.
|
| Now, in adding "learning to learn" approaches to deep learning,
| you are also doing brute-force curve-fitting, this time to
| discover the transformations between data pairs, or whatever
| else is involved in change and in new stuff arriving. But this
| too is dependent on a massive data set; it might learn the wrong
| things, and the kind of change involved might itself change.
| That's a more fundamental problem for the learning-to-learn
| system, because these systems are the ones expected to deal with
| new data.
|
| I've heard one-shot/zero-shot learning still hasn't found many
| applications for these reasons. Maybe the answer is systems
| trained on truly massive datasets, like GPT-3.
| BobbyJo wrote:
| "Learning to learn" with massive amounts of data feels like it
| might be more inline with nature. Human learning is based on
| millions of different learning strategies that were encoded
| into the neurons of each of our ancestors and applied to
| billions and billions of lifetimes of data. The structure of
| our brain, and therefore how we learn, was itself learned over
| millions of generation of trial and error.
| joe_the_user wrote:
| People deal effectively with new and unknown situations every
| day. Some are new as in never-seen-before, some are new as in a
| variation of what came before, and some are a combination.
|
| Maybe it took millions of years to come up with this
| algorithm but it seems like the approach is more than just
| some long incremental thing.
|
| Deer are the product of millions of years of evolution also.
| Deer never learn to look both ways before crossing a highway,
| though they can learn a significant number of other things.
| space_fountain wrote:
| I'm not sure what the point is. Are you saying that because
| deer don't have what you might call generalized intelligence,
| a data-driven, learned approach can't or won't produce it?
| I think most people agree that humans are smarter than deer,
| and there is probably some importance to the conditions that
| molded us, but it still seems like our intelligence is "just"
| the result of learning to learn.
| BobbyJo wrote:
| True, but the cost function deer are optimizing for diverged
| from our own millions of years ago. Who's to say how much of
| our intelligence comes from the earlier epoch, and how much
| from the later one?
___________________________________________________________________
(page generated 2021-07-12 23:00 UTC)