[HN Gopher] Fun times with energy-based models
       ___________________________________________________________________
        
       Fun times with energy-based models
        
       Author : mpmisko
       Score  : 69 points
       Date   : 2024-08-16 19:16 UTC (1 day ago)
        
 (HTM) web link (mpmisko.github.io)
 (TXT) w3m dump (mpmisko.github.io)
        
       | esafak wrote:
       | I think the rationale for using tricks like score matching and
       | contrastive divergence deserves a mention: the partition function
       | is computationally expensive.
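       | 
       | Roughly: the model only defines the density up to a constant,
       | 
       |     p_theta(x) = exp(-E_theta(x)) / Z_theta,
       |     Z_theta = integral over all x of exp(-E_theta(x)) dx,
       | 
       | and evaluating Z_theta (or its gradient) exactly is intractable
       | in high dimensions, which is precisely what score matching and
       | contrastive divergence are designed to sidestep.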
       | 
       | Since we're on the subject, what are EBMs good for today?
        
         | jiggawatts wrote:
         | This paper lists the benefits in the introduction:
         | https://proceedings.neurips.cc/paper_files/paper/2019/file/3...
         | 
         | - Simplicity and Stability: An EBM is the only object that
         | needs to be trained and designed. Separate networks are not
         | tuned to ensure balance.
         | 
         | - Sharing of Statistical Strength: Since the EBM is the only
         | trained object, it requires fewer model parameters than
         | approaches that use multiple networks.
         | 
         | - Adaptive Computation Time: Implicit sample generation is an
         | iterative stochastic optimization process, which allows for a
         | trade-off between generation quality and computation time (see
         | the sketch after this list).
         | 
         | - VAEs and flow-based models are bound by the manifold
         | structure of the prior distribution and consequently have
         | issues modelling discontinuous data manifolds, often assigning
         | probability mass to areas unwarranted by the data. EBMs avoid
         | this issue by directly modelling particular regions as high or
         | low energy.
         | 
         | - Compositionality: If we think of energy functions as costs
         | for certain goals or constraints, summation of two or more
         | energies corresponds to satisfying all their goals or
         | constraints.
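         | 
         | For the "adaptive computation time" point: implicit generation
         | typically means Langevin dynamics on the energy. A minimal
         | sketch in numpy, assuming some trained callable
         | `energy_grad(x)` that returns dE/dx:
         | 
         |     import numpy as np
         | 
         |     def langevin_sample(energy_grad, x0, steps=100, eps=1e-2):
         |         # Noisy gradient descent on the energy: more steps
         |         # trade extra compute for better samples.
         |         x = np.array(x0, dtype=float)
         |         for _ in range(steps):
         |             noise = np.random.randn(*x.shape)
         |             x = x - eps * energy_grad(x)
         |             x = x + np.sqrt(2 * eps) * noise
         |         return x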
        
           | programjames wrote:
           | As far as I can tell, flow-based models are bound by the
           | exact same requirements as energy based models (flow =
           | diffusion/normalizing flow/flow-matching models). But they're
            | absolutely right about VAEs. Those are a memetic virus that
            | needs to die off in favor of more theoretically grounded
           | encoders.
        
         | mpmisko wrote:
         | EBMs show up all over the place, apparently even your
         | classifier is an EBM :) (https://arxiv.org/abs/1912.03263).
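         | 
         | The trick in that paper is to reuse the classifier logits
         | f(x)[y] as negative energies, so roughly
         | 
         |     p(x, y) ∝ exp(f(x)[y]),   p(x) ∝ sum_y exp(f(x)[y]),
         | 
         | i.e. the LogSumExp of the logits is an unnormalized log-density
         | over inputs, while the usual softmax over y still gives p(y|x).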
        
           | uoaei wrote:
           | You can take many equivalent perspectives on learning
           | systems, but mostly it reduces to "messing with denominators
           | in Bayes' rule". This is no different.
           | 
           | EBMs today aren't used because first you have to fit the
           | joint model, then you have to fix some inputs, then fit the
           | other inputs in a second optimization step. That's just too
           | much compute for today's workloads compared to feedforward
           | NNs.
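           | 
           | Concretely, a sketch of that second optimization step,
           | assuming a fitted joint energy with gradient `dE_dy(x, y)`
           | (names here are illustrative):
           | 
           |     import numpy as np
           | 
           |     def infer_y(dE_dy, x_fixed, y0, steps=200, lr=1e-2):
           |         # Inference is itself an optimization: descend the
           |         # energy in y while x stays clamped.
           |         y = np.array(y0, dtype=float)
           |         for _ in range(steps):
           |             y = y - lr * dE_dy(x_fixed, y)
           |         return y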
        
         | programjames wrote:
         | They're good for reinforcement learning. E.g. Cicero uses piKL
         | which samples according to
         | 
         | p ∝ anchor_policy * exp(utility / temperature)
         | 
         | The utility is exactly the same as "energy". The article
         | ignores entropy, but you can add in entropy regularization e.g.
         | in soft actor-critic.
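         | 
         | A rough sketch of that sampling rule for a discrete action set
         | (illustrative code, not Cicero's actual implementation):
         | 
         |     import numpy as np
         | 
         |     def pikl_policy(anchor_probs, utilities, temp=1.0):
         |         # p(a) ∝ anchor(a) * exp(utility(a) / temp)
         |         logits = np.log(anchor_probs) + utilities / temp
         |         logits -= logits.max()  # numerical stability
         |         p = np.exp(logits)
         |         return p / p.sum()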
        
         | slashdave wrote:
         | Also: you are free to model p(x) without worrying about
         | normalization, something that would be required to maximize
         | likelihood.
        
       | blt wrote:
       | If the author is reading: In the proof, at the end of Step 6,
       | it's confusing that the "uv" term of the integration by parts is
       | suddenly given a range from -∞ to ∞, as if we had previously
       | assumed x ∈ R. But elsewhere in the article, including the
       | examples, we have higher-dimensional x's. I suggest to either 1)
       | include the full multidimensional version from the paper, or 2)
       | explicitly mention that this is the simple 1d case, the same
       | result holds in R^n, and refer to the paper.
        
         | cshimmin wrote:
         | In the multidimensional case of integration by parts, the limit
         | of integration is understood to be the (d-1 dimensional)
         | boundary at infinity, so writing +/-inf is a reasonable
         | shorthand. In almost all cases, it's used when the term uv can
         | be assumed to be zero (or at least constant) at this boundary.
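         | 
         | In the 1d case the step in question is just
         | 
         |     ∫ u v' dx = [u v] from -∞ to ∞  -  ∫ u' v dx,
         | 
         | where the bracketed boundary term is dropped on the assumption
         | that the density (and hence u*v) vanishes as |x| → ∞; in R^n
         | the bracket becomes a surface integral over that boundary at
         | infinity.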
        
       | adamnemecek wrote:
       | We are working on a startup that is revisiting the math that
       | underlies EBMs. If you want to work with us or invest, check out
       | these links
       | 
       | http://traceoid.ai
       | 
       | https://x.com/adamnemecek1/status/1822727041399328839
        
         | uoaei wrote:
         | Is there a chance that you would ever consider passionate non-
         | PhD candidates with corporate, startup, and FAANG experience?
        
       | stubbi wrote:
       | Interesting. Having studied them during my Masters, I feel that
       | in the longer term EBMs will be the way forward for AI.
        
       | uoaei wrote:
       | I wish I had more opportunity to use EBMs. Joint distributions
       | seem to be more relevant to our epistemology (vis a vis data and
       | what we can say about it) than conditional ones. But the
       | optimization procedure for fitting parameters is kind of a
       | dealbreaker because of how many steps it can take, and also
       | because most ML frameworks are aggressively feedforward.
        
       ___________________________________________________________________
       (page generated 2024-08-17 23:02 UTC)