[HN Gopher] Fun times with energy-based models
___________________________________________________________________
Fun times with energy-based models
Author : mpmisko
Score : 69 points
Date : 2024-08-16 19:16 UTC (1 day ago)
(HTM) web link (mpmisko.github.io)
(TXT) w3m dump (mpmisko.github.io)
| esafak wrote:
| I think the rationale for using tricks like score matching and
| contrastive divergence deserves a mention: the partition function
| is computationally expensive.
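|
| To spell out why (a sketch, not from the article): for
| p_\theta(x) = \exp(-E_\theta(x)) / Z_\theta, the maximum-likelihood
| gradient is
|
|     \nabla_\theta \log p_\theta(x)
|         = -\nabla_\theta E_\theta(x)
|           + \mathbb{E}_{x' \sim p_\theta}\big[\nabla_\theta E_\theta(x')\big]
|
| The second term is the gradient of \log Z_\theta and needs samples
| from the model itself; contrastive divergence approximates it with
| a few MCMC steps, while score matching avoids Z_\theta altogether,
| since the score \nabla_x \log p_\theta(x) = -\nabla_x E_\theta(x)
| does not depend on it.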
|
| Since we're on the subject, what are EBMs good for today?
| jiggawatts wrote:
| This paper lists the benefits in the introduction:
| https://proceedings.neurips.cc/paper_files/paper/2019/file/3...
|
| - Simplicity and Stability: An EBM is the only object that
| needs to be trained and designed; there are no separate
| networks that must be tuned against each other to stay balanced.
|
| - Sharing of Statistical Strength: Since the EBM is the only
| trained object, it requires fewer model parameters than
| approaches that use multiple networks.
|
| - Adaptive Computation Time: Implicit sample generation is an
| iterative stochastic optimization process, which allows for a
| trade-off between generation quality and computation time.
|
| - VAEs and flow-based models are bound by the manifold
| structure of the prior distribution and consequently have
| issues modelling discontinuous data manifolds, often assigning
| probability mass to areas unwarranted by the data. EBMs avoid
| this issue by directly modelling particular regions as high or
| low energy.
|
| - Compositionality: If we think of energy functions as costs
| for certain goals or constraints, summing two or more
| energies corresponds to satisfying all of their goals or
| constraints (see the sketch after this list).
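|
| As a rough illustration of the compositionality and adaptive-
| computation points (a minimal sketch, not from the paper; the
| energies and step counts below are made up): sampling from an EBM
| can be done with Langevin dynamics on the energy, so running more
| steps buys better samples, and summing two energies yields points
| that satisfy both constraints.
|
|     import torch
|
|     def energy_a(x):   # hypothetical constraint A
|         return ((x - 2.0) ** 2).sum(dim=-1)
|
|     def energy_b(x):   # hypothetical constraint B
|         return ((x + 1.0) ** 2).sum(dim=-1)
|
|     def langevin_sample(energy_fn, n=64, dim=2, steps=200, eps=1e-2):
|         # More steps -> better samples: the quality/compute trade-off.
|         x = torch.randn(n, dim, requires_grad=True)
|         for _ in range(steps):
|             grad, = torch.autograd.grad(energy_fn(x).sum(), x)
|             x = (x - 0.5 * eps * grad
|                  + eps ** 0.5 * torch.randn_like(x))
|             x = x.detach().requires_grad_(True)
|         return x.detach()
|
|     # Compose the two constraints by summing their energies.
|     samples = langevin_sample(lambda x: energy_a(x) + energy_b(x))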
| programjames wrote:
| As far as I can tell, flow-based models are bound by the
| exact same requirements as energy based models (flow =
| diffusion/normalizing flow/flow-matching models). But they're
| absolutely right about VAEs. Those are a memetic virus that
| needs to die off in favor of more theoretically grounded
| encoders.
| mpmisko wrote:
| EBMs show up all over the place; apparently even your
| classifier is an EBM :) (https://arxiv.org/abs/1912.03263).
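|
| A rough sketch of the reinterpretation in that paper
| (arXiv:1912.03263), with `classifier` standing in for any
| hypothetical module that maps inputs to logits:
|
|     import torch
|
|     def energy_from_logits(classifier, x):
|         logits = classifier(x)                  # (batch, n_classes)
|         # Reinterpret the logits: p(x) ∝ sum_y exp(f(x)[y]),
|         # i.e. E(x) = -logsumexp_y f(x)[y].
|         return -torch.logsumexp(logits, dim=-1)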
| uoaei wrote:
| You can take many equivalent perspectives on learning
| systems, but mostly it reduces to "messing with denominators
| in Bayes' rule". This is no different.
|
| EBMs aren't used much today because you first have to fit the
| joint model, then fix some inputs and solve for the remaining
| ones in a second optimization step. That's just too much
| compute for today's workloads compared to feedforward NNs.
| programjames wrote:
| They're good for reinforcement learning. E.g. Cicero uses piKL
| which samples according to
|
| p ∝ anchor_policy * exp(utility / temperature)
|
| The utility plays exactly the role of a (negative) "energy". The
| article ignores entropy, but you can add entropy regularization,
| as in soft actor-critic.
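|
| A minimal sketch of that sampling rule (the tensors below are made
| up; this is just the formula above, computed in log space):
|
|     import torch
|
|     def pikl_style_policy(anchor_probs, utilities, temperature=1.0):
|         # p(a) ∝ anchor(a) * exp(utility(a) / temperature)
|         logits = torch.log(anchor_probs) + utilities / temperature
|         return torch.softmax(logits, dim=-1)
|
|     anchor = torch.tensor([0.4, 0.3, 0.2, 0.1])   # hypothetical anchor policy
|     utility = torch.tensor([1.0, 2.0, 0.5, 0.0])  # hypothetical utilities
|     probs = pikl_style_policy(anchor, utility, temperature=0.5)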
| slashdave wrote:
| Also: you are free to model p(x) without worrying about
| normalization, something that would be required to maximize
| likelihood.
| blt wrote:
| If the author is reading: In the proof, at the end of Step 6,
| it's confusing that the "uv" term of the integration by parts is
| suddenly given a range from -∞ to ∞, as if we had previously
| assumed x ∈ R. But elsewhere in the article, including the
| examples, we have higher-dimensional x's. I suggest either 1)
| including the full multidimensional version from the paper, or 2)
| explicitly mentioning that this is the simple 1d case, that the
| same result holds in R^n, and referring to the paper.
| cshimmin wrote:
| In the multidimensional case of integration by parts, the limit
| of integration is understood to be the (d-1 dimensional)
| boundary _at infinity_, so writing +/-inf is a reasonable
| shorthand. In almost all cases, it's used when the term uv can
| be assumed to be zero (or at least constant) at this boundary.
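|
| For reference, the 1d step being discussed (a sketch, with
| s_\theta denoting the model score and assuming
| p(x) s_\theta(x) -> 0 as |x| -> \infty):
|
|     \int_{-\infty}^{\infty} \partial_x p(x)\, s_\theta(x)\, dx
|         = \big[ p(x)\, s_\theta(x) \big]_{-\infty}^{\infty}
|           - \int_{-\infty}^{\infty} p(x)\, \partial_x s_\theta(x)\, dx
|
| The "uv" boundary term is the bracketed piece and drops out under
| that decay assumption; in R^n the same argument runs through the
| divergence theorem, with the boundary at infinity that cshimmin
| describes.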
| adamnemecek wrote:
| We are working on a startup that is revisiting the math that
| underlies EBMs. If you want to work with us or invest, check out
| these links
|
| http://traceoid.ai
|
| https://x.com/adamnemecek1/status/1822727041399328839
| uoaei wrote:
| Is there a chance that you would ever consider passionate non-
| PhD candidates with corporate, startup, and FAANG experience?
| stubbi wrote:
| Interesting. Having studied them during my Masters, I feel that in
| the longer term EBMs will be the way forward for AI.
| uoaei wrote:
| I wish I had more opportunity to use EBMs. Joint distributions
| seem to be more relevant to our epistemology (vis-à-vis data and
| what we can say about it) than conditional ones. But the
| optimization procedure for fitting parameters is kind of a
| dealbreaker because of how many steps it can take, and also
| because most ML frameworks are aggressively feedforward.
___________________________________________________________________
(page generated 2024-08-17 23:02 UTC)