[HN Gopher] Diffusion Forcing: Next-Token Prediction Meets Full-...
___________________________________________________________________
Diffusion Forcing: Next-Token Prediction Meets Full-Sequence
Diffusion
Author : magoghm
Score : 154 points
Date : 2024-07-04 02:09 UTC (20 hours ago)
(HTM) web link (boyuan.space)
(TXT) w3m dump (boyuan.space)
| blovescoffee wrote:
| Am I missing something about training time? Does adding per-token
| noise cause training to slow down significantly? Cool paper though!
| luke-stanley wrote:
| Anyone know of research or tools for using an existing text-
| generating LLM with diffusion-like techniques, with no new pre-
| training, or at most a bit of fine-tuning, such that it works
| with a small GPT / Phi-3 / Qwen model, for example? I know about
| Tree of Thoughts with MCTS etc., which are somewhat similar
| (though often with a different learned reward goal), but I'm
| interested in something closer to token-level generation. Is this
| possible?
| vessenes wrote:
| A number of ideas seem notable to me here. First, they are
| merging the idea of sequence masking (the key training idea for
| LLMs) with diffusion models; they do this by keeping track of an
| 'uncertainty' level per token. This 'uncertainty' level is
| treated as the 'noise' level for the diffusion model (a model
| which denoises, conditioned on some sort of noise-level
| embedding).
|
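| A minimal sketch of what that training step might look like (my
| reconstruction, not the authors' code; the `model` signature and
| the DDPM-style `alpha_bar` schedule are assumptions):
|
|     import torch
|
|     def training_step(model, x0, alpha_bar, K=1000):
|         # x0: clean token sequence, shape (batch, seq_len, dim)
|         # alpha_bar: (K,) cumulative noise schedule, as in DDPM
|         B, T, _ = x0.shape
|         # the key trick: an independent noise level per token,
|         # not one shared level for the whole sequence
|         k = torch.randint(0, K, (B, T), device=x0.device)
|         ab = alpha_bar[k].unsqueeze(-1)              # (B, T, 1)
|         eps = torch.randn_like(x0)
|         # noise each token to its own level
|         x_noisy = ab.sqrt() * x0 + (1 - ab).sqrt() * eps
|         # the model is conditioned on each token's noise level
|         # (the 'uncertainty' embedding) and predicts the noise;
|         # attention is causal along the sequence axis
|         eps_pred = model(x_noisy, k)
|         return torch.nn.functional.mse_loss(eps_pred, eps)
|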
| There are a bunch of neat things you can do with this: in
| particular, you can firm up parts of the image earlier than
| others, and thus use it for, say, maze solving. They even show it
| controlling a robot arm moving fruit around, which is pretty
| wild.
|
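| At sampling time, a hedged sketch of how a per-token schedule
| could firm up near tokens before far ones (`model` and
| `alpha_bar` as in the training sketch above; the exact schedule
| shape here is my own guess):
|
|     import torch
|
|     @torch.no_grad()
|     def sample_near_first(model, alpha_bar, T, dim,
|                           K=1000, steps=50):
|         x = torch.randn(1, T, dim)          # all tokens start as noise
|         pos = torch.linspace(0.0, 1.0, T)   # 0 = earliest, 1 = latest
|
|         def k_at(s):
|             # later tokens are held at higher noise for longer,
|             # so the model commits to the near future first
|             u = s / steps
|             return (torch.clamp(u * (1 + pos), max=1.0)
|                     * (K - 1)).long()
|
|         for s in range(steps, 0, -1):
|             k_now, k_next = k_at(s), k_at(s - 1)
|             eps = model(x, k_now.unsqueeze(0))   # per-token denoising
|             ab_now = alpha_bar[k_now].view(1, T, 1)
|             ab_next = alpha_bar[k_next].view(1, T, 1)
|             # estimate clean tokens, then jump each token to its
|             # next noise level (DDIM-style deterministic update)
|             x0_hat = (x - (1 - ab_now).sqrt() * eps) / ab_now.sqrt()
|             x = ab_next.sqrt() * x0_hat + (1 - ab_next).sqrt() * eps
|         return x
|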
| In a way the title undersells the idea - this is a way to do
| _fractional_ masking, since the masking level is a float - and I
| think it's really a pretty profound and interesting idea.
|
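| To make the 'fractional' point concrete, a toy version (my own
| illustration; here k plays the role of 1 - alpha_bar in a DDPM
| schedule):
|
|     import torch
|
|     def fractional_mask(token, k):
|         # k = 0.0: token fully visible; k = 1.0: pure noise,
|         # i.e. a classic binary mask; anything in between is a
|         # partially masked token
|         eps = torch.randn_like(token)
|         return (1 - k) ** 0.5 * token + k ** 0.5 * eps
|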
| However, there's a lot not talked about in this paper; I'd be
| very curious to see their codebase. _How_ exactly do you set up a
| maze-following task vs a video extension task? How do you hook up
| a robot arm to this model, and tell the model what you want done?
| The architecture itself deserves several papers' worth of
| explication.
| bravura wrote:
| Thank you for this. It appears to be an exceedingly elegant
| take on modeling uncertainty in planning and search. There's
| something quite potent about changing the task to be variable-
| length, but also forcing the agent to account for its current
| situation instead of taking it for granted. This allows the
| agent to react and generalize way better along its path, even
| in the face of unforeseen challenges.
|
| I assume this is set up so that all tasks are treated as
| variable horizon, and the current state as a consequence of
| preceding actions. I agree it would be nice to see the code.
| treprinum wrote:
| Russ is doing diffusion now? Must be very applicable to robotics.
| krasin wrote:
| Diffusion policies have indeed started to be used in robotics
| recently. See https://diffusion-policy.cs.columbia.edu/ and
| related research.
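| For a rough sense of the mechanism (my sketch, not that
| codebase's API): a diffusion policy denoises a short action
| sequence conditioned on recent observations, executes the first
| action(s), and replans.
|
|     import torch
|
|     @torch.no_grad()
|     def plan_actions(policy, obs, betas, horizon=16, act_dim=7):
|         # betas: (K,) DDPM schedule; policy(a, obs, k) -> noise
|         alphas = 1.0 - betas
|         alpha_bar = torch.cumprod(alphas, dim=0)
|         a = torch.randn(1, horizon, act_dim)  # actions start as noise
|         for k in range(len(betas) - 1, -1, -1):
|             eps = policy(a, obs, torch.tensor([k]))
|             # standard DDPM reverse step on the action sequence
|             mean = (a - betas[k] / (1 - alpha_bar[k]).sqrt()
|                     * eps) / alphas[k].sqrt()
|             a = mean if k == 0 else (
|                 mean + betas[k].sqrt() * torch.randn_like(a))
|         return a  # receding horizon: execute, observe, repeat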
| omerhac wrote:
| Very cool, but why is it called diffusion forcing?
| jimsimmons wrote:
| I work in the field, and the work is presented in an extremely
| obtuse manner.
|
| What is the problem you're trying to solve? Are you proposing a
| new generative model?
___________________________________________________________________
(page generated 2024-07-04 23:01 UTC)