[HN Gopher] Diffusion Forcing: Next-Token Prediction Meets Full-...
       ___________________________________________________________________
        
       Diffusion Forcing: Next-Token Prediction Meets Full-Sequence
       Diffusion
        
       Author : magoghm
       Score  : 154 points
       Date   : 2024-07-04 02:09 UTC (20 hours ago)
        
 (HTM) web link (boyuan.space)
 (TXT) w3m dump (boyuan.space)
        
       | blovescoffee wrote:
        | Am I missing something about training time? Does adding per-
        | token noise slow training significantly? Cool paper though!
        
       | luke-stanley wrote:
        | Anyone know of research or tools for using an existing text-
        | generating LLM with diffusion-like techniques, with no new pre-
        | training or at most a bit of fine-tuning, such that it works
        | with a small GPT / Phi 3 / Qwen model, for example? I know
        | about Tree of Thoughts with MCTS etc., which are somewhat
        | similar (though often with a different learned reward or goal),
        | but I'm interested in something closer to token-level
        | generation. Is this possible?
        
       | vessenes wrote:
        | A number of ideas seem notable to me here; first, they are
        | merging the idea of sequence masking (the key training idea for
        | LLMs) with diffusion models. They do this by keeping track of
        | an 'uncertainty' level per token. This 'uncertainty' level is
        | treated as the 'noise' level for the diffusion model (a model
        | that denoises, conditioned on some sort of noise-level
        | embedding).
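        | 
        | Here's a minimal sketch of that per-token noise idea as I read
        | it (not their code; the tiny GRU denoiser and the cosine
        | schedule are placeholder assumptions):
        | 
        |   import torch
        |   import torch.nn as nn
        | 
        |   T, D, K = 16, 32, 1000  # seq length, token dim, noise levels
        | 
        |   # Hypothetical denoiser: predicts the clean token from its
        |   # noisy version, its noise level, and a summary of the past.
        |   class Denoiser(nn.Module):
        |       def __init__(self):
        |           super().__init__()
        |           self.rnn = nn.GRU(D + 1, 64, batch_first=True)
        |           self.out = nn.Linear(64, D)
        |       def forward(self, noisy, k):
        |           inp = torch.cat([noisy, k.unsqueeze(-1) / K], dim=-1)
        |           h, _ = self.rnn(inp)
        |           return self.out(h)
        | 
        |   model = Denoiser()
        |   x = torch.randn(8, T, D)          # batch of clean sequences
        | 
        |   # Key point: every token gets its own independent noise level,
        |   # instead of one shared level for the whole sequence.
        |   k = torch.randint(0, K, (8, T)).float()
        |   a = torch.cos(0.5 * torch.pi * k / K).unsqueeze(-1)
        |   noisy = a * x + (1 - a**2).sqrt() * torch.randn_like(x)
        | 
        |   loss = ((model(noisy, k) - x) ** 2).mean()  # denoising loss
        |   loss.backward()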
       | 
        | There are a bunch of neat things you can do with this: in
        | particular, you can firm up parts of the sequence earlier than
        | others, and thus use it for, say, maze solving. They even show
        | it controlling a robot arm moving fruit around, which is pretty
        | wild.
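        | 
        | The "firm up some parts earlier" trick presumably comes down to
        | a 2D schedule of noise levels - rows are sampling iterations,
        | columns are token positions. A toy version (the numbers here
        | are illustrative, not from the paper):
        | 
        |   import numpy as np
        | 
        |   T, steps, K = 8, 20, 1000  # tokens, iterations, max noise
        |   lag = 1                    # each token starts one step later
        | 
        |   sched = np.empty((steps, T), dtype=int)
        |   for s in range(steps):
        |       for t in range(T):
        |           p = (s - lag * t) / (steps - lag * (T - 1) - 1)
        |           sched[s, t] = int(round(K * (1 - np.clip(p, 0, 1))))
        | 
        |   # sched[s, t] = target noise of token t at iteration s; early
        |   # tokens hit 0 (fully firm) while later ones are still noisy.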
       | 
        | In a way the title undersells the idea - this is a way to do
        | _fractional_ masking, since the masking level is a float - and
        | I think it is a really profound and interesting idea.
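        | 
        | In masking terms: mask = 1.0 is pure noise, mask = 0.0 is the
        | clean token, and anything in between is partial. A toy linear
        | corruption to make that concrete (not the paper's actual noise
        | schedule):
        | 
        |   import torch
        | 
        |   x = torch.randn(5, 16)               # five clean tokens
        |   m = torch.tensor([0., .25, .5, .9, 1.]).unsqueeze(-1)
        |   corrupted = (1 - m) * x + m * torch.randn_like(x)
        |   # m = 1 reproduces a fully masked token; m = 0 leaves it
        |   # intact; floats in between give "fractional" masking.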
       | 
        | However, there's a lot this paper doesn't talk about; I'd be
        | very curious to see their codebase. _How_ exactly do you set up
        | a maze-solving task vs. a video-extension task? How do you hook
        | up a robot arm to this model, and tell the model what you want
        | done? The architecture itself deserves several papers' worth of
        | explication.
        
         | bravura wrote:
         | Thank you for this. It appears to be an exceedingly elegant
         | take on modeling uncertainty in planning and search. There's
         | something quite potent about changing the task to be variable-
         | length, but also forcing the agent to account for its current
         | situation instead of taking it for granted. This allows the
         | agent to react and generalize way better along its path, even
         | in the face of unforeseen challenges.
         | 
         | I assume this is set up so that all tasks are treated as
         | variable horizon, and the current state as a consequence of
         | preceding actions. I agree it would be nice to see the code.
        
       | treprinum wrote:
       | Russ is doing diffusion now? Must be very applicable to robotics.
        
         | krasin wrote:
          | Diffusion policies have indeed started to be used in robotics
          | recently. See https://diffusion-policy.cs.columbia.edu/ and
          | related research.
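          | 
          | The core loop of a diffusion policy is roughly: start from
          | Gaussian noise over a short action horizon and iteratively
          | denoise it, conditioned on the current observation. A bare-
          | bones sketch (eps_model is a stand-in network, and the
          | update rule is a toy one, not the real DDPM math):
          | 
          |   import torch
          | 
          |   H, A, K = 16, 7, 50  # action horizon, action dim, steps
          | 
          |   def eps_model(actions, obs, k):  # hypothetical predictor
          |       return torch.zeros_like(actions)
          | 
          |   obs = torch.randn(1, 64)     # e.g. encoded camera image
          |   a = torch.randn(1, H, A)     # pure-noise trajectory
          |   for k in reversed(range(K)):
          |       a = a - eps_model(a, obs, k) / K  # toy update
          |   # `a` is now an action sequence for the controller.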
        
       | omerhac wrote:
       | Very cool, but why is it called diffusion forcing?
        
       | jimsimmons wrote:
        | I work in the field, and this paper is presented in an
        | extremely obtuse manner.
       | 
       | What is the problem you're trying to solve? Are you proposing a
       | new generative model?
        
       ___________________________________________________________________
       (page generated 2024-07-04 23:01 UTC)