[HN Gopher] Mercury Coder: frontier diffusion LLM generating 100...
       ___________________________________________________________________
        
       Mercury Coder: frontier diffusion LLM generating 1000+ tok/sec on
       commodity GPUs
        
       Author : ejwang
       Score  : 23 points
       Date   : 2025-02-26 19:58 UTC (3 hours ago)
        
 (HTM) web link (www.inceptionlabs.ai)
 (TXT) w3m dump (www.inceptionlabs.ai)
        
       | starnavigator wrote:
       | just tried out the model in the playground and it seems pretty
        | fast. if what they claim is true, then this could be concerning
        | for Cerebras.
        
       | volodia wrote:
       | This is Volodymyr, co-founder at Inception---let us know if you
       | have any questions about diffusion, language modeling, and our
       | new Mercury models!
        
         | olddustytrail wrote:
         | How does producing tokens in parallel not just result in
         | completely incoherent output?
        
           | volodia wrote:
            | The short answer is that we do more than one parallel pass
            | over the tokens: we iteratively refine them over a few
            | passes to fix inconsistencies. This can be seen as a
            | generalization of the diffusion algorithms that underlie
            | systems like Midjourney or Sora.
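A rough sketch of that loop in Python (this is not Inception's actual algorithm; the target sequence and per-position "dependency depths" below are invented for illustration). The point is that all resolvable positions update simultaneously each pass, so the number of passes is bounded by the depth of the dependencies, not the sequence length:

```python
# Hypothetical target output and a made-up "dependency depth" per position:
# how many refinement passes a toy model needs before it can commit that token.
TARGET = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"]
DEPTH  = [0, 1, 0, 1, 0, 1, 0, 0, 2, 1, 2, 1]

def denoise_pass(seq, pass_idx):
    # One parallel pass: every still-masked position whose depth has been
    # reached is filled in; all such positions update simultaneously.
    return [TARGET[i] if seq[i] == "?" and DEPTH[i] <= pass_idx else seq[i]
            for i in range(len(seq))]

def generate():
    seq = ["?"] * len(TARGET)   # start from a fully masked sequence
    passes = 0
    while "?" in seq:
        seq = denoise_pass(seq, passes)
        passes += 1
    return seq, passes
```

Here `generate()` recovers all 12 tokens in 3 passes, because independent positions resolve in parallel rather than one at a time.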
        
           | imtringued wrote:
            | Assuming the model tracks convergence in one way or
            | another, it would simply keep iterating until the error
            | drops below some epsilon.
            | 
            | This means that in the worst case the number of iterations
            | is the same as for a classic autoregressive transformer.
            | 
            | So they are mostly exploiting the fact that the average
            | response is not actually fully sequential: the model
            | discovers the exploitable parallelism on its own.
            | 
            | This is not too dissimilar to a branch-and-bound algorithm
            | that has a worse theoretical runtime than a simple
            | brute-force search but in practice solves integer linear
            | programs in almost polynomial time, because not everyone is
            | encoding the hardest instances of NP problems as integer
            | linear programs.
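The iterate-until-epsilon idea the comment describes can be sketched generically; the contraction step below is just a toy stand-in for a model's refinement pass, and the fixed point stands in for the converged output:

```python
def refine_until_converged(step, x0, eps=1e-6, max_iters=100):
    """Repeat a refinement step until successive iterates differ by less
    than eps in every coordinate. max_iters bounds the worst case, which
    for a fully sequential response would match autoregressive cost."""
    x = x0
    for iters in range(1, max_iters + 1):
        x_next = step(x)
        if max(abs(a - b) for a, b in zip(x, x_next)) < eps:
            return x_next, iters
        x = x_next
    return x, max_iters

# Toy refinement step: move halfway toward a fixed point each pass, so the
# error halves per iteration and convergence takes ~log2(1/eps) passes,
# independent of the "sequence" length.
target = [1.0, 2.0, 3.0]
step = lambda x: [(a + t) / 2 for a, t in zip(x, target)]
x, iters = refine_until_converged(step, [0.0, 0.0, 0.0])
```

With this step the per-pass change is exactly `3 / 2**k`, so convergence to `eps=1e-6` takes 22 iterations regardless of how many coordinates there are.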
        
         | tsadoq wrote:
          | It looks super cool, any plans on open-sourcing something?
          | Btw, looking for an AI solutions/sales engineer :P
        
           | volodia wrote:
           | Good question! We are not open sourcing the models at launch
           | time, but we have a roadmap of future releases in which we
           | hope to make some of our models accessible to the research
           | community.
        
       | itunpredictable wrote:
       | Holy shit this is fast
        
       | fpickle121 wrote:
        | Is there a paper / technical report out on this?
        
         | volodia wrote:
         | Not today, but we will be following up with a technical report
         | over the next week or so. In the meantime, you can take a look
         | at some of the research papers that inspired our work: -
         | https://arxiv.org/abs/2310.16834 -
         | https://arxiv.org/abs/2406.07524
        
         | imtringued wrote:
         | https://ml-gsai.github.io/LLaDA-demo/
         | 
         | As far as I understand (based on reading for less than 5
         | minutes) it is still a transformer model, but they simply start
         | predicting tokens in random positions, with the possibility of
         | updating existing tokens, rather than producing tokens from
         | left to right.
         | 
          | That isn't too different from, say, good old Stable Diffusion.
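A minimal sketch of that sampling style, in the spirit of the LLaDA demo: all positions start masked, and each step the most confident predictions are committed in arbitrary order rather than left to right. The tokens and per-position confidence scores here are invented, and a real model would re-score every masked position with a transformer on each step:

```python
# Hypothetical model output per position and made-up confidence scores.
TOKENS = ["print", "(", "'hi'", ")"]
CONF   = [0.9, 0.6, 0.4, 0.7]

def sample(k=2):
    seq = [None] * len(TOKENS)   # fully masked start
    order = []                   # commit order, to show it's not left-to-right
    while None in seq:
        masked = [i for i, t in enumerate(seq) if t is None]
        # Commit the k most confident masked positions this step.
        for i in sorted(masked, key=lambda i: -CONF[i])[:k]:
            seq[i] = TOKENS[i]
            order.append(i)
    return seq, order
```

Running `sample()` commits positions 0 and 3 first, then 1 and 2: the sequence is filled out-of-order, which is the key difference from autoregressive decoding.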
        
       ___________________________________________________________________
       (page generated 2025-02-26 23:01 UTC)