[HN Gopher] Mercury Coder: frontier diffusion LLM generating 100...
___________________________________________________________________
Mercury Coder: frontier diffusion LLM generating 1000+ tok/sec on
commodity GPUs
Author : ejwang
Score : 23 points
Date : 2025-02-26 19:58 UTC (3 hours ago)
(HTM) web link (www.inceptionlabs.ai)
(TXT) w3m dump (www.inceptionlabs.ai)
| starnavigator wrote:
| Just tried out the model in the playground and it seems pretty
| fast. If what they claim is true, this could be concerning for
| Cerebras.
| volodia wrote:
| This is Volodymyr, co-founder at Inception---let us know if you
| have any questions about diffusion, language modeling, and our
| new Mercury models!
| olddustytrail wrote:
| How does producing tokens in parallel not just result in
| completely incoherent output?
| volodia wrote:
| The short answer is that we do more than one parallel pass
| over multiple tokens: we iteratively refine them over a few
| passes to fix inconsistencies. This can be seen as a
| generalization of the diffusion algorithms that underlie
| systems like Midjourney or Sora.
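The "parallel passes with iterative refinement" idea can be sketched as a toy loop. Everything here is hypothetical (Inception has not published their algorithm); this is just a generic confidence-based unmasking scheme in the spirit of masked-diffusion papers, with `toy_denoiser` standing in for a trained network:

```python
import random

MASK = None  # sentinel for a not-yet-committed position

def toy_denoiser(draft, target):
    # Stand-in for a trained denoising network: it proposes a token and a
    # confidence score for every position in parallel. Here it trivially
    # proposes the target token; a real model would score its own logits.
    return [(target[i], random.random()) for i in range(len(draft))]

def diffusion_decode(target, steps=4):
    """Generate all positions in parallel, committing the most confident
    proposals each pass and revisiting the rest -- a toy version of
    iterative refinement over multiple tokens."""
    draft = [MASK] * len(target)
    for step in range(steps, 0, -1):
        proposals = toy_denoiser(draft, target)
        masked = [i for i, t in enumerate(draft) if t is MASK]
        k = max(1, len(masked) // step)  # commit ~1/step of what remains
        for i in sorted(masked, key=lambda i: -proposals[i][1])[:k]:
            draft[i] = proposals[i][0]
    return draft
```

Each pass touches every position at once, which is where the throughput claim comes from; coherence is recovered because later passes see the tokens committed by earlier ones.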
| imtringued wrote:
| Assuming the model tracks convergence in one way or another,
| it would simply keep iterating until the error falls below
| some epsilon value.
|
| This means that in the worst case the number of iterations is
| the same as for a classic autoregressive transformer.
|
| So they are mostly taking advantage of the fact that the
| average response is in reality not fully sequential, and the
| model discovers the exploitable parallelism on its own.
|
| This is not too dissimilar to a branch-and-bound algorithm,
| which has a worse theoretical runtime than simple brute force
| search but in practice solves integer linear programs in
| almost polynomial time, because few people encode the hardest
| instances of NP problems as integer linear programs.
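The convergence argument can be made concrete with a small sketch (all names hypothetical): iterate a parallel refinement step until the draft stops changing. With a strict left-to-right dependency each pass resolves only one more token (the autoregressive worst case), while fully independent positions resolve in a single pass:

```python
def refine_until_stable(draft, refine_step, max_iters=100):
    """Apply a parallel refinement step until the draft stops changing,
    i.e. until the 'error' between successive iterates hits zero."""
    for n in range(1, max_iters + 1):
        new = refine_step(draft)
        if new == draft:  # converged: no token changed this pass
            return new, n
        draft = new
    return draft, max_iters

def chain_step(draft):
    # Worst case: position i depends on position i-1, so each pass can
    # only resolve one more token -- equivalent to sequential decoding.
    out = list(draft)
    out[0] = 0
    for i in range(1, len(out)):
        if draft[i - 1] is not None:
            out[i] = draft[i - 1] + 1
    return out

def parallel_step(draft):
    # Best case: every position is independent; one pass resolves all.
    return [2 * i for i in range(len(draft))]
```

Running both on a 5-token draft: the dependency chain needs a pass per token (plus one to confirm convergence), the independent case needs just one pass plus the confirmation.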
| tsadoq wrote:
| It looks super cool, any plans for open-sourcing something?
| Btw, looking for an AI solutions/sales engineer :P
| volodia wrote:
| Good question! We are not open sourcing the models at launch
| time, but we have a roadmap of future releases in which we
| hope to make some of our models accessible to the research
| community.
| itunpredictable wrote:
| Holy shit this is fast
| fpickle121 wrote:
| Is there a paper / technical report out on this?
| volodia wrote:
| Not today, but we will be following up with a technical report
| over the next week or so. In the meantime, you can take a look
| at some of the research papers that inspired our work: -
| https://arxiv.org/abs/2310.16834 -
| https://arxiv.org/abs/2406.07524
| imtringued wrote:
| https://ml-gsai.github.io/LLaDA-demo/
|
| As far as I understand (based on reading for less than 5
| minutes), it is still a transformer model, but they simply
| start predicting tokens at random positions, with the
| possibility of updating existing tokens, rather than producing
| tokens from left to right.
|
| That isn't too different from, say, good old Stable Diffusion.
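That reading of LLaDA (predict everywhere in parallel, then re-mask a shrinking random subset so earlier guesses remain open to revision) can be sketched roughly as below; `model_predict` is a hypothetical stand-in for the transformer, not LLaDA's actual interface:

```python
import random

MASK = None  # sentinel for a masked position

def masked_diffusion_decode(length, model_predict, steps=4, seed=0):
    """Sketch of masked-diffusion decoding: each step predicts every
    position in parallel (including already-filled ones, so existing
    tokens can be updated), then re-masks a shrinking random subset
    to keep those positions open for revision on later steps."""
    rng = random.Random(seed)
    draft = [MASK] * length
    for step in range(steps):
        draft = model_predict(draft)  # one parallel prediction pass
        n_remask = length * (steps - 1 - step) // steps
        for i in rng.sample(range(length), n_remask):
            draft[i] = MASK  # re-open these positions
    return draft
```

The re-masking schedule shrinks to zero on the final step, so the output is fully committed; the random positions are what distinguish this from left-to-right decoding, much as Stable Diffusion denoises the whole image at once rather than pixel by pixel.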
___________________________________________________________________
(page generated 2025-02-26 23:01 UTC)