[HN Gopher] Why I find diffusion models interesting?
       ___________________________________________________________________
        
       Why I find diffusion models interesting?
        
       Author : whoami_nr
       Score  : 186 points
       Date   : 2025-03-06 22:35 UTC (1 day ago)
        
 (HTM) web link (rnikhil.com)
 (TXT) w3m dump (rnikhil.com)
        
       | mistrial9 wrote:
       | this is the huggingface page
       | https://huggingface.co/papers/2502.09992
        
       | jacobn wrote:
       | The animation on the page looks an awful lot like autoregressive
       | inference in that virtually all of the tokens are predicted in
       | order? But I guess it doesn't have to do that in the general
       | case?
        
         | creata wrote:
         | The example in the linked demo[0] seems less left-to-right.
         | 
          | Anyway, I think we'd expect it to usually be more-or-less left-
          | to-right -- we usually decide what to write or speak left-to-
          | right, too, and we don't seem to suffer much for it.
         | 
         | (Unrelated: it's funny that the example generated code has a
         | variable "my array" with a space in it.)
         | 
         | [0]: https://ml-gsai.github.io/LLaDA-demo/
        
           | whoami_nr wrote:
            | Yeah, but you can backtrack your thinking. You also have an
            | inner voice to plan out the next couple of words, reflect,
            | and self-correct before uttering them.
        
           | frotaur wrote:
            | Very related: https://arxiv.org/abs/2401.17505
        
         | whoami_nr wrote:
          | So, in practice there are some limitations here. Chat
          | interfaces force you to feed the entire context to the model
          | every time you ping it. Even multi-step tool calls have a
          | similar thing going. So, yeah, we may effectively turn all of
          | this into autoregressive models too.
        
       | vinkelhake wrote:
       | I don't get where the author is coming from with the idea that a
        | diffusion-based LLM would hallucinate less.
       | 
       | > dLLMs can generate certain important portions first, validate
       | it, and then continue the rest of the generation.
       | 
       | If you pause the animation in the linked tweet (not the one on
       | the page), you can see that the intermediate versions are full
       | of, well, baloney.
       | 
       | (and anyone who has messed around with diffusion based image
       | generation knows the models are perfectly happy to hallucinate).
        
         | whoami_nr wrote:
          | The LLaDA paper (https://ml-gsai.github.io/LLaDA-demo/)
          | implied strong bidirectional reasoning capabilities and
          | improved performance on reversal tasks (where the model needs
          | to reason backwards).
         | 
         | I made a logical leap from there.
        
         | gdiamos wrote:
         | Bidirectional seq2seq models are usually more accurate than
         | unidirectional models.
         | 
         | However, autoregressive models that generate one token at a
         | time are usually more accurate than parallel models that
         | generate multiple tokens at a time.
         | 
         | In diffusion LLMs, both of these two effects interact. You can
         | trade them off by determining how many tokens are generated at
         | a time, and how many future tokens are used to predict the next
         | set of tokens.
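          | 
          | A minimal sketch of that knob, assuming a masked-denoising
          | model that returns per-position logits (the model, mask_id,
          | and the confidence rule here are all hypothetical, not any
          | particular paper's API):
          | 
          |   import torch
          | 
          |   def decode(model, prompt_ids, gen_len=64, tokens_per_step=8,
          |              mask_id=0):
          |       # Append an all-masked generation window to the prompt.
          |       ids = torch.cat([prompt_ids,
          |                        torch.full((gen_len,), mask_id)])
          |       while (ids == mask_id).any():
          |           probs = model(ids.unsqueeze(0))[0].softmax(-1)
          |           conf, best = probs.max(-1)    # per-position confidence
          |           conf[ids != mask_id] = -1.0   # only fill masked slots
          |           # Fewer tokens per step means more steps (more
          |           # compute), but each commitment sees more context.
          |           k = min(tokens_per_step, int((ids == mask_id).sum()))
          |           top = conf.topk(k).indices
          |           ids[top] = best[top]
          |       return ids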
        
         | markisus wrote:
         | Regarding faulty intermediate versions, I think that's the
          | point. The diffusion process can correct wrong tokens when the
          | global state implies it.
        
           | evrydayhustling wrote:
           | I think the discussion here is confusing the algorithm for
           | the output. It's true that diffusion can rewrite tokens
           | during generation, but it is doing so for consistency with
           | the evolving output -- not "accuracy". I'm unaware of any
           | research which shows that the final product, when iteration
           | stops, is less likely to contain hallucinations than with
           | autoregression.
           | 
           | With that said, I'm still excited about diffusion -- if it
           | offers different cost points, and different interaction modes
           | with generated text, it will be useful.
        
         | Legend2440 wrote:
         | Hallucination is probably a feature of statistical prediction
         | as a whole, not any particular architecture of neural network.
        
         | mitthrowaway2 wrote:
         | I'm not sure about hallucination about facts, but it might be
         | less prone to logically inconsistent statements of the form
         | "the sky is red because[...] and that's why the sky is blue".
        
       | gdiamos wrote:
       | I think these models would get interesting at extreme scale.
       | Generate a novel in 40 iterations on a rack of GPUs.
       | 
       | At some point in the future, you will be able to autogen a 10M
       | line codebase in a few seconds on a giant GPU cluster.
        
         | gdiamos wrote:
         | Diffusion LLMs also follow scaling laws -
         | https://proceedings.neurips.cc/paper_files/paper/2023/file/3...
        
           | esperent wrote:
           | Is it possible that combining multiple AIs will be able to
           | _somewhat_ bypass scaling laws, in a similar way that
           | multicore CPUs can _somewhat_ bypass the limitations of a
           | single CPU core?
        
             | gdiamos wrote:
             | I'm sure there are ways of bypassing scaling laws, but I
             | think we need more research to discover and validate them
        
           | impossiblefork wrote:
           | Those aren't the modern type with discrete masking based
           | diffusion though.
           | 
           | Of course, these too will have scaling laws.
        
         | nthingtohide wrote:
          | I read a Wikipedia article about a person who was very
          | intelligent but also suffered from a mental illness. He told
          | people around him that his next novel would be exactly N words
          | long and would end with the sentence P.
          | 
          | I don't remember the article. I read it a decade ago. It's
          | like he was doing diffusion in his mind, subconsciously
          | perhaps.
        
           | eru wrote:
           | Seems pretty easy to achieve if you have text editing
           | software that tells you the number of words written so far?
        
       | Philpax wrote:
       | I know the r-word is coming back in vogue, but it was still
       | unpleasant to see it in the middle of an otherwise technical blog
       | post. Ah well.
       | 
       | Diffusion LMs are interesting and I'm looking forward to seeing
       | how they develop, but from playing around with that model, it's
       | GPT-2 level. I suspect it will need to be significantly scaled up
       | before we can meaningfully compare it to the autoregressive
       | paradigm.
        
         | mountainriver wrote:
          | Meta has one based on flow matching that is bigger; it
          | performs pretty well.
        
           | gsf_emergency_2 wrote:
           | A possible detente between SCHMIDHUBER & the school of Yann
           | Lecun ?
           | 
           | https://doi.org/10.1103/PhysRevLett.129.228004
        
         | gsf_emergency_2 wrote:
         | I've got a couple more related snowclones..
         | 
         |  _Sufficiently humourous sneering is indistinguishable from
         | progress_
         | 
         |  _Sufficiently high social status is indistinguishable from
         | wisdom_
        
           | gsf_emergency_2 wrote:
           | Sufficiently profane reasoning is indistinguishable from
           | autoregression
           | 
           | Sufficiently anti-regressive compression is indistinguishable
           | from sentience (--maybe the SCHMIDHUBER)
           | 
           | https://psycnet.apa.org/record/2007-12667-001
        
         | IncreasePosts wrote:
         | Retarded is too good of a word to go unused. It feels super
         | wrong to call a mentally disabled person retarded or a retard.
         | And we're told we can't call stupid things retarded. So who
         | gets to use it? No one?
         | 
         | With gay, on the other hand, gay people call each other gay and
         | are usually okay being labeled as gay. So, it's still in use,
         | and I think it's fine to push back against using it to mean
         | "lame" or whatever.
         | 
         | Finally, you should keep in mind that the author may not be
         | American or familiar with American social trends. "Retarded"
          | might be just fine in South Africa or Australia (I don't know).
         | Similar to how very few Americans would bat an eye at someone
         | using the phrase "spaz out", whereas it is viewed as very
         | offensive in England.
        
           | kazinator wrote:
           | If you have a burning urge to use "retarded" with complete
           | dick-o-matic immunity, try a sentence like, "the flame
           | retardant chemical successfully retarded the spread of the
           | fire". You may singe a few eyebrows, that's about it.
        
           | billab995 wrote:
           | Might seem like a descriptive word but the fact is, it's
           | hurtful to people who are working harder to make their way in
           | life than I'll ever have to. Even when just heard in passing.
           | 
            | Why do things in life that will hurt someone who'll likely
            | just retreat rather than confront you? Be the good guy.
        
             | mitthrowaway2 wrote:
             | That's the euphemism treadmill though, isn't it? "Retard"
             | literally means late or delayed (hence French: _en retard_
             | ). Back when it was originally introduced to refer to a
             | handicap, it was chosen for that reason to be a kind,
             | polite, and indirect phrasing. That will also be the fate
             | of any new terms that we choose. Hence for example in
             | physics the term _retarded potential_
             | (https://en.wikipedia.org/wiki/Retarded_potential) was
             | chosen to refer to the delaying effect of the speed of
             | light on electromagnetic fields, before the word had any
             | association with mental disability.
             | 
             | Words don't need to retain intrinsic hurtfulness; their
             | hurtfulness comes from their usage, and the hurtful intent
             | with which they are spoken. We don't need to yield those
             | words to make them the property of 1990s schoolyard bullies
             | in perpetual ownership.
             | 
             | To that extent I'd still say this article's usage is not
             | great.
        
               | barrkel wrote:
               | > Words don't need to retain intrinsic hurtfulness; their
               | hurtfulness comes from their usage, and the hurtful
               | intent with which they are spoken.
               | 
               | Yes; and a rose by any other name would smell as sweet.
               | 
               | Words don't need to retain intrinsic hurtfulness, but
               | it's not quite right that the hurtfulness comes from the
               | usage either. The hurtfulness comes from the actual
               | referent, combined with intent.
               | 
               | If I tell someone they are idiotic, imbecilic, moronic,
               | mentally retarded, mentally handicapped, mentally
               | challenged, I am merely iterating through a historical
               | list of words and phrases used to describe the same real
               | thing in the world. The hurt fundamentally comes from
               | describing someone of sound mind as if they are not. We
               | all know that we don't want to have a cognitive
               | disability, given a choice, nor to be thought as if we
               | had.
               | 
               | The euphemism treadmill tries to pretend that the
               | referent isn't an undignified position to be in. But
               | because it fundamentally is, no matter what words are
               | used, they can still be used to insult.
        
             | t-3 wrote:
             | Any word used to describe intellectual disability would be
             | just as hurtful, at least when given enough time to enter
              | the vernacular. That's just how language and society work.
              | Children especially can call each other anything and make
              | it offensive, because bullying and cliquish behavior are
             | very natural and it's hard to train actual politeness and
             | empathy into people in authoritarian environments like
             | schools.
        
               | billab995 wrote:
               | You're right, it's the intent that matters. <any_word>,
               | used to describe something stupid or negative while also
               | being an outdated description for a specific group of
               | people...
               | 
               | The fact is, it's _that_ word that's evolved into
               | something hurtful. So rather than be the guy who sticks
                | up for the_word and try to convince everyone it shouldn't be
               | hurtful, I just decided to stop using it. The reason why
               | I stopped was seeing first hand how it affected someone
               | with Down Syndrome who heard me saying it. Sometimes real
               | life beats theoretical debate. It's something I still
               | feel shame about nearly 20 years later.
               | 
               | It wasn't a particularly onerous decision to stop using
               | it, or one that opened the floodgate of other words to be
               | 'banned'. And if someone uses it and hasn't realized
               | that, then move on - just avoid using it next time. Not a
               | big deal. It's the obnoxious, purposefully hurtful use of
               | it that's not great (which doesn't seem to be the case
               | here tbh). It's the intent that matters more.
        
           | whoami_nr wrote:
           | Yes, I am not American and I had no clue about the
           | connotations.
        
         | echelon wrote:
         | > I know the r-word is coming back in vogue
         | 
         | This is so utterly fascinating to watch.
         | 
         | Three years ago this would have cost you your job. Now
         | everybody's back at it again.
         | 
         | What is happening?
        
           | esperent wrote:
           | For anyone else confused, this "r-word" is "retarded".
           | 
           | They're not talking about a human. To me that makes it feel
           | very different.
           | 
           | However, there's also a large component coming from the
           | current political situation. People feel more confident to
           | push back against things like the policing of word usage.
           | They're less likely to get "cancelled" now. They feel more
           | confident that the zeitgeist is on their side now. They're
           | probably right.
        
             | bongodongobob wrote:
             | Eh, I'm as left as they come and I'm tired of pretending
              | that banning words solves anything. Who's offended? Why? Do
             | you have a group of retarded friends you hang out with on
             | the regular? Are they reading the article? No and no. Let's
              | not pretend that changing the term to differently abled
             | or whatever has any meaning. It doesn't. It's a handful of
             | loud people (usually well off white women) on social media
             | dictating what is and isn't ok. Phrases like "temporarily
             | unhoused" rather than homeless is another good way to
             | pretend to be taking action when you're doing less than
             | nothing. Fight for policy, not changing words.
        
               | esperent wrote:
               | > I'm as left as they come and I'm tired of pretending
               | that banning words solve anything. Who's offended? Why?
               | 
               | I'm with you on this, also speaking as a strong leftist.
               | 
                | I do think that "banning", or at least strongly
               | condemning, the use of words when the specific group
               | being slurred are clear that they consider it a slur and
               | want it to stop is reasonable. But not when it's social
               | justice warriors getting offended on behalf of other
               | people.
               | 
               | However, I think it's absolutely ridiculous that even
               | when discussing the banning of these words, we're not
               | allowed to use them directly. We are supposed to say
               | "n-word", "r-word" even when discussing in an academic
               | sense. Utter nonsense, it's as if saying these words out
               | loud would conjure a demon.
        
               | imtringued wrote:
               | The point of these meaningless dictionary changes isn't
               | to solve anything. It's to give plausible deniability to
               | asshole behaviour through virtue signalling.
               | 
               | Crazy assholes will argue along the lines that it is an
               | insignificant inconvenience and hence anyone who uses the
               | old language must use it maliciously and on purpose,
               | because they are ableist, racist or whatever.
               | 
               | This then gives assholes the justification to behave like
                | a bigot towards the allegedly ableist person. The goal
               | is to dress up your own abusive bullying as virtuous,
               | even though deep down you don't actually care about
               | disabled people.
        
               | esperent wrote:
               | This is an interesting take, and I think it's not
               | unreasonable to label the worst of the social justice
               | warriors as assholes.
               | 
               | However, most of them are well meaning. They're misguided
               | rather than assholes. They really do want to take action
               | for social improvement. It's just that real change is too
               | hard and requires messy things like protesting on the
               | street or getting involved in politics and law. So, they
               | fall back on things like policing words, or calling out
               | perceived bad actors, which they can do from the comfort
               | of their homes via the internet.
               | 
               | To be fair, some genuinely bad people have been
               | "cancelled". The "me too" movement didn't happen without
               | reason. It's just that it went too far, and started
               | ignoring pesky things like evidence, or innocent until
               | proven otherwise.
        
               | bloomingkales wrote:
               | _Do you have a group of retarded friends you hang out
               | with on the regular?_
               | 
               | I should not have laughed at this.
        
               | Uehreka wrote:
               | Yes and yes? I'm an AI enthusiast interested in the
               | article and I'm offended by that word for pretty non-
               | hypothetical reasons. When I was in middle school I was
               | bullied a lot by people who would repeatedly call me the
               | r-slur. That word reminds me of some of the most shameful
               | and humiliating moments of my life. If I hear someone use
               | it out of nowhere it makes me wince. Seeing it written
               | down isn't as bad, but I definitely would prefer people
               | phased it out of their repertoire.
        
           | inverted_flag wrote:
           | The zeitgeist is shifting away from "wokeness" and people are
           | testing the waters trying to see what they can get away with
           | saying now.
        
           | exe34 wrote:
           | Elon Musk made it cool again.
        
       | kelseyfrog wrote:
        | I'm personally happy to see effort in this space simply because
        | I think it's an interesting set of tradeoffs (compute <->
        | accuracy) - a departure from the fixed next-token compute budget
        | required now.
        | 
        | It brings up interesting questions, like: what's the equivalence
        | between smaller diffusion models, which consume more compute
        | because they run a greater number of diffusion steps, and larger
        | traditional LLMs, which essentially take a single step? How
        | effective is decoupling the context window size from the
        | diffusion window size? Is there an optimum ratio?
        
         | machiaweliczny wrote:
         | I actually think that diffusion LLMs will be best for code
         | generation
        
       | billab995 wrote:
       | Stopped reading at the r word. Do better.
        
       | mountainriver wrote:
        | The most interesting thing about diffusion LMs, one that tends
        | to be missed, is their ability to edit early tokens.
        | 
        | We know that the early tokens in an autoregressive sequence
        | disproportionately bias the outcome. I would go as far as to say
        | that some of the magic of reasoning models is that they generate
        | so much text they can kinda get around this.
       | 
       | However, diffusion seems like a much better way to solve this
       | problem.
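        | 
        | A sketch of what that editing could look like inside a masked-
        | diffusion decode loop (hypothetical; the threshold rule is made
        | up for illustration):
        | 
        |   def remask_low_confidence(ids, conf, mask_id, threshold=0.2):
        |       # Re-open any already-committed token, however early it
        |       # sits, whose confidence has dropped given global context.
        |       return [mask_id if c < threshold else t
        |               for t, c in zip(ids, conf)]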
        
         | kgeist wrote:
         | But how can test-time compute be implemented for diffusion
         | models if they already operate on the whole text at once? Say
         | it gets stuck--how does it proceed further? Autoregressive
         | reasoning models would simply backtrack and try other
         | approaches. It feels like denoising the whole text further
         | wouldn't lead to good results, but I may be wrong.
        
           | eru wrote:
           | Perhaps do a couple of independent runs, and then combine
           | them afterwards?
        
           | spwa4 wrote:
            | Diffusion LLMs are still residual networks. You can Google
            | that, but it means that they don't produce the final text in
            | one shot. Every pass generates corrections to be made to the
            | whole text at once.
           | 
           | Think of it like writing a text by forcing your teacher to
            | write for you by handing in the assignment 100 times. You
           | begin by generating completely inaccurate text, almost
           | random, that leans perhaps a little bit towards the answer.
           | Then you systematically begin to correct small parts of the
            | text. The teacher sees the text and uses a red pen to
            | correct a bunch of things. Then the corrected text is
           | copied onto a fresh page, and resubmitted to the teacher. And
           | again. And again. And again. And again. 50 times. 100 times.
           | That's how diffusion models work.
           | 
           | Technically, it adds your corrections to the text, but that's
           | mathematical addition, not adding at the end. Also
           | technically every layer is a teacher that's slightly
           | different from the previous one. And and and ... but this is
           | the basic principle. The big advantage is that this makes
           | neural networks slowly lean towards the answer. First they
            | decide to have 3 sections, one each about X, Y, and Z,
           | then they decide on what sentences to put, then they start
           | thinking about individual words, then they start worrying
           | about things like grammar, and finally about spelling and
           | pronouns and ...
           | 
           | So to answer your question: diffusion networks can at any
           | time decide to send out a correction that effectively erases
           | the text (in several ways). So they can always start over by
           | just correcting everything all at once back to randomness.
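            | 
            | A toy version of that loop in continuous space (everything
            | here is hypothetical, and note that recent text-diffusion
            | models like LLaDA instead work by discrete masking):
            | 
            |   import torch
            | 
            |   def red_pen_loop(teacher, steps=100, seq_len=32, dim=512):
            |       # Start from near-random "text".
            |       x = torch.randn(seq_len, dim)
            |       for t in range(steps):
            |           # The "teacher" marks up every position at once...
            |           correction = teacher(x, t)
            |           # ...and the marks are added to the page:
            |           # mathematical addition, not appending at the end.
            |           x = x + correction
            |       return x  # map back to tokens afterwards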
        
             | kgeist wrote:
             | Yeah, but with autoregressive models, the state grows,
             | whereas with diffusion models, it remains fixed. As a
             | result, a diffusion model can't access its past thoughts
             | (e.g., thoughts that rejected certain dead ends) and may
             | start oscillating between the same subpar results if you
             | keep denoising multiple times.
        
         | ithkuil wrote:
         | Yeah reasoning models are "self-doubt" models.
         | 
         | The model is trained to encourage re-evaluating the soundness
         | of tokens produced during the "thinking phase".
         | 
          | The model state vector is kept in a state of open exploration,
          | influenced by the already emitted tokens, but less strongly so.
         | 
         | The non-reasoning models were just trained with the goal of
         | producing useful output on a first try and they did their best
         | to maximize that fitness function.
        
       | kazinator wrote:
       | Interestingly, that animation at the end _mainly_ proceeds from
       | left to right, with just some occasional exceptions.
       | 
       | So I followed the link, and gave the model this bit of
       | conversation starter:
       | 
       | > _You still go mostly left to right._
       | 
       | The denoising animation it generated went like this:
       | 
       | > [Yes] [.] [MASK] [MASK] [MASK] ... [MASK]
       | 
       | and proceeded by deletion of the mask elements on the right one
       | by one, leaving just the "Yes.".
       | 
       | :)
        
       | DeathArrow wrote:
        | That got me thinking that it would be nice to have something like
        | ComfyUI to work with diffusion-based LLMs. Apply LoRAs, use
        | multiple inputs, have multiple outputs.
       | 
        | Something akin to ComfyUI but for LLMs would open up a world of
       | possibilities.
        
         | hdjrudni wrote:
         | Maybe not even 'akin' but literally ComfyUI. Comfy already has
          | a bunch of image-to-text nodes. I haven't seen txt2txt or LoRAs
         | and such for them though. But I also haven't looked.
        
           | Philpax wrote:
           | It's complicated by the ComfyUI data model, which treats
           | strings as immediate values/constants and not variables in
           | their own right. This could ostensibly be fixed/worked
           | around, but I imagine that it would come at a cost to
           | backwards compatibility.
        
         | dragonwriter wrote:
         | ComfyUI already has nodes (mostly in extensions available
         | through the built in manager) for working with LLMs, both
         | remote LLMs accessed through APIs and local ones running under
         | Comfy itself, the same as it runs other models.
        
         | terhechte wrote:
          | Check out Floneum; it's basically ComfyUI for LLMs, extendable
         | via plugins
         | 
         | https://floneum.com/
         | 
         | Scroll down a bit on the website to see a screenshot.
        
           | DeathArrow wrote:
           | Thank you!
        
       | chw9e wrote:
       | This was a very cool paper about using diffusion language models
       | and beam search: https://arxiv.org/html/2405.20519v1
       | 
       | Just looking at all of the amazing tools and workflows that
       | people have made with ComfyUI and stuff makes me wonder what we
       | could do with diffusion LMs. It seems diffusion models are much
       | more easily hackable than LLMs.
        
       | FailMore wrote:
       | Thanks for the post, I'm interested in them too
        
       | monroewalker wrote:
       | See also this recent post about Mercury-Coder from Inception
       | Labs. There's a "diffusion effect" toggle for their chat
       | interface but I have no idea if that's an accurate representation
       | of the model's diffusion process or just some randomly generated
       | characters showing what the diffusion process looks like
       | 
       | https://news.ycombinator.com/item?id=43187518
       | 
       | https://www.inceptionlabs.ai/news
        
       | alexmolas wrote:
        | I guess the biggest limitation of this approach is that the max
        | output length is fixed before generation starts, unlike
        | autoregressive LLMs, which can keep generating forever.
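        | 
        | A sketch of the obvious workaround (hypothetical; I don't know
        | what any given model actually does): generate into a fixed
        | window and cut at the first end-of-text token.
        | 
        |   def trim(ids: list[int], eos_id: int) -> list[int]:
        |       # Keep only the tokens before the first end-of-text
        |       # marker, if one was emitted inside the fixed window.
        |       return ids[:ids.index(eos_id)] if eos_id in ids else ids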
        
         | gdiamos wrote:
          | Max output size is always limited by the inference framework
          | in autoregressive LLMs too.
          | 
          | Eventually they run out of memory or patience.
        
       | antirez wrote:
        | There is a disproportionate skepticism about autoregressive
        | models and a disproportionate optimism about alternative
        | paradigms because of the absolutely non-verifiable idea that
        | LLMs, when predicting the next token, don't already model, in
        | the activation states, the gist of what they are going to say,
        | similar to what humans do. That's funny, because many times you
        | can observe, in the output of truly high-quality replies, that
        | the first tokens only made sense _in the perspective_ of what
        | comes later.
        
         | spr-alex wrote:
          | Maybe I understand this a little differently; the argument I
          | am most familiar with is this one from LeCun, where the error
          | accumulation in the prediction is the concern with
          | autoregression:
         | https://drive.google.com/file/d/1BU5bV3X5w65DwSMapKcsr0ZvrMR...
        
           | antirez wrote:
            | The error accumulation thing is basically without any
            | grounding, as autoregressive models correct what they are
            | saying in the process of emitting tokens (trivial to test
            | yourself: force a given continuation in the prompt and the
            | LLM will not follow it at all). LeCun provided an incredible
            | number of wrong claims about LLMs, many of which he no
            | longer accepts: like the stochastic parrot claim. Now the
            | idea that there is just a statistical relationship in
            | next-token prediction is considered laughable, but even when
            | it was formulated there were obvious empirical hints.
        
             | HeatrayEnjoyer wrote:
             | >force a given continuation in the prompt and the LLMs will
             | not follow at all
             | 
             | They don't? That's not the case at all, unless I am
             | misunderstanding.
        
               | antirez wrote:
                | I'm not talking about the fine-tuning that makes them
                | side with the user even when they are wrong (this is
                | less and less common now compared to the past, and
                | anyway it's a different effect). I'm referring to when,
                | in the template, you make the assistant reply start with
                | wrong words/directions, and the LLM finds a way to say
                | what it really meant, with "wait, actually I was wrong"
                | or other sentences that allow it to avoid following that
                | line.
        
             | spr-alex wrote:
              | I think the opposite: the error accumulation thing is
              | basically the daily experience of using LLMs.
             | 
              | As for the premise that models can't self-correct, that's
              | not an argument I've ever seen; transformers have global
             | attention across the context window. It's that their
             | prediction abilities are increasingly poor as generation
             | goes on. Is anyone having a different experience than that?
             | 
              | Everyone doing some form of "prompt engineering", whether
              | with optimized ML tuning, with a human in the loop, or
              | with some kind of agentic fine-tuning step, runs into
             | perplexity errors that get worse with longer contexts in my
             | opinion.
             | 
             | There's some "sweet spot" for how long of a prompt to use
             | for many use cases, for example. It's clear to me that less
             | is more a lot of the time
             | 
              | Now, whether diffusion will fare significantly better on
              | error is another question. Intuition would guide me to
              | think more flexibility with token-rewriting should enable
              | much greater
             | error correction capabilities. Ultimately as different
             | approaches come online we'll get PPL comparables and the
             | data will speak for itself
        
       | flippyhead wrote:
       | It's a pet peeve of mine to make a statement in the form of a
       | question?
        
         | ajkjk wrote:
         | I don't know why (and am curious) but this particularly odd
         | question phrasing seems to happen a lot among Indian immigrants
         | I've met in America. Maybe it's considered grammatically
         | correct in India or something?
        
           | exe34 wrote:
           | I've seen an explanation (that I don't fully buy), that
           | school teachers end most sentences with a question because
           | they're trying to get the children? the children? to
           | complete? their sentence.
        
       | beeforpork wrote:
       | What it is interesting that the original title is not a question?
        
         | beeforpork wrote:
         | Sorry, this was redundant?
        
       | prometheus76 wrote:
       | Why did the person who posted this change the headline of the
       | article ("Diffusion models are interesting") into a nonsensical
       | question?
        
         | amclennon wrote:
         | Considering that the article links back to this post, the
         | simplest explanation might be that the author changed the title
         | at some point. If this were a larger publication, I would have
         | probably assumed an A/B test
        
         | whoami_nr wrote:
         | Author here. I just messed up while posting.
        
       | inverted_flag wrote:
       | How do diffusion LLMs decide how long the output should be?
       | Normal LLMs generate a stop token and then halt. Do diffusion
       | LLMs just output a fixed block of tokens and truncate the output
       | that comes after a stop token?
        
       | bilsbie wrote:
       | What if we combine the best of both worlds? What might that look
       | like?
        
       ___________________________________________________________________
       (page generated 2025-03-07 23:01 UTC)