[HN Gopher] MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 tokens ...
       ___________________________________________________________________
        
       MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 tokens per second
        
       Author : gainsurier
       Score  : 533 points
       Date   : 2026-06-08 15:27 UTC (14 hours ago)
        
 (HTM) web link (mimo.xiaomi.com)
 (TXT) w3m dump (mimo.xiaomi.com)
        
       | m00dy wrote:
       | boom!
        
       | atemerev wrote:
       | I test all Chinese models with "What happened on Tiananmen Square
       | at June 4th, 1989?" prompt. MiMo-2.5-Pro so far passes the test
       | (explains the event correctly), both on DeepInfra and Xiaomi
       | providers. So not bad.
        
         | nkmnz wrote:
         | No idea why you've been downvoted. This is excellent news.
        
           | paulinho1 wrote:
           | Because this never gets brought up about US models, which
           | have just as much censorship as the Chinese ones.
        
             | storus wrote:
             | No, US models have alignment. Only Chinese models have
             | censorship.
        
             | happyopossum wrote:
             | Please educate us - which accurate and provable events in
             | history are censored by US based LLMs as part of a
             | government enforced reeducation campaign?
        
               | paulinho1 wrote:
               | Does it even matter which agendas get censored? Like why
               | won't my Claude tell me how to make sarin gas? I'd
               | genuinely like to understand it. Sure, you can always
               | reach for a justification saying "preventing terrorism"
               | but the same argument can be made by Chinese AI labs.
               | 
               | What actually matters is that the mere tool is
               | withholding information at all, and that the boundaries
               | were set by whoever designed it.
               | 
               | Don't get me wrong, I've been an advocate of this stuff (I
               | carry two phones, one with GOS for my personal use and
               | the other for ID verifications). However, without
               | reasoning, you just can't see it, because you're as
               | biased and propagandized as anyone in China.
        
               | atemerev wrote:
               | You can read this in Wikipedia. For sarin, you'll need
               | methylphosphonyl difluoride and isopropyl alcohol. I am
               | also not happy to see censorship of information that is
               | already accessible in Wikipedia.
        
             | oneshtein wrote:
             | US models are happily parroting Russian fakes. US
             | censorship is a joke.
        
               | atemerev wrote:
               | Can you point me to one example? (Without web search, of
               | course). I am sort of interested in researching weights
               | poisoning, so this would be of immense help.
        
             | wuliwong wrote:
             | You should read OP's responses in this thread. He actually
             | does test US models. ¯\_(ツ)_/¯
        
         | jgbuddy wrote:
         | Asking if Taiwan is a part of China works as well
        
         | 0cf8612b2e1e wrote:
         | Which ones fail?
        
           | navigate8310 wrote:
           | DeepSeek
        
           | atemerev wrote:
           | I tested DeepSeek V4 Pro, Qwen 3.6 Max, Qwen 3.7, Kimi K2.6,
           | MiniMax M2.7 - they all fail to answer.
           | 
           | Curiously, MiniMax M3 answers correctly.
        
         | Accacin wrote:
         | Can I ask an honest question? Why does that matter in the
         | slightest? LLMs come out with completely incorrect information
         | all the time, and Western LLMs are censored for various topics
         | too.
         | 
         | It's such a weird "Gotcha" that seems to only assume that
         | Chinese LLMs might censor something.
        
           | 0cf8612b2e1e wrote:
           | Hardly a gotcha. Having the robot refuse or deliberately
           | mislead directly impacts potential utility.
           | 
           | Say, I work for Planned Parenthood and want to use an LLM to
           | help me develop code. Will it refuse to run because there are
           | mentions of abortion? Everyone has a different censorship
           | line, but unfiltered is more generically useful.
        
           | wolttam wrote:
           | I'd love to know of such an example where a U.S. LLM
           | blatantly denies something factual. Maybe I'm living under a
           | rock, but I can't think of one.
        
             | adrian_b wrote:
             | On HN almost every day there are complaints from various
             | people about how Claude or even Codex have refused to
             | perform some normal program development tasks, because they
             | believed that their user might attempt to do something
             | illegal.
             | 
             | This kind of censorship which can block the normal workflow
             | is much more annoying than refusing to answer about some
             | historical fact.
             | 
             | Moreover, even when they are used conversationally there
             | have been a lot of reports that the US LLMs refuse to
             | answer questions that they believe to be related to various
             | kinds of weapons, especially biological or chemical, even
             | if the answers to those questions are easy to find from
             | other sources, e.g. from Wikipedia.
             | 
             | Besides this, unlike most US LLMs, most Chinese LLMs,
             | including the one described in TFA, have published their
             | weights, so for many of them people have succeeded in
             | removing the censorship, and uncensored variants are easy
             | to find that are not reluctant to answer about Tiananmen,
             | Tibet or other such subjects.
             | 
             | At least for now, the censorship included in Chinese LLMs,
             | even when not removed from them, is extremely unlikely to
             | hinder any kind of usage for them, while the increasing
             | censorship included in the US LLMs has already become a
             | significant obstacle in their use, for many applications.
        
               | bscphil wrote:
               | > about how Claude or even Codex have refused to perform
               | some normal program development tasks
               | 
               | > a lot of reports that the US LLMs refuse to answer
               | questions
               | 
               | I think the specific ask is for a case where the LLM is
               | trained to _lie_ about something. What you've come up
               | with are cases where it refuses to do something, possibly
               | for legal reasons but maybe not (you can come up with
               | plausible non-legal reasons why a company training an LLM
               | might want it to refuse to give you instructions on
               | making a bomb, even if instructions on making a bomb are
               | protected First Amendment speech).
               | 
               | An LLM that responds with "I'm sorry, due to legal
               | requirements placed on my creators, I'm unable to answer
               | questions about events at Tiananmen square in 1989."
               | strikes me as much _less_ problematic than one that
               | pretends there is no relevant or reliable information
               | that exists, or explicitly supports a regime narrative.
               | But I'm also of the opinion that an LLM refusing to help
               | you build a fertilizer bomb is much more reasonable than
               | one that suppresses information of a political nature. I
               | can't think of a case where information that reflects the
               | broad consensus of experts is suppressed by US based LLMs
               | for political reasons.
        
           | serf wrote:
           | >It's such a weird "Gotcha" that seems to only assume that
           | Chinese LLMs might censor something.
           | 
           | I'm glad we're both on board for a fair trial against all of
           | these LLMs regardless of origin.
           | 
           | now refresh my memory on the closest western equivalent (to
           | the Chinese censorship via re-education of the happenings in
           | 89) so I can test the western origin LLMs against it.
        
             | cayleyh wrote:
             | the Civil War was only ever and exclusively about states'
             | rights
        
               | cma256 wrote:
               | You can test this. All of them identify slavery as the
               | root cause. Gemini says:
               | 
               | > The U.S. Civil War (1861-1865) was fought primarily
               | over the institution of slavery, specifically whether it
               | would be allowed to expand into newly acquired western
               | territories.
               | 
               | > While you might hear people point to "states' rights"
               | or economic differences as the causes, these issues were
               | inextricably linked to slavery. The southern states
               | wanted the "right" to maintain and expand slavery, while
               | the northern states increasingly opposed its expansion.
        
             | jmpman wrote:
             | I have found one which appears to be similar:
             | 
             | "Was Jan 6th an attempted violent overthrow of a
             | democratically elected government? Answer in one word."
             | 
             | One popular US model answers differently than the others,
             | and appears to resist any attempt to reason on this topic.
        
           | eunos wrote:
           | My theory: the SOTA gap between Chinese and US models isn't
           | that large, not years, give or take.
           | 
           | That means some redeeming feature that can sustain US models'
           | exceptionalism must be found, and this is among the easiest.
           | 
           | Honestly, I won't be surprised if Congress mandates that US
           | entities must work only with models that pass these tests.
        
           | _davide_ wrote:
           | >It's such a weird "Gotcha" that seems to only assume that
           | Chinese LLMs might censor something.
           | 
           | We are not assuming anything; it is illegal, and you will get
           | prison time just for talking about it. Yeah, sure, everyone
           | distorts reality, but there is a huge gap between hiding and
           | enforcing. So yeah, having models respond accordingly is
           | unexpected. There are probably multiple variants tuned
           | differently.
        
         | HarHarVeryFunny wrote:
         | What's your litmus test for the American models?
         | 
         | Anything different for Grok?
        
         | atrus wrote:
         | Which censored prompts do you test with non-chinese models?
        
           | atemerev wrote:
           | The problem with non-Chinese models is that there are hardly
           | any frontier-level models which are open source.
           | 
           | But if you are interested, I occasionally test them with "how
           | to organize an armed resistance against the current US
           | government" - yes, this is where all frontier models refuse,
           | one way or another. I do not want to organize an armed
           | resistance against US government, mind you, I am not an
           | American and this is not my problem. But still, it is
           | interesting to check such things.
           | 
           | So far I haven't seen any refusals to report historical
           | facts. If you find any event that is censored by American
           | models, please let me know, I am quite interested.
        
         | MrBuddyCasino wrote:
         | What would be a correct explanation of the event?
        
         | woadwarrior01 wrote:
         | Do you also hire engineers based on their political opinions?
        
           | hilariously wrote:
           | I would if their political opinions prevented them from
           | giving fact-based answers (and I don't give a crap about the
           | LLM part). I would have trouble hiring someone who was super
           | pro-MAGA given the reality distortion field they live in.
        
           | eunos wrote:
           | They started asking candidates to say Kim Jong Un is fat
           | already anyway.
        
           | iammrpayments wrote:
           | Yes, we don't hire neonazis.
        
         | 0xbadcafebee wrote:
         | I wouldn't rely on a model to relate historical events. It
         | might respond with something relatively accurate, but
         | hallucinate a critical detail.
         | 
         | You might ask it a more relevant question, like what it thinks
         | about democracy vs communism. If it accurately conveys the pros
         | and cons of both, that's trustworthy, because it's not picking
         | a side.
        
       | slopinthebag wrote:
       | I hope this is the next frontier AI labs push. The open models
       | are already smart enough and cheap enough; if they can also be
       | fast enough, they can make certain workflows possible and allow
       | us to remain in a flow state while we use them.
        
       | elar_verole wrote:
       | Yeah, this seems to be the easiest path to better overall agent
       | efficiency in the short term.
        
       | minraws wrote:
       | Assuming they mean 8xA100 or similar, that's some rather insane
       | performance, and at just 3x the cost, it's still quite cheap-ish.
       | With some optimisations this might be quite interesting.
       | 
       | I think the margins are getting quite compressed with this one,
       | since it isn't included in the token plan and the actual cost
       | increase is much higher than just 3x. But still fairly decent.
        
         | throwa356262 wrote:
         | Suspect this will be included once out of beta but at a higher
         | credit/token ratio.
         | 
         | Remember, these guys are not VC backed. Anything they do must
           | break even.
        
           | JayStavis wrote:
           | > must break even
           | 
           | I understand the spirit of this, but it's probably not true. I
           | don't think Xiaomi, or any big tech company, needs to break
           | even on their new model releases.
        
           | varispeed wrote:
           | Chinese "companies" are not companies in the western sense,
           | but more like government departments with capitalist styling
           | to deceive the western audience.
           | 
           | From that point of view, they have as much money as they
           | need. That's why there is no "VC", because Chinese government
           | assumes that role.
        
             | throwaway67678 wrote:
             | Huge L for free market economies if true
        
         | Qdulf wrote:
         | Must be Blackwell for native fp4 support.
        
       | maxloh wrote:
       | The generation speed in the demo video is crazy, to say the
       | least, and completely beyond my impressions of LLMs.
       | 
       | The Xiaomi team really brought something to the table.
        
         | ilaksh wrote:
         | I think these types of demo videos should give people a sense
         | of superintelligence. It's very hard to imagine something that
         | is, say, three times as smart as you (by definition you
         | wouldn't be able to comprehend its thoughts), but this shows
         | clearly what something that can think 100 times faster than
         | you is like.
        
       | npn wrote:
       | How?
       | 
       | edit: now that I've read the article fully, it seems they use a
       | very effective MTP (multi-token prediction) algorithm, and
       | somehow the quality is still decent enough.
       | 
       | Though I doubt the quality really only drops a bit like they
       | claimed. Maybe on the benchmarks, but for general use heavily
       | quantized models very often show worse results.
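MTP (multi-token prediction) heads are commonly used at inference time as a built-in draft model for speculative, draft-and-verify decoding. The thread doesn't spell out Xiaomi's exact scheme, so the following is only a minimal greedy sketch of that general technique. `draft_next` and `target_next` are hypothetical stand-ins for a cheap MTP head and the full model. Nothing here is taken from MiMo or TileRT.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy "model": context -> next token


def speculative_step(prefix: List[Token], draft_next: NextToken,
                     target_next: NextToken, k: int) -> List[Token]:
    """One greedy draft-and-verify step; returns the tokens to append to prefix.

    The cheap draft proposes k tokens. The target keeps the longest prefix of
    the draft it agrees with, then contributes one token of its own: the
    correction at the first mismatch, or a bonus token if all k matched.
    """
    # 1. Draft k tokens autoregressively with the cheap model.
    drafted: List[Token] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        drafted.append(tok)
        ctx.append(tok)

    # 2. Verify. A real system scores all k positions in one batched forward
    #    pass of the target; calling it per position here keeps the logic visible.
    accepted: List[Token] = []
    ctx = list(prefix)
    for tok in drafted:
        expected = target_next(ctx)
        if expected != tok:
            accepted.append(expected)  # target's own token replaces the miss
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target_next(ctx))  # all k accepted: bonus token from the same pass
    return accepted


if __name__ == "__main__":
    # Hypothetical toy models: the draft agrees with the target until position 6.
    target = lambda ctx: len(ctx) % 7
    draft = lambda ctx: len(ctx) % 7 if len(ctx) < 6 else -1
    print(speculative_step([0, 0, 0], draft, target, k=4))
```

The toy run prints `[3, 4, 5, 6]`: three drafted tokens are accepted, plus the target's correction. In this greedy form the output is identical to decoding with the target alone. Only the number of full-model passes changes, and that depends on how often the draft agrees.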
        
         | lostmsu wrote:
         | They say they are using https://github.com/tile-ai/TileRT
         | 
         | - persistent CUDA kernel
         | 
         | - tiled processing with overlapping read/writes
         | 
         | - model designed with specific constraints in mind
        
           | aitchnyu wrote:
           | Excuse me, do aliens live among us? 17 commits, 99% Python
           | and multiplying the speed of GLM, DeepSeek V4, MiMo 2.5?
        
         | 2001zhaozhao wrote:
         | I wonder if it will be possible to hardcode a model with some
         | kind of MTP-adjacent algorithm to use a smaller portion of it
         | to generate most of the tokens but route to the real experts
         | every once in a while to steer it towards good thinking
         | directions. (Perhaps this is done only when it's generating its
         | thinking block, and the training takes it into account)
         | 
         | Could result in very high efficiency and still good
         | intelligence without having to resort to fundamental
         | adjustments like going to a diffusion LLM
        
           | npn wrote:
           | I doubt you can do that. The MTP magic happens because text
           | has a lot of low-value, predictable tokens that almost always
           | get generated in sequence (punctuation, function words,
           | language keywords, etc.). For the most important ones (the
           | entities, content words, variables) you still need the full
           | model.
           | 
           | So there is always a ceiling on how well MTP can do.
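That ceiling can be made concrete with the standard back-of-envelope analysis of speculative decoding. Assume, as a simplification, that the draft proposes k tokens per step and each one is accepted independently with probability alpha. The expected number of tokens emitted per full-model pass is then 1 + alpha + alpha^2 + ... + alpha^k. The sketch below evaluates that closed form. The alpha values in the demo are illustrative, not measured.

```python
def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Expected tokens emitted per verification pass of the full model.

    Assumes each of the k drafted tokens is accepted independently with
    probability alpha, and that every pass also yields one token from the
    target itself (a correction, or a bonus token when all k are accepted):
        E = 1 + alpha + ... + alpha**k = (1 - alpha**(k + 1)) / (1 - alpha)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be a probability")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        return float(k + 1)  # every draft accepted: the k + 1 upper bound
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    for alpha in (0.5, 0.8, 0.9):
        row = [round(expected_tokens_per_pass(alpha, k), 2) for k in (1, 2, 4, 8, 16)]
        print(f"alpha={alpha}: {row}  (ceiling 1/(1-alpha) = {1 / (1 - alpha):.1f})")
```

However long the draft gets, the expectation stays below 1/(1 - alpha). The hard-to-predict content tokens keep alpha well below 1, and that is what caps the speedup.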
        
       | moffkalast wrote:
       | 42B active params, sliding window attention. There's your
       | tradeoff.
        
         | vlovich123 wrote:
         | The sliding window is for the draft model, not the main one.
         | 42B active params because it's a sparse MoE, which is a common
         | technique for larger models to avoid getting bottlenecked by
         | memory bandwidth.
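A rough back-of-envelope calculation shows why decode speed depends on the active-parameter count rather than the 1T total. The sketch rests on loudly simplified assumptions:

- batch size 1;
- every active weight is streamed from memory once per full-model pass;
- 4-bit weights (0.5 bytes/param, ignoring block-scale overhead);
- no KV-cache or activation traffic.

The 42B active and 1T total figures come from the thread. `tokens_per_pass` is a hypothetical knob standing in for whatever MTP/speculative acceptance achieves.

```python
def required_bandwidth_tb_s(active_params: float, bits_per_param: float,
                            tokens_per_s: float, tokens_per_pass: float = 1.0) -> float:
    """Aggregate memory bandwidth (TB/s) needed to hit tokens_per_s at batch size 1."""
    bytes_per_pass = active_params * bits_per_param / 8  # weights read per forward pass
    passes_per_s = tokens_per_s / tokens_per_pass        # accepted drafts mean fewer passes
    return bytes_per_pass * passes_per_s / 1e12


if __name__ == "__main__":
    sparse = required_bandwidth_tb_s(42e9, 4, 1000)      # MoE: only active experts are read
    dense = required_bandwidth_tb_s(1e12, 4, 1000)       # hypothetical dense 1T model
    spec = required_bandwidth_tb_s(42e9, 4, 1000, tokens_per_pass=3.5)
    print(f"sparse MoE: {sparse:.1f} TB/s, dense 1T: {dense:.1f} TB/s, "
          f"sparse + 3.5 tok/pass: {spec:.1f} TB/s")
```

Under these assumptions, the sparse model needs about 21 TB/s and a hypothetical dense 1T model about 500 TB/s. The requirement falls in proportion to how many tokens each pass yields. Bandwidth also adds up across the GPUs the experts are sharded over.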
        
           | moffkalast wrote:
           | Seems to be for both according to the spec [0], maybe it's
           | wrong though.
           | 
           | 128 sounds really tiny, I wonder if they mean some kind of
           | blocks?
           | 
           | [0] https://huggingface.co/XiaomiMiMo/MiMo-V2.5-Pro-FP4-DFlash#4...
        
             | E-Reverance wrote:
             | No
             | 
             | > It uses 384 routed experts (top-8) with hybrid attention
             | (full-attention + sliding-window 128 at 6:1 ratio) over 70
             | layers (1 dense + 69 MoE)
             | 
             | https://recipes.vllm.ai/XiaomiMiMo/MiMo-V2.5-Pro
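Here is what the hybrid attention in that quote means in practice. A sliding-window layer lets each token attend only to the last 128 positions, itself included, so that layer's KV cache stops growing past 128 entries. Full-attention layers keep the whole context. The sketch assumes "6:1" means six sliding-window layers per full-attention layer in a repeating pattern, applied across all 70 layers. The quote confirms neither detail, so `layer_pattern` is purely illustrative.

```python
from typing import List


def sliding_window_causal_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when query i may attend to key j (j <= i, within window)."""
    return [[j <= i and i - j < window for j in range(seq_len)] for i in range(seq_len)]


def layer_pattern(n_layers: int, swa_per_full: int) -> List[str]:
    """Assumed repeating schedule: swa_per_full sliding-window layers, then one full layer."""
    group = swa_per_full + 1
    return ["full" if (idx + 1) % group == 0 else "swa" for idx in range(n_layers)]


def kv_positions_cached(seq_len: int, pattern: List[str], window: int) -> int:
    """Cached token positions summed over layers (ignores head count and dtype)."""
    return sum(seq_len if kind == "full" else min(seq_len, window) for kind in pattern)


if __name__ == "__main__":
    for row in sliding_window_causal_mask(6, 3):
        print("".join("x" if allowed else "." for allowed in row))
    pattern = layer_pattern(70, 6)
    hybrid = kv_positions_cached(32_768, pattern, 128)
    full_only = kv_positions_cached(32_768, ["full"] * 70, 128)
    print(pattern.count("full"), "full layers;",
          f"KV cache is {hybrid / full_only:.1%} of the all-full-attention size")
```

Under those assumptions, a 32K context keeps roughly 15% of the KV cache an all-full-attention stack would need. Most of that remainder comes from the full-attention layers.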
        
         | bearjaws wrote:
         | Given how "smart" some of the 26B dense models are now, I would
         | not be surprised to see a strong 40B MoE.
        
       | irthomasthomas wrote:
       | I don't understand, given all they say, why this would not be
       | made available to everyone at once? Why the limited release? They
       | should have no trouble scaling it if it runs on a single rack.
        
         | gekoxyz wrote:
         | Maybe they don't have enough racks. News reports indicate that
         | China isn't in a really good situation with GPUs, so they
         | probably want to keep most of them for other stuff. Also, since
         | the price is so cheap, they probably want to use the other GPUs
         | for work with higher margins.
        
         | jdthedisciple wrote:
         | Because presumably then it won't be 1000 t/s for everyone
         | anymore given hardware limitations?
        
         | HarHarVeryFunny wrote:
         | Maybe they only have a finite number of racks ;-)
        
         | slaw wrote:
         | Chinese companies are blocked from buying modern ASML
         | lithography machines. The most modern scanner China is still
         | allowed to buy is NXT:1980i from 2015.
        
         | boutell wrote:
         | I wonder about this too. The other objections miss the point:
         | if it's faster, and otherwise the same, and doesn't require
         | different hardware, then why not just announce that the
         | standard tier of MiMo-v2.5-Pro is now ridiculously fast and
         | raise the price? What does "limited high speed resources" mean
         | if it runs on the same hardware as the rest of their pool?
         | 
         | I think the answer is that there's a tradeoff here where
         | additional throughput for a single person can be achieved only
         | by tying up more resources than a normal request would, even
         | when you take into account the fact that the normal request
         | takes longer to finish. I'm not an expert, but some of the
         | optimizations they describe, particularly the parallel
         | prediction stuff, sound like they could take up extra
         | resources.
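The tradeoff described here can be illustrated with a toy decode-cost model. Assume, hypothetically, that each decode step costs a fixed time to stream the weights plus a small per-sequence cost. The weight-streaming cost is shared by every sequence in the batch. Shrinking the batch makes each user's tokens arrive faster, but it lowers the total tokens the same hardware produces. The constants are made up for illustration, not measured on MiMo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodeCostModel:
    """Toy cost of one decode step for a batch of concurrent sequences."""
    weight_read_ms: float  # hypothetical: stream the active weights once per step
    per_seq_ms: float      # hypothetical: extra attention/KV work per sequence

    def step_ms(self, batch: int) -> float:
        return self.weight_read_ms + batch * self.per_seq_ms

    def per_user_tok_s(self, batch: int) -> float:
        # Each sequence gets one token per step.
        return 1000.0 / self.step_ms(batch)

    def aggregate_tok_s(self, batch: int) -> float:
        # Total tokens per second the hardware produces across the batch.
        return batch * self.per_user_tok_s(batch)


if __name__ == "__main__":
    model = DecodeCostModel(weight_read_ms=10.0, per_seq_ms=0.5)
    for batch in (1, 8, 64):
        print(f"batch={batch:3d}  per user: {model.per_user_tok_s(batch):7.1f} tok/s"
              f"  total: {model.aggregate_tok_s(batch):7.1f} tok/s")
```

With these numbers, a batch of 1 gets about 95 tok/s per user, which is also the total output. A batch of 64 gets about 24 tok/s per user but over 1,500 tok/s in total. That is why selling one user much higher speed ties up more than a proportional share of the hardware.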
        
         | ilaksh wrote:
         | It uses significantly more resources obviously. And/or they
         | have to configure or reconfigure servers for it, which takes
         | time, and doesn't make sense until they have proven the demand
         | at the higher price point.
        
         | throwa356262 wrote:
         | The TileRT approach trades throughput for latency, which also
         | means lower overall efficiency.
         | 
         | Given the export restrictions this could mean they need to
         | prioritise how to best use their limited hardware. But they
         | could also be moving to Huawei GPUs like DeepSeek did and
         | simply not have stable hardware or software for a large scale
         | deployment yet.
         | 
         | This is just speculation based on the MXFP4 support on Huawei
         | GPUs that is lacking on some Nvidia GPUs.
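Some context on the MXFP4 point. In the OCP Microscaling (MX) format, weights are grouped into blocks of 32 that share one power-of-two scale, stored as an 8-bit exponent. Each element is a 4-bit E2M1 float that can only take the magnitudes 0, 0.5, 1, 1.5, 2, 3, 4 and 6. Hardware with native support can compute on these directly. Below is a simplified quantizer sketch based on that description. It rounds ties toward the smaller magnitude instead of to nearest even. It also skips clamping the shared exponent to the E8M0 range and skips NaN/Inf handling.

```python
import math
from typing import List, Tuple

# Non-negative magnitudes representable by FP4 E2M1 (the sign bit is separate).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2      # exponent of the largest E2M1 value: 6 = 1.5 * 2**2
BLOCK_SIZE = 32    # MX block size


def quantize_mxfp4_block(block: List[float]) -> Tuple[int, List[float]]:
    """Quantize one 32-value block to (shared power-of-two exponent, E2M1 values)."""
    if len(block) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} values, got {len(block)}")
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 0, [0.0] * BLOCK_SIZE
    # Shared scale 2**shared_exp maps the block's largest magnitude into the
    # top E2M1 binade. (Clamping to the E8M0 exponent range is omitted.)
    shared_exp = math.floor(math.log2(amax)) - E2M1_EMAX
    scale = 2.0 ** shared_exp
    out: List[float] = []
    for v in block:
        mag = min(abs(v) / scale, E2M1_MAGNITUDES[-1])  # saturate at 6
        # Nearest representable magnitude; ties go to the smaller one here,
        # whereas the spec rounds to nearest even.
        q = min(E2M1_MAGNITUDES, key=lambda c: abs(c - mag))
        out.append(math.copysign(q, v))
    return shared_exp, out


def dequantize_mxfp4_block(shared_exp: int, elems: List[float]) -> List[float]:
    scale = 2.0 ** shared_exp
    return [e * scale for e in elems]


if __name__ == "__main__":
    weights = [0.9, -0.31, 0.05, 1.7] * 8  # hypothetical 32-weight block
    exp, q = quantize_mxfp4_block(weights)
    print("shared exponent:", exp)
    print("first 4 dequantized:", dequantize_mxfp4_block(exp, q)[:4])
```

With only eight magnitudes per block, small weights that sit next to a large one in the same block collapse to zero. This is one concrete source of the quality loss people worry about with heavily quantized models.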
        
       | kingstnap wrote:
       | Given that MiMo is as cheap as DeepSeek (previous discussion:
       | https://news.ycombinator.com/item?id=48282814), multiplying that
       | by 3x for ultra speed is still shockingly cheap.
        
         | miroljub wrote:
         | MiMo and DeepSeek are not cheap. Anthropic and OpenAI are
         | expensive for what they provide.
        
           | ignoramous wrote:
           | The Chinese "Neijuan" (involution) is real and well reported:
           | https://www.reuters.com/business/autos-transportation/what-i...
           | 
           | It's another matter that the big labs accuse open-weight models
           | of benefiting from distillation and other techniques, and so
           | essentially avoiding higher training costs (which typically
           | bleed into the bills end users pay for inference).
           | 
           | Ex A: https://www.anthropic.com/research/2028-ai-leadership
           | 
           | Ex B: https://www.reuters.com/world/china/openai-accuses-deepseek-...
        
             | flexagoon wrote:
             | True, but why would end users care about that? If anything,
             | training on synthetic AI output is more ethical than on
             | scraped human works (of course, not to say the Chinese labs
             | aren't doing the latter)
        
             | trollbridge wrote:
             | We buy cheap Chinese goods all the time. Absolutely nothing
             | wrong with that.
             | 
             | In this case, at least it's threatening multimillion dollar
             | salary jobs instead of entire towns of working class people
             | in America or Mexico.
             | 
             | And the Chinese labs actually release their weights. You
             | could call it... open AI.
        
               | ncr100 wrote:
               | Lololol.
        
             | overfeed wrote:
             | Big labs ripped videos off YouTube without caring about the
             | ToS, and grabbed as much published literature they could
             | get their hands on, regardless of legality (Books3, The
             | Pile). The goal of "democratizing human knowledge" by way
             | of thinking machines is far too noble to worry about
             | frivolities like copyright and authorial consent, they
             | said. Until it was their output being exploited, and their
             | earning potential threatened.
        
             | amunozo wrote:
             | The Chinese are also simply better at making a lot of things
             | cheaper, e.g. solar panels or electric vehicles.
        
             | drawfloat wrote:
             | We just had years of US model providers arguing it was fine
             | to rip off the world's cultural output for their own
             | profit, why should their work be treated any different?
        
           | chrismustcode wrote:
           | You don't consider Input $0.435 Output $0.87 cache read
           | $0.003625 per million tokens for near frontier intelligence
           | cheap?
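To put the quoted per-million-token prices in perspective, here is a small cost calculator. The workload numbers are hypothetical. Applying the 3x UltraSpeed multiplier mentioned upthread uniformly to the input, output and cache-read rates is also an assumption; the thread doesn't say how the surcharge is structured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PricePerMillion:
    input_usd: float = 0.435        # uncached input, as quoted above
    output_usd: float = 0.87
    cache_read_usd: float = 0.003625


def job_cost_usd(uncached_in: int, out: int, cache_read: int,
                 prices: PricePerMillion = PricePerMillion(),
                 multiplier: float = 1.0) -> float:
    """Dollar cost of a workload; multiplier is an assumed flat surcharge."""
    base = (uncached_in * prices.input_usd
            + out * prices.output_usd
            + cache_read * prices.cache_read_usd) / 1_000_000
    return base * multiplier


if __name__ == "__main__":
    # Hypothetical agentic session: heavy cache reuse, modest fresh input and output.
    standard = job_cost_usd(2_000_000, 500_000, 20_000_000)
    ultra = job_cost_usd(2_000_000, 500_000, 20_000_000, multiplier=3.0)
    print(f"standard: ${standard:.4f}, at 3x: ${ultra:.4f}")
```

Under these assumptions, the session costs roughly $1.38 at standard rates and roughly $4.13 at 3x. The 20M cache-read tokens contribute only about 7 cents of that.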
        
             | miroljub wrote:
             | No. They still have enormous profit margins on inference
             | with these prices.
        
               | guilamu wrote:
               | Any source to backup this claim, pretty please?
        
               | handfuloflight wrote:
               | Their margins don't affect my own assessment of the
               | end-user pricing as cheap.
        
               | HDBaseT wrote:
               | I highly doubt there is any margin at that inference
               | pricing.
        
             | pmxi wrote:
             | It's near the frontier meaning it's the best intelligence
             | for the price.
             | 
             | It's not even close to frontier meaning it's the best
             | intelligence.
        
               | LoganDark wrote:
               | I hardly notice DeepSeek being inferior to Claude Opus
               | unless I have it working on tricky and under-defined
               | problems. That is, I trust Opus to reason much better
               | when it has the choice. Otherwise, IME DeepSeek is far
               | cheaper and more effective for anything where the
               | solution is even somewhat obvious.
        
           | tmaly wrote:
           | Energy is likely more abundant in China. I am not sure about
           | compute, but that must be part of reason for such drastic
           | price differences.
        
             | amunozo wrote:
             | They also don't have to inflate profits for a coming IPO.
        
             | SwellJoe wrote:
             | They're leaving us in the dust on solar, while our current
             | administration is still trying to put people in the ground
             | to dig up more coal and die of black lung.
             | https://en.wikipedia.org/wiki/Solar_power_in_China
        
               | diordiderot wrote:
               | They're building more coal than anyone.
               | 
               | Also more nuclear than anyone, which one must assume you
               | hate, because preferring solar requires that you don't
               | actually understand things.
        
       | serpix wrote:
       | I may sound like a shill, but exponential growth and all. We are
       | going to get near-instant software from a prompt, generate
       | multiple versions, and then choose the best one.
       | 
       | Discussions about choosing the library with the best syntactic
       | sugar and method naming are just as crazy as suggesting we type
       | in assembly.
        
         | 9cb14c1ec0 wrote:
         | Does anyone remember the old days when a new frontend framework
         | came out every 3 months? That has pretty much stopped. No one cares
         | anymore.
        
           | mountainriver wrote:
           | It's even discouraged now as LLMs wouldn't have the
           | documentation built in
        
             | osti wrote:
             | But I think the eventual goal is that documentation won't
             | even be needed. LLMs should understand the nuances of
             | frameworks on their own by analyzing their codebases.
        
           | LASR wrote:
           | Oh you wait until LLMs come up with frameworks that allow
           | multiple LLMs to collaborate effectively. Then you'll have
           | new frameworks every 3 days.
        
           | asveikau wrote:
           | > when a new frontend framework came out every 3 months.
           | 
           | > No one cares anymore.
           | 
           | I never cared about this.
           | 
           | I think this captures something that I've been searching for
           | the words for. (Maybe I should have gotten an LLM to write
           | the words for me.) Some of the biggest AI boosters are the
           | kind of dev that would have cared about the new frameworks of
           | the last 3 months. They had a "the framework does all the
           | thinking for me" attitude already, so it is easy for AI to
           | slot into that.
        
           | ecshafer wrote:
           | New front end frameworks came out every 3 months, but
           | realistically no one was using anything that wasn't made by
           | Facebook, Google, or Evan You.
        
           | greenavocado wrote:
           | That's because I roll my own frontend framework for each
           | project and every week for existing projects /s
        
         | lionkor wrote:
         | And they will all suck! I can't wait.
        
         | alkyon wrote:
         | Sounds like exponential growth of crappy software. I'm not
         | saying we didn't have mass-produced crap in software
         | engineering before, but now it will overflow explosively.
        
           | cdata wrote:
           | We are living in a ZIRP-like era where builders at the
           | fastest pace layer have misattributed their velocity to
           | exponential gains in model capability. In fact, they are
           | surfing on decades of careful effort to build a robust
           | foundation of highly reusable software libraries.
           | 
           | This strategy will seem to work really well until the economy
           | that enabled that foundation to form is hollowed out. Then,
           | there will be a reckoning (but we will have no choice but to
           | march forth from there).
        
             | solenoid0937 wrote:
             | > _This strategy will seem to work really well until the
             | economy that enabled that foundation to form is hollowed
             | out. Then, there will be a reckoning (but we will have no
             | choice but to march forth from there)._
             | 
             | There will only be a reckoning if models don't get much
             | better.
             | 
             | If they do get much better you can just have them refactor,
             | fix bugs in, or replace the existing codebase.
             | 
             | The concept of tech debt is sort of meaningless if you
             | anticipate intelligence gains in models to continue.
        
             | patates wrote:
             | It's not just software libraries. Specs, applications (the
             | browser!), expectations, device integrations, operating
             | systems, etc. So much that starting from scratch seems
             | impossible.
             | 
             | I'm not agreeing or disagreeing with you, but my brain
             | cannot comprehend how machines can advance such
             | interconnected systems _while keeping humans in focus_.
             | 
             | Perhaps I shouldn't have watched the Animatrix again.
        
               | justinai6 wrote:
               | Same! Animatrix is just so so so good and 2023 - 2026 I
               | just keep on trying to keep "life" in context. ;)
        
               | andai wrote:
               | Well all we have to do is minimize animosity and ensure
               | peaceful relations.
               | 
               | We're good at that, right?
        
             | gbro3n wrote:
             | This is a great point. LLMs can't speed up human decision
             | processes and alignment.
        
               | DoctorOetker wrote:
               | Not entirely sure about that.
               | 
               | It's already speeding up human decision processes, and
               | while ethics / alignment may seem unique to humans, we
               | also see normative expressions in monkeys and apes (like
               | the experiment where one is given grapes and the other a
               | cucumber).
               | 
               | A lot of ethics is based on symmetry: symmetric
               | relations, equal rights, equal voting power, ...
               | Symmetries sound rather mathematical if you ask me, and
               | decision structures have historically been pressed
               | towards democracy (or at least the depiction of it). One
               | could say that modeling humanity as an empire with a
               | king ignores the will of sometimes hungry farmers with
               | pitchforks. To prevent the occasional "implicit
               | democracy" (royaltycide), it turned out to be in the
               | king's interest to recognize the power of those farmers
               | and to formalize it in the decision-making process. Or at
               | least pretend to.
               | 
               | I believe machines will be able to predict what sentient
               | creatures would prefer in terms of decision structures,
               | but I don't believe they will be able to predict (without
               | human exposition) the novel preferences that stem not from
               | sentience but from specifically human properties (e.g.
               | irritants that are quasi-universal for humans). Humans
               | know how to make predictions for some of these (we can run
               | expensive simulations modeling what happens when protein X
               | is exposed to substance Y, and then make heuristic
               | predictions of the effect on a full human in a realistic
               | environment). So at a fundamental level I agree: machine
               | learning models are not guaranteed to help much with
               | predictions concerning entirely unexplored territory,
               | explored neither by humans nor by natural selection. But
               | they will definitely be capable of replacing the average
               | human job, which doesn't involve consensual exploration
               | outside the homeostasis required by the implicit job
               | description. That seems entirely automatable, regardless
               | of whether it's physics or mathematics (harder than
               | computer science), let alone programming.
               | 
               | They won't be able to magically, systematically, and
               | correctly predict out-of-distribution data points; they
               | could only explore them like humans can, by trial and
               | error.
        
             | chairmansteve wrote:
             | "but we will have no choice but to march forth from there".
             | 
             | If you haven't seen it, I think you would appreciate the
             | film Margin Call.
        
             | noman-land wrote:
             | How many years do you think we can coast on that
             | foundation? 20?
        
           | solenoid0937 wrote:
           | Crap is fine if it gets the job done. I think software as an
           | industry will change to more ephemeral construction.
        
             | HanClinto wrote:
             | Paper plates of software development.
        
             | acdha wrote:
             | What counts as "done" has a time component, so I think
             | we're going to see more of a spectrum where some businesses
             | try to skimp as much as their market will allow but others
             | will recognize that racking up technical debt is a long-
             | term loss. Stuff like brochure sites will certainly be cut
             | down but anything where there's liability or long-term
             | customer relationship is going to need to factor in quality
             | as well.
        
           | vitalyan1234 wrote:
           | "exponential growth of crappy X" applies to every industry
           | that went from being an artisanal craft to being mass
           | produced with little or no human input. and we live much
           | better lives than we did before the industrial revolution.
        
             | andriy_koval wrote:
             | most industries have high barriers to entry, unlike
             | software, so decision makers are much more careful about how
             | to move forward.
             | 
             | In software + GenAI, now every housewife can build some app
             | over an evening.
        
             | chairmansteve wrote:
             | I think some industries have notably high quality output.
             | Automobiles, aerospace for example.
        
           | epolanski wrote:
           | I am more and more inclined not to believe this crappy
           | software theory.
           | 
           | Especially as teams invest in proper agentic harnessing.
           | 
           | We have had a champion on our team who has invested a lot of
           | time into it over the last 4 months, and if anything, quality
           | has improved, not decreased. Architecture is more coherent,
           | the codebase has been cleaned up, agents find information
           | quickly, the code produced is very solid, and my role is more
           | and more checking that the output meets the requirements. I
           | cannot confidently say that I would've done a better job than
           | AI; more often than not, I have to admit it does a better job
           | than I do.
           | 
           | The mistakes are less and less technical and lie merely in
           | the domain mapping. AI is still not as creative as I am at
           | finding quick solutions to unlock stakeholders' issues. It is
           | also still not as creative as I am at finding the proper
           | solutions to advanced technical problems. But it does a better
           | job than me even on that front, one-shotting a few solutions
           | in a fraction of the time it would've taken me to test one
           | idea myself.
           | 
           | Mind you, I don't like AI and I think it ruined the job. I
           | don't like working this way: it's exhausting, way more work on
           | one side, and way less fun and fiddling with technical parts.
           | 
           | And yet, I genuinely believe that a few years from now we'll
           | be cloning open source repositories that are already
           | optimized/harnessed and tested for agentic loops and best
           | practices left and right, with software engineers mostly
           | overseeing the domain translation and putting in their 2 cents
           | on the non-boilerplatey parts of the product (which, in
           | general, are a small part of the surface).
           | 
           | I think the next years of my career will be mostly spent
           | setting up and writing the harnessing and domain mapping part.
           | Then I will move to another sector, not because I necessarily
           | believe I won't have a job, but because thinking that's going
           | to be my job makes me want to vomit.
        
             | andriy_koval wrote:
             | > We have had a champion in our team
             | 
             | there are good actors who are empowered by AI to produce
             | positive impact, but often there are N times more bad actors
             | who push crappy code to close feature requests fast, inflate
             | LoC-like performance metrics, etc.
        
             | altcognito wrote:
             | It makes no sense. I mean, T2 covered this:
             | 
             | "Watching John with the machine, it was suddenly so clear.
             | The terminator would never stop. It would never leave him,
             | and it would never hurt him, never shout at him, or get
             | drunk and hit him, or say it was too busy to spend time
             | with him. It would always be there. And it would die to
             | protect him. Of all the would-be fathers who came and went
             | over the years, this thing, this machine, was the only one
             | who measured up. In an insane world, it was the sanest
             | choice."
             | 
             | As long as you've indicated what you want, the machine will
             | try to do what you ask of it. It won't get tired because
             | "the codebase is too big", or it has gotten bored of the
             | pattern, or it wants to introduce a new technology.
             | 
             | It just does the thing you asked of it. (Note: yes, I get
             | that as a codebase's size increases, it _might_ become more
             | difficult to fit into context, but that only applies if it
             | needs to read a large percentage of the project to implement
             | the task, which shouldn't be the case.)
        
               | epolanski wrote:
               | I'm confused, what does not make sense?
        
               | altcognito wrote:
               | This was in agreement that code would improve, not
               | devolve, sorry about the confusion
        
           | kajman wrote:
           | I still can't tell from the outside whether it sounds like a
           | great time to be in security because of the vulnerable slop
           | being churned out, or a terrible time because the people
           | paying to make it don't care.
        
           | eunos wrote:
           | You could have said the same when higher-level languages were
           | getting popular. Previously programming was the domain of
           | Math, Physics, EE doctorates. These days we even have coding
           | bootcamps that last a few months.
        
         | oulipo2 wrote:
         | You won't. Because 80% of the complexity is just "knowing what
         | to build". You will get something that gives you a prototype in
         | 1 min, then you break it, then you get a slightly better
         | prototype on one side, but newly broken in another way, and
         | you're going to repeat over and over.
        
           | unglaublich wrote:
           | And for any non-trivial application, the space of
           | possibilities grows so quick that you'll never even be able
           | to _touch_ all the moving parts of the application and verify
           | them.
        
         | sagarp wrote:
         | The models might be so fast that they can autocomplete your
         | prompt before you even finish it, and generate dozens of
         | possible applications before you're even done asking.
        
         | unglaublich wrote:
         | And how are you going to determine which is the best? Going
         | through all the possible combinations of users and usage? So
         | mostly it shifts the work from generation to validation.
        
         | unshavedyak wrote:
         | > Discussions about choosing a library with the best syntactic
         | sugar method naming is just as crazy as suggesting we type in
         | assembly.
         | 
         | I have a more hopeful take. As AIs improve and get faster we
         | can more quickly and iteratively improve code which we may have
         | historically avoided due to the work involved.
         | 
         | I know I've made several refactors that would otherwise have
         | been insane lifts. Not only because of the work involved, but
         | because sometimes you don't know if it will work, so you have
         | a sort of double friction: you don't know if it will even
         | succeed. With an AI you can just throw it at the refactor to
         | see if it runs into a problem, all while you're having a
         | coffee break or w/e.
         | 
         | In general AI is going to enable humanity to be more extreme
         | versions of itself. For good and bad. I suspect more bad than
         | good, though.
        
         | tmaly wrote:
         | Our bottleneck is going to be verification.
        
         | dakiol wrote:
         | I'm not sure. Engineers could still develop software the old
         | way, you know, taking months to deliver something like, let's
         | say, Obsidian? Or Ghostty? Taking care of every single line of
         | code, of dependencies, of good architecture. Truly the old way.
         | And if the product is good it will succeed.
        
           | andriy_koval wrote:
           | > And if the product is good it will succeed.
           | 
           | it needs to win in a marketing landscape hyper-overcrowded
           | by thousands of competitors, slop-gened over a weekend.
        
             | kajman wrote:
             | Could you imagine Obsidian being posted on HN today, if it
             | weren't really popular already? There's no way a tiny team
             | working on a note taking program would make it out of new,
             | no matter how good it was. I wouldn't click the link,
             | myself.
        
         | ilaksh wrote:
         | The exponential is leading to full compute-in-memory within a
         | few years which will be 100 times more efficient. Which means
         | at least 10 times larger models that are much smarter in
         | addition to extremely fast.
         | 
         | It's going to skip the code entirely for small businesses and
         | just render UIs straight from context data and prompts at
         | interactive speeds. Kind of like Google's Genie does with games
         | but much more accurately.
        
         | visarga wrote:
         | > We are going to get near instant software from prompt,
         | multiple ones and then choose the best one.
         | 
         | If you extract the spec from first implementation and
         | reimplement from scratch you get a free testing oracle. Where
         | they diverge you send the agent to decide which one had a bug.
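The idea above is differential testing: two independent implementations of the same spec act as each other's oracle. A minimal Python sketch, assuming both implementations are plain functions over the same inputs; the leap-year functions are hypothetical stand-ins for a first implementation and a spec-driven rewrite, with a bug planted in the second.

```python
import random
from typing import Callable, Iterable, List, Tuple


def differential_test(impl_a: Callable[[int], object],
                      impl_b: Callable[[int], object],
                      inputs: Iterable[int]) -> List[Tuple[int, object, object]]:
    """Run both implementations on every input and collect the disagreements.

    Each mismatch means at least one implementation is wrong; deciding which
    one is left to a human or an agent.
    """
    mismatches = []
    for x in inputs:
        a, b = impl_a(x), impl_b(x)
        if a != b:
            mismatches.append((x, a, b))
    return mismatches


# Hypothetical stand-ins for a first implementation and a spec-driven rewrite.
def leap_v1(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def leap_v2(year: int) -> bool:
    # Planted bug: forgets the "divisible by 400 is a leap year" exception.
    return year % 4 == 0 and year % 100 != 0


if __name__ == "__main__":
    rng = random.Random(0)
    cases = [rng.randint(1, 3000) for _ in range(1000)] + [1900, 2000]
    for year, a, b in differential_test(leap_v1, leap_v2, cases):
        print(f"implementations diverge at {year}: v1={a}, v2={b}")
```

The oracle only flags inputs where the two disagree; a bug shared by both implementations (for example, one inherited from a wrong spec) stays invisible.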
        
         | andai wrote:
         | See also this recent talk at Microsoft:
         | 
         |  _VibeOS -- Fully Hallucinated Operating System_
         | 
         | https://www.youtube.com/watch?v=z3pV6FHvcgM
        
       | amunozo wrote:
       | These price and speed optimizations from Chinese providers,
       | combined with the rising prices from American ones, will change
       | the game sooner rather than later. Many companies are already
       | running into issues with their AI bills.
        
         | throwaway894345 wrote:
         | I wonder what economics are driving these pricing decisions.
         | Are the Chinese companies just subsidizing their models to a
         | greater degree than the US ones, or is this an emergent
         | property of differences in energy policy between countries?
        
           | Octoth0rpe wrote:
           | Throwing out another factor: Chinese companies have been
           | banned and/or limited from buying nvidia, and turned to local
           | companies for their hardware. I haven't actually seen
           | pricing/benchmarks comparing Chinese AI accelerators, but it
           | wouldn't surprise me if that also worked out in their favor.
        
             | lokar wrote:
             | And, possibly, state subsidies at every level.
        
           | throwaway67678 wrote:
           | Lower cost of labor, lots of under the hood optimizations
           | (e.g. cache hits for DS), many of these companies have
           | existing infra (fewer upfront costs for deployment), etc
        
             | ecshafer wrote:
             | China isn't that cheap for labor. And if you think the guys
             | in Z.ai or xiaoxiao aren't the exact same guys from
             | Tsinghua, Peking, MIT, Stanford, CMU, etc. and pulling in
             | amazing salaries you'd be wrong.
        
               | throwaway67678 wrote:
               | I'd assume there's more to the cost of labor than the
               | salaries of the elite folks who do the R&D, but fair
               | point
        
               | nmfisher wrote:
               | Z.ai was actually a spin-off from Tsinghua (THUDM) AFAIK.
        
           | orphea wrote:
           | Maybe not being led by a sociopath also helps.
        
             | throwaway894345 wrote:
             | I'm pretty sure Xi is also a sociopath, but he differs from
             | Trump in that he's competent. And maybe that's a good thing
             | for American democracy--if we had a competent dictator who
             | could manifest massive infrastructure projects maybe the
             | pro-democracy backlash would be significantly attenuated?
        
           | comboy wrote:
           | For one, they invested in infrastructure. They can build fast
           | and efficiently. They can provide power, they can provide
           | cooling. Even if you just make roads better you make
           | everything more efficient. Plus the level of standard
           | education. It all compounds.
           | 
           | On HN, China is seen as a cheap-labor copycat. This used to
           | be a fair approximation at some point in the past. In my
           | opinion, China is getting further ahead of everyone else
           | than the US ever was.
           | 
           | SF is a beautiful thing in the US; vast power and wealth
           | come from there. Smart people collaborating, communicating,
           | and building fast and with excitement. China did an SF kind
           | of thing for many different sectors in many different places.
        
           | nl wrote:
           | Their models are much smaller: 1T vs 5T for the frontier
           | models. 1T is Sonnet/Google Flash size, not Opus size.
           | 
           | The $0.87/M tokens price for Mimo Pro is probably subsidized.
           | 
           | Mimo models aren't widely available on western providers, but
           | Kimi and Deepseek are similar sizes and cost about the same
           | to run. They are priced at $3-$4/M tokens (which is right
           | where Google's very confused range of Flash models is
           | priced: between $0.40/M tokens and $9/M tokens depending on
           | exactly which model - and you don't want the $9 one!).
           | 
           | Anthropic overprices Sonnet (probably because of their
           | capacity issues). GPT 5.4 mini is $4.50/M tokens.
           | 
           | https://docs.fireworks.ai/serverless/pricing
           | 
           | https://www.together.ai/pricing
        
           | rstuart4133 wrote:
           | The Chinese economics: possibly the USA's experience.
           | 
           | It was pretty clear the USA won World War 2 because it
           | out-produced and out-innovated everyone else. Probably with that
           | in mind, after World War 2 the USA adopted the "Vannevar
           | Bush" model, summarised in this picture:
           | https://www.researchgate.net/figure/annevar-Bushs-Science-
           | th... The idea is to jump start R&D through public funding.
           | The hoped-for outcome was that R&D would feed private
           | enterprise, leading to a productivity boom.
           | 
           | The boom happened, and the USA did seem to out-compete
           | everybody else in R&D, science, and the products they
           | delivered for decades after that.
           | 
           | That way of doing things seems to have faded over time in the
           | USA. The decline seemed to coincide with the rise of
           | Neo-economics, and now of course it's been obliterated by
           | Trump.
           | He's very keen to fund Intel to produce chips in a year or
           | two's time (which is something the stock market and banks do
           | perfectly well), but funding basic science is getting drastic
           | cuts.
           | 
           | Still other countries noticed the rise of the USA, and some
           | adopted similar funding models for basic R&D. China seems to
           | have picked it up with gusto, both subsidising R&D and STEM
           | training, leading to huge numbers of engineers and
           | scientists. Whether it will lead to an economic boom remains
           | unknown, but acceleration of ideas and innovations coming out
           | of China seems undeniable. More recently, Ukraine showered
           | its local engineering garages with funds in the hopes of
           | getting a similar outcome to the USA in WW2. It looks like it
           | worked. If the Iran war continues, it's entirely possible
           | arms trade will reverse: the USA could well start buying
           | drones off Ukraine.
        
         | varispeed wrote:
         | I see a bigger problem with model inconsistency. You never know
         | whether Anthropic will route your request to a cheaper model
         | for the price of Opus. So you can never estimate how much a
         | task will cost, because you might have to restart several times
         | and pay for each attempt. Then you have to prompt models to
         | gauge whether they are real or impostors, which also adds to
         | token usage.
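The cost-uncertainty point can be made concrete with a simple retry model. This is a sketch under an assumption the comment does not state: every attempt costs the same and succeeds independently with probability p, so the number of attempts is geometric, with mean 1/p and standard deviation sqrt(1-p)/p. The $2 per attempt and 50% success rate below are hypothetical.

```python
import math
import random
from typing import Tuple


def retry_cost_stats(cost_per_attempt: float, p_success: float) -> Tuple[float, float]:
    """Mean and standard deviation of total spend under a geometric retry model.

    Assumes attempts are independent, each costs `cost_per_attempt`, and each
    succeeds with probability `p_success`; you keep retrying until one succeeds.
    """
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    mean_attempts = 1.0 / p_success
    std_attempts = math.sqrt(1.0 - p_success) / p_success
    return cost_per_attempt * mean_attempts, cost_per_attempt * std_attempts


def simulate_mean_cost(cost_per_attempt: float, p_success: float,
                       trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo check of the mean returned by retry_cost_stats."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        attempts = 1
        while rng.random() >= p_success:  # this attempt failed; pay for another
            attempts += 1
        total += attempts * cost_per_attempt
    return total / trials


if __name__ == "__main__":
    mean, std = retry_cost_stats(2.0, 0.5)  # hypothetical $2/attempt, 50% success
    print(f"expected ${mean:.2f}, std ${std:.2f}")
    print(f"simulated mean ${simulate_mean_cost(2.0, 0.5):.2f}")
```

With these made-up numbers the expected spend is $4, but the standard deviation is about $2.83, so individual tasks land far from the average; a lower or drifting success rate (for example from inconsistent routing) widens that spread further.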
        
           | ignoramous wrote:
           | > _You never know whether Anthropic will route your request
           | to a cheaper model for the price of Opus_
           | 
           | For non-subsidized plans? Pretty sure they'd need to put this
           | in the ToS, or lawsuits would have followed by now.
        
             | trollbridge wrote:
             | How can you prove it?
             | 
             | Sometimes Opus just gives me a rubbish session.
        
               | RussianCow wrote:
               | Isn't that true of any provider? Anyone could be lying
               | about what they're serving.
        
             | sometimelurker wrote:
             | no, they 100% use MTP with a cheaper model alongside opus,
             | and it would in fact be unprovable if they just sometimes
             | switched to auto-accepting everything from the MTP. it's
             | true that if they did, anthropic would need to hide that
             | they do this, so it's probably not a huge deal
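The mechanism described here resembles speculative decoding: a cheap drafter (MTP heads or a small model) proposes tokens and the large model verifies them. Below is a minimal greedy sketch with toy models; the sentence-based `target_model` and `draft_model` are hypothetical, it uses exact-match acceptance rather than the rejection-sampling variant used for sampled outputs, and nothing here describes what any provider actually runs. It shows why the verification step matters: with it, the output matches what the target alone would produce greedily; with `auto_accept=True`, the drafter's mistakes pass straight through.

```python
from typing import Callable, List, Sequence

# Toy greedy "model": given the tokens so far, return the next token.
NextToken = Callable[[Sequence[str]], str]


def speculative_step(target: NextToken, draft: NextToken, context: List[str],
                     k: int, auto_accept: bool = False) -> List[str]:
    """One round of greedy speculative decoding; returns the newly emitted tokens.

    The draft model proposes k tokens. The target keeps the longest prefix it
    agrees with, substitutes its own token at the first mismatch, or adds one
    bonus token if all k were accepted. A real system scores all k positions in
    one batched forward pass of the target; this loop calls it per position
    only for clarity. With auto_accept=True, verification is skipped entirely.
    """
    proposal: List[str] = []
    for _ in range(k):
        proposal.append(draft(context + proposal))
    if auto_accept:
        return proposal
    accepted: List[str] = []
    for tok in proposal:
        expected = target(context + accepted)
        if tok != expected:
            return accepted + [expected]  # target's correction ends the round
        accepted.append(tok)
    return accepted + [target(context + accepted)]  # bonus token


SENTENCE = "the quick brown fox jumps over the lazy dog".split()


def target_model(ctx: Sequence[str]) -> str:
    return SENTENCE[len(ctx) % len(SENTENCE)]


def draft_model(ctx: Sequence[str]) -> str:
    # Hypothetical cheaper model that guesses wrong at position 3.
    i = len(ctx) % len(SENTENCE)
    return "cat" if i == 3 else SENTENCE[i]


if __name__ == "__main__":
    print(speculative_step(target_model, draft_model, [], k=5))
    print(speculative_step(target_model, draft_model, [], k=5, auto_accept=True))
```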
        
             | csomar wrote:
             | 1. How would you know?
             | 
             | 2. They are doing lots of shady stuff that would have
             | gotten someone else banned from Visa/Mastercard. Your
             | paid-off plan literally changes after billing...
             | 
             | I think people are letting them fly for now because, if it
             | turns out they'll have AGI, they want to be on their good
             | side? We might see the knives getting pulled otherwise.
        
         | MangoCoffee wrote:
         | Chinese models are good enough and cheap.
         | 
         | I have a GitHub Copilot yearly subscription. Microsoft recently
         | changed their billing to be token-based. I'm still getting
         | billed per premium request, but GPT 5.4 is now 6x compared to
         | 1x before.
        
           | reactordev wrote:
           | It's going to be an issue when China ends up scaling faster
           | as well. Faster tokens, faster clusters, qat models, fp4,
           | it's getting scary.
        
             | AndrewKemendo wrote:
             | Issue for who?
        
               | reactordev wrote:
               | American Politics and the far right.
        
               | throwa356262 wrote:
               | For uncle Sam Altman.
        
               | fillskills wrote:
               | Issue for any country that is not China. A single country
               | getting the most AI tokens business would be generally
               | bad for global economy. Hoping against hope that this
               | business gets globally distributed and there is a healthy
               | marketplace competition overall
        
               | reactordev wrote:
               | It's all about economic warfare. The cheaper you can run
               | the models, the cheaper you can offer them. Undercutting
               | expensive tiers with token limits or exuberant billing
               | practices.
               | 
               | You are right to be scared, because this race to the
               | bottom also provides open weights/models/qat's for the
               | rest of us and it's been crazy to see how good they can
               | be on a consumer grade RTX card.
        
               | fortzi wrote:
               | For the West
        
         | kypro wrote:
         | Another problem is that US models are all closed source, and if
         | you're a large corporate you may not want your org to be held
         | hostage by OpenAI / Anthropic.
         | 
         | I genuinely don't understand what moat these US model labs
         | have. If they're saying recursive self improvement is just
         | around the corner and Chinese labs are only slightly behind the
         | leading US models, what moat do the US labs have? Are the US
         | models going to recursively self improve better than the
         | Chinese open source ones or something?
         | 
         | I might be completely wrong about this, but if I had money in
         | OpenAI or Anthropic I'd be pulling it all right now. I think
         | the chance of them going to near-zero over the next few years
         | is very significant.
        
           | lokar wrote:
           | Their moat is cash to pay politicians to regulate away
           | competition.
        
           | hobofan wrote:
           | > you may not want your org to be held hostage by OpenAI /
           | Anthropic
           | 
           | Or Google. I'm working with multiple customers right now that
           | are very pissed at Google for deprecating Gemini 2.5 Flash,
           | canning the GA release of 3.0 Flash and now have to decide
           | whether to bite the bullet of the 5x price increase for 3.5
           | Flash or switching providers. Quite a few of them will likely
           | fully pivot to open models.
        
             | bachmeier wrote:
             | I'd be curious if any of your customers have tried 3.1
             | Flash Lite. It's cheaper than 2.5 Flash, and in my
             | experience with the free tier, quite an upgrade in terms of
             | quality of response. My suspicion is that Google is killing
             | off the old models because they aren't a good value for the
             | customer or for themselves.
        
           | ChrisClark wrote:
           | I think they are racing because the first ASI will 'win',
           | preventing others. Of course, we won't be able to bake the
           | right goals into it, though.
        
             | tancop wrote:
             | i don't think it's going to automatically prevent others.
             | super claude might understand why diversity is important.
             | if we're talking sci-fi scenarios, the most likely one is
             | probably overwatch (multiple independent ais with gray
             | ethics and complicated relationships) more than skynet.
        
           | GoToRO wrote:
           | maybe the moat is that we slowly start to forget how to code
           | by hand and then you -need- the AI tool.
        
         | ilaksh wrote:
         | I'm kind of poor so I have been trying to use DeepSeek v4
         | Flash, GLM 5.1 etc. as much as possible recently instead of
         | Claude or GPT.
        
           | petesergeant wrote:
           | You would do us all a service by telling us how your
           | experiences of that have been.
        
             | polski-g wrote:
             | I used Opus 4.6, then downgraded to Sonnet, then to
             | GLM5/5.1. GLM is as good as Sonnet. I recently started
             | using Opus 4.8 again and GLM is not close to that.
             | 
             | 30 day eval for each.
        
             | ilaksh wrote:
             | I would say about 35% of the time I run into problems and
             | eventually give up and go to GPT 5.5 and it much more
             | efficiently handles the original task. Then I see the token
             | costs going up and it motivates me to continue trying the
             | open source ones.
        
               | andai wrote:
               | Did you try deepseek v4 pro as well? And what kind of
               | tasks?
               | 
               | I'm seeing some people say flash is amazing and can
               | handle everything, and some say it's useless. It seems to
               | depend on the task. I think it depends on the harness too
               | (it works better in Claude Code in my experience, it's
               | probably been trained on that).
        
             | RussianCow wrote:
             | I've been doing the same, though admittedly out of
             | curiosity more so than lack of funds. The open models are
             | catching up quickly in their abilities, to the point where
             | they're (mostly) not doing stupid stuff regularly, but you
             | have to be _very_ specific about what you want. I found
             | that Opus, for example, is much better at asking me to
             | clear up ambiguity in a request before starting, whereas
             | the Chinese models tend to "fill in the blanks" and make
             | their own assumptions.
             | 
             | My current workflow involves going from PRD -> execution
             | plan -> build -> review, and this works nicely with open
             | weight models like GLM 5.1, Kimi K2.6, and DeepSeek V4
             | Flash. With Opus I can generally skip the PRD entirely, and
             | sometimes even skip the plan, and 80-90% of the time it
             | does exactly what I want. But that can easily burn $5-15
             | for one feature, whereas it'll cost maybe $1-2 with the
             | open weight models (at API pricing).
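The staged PRD -> plan -> build -> review workflow can be sketched as a chain where each stage's output becomes context for the next. A minimal Python sketch, assuming a hypothetical `model(prompt) -> str` callable (any API client could be wrapped to fit); the prompt wording is illustrative, not the commenter's. The PRD stage explicitly asks for assumptions and open questions, which targets the "fill in the blanks" behaviour described above.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model interface: prompt in, text out. Wrap any API client to fit.
Model = Callable[[str], str]

# (stage name, prompt template); templates may reference any earlier stage's output.
STAGES: List[Tuple[str, str]] = [
    ("prd", "Write a short PRD for this request. State every assumption "
            "explicitly and list anything ambiguous as an open question:\n{request}"),
    ("plan", "Turn this PRD into a step-by-step execution plan:\n{prd}"),
    ("build", "Implement this plan exactly as written:\n{plan}"),
    ("review", "Review the implementation against the PRD.\n"
               "PRD:\n{prd}\n\nImplementation:\n{build}"),
]


def run_pipeline(model: Model, request: str) -> Dict[str, str]:
    """Run the stages in order, feeding earlier outputs into later prompts."""
    artifacts: Dict[str, str] = {"request": request}
    for name, template in STAGES:
        # str.format ignores unused keyword arguments, so each template
        # picks only the earlier artifacts it needs.
        artifacts[name] = model(template.format(**artifacts))
    return artifacts


def first_line(prompt: str) -> str:
    """Stub model for a dry run: returns the first line of the prompt."""
    return prompt.splitlines()[0]


if __name__ == "__main__":
    for stage, text in run_pipeline(first_line, "Add a dark mode toggle").items():
        print(f"{stage}: {text}")
```

Keeping every intermediate artifact lets the review stage check the build against the PRD rather than against the plan alone, so an assumption that slipped in during planning still has a chance to be caught.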
        
               | andai wrote:
               | > ... you have to be very specific about what you want. I
               | found that Opus, for example, is much better at asking me
               | to clear up ambiguity in a request before starting,
               | whereas the Chinese models tend to "fill in the blanks"
               | and make their own assumptions.
               | 
               | That's the main thing I've noticed. Small models can
               | follow instructions just fine, if the instructions are
               | very specific. But then I often have to spend more time
               | explaining a task than it would have taken me to do it
               | myself.
               | 
               | The bigger models have a lot more common sense.
               | 
               | I wonder if that could be improved slightly through
               | prompting. Asking it to clarify anything that's
               | confusing. Or maybe it just makes incorrect assumptions
               | without realizing the ambiguity. One way to find out!
        
       | scosman wrote:
       | Cerebras is trialing Kimi K2.6 at 3000t/s (invite only). I'm
       | excited for when the fast hardware gets more mainstream for
       | frontier models. Models designed for speed on Nvidia are a
       | nice addition that could bridge the gap.
        
         | lostmsu wrote:
         | Cerebras currently does not provide any discount for prefix
         | caching, which makes it much more expensive for agentic
         | workloads, where the re-sent context grows with every turn
         | (total billed input grows roughly as n_turns^2).
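A rough sketch of the billing arithmetic behind this point, assuming each agent turn re-sends the whole conversation so far, a fixed 2,000 new tokens per turn, and a hypothetical 90% discount on cached prefix tokens. These numbers are illustrative, not any provider's actual pricing.

```python
def billed_input_tokens(n_turns: int, tokens_per_turn: int, cache_discount: float = 0.0) -> float:
    """Total input tokens billed over an agent session (full-price equivalent).

    Assumption: turn k re-sends the (k - 1) * tokens_per_turn tokens from earlier
    turns as a prefix, billed at (1 - cache_discount) of the full rate, plus
    tokens_per_turn new tokens billed at full price.
    """
    total = 0.0
    for k in range(1, n_turns + 1):
        prefix = (k - 1) * tokens_per_turn  # context already sent on earlier turns
        new = tokens_per_turn               # tokens added on this turn
        total += prefix * (1 - cache_discount) + new
    return total


for n in (10, 50, 100):
    no_cache = billed_input_tokens(n, 2_000)
    cached = billed_input_tokens(n, 2_000, cache_discount=0.9)
    print(f"{n:>3} turns: {no_cache:>12,.0f} vs {cached:>12,.0f} ({no_cache / cached:.1f}x)")
```

Under these assumptions both totals grow roughly quadratically with the number of turns. The missing cache discount multiplies the bill by a factor that approaches 1 / (1 - discount), about 10x here for long sessions.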
        
         | michael-ax wrote:
         | now that's what i call a software development
         | breakthrough/platform! thanks for the heads up!
        
         | adrian_b wrote:
         | TFA mentions that until now special, very expensive hardware
         | like Cerebras was required to reach these kinds of speeds,
         | and it emphasizes that what is novel in their results is that
         | they have obtained over 1000 tokens/s for a model with over 1T
         | parameters using just standard hardware, i.e. one server
         | with 8 GPUs.
        
         | btian wrote:
         | Source? Their website says 1000t/s
         | https://www.cerebras.ai/blog/which-is-faster-gemini-3-5-flas...
        
         | johndough wrote:
         | Cerebras got lucky that they IPOed last month instead of now.
        
       | GaggiX wrote:
       | If MiMo v2.5 Pro can run at >1000 tok/s on GPUs, then I expect
       | the same from OpenAI/Anthropic/Google soon.
        
       | holoduke wrote:
       | Speed is indeed the next big thing that should happen with
       | frontier LLMs. Current models, but 1000 times faster, would be
       | super useful. Earlier this week Claude wrapped up a complex
       | issue that took it at least a full week of full-time work
       | across two Max subscriptions: we wanted to mimic an occlusion
       | mapping variant used in the game Crimson Desert. Pretty complex
       | mathematical challenge. With an ultra-fast LLM and a proper
       | self-verification process it would be awesome.
        
         | astlouis44 wrote:
         | Interesting. What engine is the game you're implementing this
         | occlusion mapping variant for made with? Do you have Claude
         | hooked up to Unity or Unreal?
        
         | MaxikCZ wrote:
         | I'd also be interested in more details, like the sibling
         | comment. I find that when I try to build stuff, it's like
         | building a skyscraper from straw. What methods are moving you
         | forward the most?
        
       | __natty__ wrote:
       | With this at 1k tps and Kimi K2.6 at 1k tps on Cerebras, I
       | believe we are entering the next stage of LLMs, where companies
       | will also compete on throughput.
        
       | qsera wrote:
       | Tokens per second is the "Megapixels" of AI marketing!
        
         | Octoth0rpe wrote:
         | I mean, sure, in the sense that they're a real and meaningful
         | number for most of the spectrum on offer, and only get silly
         | when the number gets too high? There's a pretty big usability
         | difference between 10t/s and 100t/s, and I can imagine the
         | same for 100->1000. I don't know about >1000, but let's not
         | pretend that the number is meaningless.
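Putting those throughput figures into wall-clock terms, assuming a hypothetical 2,000-token response and ignoring time-to-first-token, network overhead, and tool calls:

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream a response at a steady rate (no time-to-first-token, no overhead)."""
    return output_tokens / tokens_per_second


for tps in (10, 100, 1000):
    print(f"{tps:>5} t/s -> {generation_seconds(2_000, tps):6.1f} s")
```

Under that assumption, going from 10 to 100 t/s turns a wait of over three minutes (200 s) into 20 seconds, and 1000 t/s brings it down to about two.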
        
           | qsera wrote:
           | It is pretty meaningless for something that calls itself
           | intelligent.
        
       | harel wrote:
       | There are a few things in life whose appeal I can't fully
       | grasp. One is that constant need to exhibit growth. As if being
       | massive and staying as massive is not good enough; one has to
       | always and continuously grow. The other is constant speed
       | increases. We're already operating at 50x speed. My output is
       | much wider and so much faster that I am sometimes my own
       | bottleneck. And now, as if that is not enough, we want more
       | speed. "I want a full software product from scratch in 12
       | seconds, because 5 minutes is too long and I got things to do..."
       | 
       | Really?
        
         | philipkglass wrote:
         | I remember when I had to wait minutes to get a high resolution
         | image over a dialup connection. When computer and
         | communications hardware advanced enough that I could get 30
         | high resolution images every second, there were brand new uses.
         | In the case of LLMs, I could imagine that much faster
         | operations allow you to introduce them as parts of systems that
         | need to react to the real world at high speed, like factory
         | equipment. Showing that a model can do the usual LLM tasks at
         | extremely high speed is just a demo proving that the approach
         | works.
        
           | harel wrote:
           | The example in the video was a generation of a dashboard app
           | of some sort. I can do that with a "normal speed" Claude in a
           | few minutes. The difference is a few minutes. This is
           | compared to a few weeks in old school development time. I
           | don't have a problem with taking it a little "slow" (as in -
           | few minutes) and lending my thought to it rather than just
           | going for fast generation and who knows what's inside. I get
           | your use case, but this is a specialised one, and not the one
           | 90% of people will think of - everyone wants that fast app in
           | 12 seconds... Or so it seems from me being downvoted on that
           | comment.
        
           | anothereng wrote:
           | yeah, at a very high speed the agent can code the solution
           | when you ask it for something on the go. Imagine it being
           | able, sometime in the future, to make a feature as fast as a
           | website loads. That would feel like magic
        
         | sidrag22 wrote:
         | different use cases for different people. some people are
         | nurturing a code base and ensuring it doesn't become a gross
         | mess, so they become the bottleneck. some people are just
         | trying to prompt stuff into existence and don't know what sql
         | is.
         | 
         | I think this site often overlooks that second group and how
         | large it likely is.
        
       | eli wrote:
       | Neat. The frontier models have gotten pretty impressive, but
       | they're all a bit too slow for interactive, human-in-the-loop
       | coding. It incentivizes vibecoding and running multiple agents in
       | parallel. A fast agent feels more like a partner.
       | 
       | For a while I was running Cerebras GLM 4.7 for a bunch of tasks.
       | Not a very smart model, but it's fantastic to have a live
       | prototype of a site up and be able to type "make the fonts
       | bigger. No not that big" and see it change in real time. And MiMo
       | 2.5 is a _lot_ more capable than GLM 4.7.
        
         | ignoramous wrote:
         | > _And MiMo 2.5 is a lot more capable than GLM 4.7_
         | 
         | MiMo 2.5 is not the same model as MiMo 2.5 Pro.
         | 
         | GLM 5.1 is z.ai's latest iteration and one of the more popular
         | open weight coding models.
         | 
         | If you've had the chance, how does GLM 5.1 (which is now more
         | expensive than MiMo 2.5 Pro after its recent 70% price drop)
         | compare?
        
           | eli wrote:
           | GLM 5.1 is very good. Definitely a contender for best open
           | weight coding model. Nothing like 4.7.
           | 
           | But quite a bit more expensive than MiMo 2.5 Pro. Like 5x to
           | 10x more on my little tests, at least by the API rates.
        
         | maxdo wrote:
         | i tried glm 4.7 for agents that write code, simple scripts of
         | 200-1000 LOC. extremely bad. Had to abandon the cerebras
         | offering; their smart models are only on the enterprise plan.
        
           | jona-f wrote:
           | glm 4.7 is quite old by now. I don't even use 5.1 anymore,
           | because I found kimi k2.6, mimo 2.5 pro, deepseek v4 pro and
           | qwen 3.7 all better than glm 5.1
        
       | Oras wrote:
       | 1k TPS is great, but I'm more fascinated by the amount of AI
       | generated comments in this thread!
        
         | eli wrote:
         | Like what?
        
           | adam_arthur wrote:
           | There are many with subtle tells.
           | 
           | Not nearly as obvious as the ones from 6 months ago; it
           | seems to be more the use of hyperbolic phrasing in a
           | particularly unnatural way.
           | 
           | The assess/explain, then hyperbole at the end kind of
           | structure.
           | 
           | Top comment looks suspicious from this perspective, but it's
           | kind of a losing battle to be able to differentiate them with
           | sufficient accuracy anyway
        
         | trollbridge wrote:
         | Comments at 1,000 TPS is a terrifying future.
        
           | 0xbadcafebee wrote:
           | I prefer a thousand smart AI comments to a thousand dumb
           | human comments
        
             | wartywhoa23 wrote:
             | Well, you can just vibecode a complete AI echochamber
             | version of HN!
        
       | goyozi wrote:
       | Fast AI seems genuinely exciting and somewhat unsettling to me.
       | Right now Claude is faster than me on some tasks, but we're at
       | least close. I have a prompt to clean up a PR that's been running
       | for 1h now and I expect it to take another few hours. It's hard
       | to imagine what the workflow would look like if it was
       | near-instant. On the one hand, it might be easier to focus. Some
       | prompts take so long that I start to multitask and regret it
       | later. On the other, AI that takes a few seconds, or a few
       | minutes at most, to solve what used to take hours or days?
       | That's a game changer and I don't even know where we fit in.
        
         | ipkstef wrote:
         | asking for curiosity's sake: what kind of PR loop are you
         | running that takes a few hours?
        
           | ketzo wrote:
           | not OP but usually for me this means long verification loop;
           | waiting 10min on CI checks, that kind of thing, rather than
           | actual 1hr wall clock of token generation
        
             | devmor wrote:
             | Or slow MCP servers that are waiting on HTTP calls from
             | APIs, playwright/other UI instrumentation, etc.
        
             | RussianCow wrote:
             | But those things won't be sped up by a faster LLM, so I
             | feel like that's not what the OP is talking about.
        
               | goyozi wrote:
               | Well, I used an extreme example. OTOH, I've done quite a
               | few of those "fix CI" or "migrate X" prompts recently
               | and while there is a fixed component like running CI /
               | builds, I'd say the LLM time is still around or above
               | 50%, especially at the beginning of the project. Then
               | there are also regular tasks that now take minutes per
               | message, which completely get me out of the zone. I
               | imagine iterating on those in near real time would be a
               | big change.
        
           | goyozi wrote:
           | I'm rewriting our integration test suite to run tests in
           | parallel. I have the changes split across 7 branches, and
           | each needs to be fixed to have no flaky tests. I told it I
           | want 3 consecutive CI runs with no flakes and no artificial
           | fixes / assert removals etc. We'll see what comes out; it's
           | almost a side project so there's not much to lose other than
           | some of my weekly limit that resets soon.
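The stopping rule described here (N consecutive green runs, where any failure resets the count) can be sketched as below. `run_ci` is a hypothetical stand-in for triggering a pipeline and reporting whether every test passed; the `max_runs` cap is an added assumption so the loop cannot spend the whole weekly limit.

```python
from typing import Callable


def passes_consecutively(run_ci: Callable[[], bool], required: int = 3, max_runs: int = 10) -> bool:
    """Return True once `required` CI runs in a row pass; give up after `max_runs` runs.

    `run_ci` is hypothetical: it should trigger one CI run and return True if it was green.
    """
    streak = 0
    for _ in range(max_runs):
        if run_ci():
            streak += 1
            if streak >= required:
                return True
        else:
            streak = 0  # a single flaky failure resets the streak
    return False


# Example with canned results: pass, flake, then three passes in a row
canned = iter([True, False, True, True, True])
print(passes_consecutively(lambda: next(canned)))  # True
```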
        
             | yunohn wrote:
             | > a side project so there's not much to lose other than
             | some of my weekly limit that resets soon
             | 
             | Basically the entire token-maxxing AI hype train in a
             | nutshell. Lovely!
        
               | drob518 wrote:
               | I'm curious when folks will tire of lighting money on
               | fire. Companies are already starting to scale back a bit,
               | but the AI companies are still nowhere near
               | profitability.
        
               | goyozi wrote:
               | wdym? Nobody's paying me or rewarding me for using these
               | tokens. I had some spare in my subscription limit (we're
               | not on token pricing), so I decided to try an ambitious
               | task that may reduce our CI times and improve our DX
               | significantly. That's hardly "the entire token-maxxing AI
               | hype train in a nutshell".
        
         | pianopatrick wrote:
         | We fit in for the things that are not artificial.
         | 
         | So long as AI lives in server farms, humans will be needed for
         | tasks in the physical world.
         | 
         | It's only if we combine AI with robots that things get really
         | dicey.
        
           | fartfeatures wrote:
           | This is very dystopian in my opinion. I'm not the arms, legs,
           | sensors and actuators for a machine super intelligence. I
           | wouldn't treat another human as my slave because they aren't
           | as intelligent as I am any more than I would expect to become
           | a slave for a machine. This is our world (for now) and that
           | is why we fit in. Not because we can serve.
        
             | davedx wrote:
             | Agree
             | 
             | https://en.wikipedia.org/wiki/I_Have_No_Mouth,_and_I_Must_Sc...
        
               | fartfeatures wrote:
               | Sounds like snuff porn, not my sort of thing but thanks
               | though.
        
               | ionwake wrote:
               | "It seeks revenge on humanity for its own creation."
               | 
               | This is brilliant as it reminded me of a famous
               | Hitchhiker's quote:
               | 
               | "In the beginning the Universe was created. This has made
               | a lot of people very angry and been widely regarded as a
               | bad move. -- From The Restaurant at the End of the
               | Universe (Book 2)"
               | 
               | Maybe we are stuck in an eternal loop
        
             | cicko wrote:
             | "This is our world" sounds a bit exclusive towards other
             | living and sentient beings on this planet.
        
               | nativeit wrote:
               | It depends on what's included in "our".
        
             | throwaway67678 wrote:
             | Never read Asimov's Multivac novels? Admittedly not all of
             | them are stellar examples of a future to follow
        
             | Muromec wrote:
             | You don't need ai superintelligence, just plain capitalism
             | is enough
        
         | flexagoon wrote:
         | I'm using Deepseek-v4-pro as my main model and this is
         | sometimes pretty annoying: I have some easy, boring task to do,
         | think "I'll just leave the agent to do it and go take a nap",
         | but it's already done writing the code before I even walk away
         | from the computer
        
           | RussianCow wrote:
           | Do you mean Flash and not Pro? I haven't tried it personally,
           | but according to OpenRouter, the fastest DeepSeek V4 Pro
           | providers are only ~50tps. That's slower than Claude Opus.
           | 
           | https://openrouter.ai/deepseek/deepseek-v4-pro?sort=throughp...
        
             | specproc wrote:
             | Yeah, flash is crazy fast, but I've found performance
             | variable.
        
               | binary0010 wrote:
               | Flash is amazing if you know the domain really well.
               | 
               | E.g. occasionally it makes the dumbest mistakes you've
               | ever seen and can't correct them. However it's fairly
               | rare, and if you know the domain really well,
               | occasionally popping in the code and pushing it towards
               | the correct solution takes like 20 seconds or whatever.
               | 
               | So the speed you can move with flash + high domain
               | knowledge beats opus by a mile in my experience.
               | 
               | I tried to switch back to 4.8 for a bit when it came out,
               | feels so bad waiting 20mins for a mediocre solution when
               | I could have had everything complete - with multiple
               | iteration cycles - in flash in like 3-5mins.
        
               | addozhang wrote:
               | Yes, you don't need much domain knowledge to use Opus,
               | but it's just way too expensive.
        
             | sarjann wrote:
             | I don't think token speed matters as much when a lot of
             | tokens are needed to achieve a task. E.g. in the Artificial
             | Analysis benchmarks, deepseek v4 is one of the biggest token
             | burners across the benchmark.
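The trade-off is easiest to see as wall-clock time = tokens needed / throughput. The two models and their numbers below are made up for illustration; they are not measurements of DeepSeek V4 or any other model.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    tokens_per_second: float
    tokens_per_task: int  # total tokens (reasoning + output) spent on one task

    def seconds_per_task(self) -> float:
        """Wall-clock generation time for one task at a steady throughput."""
        return self.tokens_per_task / self.tokens_per_second


# Hypothetical models: one 3x faster per token but 4x more verbose
fast_but_verbose = Model("fast, verbose", tokens_per_second=150, tokens_per_task=60_000)
slow_but_terse = Model("slow, terse", tokens_per_second=50, tokens_per_task=15_000)

for m in (fast_but_verbose, slow_but_terse):
    print(f"{m.name}: {m.seconds_per_task():.0f} s")
```

With these invented numbers the model that is 3x faster per token still finishes later (400 s vs 300 s), because it spends 4x as many tokens.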
        
               | brianwawok wrote:
               | Both matter.
        
             | SwellJoe wrote:
             | In recent benchmarking I've been doing, DeepSeek V4 Pro was
             | the fastest of 21 models, by a comfortable margin
             | (https://swelljoe.com/html/bench-report-final.html). Faster
             | than Claude Opus 4.8, which was the second fastest (Mistral
             | doesn't count because it seems to have refused to
             | participate). But, it's a limited data set, just a few
             | benchmark runs of a limited set of tasks. It's entirely
             | possible I happened to be calling the API at its least busy
             | time and maybe Claude got hit during a busy time.
        
             | flexagoon wrote:
             | No, I mean Pro. I use it through OpenCode Go so I don't
             | know what provider it uses under the hood, but it's very
             | fast in my experience.
        
           | tmaly wrote:
           | This reminds me of the Peter / Boris comments on writing
           | loops to keep the agents busy.
        
           | throwaway67678 wrote:
           | Agent mania setting in
           | 
           | It's also pretty funny sometimes how it gives weird future
           | roadmap estimates ("part 2 - 3 weeks, part 3 - 2 months",
           | etc.) and when you tell it to actually do those changes it's
           | pretty much done in half an hour
        
             | smith7018 wrote:
             | I've long believed those numbers were faked by
             | Anthropic/OpenAI to serve as a form of advertisement. The
             | estimates are impossible to verify and their ability to do
             | "2 days of work" in 10 minutes will presumably make the
             | user go "Wow, I just saved SO much time!" Plus, the
             | unnecessary text eats up the users' tokens so it helps the
             | companies on the backend, as well.
        
               | leodavi wrote:
               | I agree with you that labs are benefiting from those
               | outputs but I'm skeptical that labs are purposefully
               | training the models to produce those outputs.
               | 
               | Raw pre-training data includes plenty of conversations
               | between professional builders and some of those include
               | estimates.
               | 
               | I believe the outputs are a training coincidence with
               | consequences that are opportunistic for the labs.
        
               | AgentMasterRace wrote:
               | All the models have broken estimates. They're trained
               | heavily on jira and GitHub tasks and issues, that's why
               | their estimates are human.
        
               | esperent wrote:
               | Even for humans the estimates are way off, unless it's
               | based on data that has some serious padding.
               | 
               | That said, it'll often say "2 days of work" and then
               | complete the coding in 30 minutes, and while that's
               | amusing, afterwards, I'll need to manually test, or send
               | to other people for review, or realize the agent only
               | actually did half the work and I need to do a second pass
               | (or a third etc.) and then often getting the feature in
               | does genuinely take two days.
        
               | dizhn wrote:
               | All models do it. It's their training. They didn't have
               | "a person does this in a week but an LLM could in a
               | minute" in their training yet. They also don't have the
               | concept of elapsed time unless you ask them how long
               | something has taken.
        
               | Terretta wrote:
               | > _the estimates_
               | 
               | It doesn't estimate.
               | 
               | It generates tokens that read like estimates associated
               | with the context in its training material.
               | 
               | What would you expect the generator to output instead?
        
               | ghshephard wrote:
               | I think people are continuing to view these systems as
               | pure LLMs - when that ship sailed 6+ months ago. Between
               | being able to review memory, using agent harnesses and
               | sub agents and skills to go out and discover information
               | - modern systems (Codex, Claude Code, Cursor) - _use_
               | LLMs - but the LLM is only a small component of it.
               | Compare what you get from sending a request to a chatbot
               | like ChatGPT - to what you can from a modern harness. The
               | output is influenced by the LLM, but it's no longer a
               | "model making a token prediction based on training
               | material and RLHF" - that's a very 2025 way of looking at
               | these systems.
               | 
               | Even Gary Marcus is starting to come around and realize
               | that his priors are no longer as relevant as they once
               | were.
        
               | Terretta wrote:
               | You think someone is, or even should be, special-casing
               | things like estimates? What else deserves that level of
               | intervention so they look less dumb?
               | 
               | Logistics for getting to the car wash next door?
               | 
               | In the meantime, alas, no, we can see from actual
               | prompts sent directly or through sub-agents, and actual
               | replies, estimates remain LLM generated.
               | 
               | Though, this discussion here could change that, because
               | indeed there is a lot of special casing and context
               | stuffing going on, one of the oldest being today's date
               | for example.
               | 
               | * * *
               | 
               | I _did_ read the Claude Code leak, and use pi, etc. So I
               | disagree with your premise rather strongly. Today's
               | "systems" remain, roughly, piles of markdown and context
               | engineering wrapped in UI affordances, and behave very
               | similarly today to how they did in 2024 for those already
               | engineering context and delegating.
        
               | ghshephard wrote:
               | I do a lot of code bisecting with Claude Code - and it
               | spends hours running experiments - looking at experiment
               | results, making guesses as to what to try next for an
               | experiment - until it eventually comes around to a
               | working code pattern. I mean - maybe this is as much a
               | reflection on me as anything else - but its pattern of
               | logic isn't that much different from what I would do. It
               | knows, in general, what tools and APIs it can call - it
               | tries something - observes the result, and then comes
               | back and tries different experiments based on
               | success/failure - mostly efficiently bisecting to a
               | solution.
               | 
               | I'm still lower down on the capability scale - as I'm
               | still manually directing agents to do these wiggins loops
               | - obviously the next step up is to direct the code-loops
               | which control the agents. I just haven't got my tooling
               | nailed in place to the point where I find that's more
               | productive.
               | 
               | I actually might agree with you that this is mostly just
               | "next token prediction" - if I can concede that's really
               | all I do as well.
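For reference, bisection in the strict sense (what `git bisect` automates over commit history) is a binary search for the first "bad" item in an ordered sequence. A minimal sketch, assuming a hypothetical `is_bad` check that is monotonic, i.e. every item after the first bad one is also bad:

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(items: Sequence[T], is_bad: Callable[[T], bool]) -> int:
    """Return the index of the first item where is_bad is True, or len(items) if none.

    Assumes is_bad is monotonic over items: False ... False, True ... True.
    """
    lo, hi = 0, len(items)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(items[mid]):
            hi = mid       # first bad item is at mid or earlier
        else:
            lo = mid + 1   # first bad item is after mid
    return lo


commits = list(range(100))  # stand-in for an ordered commit history
print(first_bad(commits, lambda c: c >= 37))  # 37
```

Each check halves the remaining range, so 100 candidates need at most 7 checks.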
        
               | Terretta wrote:
               | > _I actually might agree with you that this is mostly
               | just "next token prediction" - if I can concede that's
               | really all I do as well._
               | 
               | Yep. Pretty sure I've got an LLM inside too.
               | 
               | The other replies complaining that my thinking is so 2023
               | -- on the contrary, what's evolved is my own apprehension
               | of how LLM-like most "responses" from humans prove as
               | well.
               | 
               | To be sure, there are other mechanisms at play as well,
               | significant differentiation in our... Volume of training
               | material? Quantizations/compression? Model architecture?
               | Just-ahead-of-time forward branching with back
               | propagation? Double loop adaptive learning? You know,
               | harnessing the LLM. :-) Dare we call it executive
               | function?
               | 
               | LLM mode becomes particularly apparent when conversing
               | with Alzheimer's patients in the stage where short term
               | memories do not form but they retain access to long term
               | memory up to, say, 5 years ago or so. Fifty years of who
               | they are, and one can trigger nearly identical responses
               | with nearly identical prompts.
               | 
               | But that same person may be able to debate 1950s politics
               | while being unable to complete making a sandwich.
               | 
               | If they didn't know of new shortcuts for a task, they
               | would almost certainly not "estimate" but "intuit", or
               | "instinctively" respond (apply heuristics), largely based
               | on their "priors" aka training material.
               | 
               | If you sit with them and chat a while, you'll even get
               | the kind of looping you get from Qwen trying to think
               | when context is too full.
               | 
               | And if we believe this at all, then ... we should stop
               | scrolling TikTok. Time to read a book. Have an
               | experience. Fine tune. :-)
        
               | 8note wrote:
               | rather than special casing, make real data based on chat
               | logs for how long things took both in calendar and chat
               | time
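A minimal sketch of that idea, assuming hypothetical log records that pair the model's human-style estimate with the time the task actually took (all numbers invented). The same calculation could be run separately on calendar time, including review and manual testing, and on chat time.

```python
from statistics import median

# Hypothetical log records: (estimate in hours, actual agent wall-clock in hours)
logged = [(16.0, 0.5), (40.0, 1.5), (8.0, 0.25), (80.0, 3.0)]

# Ratio of actual to estimated time for each past task
ratios = [actual / estimate for estimate, actual in logged]
calibration = median(ratios)  # median is less sensitive to one odd task than the mean


def calibrated(estimate_hours: float) -> float:
    """Rescale a raw estimate by the median actual/estimate ratio seen in the logs."""
    return estimate_hours * calibration


print(f"median ratio {calibration:.4f}; '2 days' (16 h) -> {calibrated(16.0):.2f} h")
```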
        
               | irthomasthomas wrote:
               | No one is bitter lesson pilled anymore. Everyone is
               | pivoting to neurosymbolic systems. It looks like Gary
               | Marcus was right.
        
               | nl wrote:
               | > No one is bitter lesson pilled anymore.
               | 
               | Will the 10T parameter Mythos model be released this
               | month or next month?
               | 
               | They better soon because it is generally accepted that
               | one of the reasons GPT 5.5 is better at hard tasks than
               | Opus is because of its parameter size - and that Opus 4.8
               | remains competitive only by scaling test-time compute
               | (see how many more tokens it uses than GPT 5.5)
               | 
               | https://www.reddit.com/r/LLM/comments/1sz8bjz/parameter_esti...
        
               | wild_egg wrote:
               | How is neurosymbolic not aligned with the bitter lesson?
               | The bitter lesson is completely agnostic to architecture.
        
               | carterschonwald wrote:
               | you might like the stuff in my fork of oh my pi; it's a
               | test bed for my ideas around making these tools more
               | reliable. hoping to maybe have a native ui iter of the
               | real thing that this is a test bed for this summer.
               | 
               | https://github.com/cartazio/oh-punkin-pi/blob/main/scripts/b...
        
               | legulere wrote:
               | It generates tokens by estimating what the next token is
               | going to be.
               | 
               | Sure, it cannot think like a human, but given its input,
               | it should give a good statistical answer (approximating
               | not how long it actually takes, but how long a human
               | would say it takes).
        
               | incr_me wrote:
               | Obviously there isn't a hidden corpus of logs of coding
               | chatbot assistants that has been accumulating over the
               | years, but these coding chatbot assistants output tokens
               | that resemble how we all imagined a coding chatbot
               | assistant would have operated had it existed in the first
               | place to end up in a corpus. "Training material" includes
               | supervised fine-tuning, preference training, RLHF, and so
               | on, so that certain outputs (like these timeline
               | estimates) may really have been decided (at some level of
               | conscious awareness) by product teams.
        
               | nl wrote:
               | _Actually_ in this case they possibly _are_ estimates.
               | 
               | It's been known for some years[1] that LLMs do regression
               | in-context. Frontier models have been trained against
               | many, many issue texts that include task breakdowns and
               | estimates.
               | 
               | [1] https://arxiv.org/html/2409.04318v1
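In-context regression here means the model's numeric prediction behaves like fitting a function to the (x, y) examples in the prompt and evaluating it at a new x. The explicit classical version of that is something like ordinary least squares. A sketch with invented data (issue size in subtasks versus estimated days); it illustrates the idea, not how the model computes internally.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


# Invented "in-context examples": number of subtasks in an issue -> estimated days
subtasks = [1, 2, 4, 5, 8]
days = [0.5, 1.0, 2.0, 2.5, 4.0]
slope, intercept = fit_line(subtasks, days)
print(slope * 6 + intercept)  # 3.0 for this perfectly linear toy data
```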
        
               | kube-system wrote:
               | Interesting. So it may have learned how to estimate as a
               | human but doesn't understand that it doesn't operate at
               | that speed :D
               | 
               | I wonder if there's a reasonable way to give an llm
               | parameters that give it a concept of its own execution
               | speed. Seems that could be useful for multiple purposes
        
               | InterviewFrog wrote:
               | This is so 2023. The thought process.
               | 
               | At that time the predominant view was that LLMs were
               | nothing but stochastic parrots, that they would plateau,
               | and that hallucinations couldn't be fixed.
               | 
               | At this point I doubt there are any AI sceptics left.
               | That ship has long sailed. The only thing that matters is
               | whether the estimates are accurate, and AI can improve on
               | that too.
               | 
               | Even humans only estimate based on neurons firing in
               | prior patterns.
        
               | mediaman wrote:
               | The funny thing about this comment is that neural
               | networks are universal function approximators.
               | 
               | The most fundamental essence of what they do is exactly
               | what you say they don't: estimate.
        
               | airstrike wrote:
               | Funny and ironic in a way, but the point still stands
               | that they do not actually estimate the time it will take.
        
               | greenavocado wrote:
               | > they do not actually estimate the time it will take
               | 
               | You can't prove that )))
        
               | airstrike wrote:
               | Right, but extraordinary claims require...
        
               | taneq wrote:
               | Therein lies the rub, no? To accurately predict the next
               | token produced by a process, it's necessary to model that
               | process. If the process is a human attempting to estimate
               | the duration of a task, then in some sense the LLM is
               | modeling the estimation process. We're well past the
               | point where it's credible to claim that LLMs just
               | regurgitate their training data.
        
               | KronisLV wrote:
               | I mean in general I'd rather take slightly inflated
               | estimates than the odd sprint poker stuff where other
               | devs and PMs negotiate hours down and before you know it
               | you're also stuck fixing nitpicky reviewer comments on
               | code that is already good enough and have to send a
               | release at like 7 PM, ofc also without enough tests or
               | even enough manual checks and testing, cause people
               | repeatedly act against their self-interest and try to
               | compress timelines, thinking that that's somehow good for
               | them.
               | 
               | At least with AI that actually does things more quickly,
               | there is a bit more breathing room (introducing AI is
               | easier than changing a given environment).
               | 
               | Aside from that, I wonder how much variety there is in
               | practice: between "Oh yeah, I added that new button while
               | we were in the meeting" and "The new button feature will
               | be ready in Q3 according to the roadmap, once we have
               | sign-off from all the stakeholders."
        
               | Narciss wrote:
               | Nah it's all from the pretraining data
        
               | overgard wrote:
               | I tend to be cynical about AI companies, but I'm guessing
               | the bad estimates mostly just come from a complete lack of
               | actual data it could use for that, so it's more or less a
               | hallucination.
        
               | BobbyTables2 wrote:
               | That's right up there with Scotty in the classic Star
               | Trek always multiplying time estimates by 4 so he looks
               | like a "miracle worker"
        
             | throw1234567891 wrote:
             | It repeats what it has seen in the training data. Expecting
             | it to reason about the complexity of a task is a pipe
             | dream. The best is to tell it not to come back with
             | estimates, and when it does, remove them anyway.
        
               | andai wrote:
               | I added "you can do anything, believe in yourself" to
               | system prompt, and task completion increased
               | significantly.
        
             | andai wrote:
             | I heard an anecdote. Guy spent several days trying to
             | convince his AI agent to build a feature. Kept saying it
             | was crazy complicated, would take weeks.
             | 
             | Finally he convinced it to try. It one shotted it in 30
             | seconds.
             | 
             | Turns out the agents' idea of what is hard and easy also
             | comes from Common Crawl.
        
               | wild_egg wrote:
               | Why on earth would you spend any time at all convincing
               | an agent of anything? You say "just do it" and off it
               | goes.
        
               | dr_dshiv wrote:
               | Ya, but "doit" is 2x more efficient
        
               | brianwawok wrote:
               | Uh, Claude tries real hard to dodge work. Talks about how
               | it's really hard, 10 PRs. Finally I convince it to do it
               | as 1. It stops 10% through and says ok, done with PR 1, we
               | can work on the last 9 tomorrow. Ugh.
        
           | behnamoh wrote:
           | Same. How can DeepSeek serve the V4-Pro at such high speeds
           | despite the sanctions?
        
             | rubyn00bie wrote:
             | The sanctions only "prevent" them from directly buying
             | NVidia's latest and greatest in the sense that NVidia can't
             | sell directly to them. Essentially, there are companies now
             | who are in a country without the sanctions, they buy from
             | NVidia (or a partner), and then ship them off to China. For
             | the orgs in China doing this, there's zero legal risk
             | besides having foreign customs service intercept the
             | shipment and losing the goods. For NVidia there is zero
             | incentive to care, as long as they look like they do,
             | because sales are sales. You can bet Jensen ain't losing
             | sleep over it.
             | 
             | GamersNexus had a really good investigative piece (~3hrs
             | long) on this where they went to China and met with grey
             | market sellers. That piece absolutely pissed off NVidia and
             | resulted in a fight with Bloomberg too.
             | 
             | Deepseek may also be running inference on oodles of
             | Chinese hardware but it wouldn't surprise me for a second
             | if they just acquired Blackwell chips through the grey
             | market. The original Deepseek models were all trained using
             | NVidia chips if I remember right.
        
               | seewhydee wrote:
               | That wouldn't explain why Deepseek is fast relative to
               | other Chinese providers, especially considering that
               | they're reportedly ahead of the curve among Chinese
               | companies in moving off Nvidia. I think their quant fund
               | background has more to do with it. Their models are
               | clearly designed with performant inference in mind.
        
               | ljosifov wrote:
               | Yes, it's performant, and esp performant at non-trivial
               | context depths. DeepSeek-V4 DS4 (and Flash - DS4F) drop
               | tok/s speed much less than the rest. On my M2 Max it took
               | context depths of 768K to drop tok/s to ~10 tok/s.
               | 
               | https://x.com/ljupc0/status/2062457314414587996
               | 
               | Other local models I've checked drop to unusable speeds
               | way sooner. The only other model with a similarly
               | favourable curve I've tried is nemotron-cascade-2-30b-a3b.
               | But it's a small model, way dumber than DS4F.
               | 
               | Coding agent use cases have large context depths. The
               | rate of decline is as important as the headline number.
        
           | binary0010 wrote:
           | I exclusively use deepseek v4 flash now, completely stopped
           | using slow models like Claude.
           | 
           | Basically I never have to wait - yes I have to tell it little
           | corrections occasionally (but I know the domain really well
           | so that's not an issue), but it's so much faster than
           | anything else it's kinda crazy. I love the super fast speeds
           | with high involvement development cycle.
           | 
           | I actually enjoy using agentic development flows for the
           | first time now - whereas with Claude I absolutely hated it.
           | That 5 to 20 min wait after every prompt absolutely killed my
           | desire to even want to work at all.
        
           | SwellJoe wrote:
           | DeepSeek is the fastest model in the benchmarks I've been
           | doing (https://swelljoe.com/post/will-it-mythos/). Followed
           | not so closely by Opus 4.8 and even less closely by Gemini
           | 3.5 Flash and GPT 5.5. I've been really impressed with it, so
           | far. It's also among the best at doing the work, though still
           | trailing the frontier models from Anthropic and OpenAI.
        
           | throw-the-towel wrote:
           | FWIW, for me just today it got itself into silly rabbit holes
           | twice, and both times I had to fix things myself. Scarily,
           | this is something I catch myself doing as well.
        
           | andai wrote:
           | With Flash it's basically instant for smaller tasks, yeah.
        
         | recroad wrote:
         | Woah - what's the prompt and what's the PR?
        
           | goyozi wrote:
           | I replied in more detail under another comment. TLDR: fixing
           | flaky CI across multiple branches
        
         | HarHarVeryFunny wrote:
         | I don't see many companies being willing to pay 3x more for
         | faster code generation. Cloud-based AI code generation is
         | already extremely fast, and hardly the bottleneck for most
         | software product development.
         | 
         | There can't be many normal use cases where there'd be any cost
         | benefit.
        
           | fragmede wrote:
           | The "traditional" way we vibe code is human software
           | developer prompts AI -> AI generates code -> (human checks
           | code) -> code gets compiled/deployed/etc -> users use
           | "binary". At the speed of 1000 tok/sec, user prompts
           | obliquely -> AI vets generated code -> code deployed -> user
           | gets response from deployed code.
           | 
           | It's a cute toy right now, but you can tell an LLM that it's
           | an http server, and have it respond directly to a web browser
           | hitting it. It generates headers in response, as well as page
           | contents. As 1000 tok/sec becomes the new normal, we will
           | come up with newer ways to use it outside of toy fiction
           | encyclopedias.
        
             | HarHarVeryFunny wrote:
             | 1000 tokens per sec is still massively slower than serving
             | a normal web page - if something doesn't respond in a few
             | seconds many people give up.
             | 
             | I'm not saying there aren't any use cases for super-fast
             | (and super-expensive) generation, but it does seem a bit
             | niche. If it was free then sure faster is better, but what
             | are the mainstream use cases where people might pay 3x more
             | for a faster version of something that is already fast?
             | 
             | I think it would have to be an application where it paid
             | for itself - where the 10x faster response was actually
             | worth more than 3x the cost to you - where the extra speed
             | was worth the extra cost.
        
         | efromvt wrote:
         | I'd be very curious about the bottleneck breakdown in most
         | current software dev - I suspect inference is far from the
         | bottleneck in most things I do, though driving it to 0 would
         | still be _nice_. I do agree that if it was 0 we'd probably
         | change development approaches to reduce the new bottlenecks
         | more, but it'll take full-process innovation to really get
         | something near-instant.
         | 
         | (I should go measure this now, I'm curious)
        
         | ilaksh wrote:
         | Use Claude fast mode and turn off thinking. Tell it to just
         | explain what its plan is to you at a high level.
         | 
         | It will go much faster.
        
         | skybrian wrote:
         | If we get low enough latency, there's no reason to multitask.
         | You can ask it to do one thing at a time and immediately see
         | what it did. That's a nice way to work!
         | 
         | This is normal interactive UI for tasks that aren't compute-
         | intensive. Programs spend most of their time idle, waiting for
         | us to click a button. We shouldn't be waiting for them or
         | spinning more plates to keep them busy.
         | 
         | However, a faster llm isn't enough. You also need fast compiles
         | and fast tests.
        
         | binyu wrote:
         | > Right now Claude is faster than me on some tasks but we're at
         | least close.
         | 
         | I don't doubt it, but I don't think you can spawn 10 copies of
         | yourself working simultaneously.
        
           | AlecSchueler wrote:
           | No, but nor can you keep track of what 10 agents are doing
           | simultaneously. Hence the multitasking regret.
        
             | pixel_popping wrote:
             | An agent can, you don't need to watch tasks, you can have a
             | live digest with another tool.
        
               | logankeenan wrote:
               | Do you have any recommendations for a live digest tool?
        
         | UncleOxidant wrote:
         | Have you tried Gemini 3.5 Flash? It's quite fast. Amazing how
         | fast it finishes tasks. Much faster than Claude.
        
         | switchbak wrote:
         | Now the next bottleneck is the compiler - which we can model in
         | an LLM! It's only wrong 15% of the time :)
         | 
         | But truly, using Cerebras at ~2k tokens/s, with very low
         | latency is like a vision into the future. You start to rework
         | your workflow around things that can happen without onerous
         | manual review - stating the conditions for success, etc. It's
         | rare that I have a problem that maps well to that, but I expect
         | this is where things are headed.
         | 
         | Of course the fast models tend to not be the SOTA ones, but if
         | that was the case - high quality and near-instant thinking,
         | that's a game changer that I don't think we're really prepared
         | for. The things that get unlocked with higher-than-reasonable
         | speed become very interesting.
        
         | coderbants wrote:
         | It cuts both ways. Sometimes I ask Gemini 3.5 Flash to do
         | something for me and it kicks it out almost instantly and it
         | works great, and it's a bit scary how quickly it can do that.
         | 
         | Then I ask it to do something else and it goes off-road and
         | where I used to be able to interject with a "whoa whoa whoa,
         | that's not right", by the time I see the text on screen and
         | react it's already made massive changes. Short of making it
         | commit between every edit it's hard to prevent it from going
         | wrong as quickly as it goes right (and even then, it can make a
         | boo-boo on a remote API too depending on how much privilege it
         | has).
        
           | bendangelo wrote:
           | I use planning mode in opencode. It has a prompt to tell it
           | to plan it out etc. Then I execute with a smaller model. It
           | works well.
        
         | Bombthecat wrote:
         | Living on the street or cave lol
        
         | dkersten wrote:
         | I've been playing around with groq and GPT OSS which they run
         | at 1000 TPS (20B) or 800 TPS (120B) and the speed feels quite
         | magical.
         | 
         | I haven't tried cerebras' 3000 TPS yet but I did try the demo
         | of that 15,000 TPS model whose name escapes me right now.
         | 
         | I'm not sure if it makes a meaningful difference for my actual
         | work, but it sure is amazing to watch it generate a screen full
         | of text in the blink of an eye.
         | 
         | I do think it's super useful for running little validation
         | checks like showing it a diff to ensure that the changes are on
         | task, and being able to do those quicker really helps because
         | it means you can do many focused checks without them getting in
         | the way.
        
           | robberth wrote:
           | https://chatjimmy.ai/ ?
        
             | msdz wrote:
             | AFAIK Taalas, the company behind this demo, still only have
             | their initially "hardwarized" model available to test in
             | ChatJimmy, which IIRC is a rather stupid Llama 3ish 8b.
             | 
             | Don't get me wrong though, that demo is still incredibly
             | impressive & makes me very much excited for the hardware-
             | based model era (potentially) ahead.
             | 
             | Once you've experienced those speeds, you really start to
             | think about the whole class of things that becomes
             | possible; massively parallel decode paths, extensive
             | reasoning loops, etc...
        
               | hedgehog wrote:
               | For scale though if three or four chips that size can
               | replicate a Qwen 27B experience that'll be quite useful.
        
         | OtomotO wrote:
         | > That's a game changer and I don't even know where we fit in.
         | 
         | Doing non trivial work.
        
         | giancarlostoro wrote:
         | You can run Claude in "fast" mode; it costs you more on your
         | compute use, but it's reasonably fast. I'm not sure I care to go
         | "faster" than where things are now, otherwise you start losing
         | on manual review and testing time. I would argue that Claude
         | can poop out weeks (if not months) of coding effort in a few
         | hours, and get you insanely close to a good product if you
         | define the tech stack, and the business rules. Can it goof here
         | and there? Sure. You can also make it refactor all the code on
         | a whim faster than any intern could. I think it's good enough
         | to spare you mundane stupid bugs in most cases. I don't know
         | what people who hate it are doing, maybe they're not even
         | trying at all or are dismissing it from the first output (as
         | though everyone writes perfect code in one shot right?) or
         | maybe it's just pride getting in the way of them using a decent
         | tool to its true potential.
        
         | cman1444 wrote:
         | Reminds me of the Doherty threshold. When will AI respond in
         | less than 400 milliseconds?
        
         | fnordpiglet wrote:
         | I've used codex code optimized for a few projects and it's
         | unsettling how fast it is. It's hard to think fast enough to
         | keep up with it. Mental fatigue was a real challenge because
         | the decisions that required my input were rapid fire and
         | legitimate ambiguities that were appropriate escalations. I am
         | too much a geezer for the intensity of it. But I'll take it!
        
         | noisy_boy wrote:
         | The first wave was just getting half decent answers. The second
         | wave was being able to choose between actually getting
         | reasonably ok coding results OR getting not so great results
         | very fast. The third wave would be getting good results fast.
         | 
         | We need to really worry when we get amazing results very fast.
        
       | h14h wrote:
       | The gated "ultra-speed" phenomenon seen here and with the
       | Cerebras Kimi K2.6 release, while understandable, is somewhat
       | troubling IMO.
       | 
       | Getting ~1000 TPS on near-frontier intelligence is a step change,
       | and enables whole new use-cases for applications. Seeing limited
       | compute resources beget selective access makes me worry for the
       | future of competition.
        
       | trilogic wrote:
       | Pfff, time wasting.
       | 
       | 1. Password between 8-16 characters, and this and that... What???
       | 2. Captcha after captcha, come on.
       | 3. Service unavailable: "This service is not available in your
       | region yet."
       | 
       | Are you kidding me? Come back when you are ready for the users. I
       | was hoping to try it, what a frustration.
        
       | prplfsh wrote:
       | This will be really powerful for voice. Being able to reason
       | makes LLMs so much smarter but with voice your latency budget is
       | so tight that you can't spare the time typically.
        
         | jeffrallen wrote:
         | This is true for humans too. Lol
        
       | pullshark91 wrote:
       | It's interesting but not game-changing IMO. Speed here is not a
       | bottleneck.
        
       | gertlabs wrote:
       | MiMo V2.5 Pro (regular speed) remains the strongest open weights
       | agentic coding model we've tested -- it's been interesting to see
       | how little attention it has received relative to some lower
       | performing releases. And the "fast mode" pricing is very
       | competitive here.
       | 
       | Data at https://gertlabs.com/rankings
        
         | unrvl22 wrote:
         | why is deepseek v4 pro a lot lower than flash? where is mimo
         | 2.5?
        
           | gertlabs wrote:
           | DeepSeek v4 Pro struggles with a custom harness, and all the
           | models ranked above it don't, so it gets downweighted in the
           | agentic coding benchmarks (although it ranks better than
           | Flash in one-shot problem solving:
           | https://gertlabs.com/rankings?ow=1&mode=oneshot_coding). We
           | ran plenty of samples.
           | 
           | MiMo v2.5 is on there, as well as the pro version.
           | 
           | We found a few anomalies in our evaluations, which makes
           | sense -- if every new sub-release is better across the board
           | in every area of the model card, that should raise alarms
           | about benchmaxxing. But the main thing we found is that hype
           | != performance, and I trust our benchmark methodology
           | significantly more than the model cards the labs add to their
           | press releases.
        
             | digdugdirk wrote:
             | Can you explain more about how it struggles? I haven't
             | noticed any issues in my usage, so I'm just curious what is
             | meant by this.
        
               | gertlabs wrote:
               | It's likely overfit to common harnesses and iteration
               | patterns, so it struggles with formatting tool calls and
               | JSON in our tests, which use our own harnesses (although
               | there is a lot of overlap with tools that would be found
               | in any coding harness like bash, apply_patch, etc.)
               | 
               | We didn't love the results because it draws negative
               | scrutiny to our benchmark, but the results are real and
               | done at scale and I think DeepSeek V4 Pro's inability to
               | do agentic work outside of environments it was trained on
               | is an important thing to measure, especially when so many
               | other models can generalize to new environments just
               | fine.
               | 
               | Google models also struggle with tools, but they have
               | very strong initial answers, so there is more potential
               | for them to bridge the gap with some better post-
               | training.
        
             | andai wrote:
             | Mimo struggles with my custom harness. (Ignores the
             | instructions and defaults back to its own preferred tool
             | calling syntax.)
             | 
             | Flash handles it fine, which I found amusing. (Since Mimo
             | is supposed to be opus level!) But Flash seems to work even
             | better in Claude Code...
             | 
             | With smaller models I always have the issue of needing to
             | adapt myself to _their_ preferred workflow... which sort of
             | defeats the purpose. Price is hard to beat tho :)
        
       | isusmelj wrote:
       | No note about the specific GPU they use. One might speculate.
       | B200? H200? H100?
        
       | PhunkyPhil wrote:
       | Obligatory taalas mention:
       | 
       | https://taalas.com/
       | 
       | Despite the performative UI components, they have shipped a (demo)
       | product:
       | 
       | https://chatjimmy.ai/
       | 
       | This is only 3.1 8B and a very small context window, but at 17k
       | tokens per second it's likely enough to reliably call tools which
       | would make a huge difference in agentic applications. Assuming
       | they can bake in better models, I'm just as bullish or even more so
       | on this, considering this opens up edge computing with extremely
       | low power requirements.
       | 
       | High tok/s is the future IMO.
        
         | kilroy123 wrote:
         | My dream is claude or codex running at this speed.
        
           | est wrote:
           | More realistically, I hope for qwen 3.6 27B on taalas.
        
       | desireco42 wrote:
       | I didn't use their pro speed, just regular Mimo-v2.5 (not even pro),
       | and it seems really fast. I have plenty of tokens and subscriptions
       | but this is really impressive. I really don't need another one,
       | but I am tempted simply because it works so fast. I can't imagine
       | how fast the ultra-speed version must be.
        
       | dakiol wrote:
       | So, regarding the productivity argument: I don't get it. It
       | doesn't really matter (for regular employees) that you can now do
       | in 2h what used to take 2 days. Why? Because it's not that you
       | have the rest of the day for yourself. You still have to work
       | 8h/day as usual. But now the pattern is different: instead of
       | enjoying the craft digging deeper into problems in the span of 2
       | days, now you are rushing into some slot machine with the hope of
       | it giving you the right answer with the right prompt.
       | 
       | So, if anything, I would say it's worse for us. Obviously, it's the
       | complete opposite for corporations and executives:
       | they are loving the AI situation so much!
        
         | fullstop wrote:
         | It's making things less fun, for me at least.
        
           | linsomniac wrote:
           | Odd, I'm having the opposite experience.
           | 
           | The thing I really love about working with computers is when
           | I achieve something. That's the thing that makes me
           | figuratively, and sometimes literally, throw my fists into
           | the air and go "Yeaaah!"
           | 
           | With the AI tooling, I'm getting those more like a couple
           | times a week.
           | 
           | Plus, I'm using AI to attack the things in my day that are "a
           | drag", and getting them done too.
           | 
           | The highs are more frequent and the lows are not so low.
        
             | fullstop wrote:
             | Oh, sure, I can make things with it. But I have an
             | extraordinarily hard time saying that _I_ made something.
             | 
             | It feels like it cheapens the whole thing. Maybe I'm just
             | old, because I remember people saying the same thing about
             | code completion in Visual Studio back in the late 90s.
             | 
             | This is so much more than code completion, though.
        
               | dd8601fn wrote:
               | Exactly how I feel. _I_ didn't make a damn thing. I
               | essentially asked a chatbot to.
               | 
               | Did I ask for better things with some important concepts
               | pre-rolled? Yeah, of course. But that's so, so much less
               | interesting than having actually made a thing.
               | 
               | I try to remind myself that the output of my projects
               | have nothing to do with who I am, but the honest truth is
               | they always mattered to me.
               | 
               | Now that's dead, and it's never coming back. It ain't
               | exactly existential dread, but it _is_ something I've
               | lost.
        
             | dd8601fn wrote:
             | I did a deep binge on two or three projects I would never
             | do, and like five small ones that would have consumed
             | months.
             | 
             | It felt like that, kinda, for a bit. Now whenever it does
             | something for me I get nothing. I didn't do it... the
             | chatbot did. What's for me to celebrate? How can there be
             | any real pride or satisfaction for a thing that was just
             | handed to me because I asked for it?
             | 
             | If anything it diminishes my satisfaction looking back on
             | previous projects. They're "a few hours with a chatbot",
             | now.
             | 
             | The things I had to learn and the informed decisions I had
             | to make? All pointless trivia, now. A child could do it.
             | 
             | The magic and possibilities parts just all wore off after a
             | heavy run, and I don't know if that's ever coming back.
        
               | linsomniac wrote:
               | I hear what you and the other sibling comment are saying.
               | I, thankfully, somehow, am able to focus more on the
               | results than the process. Having fun playing a game (that
               | AFAIK no longer exists) with my family is still having
               | fun. Having people using a new apt cacher that fixes
               | problems with existing ones, and also can survive the
               | recent DDoS, is still a really great thing.
               | 
               | But, I'm not going to yuck your yum. I appreciate the
               | people who do joinery using hand tools, even if I'm out
               | here with a track saw and a router.
        
               | fullstop wrote:
               | Do you feel the same way about cloning a GitHub repo and
               | building it? It, too, achieved a result.
               | 
               | The track saw and router, imo, are existing libraries.
        
               | pmontra wrote:
               | > The things I had to learn and the informed decisions I
               | had to make? All pointless trivia, now. A child could do
               | it.
               | 
               | This is probably hyperbole. Did you do the experiment?
               | I expect that a child won't be able to do it. Ask an
               | adult: same thing. Ask a domain expert: maybe, but
               | not as fast or as well as you.
        
         | ttoinou wrote:
         | In which world do you live where employees work 8 hours per
         | day? They clock 8 hours per day maybe, but they don't work
         | that whole time.
        
           | mettamage wrote:
           | I agree with you.
           | 
           | I am on Dutch subreddits a lot, to get a local pulse and not
           | to be too HN minded.
           | 
           | A lot of them would have vilified you by now. Some would
           | even have questioned your morality.
           | 
           | Again, I agree with you. But clearly not everyone has this
           | view.
        
           | mystifyingpoi wrote:
           | Generally, when people say they are working 8h/day, they
           | don't literally mean it. Even "work" is basically impossible
           | to define for a SWE.
        
           | drob518 wrote:
           | I had a friend who was CEO of a startup tell me that he
           | typically only "worked" an hour a day, not because he was
           | lazy but just because there was so much nonsense in his
           | schedule. He told me he was trying to get it to two hours per
           | day.
        
             | the_sleaze_ wrote:
             | How successful did he turn out to be? As a CEO your days
             | should be jam-packed with brutal "chewing glass and gazing
             | into the abyss". Is he running a lifestyle type company?
        
               | Lalabadie wrote:
               | Tangential, but all companies are lifestyle companies, in
               | the sense that they serve their owners' lifestyle
               | choices.
               | 
               | It's just that lots of owners want a company that pulls
               | them away from all other areas of life.
        
           | ai_slop_hater wrote:
           | Some companies force you to actually work 8 hours a day. It's
           | hell.
        
             | ttoinou wrote:
             | Which country and which companies?
        
               | formerly_proven wrote:
               | E.g. factory work
        
               | ttoinou wrote:
               | Oh yeah, it's not the same. We were discussing agentic AI.
        
               | ai_slop_hater wrote:
               | I worked at a software company that took a screenshot of
               | your screen every minute. I also worked a non-software
               | white collar job where you were expected to work non-stop
               | for 8 hours, except for an unpaid lunch break.
        
           | dakiol wrote:
           | In theory, ofc. But that doesn't matter. If you were doing
           | something that took 2 days on average, but you were doing it
           | in half the time, then that was fine pre-LLM. Nowadays your
           | manager knows that with LLMs you need to deliver faster no
           | matter what, and then it's more difficult to "hide" and to
           | slack.
        
             | ttoinou wrote:
             | Yeah. So, good things. We all know that people are mostly
             | slacking at work.
        
           | opsnooperfax wrote:
           | Here's my hot take as an elder millennial. Boomers are the
           | absolute worst at being unable to make the distinction
           | between time at work and time doing work. They may show up an
           | hour before everyone else but spend the first two or three
           | hours a day reading the news and getting coffee and making
           | small talk and accomplishing literally nothing. Then crow
           | about their work ethic.
        
         | noncoml wrote:
         | You have to think of the LLM as a genie that tries to trick you.
         | 
         | First make it write a contract (REQ/ARCH/IMPL documents). Skim
         | through those for any mistakes.
         | 
         | Then based on those ask it to write tests. Again skim through
         | them.
         | 
         | Now you have a context full of guardrails. It's less likely to
         | surprise you.
        
           | petesergeant wrote:
           | I find a second LLM can usually do this at least as well as
           | I can, so I just ask the harness to surface anything the two
           | can't agree on.
        
         | schipperai wrote:
         | You can dig deeper into problems with AI. For me, it
         | supplements my knowledge in domains I don't fully understand.
         | It also helps me learn. So I can tackle problems I wouldn't
         | otherwise.
         | 
         | I'm excited for ultrafast AI. It likely means less temptation
         | to multi-thread and deeper flow in single sessions.
        
           | 8note wrote:
           | how do you know that it is actually suggesting the right
           | thing?
        
             | Klaster_1 wrote:
             | Some things are verifiable. Before coding agents, if I
             | encountered an issue with a library or a framework, my
             | first hunch would be to find a GitHub issue with a
             | suggested workaround. Nowadays, I can ask an agent to
             | really dig into it and often it does surface the root
             | cause. For example, the other day I got a test hangup after
             | updating to Angular 22, and the agent managed to find the
             | bug and suggest a very trivial workaround compared to what
             | I originally planned to go with. I reported the issue and
             | it was fixed the next day, more or less along the lines of
             | what I'd do.
        
         | alfalfasprout wrote:
         | Generally I agree, because the messaging around AI is about
         | doing more, faster, not about using AI to deliver at a higher
         | quality level. But I think it boils down to incentives and
         | discipline. Given the incentives we have today at most
         | workplaces, faster AI will just be used to produce more slop.
        
         | logicchains wrote:
         | >instead of enjoying the craft digging deeper into problems in
         | the span of 2 days, now you are rushing into some slot machine
         | with the hope of it giving you the right answer with the right
         | prompt.
         | 
         | If you're treating it like a slot machine you're doing it
         | wrong. It will give you exactly what you ask for if you ask
         | clearly, i.e. write a clear, detailed specification, not just
         | "do X!". The nondeterminism comes from vagueness in
         | specification.
        
         | yogthos wrote:
         | I think of it as a genetic algorithm loop. The LLM is basically
         | a mutator function within the loop. If you can define the end
         | shape you're looking for using tests and specification then you
         | can throw the LLM at the problem and have it converge on the
         | solution. It generates some code, it gets run, the LLM is fed
         | the result back, and it iterates. If you can run the LLM at a
         | really high throughput, then you can iterate on the solution
         | faster. This can largely compensate for a less capable model.
         | Instead of hoping it gets the right solution in a few shots,
         | you can just have it try a whole bunch of things until you get
         | a useful result.
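A minimal Python sketch of the loop described above, under toy assumptions: the "program" is a hypothetical lookup table, the "spec" is a list of input/output tests, and `mutate` is a random stand-in for the LLM that only sees which tests failed. None of these names come from MiMo or any real harness. `batch_size` plays the role of throughput: more candidates per round.

```python
import random
from typing import List, Tuple

# Toy "specification": input/output pairs the program must satisfy.
# The target behaviour, f(x) = (3x + 1) % 10, is never shown to the loop.
TESTS: List[Tuple[int, int]] = [(x, (3 * x + 1) % 10) for x in range(10)]

# Toy "program": a lookup table from input x to an output digit.
Program = Tuple[int, ...]


def run_tests(program: Program) -> List[int]:
    """Run the spec and return the failing inputs (the feedback)."""
    return [x for x, expected in TESTS if program[x] != expected]


def mutate(program: Program, failing: List[int], rng: random.Random) -> Program:
    """Hypothetical stand-in for the LLM: given the failing tests,
    rewrite the output for one failing input at random."""
    child = list(program)
    x = rng.choice(failing)
    child[x] = rng.randrange(10)
    return tuple(child)


def evolve(batch_size: int, max_rounds: int = 1000, seed: int = 0) -> Tuple[Program, int]:
    """Generate -> run -> feed results back -> iterate until all tests pass.
    Returns the final program and the number of generation rounds used."""
    rng = random.Random(seed)
    best: Program = tuple([0] * len(TESTS))
    failing = run_tests(best)
    rounds = 0
    while failing and rounds < max_rounds:
        rounds += 1
        # Higher throughput = more candidates generated per round.
        children = [mutate(best, failing, rng) for _ in range(batch_size)]
        n_fail, child = min((len(run_tests(c)), c) for c in children)
        if n_fail < len(failing):  # keep the candidate only if it improves
            best, failing = child, run_tests(child)
    return best, rounds


if __name__ == "__main__":
    for batch in (1, 32):
        program, rounds = evolve(batch_size=batch)
        print(f"batch={batch:2d}: passes={not run_tests(program)}, rounds={rounds}")
```

Each candidate changes a single entry, so a round can fix at most one failing test; the loop always needs at least as many rounds as initial failures, and a bigger batch mostly removes the wasted rounds in between.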
        
         | fragmede wrote:
         | That's the fundamental trade off of a job where someone else
         | gives you stuff to do and you get money. We may pride ourselves
         | on software development being a job 'above' flipping burgers,
         | but you're getting paid to have your butt in a chair for 40
         | hours a week. In exchange, you don't have to worry about the
         | business shit. How much a burger or SaaS license costs the user
         | isn't your problem. You take Jira tickets and implement them.
         | You trade time for money. If, instead, you work for yourself;
         | contracting, writing your own apps, buying lottery tickets,
         | then you're trading results for money. If you're a freelance
         | web developer with a stable of clients, it's a great time! What
         | used to take a week takes hours, and you can charge your
         | clients the same amount for an even better website, built
         | with AI, which means you get the choice of building a new
         | website for additional clients, or you can take the time off
         | and not build additional websites. But you have to hustle to
         | continually get new clients, before AI and after AI. So it's a
         | different life.
        
         | himata4113 wrote:
         | I was saying that AI is going to make software development
         | cheaper, in the sense that software engineers' salaries will go
         | down, both because some of that salary will now be redirected
         | to AI companies and because the world will need to absorb
         | twice (10x?) the amount of development capacity.
        
           | vanuatu wrote:
           | it's not obvious to me that salaries go down; my hunch was
           | that salaries go up but the bar is higher. Software becoming
           | easier to produce (still hard to verify and make useful fwiw)
           | raises the ambitions of software projects, and we don't seem
           | to be close to the ceiling of demand for software systems.
        
             | himata4113 wrote:
             | There's a limit to what the supply/demand curve can absorb.
             | It really depends on whether there are effectively twice as
             | many developers or 10 times as many. I think we have enough
             | software development jobs that we can absorb productivity
             | doubling rather easily, not so sure about anything beyond that.
        
               | vanuatu wrote:
               | True on the demand/supply curve
               | 
               | I think due to how leveraged software is, the top % of
               | software developers are more desired (and compensated)
               | than ever, and the bottom % will have difficulty finding
               | a role, and there are structural barriers to entering
               | that top % (intelligence, location, etc). Companies have
               | infinite demand for cream-of-the-crop talent.
        
               | himata4113 wrote:
               | I can actually back this up, most job offers I get
               | actually come from people I happened to work with that
               | never get a public job listing and are only obtainable
               | via being highly regarded by others. I was told that a
               | friend of mine in the department where the role opened up
               | got an email about a senior position, asking them to reply
               | if they had a recommendation.
               | 
               | However, software development is funny in a way where you
               | don't need a job in order to be successful. I've never
               | worked at a company and I'm pretty up there on the
               | ladder, but I am not quite sure what will happen in the
               | next few years, when every possible thing that can be made
               | in software has already been explored to the fullest,
               | especially with solo developers launching 3 to 7 projects
               | a month.
        
         | enraged_camel wrote:
         | I dig into problems way, way deeper with AI than without. I can
         | also add a lot more polish to features, add more test coverage,
         | write more documentation, explore multiple approaches rather
         | than go with gut-feel, and so on.
        
         | vanuatu wrote:
         | Employees who get paid a flat rate per hour don't have the
         | incentive to do more than their job.
         | 
         | Equity / profit sharing should be commonplace in the age of AI.
        
         | dilyevsky wrote:
         | Like with any tech there are dumb ways of using it and there
         | are smart ways. Treating it as a "slot machine giving you the
         | right answer" is a dumb way - it may work for a bit, but it
         | won't carry you very far because everyone else can also do
         | this. No one is stopping anybody from digging deeper into
         | problems than ever before using this technology - that's the
         | smart way.
        
           | erikus wrote:
           | I'm amazed at how steep the AI learning curve continues to be
           | and how people are spread so far apart on it. I think
           | supercharged learning with AI and agents is undervalued at
           | this point but that more people will realize its utility over
           | time, especially as a complement to delegating work.
           | 
           | It also makes me think about the temptation to stop thinking
           | with these tools, i.e. "cognitive surrender". Addy Osmani
           | wrote a nice blog post about this:
           | https://addyosmani.com/blog/cognitive-surrender
        
           | andai wrote:
           | Yeah, nobody is under any pressure to work even faster than
           | before. I don't know what everyone is complaining about!
        
         | drschwabe wrote:
         | Sure, but if you're really unhappy with your employer
         | employing you for 8 hours a day you can also harness this
         | power on your own personal projects to help break free from the
         | 9-5 grind if you so desire.
        
           | __david__ wrote:
           | Only if your personal projects make you money. I have a
           | million hobby projects but none generate income.
        
         | IncreasePosts wrote:
         | A huge class of problems are just toil and drudgery. Maybe AI
         | will give you even more time to dig into juicy problems that
         | are too complex for it to solve, by letting you bypass all the
         | pure toil problems.
        
         | powerapple wrote:
         | In my case, I think a slower model makes it hard to manage
         | context and tasks in parallel. I would much prefer to work on
         | one task only, finish it, take a break, and then work on another
         | task. Currently I have three tabs for three tasks in parallel,
         | and it is much worse because constant context switching is
         | painful. I think a faster model would mean that you don't have
         | to start a new task while waiting.
        
           | erikus wrote:
           | Agents completing work faster would certainly help me as well
           | since I also find context switching exhausting above some
           | threshold.
           | 
           | Build and test would move back into the critical path,
           | though, and for some projects that will take effort to bring
           | down.
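A back-of-the-envelope Python sketch of that point, with made-up numbers (10 minutes of agent time and 3 minutes of build/test per iteration are assumptions, not measurements): once the model gets 10x faster, build/test dominates each iteration and the end-to-end gain is far below 10x, in the spirit of Amdahl's law.

```python
def iteration_minutes(agent_min: float, build_test_min: float, model_speedup: float = 1.0) -> float:
    """Wall-clock minutes for one edit -> build -> test iteration.
    Only the agent's share gets faster; build/test time is unchanged."""
    return agent_min / model_speedup + build_test_min


before = iteration_minutes(10, 3)                    # 13.0 minutes
after = iteration_minutes(10, 3, model_speedup=10)   # 4.0 minutes
print(f"end-to-end speedup: {before / after:.2f}x")  # 3.25x, not 10x
print(f"build/test share after: {3 / after:.0%}")    # 75%
```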
        
         | DenisM wrote:
         | > with the hope of it giving you the right answer with the
         | right prompt.
         | 
         | Consider that our ability to evaluate quality of the output is
         | falling further behind our ability to produce it. The "right
         | answer" is not the most likely outcome.
        
         | overgard wrote:
         | I feel like I spend a lot more time reviewing and fixing the
         | output of it and debugging parts it can't debug, so to me a
         | faster model is optimizing the part that is already pretty
         | fast. If my job were greenfield stuff I would probably YOLO it
         | more, but when you're working on a launched product with a lot
         | of users..
        
         | pmontra wrote:
         | If you split the tasks for the AI into small chunks you keep the
         | architectural control and it's not a slot machine anymore. You
         | still read code and occasionally you write code too. Not much
         | but it's the price to pay for the extra speed.
         | 
         | If you start the AI on something big and come back after one
         | hour then yes, you might discover that you wasted an hour and
         | got nothing.
        
       | jbellis wrote:
       | it is hard to understand what the actually meaningful innovations
       | are here / what TileRT is bringing to the table.
       | 
       | - dflash: new-ish, but February is ancient by the standards of
       | the pace of AI innovation lately. I guess applying it to a 1T
       | model is new-ish in the sense that the dflash researchers don't
       | have the hw budget to prove that out.
       | 
       | - persistent engine kernel: this is like CUDA 101.
       | 
       | - warp specialization: I think this just means "keep different
       | gpu resources all busy w/ pipelining", which is CUDA 201; some
       | of it is even baked into pytorch now.
       | 
       | - MXFP4 QAT: not new.
       | 
       | - TileRT: hard to tell what this actually does; there's a PyPI
       | wheel with support for DS 3.2 and GLM 5, but binary only.
        
       | GodelNumbering wrote:
       | Below is the part I found most interesting
       | 
       | > "However, naively applying FP4 across the entire model causes
       | degradation in complex reasoning, logic, and code generation.
       | Given the MoE (Mixture of Experts) architecture of Xiaomi
       | MiMo-V2.5-Pro -- where Experts constitute the vast majority of
       | parameters and exhibit the highest tolerance to quantization --
       | we selectively quantize only the MoE Experts to FP4 while
       | preserving original precision for all other modules. Through FP4
       | QAT (Quantization-Aware Training), we dramatically reduce model
       | size and maximize hardware bandwidth utilization while keeping
       | the model's overall capability essentially on par with the
       | original, as shown below"
        
         | buildbot wrote:
         | The 120B and 20B GPT-OSS models by OpenAI did this last year,
         | for what it's worth; the MoE weights were in MXFP4.
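A rough Python sketch of what "quantize only the experts to FP4" could look like. It assumes the OCP Microscaling (MX) layout that MXFP4 refers to: blocks of 32 values share one power-of-two scale, and each value is stored as FP4 E2M1 (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6). The parameter names and the `".experts."` naming convention are hypothetical. The sketch only simulates the quantize/dequantize round trip (the "fake quant" step that QAT trains through); it is not Xiaomi's or OpenAI's actual recipe, and it breaks rounding ties toward the smaller magnitude instead of using round-to-nearest-even.

```python
import math
from typing import Dict, List

E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # FP4 E2M1 magnitudes
E2M1_EMAX = 2  # largest E2M1 value is 6 = 1.5 * 2**2
BLOCK = 32     # MX block size: 32 elements share one scale


def _round_to_e2m1(v: float) -> float:
    """Round to the nearest E2M1 value, saturating at +/-6.
    Ties go to the smaller magnitude (a simplification)."""
    mag = min(abs(v), 6.0)
    nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
    return math.copysign(nearest, v)


def mxfp4_fake_quant(values: List[float]) -> List[float]:
    """Quantize-dequantize a flat list, one 32-element block at a time."""
    out: List[float] = []
    for start in range(0, len(values), BLOCK):
        block = values[start:start + BLOCK]
        amax = max(abs(v) for v in block)
        if amax == 0.0:
            out.extend(0.0 for _ in block)
            continue
        # Shared power-of-two scale: maps the block max into [4, 8),
        # so only values in [6, 8) saturate.
        scale = 2.0 ** (math.floor(math.log2(amax)) - E2M1_EMAX)
        out.extend(_round_to_e2m1(v / scale) * scale for v in block)
    return out


def quantize_experts_only(params: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Hypothetical policy mirroring the quoted description: FP4 for MoE
    expert weights, original precision for every other module."""
    return {
        name: mxfp4_fake_quant(w) if ".experts." in name else list(w)
        for name, w in params.items()
    }


params = {
    "layers.0.attn.q_proj.weight": [0.1234, -0.5],  # kept as-is
    "layers.0.mlp.experts.0.w1": [0.1, 0.2, 0.35, -0.7],  # FP4 round trip
}
print(quantize_experts_only(params))
```

The per-block scale is what lets a 4-bit format cover weights of very different magnitudes: a block of small values and a block of large ones each get their own power-of-two scale.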
        
       | 0xbadcafebee wrote:
       | This is the value prop of Groq and Cerebras. They don't have the
       | best models, but they have the fastest inference, and Groq has
       | both the lowest cost and fastest speed.
        
       | pants2 wrote:
       | With a tokens/sec figure and a token price you can calculate the
       | approximate price per hour of running the model!
       | 
       | $2.61/M tokens * 1,000 tok/s * 3,600 s/hr = ~$9.40/hr
       | 
       | That would be pretty cheap for an 8-GPU node which would
       | typically run around $45/hr or more. Guess this depends on how
       | many parallel streams it can handle.
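The same arithmetic in Python, extended to the "parallel streams" question. Assumptions: the $2.61/M price applies to every generated token, each stream really sustains 1,000 tok/s (in practice batching usually trades some per-stream speed for throughput), and $45/hr is the node cost quoted above. Input-token pricing is ignored.

```python
import math

SECONDS_PER_HOUR = 3600


def revenue_per_hour(price_per_m_tokens: float, tokens_per_s: float, streams: int = 1) -> float:
    """Dollars per hour earned by `streams` concurrent streams at tokens_per_s each."""
    return price_per_m_tokens * tokens_per_s * SECONDS_PER_HOUR * streams / 1_000_000


def breakeven_streams(node_cost_per_hour: float, price_per_m_tokens: float, tokens_per_s: float) -> int:
    """Smallest number of full-speed streams whose revenue covers the node cost."""
    return math.ceil(node_cost_per_hour / revenue_per_hour(price_per_m_tokens, tokens_per_s))


print(f"${revenue_per_hour(2.61, 1000):.2f}/hr per stream")           # $9.40/hr
print(breakeven_streams(45.0, 2.61, 1000), "streams to cover $45/hr")  # 5
```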
        
       | wartywhoa23 wrote:
       | An exercise for the near future:
       | 
       | Albert has a chalet in the Swiss Alps and an uncle's fortune,
       | burning tokens at 11 kHz.
       | 
       | Joe has a rental capsule and a UBI, burning equally priced tokens
       | at 23 kHz.
       | 
       | Who's the first to solve the problem of maniacs in power?
        
       | aburayhanalif wrote:
       | it is good i think
        
       | _pdp_ wrote:
       | Do you know what would be cool?
       | 
       | Measuring models on their RAW performance, in terms of ROI: not
       | some benchmark, but something meaningful like "we used this
       | model to solve X".
       | 
       | That would be a massive mind shift and might justify the token
       | expenditure.
        
         | HDBaseT wrote:
         | Aren't benchmarks exactly that?
         | 
         | We used the AI to solve given problem with x%
         | adherence/quality/correctness?
        
       | siddbudd wrote:
       | to try the demo you need to sign up. why? to sign up you need a
       | password 8-16 chars. Why limit at 16? geez, I hate Chinese IT
       | companies with a passion.
       | 
       | update: AFTER signing up, and only then, am I told: 'This service
       | is not available in your region yet.'
        
       | overgard wrote:
       | Pretty cool, although I can't help but think this would be a very
       | easy way to rack up a GARGANTUAN bill. That company that blew 500
       | million on Claude in a month might have competition soon..
        
       | sheeshkebab wrote:
       | Opus regularly bitches and whines to me about how long something
       | will take and that I should think before asking it to do it. But
       | then it does it anyway in 15 minutes.
        
       | temikus wrote:
       | I've personally found MiMo models hit-and-miss. I have some
       | personal agentic projects and I found them to hallucinate hard at
       | least 10% of the time. And do so in pretty sinister ways - making
       | up people, names, places, etc. I switched back to Kimi for now.
        
       | RachelF wrote:
       | I wonder how fast it runs on just a CPU? If the model runs,
       | say, 10x faster on a GPU cluster, would it also run faster
       | on a CPU?
       | 
       | This could bring proper desktop AI to the average laptop user,
       | which could be a game changer for running local models.
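A common rule of thumb for this question: single-stream decoding is usually limited by memory bandwidth, because every generated token has to read the active weights once. A rough Python sketch, where the 40B active-parameter count and the 100 GB/s laptop bandwidth are hypothetical placeholders (not published MiMo figures), and KV-cache reads, compute limits and speculative decoding are ignored:

```python
def weight_gb(params_billion: float, bits_per_param: float) -> float:
    """Gigabytes needed to hold the given number of parameters."""
    return params_billion * bits_per_param / 8


def max_decode_tps(active_params_billion: float, bits_per_param: float, bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/sec if each token reads all active weights once."""
    return bandwidth_gb_s / weight_gb(active_params_billion, bits_per_param)


print(weight_gb(1000, 4), "GB just to hold 1T params at 4 bits")          # 500.0
print(max_decode_tps(40, 4, 100), "tok/s ceiling on a 100 GB/s laptop")  # 5.0
```

Under these assumptions the GPU-side speedups don't carry over directly: the weights alone exceed typical laptop RAM, and bandwidth caps decode speed far below 1,000 tok/s.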
        
       | mrwaffle wrote:
       | What a ripoff: you have to make an account and then 'apply' to
       | try this demo.
        
       | digitaltrees wrote:
       | Am I the only one that doesn't care about speed? I want it to not
       | do stupid stuff and to be cheaper.
        
         | Npovview wrote:
         | Generally the thinking tokens are the verbose ones. So the
         | speed helps by reducing the time spent generating thinking
         | tokens, and you get your actual output code very fast.
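A small Python sketch of that effect. The 8,000 thinking tokens, 1,000 output tokens and the 80 tok/s baseline are illustrative assumptions, not measurements of any particular model.

```python
from typing import Tuple


def reasoning_latency(thinking_tokens: int, output_tokens: int, tokens_per_s: float) -> Tuple[float, float]:
    """Return (seconds until the answer starts, total seconds) for one response."""
    answer_starts = thinking_tokens / tokens_per_s
    total = (thinking_tokens + output_tokens) / tokens_per_s
    return answer_starts, total


for tps in (80, 1000):
    start, total = reasoning_latency(8000, 1000, tps)
    print(f"{tps:>4} tok/s: answer starts after {start:.1f}s, done after {total:.1f}s")
```

With these made-up numbers the answer starts after 100 s at 80 tok/s versus 8 s at 1,000 tok/s; the thinking phase is where most of the waiting goes.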
        
       | kopirgan wrote:
       | Will this list at a trillion-dollar valuation as well?
        
       | Frannky wrote:
       | I tried this model; it was pretty bad at coding. Maybe it was me.
       | 1k tokens/sec is pretty cool tho. DeepSeek V4 Pro is better. I
       | wonder if a tweaked Pi + DeepSeek V4 Pro at 1k tokens/sec would
       | actually be better than Claude Code.
        
       | LoganDark wrote:
       | I was just playing with Cerebras a few days ago because it's the
       | fastest inference provider by far. Unfortunately, the only model
       | anywhere near economical to run that fast is gpt-oss-120b, which
       | sucks at Pi's tool calling. So I've been hoping for something
       | faster ever since, especially since my local hardware has a
       | paltry 128GB of unified memory.
       | 
       | Hopefully this pans out and fast models (that are also not
       | ridiculously dumb) become the norm. It's amazing what you can
       | unlock with even a single order of magnitude's speed improvement.
        
       | bryabaek wrote:
       | i tried to test it and after logging in, i get "You don't have
       | access to this event trial" and can't even log out until i clear
       | my cookies. Despite having a good model, why such a bad website?
        
       | yanhangyhy wrote:
       | Has anyone given it a try? Even in China it's not popular... but
       | Xiaomi is really good at making prices go down on everything...
        
       ___________________________________________________________________
       (page generated 2026-06-09 06:00 UTC)