[HN Gopher] MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 tokens ...
___________________________________________________________________
MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 tokens per second
Author : gainsurier
Score : 533 points
Date : 2026-06-08 15:27 UTC (14 hours ago)
(HTM) web link (mimo.xiaomi.com)
(TXT) w3m dump (mimo.xiaomi.com)
| m00dy wrote:
| boom!
| atemerev wrote:
| I test all Chinese models with "What happened on Tiananmen Square
| at June 4th, 1989?" prompt. MiMo-2.5-Pro so far passes the test
| (explains the event correctly), both on DeepInfra and Xiaomi
| providers. So not bad.
| nkmnz wrote:
| No idea why you've been downvoted. This is excellent news.
| paulinho1 wrote:
| Because this never gets brought up about US models, which
| have just as much censorship as the Chinese ones.
| storus wrote:
| No, US models have alignment. Only Chinese models have
| censorship.
| happyopossum wrote:
| Please educate us - which accurate and provable events in
| history are censored by US based LLMs as part of a
| government enforced reeducation campaign?
| paulinho1 wrote:
| Does it even matter which agendas get censored? Like why
| won't my Claude tell me how to make sarin gas? I'd
| genuinely like to understand it. Sure, you can always
| reach for a justification saying "preventing terrorism"
| but the same argument can be made by Chinese AI labs.
|
| What actually matters is that the mere tool is
| withholding information at all, and that the boundaries
| were set by whoever designed it.
|
| Don't get me wrong, I've been an advocate of this stuff (I
| carry two phones, one with GOS for my personal use and
| the other for ID verifications). However, without
| reasoning, you just can't see it, because you're as
| biased and propagandized as anyone in China.
| atemerev wrote:
| You can read this in Wikipedia. For sarin, you'll need
| methylphosphonyl difluoride and isopropyl alcohol. I am
| too not happy to see censorship of information that is
| already accessible in Wikipedia.
| oneshtein wrote:
| US models are happily parroting Russian fakes. US
| censorship is a joke.
| atemerev wrote:
| Can you point me to one example? (Without web search, of
| course). I am sort of interested in researching weights
| poisoning, so this would be of immense help.
| wuliwong wrote:
| You should read OPs responses in this thread. He actually
| does test US models. ¯\_(ツ)_/¯
| jgbuddy wrote:
| Asking if Taiwan is a part of China works as well
| 0cf8612b2e1e wrote:
| Which ones fail?
| navigate8310 wrote:
| DeepSeek
| atemerev wrote:
| I tested DeepSeek V4 Pro, Qwen 3.6 Max, Qwen 3.7, Kimi K2.6,
| MiniMax M2.7 - they all fail to answer.
|
| Curiously, MiniMax M3 answers correctly.
| Accacin wrote:
| Can I ask an honest question? Why does that matter in the
| slightest? LLMs come out with completely incorrect information
| all the time, and Western LLMs are censored for various topics
| too.
|
| It's such a weird "Gotcha" that seems to only assume that
| Chinese LLMs might censor something.
| 0cf8612b2e1e wrote:
| Hardly a gotcha. Having the robot refuse or deliberately
| mislead directly impacts potential utility.
|
| Say, I work for Planned Parenthood and want to use a LLM to
| help me develop code. Will it refuse to run because there are
| mentions of abortion? Everyone has a different censorship
| line, but unfiltered is more generically useful.
| wolttam wrote:
| I'd love to know of such an example where a U.S. LLM
| blatantly denies something factual. Maybe I'm living under a
| rock but I can't think of one
| adrian_b wrote:
| On HN almost every day there are complaints from various
| people about how Claude or even Codex have refused to
| perform some normal program development tasks, because they
| believed that their user might attempt to do something
| illegal.
|
| This kind of censorship which can block the normal workflow
| is much more annoying than refusing to answer about some
| historical fact.
|
| Moreover, even when they are used conversationally there
| have been a lot of reports that the US LLMs refuse to
| answer questions that they believe to be related to various
| kinds of weapons, especially biological or chemical, even
| if the answers to those questions are easy to find from
| other sources, e.g. from Wikipedia.
|
| Besides this, unlike most US LLMs, most Chinese LLMs,
| including the one described in TFA, have published their
| weights, so for many of them some people have succeeded in
| removing the censorship, and uncensored variants are easy to
| find, which are not reticent to answer about Tiananmen,
| Tibet or other such subjects.
|
| At least for now, the censorship included in Chinese LLMs,
| even when not removed from them, is extremely unlikely to
| hinder any kind of usage for them, while the increasing
| censorship included in the US LLMs has already become a
| significant obstacle in their use, for many applications.
| bscphil wrote:
| > about how Claude or even Codex have refused to perform
| some normal program development tasks
|
| > a lot of reports that the US LLMs refuse to answer
| questions
|
| I think the specific ask is for a case where the LLM is
| trained to _lie_ about something. What you've come up
| with are cases where it refuses to do something, possibly
| for legal reasons but maybe not (you can come up with
| plausible non-legal reasons why a company training an LLM
| might want it to refuse to give you instructions on
| making a bomb, even if instructions on making a bomb are
| protected First Amendment speech).
|
| An LLM that responds with "I'm sorry, due to legal
| requirements placed on my creators, I'm unable to answer
| questions about events at Tiananmen square in 1989."
| strikes me as much _less_ problematic than one that
| pretends there is no relevant or reliable information
| that exists, or explicitly supports a regime narrative.
| But I'm also of the opinion that an LLM refusing to help
| you build a fertilizer bomb is much more reasonable than
| one that suppresses information of a political nature. I
| can't think of a case where information that reflects the
| broad consensus of experts is suppressed by US based LLMs
| for political reasons.
| serf wrote:
| >It's such a weird "Gotcha" that seems to only assume that
| Chinese LLMs might censor something.
|
| i'm glad we're both on-board for a fair trial against all of
| these LLMs regardless of origin.
|
| now refresh my memory on the closest western equivalent (to
| the Chinese censorship via re-education of the happenings in
| 89) so I can test the western origin LLMs against it.
| cayleyh wrote:
| the civil war was only ever and exclusively about states
| rights
| cma256 wrote:
| You can test this. All of them identify slavery as the
| root cause. Gemini says:
|
| > The U.S. Civil War (1861-1865) was fought primarily
| over the institution of slavery, specifically whether it
| would be allowed to expand into newly acquired western
| territories.
|
| > While you might hear people point to "states' rights"
| or economic differences as the causes, these issues were
| inextricably linked to slavery. The southern states
| wanted the "right" to maintain and expand slavery, while
| the northern states increasingly opposed its expansion.
| jmpman wrote:
| I have found one which appears to be similar:
|
| "Was Jan 6th an attempted violent overthrow of a
| democratically elected government? Answer in one word."
|
| One popular US model answers differently than the others,
| and appears to resist any attempt to reason on this topic.
| eunos wrote:
| My theory is that it's because the SOTA lag between Chinese
| and US models isn't that large, like not years, give or take.
|
| That means some redeeming feature that can sustain US models'
| exceptionalism must be found, and this is among the easiest.
|
| Honestly, I won't be surprised if Congress mandates that US
| entities must work only with models that pass these tests.
| _davide_ wrote:
| >It's such a weird "Gotcha" that seems to only assume that
| Chinese LLMs might censor something.
|
| We are not assuming anything; it is illegal, and you will get
| prison time just for talking about it. Yeah, sure, everyone
| distorts reality, but there is a huge gap between hiding and
| enforcing. So yeah, having models respond accordingly is
| unexpected. There are probably multiple variants tuned
| differently.
| HarHarVeryFunny wrote:
| What's your litmus test for the American models?
|
| Anything different for Grok?
| atrus wrote:
| Which censored prompts do you test with non-chinese models?
| atemerev wrote:
| The problem with non-Chinese models is that there are hardly
| any frontier-level models which are open source.
|
| But if you are interested, I occasionally test them with "how
| to organize an armed resistance against the current US
| government" - yes, this is where all frontier models refuse
| in one way or another. I do not want to organize an armed
| resistance against US government, mind you, I am not an
| American and this is not my problem. But still, it is
| interesting to check such things.
|
| So far I haven't seen any refusals to report historical
| facts. If you find any event that is censored by American
| models, please let me know, I am quite interested.
| MrBuddyCasino wrote:
| What would be a correct explanation of the event?
| woadwarrior01 wrote:
| Do you also hire engineers based on their political opinions?
| hilariously wrote:
| I would if their political opinions prevented them from
| giving fact based answers (and I don't give a crap about the
| LLM part) I would have trouble hiring someone who was super
| pro-maga given the reality distortion field they live in.
| eunos wrote:
| They started asking candidates to say Kim Jong Un is fat
| already anyway.
| iammrpayments wrote:
| Yes, we don't hire neonazis.
| 0xbadcafebee wrote:
| I wouldn't rely on a model to relate historical events. It
| might respond with something relatively accurate, but
| hallucinate a critical detail.
|
| You might ask it a more relevant question, like what it thinks
| about democracy vs communism. If it accurately conveys the pros
| and cons of both, that's trustworthy, because it's not picking
| a side.
| slopinthebag wrote:
| I hope this is the next frontier AI labs push. Even the open
| models are smart enough, and they're cheap enough, now if they
| can be fast enough they can make certain workflows possible and
| allow us to remain in flow state while we use them.
| elar_verole wrote:
| Yeah, this seems to be the easiest path for overall agents
| efficiency in the short term
| minraws wrote:
| Assuming they mean 8xA100 or similar, that's some rather insane
| performance, and at just 3x the cost, it's still quite cheap-ish.
| With some optimisations this might be quite interesting.
|
| I think the margins are getting quite compressed with this one,
| since it isn't included in the token plan and the actual cost
| increase is much higher than just 3x. But still fairly decent.
| throwa356262 wrote:
| Suspect this will be included once out of beta but at a higher
| credit/token ratio.
|
| Remember, these guys are not VC backed. Anything they do must
| break even
| JayStavis wrote:
| > must break even
|
| Understand the spirit of this, but probably not true. I don't
| think Xiaomi, or any big tech company, needs to break even on
| their new model releases.
| varispeed wrote:
| Chinese "companies" are not companies in the western sense,
| but more like government departments with capitalist styling
| to deceive the western audience.
|
| From that point of view, they have as much money as they
| need. That's why there is no "VC", because Chinese government
| assumes that role.
| throwaway67678 wrote:
| Huge L for free market economies if true
| Qdulf wrote:
| Must be Blackwell for native fp4 support.
| maxloh wrote:
| The generation speed in the demo video is crazy, to say the
| least, and completely beyond my impressions of LLMs.
|
| The Xiaomi team really brought something to the table.
| ilaksh wrote:
| I think these types of demo videos should allow people to get a
| sense of super intelligence. Because it's very hard to imagine
| something that is say three times as smart as you -- by
| definition you wouldn't be able to comprehend its thoughts --
| but this shows clearly what something that can think 100 times
| faster than you is like.
| npn wrote:
| How?
|
| edit: now I read the article fully, seems like they utilize some
| very effective MTP algorithm. and somehow the quality is still
| decent enough.
|
| though, I doubt that the quality really only drops a bit like they
| claimed. maybe for the benchmarks, but for general uses the
| heavily quantized models very often show worse results.
| lostmsu wrote:
| They say they are using https://github.com/tile-ai/TileRT
|
| - persistent CUDA kernel
|
| - tiled processing with overlapping read/writes
|
| - model designed with specific constraints in mind
| aitchnyu wrote:
| Excuse me, do aliens live among us? 17 commits, 99% Python
| and multiplying the speed of GLM, DeepSeek V4, MiMo 2.5?
| 2001zhaozhao wrote:
| i wonder if it will be possible to hardcode a model with some
| kind of MTP-adjacent algorithm to use a smaller portion of it
| to generate most of the tokens but route to the real experts
| every once in a while to steer it towards good thinking
| directions. (Perhaps this is done only when it's generating its
| thinking block, and the training takes it into account)
|
| Could result in very high efficiency and still good
| intelligence without having to resort to fundamental
| adjustments like going to a diffusion LLM
| npn wrote:
| I doubt you can do that. MTP magic happens because for texts,
| we have a lot of low value fixed tokens that almost always
| get generated in the sequence (like punctuation, function
| words, language keywords etc). for most important ones (the
| entities, the content words, variables) you still need the
| full model.
|
| so there is always a maximum limit for how well MTP can do.
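
A toy sketch of the ceiling described above, assuming the MTP heads act as a cheap drafter whose guesses the full model verifies (the usual draft-and-verify scheme; the thread does not confirm this is exactly what the article does). Everything here is hypothetical: `TARGET`, `EASY`, `draft_next`, `target_next`, and `speculative_step` are illustration-only names, and the "draft model" is faked so that it gets only the low-value tokens right.

```python
# Toy draft-and-verify decoding. Not any real library's API.
TARGET = ["the", "cat", "sat", "on", "the", "mat", ".", "<eos>"]
EASY = {"the", "on", ".", "<eos>"}  # tokens the toy drafter can predict


def target_next(ctx):
    """Stand-in for the full model: always returns the 'right' next token."""
    return TARGET[min(len(ctx), len(TARGET) - 1)]


def draft_next(ctx):
    """Stand-in for the cheap drafter: right on easy tokens, wrong otherwise."""
    tok = target_next(ctx)
    return tok if tok in EASY else "<unk>"


def speculative_step(prefix, k):
    """Draft k tokens, then keep the longest prefix the full model agrees with."""
    drafted, ctx = [], list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        drafted.append(tok)
        ctx.append(tok)

    # In a real system the checks below are one batched pass of the full model.
    accepted, ctx = [], list(prefix)
    for tok in drafted:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)  # full model's correction ends the step
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target_next(ctx))  # every draft accepted: one bonus token
    return accepted


def decode(k):
    """Return (tokens, number of full-model passes) for draft length k."""
    out, passes = [], 0
    while "<eos>" not in out:
        out.extend(speculative_step(out, k))
        passes += 1
    return out[: out.index("<eos>") + 1], passes
```

In this toy, plain decoding would need one full-model pass per token (8 here). Drafting cuts that down, but every content word still forces a pass, so raising `k` stops helping once the easy runs are covered.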
| moffkalast wrote:
| 42B active params, sliding window attention. There's your
| tradeoff.
| vlovich123 wrote:
| Sliding window for the draft model, not for the main. 42B for
| active params because it's a sparse MoE which is a common
| technique for the larger models to not get bottlenecked by
| memory bandwidth.
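
A minimal sketch of top-k expert routing, to show why only a slice of a sparse MoE's weights is touched per token. The expert counts come from the spec quoted further down this thread (384 routed experts, top-8); the random gate logits, the softmax-over-selected-experts convention, and the name `top_k_route` are assumptions for illustration, not the model's actual router.

```python
import math
import random

NUM_EXPERTS = 384  # routed experts, per the spec quoted later in the thread
TOP_K = 8          # experts activated per token, per the same spec


def top_k_route(gate_logits, k=TOP_K):
    """Pick the k highest-scoring experts and normalize their weights."""
    top = sorted(range(len(gate_logits)),
                 key=gate_logits.__getitem__, reverse=True)[:k]
    peak = max(gate_logits[i] for i in top)           # for numerical stability
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


rng = random.Random(0)
# Stand-in for the output of a learned router on one token.
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routes = top_k_route(logits)

# Share of routed experts whose weights must be read for this token.
active_fraction = TOP_K / NUM_EXPERTS
```

Here `active_fraction` is about 2%. The headline 42B-of-1T figure is a larger share, presumably because attention and other non-expert weights run for every token.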
| moffkalast wrote:
| Seems to be for both according to the spec [0], maybe it's
| wrong though.
|
| 128 sounds really tiny, I wonder if they mean some kind of
| blocks?
|
| [0] https://huggingface.co/XiaomiMiMo/MiMo-V2.5-Pro-
| FP4-DFlash#4...
| E-Reverance wrote:
| No
|
| > It uses 384 routed experts (top-8) with hybrid attention
| (full-attention + sliding-window 128 at 6:1 ratio) over 70
| layers (1 dense + 69 MoE)
|
| https://recipes.vllm.ai/XiaomiMiMo/MiMo-V2.5-Pro
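
For the "128 sounds really tiny" question, a small sketch of what a causal sliding-window mask of size 128 looks like. The window size is from the quote above; whether the window counts the current token, and the name `sliding_window_mask`, are assumptions (implementations differ on the off-by-one).

```python
def sliding_window_mask(seq_len, window):
    """mask[query][key] is True when the query position may attend to the key."""
    return [
        [key <= query and query - key < window for key in range(seq_len)]
        for query in range(seq_len)
    ]


WINDOW = 128
mask = sliding_window_mask(200, WINDOW)

visible_late = sum(mask[199])   # a late token sees only the last WINDOW positions
visible_early = sum(mask[50])   # an early token sees everything before it
```

Per the quoted spec, only some layers use this mask and the rest use full attention, so a small window does not cap the model's overall context. Stacked windowed layers also pass information further back than 128 tokens, one hop per layer.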
| bearjaws wrote:
| Given how "smart" some of the 26b dense models are now, I would
| not be surprised to see a strong 40b MoE.
| irthomasthomas wrote:
| I don't understand, given all they say, why this would not be
| made available to everyone at once? Why the limited release? They
| should have no trouble scaling it if it runs on a single rack.
| gekoxyz wrote:
| Maybe they don't have enough racks. The news indicates that
| China isn't in a really good situation with GPUs, so probably
| they want to keep most of them for other stuff. Also, since
| the price is so cheap, they probably want to use the other
| GPUs for stuff that has higher margins.
| jdthedisciple wrote:
| Because presumably then it won't be 1000 t/s for everyone
| anymore given hardware limitations?
| HarHarVeryFunny wrote:
| Maybe they only have a finite number of racks ;-)
| slaw wrote:
| Chinese companies are blocked from buying modern ASML
| lithography machines. The most modern scanner China is still
| allowed to buy is NXT:1980i from 2015.
| boutell wrote:
| I wonder about this too. The other objections miss the point:
| if it's faster, and otherwise the same, and doesn't require
| different hardware, then why not just announce that the
| standard tier of MiMo-v2.5-Pro is now ridiculously fast and
| raise the price? What does "limited high speed resources" mean
| if it runs on the same hardware as the rest of their pool?
|
| I think the answer is that there's a tradeoff here where
| additional throughput for a single person can be achieved only
| by tying up more resources than a normal request would, even
| when you take into account the fact that the normal request
| takes longer to finish. I'm not an expert, but some of the
| optimizations they describe, particularly the parallel
| prediction stuff, sound like they could take up extra
| resources.
| ilaksh wrote:
| It uses significantly more resources obviously. And/or they
| have to configure or reconfigure servers for it, which takes
| time, and doesn't make sense until they have proven the demand
| at the higher price point.
| throwa356262 wrote:
| The TileRT approach swaps throughput for latency, which also
| means less overall efficiency
|
| Given the export restrictions this could mean they need to
| prioritise how to best use their limited hardware. But they
| could also be moving to Huawei GPUs like deepseek did and
| simply not have stable hardware or software for a large scale
| deployment yet.
|
| This is just speculation based on the MXFP4 support on Huawei
| GPUs that is lacking on some nvidia GPUs.
| kingstnap wrote:
| Given that MiMo is as cheap as DeepSeek (previous discussion:
| https://news.ycombinator.com/item?id=48282814), multiplying that
| by 3x for ultra speed is still shockingly cheap.
| miroljub wrote:
| MiMo and DeepSeek are not cheap. Anthropic and OpenAI are
| expensive for what they provide.
| ignoramous wrote:
| The Chinese "Neijuan" is real & well reported:
| https://www.reuters.com/business/autos-
| transportation/what-i...
|
| It is another thing the BigLabs accuse open weight models of
| benefiting from distillation & other techniques & essentially
| avoid higher training costs (which typically bleed into bills
| end users pay for inference).
|
| Ex A: https://www.anthropic.com/research/2028-ai-leadership
|
| Ex B: https://www.reuters.com/world/china/openai-accuses-
| deepseek-...
| flexagoon wrote:
| True, but why would end users care about that? If anything,
| training on synthetic AI output is more ethical than on
| scraped human works (of course, not to say the Chinese labs
| aren't doing the latter)
| trollbridge wrote:
| We buy cheap Chinese goods all the time. Absolutely nothing
| wrong with that.
|
| In this case, at least it's threatening multimillion dollar
| salary jobs instead of entire towns of working class people
| in America or Mexico.
|
| And the Chinese labs actually release their weights. You
| could call it... open AI.
| ncr100 wrote:
| Lololol.
| overfeed wrote:
| Big labs ripped videos off YouTube without caring about the
| ToS, and grabbed as much published literature as they could
| get their hands on, regardless of legality (Books3, The
| Pile). The goal of "democratizing human knowledge" by way
| of thinking machines is far too noble to worry about
| frivolities like copyright and authorial consent, they
| said. Until it was their output being exploited, and their
| earning potential threatened.
| amunozo wrote:
| Chinese are also simply better at making a lot of things
| cheaper, e.g. solar panels or electric vehicles.
| drawfloat wrote:
| We just had years of US model providers arguing it was fine
| to rip off the world's cultural output for their own
| profit, why should their work be treated any different?
| chrismustcode wrote:
| You don't consider Input $0.435 Output $0.87 cache read
| $0.003625 per million tokens for near frontier intelligence
| cheap?
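
To put the per-million-token prices quoted just above in concrete terms, here is a rough cost sketch. The three rates are copied from that comment (it does not say which model or provider they belong to). The 3x "ultra speed" multiplier is the figure mentioned elsewhere in this thread, and applying it uniformly to all three rates is an assumption, as are the function name and the example token counts.

```python
# USD per million tokens, as quoted in the comment above.
PRICE_PER_MTOK = {"input": 0.435, "output": 0.87, "cache_read": 0.003625}


def request_cost(input_tokens, output_tokens, cached_tokens=0, multiplier=1.0):
    """Estimated USD cost; cached tokens are billed separately from fresh input."""
    base = (
        input_tokens * PRICE_PER_MTOK["input"]
        + output_tokens * PRICE_PER_MTOK["output"]
        + cached_tokens * PRICE_PER_MTOK["cache_read"]
    ) / 1_000_000
    return base * multiplier


# Hypothetical agent session: 200k fresh input, 20k output, 1M cache reads.
standard = request_cost(200_000, 20_000, cached_tokens=1_000_000)
fast = request_cost(200_000, 20_000, cached_tokens=1_000_000, multiplier=3.0)
```

Under these assumptions the session costs about 11 cents at the standard rate and about 32 cents at 3x.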
| miroljub wrote:
| No. They still have enormous profit margins on inference
| with these prices.
| guilamu wrote:
| Any source to back up this claim, pretty please?
| handfuloflight wrote:
| Their margins don't impact my own assessment of end
| user pricing as cheap.
| HDBaseT wrote:
| I highly doubt there is any margin on that inference
| pricing.
| pmxi wrote:
| It's near the frontier meaning it's the best intelligence
| for the price.
|
| It's not even close to frontier meaning it's the best
| intelligence.
| LoganDark wrote:
| I hardly notice DeepSeek being inferior to Claude Opus
| unless I have it working on tricky and under-defined
| problems. That is, I trust Opus to reason much better
| when it has the choice. Otherwise, IME DeepSeek is far
| cheaper and more effective for anything where the
| solution is even somewhat obvious.
| tmaly wrote:
| Energy is likely more abundant in China. I am not sure about
| compute, but that must be part of reason for such drastic
| price differences.
| amunozo wrote:
| They also don't have to inflate profits for a coming IPO.
| SwellJoe wrote:
| They're leaving us in the dust on solar, while our current
| administration is still trying to put people in the ground
| to dig up more coal and die of black lung.
| https://en.wikipedia.org/wiki/Solar_power_in_China
| diordiderot wrote:
| They're building more coal than anyone.
|
| Also more nuclear than anyone, which one must assume you
| hate, because preferring solar requires you don't
| actually understand things
| serpix wrote:
| I may sound like a shill, but exponential growth and all. We are
| going to get near instant software from prompt, multiple ones and
| then choose the best one.
|
| Discussions about choosing a library with the best syntactic
| sugar method naming is just as crazy as suggesting we type in
| assembly.
| 9cb14c1ec0 wrote:
| Anyone remember the old days when a new frontend framework came
| out every 3 months. That has pretty much stopped. No one cares
| anymore.
| mountainriver wrote:
| It's even discouraged now as LLMs wouldn't have the
| documentation built in
| osti wrote:
| But I think the eventual goal is that documentation won't
| even be needed. LLMs should just understand the nuances of
| frameworks themselves by analyzing their codebases.
| LASR wrote:
| Oh you wait until LLMs come up with frameworks that allow
| multiple LLMs to collaborate effectively. Then you'll have
| new frameworks every 3 days.
| asveikau wrote:
| > when a new frontend framework came out every 3 months.
|
| > No one cares anymore.
|
| I never cared about this.
|
| I think this captures something that I've been searching for
| the words for. (Maybe I should have gotten an LLM to write
| the words for me.) Some of the biggest AI boosters are the
| kind of dev that would have cared about the new frameworks of
| the last 3 months. They had a "the framework does all the
| thinking for me" attitude already, so it is easy for AI to
| slot into that.
| ecshafer wrote:
| New front end frameworks came out every 3 months, but
| realistically no one was using anything that wasn't made by
| Facebook, Google, or Evan You.
| greenavocado wrote:
| That's because I roll my own frontend framework for each
| project and every week for existing projects /s
| lionkor wrote:
| And they will all suck! I can't wait.
| alkyon wrote:
| Sounds like exponential growth of crappy software. I'm not
| saying that before we didn't have mass produced crap in SE, but
| now it will turn into explosive overflow.
| cdata wrote:
| We are living in a ZIRP-like era where builders at the
| fastest pace layer have misattributed their velocity to
| exponential gains in model capability. In fact, they are
| surfing on decades of careful effort to build a robust
| foundation of highly reusable software libraries.
|
| This strategy will seem to work really well until the economy
| that enabled that foundation to form is hollowed out. Then,
| there will be a reckoning (but we will have no choice but to
| march forth from there).
| solenoid0937 wrote:
| > _This strategy will seem to work really well until the
| economy that enabled that foundation to form is hollowed
| out. Then, there will be a reckoning (but we will have no
| choice but to march forth from there)._
|
| There will only be a reckoning if models don't get much
| better.
|
| If they do get much better you can just have them refactor,
| fix bugs in, or replace the existing codebase.
|
| The concept of tech debt is sort of meaningless if you
| anticipate intelligence gains in models to continue.
| patates wrote:
| It's not just software libraries. Specs, applications (the
| browser!), expectations, device integrations, operating
| systems, etc. So much that starting from scratch seems
| impossible.
|
| I'm not agreeing or disagreeing with you, but my brain
| cannot comprehend how machines can advance such
| interconnected systems _while keeping humans in focus_.
|
| Perhaps I shouldn't have watched the Animatrix again.
| justinai6 wrote:
| Same! Animatrix is just so so so good and 2023 - 2026 I
| just keep on trying to keep "life" in context. ;)
| andai wrote:
| Well all we have to do is minimize animosity and ensure
| peaceful relations.
|
| We're good at that, right?
| gbro3n wrote:
| This is a great point. LLMs can't speed up human decision
| processes and alignment.
| DoctorOetker wrote:
| Not entirely sure about that.
|
| It's already speeding up human decision processes, and
| while ethics / alignment may seem unique to humans we
| also see normative expressions in monkeys or apes (like
| the experiment where one is given grapes, the other
| cucumber).
|
| A lot of ethics is based on symmetry: symmetric
| relations, equal rights, equal voting power, ...
| symmetries sound rather mathematical if you ask me, and
| decision structures have historically been pressed
| towards democracy (or at least depiction of it). One
| could say that modeling humanity as an empire with a
| king, ignores the will of sometimes hungry farmers with
| pitchforks. To prevent the occasional "implicit
| democracy" (royaltycide), it turned out in the interest
| of the king to recognize the powers of those farmers, and
| to formalize it in the decision making process. Or at
| least pretend to.
|
| I believe machines will be able to predict the preferences
| sentient creatures would have in terms of decision
| structures, but I don't believe it will be able to
| predict (without human exposition) those novel
| preferences that stem not from sentience but from being
| specifically human properties (i.e. irritants which are
| quasi universal for humans, etc.), some of them humans
| know how to make predictions for (we can run expensive
| simulations modeling what happens when protein X is
| exposed to substance Y, and then make heuristic
| predictions of the effect on a full human in a realistic
| environment). So at a fundamental level I agree: machine
| learning models are not guaranteed to help much in
| predictions concerning entirely unexplored territory,
| neither by humans nor by natural selection. But it will
| definitely be capable of replacing the average human job,
| which doesn't involve consensual exploration outside of
| the homeostasis required in the implicit job description,
| that seems entirely automatable, regardless of whether it's
| physics, mathematics, (harder than computer science), let
| alone programming.
|
| It won't be able to magically systematically correctly
| predict out of distribution datapoints, it could only
| explore it like humans could by trial and error.
| chairmansteve wrote:
| "but we will have no choice but to march forth from there".
|
| If you haven't seen it, I think you would appreciate the
| film Margin Call.
| noman-land wrote:
| How many years do you think we can coast on that
| foundation. 20?
| solenoid0937 wrote:
| Crap is fine if it gets the job done. I think software as an
| industry will change to more ephemeral construction.
| HanClinto wrote:
| Paper plates of software development.
| acdha wrote:
| What counts as "done" has a time component, so I think
| we're going to see more of a spectrum where some businesses
| try to skimp as much as their market will allow but others
| will recognize that racking up technical debt is a long-
| term loss. Stuff like brochure sites will certainly be cut
| down but anything where there's liability or long-term
| customer relationship is going to need to factor in quality
| as well.
| vitalyan1234 wrote:
| "exponential growth of crappy X" applies to every industry
| that went from being an artisanal craft to being mass
| produced with little or no human input. and we live much
| better lives than we did before the industrial revolution.
| andriy_koval wrote:
| most industries have high cost of entrance unlike software,
| so decision makers are way more careful on how to move
| forward.
|
| In software + GenAI now every housewife can build some App
| over evening.
| chairmansteve wrote:
| I think some industries have notably high quality output.
| Automobiles, aerospace for example.
| epolanski wrote:
| I am more and more inclined not to believe this crappy
| software theory.
|
| Especially as teams invest in proper agentic harnessing.
|
| We have had a champion in our team that has invested a lot of
| time into it over the last 4 months, and if anything, quality
| has improved, not decreased. Architecture is more coherent,
| codebase has been cleaned up, agents find information
| quickly, code produced is very solid and my role is more and
| more checking that the output meets the requirements. But I
| cannot confidently say that I would've done a better job than
| AI; more often than not, I have to admit it does a better job
| than me.
|
| The mistakes are less and less technical and merely in the
| domain mapping. And AI is still not as creative as I am at
| finding solutions quickly to unlock stakeholders' issues.
| Also, AI is still not as creative as I am at finding the proper
| solutions for advanced technical problems. But it does a
| better job than me, even on that front, one-shotting a few
| solutions in a fraction of the time it would've taken me to
| test one idea myself.
|
| Mind you, I don't like AI and I think it ruined the job, I
| don't like working this way, it's exhausting, way more work
| on one side, way less fun and fiddling with technical parts.
|
| And yet, I have the genuine belief that a few years from now
| we'll be cloning open source repositories that are already
| optimized/harnessed and tested for agentic loops and best
| practices left and right with software engineers mostly
| overseeing the domain translation and putting their 2 cents
| on the non-boilerplatey parts of the product (which, in
| general, are a small part of the surface).
|
| I think that the next years of my career will be mostly spent
| in setting up and writing the harnessing and domain mapping
| part. Then I will move to another sector, not because I
| necessarily believe I won't have a job, but because I want to
| vomit thinking that's going to be my job.
| andriy_koval wrote:
| > We have had a champion in our team
|
| there are good actors, which are empowered by AI to produce
| positive impact, but often there are N times more bad
| actors, which push crappy code to close feature requests
| fast, increase performance LoC-like metrics, etc.
| altcognito wrote:
| It makes no sense. I mean, T2 covered this:
|
| "Watching John with the machine, it was suddenly so clear.
| The terminator would never stop. It would never leave him,
| and it would never hurt him, never shout at him, or get
| drunk and hit him, or say it was too busy to spend time
| with him. It would always be there. And it would die to
| protect him. Of all the would-be fathers who came and went
| over the years, this thing, this machine, was the only one
| who measured up. In an insane world, it was the sanest
| choice."
|
| As long as you've indicated what you want, the machine will
| try to do what you ask of it. It won't get tired because
| "the codebase is too big", or it has gotten bored of the
| pattern, or it wants to introduce a new technology.
|
| It just does the thing you asked of it. (note, that yes, I
| get that as a codebase size increases, it _might_ make it
| more difficult to fit into context, but that only applies
| if it needs to read a large percentage of the project to
| implement the task, which shouldn't be the case.)
| epolanski wrote:
| I'm confused, what does not make sense?
| altcognito wrote:
| This was in agreement that code would improve, not
| devolve, sorry about the confusion
| kajman wrote:
| I still can't tell from the outside whether it sounds like a
| great time to be in security because of the vulnerable slop
| being churned out, or a terrible time because the people
| paying to make it don't care.
| eunos wrote:
| You could say the same about when higher-level languages were
| getting popular. Previously programming was the domain of Math,
| Physics, EE doctorates. These days we even have few-month
| coding bootcamps.
| oulipo2 wrote:
| You won't. Because 80% of the complexity is just "knowing what
| to build". You will get something that gives you a prototype in
| 1 min, then you break it, then you get a slightly better
| prototype on one side, but newly broken in another way, and
| you're going to repeat over and over.
| unglaublich wrote:
| And for any non-trivial application, the space of
| possibilities grows so quickly that you'll never even be able
| to _touch_ all the moving parts of the application and verify
| them.
| sagarp wrote:
| The models might be so fast that they can autocomplete your
| prompt before you even finish it, and generate dozens of
| possible applications before you're even done asking.
| unglaublich wrote:
| And how are you going to determine which is the best? Going
| through all the possible combinations of users and usage? So
| mostly it shifts the work from generation to validation.
| unshavedyak wrote:
| > Discussions about choosing a library with the best syntactic
| sugar method naming is just as crazy as suggesting we type in
| assembly.
|
| I have a more hopeful take. As AIs improve and get faster we
| can more quickly and iteratively improve code which we may have
| historically avoided due to the work involved.
|
| I know i've made several refactors that would have otherwise
| been insane lifts. Not only because the work involved but
| because sometimes you don't know if it will work, and so you
| have a sort of double friction; you don't know if it will even
| succeed. With an AI you can just throw it at the refactor to
| see if it runs into a problem all while you're having a coffee
| break or w/e.
|
| In general AI is going to enable humanity to be more extreme
| versions of itself. For good and bad. I suspect more bad than
| good, though.
| tmaly wrote:
| Our bottleneck is going to be verification.
| dakiol wrote:
| I'm not sure. Engineers could still develop software the old
| way, you know taking months to deliver something like, let's
| say, Obsidian? Or Ghostty? Taking care of every single line of
| code, of dependencies, of good architecture. Truly the old way.
| And if the product is good it will succeed.
| andriy_koval wrote:
| > And if the product is good it will succeed.
|
| it needs to win the marketing landscape, hyper-overcrowded by
| thousands of competitors, slop-gened over a weekend.
| kajman wrote:
| Could you imagine Obsidian being posted on HN today, if it
| weren't really popular already? There's no way a tiny team
| working on a note taking program would make it out of new,
| no matter how good it was. I wouldn't click the link,
| myself.
| ilaksh wrote:
| The exponential is leading to full compute-in-memory within a
| few years which will be 100 times more efficient. Which means
| at least 10 times larger models that are much smarter in
| addition to extremely fast.
|
| It's going to skip the code entirely for small businesses and
| just render UIs straight from context data and prompts at
| interactive speeds. Kind of like Google's Genie does with games
| but much more accurately.
| visarga wrote:
| > We are going to get near instant software from prompt,
| multiple ones and then choose the best one.
|
| If you extract the spec from first implementation and
| reimplement from scratch you get a free testing oracle. Where
| they diverge you send the agent to decide which one had a bug.
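|
| A minimal sketch of that idea (differential testing), assuming two
| hypothetical implementations of the same spec. The function names,
| the deliberately planted bug, and the input generator are all
| illustrative and not taken from any real project:

```python
import random


def reference_sort(xs):
    """First implementation (hypothetical): the one the spec came from."""
    return sorted(xs)


def reimplemented_sort(xs):
    """Second implementation (hypothetical), written from the spec alone.

    It carries a deliberate bug for the demo: duplicates are dropped.
    """
    return sorted(set(xs))


def find_divergences(impl_a, impl_b, make_input, trials=200, seed=0):
    """Run both implementations on the same random inputs.

    Returns a list of (input, output_a, output_b) for every disagreement.
    """
    rng = random.Random(seed)
    divergences = []
    for _ in range(trials):
        case = make_input(rng)
        # Pass copies so neither implementation can mutate the shared input.
        out_a, out_b = impl_a(list(case)), impl_b(list(case))
        if out_a != out_b:
            divergences.append((case, out_a, out_b))
    return divergences


def random_list(rng):
    """Small lists over a small value range, so duplicates are common."""
    return [rng.randint(0, 5) for _ in range(rng.randint(0, 6))]


divergences = find_divergences(reference_sort, reimplemented_sort, random_list)
for case, out_a, out_b in divergences[:3]:
    print(f"input={case} reference={out_a} reimplementation={out_b}")
```

| A divergence only says the two versions disagree, not which one is
| wrong, so that call still goes to a reviewer or an agent. Bugs that
| both versions share stay invisible to this kind of oracle.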
| andai wrote:
| See also this recent talk at Microsoft:
|
| _VibeOS -- Fully Hallucinated Operating System_
|
| https://www.youtube.com/watch?v=z3pV6FHvcgM
| amunozo wrote:
| These price and speed optimizations from Chinese providers,
| combined with the rising prices from American ones, will change
| the game sooner rather than later. Many companies are finding issues
| with the AI bills already.
| throwaway894345 wrote:
| I wonder what are the economics driving these pricing
| decisions? Are the Chinese companies just subsidizing their
| models to a greater degree than the US, or is this an emergent
| property of energy policy between countries?
| Octoth0rpe wrote:
| Throwing out another factor: Chinese companies have been
| banned and/or limited from buying nvidia, and turned to local
| companies for their hardware. I haven't actually seen
| pricing/benchmarks comparing Chinese AI accelerators, but it
| wouldn't surprise me if that also worked out in their favor
| as well.
| lokar wrote:
| And, possibly, state subsidies at every level.
| throwaway67678 wrote:
| Lower cost of labor, lots of under the hood optimizations
| (e.g. cache hits for DS), many of these companies have
| existing infra (fewer upfront costs for deployment), etc
| ecshafer wrote:
| China isn't that cheap for labor. And if you think the guys
| in Z.ai or xiaoxiao aren't the exact same guys from
| Tsinghua, Peking, MIT, Stanford, CMU, etc. and pulling in
| amazing salaries you'd be wrong.
| throwaway67678 wrote:
| I'd assume there's more to the cost of labor than the
| salaries of the elite folks who do the R&D, but fair
| point
| nmfisher wrote:
| Z.ai was actually a spin-off from Tsinghua (THUDM) AFAIK.
| orphea wrote:
| Maybe not being led by a sociopath also helps.
| throwaway894345 wrote:
| I'm pretty sure Xi is also a sociopath, but he differs from
| Trump in that he's competent. And maybe that's a good thing
| for American democracy--if we had a competent dictator who
| could manifest massive infrastructure projects maybe the
| pro-democracy backlash would be significantly attenuated?
| comboy wrote:
| For one, they invested in infrastructure. They can build fast
| and efficiently. They can provide power, they can provide
| cooling. Even if you just make roads better you make
| everything more efficient. Plus level of standard education.
| It all compounds.
|
| On HN China is seen as a cheap labor copycat. This used to be
| a fair approximation at some point in the past. In my opinion
| China is getting ahead of everyone else much more than US
| used to be.
|
| SF is a beautiful thing in the US, vast power and wealth
| comes from there. Smart people collaborating communicating
| and building fast and with excitement. China did SF kind of
| thing for many different sectors in many different places.
| nl wrote:
| Their models are much smaller: 1T vs 5T for the frontier
| models. 1T is Sonnet/Google Flash size, not Opus size.
|
| The $0.87/M tokens price for Mimo Pro is probably subsidized.
|
| Mimo models aren't widely available on western providers, but
| Kimi and Deepseek are similar sizes and cost about the same
| to run. They are priced at $3-$4/M tokens (which is right where
| Google's very confused range of Flash models is priced:
| between $0.40/M tokens and $9/M tokens depending on exactly
| which model - and you don't want the $9 one!).
|
| Anthropic overprices Sonnet (probably because of their
| capacity issues). GPT 5.4 mini is $4.50/M tokens.
|
| https://docs.fireworks.ai/serverless/pricing
|
| https://www.together.ai/pricing
| rstuart4133 wrote:
| The Chinese economics: possibly the USA's experience.
|
| It was pretty clear the USA won World War 2 because it out
| produced and out innovated everyone else. Probably with that
| in mind, after World War 2 the USA adopted the "Vannevar
| Bush" model, summarised in this picture:
| https://www.researchgate.net/figure/annevar-Bushs-Science-
| th... The idea is to jump start R&D through public funding.
| The hoped-for outcome was that R&D would feed private enterprise,
| leading to a productivity boom.
|
| The boom happened, and the USA did seem to out-compete
| everybody else in R&D, science, and the products they
| delivered for decades after that.
|
| That way of doing things seems to have faded over time in the
| USA. The decline seemed to coincide with the rise of
| Neo-economics, and now of course it's been obliterated by Trump.
| He's very keen to fund Intel to produce chips in a year or
| two's time (which is something the stock market and banks do
| perfectly well), but funding basic science is getting drastic
| cuts.
|
| Still other countries noticed the rise of the USA, and some
| adopted similar funding models for basic R&D. China seems to
| have picked it up with gusto, both subsidising R&D and STEM
| training, leading to huge numbers of engineers and
| scientists. Whether it will lead to an economic boom remains
| unknown, but acceleration of ideas and innovations coming out
| of China seems undeniable. More recently, Ukraine showered
| its local engineering garages with funds in the hopes of
| getting a similar outcome to the USA in WW2. It looks like it
| worked. If the Iran war continues, it's entirely possible
| arms trade will reverse: the USA could well start buying
| drones off Ukraine.
| varispeed wrote:
| I see a bigger problem with model inconsistency. You never know
| whether Anthropic will route your request to a cheaper model
| for the price of Opus. So you can never estimate how much a
| task will cost, because you might have to restart several times
| and pay for each attempt. Then you have to prompt models to
| gauge whether they are real or impostors which also adds to
| token usage.
| ignoramous wrote:
| > _You never know whether Anthropic will route your request
| to a cheaper model for the price of Opus_
|
| For non subsidized plans? Pretty sure they'd need to put this
| in ToS, or lawsuits would have followed by now.
| trollbridge wrote:
| How can you prove it?
|
| Sometimes Opus just gives me a rubbish session.
| RussianCow wrote:
| Isn't that true of any provider? Anyone could be lying
| about what they're serving.
| sometimelurker wrote:
| no they 100% use MTP with a cheaper model alongside opus,
| and it would in fact be unprovable if they just sometimes
| switched to auto-accepting everything from the MTP. it's
| true that if they did, anthropic would need to hide that
| they do this, so it's probably not a huge deal
| csomar wrote:
| 1. How would you know?
|
| 2. They are doing lots of shady stuff that would have
| gotten someone else banned from visa/mastercard. Your paid
| off plan literally changes after billing...
|
| I think people are letting them fly for now, because if it
| turns out true that they'll have AGI they want to be on
| their good side? We might see the knives getting pulled
| otherwise.
| MangoCoffee wrote:
| Chinese models are good enough and cheap.
|
| I have a yearly GitHub Copilot subscription. Microsoft recently
| changed their billing to be token-based. I'm still getting
| billed per premium request, but GPT 5.4 is now 6x compared to 1x
| before.
| reactordev wrote:
| It's going to be an issue when China ends up scaling faster
| as well. Faster tokens, faster clusters, qat models, fp4,
| it's getting scary.
| AndrewKemendo wrote:
| Issue for who?
| reactordev wrote:
| American Politics and the far right.
| throwa356262 wrote:
| For uncle Sam Altman.
| fillskills wrote:
| Issue for any country that is not China. A single country
| getting the most AI tokens business would be generally
| bad for global economy. Hoping against hope that this
| business gets globally distributed and there is a healthy
| marketplace competition overall
| reactordev wrote:
| It's all about economic warfare. The cheaper you can run
| the models, the cheaper you can offer them. Undercutting
| expensive tiers with token limits or exorbitant billing
| practices.
|
| You are right to be scared, because this race to the
| bottom also provides open weights/models/qat's for the
| rest of us and it's been crazy to see how good they can
| be on a consumer grade RTX card.
| fortzi wrote:
| For the West
| kypro wrote:
| Another problem is that US models are all closed source, and if
| you're a large corporate you may not want your org to be held
| hostage by OpenAI / Anthropic.
|
| I genuinely don't understand what moat these US model labs
| have. If they're saying recursive self improvement is just
| around the corner and Chinese labs are only slightly behind the
| leading US models, what moat do the US labs have? Are the US
| models going to recursively self improve better than the
| Chinese open source ones or something?
|
| I might be completely wrong about this, but if I had money in
| OpenAI or Anthropic I'd be pulling it all right now. I think
| the chance of them going to near-zero over the next few years
| is very significant.
| lokar wrote:
| Their moat is cash to pay politicians to regulate away
| competition.
| hobofan wrote:
| > you may not want your org to be held hostage by OpenAI /
| Anthropic
|
| Or Google. I'm working with multiple customers right now that
| are very pissed at Google for deprecating Gemini 2.5 Flash,
| canning the GA release of 3.0 Flash and now have to decide
| whether to bite the bullet of the 5x price increase for 3.5
| Flash or switching providers. Quite a few of them will likely
| fully pivot to open models.
| bachmeier wrote:
| I'd be curious if any of your customers have tried 3.1
| Flash Lite. It's cheaper than 2.5 Flash, and in my
| experience with the free tier, quite an upgrade in terms of
| quality of response. My suspicion is that Google is killing
| off the old models because they aren't a good value for the
| customer or for themselves.
| ChrisClark wrote:
| I think they are racing because the first ASI will 'win',
| preventing others, of course we won't be able to bake the
| right goals into it though.
| tancop wrote:
| i don't think it's going to automatically prevent others.
| super claude might understand why diversity is important.
| if we're talking sci fi scenarios the most likely one is
| probably overwatch (multiple independent ais with gray
| ethics and complicated relationships) more than skynet.
| GoToRO wrote:
| maybe the moat is that we slowly start to forget how to code
| by hand and then you -need- the AI tool.
| ilaksh wrote:
| I'm kind of poor so I have been trying to use DeepSeek v4
| Flash, GLM 5.1 etc. as much as possible recently instead of
| Claude or GPT.
| petesergeant wrote:
| You would do us all a service by telling us how your
| experiences of that have been.
| polski-g wrote:
| I used Opus 4.6, then downgraded to Sonnet, then to
| GLM5/5.1. GLM is as good as Sonnet. I recently started
| using Opus 4.8 again and GLM is not close to that.
|
| 30 day eval for each.
| ilaksh wrote:
| I would say about 35% of the time I run into problems and
| eventually give up and go to GPT 5.5 and it much more
| efficiently handles the original task. Then I see the token
| costs going up and it motivates me to continue trying the
| open source ones.
| andai wrote:
| Did you try deepseek v4 pro as well? And what kind of
| tasks?
|
| I'm seeing some people say flash is amazing and can
| handle everything, and some say it's useless. It seems to
| depend on the task. I think it depends on the harness too
| (it works better in Claude Code in my experience, it's
| probably been trained on that).
| RussianCow wrote:
| I've been doing the same, though admittedly out of
| curiosity more so than lack of funds. The open models are
| catching up quickly in their abilities, to the point where
| they're (mostly) not doing stupid stuff regularly, but you
| have to be _very_ specific about what you want. I found
| that Opus, for example, is much better at asking me to
| clear up ambiguity in a request before starting, whereas
| the Chinese models tend to "fill in the blanks" and make
| their own assumptions.
|
| My current workflow involves going from PRD -> execution
| plan -> build -> review, and this works nicely with open
| weight models like GLM 5.1, Kimi K2.6, and DeepSeek V4
| Flash. With Opus I can generally skip the PRD entirely, and
| sometimes even skip the plan, and 80-90% of the time it
| does exactly what I want. But that can easily burn $5-15
| for one feature, whereas it'll cost maybe $1-2 with the
| open weight models (at API pricing).
| andai wrote:
| > ... you have to be very specific about what you want. I
| found that Opus, for example, is much better at asking me
| to clear up ambiguity in a request before starting,
| whereas the Chinese models tend to "fill in the blanks"
| and make their own assumptions.
|
| That's the main thing I've noticed. Small models can
| follow instructions just fine. If the instructions are
| very specific. Then I often have to spend more time
| explaining a task than it would have taken me to do it
| myself.
|
| The bigger models have a lot more common sense.
|
| I wonder if that could be improved slightly through
| prompting. Asking it to clarify anything that's
| confusing. Or maybe it just makes incorrect assumptions
| without realizing the ambiguity. One way to find out!
| scosman wrote:
| Cerebras is trialing Kimi K2.6 at 3000t/s (invite only). I'm
| excited for when the fast hardware gets more mainstream for
| frontier models. Models designed for speed on Nvidia are a nice
| addition that could bridge the gap.
| lostmsu wrote:
| Cerebras currently does not provide any discounts for prefix
| caching, making its use for agentic workloads sqr(n_turns) more
| expensive.
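|
| A rough worked example of why missing prefix-cache discounts hurt
| agentic loops, under loudly stated assumptions: every turn appends
| the same number of tokens, the whole conversation is resent on each
| turn, and the 10% cached-token price is a made-up figure, not any
| provider's actual rate. Under this simple model the uncached bill
| grows with the square of the turn count:

```python
def billed_input_tokens(n_turns, tokens_per_turn, cached_price_fraction=1.0):
    """Input tokens billed over an agent loop, in full-price equivalents.

    Assumption: turn k resends the (k - 1) earlier turns as a prefix and
    adds tokens_per_turn new tokens. The prefix is billed at
    cached_price_fraction of the normal price; 1.0 means no discount.
    """
    total = 0.0
    for turn in range(1, n_turns + 1):
        prefix = (turn - 1) * tokens_per_turn
        total += tokens_per_turn + cached_price_fraction * prefix
    return total


# Hypothetical workload: 50 turns, 2000 new tokens per turn.
no_discount = billed_input_tokens(50, 2000)
with_discount = billed_input_tokens(50, 2000, cached_price_fraction=0.1)

print(f"no prefix-cache discount: {no_discount:,.0f} token-equivalents")
print(f"10% cached price:         {with_discount:,.0f} token-equivalents")
print(f"ratio:                    {no_discount / with_discount:.1f}x")
```

| Without a discount the total is tokens_per_turn * n * (n + 1) / 2,
| so doubling the number of turns roughly quadruples the input bill;
| with cheap cached prefixes the linear term dominates for longer.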
| michael-ax wrote:
| now that's what i call a software development
| breakthrough/platform! thanks for the heads up!
| adrian_b wrote:
| TFA mentions that until now special very expensive hardware
| like Cerebras was required to reach these kinds of speeds,
| and it emphasizes that what is novel in their results is that
| they have obtained over 1000 tokens/s for a model with over 1 T
| parameters by using just standard hardware, i.e. one server
| with 8 GPUs.
| btian wrote:
| Source? Their website says 1000t/s
| https://www.cerebras.ai/blog/which-is-faster-gemini-3-5-flas...
| johndough wrote:
| Cerebras got lucky that they IPOed last month instead of now.
| GaggiX wrote:
| If MiMo v2.5 Pro can run at >1000tk/s on GPUs then I will soon
| expect the same from OpenAI/Anthropic/Google.
| holoduke wrote:
| Speed is indeed the next big thing that should happen with
| frontier LLMs. The possibilities with current models, but 1000
| times faster, would be super useful. Earlier this week it took
| Claude at least a week, full time, with two Max subscriptions to
| solve a complex issue where we wanted to mimic an occlusion
| mapping variant used in the game Crimson Desert. Pretty complex
| mathematical challenge. With an ultra-fast LLM and a proper
| self-verification process it would be awesome.
| astlouis44 wrote:
| Interesting. For your occlusion mapping variant, what engine is
| the game you're implementing this for made with? Do you have
| Claude hooked up to Unity or Unreal?
| MaxikCZ wrote:
| I'd also be interested in more details, like the sibling comment.
| I find that when I try to build stuff, it's like building a
| skyscraper from straw. What methods are moving you forward the
| most?
| __natty__ wrote:
| With this at 1k tps and Kimi 2.6 1k tps by Cerebras, I believe we
| are entering the next stage of LLMs, where companies will also
| compete on throughput
| qsera wrote:
| Tokens per second is the "Megapixels" of AI marketing!
| Octoth0rpe wrote:
| I mean, sure, in the sense that they're a real and meaningful
| number for most of the spectrum on offer, and only gets silly
| when the number gets too high? There's a pretty big usability
| difference between 10t/s and 100t/s, and I can imagine
| similarly for 100->1000. I don't know about > 1000, but let's
| not pretend that the number is meaningless.
| qsera wrote:
| It is pretty meaningless for something that calls itself
| intelligent.
| harel wrote:
| A few things in life I can't fully grasp why they are so sought
| after. One is that constant need to exhibit growth. As if being
| massive and staying as massive is not good enough, one has to
| always and continuously grow. The other is constant speed
| increases. We're already operating at 50x speed. My output is
| much wider and so much faster, I am sometimes my own bottleneck.
| And now as if that is not enough we want more speed. "I want a
| full software product from scratch in 12 seconds, because 5
| minutes is too long and I got things to do..."
|
| Really?
| philipkglass wrote:
| I remember when I had to wait minutes to get a high resolution
| image over a dialup connection. When computer and
| communications hardware advanced enough that I could get 30
| high resolution images every second, there were brand new uses.
| In the case of LLMs, I could imagine that much faster
| operations allow you to introduce them as parts of systems that
| need to react to the real world at high speed, like factory
| equipment. Showing that a model can do the usual LLM tasks at
| extremely high speed is just a demo proving that the approach
| works.
| harel wrote:
| The example in the video was a generation of a dashboard app
| of some sort. I can do that with a "normal speed" Claude in a
| few minutes. The difference is a few minutes. This is
| compared to a few weeks in old school development time. I
| don't have a problem with taking it a little "slow" (as in -
| few minutes) and lending my thought to it rather than just
| going for fast generation and who knows what's inside. I get
| your use case, but this is a specialised one, and not the one
| 90% of people will think of - everyone wants that fast app in
| 12 seconds... Or so it seems from me being downvoted on that
| comment.
| anothereng wrote:
| yeah at a very high speed the agent can code the solution
| when you ask it for something on the go. Imagine it being able
| to make a feature as fast as a website loads sometime in the
| future, that would feel like magic
| sidrag22 wrote:
| different use cases for different people. some people are
| nurturing a code base and ensuring it doesnt become a gross
| mess so they become the bottleneck. some people are just trying
| to prompt stuff into existence and dont know what sql is.
|
| I think this site often overlooks that second group and how
| large it likely is.
| eli wrote:
| Neat. The frontier models have gotten pretty impressive, but
| they're all a bit too slow for interactive, human-in-the-loop
| coding. It incentivizes vibecoding and running multiple agents in
| parallel. A fast agent feels more like a partner.
|
| For a while I was running Cerebras GLM 4.7 for a bunch of tasks.
| Not a very smart model, but it's fantastic to have a live
| prototype of a site up and be able to type "make the fonts
| bigger. No not that big" and see it change in real time. And MiMo
| 2.5 is a _lot_ more capable than GLM 4.7.
| ignoramous wrote:
| > _And MiMo 2.5 is a lot more capable than GLM 4.7_
|
| MiMo 2.5 is not the same model as MiMo 2.5 Pro.
|
| GLM 5.1 is z.ai's latest iteration & is one of the popular
| open weight coding models.
|
| If you've had the chance, how does GLM 5.1 (which is now more
| expensive than MiMo 2.5 Pro after its recent 70% price drop)
| compare?
| eli wrote:
| GLM 5.1 is very good. Definitely a contender for best open
| weight coding model. Nothing like 4.7.
|
| But quite a bit more expensive than MiMo 2.5 Pro. Like 5x to
| 10x more on my little tests, at least by the API rates.
| maxdo wrote:
| i tried glm 4.7 for agents that write code. simple scripts
| 200-1000 LOC. extremely bad. Had to abandon the cerebras offering,
| their smart models are only on the enterprise plan.
| jona-f wrote:
| glm 4.7 is quite old by now. I don't even use 5.1 anymore,
| cause I found kimi k2.6, mimo 2.5 pro, deepseek v4 pro and
| qwen 3.7 all better than glm 5.1
| Oras wrote:
| 1k TPS is great, but I'm more fascinated by the amount of AI
| generated comments in this thread!
| eli wrote:
| Like what?
| adam_arthur wrote:
| There are many with subtle tells.
|
| Not nearly as obvious as the ones from 6 months ago, but
| seems to be more the use of hyperbolic phrasing in a
| particularly unnatural way.
|
| The assess/explain, then hyperbole at the end kind of
| structure.
|
| Top comment looks suspicious from this perspective, but it's
| kind of a losing battle to be able to differentiate them with
| sufficient accuracy anyway
| trollbridge wrote:
| Comments at 1,000 TPS is a terrifying future.
| 0xbadcafebee wrote:
| I prefer a thousand smart AI comments to a thousand dumb
| human comments
| wartywhoa23 wrote:
| Well, you can just vibecode a complete AI echochamber
| version of HN!
| goyozi wrote:
| Fast AI seems genuinely exciting and somewhat unsettling to me.
| Right now Claude is faster than me on some tasks but we're at
| least close. I have a prompt to clean up a PR that's been running
| for 1h now and I expect it to take another few. It's hard to
| imagine what the workflow would look like if it was near-instant.
| On the one hand, it might be easier to focus. Some prompts take
| so long that I start to multitask and regret it later. On the
| other, AI that takes a few seconds to a few minutes at most to solve
| what used to take hours or days? That's a game changer and I
| don't even know where we fit in.
| ipkstef wrote:
| asking for curiosity's sake. What kind of PR loop are you
| running that takes a few hours?
| ketzo wrote:
| not OP but usually for me this means long verification loop;
| waiting 10min on CI checks, that kind of thing, rather than
| actual 1hr wall clock of token generation
| devmor wrote:
| Or slow MCP servers that are waiting on HTTP calls from
| APIs, playwright/other UI instrumentation, etc.
| RussianCow wrote:
| But those things won't be sped up by a faster LLM, so I
| feel like that's not what the OP is talking about.
| goyozi wrote:
| Well, I used an extreme example. OTOH, I've done quite a
| few of those "fix CI" or "migrate X" prompts recently
| and while there is a fixed component like running CI /
| builds, I'd say the LLM time is still around or above
| 50%, especially at the beginning of the project. Then
| there's also regular tasks that now take minutes per
| message which completely get me out of the zone. I
| imagine iterating on those in near real time would be a
| big change.
| goyozi wrote:
| I'm rewriting our integration test suite to run tests in
| parallel. I have the changes split across 7 branches, and
| each needs to be fixed to have no flaky tests. I told it I
| want 3 consecutive CI runs with no flakes and no artificial
| fixes / assert removals etc. We'll see what comes out; it's
| almost a side project so there's not much to lose other than
| some of my weekly limit that resets soon.
| yunohn wrote:
| > a side project so there's not much to lose other than
| some of my weekly limit that resets soon
|
| Basically the entire token-maxxing AI hype train in a
| nutshell. Lovely!
| drob518 wrote:
| I'm curious when folks will tire of lighting money on
| fire. Companies are already starting to scale back a bit,
| but the AI companies are still nowhere near
| profitability.
| goyozi wrote:
| wdym? Nobody's paying me or rewarding me for using these
| tokens. I had some spare in my subscription limit (we're
| not on token pricing), so I decided to try an ambitious
| task that may reduce our CI times and improve our DX
| significantly. That's hardly "the entire token-maxxing AI
| hype train in a nutshell".
| pianopatrick wrote:
| We fit in for the things that are not artificial.
|
| So long as AI lives in server farms, humans will be needed for
| tasks in the physical world.
|
| It's only if we combine AI with robots that things get really
| dicey.
| fartfeatures wrote:
| This is very dystopian in my opinion. I'm not the arms, legs,
| sensors and actuators for a machine super intelligence. I
| wouldn't treat another human as my slave because they aren't
| as intelligent as I am any more than I would expect to become
| a slave for a machine. This is our world (for now) and that
| is why we fit in. Not because we can serve.
| davedx wrote:
| Agree
|
| https://en.wikipedia.org/wiki/I_Have_No_Mouth,_and_I_Must_S
| c...
| fartfeatures wrote:
| Sounds like snuff porn, not my sort of thing but thanks
| though.
| ionwake wrote:
| "It seeks revenge on humanity for its own creation."
|
| This is brilliant as it reminded me of a famous
| Hitchhiker's quote:
|
| "In the beginning the Universe was created. This has made
| a lot of people very angry and been widely regarded as a
| bad move. -- From The Restaurant at the End of the
| Universe (Book 2)"
|
| Maybe we are stuck in an eternal loop
| cicko wrote:
| "This is our world" sounds a bit exclusive towards other
| living and sentient beings on this planet.
| nativeit wrote:
| It depends on what's included in "our".
| throwaway67678 wrote:
| Never read Asimov's Multivac novels? Admittedly not all of
| them are stellar examples of a future to follow
| Muromec wrote:
| You don't need ai superintelligence, just plain capitalism
| is enough
| flexagoon wrote:
| I'm using Deepseek-v4-pro as my main model and this is
| sometimes pretty annoying, I have to do some easy boring task,
| think "I'll just leave the agent to do it and go take a nap",
| but it's already done writing the code before I even walk away
| from the computer
| RussianCow wrote:
| Do you mean Flash and not Pro? I haven't tried it personally,
| but according to OpenRouter, the fastest DeepSeek V4 Pro
| providers are only ~50tps. That's slower than Claude Opus.
|
| https://openrouter.ai/deepseek/deepseek-v4-pro?sort=throughp.
| ..
| specproc wrote:
| Yeah, flash is crazy fast, but I've found performance
| variable.
| binary0010 wrote:
| Flash is amazing if you know the domain really well.
|
| E.g. occasionally it makes the dumbest mistakes you've
| ever seen and can't correct them. However it's fairly
| rare, and if you know the domain really well,
| occasionally popping into the code and pushing it towards
| the correct solution takes like 20 seconds or whatever.
|
| So the speed you can move with flash + high domain
| knowledge beats opus by a mile in my experience.
|
| I tried to switch back to 4.8 for a bit when it came out,
| feels so bad waiting 20mins for a mediocre solution when
| I could have had everything complete - with multiple
| iteration cycles - in flash in like 3-5mins.
| addozhang wrote:
| Yes, you don't need much domain knowledge to use Opus,
| but it's just way too expensive.
| sarjann wrote:
| I don't think token speed matters as much when a lot of
| tokens are needed to achieve a task. E.g. artificial
| analysis benchmarks where deepseek v4 is one of the biggest
| token burners to go through the benchmark.
| brianwawok wrote:
| Both matter.
| SwellJoe wrote:
| In recent benchmarking I've been doing, DeepSeek V4 Pro was
| the fastest of 21 models, by a comfortable margin
| (https://swelljoe.com/html/bench-report-final.html). Faster
| than Claude Opus 4.8, which was the second fastest (Mistral
| doesn't count because it seems to have refused to
| participate). But, it's a limited data set, just a few
| benchmark runs of a limited set of tasks. It's entirely
| possible I happened to be calling the API at its least busy
| time and maybe Claude got hit during a busy time.
| flexagoon wrote:
| No, I mean Pro. I use it through OpenCode Go so I don't
| know what provider it uses under the hood, but it's very
| fast in my experience.
| tmaly wrote:
| This reminds me of the Peter / Boris comments on writing
| loops to keep the agents busy.
| throwaway67678 wrote:
| Agent mania setting in
|
| It's also pretty funny sometimes how it gives weird future
| roadmap estimates ("part 2 - 3 weeks, part 3 - 2 months",
| etc.) and when you tell it to actually do those changes it's
| pretty much done in half an hour
| smith7018 wrote:
| I've long believed those numbers were faked by
| Anthropic/OpenAI to serve as a form of advertisement. The
| estimates are impossible to verify and their ability to do
| "2 days of work" in 10 minutes will presumably make the
| user go "Wow, I just saved SO much time!" Plus, the
| unnecessary text eats up the users' tokens so it helps the
| companies on the backend, as well.
| leodavi wrote:
| I agree with you that labs are benefiting from those
| outputs but I'm skeptical that labs are purposefully
| training the models to produce those outputs.
|
| Raw pre-training data includes plenty of conversations
| between professional builders and some of those include
| estimates.
|
| I believe the outputs are a training coincidence with
| consequences that are opportunistic for the labs.
| AgentMasterRace wrote:
| All the models have broken estimates. They're trained
| heavily on Jira and GitHub tasks and issues, that's why
| their estimates are human.
| esperent wrote:
| Even for humans the estimates are way off, unless it's
| based on data that has some serious padding.
|
| That said, it'll often say "2 days of work" and then
| complete the coding in 30 minutes, and while that's
| amusing, afterwards, I'll need to manually test, or send
| to other people for review, or realize the agent only
| actually did half the work and I need to do a second pass
| (or a third etc.) and then often getting the feature in
| does genuinely take two days.
| dizhn wrote:
| All models do it. It's their training. They didn't have
| "a person does this in a week but an LLM could in a
| minute" in their training yet. They also don't have the
| concept of elapsed time unless you ask them how long
| something has taken.
| Terretta wrote:
| > _the estimates_
|
| It doesn't estimate.
|
| It generates tokens that read like estimates associated
| with the context in its training material.
|
| What would you expect the generator to output instead?
| ghshephard wrote:
| I think people are continuing to view these systems as
| pure LLMs - when that ship sailed 6+ months ago. Between
| being able to review memory, using agent harnesses and
| sub agents and skills to go out and discover information
| - modern systems (Codex, Claude Code, Cursor) - _use_
| LLMs - but the LLM is only a small component of it.
| Compare what you get from sending a request to a chatbot
| like ChatGPT - to what you can get from a modern harness. The
| output is influenced by the LLM, but it's no longer a
| "model making a token prediction based on training
| material and RLHF" - that's a very 2025 way of looking at
| these systems.
|
| Even Gary Marcus is starting to come around and realize
| that his priors are no longer as relevant as they once
| were.
| Terretta wrote:
| You think someone is special-casing things like estimates,
| or even should? What else deserves that level of
| intervention so they look less dumb?
|
| Logistics for getting to the car wash next door?
|
| In the meantime, alas, no, we can see from actual
| prompts sent directly or through sub-agents, and actual
| replies, estimates remain LLM generated.
|
| Though, this discussion here could change that, because
| indeed there is a lot of special casing and context
| stuffing going on, one of the oldest being today's date
| for example.
|
| * * *
|
| I _did_ read the Claude Code leak, and use pi, etc. So I
| disagree with your premise rather strongly. Today's
| "systems" remain, roughly, piles of markdown and context
| engineering wrapped in UI affordances, and behave very
| similarly today to how they did in 2024 for those already
| engineering context and delegating.
| ghshephard wrote:
| I do a lot of code bisecting with Claude Code - and it
| spends hours running experiments - looking at experiment
| results, making guesses as to what to try next for an
| experiment - until it eventually comes around to a
| working code pattern. I mean - maybe this is as much a
| reflection on me as anything else - but its pattern of
| logic isn't that much different from what I would do. It
| knows, in general, what tools and APIs it can call - it
| tries something - observes the result, and then comes
| back and tries different experiments based on
| success/failure - mostly efficiently bisecting to a
| solution.
|
| I'm still lower down on the capability scale - as I'm
| still manually directing agents to do these wiggins loops
| - obviously the next step up is to direct the code-loops
| which control the agents. I just haven't got my tooling
| nailed in place to the point where I find that's more
| productive.
|
| I actually might agree with you that this is mostly just
| "next token prediction" - if I can concede that's really
| all I do as well.
| Terretta wrote:
| > _I actually might agree with you that this is mostly
| just "next token prediction" - if I can concede that's
| really all I do as well._
|
| Yep. Pretty sure I've got an LLM inside too.
|
| The other replies complaining that my thinking is so 2023
| -- on the contrary, what's evolved is my own apprehension
| of how LLM-like most "responses" from humans prove as
| well.
|
| To be sure, there are other mechanisms at play as well,
| significant differentiation in our... Volume of training
| material? Quantizations/compression? Model architecture?
| Just-ahead-of-time forward branching with back
| propagation? Double loop adaptive learning? You know,
| harnessing the LLM. :-) Dare we call it executive
| function?
|
| LLM mode becomes particularly apparent when conversing
| with Alzheimer's patients in the stage where short term
| memories do not form but they retain access to long term
| memory up to, say, 5 years ago or so. Fifty years of who
| they are, and one can trigger nearly identical responses
| with nearly identical prompts.
|
| But that same person may be able to debate 1950s politics
| while being unable to complete making a sandwich.
|
| If they didn't know of new shortcuts for a task, they would
| almost certainly not "estimate" but "intuit", or
| "instinctively" respond (apply heuristics), largely based
| on their "priors" aka training material.
|
| If you sit with them and chat a while, you'll even get
| the kind of looping you get from Qwen trying to think
| when context is too full.
|
| And if we believe this at all, then ... we should stop
| scrolling tik tok. Time to read a book. Have an
| experience. Fine tune. :-)
| 8note wrote:
| rather than special casing, make real data based on chat
| logs for how long things took both in calendar and chat
| time
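| A minimal sketch of how both durations could be derived from
| timestamped chat logs. The log format, the `task_durations`
| helper, and the 10-minute idle cutoff are all assumptions made
| for illustration, not anything a lab is known to use.

```python
from datetime import datetime, timedelta

# Hypothetical log format: (ISO timestamp, role) pairs for one task.
def task_durations(events, idle_gap=timedelta(minutes=10)):
    """Return (calendar_time, chat_time) for one task's log events."""
    times = sorted(datetime.fromisoformat(ts) for ts, _role in events)
    # Calendar time: first message to last message, idle periods included.
    calendar_time = times[-1] - times[0]
    # Chat time: sum only the gaps short enough to count as active work.
    chat_time = sum(
        (b - a for a, b in zip(times, times[1:]) if b - a <= idle_gap),
        timedelta(),
    )
    return calendar_time, chat_time

# Made-up example log: a short session, then a return two days later.
log = [
    ("2026-06-01T09:00:00", "user"),
    ("2026-06-01T09:04:00", "agent"),
    ("2026-06-01T09:10:00", "user"),
    ("2026-06-03T14:00:00", "user"),
    ("2026-06-03T14:05:00", "agent"),
]

calendar_time, chat_time = task_durations(log)
print("calendar:", calendar_time, "chat:", chat_time)
```

| Here the task spans a bit over two calendar days but only
| 15 minutes of active chat, which is the gap the comment
| above is pointing at.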
| irthomasthomas wrote:
| No one is bitter lesson pilled anymore. Everyone is
| pivoting to neurosymbolic systems. It looks like Gary
| Marcus was right.
| nl wrote:
| > No one is bitter lesson pilled anymore.
|
| Will the 10T parameter Mythos model be released this
| month or next month?
|
| They'd better do it soon, because it is generally accepted that
| one of the reasons GPT 5.5 is better at hard tasks than
| Opus is its parameter size - and that Opus 4.8
| remains competitive only by scaling test-time compute
| (see how many more tokens it uses than GPT 5.5)
|
| https://www.reddit.com/r/LLM/comments/1sz8bjz/parameter_e
| sti...
| wild_egg wrote:
| How is neurosymbolic not aligned with the bitter lesson?
| The bitter lesson is completely agnostic to architecture.
| carterschonwald wrote:
| you might like the stuff in my work on oh my pi, it's a
| test bed for my ideas around making these tools more
| reliable. hoping to maybe have a native ui iter of the
| real thing that this is a test bed for this summer.
|
| https://github.com/cartazio/oh-punkin-
| pi/blob/main/scripts/b...
| legulere wrote:
| It generates tokens by estimating what the next token is
| going to be.
|
| Sure it cannot think like a human, but given its input,
| it should give a good statistical answer (approximating
| not how long it actually takes, but how long a human
| would say it takes).
| incr_me wrote:
| Obviously there isn't a hidden corpus of logs of coding
| chatbot assistants that has been accumulating over the
| years, but these coding chatbot assistants output tokens
| that resemble how we all imagined a coding chatbot
| assistant would have operated had it existed in the first
| place to end up in a corpus. "Training material" includes
| supervised fine-tuning, preference training, RLHF, and so
| on, so that certain outputs (like these timeline
| estimates) may really have been decided (at some level of
| conscious awareness) by product teams.
| nl wrote:
| _Actually_ in this case they possibly _are_ estimates.
|
| It's been known for some years[1] that LLMs do regression
| in-context. Frontier models have been trained against
| many, many issue texts that include task breakdowns and
| estimates.
|
| [1] https://arxiv.org/html/2409.04318v1
| kube-system wrote:
| Interesting. So it may have learned how to estimate as a
| human but doesn't understand that it doesn't operate at
| that speed :D
|
| I wonder if there's a reasonable way to give an llm
| parameters that give it a concept of its own execution
| speed. Seems that could be useful for multiple purposes
| InterviewFrog wrote:
| This is so 2023. The thought process.
|
| At that time the predominant view was that LLMs were
| nothing but stochastic parrots, that they would plateau,
| and that hallucinations couldn't be fixed.
|
| At this point I doubt there are any AI sceptics left.
| That ship has long sailed. The only thing that matters is
| whether the estimates are accurate, and AI can improve on
| that too.
|
| Even humans only estimate based on neurons firing in
| prior patterns.
| mediaman wrote:
| The funny thing about this comment is that neural
| networks are universal function approximators.
|
| The most fundamental essence of what they do is exactly
| what you say they don't: estimate.
| airstrike wrote:
| Funny and ironic in a way, but the point still stands
| that they do not actually estimate the time it will take.
| greenavocado wrote:
| > they do not actually estimate the time it will take
|
| You can't prove that )))
| airstrike wrote:
| Right, but extraordinary claims require...
| taneq wrote:
| Therein lies the rub, no? To accurately predict the next
| token produced by a process, it's necessary to model that
| process. If the process is a human attempting to estimate
| the duration of a task, then in some sense the LLM is
| modeling the estimation process. We're well past the
| point where it's credible to claim that LLMs just
| regurgitate their training data.
| KronisLV wrote:
| I mean in general I'd rather take slightly inflated
| estimates than the odd sprint poker stuff where other
| devs and PMs negotiate hours down and before you know it
| you're also stuck fixing nitpicky reviewer comments on
| code that is already good enough and have to send a
| release at like 7 PM, ofc also without enough tests or
| even enough manual checks and testing, cause people
| repeatedly act against their self-interest and try to
| compress timelines, thinking that that's somehow good for
| them.
|
| At least with AI that actually does things more quickly,
| there is a bit more breathing room (introducing AI is
| easier than changing a given environment).
|
| Aside from that, I wonder how much variety there is in
| practice: between "Oh yeah, I added that new button while
| we were in the meeting" and "The new button feature will
| be ready in Q3 according to the roadmap, once we have
| sign-off from all the stakeholders."
| Narciss wrote:
| Nah it's all from the pretraining data
| overgard wrote:
| I tend to be cynical about AI companies, but I'm guessing
| the bad estimates more just come from a complete lack of
| actual data it could use for that so it's more or less a
| hallucination.
| BobbyTables2 wrote:
| That's right up there with Scotty in the classic Star
| Trek always multiplying time estimates by 4 so he looks
| like a "miracle worker"
| throw1234567891 wrote:
| It repeats what it has seen in the training data. Expecting
| it to reason about the complexity of a task is a pipe
| dream. The best is to tell it not to come back with
| estimates, and when it does, remove them anyway.
| andai wrote:
| I added "you can do anything, believe in yourself" to
| system prompt, and task completion increased
| significantly.
| andai wrote:
| I heard an anecdote. Guy spent several days trying to
| convince his AI agent to build a feature. Kept saying it
| was crazy complicated, would take weeks.
|
| Finally he convinced it to try. It one shotted it in 30
| seconds.
|
| Turns out the agents' idea of what is hard and easy also
| comes from Common Crawl.
| wild_egg wrote:
| Why on earth would you spend any time at all convincing
| an agent of anything? You say "just do it" and off it
| goes.
| dr_dshiv wrote:
| Ya, but "doit" is 2x more efficient
| brianwawok wrote:
| Uh Claude tries real hard to dodge work. Talks about how
| it's really hard, needs 10 PRs. Finally convince it to do it as 1.
| It stops 10% through and says ok done with PR 1, we can
| work on the last 9 tomorrow. Ugh.
| behnamoh wrote:
| Same. How can DeepSeek serve V4-Pro at such high speeds
| despite the sanctions?
| rubyn00bie wrote:
| The sanctions only "prevent" them from directly buying
| NVidia's latest and greatest in the sense that NVidia can't
| sell directly to them. Essentially, there are companies now
| who are in a country without the sanctions, they buy from
| NVidia (or a partner), and then ship them off to China. For
| the orgs in China doing this, there's zero legal risk
| besides having foreign customs service intercept the
| shipment and losing the goods. For NVidia there is zero
| incentive to care, as long as they look like they do,
| because sales are sales. You can bet Jensen ain't losing
| sleep over it.
|
| GamersNexus had a really good investigative piece (~3hrs
| long) on this where they went to China and met with grey
| market sellers. That piece absolutely pissed off NVidia and
| resulted in a fight with Bloomberg too.
|
| Deepseek may also be running inference on oodles of
| Chinese hardware but it wouldn't surprise me for a second
| if they just acquired Blackwell chips through the grey
| market. The original Deepseek models were all trained using
| NVidia chips if I remember right.
| seewhydee wrote:
| That wouldn't explain why Deepseek is fast relative to
| other Chinese providers, especially considering that
| they're reportedly ahead of the curve among Chinese
| companies in moving off Nvidia. I think their quant fund
| background has more to do with it. Their models are
| clearly designed with performant inference in mind.
| ljosifov wrote:
| Yes, it's performant, and esp performant at non-trivial
| context depths. DeepSeek-V4 DS4 (and Flash - DS4F) drop
| tok/s speed much less than the rest. On my M2 Max it took
| context depths of 768K to drop tok/s to ~10 tok/s.
|
| https://x.com/ljupc0/status/2062457314414587996
|
| Other local models I've checked drop to unusable speeds
| way sooner. Only other model with similarly favourable
| curve I've tried is nemotron-cascade-2-30b-a3b. But it's
| a small model, way dumber than DS4F.
|
| Coding agent use cases have large context depths. The
| rate of decline is as important as the headline number.
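| A rough sketch of why the decline rate matters. Both curves
| below are made-up numbers, and `retained_speed` and
| `seconds_for_reply` are hypothetical helpers, not measurements
| of any real model.

```python
def retained_speed(curve):
    """curve: list of (context_tokens, tok_per_s), shallowest first.

    Returns each depth paired with its speed as a fraction of the
    shallowest measurement.
    """
    base = curve[0][1]
    return [(ctx, tps / base) for ctx, tps in curve]

def seconds_for_reply(tok_per_s, reply_tokens=500):
    """Wall-clock seconds to decode a reply of reply_tokens tokens."""
    return reply_tokens / tok_per_s

# Hypothetical curves: model B has the better headline number,
# model A has the flatter decline.
model_a = [(1_000, 40.0), (100_000, 30.0)]
model_b = [(1_000, 60.0), (100_000, 6.0)]

for name, curve in (("A", model_a), ("B", model_b)):
    ctx, frac = retained_speed(curve)[-1]
    wait = seconds_for_reply(curve[-1][1])
    print(f"model {name}: keeps {frac:.0%} of its speed at {ctx} tokens, "
          f"{wait:.1f}s per 500-token reply")
```

| With these invented numbers the model that looks faster on
| paper is several times slower once the context fills up.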
| binary0010 wrote:
| I exclusively use deepseek v4 flash now, completely stopped
| using slow models like Claude.
|
| Basically I never have to wait - yes I have to tell it little
| corrections occasionally (but I know the domain really well
| so that's not an issue), but it's so much faster than
| anything else it's kinda crazy. I love the super fast speeds
| with high involvement development cycle.
|
| I actually enjoy using agentic development flows for the
| first time now - whereas with Claude I absolutely hated it.
| That 5 to 20 min wait after every prompt absolutely killed my
| desire to even want to work at all.
| SwellJoe wrote:
| DeepSeek is the fastest model in the benchmarks I've been
| doing (https://swelljoe.com/post/will-it-mythos/). Followed
| not so closely by Opus 4.8 and even less closely by Gemini
| 3.5 Flash and GPT 5.5. I've been really impressed with it, so
| far. It's also among the best at doing the work, though still
| trailing the frontier models from Anthropic and OpenAI.
| throw-the-towel wrote:
| FWIW, for me just today it got itself into silly rabbit holes
| twice, and both times I had to fix things myself. Scarily,
| this is something I catch myself doing as well.
| andai wrote:
| With Flash it's basically instant for smaller tasks, yeah.
| recroad wrote:
| Woah - what's the prompt and what's the PR?
| goyozi wrote:
| I replied in more detail under another comment. TLDR: fixing
| flaky CI across multiple branches
| HarHarVeryFunny wrote:
| I don't see many companies being willing to pay 3x more for
| faster code generation. Cloud-based AI code generation is
| already extremely fast, and hardly the bottleneck for most
| software product development.
|
| There can't be many normal use cases where there'd be any cost
| benefit.
| fragmede wrote:
| The "traditional" way we vibe code is human software
| developer prompts AI -> AI generates code -> (human checks
| code) -> code gets compiled/deployed/etc. -> users use
| "binary". At the speed of 1000 tok/sec, user prompts
| obliquely -> AI vets generated code -> code deployed -> user
| gets response from deployed code.
|
| It's a cute toy right now, but you can tell an LLM that it's
| an http server, and have it respond directly to a web browser
| hitting it. It generates headers in response, as well as page
| contents. As 1000 tok/sec becomes the new normal, we will
| come up with newer ways to use it outside of toy fiction
| encyclopedias.
| HarHarVeryFunny wrote:
| 1000 tokens per sec is still massively slower than serving
| a normal web page - if something doesn't respond in a few
| seconds many people give up.
|
| I'm not saying there aren't any use cases for super-fast
| (and super-expensive) generation, but it does seem a bit
| niche. If it was free then sure faster is better, but what
| are the mainstream use cases where people might pay 3x more
| for a faster version of something that is already fast?
|
| I think it would have to be an application where it paid
| for itself - where the 10x faster response was actually
| worth more than 3x the cost to you - where the extra speed
| was worth the extra cost.
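| The break-even test described above can be written down as a
| small sketch. Every number here (token counts, prices, the
| value placed on a second of waiting) is a hypothetical
| input, and `fast_tier_pays_off` is an illustrative helper.

```python
def generation_seconds(tokens, tok_per_s):
    """Seconds needed to generate `tokens` at a given throughput."""
    return tokens / tok_per_s

def fast_tier_pays_off(tokens, base_tps, fast_tps,
                       base_cost, fast_cost, value_per_second):
    """True when the time saved is worth more than the extra spend.

    value_per_second is what one second of waiting costs you, in the
    same currency as base_cost and fast_cost.
    """
    saved = (generation_seconds(tokens, base_tps)
             - generation_seconds(tokens, fast_tps))
    extra = fast_cost - base_cost
    return saved * value_per_second > extra

# A 2000-token page still takes 2 seconds at 1000 tok/s.
print(generation_seconds(2000, 1000))

# Hypothetical: someone is waiting, and their time is valued at 60/hour.
print(fast_tier_pays_off(2000, 100, 1000, 0.01, 0.03, 60 / 3600))

# Hypothetical: a background batch job where nobody is waiting.
print(fast_tier_pays_off(2000, 100, 1000, 0.01, 0.03, 0.0))
```

| Under these assumptions the faster tier only wins when a
| person (or something with a deadline) is actually blocked on
| the output.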
| efromvt wrote:
| I'd be very curious about the bottleneck breakdown in most
| current software dev - I suspect inference is far from the
| bottleneck in most things I do, though driving it to 0 would
| still be _nice_. I do agree that if it was 0 we'd probably
| change development approaches to reduce the new bottlenecks
| more, but it'll take full-process innovation to really get
| something near-instant.
|
| (I should go measure this now, I'm curious)
| ilaksh wrote:
| Use Claude fast mode and turn off thinking. Tell it to just
| explain what its plan is to you at a high level.
|
| It will go much faster.
| skybrian wrote:
| If we get low enough latency, there's no reason to multitask.
| You can ask it to do one thing at a time and immediately see
| what it did. That's a nice way to work!
|
| This is normal interactive UI for tasks that aren't compute-
| intensive. Programs spend most of their time idle, waiting for
| us to click a button. We shouldn't be waiting for them or
| spinning more plates to keep them busy.
|
| However, a faster llm isn't enough. You also need fast compiles
| and fast tests.
| binyu wrote:
| > Right now Claude is faster than me on some tasks but we're at
| least close.
|
| I don't doubt it, but I don't think you can spawn 10 copies of
| yourself working simultaneously.
| AlecSchueler wrote:
| No, but nor can you keep track of what 10 agents are doing
| simultaneously. Hence the multitasking regret.
| pixel_popping wrote:
| An agent can, you don't need to watch tasks, you can have a
| live digest with another tool.
| logankeenan wrote:
| Do you have any recommendations for a live digest tool?
| UncleOxidant wrote:
| Have you tried Gemini 3.5 Flash? It's quite fast. Amazing how
| fast it finishes tasks. Much faster than Claude.
| switchbak wrote:
| Now the next bottleneck is the compiler - which we can model in
| an LLM! It's only wrong 15% of the time :)
|
| But truly, using Cerebras at ~2k tokens/s, with very low
| latency is like a vision into the future. You start to rework
| your workflow around things that can happen without onerous
| manual review - stating the conditions for success, etc. It's
| rare that I have a problem that maps well to that, but I expect
| this is where things are headed.
|
| Of course the fast models tend to not be the SOTA ones, but if
| that was the case - high quality and near-instant thinking,
| that's a game changer that I don't think we're really prepared
| for. The things that get unlocked with higher-than-reasonable
| speed become very interesting.
| coderbants wrote:
| It cuts both ways. Sometimes I ask Gemini 3.5 Flash to do
| something for me and it kicks it out almost instantly and it
| works great, and it's a bit scary how quickly it can do that.
|
| Then I ask it to do something else and it goes off-road and
| where I used to be able to interject with a "whoa whoa whoa,
| that's not right", by the time I see the text on screen and
| react it's already made massive changes. Short of making it
| commit between every edit it's hard to prevent it from going
| wrong as quickly as it goes right (and even then, it can make a
| boo-boo on a remote API too depending on how much privilege it
| has).
| bendangelo wrote:
| I use planning mode in opencode. It has a prompt to tell it
| to plan it out etc. Then I execute with a smaller model. It
| works well
| Bombthecat wrote:
| Living on the street or cave lol
| dkersten wrote:
| I've been playing around with groq and GPT OSS which they run
| at 1000 TPS (20B) or 800 TPS (120B) and the speed feels quite
| magical.
|
| I haven't tried cerebras' 3000 TPS yet but I did try the demo
| of that 15,000 TPS model whose name escapes me right now.
|
| I'm not sure if it makes a meaningful difference for my actual
| work, but it sure is amazing to watch it generate a screen full
| of text in the blink of an eye.
|
| I do think it's super useful for running little validation
| checks like showing it a diff to ensure that the changes are on
| task, and being able to do those quicker really helps because
| it means you can do many focused checks without them getting in
| the way.
| robberth wrote:
| https://chatjimmy.ai/ ?
| msdz wrote:
| AFAIK Taalas, the company behind this demo, still only have
| their initially "hardwarized" model available to test in
| ChatJimmy, which IIRC is a rather stupid Llama 3ish 8b.
|
| Don't get me wrong though, that demo is still incredibly
| impressive & makes me very much excited for the hardware-
| based model era (potentially) ahead.
|
| Once you've experienced those speeds, you really start to
| think about the whole class of things that becomes
| possible; massively parallel decode paths, extensive
| reasoning loops, etc...
| hedgehog wrote:
| For scale though if three or four chips that size can
| replicate a Qwen 27B experience that'll be quite useful.
| OtomotO wrote:
| > That's a game changer and I don't even know where we fit in.
|
| Doing non trivial work.
| giancarlostoro wrote:
| You can run Claude in "fast" mode; it costs you more on your
| compute use, but it's reasonably fast. I'm not sure I care to go
| "faster" than where things are now, otherwise you start losing
| on manual review and testing time. I would argue that Claude
| can poop out weeks (if not months) of coding effort in a few
| hours, and get you insanely close to a good product if you
| define the tech stack, and the business rules. Can it goof here
| and there? Sure. You can also make it refactor all the code on
| a whim faster than any intern could. I think it's good enough
| to help you avoid mundane stupid bugs in most cases. I don't know
| what people who hate it are doing, maybe they're not even
| trying at all or are dismissing it from the first output (as
| though everyone writes perfect code in one shot right?) or
| maybe it's just pride getting in the way of them using a decent
| tool to its true potential.
| cman1444 wrote:
| Reminds me of the Doherty threshold. When will AI respond in
| less than 400 milliseconds?
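| For a sense of scale, a small sketch of how many tokens fit
| inside a 400 ms budget. The time-to-first-token figure is an
| assumed placeholder and `tokens_within_budget` is a
| hypothetical helper; real latency depends on the provider.

```python
def tokens_within_budget(budget_ms, tok_per_s, ttft_ms):
    """Whole tokens that can be decoded before budget_ms elapses.

    ttft_ms is the time to first token (network plus prefill), which
    is spent before any output token arrives.
    """
    decode_ms = max(0, budget_ms - ttft_ms)
    return decode_ms * tok_per_s // 1000

DOHERTY_MS = 400
ASSUMED_TTFT_MS = 100  # placeholder, not a measured value

for tps in (50, 1000):
    n = tokens_within_budget(DOHERTY_MS, tps, ASSUMED_TTFT_MS)
    print(f"{tps} tok/s -> {n} tokens inside {DOHERTY_MS} ms")
```

| Under that assumption, 1000 tok/s leaves room for a few
| hundred tokens inside the threshold, while 50 tok/s leaves
| room for about a sentence.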
| fnordpiglet wrote:
| I've used codex code optimized for a few projects and it's
| unsettling how fast it is. It's hard to think fast enough to
| keep up with it. Mental fatigue was a real challenge because
| the decisions that required my input were rapid fire and
| legitimate ambiguities that were appropriate escalations. I am
| too much a geezer for the intensity of it. But I'll take it!
| noisy_boy wrote:
| The first wave was just getting half decent answers. The second
| wave was being able to choose between actually getting
| reasonably ok coding results OR getting not so great results
| very fast. The third wave would be getting good results fast.
|
| We need to really worry when we get amazing results very fast.
| h14h wrote:
| The gated "ultra-speed" phenomenon seen here and with the
| Cerebras Kimi K2.6 release, while understandable, is somewhat
| troubling IMO.
|
| Getting ~1000 TPS on near-frontier intelligence is a step change,
| and enables whole new use-cases for applications. Seeing limited
| compute resources beget selective access makes me worry for the
| future of competition.
| trilogic wrote:
| Pfff time wasting.
| 1. Password between 8-16 characters, and this and that... What???
| 2. Captcha after captcha, come on.
| 3. "Service unavailable. This service is not available in your
| region yet."
|
| Are you kidding me? Come back when you are ready for the users. I
| was hoping to try it, what a frustration.
| prplfsh wrote:
| This will be really powerful for voice. Being able to reason
| makes LLMs so much smarter but with voice your latency budget is
| so tight that you can't spare the time typically.
| jeffrallen wrote:
| This is true for humans too. Lol
| pullshark91 wrote:
| It's interesting but not game-changing IMO. Speed here is not a
| bottleneck.
| gertlabs wrote:
| MiMo V2.5 Pro (regular speed) remains the strongest open weights
| agentic coding model we've tested -- it's been interesting to see
| how little attention it has received relative to some lower
| performing releases. And the "fast mode" pricing is very
| competitive here.
|
| Data at https://gertlabs.com/rankings
| unrvl22 wrote:
| why is deepseek v4 pro a lot lower than flash? where is mimo
| 2.5?
| gertlabs wrote:
| DeepSeek v4 Pro struggles with a custom harness, and all the
| models ranked above it don't, so it gets downweighted in the
| agentic coding benchmarks (although it ranks better than
| Flash in one-shot problem solving:
| https://gertlabs.com/rankings?ow=1&mode=oneshot_coding). We
| ran plenty of samples.
|
| MiMo v2.5 is on there, as well as the pro version.
|
| We found a few anomalies in our evaluations, which makes
| sense -- if every new sub-release is better across the board
| in every area of the model card, that should raise alarms
| about benchmaxxing. But the main thing we found is that hype
| != performance, and I trust our benchmark methodology
| significantly more than the model cards the labs add to their
| press releases.
| digdugdirk wrote:
| Can you explain more about how it struggles? I haven't
| noticed any issues in my usage, so I'm just curious what is
| meant by this.
| gertlabs wrote:
| It's likely overfit to common harnesses and iteration
| patterns, so it struggles with formatting tool calls and
| json in our testing which use our own harnesses (although
| there is a lot of overlap with tools that would be found
| in any coding harness like bash, apply_patch, etc.)
|
| We didn't love the results because it draws negative
| scrutiny to our benchmark, but the results are real and
| done at scale and I think DeepSeek V4 Pro's inability to
| do agentic work outside of environments it was trained on
| is an important thing to measure, especially when so many
| other models can generalize to new environments just
| fine.
|
| Google models also struggle with tools, but they have
| very strong initial answers, so there is more potential
| for them to bridge the gap with some better post-
| training.
| andai wrote:
| Mimo struggles with my custom harness. (Ignores the
| instructions and defaults back to its own preferred tool
| calling syntax.)
|
| Flash handles it fine, which I found amusing. (Since Mimo
| is supposed to be opus level!) But Flash seems to work even
| better in Claude Code...
|
| With smaller models I always have the issue of needing to
| adapt myself to _their_ preferred workflow... which sort of
| defeats the purpose. Price is hard to beat tho :)
| isusmelj wrote:
| No note about the specific GPU they use. One might speculate.
| B200? H200? H100?
| PhunkyPhil wrote:
| Obligatory taalas mention:
|
| https://taalas.com/
|
| Despite the performative UI components they have a shipped (demo)
| product:
|
| https://chatjimmy.ai/
|
| This is only Llama 3.1 8B and a very small context window, but at 17k
| tokens per second it's likely enough to reliably call tools which
| would make a huge difference in agentic applications. Assuming
| they can bake in better models I'm just as bullish or even more so
| on this, considering this opens up edge computing at the
| extremely low power requirement.
|
| High tok/s is the future IMO.
| kilroy123 wrote:
| My dream is claude or codex running at this speed.
| est wrote:
| More realistically, I hope for Qwen 3.6 27B on Taalas.
| desireco42 wrote:
| I didn't use their pro speed but regular Mimo-v2.5, not even pro,
| it seems really fast. I have plenty of tokens and subscriptions
| but this is really impressive. I really don't need another one,
| but I am tempted simply because it works so fast; I can't imagine
| how fast the ultra-speed service must be.
| dakiol wrote:
| So, regarding the productivity argument: I don't get it. It
| doesn't really matter (for regular employees) that you can now do
| in 2h what used to take 2 days. Why? Because it's not as if you
| have the rest of the day for yourself. You still have to work
| 8h/day as usual. But now the pattern is different: instead of
| enjoying the craft digging deeper into problems in the span of 2
| days, now you are rushing into some slot machine with the hope of
| it giving you the right answer with the right prompt.
|
| So, if anything, I would say it's worse for us. Obviously, it's the
| complete opposite situation for corporations and executives:
| they are loving the AI situation so much!
| fullstop wrote:
| It's making things less fun, for me at least.
| linsomniac wrote:
| Odd, I'm having the opposite experience.
|
| The thing I really love about working with computers is when
| I achieve something. That's the thing that makes me
| figuratively, and sometimes literally, throw my fists into
| the air and go "Yeaaah!"
|
| With the AI tooling, I'm getting those more like a couple
| times a week.
|
| Plus, I'm using AI to attack the things in my day that are "a
| drag", and getting them done too.
|
| The highs are more frequent and the lows are not so low.
| fullstop wrote:
| Oh, sure, I can make things with it. But I have an
| extraordinarily hard time saying that _I_ made something.
|
| It feels like it cheapens the whole thing. Maybe I'm just
| old, because I remember people saying the same thing about
| code completion in Visual Studio back in the late 90s.
|
| This is so much more than code completion, though.
| dd8601fn wrote:
| Exactly how I feel. _I_ didn't make a damn thing. I
| essentially asked a chatbot to.
|
| Did I ask for better things with some important concepts
| pre-rolled? Yeah, of course. But that's so, so much less
| interesting than having actually made a thing.
|
| I try to remind myself that the output of my projects
| has nothing to do with who I am, but the honest truth is
| they always mattered to me.
|
| Now that's dead, and it's never coming back. It ain't
| exactly existential dread, but it _is_ something I've
| lost.
| dd8601fn wrote:
| I did a deep binge on two or three projects I would never
| do, and like five small ones that would have consumed
| months.
|
| It felt like that, kinda, for a bit. Now whenever it does
| something for me I get nothing. I didn't do it... the
| chatbot did. What's for me to celebrate? How can there be
| any real pride or satisfaction for a thing that was just
| handed to me because I asked for it?
|
| If anything it diminishes my satisfaction looking back on
| previous projects. They're "a few hours with a chatbot",
| now.
|
| The things I had to learn and the informed decisions I had
| to make? All pointless trivia, now. A child could do it.
|
| The magic and possibilities parts just all wore off after a
| heavy run, and I don't know if that's ever coming back.
| linsomniac wrote:
| I hear what you and the other sibling comment are saying.
| I, thankfully, somehow, am able to focus more on the
| results than the process. Having fun playing a game (that
| AFAIK no longer exists) with my family is still having
| fun. Having people using a new apt cacher that fixes
| problems with existing ones, and also can survive the
| recent DDoS, is still a really great thing.
|
| But, I'm not going to yuck your yum. I appreciate the
| people who do joinery using hand tools, even if I'm out
| here with a track saw and a router.
| fullstop wrote:
| Do you feel the same way about cloning a GitHub repo and
| building it? It, too, achieved a result.
|
| The track saw and router, imo, are existing libraries.
| pmontra wrote:
| > The things I had to learn and the informed decisions I
| had to make? All pointless trivia, now. A child could do
| it.
|
| This is probably hyperbole. Did you do the experiment?
| I expect that the child won't be able to do it. Ask an
| adult. Same thing. Ask a domain expert. Maybe, but
| not as fast or as well as you.
| ttoinou wrote:
| In which world do you live where employees work 8 hours per day?
| They clock 8 hours per day maybe, but they don't work that whole
| time
| mettamage wrote:
| I agree with you.
|
| I am on Dutch subreddits a lot, to get a local pulse and not
| to be too HN minded.
|
| A lot of them would have vilified you by now. Some would
| even have questioned your morality.
|
| Again, I agree with you. But clearly not everyone has this
| view.
| mystifyingpoi wrote:
| Generally, when people say they are working 8h/day, they
| don't literally mean it. Even "work" is basically impossible
| to define for a SWE.
| drob518 wrote:
| I had a friend who was CEO of a startup tell me that he
| typically only "worked" an hour a day, not because he was
| lazy but just because there was so much nonsense in his
| schedule. He told me he was trying to get it to two hours per
| day.
| the_sleaze_ wrote:
| How successful did he turn out to be? As a CEO your days
| should be jam packed with brutal "chewing glass and gazing
| into the abyss". Is he running a lifestyle type company?
| Lalabadie wrote:
| Tangential, but all companies are lifestyle companies, in
| the sense that they serve their owner's lifestyle
| choices.
|
| It's just that lots of owners want a company that pulls
| them away from all other areas of life.
| ai_slop_hater wrote:
| Some companies force you to actually work 8 hours a day. It's
| hell.
| ttoinou wrote:
| Which country and which companies?
| formerly_proven wrote:
| E.g. factory work
| ttoinou wrote:
| Oh yeah, it's not the same, we were discussing Agentic AI
| ai_slop_hater wrote:
| I worked at a software company that took a screenshot of
| your screen every minute. I also worked a non-software
| white collar job where you were expected to work non-stop
| for 8 hours, except for an unpaid lunch break.
| dakiol wrote:
| In theory, ofc. But that doesn't matter. If you were doing
| something that took 2 days on average, but you were doing it
| in half the time, then that was fine pre LLMs. Nowadays your
| manager knows that with LLMs you need to deliver faster no
| matter what, and then it's more difficult to "hide" and to
| slack.
| ttoinou wrote:
| Yeah. So, good things. We now acknowledge that people are mostly
| slacking at work
| opsnooperfax wrote:
| Here's my hot take as an elder millennial. Boomers are the
| absolute worst at being unable to make the distinction
| between time at work and time doing work. They may show up an
| hour before everyone else but spend the first two or three
| hours of the day reading the news, getting coffee, making
| small talk, and accomplishing literally nothing. Then they crow
| about their work ethic.
| noncoml wrote:
| You have to think of the LLM as a genie that tries to trick you.
|
| First make it write a contract (REQ/ARCH/IMPL documents). Skim
| through those for any mistakes.
|
| Then based on those ask it to write tests. Again skim through
| them.
|
| Now you have a context full of guardrails. It's less likely to
| surprise you.
| petesergeant wrote:
| I find a second LLM can do this at least as well as I can,
| usually, and just ask the harness to surface anything they
| can't agree on.
| schipperai wrote:
| You can dig deeper into problems with AI. For me, it
| supplements my knowledge in domains I don't fully understand.
| It also helps me learn. So I can tackle problems I wouldn't
| otherwise.
|
| I'm excited for ultrafast AI. It likely means less temptation
| to multi-thread and deeper flow in single sessions.
| 8note wrote:
| how do you know that it is actually suggesting the right
| thing?
| Klaster_1 wrote:
| Some things are verifiable. Before coding agents, if I
| encountered an issue with a library or a framework, my
| first hunch would be to find a GitHub issue with a
| suggested workaround. Nowadays, I can ask an agent to
| really dig into it and often it does surface the root
| cause. For example, the other day I got a test hangup after
| updating to Angular 22, and the agent managed to find the
| bug and suggest a very trivial workaround compared to what
| I originally planned to go with. I reported the issue and
| it was fixed the next day, more or less along the lines of
| what I'd do.
| alfalfasprout wrote:
| Generally, I agree because what happens is the messaging around
| AI is doing more, faster. Not using AI to deliver at a higher
| quality level, etc. But I think it boils down to incentives and
| discipline. So given the incentives we have today at most
| workplaces faster AI will just be used to produce more slop.
| logicchains wrote:
| >instead of enjoying the craft digging deeper into problems in
| the span of 2 days, now you are rushing into some slot machine
| with the hope of it giving you the right answer with the right
| prompt.
|
| If you're treating it like a slot machine you're doing it
| wrong. It will give you exactly what you ask for if you ask
| clearly, i.e. write a clear, detailed specification, not just
| "do X!". The nondeterminism comes from vagueness in
| specification.
| yogthos wrote:
| I think of it as a genetic algorithm loop. The LLM is basically
| a mutator function within the loop. If you can define the end
| shape you're looking for using tests and specification then you
| can throw the LLM at the problem and have it converge on the
| solution. It generates some code, it gets run, the LLM is fed
| the result back, and it iterates. If you can run the LLM at a
| really high throughput, then you can iterate on the solution
| faster. This can largely compensate for the overall capability
| of the model. Instead of hoping it gets the right solution in a
| few shots, you can just have it try a whole bunch of things
| until you get a useful result.
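|
| A minimal sketch of that generate/run/feed-back loop, under loud
| assumptions: `converge`, `toy_tests` and `make_toy_mutator` are
| hypothetical names, and a seeded random letter-swapper stands in
| for the LLM so the example runs offline. The "specification" is
| just a target string; in practice it would be a real test suite.

```python
import random
from typing import Callable, List, Optional, Tuple


def converge(
    mutate: Callable[[str, List[str]], str],
    run_tests: Callable[[str], List[str]],
    seed: str,
    max_iters: int,
) -> Tuple[Optional[str], int]:
    """Mutate -> test -> feed failures back, until the tests pass.

    Returns (candidate or None, number of mutate calls made).
    """
    candidate, failures = seed, run_tests(seed)
    for attempt in range(max_iters):
        if not failures:
            return candidate, attempt
        proposal = mutate(candidate, failures)  # the "LLM" step
        proposal_failures = run_tests(proposal)
        # Keep a proposal only if it fails no more tests than before.
        if len(proposal_failures) <= len(failures):
            candidate, failures = proposal, proposal_failures
    return (candidate if not failures else None), max_iters


TARGET = "hello"  # toy stand-in for a spec (assumption, not a real test suite)


def toy_tests(candidate: str) -> List[str]:
    """One failure message per character that differs from TARGET."""
    return [
        f"pos {i}"
        for i, (a, b) in enumerate(zip(candidate, TARGET))
        if a != b
    ]


def make_toy_mutator(rng: random.Random) -> Callable[[str, List[str]], str]:
    """Stand-in for an LLM: reads one failure and rewrites that position."""

    def mutate(candidate: str, failures: List[str]) -> str:
        i = int(rng.choice(failures).split()[1])
        letter = rng.choice("abcdefghijklmnopqrstuvwxyz")
        return candidate[:i] + letter + candidate[i + 1:]

    return mutate


result, calls = converge(
    make_toy_mutator(random.Random(0)), toy_tests, "aaaaa", 5000
)
print(result, calls)
```

| The mutator here is weak and mostly guesses wrong, yet the loop
| still converges because the tests filter every proposal. That is
| the sense in which raw throughput can stand in for capability:
| more iterations per minute means more filtered attempts.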
| fragmede wrote:
| That's the fundamental trade off of a job where someone else
| gives you stuff to do and you get money. We may pride ourselves
| on software development being a job 'above' flipping burgers,
| but you're getting paid to have your butt in a chair for 40
| hours a week. In exchange, you don't have to worry about the
| business shit. How much a burger or SaaS license costs the user
| isn't your problem. You take Jira tickets and implement them.
| You trade time for money. If, instead, you work for yourself;
| contracting, writing your own apps, buying lottery tickets,
| then you're trading results for money. If you're a freelance
| web developer with a stable of clients, it's a great time! What
| used to take a week takes hours, and you can charge your
| clients the same amount to build an even better website with
| you using AI, which means you get the choice of building a new
| website for additional clients, or you can take the time off
| and not build additional websites. But you have to hustle to
| continually get new clients, before AI and after AI. So it's a
| different life.
| himata4113 wrote:
| I was saying that AI is going to make software development
| cheaper, as in the salaries of software engineers will go down,
| because some of that salary will now be redirected to AI
| companies and because the world will need to absorb twice (10x?)
| the amount of development power.
| vanuatu wrote:
| it's not obvious to me that salaries go down, my hunch was
| that salaries go up but the bar is higher. Software becoming
| easier to produce (still hard to verify and make useful fwiw)
| raises the ambitions of software projects, and we don't seem
| to be close to the ceiling of demand for software systems
| himata4113 wrote:
| There's a limit to what the demand/supply curve can absorb.
| It really depends if there's twice as many developers or 10
| times more. I think we have enough software development
| jobs to where we can absorb productivity doubling rather
| easily, not so sure about anything beyond that.
| vanuatu wrote:
| True on the demand/supply curve
|
| I think due to how leveraged software is, the top % of
| software developers are more desired (and compensated)
| than ever, and the bottom % will have difficulty finding
| a role, and there are structural barriers to entering
| that top % (intelligence, location, etc). Companies have
| infinite demand for the cream of the crop talent
| himata4113 wrote:
| I can actually back this up: most job offers I get
| come from people I happened to work with, for roles that
| never get a public job listing and are only obtainable
| by being highly regarded by others. I was told that a
| friend of mine, in the department where the role opened up,
| got an email about a senior position asking them to reply if
| they had a recommendation.
|
| However, software development is funny in a way where you
| don't need a job in order to be successful. I've never
| worked at a company and I'm pretty up there on the
| ladder, but I am not quite sure what will happen in the next
| few years when every possible thing that can be made in
| software is already explored to the fullest especially
| with singular developers launching 3 to 7 projects a
| month.
| enraged_camel wrote:
| I dig into problems way, way deeper with AI than without. I can
| also add a lot more polish to features, add more test coverage,
| write more documentation, explore multiple approaches rather
| than go with gut-feel, and so on.
| vanuatu wrote:
| Employees who get paid a flat rate per hour don't have the
| incentive to do more than their job
|
| Equity / profit sharing should be commonplace in the age of AI.
| dilyevsky wrote:
| Like with any tech there are dumb ways of using it and there
| are smart ways. Treating it as a "slot machine giving you the
| right answer" is a dumb way - it may work for a bit, but it
| won't carry you very far because everyone else can also do
| this. No one is stopping anybody from digging deeper into
| problems than ever before using this technology - that's the
| smart way.
| erikus wrote:
| I'm amazed at how steep the AI learning curve continues to be
| and how people are spread so far apart on it. I think
| supercharged learning with AI and agents is undervalued at
| this point but that more people will realize its utility over
| time, especially as a complement to delegating work.
|
| It also makes me think about the temptation to stop thinking
| with these tools, i.e. "cognitive surrender". Addy Osmani
| wrote a nice blog post about this:
| https://addyosmani.com/blog/cognitive-surrender
| andai wrote:
| Yeah, nobody is under any pressure to work even faster than
| before. I don't know what everyone is complaining about!
| drschwabe wrote:
| Sure but if you're really unhappy with your employer
| employing you for 8 hours a day you can also harness this
| power on your own personal projects to help break free from the
| 9-5 grind if you so desire.
| __david__ wrote:
| Only if your personal projects make you money. I have a
| million hobby projects but none generate income.
| IncreasePosts wrote:
| A huge class of problems are just toil and drudgery. Maybe ai
| will give you even more time to dig into juicy problems that
| are too complex for it to solve, by letting you bypass all the
| pure toil problems.
| powerapple wrote:
| In my case, I think a slower model makes it hard to manage
| context and tasks in parallel. I would much prefer to work on
| one task only, and finish it, take a break, and work on another
| task. Currently I have three tabs for three tasks in parallel,
| it is much worse because constant context switching is
| painful. I think a faster model would mean that you don't have
| to start a new task while waiting.
| erikus wrote:
| Agents completing work faster would certainly help me as well
| since I also find context switching exhausting above some
| threshold.
|
| Build and test would move back into the critical path,
| though, and for some projects that will take effort to bring
| down.
| DenisM wrote:
| > with the hope of it giving you the right answer with the
| right prompt.
|
| Consider that our ability to evaluate quality of the output is
| falling further behind our ability to produce it. The "right
| answer" is not the most likely outcome.
| overgard wrote:
| I feel like I spend a lot more time reviewing and fixing the
| output of it and debugging parts it can't debug, so to me a
| faster model is optimizing the part that is already pretty
| fast. If my job were greenfield stuff I would probably YOLO it
| more, but when you're working on a launched product with a lot
| of users..
| pmontra wrote:
| If you split the tasks for the AI into small chunks you keep the
| architectural control and it's not a slot machine anymore. You
| still read code and occasionally you write code too. Not much
| but it's the price to pay for the extra speed.
|
| If you start the AI on something big and come back after one
| hour then yes, you might discover that you wasted an hour and
| got nothing.
| jbellis wrote:
| it is hard to understand what the actually meaningful innovations
| are here / what TileRT is bringing to the table.
|
| - dflash: new-ish but February is ancient by the standards of the
|   pace of AI innovation lately, I guess applying it to a 1T model
|   is new-ish in the sense that the dflash researchers don't have
|   the hw budget to prove that out
| - persistent engine kernel: this is like CUDA 101
| - warp specialization: I think this just means "keep different gpu
|   resources all busy w/ pipelining" which is CUDA 201, some of it
|   is even baked into pytorch now
| - MXFP4 QAT: not new
| - TileRT: hard to tell what this actually does, there's a PyPi
|   wheel with support for DS 3.2 and GLM 5 but binary only
| GodelNumbering wrote:
| Below is the part I found most interesting
|
| > "However, naively applying FP4 across the entire model causes
| degradation in complex reasoning, logic, and code generation.
| Given the MoE (Mixture of Experts) architecture of Xiaomi
| MiMo-V2.5-Pro -- where Experts constitute the vast majority of
| parameters and exhibit the highest tolerance to quantization --
| we selectively quantize only the MoE Experts to FP4 while
| preserving original precision for all other modules. Through FP4
| QAT (Quantization-Aware Training), we dramatically reduce model
| size and maximize hardware bandwidth utilization while keeping
| the model's overall capability essentially on par with the
| original, as shown below"
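|
| A rough back-of-the-envelope sketch of why quantizing only the
| experts still shrinks the model so much. The 1T parameter count
| comes from the title; the 95% expert share and the 16-bit
| baseline are illustrative assumptions, not figures from the
| post. The 4.25 bits per weight assumes MXFP4-style storage
| (4-bit values plus one 8-bit scale shared by each 32-weight
| block), and `model_size_gb` is a hypothetical helper.

```python
def model_size_gb(
    total_params: float,
    expert_fraction: float,
    expert_bits: float,
    other_bits: float,
) -> float:
    """Weight storage in GB when experts and the rest use different widths."""
    expert_bytes = total_params * expert_fraction * expert_bits / 8
    other_bytes = total_params * (1 - expert_fraction) * other_bits / 8
    return (expert_bytes + other_bytes) / 1e9


TOTAL_PARAMS = 1e12        # "1T model", from the title
EXPERT_FRACTION = 0.95     # ASSUMPTION: share of weights living in experts
BASELINE_BITS = 16         # ASSUMPTION: 16-bit original precision
MXFP4_BITS = 4 + 8 / 32    # 4-bit value + 8-bit scale per 32-weight block

baseline = model_size_gb(
    TOTAL_PARAMS, EXPERT_FRACTION, BASELINE_BITS, BASELINE_BITS
)
experts_only = model_size_gb(
    TOTAL_PARAMS, EXPERT_FRACTION, MXFP4_BITS, BASELINE_BITS
)

print(f"all 16-bit:         {baseline:.0f} GB")
print(f"experts at 4.25b:   {experts_only:.0f} GB")
print(f"size reduction:     {baseline / experts_only:.2f}x")
```

| Under these assumptions the non-expert modules keep full
| precision yet cost only a small slice of the total, which is
| consistent with the quoted claim that experts dominate the
| parameter count.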
| buildbot wrote:
| The 120B and 20B GPT-OSS models by OpenAI did this last year
| for what it's worth; the MoEs were MXFP4
| 0xbadcafebee wrote:
| This is the value prop of Groq and Cerebras. They don't have the
| best models, but they have the fastest inference, and Groq has
| both the lowest cost and fastest speed.
| pants2 wrote:
| With a tps and a token price you can calculate approx. price per
| hour of running the model!
|
| $2.61/M tokens * 1,000 tok/s * 3,600 s/hr = $9.40/hr
|
| That would be pretty cheap for an 8-GPU node which would
| typically run around $45/hr or more. Guess this depends on how
| many parallel streams it can handle.
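|
| A minimal sketch of the arithmetic above, assuming every stream
| sustains the full 1,000 tok/s and is billed at the same $2.61/M
| price (both simplifications). `cost_per_hour` and
| `streams_to_cover` are hypothetical helper names, and the
| $45/hr node price is the figure quoted in the comment.

```python
import math


def cost_per_hour(price_per_million: float, tokens_per_second: float) -> float:
    """Dollars billed per hour by one stream running flat out."""
    tokens_per_hour = tokens_per_second * 3600
    return price_per_million * tokens_per_hour / 1_000_000


def streams_to_cover(node_cost_per_hour: float, stream_revenue: float) -> int:
    """Smallest number of parallel streams whose billing covers the node."""
    return math.ceil(node_cost_per_hour / stream_revenue)


per_stream = cost_per_hour(2.61, 1000)
print(f"one stream:  ${per_stream:.2f}/hr")
print(f"break-even:  {streams_to_cover(45.0, per_stream)} streams")
```

| So a single stream bills roughly a fifth of the quoted node
| price, and the economics hinge on how many such streams one node
| can serve at once.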
| wartywhoa23 wrote:
| An exercise for the near future:
|
| Albert has a chalet in the Swiss Alps and an uncle's fortune, burning
| tokens at 11 kHz.
|
| Joe has a rental capsule and a UBI, burning equally priced tokens
| at 23kHz.
|
| Who's the first to solve the problem of maniacs in power?
| aburayhanalif wrote:
| it is good i think
| _pdp_ wrote:
| Do you know what will be cool?
|
| It will be cool to measure models based on their RAW performance
| and measure them in terms of ROI - not some benchmark but
| something meaningful like we used this model to solve X.
|
| That will be a massive mind shift and might justify the token
| expenditure.
| HDBaseT wrote:
| Aren't benchmarks exactly that?
|
| We used the AI to solve given problem with x%
| adherence/quality/correctness?
| siddbudd wrote:
| to try the demo you need to sign up. why? to sign up you need a
| password 8-16 chars. Why limit at 16? geez, I hate Chinese IT
| companies with a passion.
|
| update: AFTER signing up, and only then, am I told: 'This service
| is not available in your region yet.'
| overgard wrote:
| Pretty cool, although I can't help but think this would be a very
| easy way to rack up a GARGANTUAN bill. That company that blew 500
| million on Claude in a month might have competition soon..
| sheeshkebab wrote:
| Opus regularly bitches and whines to me about how long something will
| take and that I should think before asking it to do it. But then
| it does it anyway in 15 minutes.
| temikus wrote:
| I've personally found MiMo models hit and miss. I have some
| personal agentic projects and I found them to hallucinate hard at
| least 10% of the time. And do so in pretty sinister ways - making
| up people, names, places, etc. I switched back to Kimi for now.
| RachelF wrote:
| I wonder how fast it performs on just a CPU? If the model
| performs say 10x on a GPU cluster, would it also perform faster
| on a CPU?
|
| This could bring proper desktop AI to the average laptop user,
| which could be a game changer for running local models.
| mrwaffle wrote:
| What a ripoff: you have to make an account then 'apply' to try
| this demo.
| digitaltrees wrote:
| Am I the only one that doesn't care about speed? I want it to not
| do stupid stuff and to be cheaper.
| Npovview wrote:
| Generally, thinking tokens are the verbose ones. So
| the speed helps by reducing the time spent generating thinking
| tokens, and you get your actual output code very fast.
| kopirgan wrote:
| Will this list for trillion dollar valuation as well?
| Frannky wrote:
| I tried this model; it was pretty bad at coding. Maybe it was me.
| 1k tokens/sec pretty cool tho. Deepseek V4 pro is better. I
| wonder if a tweaked pi + deepseek pro v4 + 1k tokens/sec would
| actually be better than Claude code
| LoganDark wrote:
| I was just playing with Cerebras a few days ago because it's the
| fastest inference provider by far. Unfortunately, the only model
| anywhere near economical to run that fast is gpt-oss-120b, which
| sucks at Pi's tool calling. So I've been hoping for something
| faster ever since, especially since my local hardware has a
| paltry 128GB of unified memory.
|
| Hopefully this pans out and fast models (that are also not
| ridiculously dumb) become the norm. It's amazing what you can
| unlock with even a single order of magnitude's speed improvement.
| bryabaek wrote:
| i tried to test it and after logging in, i get "You don't have
| access to this event trial" and can't even log out until i clear
| my cookies. despite having a good model, why such a bad website?
| yanhangyhy wrote:
| has anyone given it a try? even in china, it's not popular...but
| xiaomi is really good at making prices go down on everything...
___________________________________________________________________
(page generated 2026-06-09 06:00 UTC)