[HN Gopher] Tokens are getting more expensive
       ___________________________________________________________________
        
       Tokens are getting more expensive
        
       Author : admp
       Score  : 178 points
       Date   : 2025-08-03 11:01 UTC (11 hours ago)
        
 (HTM) web link (ethanding.substack.com)
 (TXT) w3m dump (ethanding.substack.com)
        
       | michaelbuckbee wrote:
       | A major current problem is that we're smashing gnats with
       | sledgehammers via undifferentiated model use.
       | 
       | Not every problem needs a SOTA generalist model, and as we get
       | systems/services that are more "bundles" of different models with
       | specific purposes I think we will see better usage graphs.
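
The "bundles of models" idea above can be sketched as a simple router
that sends each request to the cheapest tier likely to handle it. The
tier names, per-token prices, and task taxonomy below are illustrative
assumptions, not any vendor's actual offering or behavior:

```python
# Hypothetical model router: send each task to the cheapest model
# tier that is likely good enough, falling back to a SOTA generalist.
# Tier names and per-million-token prices are illustrative only.

MODELS = {
    "small":      {"price_per_mtok": 0.25},   # e.g. a 7B-class model
    "mid":        {"price_per_mtok": 3.00},   # e.g. a 32B-class model
    "generalist": {"price_per_mtok": 15.00},  # SOTA, used sparingly
}

def route(task_kind: str, context_tokens: int) -> str:
    """Pick a tier from a coarse task label (an assumed taxonomy)."""
    if task_kind in {"classify", "extract", "summarize_short"}:
        return "small"
    if task_kind in {"draft", "translate"} and context_tokens < 8_000:
        return "mid"
    return "generalist"  # anything open-ended or long-context

def estimated_cost(task_kind: str, context_tokens: int) -> float:
    """Rough input-token cost for a single call at the assumed prices."""
    model = route(task_kind, context_tokens)
    return context_tokens / 1_000_000 * MODELS[model]["price_per_mtok"]

# A short classification job goes to the cheap tier...
assert route("classify", 2_000) == "small"
# ...while an open-ended request falls through to the generalist.
assert route("design_review", 50_000) == "generalist"
```

Even a crude dispatch table like this captures the point: the expensive
generalist becomes the fallback rather than the default.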
        
         | mustyoshi wrote:
          | Yeah, this is the thing people miss a lot. 7-32B models work
          | perfectly fine for a lot of things, and run on previously
          | high-end consumer hardware.
         | 
         | But we're still in the hype phase, people will come to their
         | senses once the large model performance starts to plateau
        
           | _heimdall wrote:
           | I expect people to come to their senses when LLM companies
           | stop subsidizing cost and start charging customers what it
           | actually costs them to train and run these models.
        
         | simonjgreen wrote:
          | Completely agree. It's worth spending time to experiment too.
          | A reasonably simple chat support system I built recently uses
          | 5 different models depending on the function it's in.
          | Swapping out different models for different things makes a
          | huge difference to cost, user experience, and quality.
        
         | alecco wrote:
         | If there was an option to have Claude Opus guide Sonnet I'd use
         | it for most interactions. Doing it manually is a hassle and
         | breaks the flow, so I end up using Opus too often.
         | 
         | This shouldn't be that expensive even for large prompts since
         | input is cheaper due to parallel processing.
        
           | isoprophlex wrote:
           | You can define subagents that are forced to run on eg.
           | Sonnet, and call these from your main Opus backed agent.
           | /agent in CC for more info...
        
             | danielbln wrote:
              | That's what I do. I used to use Opus for the dumbest
              | stuff, writing commits and such, but now that's all
              | subagent business that runs on Sonnet (or even Haiku
              | sometimes). Same
             | for running tests, executing services, docker etc. All
             | Sonnet subagents. Positive side effect: my Opus allotment
             | lasts a lot longer.
        
               | illusive4080 wrote:
               | I'm just sitting here on my $20 subscription hoping one
               | day we will get to use Opus
        
         | nateburke wrote:
         | generalist = fungible?
         | 
         | In the food industry is it more profitable to sell whole cakes
         | or just the sweetener?
         | 
         | The article makes a great point about replit and legacy ERP
         | systems. The generative in generative AI will not replace
         | storage, storage is where the margins live.
         | 
         | Unless the C in CRUD can eventually replace the R and U, with
         | the D a no-op.
        
           | marcosdumay wrote:
           | > In the food industry is it more profitable to sell whole
           | cakes or just the sweetener?
           | 
            | I really don't understand what you're trying to get at.
            | But on that example, cakes have a higher profit margin, and
            | sweeteners have larger scale.
        
         | empiko wrote:
          | Yeah, but the juiciest tasks are still far from solved. The
          | number of tasks where people are willing to accept low-
          | accuracy answers is not that high. It is maybe true for some
          | text
         | processing pipelines, but all the user facing use cases require
         | good performance.
        
       | comrade1234 wrote:
       | I'm kind of curious what IntelliJ's deal is with the different
       | providers. I usually just keep it set to Claude but there are
       | others that you can pick. I don't pay extra for the AI assistant
       | - it's part of my regular subscription. I don't think I use the
       | AI features as heavily as many others, but it does feed my code
       | base to whoever I'm set to...
        
         | louthy wrote:
         | Are you sure you don't pay extra? I'm on Rider and it's an
         | additional cost. Unless us C# and F# devs are subsidising
         | everyone else :D
         | 
         | Edit: It says on the Jetbrains website:
         | 
         | "The AI Assistant plugin is not bundled and is not enabled in
         | IntelliJ IDEA by default. AI Assistant will not be active and
         | will not have access to your code unless you install the
         | plugin, acquire a JetBrains AI Service license and give your
         | explicit consent to JetBrains AI Terms of Service and JetBrains
         | AI Acceptable Use Policy while installing the plugin."
        
           | comrade1234 wrote:
           | When they first added the assistant it was $100/yr to enable
           | it. However, it's now part of the subscription and they even
           | reimbursed me a portion of the $100 that I paid.
        
             | terminalbraid wrote:
             | You're one of the lucky ones. They just outright stole from
             | many of the people who did pay for it.
        
           | double051 wrote:
           | If you pay for the all products subscription, their AI
           | features are now bundled in. I believe that may be a
           | relatively recent change, and I would not have known about it
           | if I hadn't been curious and checked.
        
             | louthy wrote:
             | I've just checked. I have the 'dotUltimate' bundle, which
             | now appears to include 'AI Pro'.
             | 
             | They didn't cancel my existing 'AI Pro' subscription
             | though, and have just let it keep running with no refunds.
             | 
             | Thanks, Jetbrains. You get worse every day.
        
         | terminalbraid wrote:
         | Considering they didn't significantly change their pricing when
         | they bundled the equivalent of a ~$10-20/mo subscription to
         | their Ultimate pack (which I pay something around $180/year
         | for), I'm guessing they're eating a lot of the cost out of
         | desperation for an imagined problem. That or they were fleecing
         | everyone from the beginning.
        
       | ath3nd wrote:
       | Mathematics are not relevant when we have hype and vibes. We
       | can't have facts and projections and no path to profitability
       | distract us from our final goal.
       | 
       | Which, of course, is to donate money to Sama so he can create AGI
       | and be less lonely with his robotic girlfriend, I mean...change
       | the world for the better somehow. /s
        
         | NitpickLawyer wrote:
         | I get your point but I think it's debatable. As long as the
         | capabilities increase (and they have, IMO) cost isn't really
         | relevant. If you can reasonably solve problems of a given
         | difficulty (and we're starting to see that), then suddenly you
         | can do stuff that you simply can't with humans. You can "hire"
         | 100 agents / servers / API bundles, whatever and "solve" all
         | tasks with difficulty x in your business. Then you cancel and
         | your bottom is suddenly raised. You can't do that with humans.
         | You can't suddenly hire 100 entry-level SWEs and fire them
         | after 3 months.
         | 
         | Then you can think about automated labs. If things pan out, we
         | can have the same thing in chemistry/bio/physics. Having
         | automated labs definitely seems closer now than 2.5 years ago.
         | Is cost relevant when you can have a lab test formulas
         | 24/7/365? Is cost a blocker when you can have a cure to
         | cancer_type_a? And then _b_c...etc?
         | 
         | Also, remember that costs go down within a few generations.
         | There's no reason to think this will stop.
        
           | ath3nd wrote:
           | > You can "hire" 100 agents / servers / API bundles, whatever
           | and "solve" all tasks with difficulty x in your business.
           | 
           | In that bright AGI future, who does my business serve, like
           | who actually are my actual paying clients? Like, the robots
           | are farming, the robots are driving, the robots are
           | "creating" and robots are "thinking", right? In that awesome
           | future, what paid jobs do us humans have, so my clients can
           | afford my amazing entrepreneurial business that I just
           | bootstrapped with the help of 100s of agents? And how did I
           | get the money to hire those 100s of agents in the first
           | place?
           | 
           | > Is cost a blocker when you can have a cure to
           | cancer_type_a? And then _b_c...etc?
           | 
            | Yes, it very much is. The fact that even long-discovered
            | solutions like insulin for diabetes management are being
            | sold to people at 9x their actual cost should speak
            | volumes to you: while it's great to have cures for X, Y and
           | Z, it's the control over the production and development of
           | these cures that is equally, if not much more important for
           | the cure to actually reach people. In this rosy world of
           | yours, do you think Zuck will give you his LLAMAGI-generated
           | cancer cure out of the goodness of his heart? We are talking
           | about the same dude that helped a couple of genocides and
           | added ads in Whatsapp to squeeze the last cent of the people
           | who are trapped with an app that gets progressively worse and
           | more invasive.
           | 
           | https://www.rand.org/news/press/2024/02/01/index1.html
           | 
           | https://systemicjustice.org/article/facebook-and-genocide-
           | ho...
           | 
           | > Also, remember that costs go down within a few generations.
           | There's no reason to think this will stop.
           | 
           | The destruction of the natural world, the fires all around
           | us, the rise of fascism and nationalism, the wars that are
           | spawning all over the place and the fact that white and blue
           | collar jobs are being automated out while soil erosion and
           | PFAS make our land infertile point to a different future. But
           | yeah, I am simply ecstatic at the possibility that the costs
           | of generating a funny picture Ghibli style with a witty
           | caption could go down by 10 to 30%.
        
       | flyinglizard wrote:
       | The truth is we're brute forcing some problems via tremendous
       | amount of compute. Especially for apps that use AI backends
       | (rather than chats where you interface with the LLM directly),
       | there needs to be hybridization. I haven't used Claude Code
       | myself but I did a screenshare session with someone who does and
       | I think I saw it running old fashioned keyword search on the
       | codebase. That's much more effective than just pushing more and
       | more raw data into the chat context.
       | 
        | On one of the systems I'm developing, I'm using LLMs to compile
        | user intents to a DSL, without ever looking at the real data to
        | be examined. There are ways; increased context length is bad
        | for speed, cost, and scalability.
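
A minimal sketch of the pattern described above: the LLM emits a
program in a tiny filter DSL, and only a local evaluator ever touches
the real rows. The grammar, field names, and sample data below are
invented for illustration, not the commenter's actual system:

```python
# Sketch: the LLM compiles a user intent ("big orders from NL") into a
# tiny DSL string like 'country = NL and amount > 100'; only this local
# evaluator ever sees the real rows. Grammar and fields are invented.
import operator

OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}

def parse(dsl: str):
    """Parse 'field OP value [and field OP value ...]' into clauses."""
    clauses = []
    for part in dsl.split(" and "):
        field, op, value = part.split()
        clauses.append((field, OPS[op], value))
    return clauses

def evaluate(dsl: str, rows):
    """Run the compiled DSL against local data the LLM never sees."""
    clauses = parse(dsl)
    def matches(row):
        return all(
            op(row[field], type(row[field])(value))  # coerce to field type
            for field, op, value in clauses
        )
    return [r for r in rows if matches(r)]

rows = [
    {"country": "NL", "amount": 250},
    {"country": "NL", "amount": 40},
    {"country": "DE", "amount": 900},
]
# The DSL string below stands in for what the LLM would have produced.
assert evaluate("country = NL and amount > 100", rows) == [
    {"country": "NL", "amount": 250}
]
```

The model only ever sees the schema and the user's question, so the
prompt stays small regardless of how large the dataset is.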
        
       | mark_l_watson wrote:
       | I have already thought a lot about the large packaged inference
       | companies hitting a financial brick wall, but I was surprised by
       | material near the end of the article: the discussions of lock in
       | for companies that can't switch and about Replit making money on
       | the whole stack. Really interesting.
       | 
       | I managed a deep learning team at Capital One and the lock-in
       | thing is real. Replit is an interesting case study for me because
       | after a one week free agent trial I signed up for a one year
        | subscription, had fun with their LLM-based coding assistant
       | for a few weeks, and almost never used their coding agent after
       | that, but I still have fun with Replit as an easy way to spin up
       | Nix based coding environments. Replit seems to offer something
       | for everyone.
        
       | raincole wrote:
       | First of all the title is click-bait. Tokens are getting cheaper
       | and cheaper. People just use more and more tokens.
       | 
        | And everything, I mean everything, after the title goes
        | downhill:
       | 
       | > saying "this car is so much cheaper now!" while pointing at a
       | 1995 honda civic misses the point. sure, that specific car is
       | cheaper. but the 2025 toyota camry MSRPs at $30K.
       | 
        | Cars got cheaper. The only reason you don't feel it is the
        | trade barriers that stop BYD from flooding your local dealers.
       | 
       | > charge 10x the price point > $200/month when cursor charges
       | $20. start with more buffer before the bleeding begins.
       | 
       | What does this even mean? The cheapest Cursor plan is $20, just
       | like Claude Code. And the most expensive Cursor plan is $200,
       | just like Claude Code. So clearly they're at the _exact_ same
       | price point.
       | 
       | > switch from opus ($75/m tokens) to sonnet ($15/m) when things
       | get heavy. optimize with haiku for reading. like aws autoscaling,
       | but for brains.
       | 
       | > they almost certainly built this behavior directly into the
       | model weights, which is a paradigm shift we'll probably see a lot
       | more of
       | 
       | "I don't know how Claude built their models and I have no insider
       | knowledge, but I have very strong opinions."
       | 
       | > 3. offload processing to user machines
       | 
       | What?
       | 
       | > ten. billion. tokens. that's 12,500 copies of war and peace. in
       | a month.
       | 
        | Unironically quoting data from the viberank leaderboard, which
        | is just user-submitted numbers...
       | 
       | > it's that there is no flat subscription price that works in
       | this new world.
       | 
       | The author doesn't know what throttling is...?
       | 
       | I've stopped reading here. I should've just closed the tab when I
       | saw the first letter in each sentence isn't capitalized. This is
       | so far the most glaring signal of slop. More than the overuse of
       | em-dash and lists.
        
         | WA wrote:
         | All good points, but:
         | 
          | > _when I saw the first letter in each sentence isn't
          | capitalized. This is so far the most glaring signal of slop._
         | 
         | How so? It's the exact opposite imho. Lowercase everything with
         | a staccato writing style to differentiate from AI slop, because
         | LLMs usually don't write lowercase.
        
           | lelanthran wrote:
           | I think GP is drawing a distinction between "slop" and "AI
           | slop".
           | 
           | This comes across as sloppily written, but not sloppily
           | generated.
        
           | Semaphor wrote:
           | Human slop instead of AI. Our race is catching up to the
           | machines again.
        
           | ankit219 wrote:
           | Likely op does not mean ai slop, but more a signal of human
           | carelessness that they could not write it in a proper manner.
        
       | djhworld wrote:
       | Over the past year or two I've just been paying for the API
       | access and using open source frontends like LibreChat to access
       | these models.
       | 
       | This has been working great for the occasional use, I'd probably
       | top up my account by $10 every few months. I figured the amount
       | of tokens I use is vastly smaller than the packaged plans so it
       | made sense to go with the cheaper, pay-as-you-go approach.
       | 
       | But since I've started dabbling in tooling like Claude Code, hoo-
       | boy those tokens burn _fast_, like really fast. Yesterday I
       | somehow burned through $5 of tokens in the space of about 15
        | minutes. I mean, sure, the Code tool is vastly different from
        | asking an LLM about a certain topic, but I wasn't expecting
        | such a huge leap. A lot of the token usage is masked from you,
        | I guess, wrapped up in the ever-increasing context plus back-
        | and-forth tool orchestration. But still.
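
One reason agent loops burn tokens so much faster than chat: every tool
call resends the whole growing conversation, so cumulative input tokens
grow roughly quadratically with the number of turns. A back-of-envelope
sketch, where the turn counts, context sizes, and price are made-up
assumptions:

```python
# Back-of-envelope: an agent resends its whole history on every turn,
# so cumulative input tokens grow ~quadratically. Numbers are made up.
def session_input_tokens(turns: int, system_tokens: int,
                         tokens_per_turn: int) -> int:
    total = 0
    context = system_tokens
    for _ in range(turns):
        total += context            # whole history is re-read each call
        context += tokens_per_turn  # tool output / reply gets appended
    return total

# 40 tool-call turns, 5k system+code context, ~2k tokens added per turn:
tokens = session_input_tokens(40, 5_000, 2_000)
cost = tokens / 1_000_000 * 3.00   # assumed $3 per million input tokens
assert tokens == 40 * 5_000 + 2_000 * (40 * 39 // 2)  # 1,760,000
```

At those assumed rates that is roughly $5 of input tokens alone for one
short session, in the same ballpark as the anecdote above; prompt
caching changes the economics but not the quadratic shape.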
        
         | TechDebtDevin wrote:
          | $20.00 via Deepseek's API (yes, China can have my code, idc)
          | has lasted me almost a year. It's slow, but better quality
          | output than any of the independently hosted Deepseek models
          | (ime). I don't really use agents or anything tho.
        
         | zurfer wrote:
         | The simple reason for this is that Claude Code uses way more
         | context and repetitions than what you would use in a typical
         | chat.
        
       | senko wrote:
        | Insisting on flouting English spelling rules (by not starting a
       | sentence with a capital letter) in a think piece is a dead
       | giveaway that the author thinks too highly of themselves, and
       | results in me automatically discounting whatever they're saying.
       | 
       | If I (and billions others) can be bothered to learn your damn
       | language so we can all communicate, do us a service and actually
       | use it properly, FFS.
        
         | SoftTalker wrote:
         | Well his name is Ethan so...
        
       | furyofantares wrote:
       | > claude code has had to roll back their original unlimited
       | $200/mo tier this week
       | 
       | The article repeats this throughout but isn't it a straight lie?
       | The plan was named 20x because it's 20x usage limits, it always
       | had enforced 5 hour session limits, it always had (unenforced?
       | soft?) 50 session per month limits.
       | 
       | It was limited, but not enough and very very probably still
       | isn't, judging by my own usage. So I don't think the argument
       | would even suffer from telling the truth.
        
         | Aurornis wrote:
         | You're right, the Max plan was never advertised as unlimited.
         | 
         | I can't believe how many comments and articles I've read that
         | assume it was unlimited.
         | 
         | It's like it has been repeated so many times that it's assumed
         | to be true.
        
       | robertclaus wrote:
       | My team is debating this exact question for a new product we have
        | in early access. Ultimately we realized the issue early on, so
        | even our plan options would include at-cost usage limits.
        
       | ysofunny wrote:
       | and the AIs stupider!
       | 
       | I am seeing problems with formatting that seemed 'solved'
       | already.
       | 
       | I mean, I have seen "the same" model get better and worse
       | already.
       | 
       | clearly somebody is calibrating the stupidity level relative to
       | energy cost and monetary gain
        
       | Havoc wrote:
       | The combination of "thinking models" plus the blind focus on
       | incremental benchmarking gains was a mistake for practical use.
       | 
        | You definitely want that for some tasks, but for the majority
        | of tasks there is a lot of space for cheap & cheerful (and non-
        | thinking) models.
        
       | mystraline wrote:
       | From the article:
       | 
       | > consumers hate metered billing. they'd rather overpay for
       | unlimited than get surprised by a bill.
       | 
       | Yes and no.
       | 
       | Take Amazon. You think your costs are known and WHAMMO surprise
       | bill. Why do you get a surprise bill? Because you cannot say
       | 'Turn shit off at X money per month'. Can't do it. Not an option.
       | 
       | All of these 'Surprise Net 30' offerings are the same. You think
        | you're getting a stable price until GOTCHA.
       | 
       | Now, metered billing can actually be good, when the user knows
       | exactly where they stand on the metering AND can set maximums so
       | their budget doesn't go over.
       | 
       | Taken realistically, as an AI company, you provide a 'used
       | tokens/total tokens' bar graph, tokens per response, and
       | estimated amount of responses before exceeding.
       | 
        | Again, don't surprise the user. But that's anathema to
       | companies who want to hide tokens to dollars, the same way
       | gambling companies obfuscate 'corporate bux' to USD.
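
The "used tokens / total tokens" display suggested above takes only a
few lines. The quota, the averaging window, and the bar width here are
arbitrary assumptions for the sketch:

```python
# Sketch of the usage display suggested above: a used/total bar plus an
# estimate of responses remaining. Quota and history are assumed values.
def usage_report(used: int, quota: int,
                 recent_response_tokens: list[int]) -> str:
    avg = sum(recent_response_tokens) / len(recent_response_tokens)
    remaining = max(quota - used, 0)
    est_responses = int(remaining // avg)  # responses left at recent avg
    width = 20
    filled = int(width * used / quota)
    bar = "#" * filled + "-" * (width - filled)
    return (f"[{bar}] {used:,}/{quota:,} tokens "
            f"(~{est_responses} responses left at ~{avg:.0f} tok each)")

print(usage_report(750_000, 1_000_000, [1_800, 2_400, 2_100]))
# prints: [###############-----] 750,000/1,000,000 tokens
#         (~119 responses left at ~2100 tok each)
```

Nothing here is provider-specific; the hard part is the vendor choosing
to expose the numbers at all, not rendering them.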
        
         | ikari_pl wrote:
          | I often find Amazon pricing to be vague and cryptic;
          | sometimes there's literally no way to tell why, for example,
          | your database cost is fluctuating all the time.
        
           | joseda-hg wrote:
           | Amazon pricing is nice if you compare it to Azure...
        
           | crinkly wrote:
           | Yeah that. We moved to AWS using their best practices and
           | enterprise cost estimation stuff and got a 6x cost increment
           | on something that was supposed to be cheaper and now we're
           | fucked because we can't get out.
           | 
           | It's nearly impossible to tell what the hell is going where
           | and we are mostly surviving on enterprise discounts from
           | negotiations.
           | 
           | The worst thing is they worked out you can blend costs in
           | using AWS marketplace without having to raise due diligence
           | on a new vendor or PO. So up it goes even more.
           | 
           | Not my department or funeral fortunately. Our AWS account is
           | about $15 a month.
        
             | ajb wrote:
             | Are you using separate accounts per use case? That's the
             | only real way to get a cost breakdown, otherwise you have
             | no idea what piece of infrastructure is for what. They
             | provide a tagging system but it's only informative if
             | someone spends several hours a month tracking down the
             | stuff that didn't get tagged properly.
        
               | crinkly wrote:
                | Yeah we have many accounts. Hence why I know ours is
                | cheap. Difficult to break it down within the account as
                | you say without tag maintenance.
        
             | AtheistOfFail wrote:
             | > The worst thing is they worked out you can blend costs in
             | using AWS marketplace without having to raise due diligence
             | on a new vendor or PO. So up it goes even more.
             | 
             | Not a bug, a feature.
        
           | graemep wrote:
           | If your AWS costs are too complex for you to understand you
           | need to employ a finops person or AWS specialist to handle it
           | for you.
           | 
           | I am not saying this is desirable, but it is necessary IFF
           | you chose to use these services. They are complex by design,
           | and intended primarily for large scale users who do have the
           | expertise to handle the complexity.
        
             | ajsnigrutin wrote:
             | But they're also simple and cheap if you're a "one man
             | band" trying out some personal idea that might or might not
             | take off. Those people have no budgets for specialists.
             | 
             | Pricing schemes like these just make them move back to
             | virtual machines with "unlimited" shared cpu usage and
             | setting up services (db,...) manually.
        
               | mort96 wrote:
               | I'm 100% on team "just rent VMs and run the software on
               | there". It's not that hard, it has predictable price and
               | performance, and you don't lock yourself into one
               | provider. If you build your whole service on top of some
                | weird Amazon-specific thing, and Amazon jacks up their
               | prices, you don't have any recourse. With VMs, you can
               | just spin up new VMs with another provider.
               | 
               | You could also have potential customers who would be
               | interested in your solution, but don't want it hosted by
               | an American company. Spinning up a few Hetzner VMs is
               | easy. Finding European alternatives to all the different
               | "serverless" services Amazon offers is hard.
        
               | graemep wrote:
               | > You could also have potential customers who would be
               | interested in your solution, but don't want it hosted by
               | an American company.
               | 
               | Not happened yet. The nearest I have come to it was a
               | requirement that certain medical information stays in the
               | UK, and that is satisfied by using AWS (or other American
               | suppliers) as long as its hosted in the UK.
        
               | mort96 wrote:
               | I've worked in places where customers (especially
               | municipalities in Germany) have questioned the use of
               | American hosting providers. I don't know whether it has
                | actually _prevented_ a deal from going through (I
                | wasn't
               | close enough to sales to know), but it was consistently
               | an obstacle in some markets. This is despite everything
               | being hosted in EU datacenters.
        
               | graemep wrote:
               | Yes, definitely.
               | 
                | Most small businesses I have dealt with that use AWS
                | just need a VPS. If they are not willing to move to a
                | scary unknown supplier I suggest (unknown to them; very
                | often one that would be well known to people on HN),
                | then I suggest AWS Lightsail, which is pretty much a
                | normal VPS with VPS pricing - it's significantly
                | cheaper than an instance plus storage, just from buying
                | them bundled (which, to be fair to Amazon, is common
                | practice).
               | 
               | My own stuff goes on VPSs.
        
             | lelanthran wrote:
             | > If your AWS costs are too complex for you to understand
             | you need to employ a finops person or AWS specialist to
             | handle it for you
             | 
             | At that point wouldn't it simply be cheaper to do VMs?
        
               | graemep wrote:
               | Yes, very likely, but then why are you using AWS at all?
               | 
               | I think a lot of people are missing a key part of the
               | wording of my comment, that capitalised for emphasis
               | "IFF" (which means "if and only if").
               | 
               | I am absolutely certain a lot of people would save money
               | using VMs - or at scale bare metal.
               | 
                | IMO a lot of people are using AWS because it is a
                | "safe" choice that management buy into, and it is not
                | expensive in context (it's not a big proportion of
                | costs).
        
             | Shank wrote:
             | > If your AWS costs are too complex for you to understand
             | you need to employ a finops person or AWS specialist to
             | handle it for you.
             | 
             | The point where you get sticker shock from AWS is often
             | significantly lower than the point where you have enough
             | money to hire in either of those roles. AWS is obviously
             | the infrastructure of choice if you plan to scale. The
             | problem is that scaling on expertise isn't instant and
             | that's where you're more likely to make a careless mistake
             | and deploy something relatively costly.
        
               | graemep wrote:
               | If you plan to scale to that extent, then why do you not
               | have the money to hire the people who can use AWS? At
               | least part time or as temporary consultants.
               | 
               | This:
               | 
               | > The point where you get sticker shock from AWS is often
               | significantly lower than the point where you have enough
               | money to hire in either of those roles
               | 
               | makes me doubt this:
               | 
               | > AWS is obviously the infrastructure of choice if you
               | plan to scale.
        
             | motorest wrote:
             | > If your AWS costs are too complex for you to understand
             | you need to employ a finops person or AWS specialist to
             | handle it for you.
             | 
             | What a baffling comment. Is it normal to even consider
             | hiring someone to figure out how you are being billed by a
             | service? You started with one problem and now you have at
             | least two? And what kind of perverse incentive are you
             | creating? Don't you think your "finops" person has a vested
             | interest in preserving their job by ensuring billing
             | complexity will always be there?
        
               | dvfjsdhgfv wrote:
               | Paradoxically you are both right. Yes, the situation
                | seems dystopian. Yes, hiring a finops person is sound
                | advice once your cloud bill gets big enough.
        
               | motorest wrote:
               | > Yes, hiring a finops person is a sound advice once your
               | cloud bill gets big enough.
               | 
               | Is it, though? At best someone wearing that hat will
               | explain the bill you're getting. What value do you get
               | from that?
               | 
                | To cut costs, either you micro-optimize things, or you
                | redesign systems to shed expenses. The former gets you
               | nothing, the latter is not something a "finops" (whatever
               | that is supposed to mean) brings to the table.
        
               | graemep wrote:
               | You need to know what to optimise which means you need to
               | know what you are spending on.
               | 
               | I did say it applies IFF and only IFF you choose to use
               | these services, and if you have chosen to use these
               | services you have presumably decided they are good value
                | for money. If not, why are they using AWS?
               | 
               | Of course the complexity and extra cost of managing the
               | billing is something that someone who has chosen to use
               | AWS has already factored in, right?
               | 
               | The alternative is to not use AWS.
        
               | quesera wrote:
               | > _IFF and only IFF_
               | 
               | If and only if and only if and only if? :)
               | 
               | (also, while on the topic, I think a simple "if" covers
               | it here, since the relationship is not bidirectional)
        
               | mulmen wrote:
                | If the cost of hiring the finops person is less than the
                | savings over operating without one, then you hire one; if
                | it isn't, then you don't.
        
               | SoftTalker wrote:
               | > Is it normal to even consider hiring someone to figure
               | out how you are being billed by a service?
               | 
               | Absolutely. This was common for complicated services like
               | telecom/long distance even in the pre-cloud days. Big
               | companies would have a staff or hire a service to review
               | telecom bills and make sure they weren't overpaying.
        
           | UltraSane wrote:
           | AWS pricing is actually extremely clearly specified but it is
           | hard to predict your costs unless you have a good
           | understanding of your expected usage.
        
         | scoreandmore wrote:
         | You can set billing alerts and write a lambda function to
         | respond and disable resources. Of course they don't make it
          | easy, but if you don't learn how to use limits, what do you
          | expect? This argument amazes me. Cloud services require some
          | degree of responsibility on the user's side.
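          | The "billing alert triggers a Lambda that disables resources"
          | pattern can be sketched roughly as below. This is a hedged
          | sketch, not AWS's actual schema: the SNS message shape, the
          | 80% warning threshold, and the $100 budget are all assumptions
          | for illustration.

```python
import json

def plan_action(message: dict, monthly_budget: float) -> str:
    """Decide what to do with a budget alert (hypothetical message shape)."""
    spend = float(message["actual_spend"])
    if spend >= monthly_budget:
        return "disable"  # e.g. stop instances, throttle endpoints
    if spend >= 0.8 * monthly_budget:
        return "warn"     # e.g. page the team before the cap is hit
    return "noop"

def lambda_handler(event, context):
    # Budget alerts delivered via SNS arrive as a JSON string.
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    action = plan_action(message, monthly_budget=100.0)
    if action == "disable":
        # In a real handler this is where you'd call boto3, e.g.
        # boto3.client("ec2").stop_instances(InstanceIds=[...])
        pass
    return {"action": action}
```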
        
           | esafak wrote:
           | So you're okay with turning your site off...
        
             | mystraline wrote:
              | This is the logical fallacy of false dilemma.
             | 
             | I made it clear that you ask the user to choose between
             | 'accept risk of overrun and keep running stuff', 'shut down
             | all stuff on exceeding $ number', or even a 'shut down
             | these services on exceeding number', or other possible ways
             | to limit and control costs.
             | 
             | The cloud companies do not want to permit this because they
             | would lose money over surprise billing.
        
             | verbify wrote:
             | Isn't that the definition of metered billing?
        
             | dd36 wrote:
             | Cats doing tricks has a limited budget.
        
           | gray_-_wolf wrote:
            | Last time I looked into this, wasn't there up to an hour of
            | delay on billing alerts? It did not seem possible to ensure
            | you don't run over your budget.
        
           | mystraline wrote:
           | This is complete utter hogwash.
           | 
            | Up until recently, you could hit somebody else's S3 endpoint,
            | no auth, and get 403s that would charge them tens of
            | thousands of dollars. Couldn't even firewall it. And there
            | was no way to see it, or do anything; the number just went up
            | every 15-30 minutes in the cost dashboard.
           | 
            | Real responsibility is 'I have $100 a month for cloud
            | compute'. Give me an easy way to view it, and shut things
            | down if I exceed that. That's real responsibility - and
            | Scamazon, Azure, Google: none of them 'permit' it.
           | 
           | They (and well, you) instead say "you can build some shitty
           | clone of the functionality we should have provided, but we
           | would make less money".
           | 
           | Oh, and your lambda job? That too costs money. It should not
           | cost more money to detect and stop stuff on 'too much cost'
           | report.
           | 
            | This should be a default choice in the cloud: uncapped
            | costs, or stop services.
        
             | HelloImSteven wrote:
             | Lambda has 1mil free requests per month, so there's a
             | chance it would be free depending on your usage. But still,
             | it's not straightforward at all, so I get it.
             | 
             | Perhaps requiring support for bill capping is the right way
             | to go, but honestly I don't see why providers don't compete
             | at all here. Customers would flock to any platform with
             | something like "You set a budget and uptime requirements,
             | we'll figure out what needs to be done", with some sort of
             | managed auto-adjustment and a guarantee of no overage
             | charges.
             | 
             | Ah well, one can only dream.
        
               | RussianCow wrote:
               | > but honestly I don't see why providers don't compete at
               | all here
               | 
               | Because the types of customers that make them the most
               | money don't care about any of this stuff. They'll happily
               | pay whatever AWS (or other cloud provider) charges them,
               | either because "scale" or because the decision makers
               | don't realize there are better options for them. (And
               | depending on the use case, sometimes there aren't.)
        
         | mhitza wrote:
         | > Again, don't surprise the user. But that's an anathema to
         | companies who want to hide tokens to dollars, the same way
         | gambling companies obfuscate 'corporate bux' to USD.
         | 
         | This is the exact same thing that frustrates me with GitHub's
          | AI rollout. I've been trialing the new Copilot agent, and its
          | cost is fully opaque: multiple references to "premium
          | requests" that don't show up in real time in my dashboard, no
          | clarity on how many I have in total or have left, and when
          | these premium requests are
         | referenced in the UI they link to the documentation that also
         | doesn't talk about limits (instead of linking to the associated
         | billing dashboard).
        
           | llbbdd wrote:
           | Highly recommend getting the $20/month OpenAI sub and letting
            | copilot use that. Quality-wise I feel like I'm getting the
            | same results, but OpenAI's limits are a little more sane.
        
             | debian3 wrote:
             | How do you link the openai sub to Gh copilot? I thought you
             | needed to use OpenAI api
        
             | mhitza wrote:
             | I'm talking about this new agent mode
             | https://github.blog/news-insights/product-news/github-
             | copilo... for which as far as I'm aware there's no option
             | to switch the underlying model used.
        
           | saratogacx wrote:
            | They don't make it easy to figure out, but after researching
            | it for my Co. this is what I came to:
            | 
            |   * One chat message -> one premium credit (most models are
            |     1 credit, but some are less and some, like Opus, are 10x)
            |   * Edit mode is the same as Ask/chat
            |   * One agent session (meaning you start a new agent chat) is
            |     one "request", so you can have multiple messages and they
            |     cost the credit cost of one chat message
           | 
           | Microsoft's Copilot offerings are essentially a masterclass
           | in cost opaqueness. Nothing in any offering is spelled out
           | and they always seem to be just short of the expectation they
           | are selling.
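            | That accounting can be turned into a back-of-envelope
            | calculator. The 10x Opus multiplier comes from the comment
            | above; the model names and everything else here are
            | placeholder assumptions, not published Microsoft pricing.

```python
# Credit multipliers per model; "opus" at 10x is from the comment,
# the rest are placeholders.
MULTIPLIER = {"default": 1.0, "opus": 10.0}

def credits_for(messages: int, model: str = "default") -> float:
    """Premium credits consumed by a number of chat messages."""
    return messages * MULTIPLIER.get(model, MULTIPLIER["default"])

# Ten messages on a 1x model vs. ten messages on Opus:
print(credits_for(10))          # 10.0
print(credits_for(10, "opus"))  # 100.0
```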
        
             | 9dev wrote:
             | But how much is one premium request in real currency, and
             | how many do I have per month?
        
         | siva7 wrote:
          | it's surprising that YC has a gazillion companies doing some ai
          | infrastructure observability product, yet i have yet to see a
          | product that really presents me and the user token usage and
          | pricing estimations easily, which for me is the #1 criteria to
          | use one. make billing and pricing easier for me and the user.
          | instead they run their heads into evals and niche features.
        
         | chrisweekly wrote:
         | GOTAHCA?
        
           | arcanemachiner wrote:
           | Maybe GOTCHA?
        
         | Spooky23 wrote:
         | Metering is great for defined processes. I love AWS because I
         | can align cost with business. In the old days it was often hard
         | and an internal political process. Some saleschick would shake
         | the assets at a director and now I'm eating the cost for some
         | network gear i don't need.
         | 
         | But for users, that fine grained cost is not good, because
         | you're forcing a user to be accountable with metrics that
         | aren't tied to their productivity. When I was an intern in the
         | 90s, I was at a company that required approval to make long
         | distance phone calls. Some bureaucrat would assess whether my
         | 20 minute phone call was justified and could charge me if my
         | monthly expense was over some limit. Not fun.
         | 
          | Flat rate is the way to go for _user ai_ , until you understand
          | the value in the business and the providers start looking for
          | margin. If I make a $40/hr analyst 20% more productive, that's
          | worth about $16k of value a year - the $200/mo ChatGPT Pro is a
          | steal.
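          | The arithmetic behind that "$16k" figure, assuming a standard
          | ~2,080-hour work year (40 h/week x 52 weeks):

```python
hourly_rate = 40          # $/hour for the analyst
hours_per_year = 2080     # 40 h/week * 52 weeks
productivity_gain = 0.20  # 20% more productive

value_per_year = hourly_rate * hours_per_year * productivity_gain
subscription_per_year = 200 * 12  # $200/mo ChatGPT Pro

print(value_per_year)         # 16640.0, i.e. roughly $16k
print(subscription_per_year)  # 2400
```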
        
         | ajb wrote:
         | Amazon is worse than this, though the AWS bait and switch is
         | that you are supposed to save over the alternatives. So it
         | should be worth switching if you would save more than the dev
         | time you would invest in doing so right? But your company isn't
          | going to do that, because of opportunity cost. Your company
          | expects to get back some multiple of the cost of any dev time
          | it invests in its own business. And because of various
         | uncertainties - in return, in the time taken to develop, in
         | competition, etc - they will only invest dev time when that
         | multiple is not small. I'm not a business manager, but I'd
         | guess a factor of 5.
         | 
         | But that means that if you were conned into using
         | infrastructure that actually costs more than the alternative,
         | making your cost structure worse, you're still going to eat the
         | loss because it's not worth taking your devs time to switch
         | back.
         | 
         | But tokens don't quite have this problem -yet. Most of us can
         | still do development the old way, and it's not a project to
         | turn it off. Expect this to change though.
        
       | abtinf wrote:
       | Lack of proper capitalization makes the text unreadable for me.
        
         | blamestross wrote:
         | https://convertcase.net/browser-extension/
         | 
         | This extension might make the internet more accessible for you!
        
           | machomaster wrote:
           | If the writer is that lazy to press Shift and do it manually,
           | then it is him who should have used autocapitalization
           | software.
        
         | machomaster wrote:
          | You are not the only one. I really don't understand this trend
          | of wanting to share opinions but purposefully making them
          | harder or impossible for others to read. Might as well write
          | in a dark-grey font on a black background, just to make readers
          | struggle extra hard.
         | 
         | If you don't care to trivially make your text readable, then we
         | for sure don't care to spend time to struggle through your text
         | to see if there is any useful substance there.
        
       | happytoexplain wrote:
       | While reading this, every time I started a paragraph and saw a
       | lowercase, my brain and eyes were stalling or jumping up, to
       | reflexively look for the text that got cut off. My brain has been
       | trained for decades that, when reading full prose, a paragraph
       | starting with lowercase means I'm starting in the middle of a
       | sentence, and something happened in the layout or HTML to
       | interrupt it.
       | 
       | And, I know this seems dramatic, but besides being cognitively
       | distracting, it also makes me feel sad. Chatroom formatting in
       | published writings is clearly a developing trend at this point,
       | and I love my language so much. Not in a linguistic capacity -
       | I'm not an English expert or anything, nor do I follow every rule
       | - I mean in an emotional capacity.
       | 
       | I'm not trying to be condescending. This is a style choice, not
       | "bad writing" in the typical sense. I realize there is often a
       | lot of low-quality bitterness on both sides about this kind of
       | thing.
       | 
       | Edit:
       | 
       | I also fear that this is exactly the kind of thing where any
       | opinion in opposition to this style will feel like the kind of
       | attack that makes a writer want to push back in a "oh yeah? fuck
       | you" kind of way. I.e. even just my writing this opinion may give
       | an author using the style in question the desire to "double
       | down". Though this conundrum is appropriate (ironic?) - the
       | intensely personal nature of language is part of why I love it.
        
         | simianwords wrote:
         | It's to draw contrast against extremely polished and sterile
         | looking slop content. Think of it like avoiding em dash but
         | going a bit far.
        
           | rafram wrote:
           | > all content here is generated by ai
        
           | tanseydavid wrote:
           | It is lazy.
        
         | egypturnash wrote:
         | IT COULD BE WORSE, YOU COULD BE READING A LENGTHY ESSAY
         | PRESENTED ENTIRELY IN ALL CAPS WITH MINIMAL PUNCTUATION TO
         | BREAK IT UP
         | 
         | SEARCH FOR "FILM CRIT HULK" FOR SOME EXAMPLES
        
           | majewsky wrote:
           | HULK SMASH INFERENCE PRICES
        
           | braebo wrote:
           | POTUS, is that you?
        
         | benhurmarcel wrote:
         | Weirdly I'm not really bothered by the absence of capitals.
        
         | scoofy wrote:
         | My degrees all were in philosophy, focused on philosophy of
         | language.
         | 
         | Descriptive language is how language evolves, and the internet
         | is the first real regional conflict area that Americans have
         | really ever encountered without traveling.
         | 
          | Historically, you would have just been in your linguistic
          | locale, with your own rules, and differences could easily be
          | attributed to outsiders being outsiders. The internet flattens
          | physical distance.
          | 
          | Thus we have a real parallel to the different regions of Italy,
          | where no one can understand each other, or at least the UK,
          | where different cities have extreme pronunciation differences.
         | 
         | The same exists for written language, and it will continue to
         | diverge culturally. The way I look at it is that language isn't
         | a thing, trapped in amber, but a river we are all wading
         | through. Different people enter at different times, and we all
         | subtly affect the flow.
         | 
         | I distinctly remember thinking "email" was the dumbest sounding
         | word ever. Now I don't even hear it.
         | 
          | It's still fine to nitpick; we're all battling in the
          | descriptive war for correctness. My own personal hobbyhorse is
          | how stupid American quotation syntax is: I learned at graduate
          | school in the UK that you use single quotes and leave the
          | punctuation outside of the quoted sections, which is entirely
          | sensible!
        
         | dang wrote:
         | " _Please don 't complain about tangential annoyances--e.g.
         | article or website formats, name collisions, or back-button
         | breakage. They're too common to be interesting._"
         | 
         | https://news.ycombinator.com/newsguidelines.html
        
           | happytoexplain wrote:
           | Yeah, sorry. That was probably my last comment on this trend,
           | since I think I've said all I have to say. However, I do
           | think "too common" implicitly narrows the definition of
           | "tangential annoyances" - I believe this is a new phenomenon
           | (though I understand the spirit of the rule is to not have
           | comment threads about things other than the content of the
           | submission).
        
       | strangescript wrote:
       | We haven't reached a peak on scaling/performance, so even if an
       | old model can be commoditized, a new one will be created to take
       | advantage of the newly freed infra. Until we hit a ceiling on
       | scaling, tokens are going to remain expensive relative to what
       | people are trying to do with them because the underlying compute
       | is expensive.
        
       | dcre wrote:
       | Vibes-based analysis. We have no idea how much these models cost
       | to serve.
        
       | Waterluvian wrote:
        | On the topic of cost per token, is it accurate to represent a
        | token as, ideally, a composable atomic unit of information?
        | Because we're (often) using English as the encoding format, it
        | can only be as efficient as English can encode the data.
       | 
       | Does this mean that other languages might offer better
       | information density per token? And does this mean that we could
       | invent a language that's more efficient for these purposes, and
       | something humans (perhaps only those who want a job as a prompt
       | engineer) could be taught?
       | 
       | Kevin speak good?
       | https://youtu.be/_K-L9uhsBLM?si=t3zuEAmspuvmefwz
        
         | deegles wrote:
         | Human speech has a bit rate of around 39 bits per second, no
         | matter how quickly you speak. assuming reading is similar, I
         | guess more "dense" tokens would just take longer for humans to
         | read.
         | 
         | https://www.science.org/content/article/human-speech-may-hav...
        
           | __s wrote:
           | Sure, but that link has Japanese at 5 bits per syllable &
           | Vietnamese at 8 bits per syllable, so if billing was based on
           | syllables per prompt you'd want Vietnamese prompts
           | 
           | Granted English is probably going to have better quality
           | output based on training data size
        
         | r_lee wrote:
          | Sure. For example, Korean is Unicode-heavy: e.g. gyeongchal =
          | police, but it's just 2 Unicode chars. Not too familiar with
          | how things are encoded, but it could be more efficient.
        
         | joseda-hg wrote:
         | IIRC, in linguistics there's a hypothesis for "Uniform
         | Information density" languages seem to follow on a human level
         | (Denser languages slow down, sparse languages speed up) so you
         | might have to go for an Artificial encoding, that maps
         | effectively to english
         | 
          | English (and any of the dominant languages that you could use
          | in its place) works significantly better than other languages
          | purely by having significantly larger bodies of work for the
          | LLM to draw from.
        
           | Waterluvian wrote:
           | Yeah I was wondering about it basically being a dialect or
           | the CoffeeScript of English.
           | 
           | Maybe even something anyone can read and maybe write... so...
           | Kevin English.
           | 
           | Job applications will ask for how well one can read and write
           | Kevin.
        
         | fy20 wrote:
         | English often has a lot of redundancy, you could rewrite your
         | comment to this and still have it convey the original meaning:
         | 
         | Regarding cost per token: is a token ideally a composable,
         | atomic unit of information? Since English is often used as an
         | encoding format, efficiency is limited by English's encoding
         | capacity.
         | 
         | Could other languages offer higher information density per
         | token? Could a more efficient language be invented for this
         | purpose, one teachable to humans, especially aspiring prompt
         | engineers?
         | 
         | 67 tokens vs 106 for the original.
         | 
         | Many languages don't have articles, you could probably strip
         | them from this and still understand what it's saying.
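          | A quick way to check such comparisons yourself: exact counts
          | come from a model's tokenizer (e.g. tiktoken), but the common
          | "~4 characters per token" rule of thumb - an approximation,
          | not a real tokenizer - is enough to see the gap:

```python
def rough_tokens(text: str) -> int:
    """Crude estimate using the ~4 characters/token rule of thumb."""
    return max(1, round(len(text) / 4))

original = ("On the topic of cost per token, is it accurate to represent "
            "a token as, ideally, a composable atomic unit of information?")
condensed = ("Regarding cost per token: is a token ideally a composable, "
             "atomic unit of information?")

# The condensed phrasing needs noticeably fewer (estimated) tokens.
print(rough_tokens(original), rough_tokens(condensed))
```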
        
       | GiorgioG wrote:
       | I tried Gemini CLI and in 2 hours somehow spent $22 just messing
       | around with a very small codebase. I didn't find out until the
       | next day from Google's billing system. That was enough for me - I
       | won't touch it again.
        
         | adrianbooth17 wrote:
         | Isn't Gemini CLI free? Or did you BYOK?
        
       | ankit219 wrote:
        | Interesting article, full of speculation and some logical steps,
        | but it feels like it falls short of admitting the true
        | conclusion. Model-building companies can build a thinner wrapper
        | / harness and offer better prices than third-party companies
        | (the article assumes it costs Anthropic the same per token as it
        | charges its customers) because their cost per token is lower
        | than that of app-layer companies. Anthropic has a decent margin
        | (likely higher than OpenAI) on the sale of every token, and with
        | more scale, they can sell at a lower cost (or offer unlimited
        | plans with limits that keep out the 1%-5% of power users).
       | 
       | I don't agree with the Cognition conclusion either. Enterprises
       | are fighting super hard to not have a long term buying contract
       | when they know SOTA (app or model) is different every 6 months.
       | They are keeping their switching costs low and making sure they
       | own the workflow, not the tool. This is even more prominent after
       | Slack restricted API usage for enterprise customers.
       | 
       | Making money on the infra is possible, but that again
       | misunderstands the pricing power of Anthropic. Lovable, Replit
       | etc. work because of Claude. Openai had codex, google had jules,
       | both aren't as good in terms of taste compared to Claude. It's
       | not the cli form factor which people love, it's the outcome they
       | like. When Anthropic sees the money being left on the table in
       | infra play, they will offer the same (at presumably better rates
       | given Amazon is an investor) and likely repeat this strategy.
       | Abstraction is a good play, only if you abstract it to the
       | maximum possible levels.
        
       | xrd wrote:
       | This is the moment an open source solution could pop in and say
       | just "uv add aider" and then make sure you have a 24gb card for
       | Qwen3 for each dev, and you are future proofed for at least the
       | next year. It seems like the only way out.
        
       | ej88 wrote:
       | The article just isn't that coherent for me.
       | 
       | > when a new model is released as the SOTA, 99% of the demand
       | immediately shifts over to it
       | 
       | 99% is in the wrong ballpark. Lots of users use Sonnet 4 over
       | Opus 4, despite Opus being 'more' SOTA. Lots of users use 4o over
       | o3 or Gemini over Claude. In fact it's never been a closer race
       | on who is the 'best': https://openrouter.ai/rankings
       | 
       | >switch from opus ($75/m tokens) to sonnet ($15/m) when things
       | get heavy. optimize with haiku for reading. like aws autoscaling,
       | but for brains.
       | 
       | they almost certainly built this behavior directly into the model
       | weights
       | 
       | ???
       | 
       | Overall the article seems to argue that companies are running
       | into issues with usage-based pricing due to consumers not
       | accepting or being used to usage based pricing and it's difficult
       | to be the first person to crack and switch to usage based.
       | 
       | I don't think it's as big of an issue as the author makes it out
       | to be. We've seen this play out before in cloud hosting.
       | 
       | - Lots of consumers are OK with a flat fee per month and using an
       | inferior model. 4o is objectively inferior to o3 but millions of
       | people use it (or don't know any better). The free ChatGPT is
       | even worse than 4o and the vast majority of chatgpt visitors use
       | it!
       | 
       | - Heavy users or businesses consume via API and usage based
       | pricing (see cloud). This is almost certainly profitable.
       | 
       | - Fundamentally most of these startups are B2B, not B2C
        
         | motorest wrote:
         | > In fact it's never been a closer race on who is the 'best'
         | 
         | Thank you for pointing out that fact. Sometimes it's very hard
         | to keep perspective.
         | 
         | Sometimes I use Mistral as my main LLM. I know it's not lauded
         | as the top performing LLM but the truth of the matter is that
         | it's results are just as useful as the best models that
         | ChatGPT/Gemini/Claude outputs, and it is way faster.
         | 
         | There is indeed diminished returns on the current blend of
         | commercial LLMs. Deep seek already proved that cost can be a
         | major factor and quality can even improve. I think we're very
         | close to see competition based on price, which might be the
         | reason there is so much talk about mixture of experts
         | approaches and how specialized models can drive down cost while
         | improving targeted output.
        
           | torginus wrote:
           | Yeah, my biggest problem with CC is that it's slow, prone to
           | generating tons of bullshit exposition, and often goes down
           | paths that I can tell almost immediately will yield no useful
           | result.
           | 
           | It's great if you can leave it unattended, but personally,
           | coding's an active thing for me, and watching it go is really
           | frustrating.
        
       | jsnell wrote:
       | > now look at the actual pricing history of frontier models, the
       | ones that 99% of the demand is for at any given time:
       | 
        | The meaningful frontier isn't a scalar on capability alone; it's
        | capability for a given cost. The highest-capability models are
        | not where 99% of the demand is. Quite the opposite.
       | 
       | To get an idea of what point on the frontier people prefer, have
       | a look at the OpenRouter statistics
       | (https://openrouter.ai/rankings). Claude Opus 4 has about 1% of
       | their total usage, not 99%. Claude Sonnet 4 is the single most
       | popular model at about 18%. The runners up in volume are Gemini
       | Flash 2.0 and 2.5, which are in turn significantly cheaper than
       | Sonnet 4.
        
       | codr7 wrote:
       | This is such a nice setup!
       | 
       | They can deliver pretty much whatever they feel like. Who can
        | tell a trash token from a hallucination? And tracking token
        | usage is a PITA.
       | 
       | Sum it up and it translates to: sell whatever you feel like at
       | whatever price you feel like.
       | 
       | Nice!
        
       | sshine wrote:
       | > _nobody opens claude and thinks, "you know what? let me use the
       | shitty version"_
       | 
       | Sure I do!
       | 
       | I will consistently pick the fastest and cheapest model that will
       | do the job.
       | 
       | Sonnet > Opus when coding
       | 
       | Haiku > Sonnet when fusing kitchen recipes, or answering
       | questions where search results deliver the bulk of the value, and
       | the LLM part is really just for summarizing.
        
       | esafak wrote:
       | This is wrong. People are not dropping old models when new ones
       | come out. I'm always on the lookout for cost effective models.
        | The logical thing is to use the cheapest model that gets the job
        | done, and you get a sense for that once you've used the model
        | for a while.
       | 
       | It is standard practice with some coding agents to have different
       | models for different tasks, like building and planning.
        
       | farkin88 wrote:
       | Even though tokens are getting cheaper, I think the real killer
       | of "unlimited" LLM plans isn't token costs themselves, it's the
       | shape of the usage curve that's unsustainable. These products see
        | a Zipf-like distribution: thousands of casual users nibble a few
        | hundred tokens a day while a tiny group of power automations
       | devour tens of millions. Flat pricing works fine until one of
       | those whales drops a repo-wide refactor or a 100 MB PDF into chat
       | and instantly torpedoes the margin. Unless vendors turn those
       | extreme loops into cheaper, purpose-built primitives (search,
       | static analyzers, local quantized models, etc.), every "all-you-
       | can-eat" AI subscription is just a slow-motion implosion waiting
       | for its next whale.
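        | A toy model of that Zipf-shaped curve makes the whale problem
        | concrete. Assume (purely for illustration) that the user at
        | rank r consumes tokens proportional to 1/r:

```python
# 10,000 users; usage of the r-th heaviest user is proportional to 1/r.
N = 10_000
usage = [1 / r for r in range(1, N + 1)]
total = sum(usage)

# Share of all tokens consumed by the heaviest 1% of users.
top_1_percent_share = sum(usage[: N // 100]) / total
print(f"top 1% of users consume {top_1_percent_share:.0%} of all tokens")
```

        | Under this toy distribution the top 1% of users consume over
        | half of all tokens, which is why one whale can sink a flat plan.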
        
       | blotfaba wrote:
       | We're not going to be using tokens forever, and inevitably
       | specialized hardware will solve this bottleneck. Underestimating
       | how much proprietary advancements are loaded into Google TPUs is
       | sort of like thinking the best we've got are Acura TSXs when
       | somebody's driving around in a Ferrari.
        
       | jstummbillig wrote:
        | This is silly? The important metric is value per token, which is
        | obviously increasing, and thus tokens are effectively getting
        | cheaper because you need far fewer of them to produce anything
        | of value.
       | 
       | Which then might lead to you using a lot more, because it offsets
       | some other thing that costs even more still, like your time.
        
         | acedTrex wrote:
         | "Which is obviously increasing"
         | 
          | With the primary advancement over the past two years being
          | chain of thought, which absolutely obliterates token counts,
          | in what world would the "per token" value of a model be going
          | up...
        
           | jstummbillig wrote:
           | If you are able to cogently explain how you would instruct
           | GPT 3.5 with ANY amount of tokens to do what Sonnet 4 is able
           | to do, I am sure there's a lot of wealthy people that would
           | be very interested in having a talk with you.
        
       | _0ffh wrote:
       | Dunno, I'm happy to pay for API access based on token usage. I'd
       | never so much as look at flat pricing, but maybe that's just me.
        
       | torginus wrote:
       | First of all, do they shoot you in San Francisco, if you use
       | capital letters and punctuation?
       | 
       | Second, why are SV people obsessed with fake exponentials? It's
       | very clear that AI progress has only been exponential in the
        | sense that people are throwing a lot more resources at AI than
        | they did a couple of years ago.
        
         | 369548684892826 wrote:
         | > First of all, do they shoot you in San Francisco, if you use
         | capital letters and punctuation?
         | 
         | Is it done like this just to show it wasn't written by a LLM?
        
         | luqtas wrote:
         | oh no! i can't deal with the natural morphing a lingua-franca
         | has! /j
         | 
         | Thou needst to live in the archaic.
        
       | mensetmanusman wrote:
       | They will soon be subsidized by ads or people will run their own.
        
       ___________________________________________________________________
       (page generated 2025-08-03 23:00 UTC)