[HN Gopher] Tokens are getting more expensive
___________________________________________________________________
Tokens are getting more expensive
Author : admp
Score : 178 points
Date : 2025-08-03 11:01 UTC (11 hours ago)
(HTM) web link (ethanding.substack.com)
(TXT) w3m dump (ethanding.substack.com)
| michaelbuckbee wrote:
| A major current problem is that we're smashing gnats with
| sledgehammers via undifferentiated model use.
|
| Not every problem needs a SOTA generalist model, and as we get
| systems/services that are more "bundles" of different models with
| specific purposes I think we will see better usage graphs.
| mustyoshi wrote:
| Yeah this is the thing people miss a lot. 7B-32B models work
| perfectly fine for a lot of things, and run on previously high-
| end consumer hardware.
|
| But we're still in the hype phase, people will come to their
| senses once the large model performance starts to plateau
| _heimdall wrote:
| I expect people to come to their senses when LLM companies
| stop subsidizing cost and start charging customers what it
| actually costs them to train and run these models.
| simonjgreen wrote:
| Completely agree. It's worth spending time to experiment too. A
| reasonably simple chat support system I built recently uses 5
| different models depending on the function it's in. Swapping
| out different models for different things makes a huge
| difference to cost, user experience, and quality.
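The multi-model setup described above amounts to a routing table. A minimal sketch; the function names, model tiers, and default are invented for illustration, not the commenter's actual configuration:

```python
# Hypothetical model router for a multi-model chat support pipeline.
# Each pipeline step gets the cheapest model that handles it well.
ROUTES = {
    "triage": "small-fast-model",        # classify the user's intent
    "faq_answer": "small-fast-model",    # retrieve + rephrase a doc
    "summarize_ticket": "mid-tier-model",
    "escalation_draft": "frontier-model",
}

def pick_model(function: str) -> str:
    """Return the model assigned to a pipeline step, defaulting cheap."""
    return ROUTES.get(function, "small-fast-model")
```

Defaulting to the cheap tier means an unrecognized step fails toward lower cost rather than toward the frontier model.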
| alecco wrote:
| If there was an option to have Claude Opus guide Sonnet I'd use
| it for most interactions. Doing it manually is a hassle and
| breaks the flow, so I end up using Opus too often.
|
| This shouldn't be that expensive even for large prompts since
| input is cheaper due to parallel processing.
| isoprophlex wrote:
| You can define subagents that are forced to run on eg.
| Sonnet, and call these from your main Opus backed agent.
| /agent in CC for more info...
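For reference, a Claude Code subagent of this kind is a markdown file (e.g. under `.claude/agents/`) with YAML frontmatter that can pin the model; the name and prompt below are invented for illustration:

```markdown
---
name: test-runner
description: Runs the test suite and summarizes failures
model: sonnet
---
Run the project's tests and report only the failing cases,
with a one-line summary of each failure.
```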
| danielbln wrote:
| That's what I do. I used to use Opus for the dumbest stuff,
| writing commits and such, but now that's all subagent
| business that runs on Sonnet (or even Haiku sometimes). Same
| for running tests, executing services, docker etc. All
| Sonnet subagents. Positive side effect: my Opus allotment
| lasts a lot longer.
| illusive4080 wrote:
| I'm just sitting here on my $20 subscription hoping one
| day we will get to use Opus
| nateburke wrote:
| generalist = fungible?
|
| In the food industry is it more profitable to sell whole cakes
| or just the sweetener?
|
| The article makes a great point about replit and legacy ERP
| systems. The generative in generative AI will not replace
| storage, storage is where the margins live.
|
| Unless the C in CRUD can eventually replace the R and U, with
| the D a no-op.
| marcosdumay wrote:
| > In the food industry is it more profitable to sell whole
| cakes or just the sweetener?
|
| I really don't understand what you're getting at. But on
| that example, cakes have a higher profit margin, and
| sweeteners have larger scale.
| empiko wrote:
| Yeah, but the juiciest tasks are still far from solved. The
| amount of tasks where people are willing to accept low accuracy
| answers is not that high. It is maybe true for some text
| processing pipelines, but all the user facing use cases require
| good performance.
| comrade1234 wrote:
| I'm kind of curious what IntelliJ's deal is with the different
| providers. I usually just keep it set to Claude but there are
| others that you can pick. I don't pay extra for the AI assistant
| - it's part of my regular subscription. I don't think I use the
| AI features as heavily as many others, but it does feed my code
| base to whoever I'm set to...
| louthy wrote:
| Are you sure you don't pay extra? I'm on Rider and it's an
| additional cost. Unless us C# and F# devs are subsidising
| everyone else :D
|
| Edit: It says on the Jetbrains website:
|
| "The AI Assistant plugin is not bundled and is not enabled in
| IntelliJ IDEA by default. AI Assistant will not be active and
| will not have access to your code unless you install the
| plugin, acquire a JetBrains AI Service license and give your
| explicit consent to JetBrains AI Terms of Service and JetBrains
| AI Acceptable Use Policy while installing the plugin."
| comrade1234 wrote:
| When they first added the assistant it was $100/yr to enable
| it. However, it's now part of the subscription and they even
| reimbursed me a portion of the $100 that I paid.
| terminalbraid wrote:
| You're one of the lucky ones. They just outright stole from
| many of the people who did pay for it.
| double051 wrote:
| If you pay for the all products subscription, their AI
| features are now bundled in. I believe that may be a
| relatively recent change, and I would not have known about it
| if I hadn't been curious and checked.
| louthy wrote:
| I've just checked. I have the 'dotUltimate' bundle, which
| now appears to include 'AI Pro'.
|
| They didn't cancel my existing 'AI Pro' subscription
| though, and have just let it keep running with no refunds.
|
| Thanks, Jetbrains. You get worse every day.
| terminalbraid wrote:
| Considering they didn't significantly change their pricing when
| they bundled the equivalent of a ~$10-20/mo subscription to
| their Ultimate pack (which I pay something around $180/year
| for), I'm guessing they're eating a lot of the cost out of
| desperation for an imagined problem. That or they were fleecing
| everyone from the beginning.
| ath3nd wrote:
| Mathematics are not relevant when we have hype and vibes. We
| can't have facts and projections and no path to profitability
| distract us from our final goal.
|
| Which, of course, is to donate money to Sama so he can create AGI
| and be less lonely with his robotic girlfriend, I mean...change
| the world for the better somehow. /s
| NitpickLawyer wrote:
| I get your point but I think it's debatable. As long as the
| capabilities increase (and they have, IMO) cost isn't really
| relevant. If you can reasonably solve problems of a given
| difficulty (and we're starting to see that), then suddenly you
| can do stuff that you simply can't with humans. You can "hire"
| 100 agents / servers / API bundles, whatever and "solve" all
| tasks with difficulty x in your business. Then you cancel and
| your bottom is suddenly raised. You can't do that with humans.
| You can't suddenly hire 100 entry-level SWEs and fire them
| after 3 months.
|
| Then you can think about automated labs. If things pan out, we
| can have the same thing in chemistry/bio/physics. Having
| automated labs definitely seems closer now than 2.5 years ago.
| Is cost relevant when you can have a lab test formulas
| 24/7/365? Is cost a blocker when you can have a cure to
| cancer_type_a? And then _b_c...etc?
|
| Also, remember that costs go down within a few generations.
| There's no reason to think this will stop.
| ath3nd wrote:
| > You can "hire" 100 agents / servers / API bundles, whatever
| and "solve" all tasks with difficulty x in your business.
|
| In that bright AGI future, who does my business serve, like
| who actually are my actual paying clients? Like, the robots
| are farming, the robots are driving, the robots are
| "creating" and robots are "thinking", right? In that awesome
| future, what paid jobs do us humans have, so my clients can
| afford my amazing entrepreneurial business that I just
| bootstrapped with the help of 100s of agents? And how did I
| get the money to hire those 100s of agents in the first
| place?
|
| > Is cost a blocker when you can have a cure to
| cancer_type_a? And then _b_c...etc?
|
| Yes, it very much is. The fact that even known and long-
| discovered solutions like insulin for diabetes management are
| being sold to people at 9x their actual price should speak
| volumes to you: while it's great to have cures for X, Y and
| Z, it's the control over the production and development of
| these cures that is equally, if not much more important for
| the cure to actually reach people. In this rosy world of
| yours, do you think Zuck will give you his LLAMAGI-generated
| cancer cure out of the goodness of his heart? We are talking
| about the same dude that helped a couple of genocides and
| added ads in Whatsapp to squeeze the last cent out of people
| who are trapped with an app that gets progressively worse and
| more invasive.
|
| https://www.rand.org/news/press/2024/02/01/index1.html
|
| https://systemicjustice.org/article/facebook-and-genocide-
| ho...
|
| > Also, remember that costs go down within a few generations.
| There's no reason to think this will stop.
|
| The destruction of the natural world, the fires all around
| us, the rise of fascism and nationalism, the wars that are
| spawning all over the place and the fact that white and blue
| collar jobs are being automated out while soil erosion and
| PFAS make our land infertile point to a different future. But
| yeah, I am simply ecstatic at the possibility that the costs
| of generating a funny picture Ghibli style with a witty
| caption could go down by 10 to 30%.
| flyinglizard wrote:
| The truth is we're brute forcing some problems via a tremendous
| amount of compute. Especially for apps that use AI backends
| (rather than chats where you interface with the LLM directly),
| there needs to be hybridization. I haven't used Claude Code
| myself but I did a screenshare session with someone who does and
| I think I saw it running old fashioned keyword search on the
| codebase. That's much more effective than just pushing more and
| more raw data into the chat context.
|
| On one of the systems I'm developing I'm using LLMs to compile
| user intents to a DSL, without ever looking at the real data to
| be examined. There are ways; increased context length is bad for
| speed, cost and scalability.
| mark_l_watson wrote:
| I have already thought a lot about the large packaged inference
| companies hitting a financial brick wall, but I was surprised by
| material near the end of the article: the discussions of lock in
| for companies that can't switch and about Replit making money on
| the whole stack. Really interesting.
|
| I managed a deep learning team at Capital One and the lock-in
| thing is real. Replit is an interesting case study for me because
| after a one week free agent trial I signed up for a one year
| subscription, had fun with their agent (an LLM-based coding assistant)
| for a few weeks, and almost never used their coding agent after
| that, but I still have fun with Replit as an easy way to spin up
| Nix based coding environments. Replit seems to offer something
| for everyone.
| raincole wrote:
| First of all the title is click-bait. Tokens are getting cheaper
| and cheaper. People just use more and more tokens.
|
| And everything, I mean everything after the title is only
| downhill:
|
| > saying "this car is so much cheaper now!" while pointing at a
| 1995 honda civic misses the point. sure, that specific car is
| cheaper. but the 2025 toyota camry MSRPs at $30K.
|
| Cars got cheaper. The only reason you don't feel it is trade
| barrier that stops BYD from flooding your local dealers.
|
| > charge 10x the price point: $200/month when cursor charges
| $20. start with more buffer before the bleeding begins.
|
| What does this even mean? The cheapest Cursor plan is $20, just
| like Claude Code. And the most expensive Cursor plan is $200,
| just like Claude Code. So clearly they're at the _exact_ same
| price point.
|
| > switch from opus ($75/m tokens) to sonnet ($15/m) when things
| get heavy. optimize with haiku for reading. like aws autoscaling,
| but for brains.
|
| > they almost certainly built this behavior directly into the
| model weights, which is a paradigm shift we'll probably see a lot
| more of
|
| "I don't know how Claude built their models and I have no insider
| knowledge, but I have very strong opinions."
|
| > 3. offload processing to user machines
|
| What?
|
| > ten. billion. tokens. that's 12,500 copies of war and peace. in
| a month.
|
| Unironically quoting data from viberank leaderboard, which is
| just user-submitted number...
|
| > it's that there is no flat subscription price that works in
| this new world.
|
| The author doesn't know what throttling is...?
|
| I've stopped reading here. I should've just closed the tab when I
| saw the first letter in each sentence isn't capitalized. This is
| so far the most glaring signal of slop. More than the overuse of
| em-dash and lists.
| WA wrote:
| All good points, but:
|
| > _when I saw the first letter in each sentence isn't
| capitalized. This is so far the most glaring signal of slop._
|
| How so? It's the exact opposite imho. Lowercase everything with
| a staccato writing style to differentiate from AI slop, because
| LLMs usually don't write lowercase.
| lelanthran wrote:
| I think GP is drawing a distinction between "slop" and "AI
| slop".
|
| This comes across as sloppily written, but not sloppily
| generated.
| Semaphor wrote:
| Human slop instead of AI. Our race is catching up to the
| machines again.
| ankit219 wrote:
| Likely OP does not mean AI slop, but rather that it signals
| human carelessness: they could not be bothered to write it in a
| proper manner.
| djhworld wrote:
| Over the past year or two I've just been paying for the API
| access and using open source frontends like LibreChat to access
| these models.
|
| This has been working great for the occasional use, I'd probably
| top up my account by $10 every few months. I figured the amount
| of tokens I use is vastly smaller than the packaged plans so it
| made sense to go with the cheaper, pay-as-you-go approach.
|
| But since I've started dabbling in tooling like Claude Code,
| hoo-boy those tokens burn _fast_, like really fast. Yesterday I
| somehow burned through $5 of tokens in the space of about 15
| minutes. I mean, sure, the Code tool is vastly different to
| asking an LLM about a certain topic, but I wasn't expecting such
| a huge leap. A lot of the token usage is masked from you, I
| guess, wrapped up in the ever-increasing context plus back-and-
| forth tool orchestration, but still.
| TechDebtDevin wrote:
| $20.00 via Deepseek's API (yes, China can have my code, idc)
| has lasted me almost a year. It's slow, but better quality
| output than any of the independently hosted Deepseek models
| (ime). I don't really use agents or anything tho.
| zurfer wrote:
| The simple reason for this is that Claude Code uses way more
| context and repetitions than what you would use in a typical
| chat.
| senko wrote:
| Insisting on flouting English spelling rules (by not starting a
| sentence with a capital letter) in a think piece is a dead
| giveaway that the author thinks too highly of themselves, and
| results in me automatically discounting whatever they're saying.
|
| If I (and billions others) can be bothered to learn your damn
| language so we can all communicate, do us a service and actually
| use it properly, FFS.
| SoftTalker wrote:
| Well his name is Ethan so...
| furyofantares wrote:
| > claude code has had to roll back their original unlimited
| $200/mo tier this week
|
| The article repeats this throughout but isn't it a straight lie?
| The plan was named 20x because it's 20x usage limits, it always
| had enforced 5 hour session limits, it always had (unenforced?
| soft?) 50 session per month limits.
|
| It was limited, but not enough and very very probably still
| isn't, judging by my own usage. So I don't think the argument
| would even suffer from telling the truth.
| Aurornis wrote:
| You're right, the Max plan was never advertised as unlimited.
|
| I can't believe how many comments and articles I've read that
| assume it was unlimited.
|
| It's like it has been repeated so many times that it's assumed
| to be true.
| robertclaus wrote:
| My team is debating this exact question for a new product we have
| in early access. Ultimately we realized the issue early on, so
| even our plan options would include at-cost usage limits.
| ysofunny wrote:
| and the AIs are getting stupider!
|
| I am seeing problems with formatting that seemed 'solved'
| already.
|
| I mean, I have seen "the same" model get better and worse
| already.
|
| clearly somebody is calibrating the stupidity level relative to
| energy cost and monetary gain
| Havoc wrote:
| The combination of "thinking models" plus the blind focus on
| incremental benchmarking gains was a mistake for practical use.
|
| You definitely want that for some tasks, but for the majority of
| tasks there is a lot of space for cheap & cheerful (and non-
| thinking)
| mystraline wrote:
| From the article:
|
| > consumers hate metered billing. they'd rather overpay for
| unlimited than get surprised by a bill.
|
| Yes and no.
|
| Take Amazon. You think your costs are known and WHAMMO surprise
| bill. Why do you get a surprise bill? Because you cannot say
| 'Turn shit off at X money per month'. Can't do it. Not an option.
|
| All of these 'Surprise Net 30' offerings are the same. You think
| you're getting a stable price until GOTAHCA.
|
| Now, metered billing can actually be good, when the user knows
| exactly where they stand on the metering AND can set maximums so
| their budget doesn't go over.
|
| Taken realistically, as an AI company, you provide a 'used
| tokens/total tokens' bar graph, tokens per response, and
| estimated amount of responses before exceeding.
|
| Again, don't surprise the user. But that's anathema to
| companies who want to hide the tokens-to-dollars mapping, the
| same way gambling companies obfuscate 'corporate bux' to USD.
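The 'used tokens/total tokens' bar plus an estimate of remaining responses is cheap to implement; a minimal sketch, with the display format invented for illustration:

```python
def usage_summary(used_tokens: int, total_tokens: int,
                  avg_tokens_per_response: int) -> str:
    """Render a used/total bar plus an estimate of responses left,
    so the user always knows where they stand before being cut off."""
    remaining = max(total_tokens - used_tokens, 0)
    est_left = remaining // avg_tokens_per_response if avg_tokens_per_response else 0
    pct = 100 * used_tokens / total_tokens
    filled = int(pct // 10)
    bar = "#" * filled + "-" * (10 - filled)
    return f"[{bar}] {pct:.0f}% used, ~{est_left} responses left"
```

For example, `usage_summary(5_000_000, 10_000_000, 2_000)` renders a half-filled bar with roughly 2500 responses remaining.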
| ikari_pl wrote:
| I often find Amazon pricing to be vague and cryptic; sometimes
| there's literally no way to tell why, for example, your
| database cost is fluctuating all the time.
| joseda-hg wrote:
| Amazon pricing is nice if you compare it to Azure...
| crinkly wrote:
| Yeah that. We moved to AWS using their best practices and
| enterprise cost estimation stuff and got a 6x cost increase
| on something that was supposed to be cheaper, and now we're
| fucked because we can't get out.
|
| It's nearly impossible to tell what the hell is going where
| and we are mostly surviving on enterprise discounts from
| negotiations.
|
| The worst thing is they worked out you can blend costs in
| using AWS marketplace without having to raise due diligence
| on a new vendor or PO. So up it goes even more.
|
| Not my department or funeral fortunately. Our AWS account is
| about $15 a month.
| ajb wrote:
| Are you using separate accounts per use case? That's the
| only real way to get a cost breakdown, otherwise you have
| no idea what piece of infrastructure is for what. They
| provide a tagging system but it's only informative if
| someone spends several hours a month tracking down the
| stuff that didn't get tagged properly.
| crinkly wrote:
| Yeah we have many accounts. Hence why I know ours cheap.
| Difficult to break it down within the account as you say
| without tag maintenance.
| AtheistOfFail wrote:
| > The worst thing is they worked out you can blend costs in
| using AWS marketplace without having to raise due diligence
| on a new vendor or PO. So up it goes even more.
|
| Not a bug, a feature.
| graemep wrote:
| If your AWS costs are too complex for you to understand you
| need to employ a finops person or AWS specialist to handle it
| for you.
|
| I am not saying this is desirable, but it is necessary IFF
| you chose to use these services. They are complex by design,
| and intended primarily for large scale users who do have the
| expertise to handle the complexity.
| ajsnigrutin wrote:
| But they're also simple and cheap if you're a "one man
| band" trying out some personal idea that might or might not
| take off. Those people have no budgets for specialists.
|
| Pricing schemes like these just make them move back to
| virtual machines with "unlimited" shared cpu usage and
| setting up services (db,...) manually.
| mort96 wrote:
| I'm 100% on team "just rent VMs and run the software on
| there". It's not that hard, it has predictable price and
| performance, and you don't lock yourself into one
| provider. If you build your whole service on top of some
| weird Amazon-specific thing, and Amazon jacks up their
| prices, you don't have any recourse. With VMs, you can
| just spin up new VMs with another provider.
|
| You could also have potential customers who would be
| interested in your solution, but don't want it hosted by
| an American company. Spinning up a few Hetzner VMs is
| easy. Finding European alternatives to all the different
| "serverless" services Amazon offers is hard.
| graemep wrote:
| > You could also have potential customers who would be
| interested in your solution, but don't want it hosted by
| an American company.
|
| Not happened yet. The nearest I have come to it was a
| requirement that certain medical information stays in the
| UK, and that is satisfied by using AWS (or other American
| suppliers) as long as its hosted in the UK.
| mort96 wrote:
| I've worked in places where customers (especially
| municipalities in Germany) have questioned the use of
| American hosting providers. I don't know whether it has
| actually _prevented_ a deal from going through (I wasn't
| close enough to sales to know), but it was consistently
| an obstacle in some markets. This is despite everything
| being hosted in EU datacenters.
| graemep wrote:
| Yes, definitely.
|
| Most small businesses I have dealt with that use AWS just
| need a VPS. If they are willing to move to a scary unknown
| supplier (unknown to them; very often one that would be well
| known to people on HN), then I suggest AWS Lightsail, which
| is pretty much a normal VPS with VPS pricing - it's
| significantly cheaper than an instance plus storage, just
| from buying them bundled (which, to be fair to Amazon, is
| common practice).
|
| My own stuff goes on VPSs.
| lelanthran wrote:
| > If your AWS costs are too complex for you to understand
| you need to employ a finops person or AWS specialist to
| handle it for you
|
| At that point wouldn't it simply be cheaper to do VMs?
| graemep wrote:
| Yes, very likely, but then why are you using AWS at all?
|
| I think a lot of people are missing a key part of the
| wording of my comment, that capitalised for emphasis
| "IFF" (which means "if and only if").
|
| I am absolutely certain a lot of people would save money
| using VMs - or at scale bare metal.
|
| IMO a lot of people are using AWS because it is a "safe"
| choice management buy into that is not expensive in
| context (its not a big proportion of costs).
| Shank wrote:
| > If your AWS costs are too complex for you to understand
| you need to employ a finops person or AWS specialist to
| handle it for you.
|
| The point where you get sticker shock from AWS is often
| significantly lower than the point where you have enough
| money to hire in either of those roles. AWS is obviously
| the infrastructure of choice if you plan to scale. The
| problem is that scaling on expertise isn't instant and
| that's where you're more likely to make a careless mistake
| and deploy something relatively costly.
| graemep wrote:
| If you plan to scale to that extent, then why do you not
| have the money to hire the people who can use AWS? At
| least part time or as temporary consultants.
|
| This:
|
| > The point where you get sticker shock from AWS is often
| significantly lower than the point where you have enough
| money to hire in either of those roles
|
| makes me doubt this:
|
| > AWS is obviously the infrastructure of choice if you
| plan to scale.
| motorest wrote:
| > If your AWS costs are too complex for you to understand
| you need to employ a finops person or AWS specialist to
| handle it for you.
|
| What a baffling comment. Is it normal to even consider
| hiring someone to figure out how you are being billed by a
| service? You started with one problem and now you have at
| least two? And what kind of perverse incentive are you
| creating? Don't you think your "finops" person has a vested
| interest in preserving their job by ensuring billing
| complexity will always be there?
| dvfjsdhgfv wrote:
| Paradoxically you are both right. Yes, the situation
| seems dystopian. Yes, hiring a finops person is sound
| advice once your cloud bill gets big enough.
| motorest wrote:
| > Yes, hiring a finops person is a sound advice once your
| cloud bill gets big enough.
|
| Is it, though? At best someone wearing that hat will
| explain the bill you're getting. What value do you get
| from that?
|
| To cut costs, either you microoptimize things, or you
| redesign systems to shed expenses. The former gets you
| nothing, the latter is not something a "finops" (whatever
| that is supposed to mean) brings to the table.
| graemep wrote:
| You need to know what to optimise which means you need to
| know what you are spending on.
|
| I did say it applies IFF and only IFF you choose to use
| these services, and if you have chosen to use these
| services you have presumably decided they are good value
| for money. If not, why are you using AWS at all?
|
| Of course the complexity and extra cost of managing the
| billing is something that someone who has chosen to use
| AWS has already factored in, right?
|
| The alternative is to not use AWS.
| quesera wrote:
| > _IFF and only IFF_
|
| If and only if and only if and only if? :)
|
| (also, while on the topic, I think a simple "if" covers
| it here, since the relationship is not bidirectional)
| mulmen wrote:
| If the cost of hiring the finops person is less than the
| savings over operating without one then you hire one, if
| it isn't then you don't.
| SoftTalker wrote:
| > Is it normal to even consider hiring someone to figure
| out how you are being billed by a service?
|
| Absolutely. This was common for complicated services like
| telecom/long distance even in the pre-cloud days. Big
| companies would have a staff or hire a service to review
| telecom bills and make sure they weren't overpaying.
| UltraSane wrote:
| AWS pricing is actually extremely clearly specified but it is
| hard to predict your costs unless you have a good
| understanding of your expected usage.
| scoreandmore wrote:
| You can set billing alerts and write a lambda function to
| respond and disable resources. Of course they don't make it
| easy but if you don't learn how to use limits what do you
| expect? This argument amazes me. Cloud services require some
| degree of responsibility on the users side.
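The billing-alert-plus-Lambda pattern described above can be sketched roughly as follows. Assumptions, not an AWS-documented recipe: the `critical` tag name, the flattened instance shape, and the policy of stopping everything untagged; the `NewStateValue` field is how CloudWatch alarm notifications arrive via SNS:

```python
import json

# Hypothetical budget guard: a billing alarm posts to SNS, and a
# subscribed Lambda stops instances not tagged as critical.

def instances_to_stop(sns_message: str, instances: list) -> list:
    """From an alarm message and instance summaries, pick the IDs of
    non-critical instances to shut down; do nothing unless in ALARM."""
    alert = json.loads(sns_message)
    if alert.get("NewStateValue") != "ALARM":
        return []
    return [
        inst["InstanceId"]
        for inst in instances
        if inst.get("Tags", {}).get("critical") != "true"
    ]

def handler(event, context):
    import boto3  # available in the Lambda runtime
    ec2 = boto3.client("ec2")
    ids = instances_to_stop(
        event["Records"][0]["Sns"]["Message"],
        [],  # in practice: flattened output of ec2.describe_instances()
    )
    if ids:
        ec2.stop_instances(InstanceIds=ids)
```

Note the caveat raised elsewhere in the thread still applies: billing metrics can lag by an hour or more, so this caps the damage rather than guaranteeing a hard budget.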
| esafak wrote:
| So you're okay with turning your site off...
| mystraline wrote:
| This is a logical fallacy of false dilemma.
|
| I made it clear that you ask the user to choose between
| 'accept risk of overrun and keep running stuff', 'shut down
| all stuff on exceeding $ number', or even a 'shut down
| these services on exceeding number', or other possible ways
| to limit and control costs.
|
| The cloud companies do not want to permit this because they
| would lose money over surprise billing.
| verbify wrote:
| Isn't that the definition of metered billing?
| dd36 wrote:
| Cats doing tricks has a limited budget.
| gray_-_wolf wrote:
| Last time I was looking into this, is there not up to an hour
| of delay for the billing alerts? It did not seem possible to
| ensure you do not run over your budget.
| mystraline wrote:
| This is complete utter hogwash.
|
| Up until recently, you could hit somebody else's S3 endpoint,
| no auth, and get 403s that would charge them tens of
| thousands of dollars. Couldn't even firewall it. And there
| was no way to see it, or anything. Just the number going up
| every 15-30 minutes in the cost dashboard.
|
| Real responsibility is 'I have $100 a month for cloud
| compute'. Give me an easy way to view it, and shut down if I
| exceed that. That's real responsibility, that Scamazon,
| Azure, Google - none of them 'permit'.
|
| They (and well, you) instead say "you can build some shitty
| clone of the functionality we should have provided, but we
| would make less money".
|
| Oh, and your lambda job? That too costs money. It should not
| cost more money to detect and stop stuff on 'too much cost'
| report.
|
| This should be a default feature of cloud: uncapped costs, or
| stop services
| HelloImSteven wrote:
| Lambda has 1mil free requests per month, so there's a
| chance it would be free depending on your usage. But still,
| it's not straightforward at all, so I get it.
|
| Perhaps requiring support for bill capping is the right way
| to go, but honestly I don't see why providers don't compete
| at all here. Customers would flock to any platform with
| something like "You set a budget and uptime requirements,
| we'll figure out what needs to be done", with some sort of
| managed auto-adjustment and a guarantee of no overage
| charges.
|
| Ah well, one can only dream.
| RussianCow wrote:
| > but honestly I don't see why providers don't compete at
| all here
|
| Because the types of customers that make them the most
| money don't care about any of this stuff. They'll happily
| pay whatever AWS (or other cloud provider) charges them,
| either because "scale" or because the decision makers
| don't realize there are better options for them. (And
| depending on the use case, sometimes there aren't.)
| mhitza wrote:
| > Again, don't surprise the user. But that's an anathema to
| companies who want to hide tokens to dollars, the same way
| gambling companies obfuscate 'corporate bux' to USD.
|
| This is the exact same thing that frustrates me with GitHub's
| AI rollout. I've been trialing the new Copilot agent, and its
| cost is fully opaque. There are multiple references to
| "premium requests" that don't show up in real time in my
| dashboard; it's not clear how many I have in total or have
| left, and when these premium requests are referenced in the
| UI they link to documentation that also doesn't talk about
| limits (instead of linking to the associated billing
| dashboard).
| llbbdd wrote:
| Highly recommend getting the $20/month OpenAI sub and letting
| copilot use that. Quality-wise I feel like I'm getting the
| same results but OAI's limits are a little more sane.
| debian3 wrote:
| How do you link the openai sub to Gh copilot? I thought you
| needed to use OpenAI api
| mhitza wrote:
| I'm talking about this new agent mode
| https://github.blog/news-insights/product-news/github-
| copilo... for which as far as I'm aware there's no option
| to switch the underlying model used.
| saratogacx wrote:
| They don't make it easy to figure out, but after researching
| it for my Co. this is what I came to:
|
| * One chat message -> one premium credit (most at 1 credit,
| but some are less and some, like Opus, are 10x)
|
| * Edit mode is the same as Ask/chat
|
| * One agent session (meaning you start a new agent chat) is
| one "request", so you can have multiple messages and they
| cost the credit cost of one chat message.
|
| Microsoft's Copilot offerings are essentially a masterclass
| in cost opaqueness. Nothing in any offering is spelled out
| and they always seem to be just short of the expectation they
| are selling.
| 9dev wrote:
| But how much is one premium request in real currency, and
| how many do I have per month?
| siva7 wrote:
| it's surprising that YC has a gazillion companies doing some ai
| infrastructure observability product, yet i have yet to see a
| product that really presents token usage and pricing
| estimations easily to me and the user, which for me is the #1
| criterion. make billing and pricing easier for me and the user.
| instead they run their heads into evals and niche features.
| chrisweekly wrote:
| GOTAHCA?
| arcanemachiner wrote:
| Maybe GOTCHA?
| Spooky23 wrote:
| Metering is great for defined processes. I love AWS because I
| can align cost with business. In the old days it was often hard
| and an internal political process. Some saleschick would shake
| the assets at a director and now I'm eating the cost for some
| network gear i don't need.
|
| But for users, that fine grained cost is not good, because
| you're forcing a user to be accountable with metrics that
| aren't tied to their productivity. When I was an intern in the
| 90s, I was at a company that required approval to make long
| distance phone calls. Some bureaucrat would assess whether my
| 20 minute phone call was justified and could charge me if my
| monthly expense was over some limit. Not fun.
|
| Flat rate is the way to go for _user AI_, until you understand
| the value in the business and the providers start looking for
| margin. If I make a $40/hr analyst 20% more productive, that's
| worth $16k of value - the $200/mo ChatGPT Pro is a steal.
| ajb wrote:
| Amazon is worse than this, though the AWS bait and switch is
| that you are supposed to save over the alternatives. So it
| should be worth switching if you would save more than the dev
| time you would invest in doing so right? But your company isn't
| going to do that. Because of opportunity cost. Your company
| expects to get some multiple of the cost of dev time back,
| that they invest in their own business. And because of various
| uncertainties - in return, in the time taken to develop, in
| competition, etc - they will only invest dev time when that
| multiple is not small. I'm not a business manager, but I'd
| guess a factor of 5.
|
| But that means that if you were conned into using
| infrastructure that actually costs more than the alternative,
| making your cost structure worse, you're still going to eat the
| loss because it's not worth taking your devs time to switch
| back.
|
| But tokens don't quite have this problem - yet. Most of us can
| still do development the old way, and it's not a project to
| turn it off. Expect this to change though.
| abtinf wrote:
| Lack of proper capitalization makes the text unreadable for me.
| blamestross wrote:
| https://convertcase.net/browser-extension/
|
| This extension might make the internet more accessible for you!
| machomaster wrote:
| If the writer is too lazy to press Shift and do it manually,
| then he is the one who should have used autocapitalization
| software.
| machomaster wrote:
| You are not the only one. I really don't understand this trend
| of wanting to share opinions, but purposefully making it
| harder/impossible for others to read it. Might as well write
| with dark-grey font over black background, just to make readers
| struggle extra hard.
|
| If you don't care to trivially make your text readable, then we
| for sure don't care to spend time to struggle through your text
| to see if there is any useful substance there.
| happytoexplain wrote:
| While reading this, every time I started a paragraph and saw a
| lowercase, my brain and eyes were stalling or jumping up, to
| reflexively look for the text that got cut off. My brain has been
| trained for decades that, when reading full prose, a paragraph
| starting with lowercase means I'm starting in the middle of a
| sentence, and something happened in the layout or HTML to
| interrupt it.
|
| And, I know this seems dramatic, but besides being cognitively
| distracting, it also makes me feel sad. Chatroom formatting in
| published writings is clearly a developing trend at this point,
| and I love my language so much. Not in a linguistic capacity -
| I'm not an English expert or anything, nor do I follow every rule
| - I mean in an emotional capacity.
|
| I'm not trying to be condescending. This is a style choice, not
| "bad writing" in the typical sense. I realize there is often a
| lot of low-quality bitterness on both sides about this kind of
| thing.
|
| Edit:
|
| I also fear that this is exactly the kind of thing where any
| opinion in opposition to this style will feel like the kind of
| attack that makes a writer want to push back in a "oh yeah? fuck
| you" kind of way. I.e. even just my writing this opinion may give
| an author using the style in question the desire to "double
| down". Though this conundrum is appropriate (ironic?) - the
| intensely personal nature of language is part of why I love it.
| simianwords wrote:
| It's to draw contrast against extremely polished and sterile
| looking slop content. Think of it like avoiding em dash but
| going a bit far.
| rafram wrote:
| > all content here is generated by ai
| tanseydavid wrote:
| It is lazy.
| egypturnash wrote:
| IT COULD BE WORSE, YOU COULD BE READING A LENGTHY ESSAY
| PRESENTED ENTIRELY IN ALL CAPS WITH MINIMAL PUNCTUATION TO
| BREAK IT UP
|
| SEARCH FOR "FILM CRIT HULK" FOR SOME EXAMPLES
| majewsky wrote:
| HULK SMASH INFERENCE PRICES
| braebo wrote:
| POTUS, is that you?
| benhurmarcel wrote:
| Weirdly I'm not really bothered by the absence of capitals.
| scoofy wrote:
| My degrees all were in philosophy, focused on philosophy of
| language.
|
| Descriptive language is how language evolves, and the internet
| is the first real regional conflict area that Americans have
| really ever encountered without traveling.
|
| Historically, you would have just been in your linguistic
| locale, with your own rules, and differences could easily have
| been attributed to outsiders being outsiders. The internet flattens
| physical distance.
|
| Thus we have a real parallel to the different regions of Italy,
| where no one can understand each other, or at least the UK,
| where different cities have extreme pronunciation differences.
|
| The same exists for written language, and it will continue to
| diverge culturally. The way I look at it is that language isn't
| a thing, trapped in amber, but a river we are all wading
| through. Different people enter at different times, and we all
| subtly affect the flow.
|
| I distinctly remember thinking "email" was the dumbest sounding
| word ever. Now I don't even hear it.
|
| It's still fine to nitpick, we're all battling in the
| descriptive war for correctness. My own personal hobbyhorse is
| how stupid American quotation syntax is; I learned at
| graduate school in the UK that you use single quotes and leave
| the punctuation outside of the quoted sections, which is
| entirely sensible!
| dang wrote:
| " _Please don't complain about tangential annoyances--e.g.
| article or website formats, name collisions, or back-button
| breakage. They're too common to be interesting._"
|
| https://news.ycombinator.com/newsguidelines.html
| happytoexplain wrote:
| Yeah, sorry. That was probably my last comment on this trend,
| since I think I've said all I have to say. However, I do
| think "too common" implicitly narrows the definition of
| "tangential annoyances" - I believe this is a new phenomenon
| (though I understand the spirit of the rule is to not have
| comment threads about things other than the content of the
| submission).
| strangescript wrote:
| We haven't reached a peak on scaling/performance, so even if an
| old model can be commoditized, a new one will be created to take
| advantage of the newly freed infra. Until we hit a ceiling on
| scaling, tokens are going to remain expensive relative to what
| people are trying to do with them because the underlying compute
| is expensive.
| dcre wrote:
| Vibes-based analysis. We have no idea how much these models cost
| to serve.
| Waterluvian wrote:
| On the topic of cost per token, is it accurate to represent a
| token as, ideally, a composable atomic unit of information? But
| because we're (often) using English as the encoding format, it
| can only be as efficient as English can encode the data.
|
| Does this mean that other languages might offer better
| information density per token? And does this mean that we could
| invent a language that's more efficient for these purposes, and
| something humans (perhaps only those who want a job as a prompt
| engineer) could be taught?
|
| Kevin speak good?
| https://youtu.be/_K-L9uhsBLM?si=t3zuEAmspuvmefwz
| deegles wrote:
| Human speech has a bit rate of around 39 bits per second, no
| matter how quickly you speak. Assuming reading is similar, I
| guess more "dense" tokens would just take longer for humans to
| read.
|
| https://www.science.org/content/article/human-speech-may-hav...
| __s wrote:
| Sure, but that link has Japanese at 5 bits per syllable &
| Vietnamese at 8 bits per syllable, so if billing was based on
| syllables per prompt you'd want Vietnamese prompts
|
| Granted English is probably going to have better quality
| output based on training data size
| r_lee wrote:
| Sure, for example Korean is Unicode heavy, e.g. gyeongchal =
| police, but it's just 2 Unicode chars. Not too familiar with how
| things are encoded but it could be more efficient
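Character count, byte count, and token count are three different things. A quick stdlib check (actual BPE token counts would need the model's own tokenizer, e.g. OpenAI's tiktoken, so this only illustrates the character/byte gap) shows why "fewer characters" doesn't automatically mean "fewer tokens":

```python
# "gyeongchal" (police) is 2 Hangul characters, but each Hangul
# syllable takes 3 bytes in UTF-8 -- byte-level BPE tokenizers may
# therefore split it into several tokens despite the short length.
korean = "경찰"            # 2 characters
romanized = "gyeongchal"   # 10 characters

print(len(korean))                     # characters: 2
print(len(korean.encode("utf-8")))     # UTF-8 bytes: 6
print(len(romanized))                  # characters: 10
print(len(romanized.encode("utf-8")))  # UTF-8 bytes: 10
```

So the Korean string is 5x denser in characters but only moderately denser in bytes, and the token count depends entirely on what byte sequences the tokenizer's vocabulary has merged.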
| joseda-hg wrote:
| IIRC, in linguistics there's a "Uniform Information Density"
| hypothesis that languages seem to follow on a human level
| (denser languages slow down, sparser languages speed up), so
| you might have to go for an artificial encoding that maps
| effectively to English.
|
| English (And any of the dominant languages that you could use
| in its place) work significantly better than other languages
| purely by having significantly larger bodies of work for the
| LLM to work from
| Waterluvian wrote:
| Yeah I was wondering about it basically being a dialect or
| the CoffeeScript of English.
|
| Maybe even something anyone can read and maybe write... so...
| Kevin English.
|
| Job applications will ask for how well one can read and write
| Kevin.
| fy20 wrote:
| English often has a lot of redundancy, you could rewrite your
| comment to this and still have it convey the original meaning:
|
| Regarding cost per token: is a token ideally a composable,
| atomic unit of information? Since English is often used as an
| encoding format, efficiency is limited by English's encoding
| capacity.
|
| Could other languages offer higher information density per
| token? Could a more efficient language be invented for this
| purpose, one teachable to humans, especially aspiring prompt
| engineers?
|
| 67 tokens vs 106 for the original.
|
| Many languages don't have articles, you could probably strip
| them from this and still understand what it's saying.
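The 67-vs-106 figure above presumably comes from a real model tokenizer (e.g. OpenAI's tiktoken). A crude stdlib proxy, using the common "1 word ≈ 1.33 tokens" rule of thumb (a heuristic, not a real tokenizer), shows the same direction of effect:

```python
# Rough token estimate via the "~4/3 tokens per word" heuristic;
# real counts require the model's own tokenizer, so treat these
# numbers as a proxy for relative, not absolute, size.
def rough_tokens(text: str) -> int:
    return round(len(text.split()) * 4 / 3)

original = ("On the topic of cost per token, is it accurate to represent "
            "a token as, ideally, a composable atomic unit of information?")
compressed = ("Regarding cost per token: is a token ideally a composable, "
              "atomic unit of information?")

print(rough_tokens(original), rough_tokens(compressed))
```

Even this blunt measure shows the compressed phrasing landing at roughly two-thirds the size, consistent with the tokenizer counts quoted above.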
| GiorgioG wrote:
| I tried Gemini CLI and in 2 hours somehow spent $22 just messing
| around with a very small codebase. I didn't find out until the
| next day from Google's billing system. That was enough for me - I
| won't touch it again.
| adrianbooth17 wrote:
| Isn't Gemini CLI free? Or did you BYOK?
| ankit219 wrote:
| Interesting article, full of speculation and some logical
| follow-ons, but it falls short of admitting the true
| conclusion: model-building companies can build a thinner
| wrapper / harness and offer better prices than third-party
| companies (the article assumes it costs Anthropic the same per
| token as it does their customers) because their cost per token
| is lower than app-layer companies'. Anthropic has a decent
| margin (likely higher than OpenAI's) on every token sold, and
| with more scale they can sell at a lower cost (or offer
| unlimited plans with limits that keep out the 1%-5% of power
| users).
|
| I don't agree with the Cognition conclusion either. Enterprises
| are fighting super hard to not have a long term buying contract
| when they know SOTA (app or model) is different every 6 months.
| They are keeping their switching costs low and making sure they
| own the workflow, not the tool. This is even more prominent after
| Slack restricted API usage for enterprise customers.
|
| Making money on the infra is possible, but that again
| misunderstands the pricing power of Anthropic. Lovable, Replit
| etc. work because of Claude. OpenAI has Codex, Google has
| Jules; neither is as good in terms of taste as Claude. It's
| not the CLI form factor that people love, it's the outcome they
| like. When Anthropic sees the money being left on the table in
| infra play, they will offer the same (at presumably better rates
| given Amazon is an investor) and likely repeat this strategy.
| Abstraction is a good play, only if you abstract it to the
| maximum possible levels.
| xrd wrote:
| This is the moment an open source solution could pop in and say
| just "uv add aider" and then make sure you have a 24gb card for
| Qwen3 for each dev, and you are future proofed for at least the
| next year. It seems like the only way out.
| ej88 wrote:
| The article just isn't that coherent for me.
|
| > when a new model is released as the SOTA, 99% of the demand
| immediately shifts over to it
|
| 99% is in the wrong ballpark. Lots of users use Sonnet 4 over
| Opus 4, despite Opus being 'more' SOTA. Lots of users use 4o over
| o3 or Gemini over Claude. In fact it's never been a closer race
| on who is the 'best': https://openrouter.ai/rankings
|
| >switch from opus ($75/m tokens) to sonnet ($15/m) when things
| get heavy. optimize with haiku for reading. like aws autoscaling,
| but for brains.
|
| they almost certainly built this behavior directly into the model
| weights
|
| ???
|
| Overall the article seems to argue that companies are running
| into issues with usage-based pricing due to consumers not
| accepting or being used to usage based pricing and it's difficult
| to be the first person to crack and switch to usage based.
|
| I don't think it's as big of an issue as the author makes it out
| to be. We've seen this play out before in cloud hosting.
|
| - Lots of consumers are OK with a flat fee per month and using an
| inferior model. 4o is objectively inferior to o3 but millions of
| people use it (or don't know any better). The free ChatGPT is
| even worse than 4o and the vast majority of chatgpt visitors use
| it!
|
| - Heavy users or businesses consume via API and usage based
| pricing (see cloud). This is almost certainly profitable.
|
| - Fundamentally most of these startups are B2B, not B2C
| motorest wrote:
| > In fact it's never been a closer race on who is the 'best'
|
| Thank you for pointing out that fact. Sometimes it's very hard
| to keep perspective.
|
| Sometimes I use Mistral as my main LLM. I know it's not lauded
| as the top performing LLM but the truth of the matter is that
| its results are just as useful as what the best
| ChatGPT/Gemini/Claude models output, and it is way faster.
|
| There are indeed diminishing returns on the current blend of
| commercial LLMs. DeepSeek already proved that cost can be a
| major factor and quality can even improve. I think we're very
| close to seeing competition based on price, which might be the
| reason there is so much talk about mixture of experts
| approaches and how specialized models can drive down cost while
| improving targeted output.
| torginus wrote:
| Yeah, my biggest problem with CC is that it's slow, prone to
| generating tons of bullshit exposition, and often goes down
| paths that I can tell almost immediately will yield no useful
| result.
|
| It's great if you can leave it unattended, but personally,
| coding's an active thing for me, and watching it go is really
| frustrating.
| jsnell wrote:
| > now look at the actual pricing history of frontier models, the
| ones that 99% of the demand is for at any given time:
|
| The meaningful frontier isn't scalar on just the capability, it's
| on capability for a given cost. The highest capability models are
| not where 99% of the demand is. Quite the opposite.
|
| To get an idea of what point on the frontier people prefer, have
| a look at the OpenRouter statistics
| (https://openrouter.ai/rankings). Claude Opus 4 has about 1% of
| their total usage, not 99%. Claude Sonnet 4 is the single most
| popular model at about 18%. The runners up in volume are Gemini
| Flash 2.0 and 2.5, which are in turn significantly cheaper than
| Sonnet 4.
| codr7 wrote:
| This is such a nice setup!
|
| They can deliver pretty much whatever they feel like. Who can
| tell a trash token from an hallucination? And tracking token
| usage is a pita.
|
| Sum it up and it translates to: sell whatever you feel like at
| whatever price you feel like.
|
| Nice!
| sshine wrote:
| > _nobody opens claude and thinks, "you know what? let me use the
| shitty version"_
|
| Sure I do!
|
| I will consistently pick the fastest and cheapest model that will
| do the job.
|
| Sonnet > Opus when coding
|
| Haiku > Sonnet when fusing kitchen recipes, or answering
| questions where search results deliver the bulk of the value, and
| the LLM part is really just for summarizing.
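The pick-the-cheapest-model-that-does-the-job policy described above can be sketched as a tiny task router. The task taxonomy and model tiers here are illustrative assumptions, not an official API:

```python
# Hypothetical per-task model routing: choose the cheapest model
# that handles the task well, falling back to a mid-tier default
# for tasks we haven't classified.
MODEL_FOR_TASK = {
    "coding": "sonnet",           # Sonnet > Opus when coding
    "recipe-fusion": "haiku",     # Haiku when the LLM mostly summarizes
    "search-summary": "haiku",    # search results carry the value
    "hard-reasoning": "opus",     # reserve the expensive model
}

def pick_model(task: str, default: str = "sonnet") -> str:
    return MODEL_FOR_TASK.get(task, default)

print(pick_model("coding"))        # sonnet
print(pick_model("unknown-task"))  # sonnet (fallback)
```

The point of the table is that the default is deliberately not the most capable model: the expensive tier is an explicit opt-in for the few tasks that need it.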
| esafak wrote:
| This is wrong. People are not dropping old models when new ones
| come out. I'm always on the lookout for cost effective models.
| The logical thing is to use the cheapest model that gets the job
| done, and you get a sense for that once you use the model for a
| while.
|
| It is standard practice with some coding agents to have different
| models for different tasks, like building and planning.
| farkin88 wrote:
| Even though tokens are getting cheaper, I think the real killer
| of "unlimited" LLM plans isn't token costs themselves, it's the
| shape of the usage curve that's unsustainable. These products see
| a Zipf-like distribution: thousands of casual users nibble a few
| hundred tokens a day while a tiny group of power automations
| devour tens of millions. Flat pricing works fine until one of
| those whales drops a repo-wide refactor or a 100 MB PDF into chat
| and instantly torpedoes the margin. Unless vendors turn those
| extreme loops into cheaper, purpose-built primitives (search,
| static analyzers, local quantized models, etc.), every "all-you-
| can-eat" AI subscription is just a slow-motion implosion waiting
| for its next whale.
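The whale problem can be made concrete with a toy simulation (the subscriber count and the exponent s=1 are assumptions chosen for illustration, not measured data):

```python
# Toy Zipf (s=1) usage distribution over 10,000 subscribers: the
# user at rank r consumes tokens proportional to 1/r. Under flat
# pricing, the heaviest 1% of users drive most of the token cost.
N = 10_000
usage = [1 / r for r in range(1, N + 1)]  # relative token consumption
total = sum(usage)
top_1_percent = sum(usage[: N // 100])    # heaviest 100 users

print(f"top 1% of users consume {top_1_percent / total:.0%} of tokens")
```

With these assumptions the top 1% account for over half of all consumption, which is why a flat fee calibrated to the median user collapses as soon as a few whales show up.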
| blotfaba wrote:
| We're not going to be using tokens forever, and inevitably
| specialized hardware will solve this bottleneck. Underestimating
| how much proprietary advancements are loaded into Google TPUs is
| sort of like thinking the best we've got are Acura TSXs when
| somebody's driving around in a Ferrari.
| jstummbillig wrote:
| This is silly? The important metric is value per token, which is
| obviously increasing, and thus each token is effectively getting
| cheaper because you need far fewer of them to produce anything of
| value.
|
| Which then might lead to you using a lot more, because it offsets
| some other thing that costs even more still, like your time.
| acedTrex wrote:
| "Which is obviously increasing"
|
| With the primary advancements over the past two years being
| Chain Of Thought which absolutely obliterates token counts in
| what world would the "per token" value of a model be going
| up...
| jstummbillig wrote:
| If you are able to cogently explain how you would instruct
| GPT 3.5 with ANY amount of tokens to do what Sonnet 4 is able
| to do, I am sure there's a lot of wealthy people that would
| be very interested in having a talk with you.
| _0ffh wrote:
| Dunno, I'm happy to pay for API access based on token usage. I'd
| never so much as look at flat pricing, but maybe that's just me.
| torginus wrote:
| First of all, do they shoot you in San Francisco if you use
| capital letters and punctuation?
|
| Second, why are SV people obsessed with fake exponentials? It's
| very clear that AI progress has only been exponential in the
| sense that people are throwing a lot more resources at AI than
| they did a couple years ago.
| 369548684892826 wrote:
| > First of all, do they shoot you in San Francisco, if you use
| capital letters and punctuation?
|
| Is it done like this just to show it wasn't written by an LLM?
| luqtas wrote:
| oh no! i can't deal with the natural morphing a lingua-franca
| has! /j
|
| Thou needst to live in the archaic.
| mensetmanusman wrote:
| They will soon be subsidized by ads or people will run their own.
___________________________________________________________________
(page generated 2025-08-03 23:00 UTC)