[HN Gopher] OpenAI dropped the price of o3 by 80%
       ___________________________________________________________________
        
       OpenAI dropped the price of o3 by 80%
        
       Author : mfiguiere
       Score  : 222 points
       Date   : 2025-06-10 17:41 UTC (5 hours ago)
        
 (HTM) web link (twitter.com)
 (TXT) w3m dump (twitter.com)
        
       | minimaxir wrote:
       | ...how? I'd understand a 20-30% price drop from infra
       | improvements for a model as-is, but 80%?
       | 
       | I wonder if "we quantized it lol" would classify as false
       | advertising for modern LLMs.
        
         | tofof wrote:
          | Presumably because the price was about 5x higher to begin with
          | than any of the competitors at the same tier of performance?
         | Perhaps it's better to get paid anything at all than to just
         | lose 100% of the customers.
        
         | drexlspivey wrote:
         | Deepseek made a few major innovations allowing them to achieve
         | major compute efficiency and then published them. My guess is
         | that OpenAI just implemented these themselves.
        
           | vitaflo wrote:
           | Wouldn't surprise me. And even with this price cut it's still
           | 4x more expensive than Deepseek R1 is.
        
       | ilaksh wrote:
       | Maybe because they also are releasing o3-pro.
        
       | MallocVoidstar wrote:
       | Note that they have not actually dropped the price yet:
       | https://x.com/OpenAIDevs/status/1932463601119637532
       | 
       | > We'll post to @openaidevs once the new pricing is in full
       | effect. In $10... 9... 8...
       | 
       | There is also speculation that they are only dropping the input
       | price, not the output price (which includes the reasoning
       | tokens).
        
         | sunaookami wrote:
         | I think that was a joke. New pricing is already in place:
         | 
         | Input: $2.00 / 1M tokens
         | 
         | Cached input: $0.50 / 1M tokens
         | 
         | Output: $8.00 / 1M tokens
         | 
         | https://openai.com/api/pricing/
         | 
         | Now cheaper than gpt-4o and same price as gpt-4.1 (!).
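          | 
          | (For reference, assuming the previous list price was $10 / 1M
          | input and $40 / 1M output, 0.2 x $10 = $2 and 0.2 x $40 = $8,
          | so these numbers are consistent with a flat 80% cut on both
          | sides.)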
        
           | rvnx wrote:
           | It is slower though
        
           | MallocVoidstar wrote:
           | No, people had tested it after Altman's announcement and had
           | confirmed that they were still being billed at the original
           | price. And I checked the docs ~1h after and they still showed
           | the original price.
           | 
           | The speculation of only input pricing being lowered was
           | because yesterday they gave out vouchers for 1M free _input_
           | tokens while output tokens were still billed.
        
           | runako wrote:
           | > Now cheaper than gpt-4o and same price as gpt-4.1 (!).
           | 
           | This is where the naming choices get confusing. "Should" o3
           | cost more or less than GPT-4.1? Which is more capable? A
            | generation 3 of a technology intuitively feels less advanced
            | than a 4.1 of (similar) tech.
        
             | jacob019 wrote:
             | Do we know parameter counts? The reasoning models have
             | typically been cheaper per token, but use more tokens.
             | Latency is annoying. I'll keep using gpt-4.1 for day-to-
             | day.
        
             | koakuma-chan wrote:
             | o3 is a reasoning model, GPT-4.1 is not. They are
             | orthogonal.
        
               | runako wrote:
               | My quibble is with naming choices and differentiating.
               | Even here they are confusing:
               | 
               | - o4 is reasoning
               | 
               | - 4o is not
               | 
               | They simply do not do a good job of differentiating.
               | Unless you work directly in the field, it is likely not
               | obvious what is the difference between "our most powerful
               | reasoning model" and "our flagship model for complex
               | tasks."
               | 
               | "Does my complex task need reasoning or not?" seems to be
               | how one would choose. (What type of task is complex but
               | does not require any reasoning?) This seems less than
               | ideal!
        
               | koakuma-chan wrote:
               | This is true, and I believe apps automatically route
               | requests to appropriate models for normie users.
        
           | agsqwe wrote:
            | Thinking models produce a lot of internal output tokens,
            | making them more expensive than non-reasoning models for
            | similar prompt and visible output lengths.
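            | 
            | Back-of-the-envelope sketch at the new o3 prices, with
            | made-up token counts (the hidden reasoning figure is purely
            | illustrative):
            | 
            |   IN_PRICE, OUT_PRICE = 2.00, 8.00  # $ per 1M tokens
            | 
            |   def cost(prompt, reasoning, visible):
            |       # reasoning tokens are billed as output tokens
            |       return (prompt * IN_PRICE
            |               + (reasoning + visible) * OUT_PRICE) / 1e6
            | 
            |   print(cost(2_000, 0, 500))      # no reasoning:  ~$0.008
            |   print(cost(2_000, 8_000, 500))  # with reasoning: ~$0.072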
        
           | vitaflo wrote:
           | Still 4x more expensive than Deepseek R1 tho.
        
       | teaearlgraycold wrote:
       | Personally I've found these bigger models (o3/Claude 4 Opus) to
       | be disappointing for coding.
        
         | apwell23 wrote:
          | I found them all disappointing in their own ways. At least
          | the DeepSeek models actually listen to what I say instead of
          | ignoring me and doing their own thing like a toddler.
        
         | rvnx wrote:
          | Opus is really great, but through Claude Code. If you used
          | Cursor or RooCode, it's understandable that you'd be
          | disappointed.
        
           | bitpush wrote:
            | This matches my experience, but I can't explain it. Do you
            | know what's going on?
        
             | eunoia wrote:
             | My understanding is context size. Companies like Cursor are
             | trying to minimize the amount of context sent to the models
             | to keep their own costs down. Claude Code seems to send a
             | lot more context with every request and that seems to make
             | the difference.
        
             | supermdguy wrote:
             | Just guessing, but the new Opus was probably RL tuned to
             | work better with Claude Code's tool calls
        
           | jedisct1 wrote:
            | I had the opposite experience. Not with Opus (too expensive),
           | but with Sonnet. I got things done way more efficiently when
           | using Sonnet with Roo than with Claude Code.
        
             | rgbrenner wrote:
              | Same. I ran a few tests ($100 worth of API calls) with Opus
              | 4 and didn't see any difference compared to Sonnet 4 other
              | than the price.
              | 
              | Also, no idea why he thinks Roo is handicapped when Claude
              | Code nerfs the thinking output and requires typing
              | "think"/"think hard"/"think harder"/"ultrathink" just to
              | expand the max thinking tokens, which on ultrathink only
              | sets it at 32k... when the max in Roo is 51200 and it's
              | just a setting.
        
       | behnamoh wrote:
        | How do we know it's not a quantized version of o3? What's
        | stopping these firms from announcing the full model to perform
        | well on the benchmarks and then gradually quantizing it (first
        | at Q8 so no one notices, then Q6, then Q4, ...)?
       | 
       | I have a suspicion that's how they were able to get gpt-4-turbo
       | so fast. In practice, I found it inferior to the original GPT-4
       | but the company probably benchmaxxed the hell out of the turbo
       | and 4o versions so even though they were worse models, users
       | found them more pleasing.
        
         | esafak wrote:
         | Are there any benchmarks that track historical performance?
        
           | behnamoh wrote:
           | good question, and I don't know of any, although it's a no
           | brainer that someone should make it.
           | 
            | A proxy for that may be the anecdotal evidence of users who
            | report back in a month that model X has gotten dumber
            | (started with gpt-4 and keeps happening, esp. with Anthropic
            | and OpenAI models). I haven't heard such anecdotal stories
            | about Gemini, R1, etc.
        
           | SparkyMcUnicorn wrote:
           | Aider has one, but it hasn't been updated in months. People
           | kept claiming models were getting worse, but the results
           | proved that they weren't.
        
             | esafak wrote:
             | https://aider.chat/docs/leaderboards/by-release-date.html
        
             | __mharrison__ wrote:
             | Updated yesterday... https://aider.chat/docs/leaderboards/
        
               | vitaflo wrote:
               | That Deepseek price is always hilarious to see in these
               | charts.
        
               | SparkyMcUnicorn wrote:
               | That's not the one I'm referring to. See my other
               | comments or your sibling comment.
        
         | benterix wrote:
         | > users found them more pleasing.
         | 
         |  _Some_ users. For me the drop was so huge it became almost
         | unusable for the things I had used it for.
        
           | behnamoh wrote:
           | Same here. One of my apps straight out stopped working
           | because the gpt-4o outputs were noticeably worse than the
           | gpt-4 that I built the app based on.
        
         | lispisok wrote:
         | I swear every time a new model is released it's great at first
         | but then performance gets worse over time. I figured they were
         | fine-tuning it to get rid of bad output which also nerfed the
         | really good output. Now I'm wondering if they were quantizing
         | it.
        
           | nabla9 wrote:
            | It seems that at least Google is overselling their compute
            | capacity.
            | 
            | You pay a monthly fee, but Gemini is completely jammed for
            | 5-6 hours when North America is working.
        
             | baq wrote:
             | Gemini is simply that good. I'm trying out Claude 4 every
             | now and then and go back to Gemini to fix its mess...
        
               | fasterthanlime wrote:
               | Funny, I have the exact opposite experience! I use Claude
               | to fix Gemini's mess.
        
               | symfoniq wrote:
               | Maybe LLMs just make messes.
        
               | hgomersall wrote:
               | I heard that, but I'm getting consistent garbage from
               | Gemini.
        
               | dayjah wrote:
               | For code? Use the context7 mcp.
        
               | energy123 wrote:
               | Gemini is the best model in the world. Gemini is the
               | worst web app in the world. Somehow those two things are
               | coexisting. The web devs in their UI team have really
               | betrayed the hard work of their ML and hardware
               | colleagues. I don't say this lightly - I say this after
               | having paid attention to critical bugs, more than I can
               | count on one hand, that persisted for over a year. They
               | either don't care or are grossly incompetent.
        
               | thorum wrote:
               | Try AI Studio if you haven't already:
               | https://aistudio.google.com/
        
               | koakuma-chan wrote:
               | https://ai.dev
        
               | nabla9 wrote:
               | Well said.
               | 
               | Google is best in pure AI research, both quality and
                | volume. They have sucked at productization for years. Not
                | just AI but other products as well. Real mystery.
        
               | energy123 wrote:
               | I don't understand why they can't just make it fast and
               | go through the bug reports from a year ago and fix them.
               | Is it that hard to build a box for users to type text
               | into without it lagging for 5 seconds or throwing a bunch
               | of errors?
        
             | edzitron wrote:
             | When you say "jammed," how do you mean?
        
           | solfox wrote:
           | I have seen this behavior as well.
        
           | mhitza wrote:
            | That was my suspicion when I first deleted my account, when
            | it felt like the ChatGPT output got worse and I found it
            | highly suspicious when I saw an errant davinci model keyword
            | in the chatgpt url.
           | 
           | Now I'm feeling similarly with their image generation (which
           | is the only reason I created a paid account two months ago,
           | and the output looks more generic by default).
        
           | Tiberium wrote:
           | I've heard lots of people say that, but no objective
           | reproducible benchmarks confirm such a thing happening often.
           | Could this simply be a case of novelty/excitement for a new
           | model fading away as you learn more about its shortcomings?
        
             | 85392_school wrote:
             | I think it's an illusion. People have been claiming it
             | since the GPT-4 days, but nobody's ever posted any good
             | evidence to the "model-changes" channel in Anthropic's
             | Discord. It's probably just nostalgia.
        
             | herval wrote:
              | There are definitely measurements (eg
             | https://hdsr.mitpress.mit.edu/pub/y95zitmz/release/2 ) but
             | I imagine they're rare because those benchmarks are
             | expensive, so nobody keeps running them all the time?
             | 
             | Anecdotally, it's quite clear that some models are
             | throttled during the day (eg Claude sometimes falls back to
             | "concise mode" - with and without a warning on the app).
             | 
             | You can tell if you're using Windsurf/Cursor too - there
             | are times of the day where the models constantly fail to do
             | tool calling, and other times they "just work" (for the
             | same query).
             | 
              | Finally, there are cases where it was confirmed by the
              | company, like GPT-4o's sycophantic tirade that very clearly
             | impacted its output (https://openai.com/index/sycophancy-
             | in-gpt-4o/)
        
               | drewnick wrote:
               | I feel this too. I swear some of the coding Claude Code
               | does on weekends is superior to the weekdays. It just has
               | these eureka moments every now and then.
        
               | herval wrote:
               | Claude has been particularly bad since they released 4.0.
               | The push to remove 3.7 from Windsurf hasn't helped
               | either. Pretty evident they're trying to force people to
               | pay for Claude Code...
               | 
               | Trusting these LLM providers today is as risky as
               | trusting Facebook as a platform, when they were pushing
               | their "opensocial" stuff
        
               | Deathmax wrote:
               | Your linked article is specifically comparing two
               | different versioned snapshots of a model and not
               | comparing the same model across time.
               | 
               | You've also made the mistake of conflating what's served
               | via API platforms which are meant to be stable, and
               | frontends which have no stability guarantees, and are
               | very much iterated on in terms of the underlying model
               | and system prompts. The GPT-4o sycophancy debacle was
               | only on the specific model that's served via the ChatGPT
               | frontend and never impacted the stable snapshots on the
               | API.
               | 
               | I have never seen any sort of compelling evidence that
               | any of the large labs tinkers with their stable,
               | versioned model releases that are served via their API
               | platforms.
        
               | herval wrote:
               | Please read it again. The article is clearly comparing
               | gpt4 to gpt4, and gpt3.5 to gpt3.5, in march vs june 2023
        
               | Deathmax wrote:
               | I did read it, and I even went to their eval repo.
               | 
               | > At the time of writing, there are two major versions
               | available for GPT-4 and GPT-3.5 through OpenAI's API, one
               | snapshotted in March 2023 and another in June 2023.
               | 
               | openaichat/gpt-3.5-turbo-0301 vs
               | openaichat/gpt-3.5-turbo-0613, openaichat/gpt-4-0314 vs
               | openaichat/gpt-4-0613. Two _distinct_ versions of the
               | model, and not the _same_ model over time like how people
               | like to complain that a model gets "nerfed" over time.
        
               | glitch253 wrote:
               | Cursor / Windsurf's degraded functionality is exactly why
               | I created my own system:
               | 
               | https://github.com/mpfaffenberger/code_puppy
        
             | Kranar wrote:
             | I used to think the models got worse over time as well but
             | then I checked my chat history and what I noticed isn't
             | that ChatGPT gets worse, it's that my standards and
             | expectations increase over time.
             | 
             | When a new model comes out I test the waters a bit with
             | some more ambitious queries and get impressed when it can
             | handle them reasonably well. Over time I take it for
             | granted and then just expect it to be able to handle ever
              | more complex queries and get disappointed when I hit a new
             | limit.
        
               | echelon wrote:
               | Re-run your historical queries, or queries that are
               | similarly shaped.
        
               | throwaway314155 wrote:
               | Sounds like a _whole_ thing.
        
               | sakesun wrote:
               | They could cache that :)
        
             | bobxmax wrote:
             | My suspicion is it's the personalization. Most people have
             | things like 'memory' on, and as the models increasingly
             | personalize towards you, that personalization is hurting
             | quality rather than helping it.
             | 
             | Which is why the base model wouldn't necessarily show
             | differences when you benchmarked them.
        
             | cainxinth wrote:
             | I assumed it was because the first week revealed a ton of
             | safety issues that they then "patched" by adjusting the
             | system prompt, and thus using up more inference tokens on
             | things other than the user's request.
        
           | JamesBarney wrote:
           | I'm pretty sure this is just a psychological phenomenon. When
           | a new model is released all the capabilities the new model
           | has that the old model lacks are very salient. This makes it
           | seem amazing. Then you get used to the model, push it to the
           | frontier, and suddenly the most salient memories of the new
            | model are its failures.
           | 
           | There are tons of benchmarks that don't show any regressions.
           | Even small and unpublished ones rarely show regressions.
        
         | risho wrote:
         | Quantization is a massive efficiency gain for near negligible
         | drop in quality. If the tradeoff is quantization for an 80
         | percent price drop I would take that any day of the week.
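          | 
          | Toy illustration of the trade-off (numpy sketch; it says
          | nothing about how OpenAI actually serves o3):
          | 
          |   import numpy as np
          | 
          |   w = np.random.randn(4096).astype(np.float32)  # fake weights
          |   scale = np.abs(w).max() / 127                 # int8 range
          |   w_q = np.round(w / scale).astype(np.int8)     # 4x smaller
          |   w_hat = w_q.astype(np.float32) * scale        # dequantized
          | 
          |   print(np.abs(w - w_hat).mean())  # small but nonzero error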
        
           | behnamoh wrote:
           | > for near negligible drop in quality
           | 
           | Hmm, that's evidently and anecdotally wrong:
           | 
           | https://github.com/ggml-org/llama.cpp/discussions/4110
        
           | spiderice wrote:
           | You may be right that the tradeoff is worth it, but it should
           | be advertised as such. You shouldn't think you're paying for
           | full o3, even if they're heavily discounting it.
        
         | CSMastermind wrote:
         | This is almost certainly what they're doing and rebranding the
         | original o3 model as "o3-pro"
        
           | behnamoh wrote:
           | > rebranding the original o3 model as "o3-pro"
           | 
           | interesting take, I wouldn't be surprised if they did that.
        
           | anticensor wrote:
           | -pro models appear to be a best-of-10 sampling of the
           | original full size model
        
             | Szpadel wrote:
              | How do you sample it behind the scenes? Usually best-of-X
              | means you generate X outputs and choose the best result.
              | 
              | If you could do this automatically, it would be a game
              | changer, as you could run the top 5 best models in
              | parallel and select the best answer every time.
              | 
              | But it's not practical, because you are the bottleneck:
              | you have to read all 5 solutions and compare them.
        
               | joshstrange wrote:
               | I think the idea is they use another/same model to judge
               | all the results and only return the best one to the user.
        
               | anticensor wrote:
               | > if you could do this automatically, it would be game
               | changer as you could run top 5 best models in parallel
               | and select best answer every time
               | 
               | remember they have access to the RLHF reward model,
               | against which they can evaluate all N outputs and have
               | the most "rewarded" answer picked and sent
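                | 
                | Conceptually something like this (toy sketch; `generate`
                | and `reward.score` are stand-ins for whatever they run
                | internally, not real APIs):
                | 
                |   def best_of_n(prompt, n=10):
                |       # sample n candidates at nonzero temperature
                |       outs = [generate(prompt, temperature=1.0)
                |               for _ in range(n)]
                |       # score each with the reward model and return
                |       # only the highest-scoring answer to the user
                |       return max(outs,
                |                  key=lambda o: reward.score(prompt, o))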
        
           | tedsanders wrote:
           | Nope, not what we're doing.
           | 
           | o3 is still o3 (no nerfing) and o3-pro is new and better than
           | o3.
           | 
           | If we were lying about this, it would be really easy to catch
           | us - just run evals.
           | 
           | (I work at OpenAI.)
        
             | bn-l wrote:
             | Not quantized?
        
               | tedsanders wrote:
               | Not quantized. Weights are the same.
               | 
               | If we did change the model, we'd release it as a new
               | model with a new name (e.g., o3-turbo-2025-06-10). It
               | would be very annoying to customers if we ever silently
               | changed models, so we never do this [1].
               | 
               | [1] `chatgpt-4o-latest` being an explicit exception
        
             | MattDaEskimo wrote:
             | What's with the dropped benchmark performance compared to
             | the original o3 release? It was disappointing to not see
             | o4-mini on it as well
        
         | ants_everywhere wrote:
         | Is this what happened to Gemini 2.5 Pro? It used to be very
         | good, but it's started struggling on basic tasks.
         | 
         | The thing that gets me is it seems to be lying about fetching a
         | web page. It will say things are there that were never on any
         | version of the page and it sometimes takes multiple screenshots
         | of the page to convince it that it's wrong.
        
           | SparkyMcUnicorn wrote:
           | The Aider discord community has proposed and disproven the
           | theory that 2.5 Pro became worse, several times, through many
           | benchmark runs.
           | 
           | It had a few bugs here or there when they pushed updates, but
           | it didn't get worse.
        
             | ants_everywhere wrote:
             | Gemini is objectively exhibiting new behavior with the same
             | prompts and that behavior is unwelcome. It includes
             | hallucinating information and refusing to believe it's
             | wrong.
             | 
             | My question is not whether this is true (it is) but why
             | it's happening.
             | 
             | I am willing to believe the aider community has found that
             | Gemini has maintained approximately equivalent performance
             | on fixed benchmarks. That's reasonable considering they
             | probably use a/b testing on benchmarks to tell them whether
             | training or architectural changes need to be reverted.
             | 
             | But all versions of aider I've tested, including the most
             | recent one, don't handle Gemini correctly so I'm skeptical
              | that they're the state of the art with respect to
              | benchmarking Gemini.
        
               | SparkyMcUnicorn wrote:
               | Gemini 2.5 Pro is the highest ranking model on the aider
               | benchmarks leaderboard.
               | 
               | For benchmarks, either Gemini writes code that adheres to
               | the required edit format, builds successfully, and passes
               | unit tests, or it doesn't.
               | 
               | I primarily use aider + 2.5 pro for planning/spec files,
               | and occasionally have it do file edits directly. Works
               | great, other than stopping it mid-execution once in a
               | while.
        
         | jstummbillig wrote:
         | You can just give it a go for very little money (in Windsurf
         | it's 1x right now), and see what it does. There is no room for
          | conspiracy here, because you can simply look at what it does.
          | If you don't like it, neither will others, and then people will
          | not use it. People are obviously very capable of (collectively)
          | forming opinions on models, and then voting with their wallets.
        
         | resters wrote:
         | It's probably optimized in some way, but if the optimizations
         | degrade performance, let's hope it is reflected in various
         | benchmarks. One alternative hypothesis is that it's the same
         | model, but in the early days they make it think "harder" and
         | run a meta-process to collect training data for reinforcement
         | learning for use on future models.
        
           | SparkyMcUnicorn wrote:
           | It's a bit dated now, but it would be cool if people
           | submitted PRs for this one:
           | https://aider.chat/docs/leaderboards/by-release-date.html
        
             | __mharrison__ wrote:
             | Dated? This was updated yesterday
             | https://aider.chat/docs/leaderboards/
        
               | SparkyMcUnicorn wrote:
               | My link is to the benchmark results _over time_.
               | 
               | The main leaderboard page that you linked to is updated
               | quite frequently, but it doesn't contain multiple
               | benchmarks for the same exact model.
        
         | carter-0 wrote:
         | An OpenAI researcher claims it's the exact same model on X:
         | https://x.com/aidan_mclau/status/1932507602216497608
        
         | segmondy wrote:
         | you don't, so run your own model.
        
         | EnPissant wrote:
         | The API lists o3 and o3-2025-04-16 as the same thing with the
         | same price. The date based models are set in stone.
        
         | hyperknot wrote:
          | I got 700+ tokens/sec on o3 after the announcement; I suspect
          | it's very much a quantized version.
         | 
         | https://x.com/hyperknot/status/1932476190608036243
        
           | dist-epoch wrote:
           | Or maybe they just brought online much faster much cheaper
           | hardware.
        
           | zackangelo wrote:
           | Is that input tokens or output tokens/s?
        
         | Bjorkbat wrote:
         | Related, when o3 finally came out ARC-AGI updated their graph
         | because it didn't perform nearly as well as the version of o3
         | that "beat" the benchmark.
         | 
         | https://arcprize.org/blog/analyzing-o3-with-arc-agi
        
         | ctoth wrote:
         | From the announcement email:
         | 
         | > Today, we dropped the price of OpenAI o3 by 80%, bringing the
         | cost down to $2 / 1M input tokens and $8 / 1M output tokens.
         | 
         | > We optimized our inference stack that serves o3--this is the
         | same exact model, just cheaper.
        
         | smusamashah wrote:
          | How about testing the same input with the same seed on
          | different dates? If it's a different model it will return
          | different output.
        
           | zomnoys wrote:
           | Isn't this not true since these models run with a non-zero
           | temperature?
        
             | smusamashah wrote:
             | You can set the temperature too.
        
         | luke-stanley wrote:
         | I think the API has some special IDs to check for
         | reproducibility of the environment.
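          | 
          | Something along these lines with the Python SDK (a sketch; the
          | `seed` parameter is only best-effort, and some reasoning
          | models only accept the default temperature):
          | 
          |   from openai import OpenAI
          | 
          |   client = OpenAI()
          |   resp = client.chat.completions.create(
          |       model="gpt-4.1",  # or whichever model you're tracking
          |       messages=[{"role": "user", "content": "2+2?"}],
          |       seed=42,
          |       temperature=0,
          |   )
          |   # log these and re-run later; a changed system_fingerprint
          |   # means the serving configuration changed
          |   print(resp.system_fingerprint)
          |   print(resp.choices[0].message.content)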
        
       | visiondude wrote:
        | It always seemed to me that efficient caching strategies could
        | greatly reduce costs... wonder if they cooked up something new.
        
         | xmprt wrote:
         | How are LLMs cached? Every prompt would be different so it's
         | not clear how that would work. Unless you're talking about
         | caching the model weights...
        
           | koakuma-chan wrote:
           | A lot of the prompt is always the same: the instructions, the
           | context, the codebase (if you are coding), etc.
        
           | amanda99 wrote:
           | You would use a KV cache to cache a significant chunk of the
           | inference work.
        
             | biophysboy wrote:
             | Do you mean that they provide the same answer to verbatim-
             | equivalent questions, and pull the answer out of storage
             | instead of recalculating each time? I've always wondered if
             | they did this.
        
               | koakuma-chan wrote:
               | The prompt may be the same but the seed is different
               | every time.
        
               | biophysboy wrote:
               | Could you not cache the top k outputs given a provided
               | input token set? I thought the randomness was applied at
               | the end by sampling the output distribution.
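                | 
                | My mental model is something like this (toy numpy sketch
                | of temperature sampling at a single decode step):
                | 
                |   import numpy as np
                | 
                |   logits = np.array([2.0, 1.0, 0.1])  # one decode step
                |   T = 0.8                             # temperature
                |   p = np.exp(logits / T)
                |   p /= p.sum()                        # softmax
                |   token = np.random.choice(len(p), p=p)  # sampled draw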
        
               | Traubenfuchs wrote:
                | I bet there is a set of repetitive one- or two-question
                | user requests that makes up a sizeable share of all
                | requests. The models are so expensive to run, 1%
               | would be enough. Much less than 1%. To make it less
               | obvious they probably have a big set of response
               | variants. I don't see how they would not do this.
               | 
               | They probably also have cheap code or cheap models that
               | normalize requests to increase cache hit rate.
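                | 
                | Something as dumb as this would already catch the most
                | common queries (toy sketch; `call_model` is a stand-in):
                | 
                |   import random
                | 
                |   cache = {}  # normalized prompt -> response variants
                | 
                |   def answer(prompt):
                |       key = " ".join(prompt.lower().split())
                |       if key in cache:
                |           return random.choice(cache[key])
                |       out = call_model(prompt)
                |       cache.setdefault(key, []).append(out)
                |       return out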
        
           | hadlock wrote:
            | I've asked it a question not in its dataset three different
            | ways and I see the same three sentences in the response, word
            | for word, which could imply it's caching the core answer. I
            | hadn't seen this behavior before this last week.
        
           | HugoDias wrote:
           | This document explains the process very well. It's a good
           | read: https://platform.openai.com/docs/guides/prompt-caching
        
             | xmprt wrote:
             | That link explains how OpenAI uses it, but doesn't really
             | walk through how it's any faster. I thought the whole point
             | of transformers was that inference speed no longer depended
             | on prompt length. So how does caching the prompt help
              | reduce latency if the outputs aren't being cached?
             | 
             | > Regardless of whether caching is used, the output
             | generated will be identical. This is because only the
             | prompt itself is cached, while the actual response is
             | computed anew each time based on the cached prompt
        
           | tasuki wrote:
           | > Every prompt would be different
           | 
           | No? Eg "how to cook pasta" is probably asked a lot.
        
       | candiddevmike wrote:
       | It's going to be a race to the bottom, they have no moat.
        
         | rvnx wrote:
         | Especially now that they are second in the race (behind
          | Anthropic) and a lot of free-to-download and free-to-use
          | models are now starting to be viable competitors.
         | 
         | Once new MacBooks and iPhones have enough memory onboard this
         | is going to be a disaster for OpenAI and other providers.
        
           | mattnewton wrote:
           | I'm not sure they're scared of Anthropic - they're doing
           | great work but afaict running into some scaling issues and
           | really focused on winning over developers at the moment.
           | 
           | If I was OpenAI (or Anthropic for that matter) I would remain
           | scared of Google, who is now awake and able to dump Gemini
           | 2.5 pro on the market at costs that I'm not sure people
           | without their own hardware can compete with, and with the
           | infrastructure to handle everyone switching to them tomorrow.
        
             | itomato wrote:
             | Codex Research Preview appeared in my account in the early
             | AM.
        
             | piuantiderp wrote:
             | Google is going to lap them. The hardware muscle they have
             | has not even started flexing
        
           | koakuma-chan wrote:
           | What do you mean, Google is number 1
        
           | aerhardt wrote:
           | OpenAI are second in the race to Anthropic in some benchmarks
            | (maybe?), but OpenAI still dwarfs Anthropic in distribution
           | and popularity.
        
             | ratedgene wrote:
             | That's slowly changing. I know some relatively non-tech
             | savvy young people using things like Claude for various
             | reasons, so people are exploring options.
        
               | jstummbillig wrote:
               | Very, _very_ slowly.
               | 
               | OpenAI vs Anthropic on Google Trends
               | 
               | https://trends.google.com/trends/explore?date=today%203-m
               | &q=...
               | 
               | ChatGPT vs Claude on Google Trends
               | 
               | https://trends.google.com/trends/explore?date=today%203-m
               | &q=...
        
               | rvnx wrote:
               | This is such a big difference, thank you for sharing it,
               | I didn't expect the gap to be _that_ huge
        
               | sndean wrote:
               | I wonder how much of this is brand name? Like Kleenex.
               | Non-tech people might not search for LLM, generative AI,
               | etc. ChatGPT may just be what people have heard of. I'm
               | assuming OpenAI has a large advantage over Anthropic, and
               | the name helps, but I bet the name is exaggerating the
               | difference here a bit. Not everyone buys Kleenex branded
               | Kleenex.
        
           | jdprgm wrote:
            | While Mac unified RAM inference is great for prosumers+, I
           | really don't foresee Apple making 128GB+ options affordable
           | enough to be attractive for inference for the general public.
           | iPhone even less so considering the latest is only at 8GB.
           | Meanwhile the best model sizes will just keep growing.
        
           | paxys wrote:
           | Third behind Anthropic/Google. People are too quick to
           | discount mindshare though. For the vast majority of the
           | world's population AI = LLM = ChatGPT, and that itself will
           | keep OpenAI _years_ ahead of the competition as long as they
            | don't blunder away that audience.
        
           | slashdev wrote:
            | Third for coding, after Anthropic and Gemini, which was
           | leading last I checked.
        
         | joshuajooste05 wrote:
         | There was an article on here a week or two ago on batch
         | inference.
         | 
         | Do you not think that batch inference gives at least a bit of a
         | moat whereby unit costs fall with more prompts per unit of
         | time, especially if models get more complicated and larger in
         | the future?
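          | 
          | Illustrative arithmetic, with completely made-up numbers: the
          | fixed cost of a GPU node gets amortized over however many
          | requests you can batch onto it.
          | 
          |   NODE_COST = 20.0  # $/hour for a GPU node (made up)
          |   tok_per_s = {1: 50, 8: 300, 32: 900}  # throughput by batch
          | 
          |   for batch, tps in tok_per_s.items():
          |       dollars_per_m = NODE_COST / (tps * 3600 / 1e6)
          |       print(batch, round(dollars_per_m, 2))  # falls with batch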
        
           | minimaxir wrote:
           | Batch inference is not exclusive to OpenAI.
        
         | mrweasel wrote:
         | My understanding was that OpenAI couldn't make money at their
          | previous price point, and I don't think operating and training
          | costs have gone down sufficiently to make up for those
          | shortcomings. So how are they going to make money by lowering
          | the price by 80%?
         | 
          | I get the point is to be the last man standing, poaching
          | customers by lowering the price, and perhaps attracting a few
         | people who wouldn't have bought a subscription at the higher
         | price. I just question how long investors can justify pouring
         | money into OpenAI. OpenAI is also the poster child for modern
         | AI, so if they fail the market will react badly.
         | 
         | Mostly I don't understand Silicon Valley venture capital, but
          | dumping prices, making wild purchases with investor money, and
          | mostly only leading on branding, why isn't this a sign that
         | OpenAI is failing?
        
           | simonw wrote:
           | OpenAI's Adam Groth credits "engineers optimizing
           | inferencing" for the price drop:
           | https://twitter.com/TheRealAdamG/status/1932440328293806321
           | 
           | That seems likely to me, all of the LLM providers have been
           | consistently finding new optimizations for the past couple of
           | years.
        
         | m3kw9 wrote:
        | LLM inferencing is a race to the bottom, but the service layers
        | on top aren't. People always pay much more for convenience;
        | those are the things OpenAI focuses on, and they are harder to
        | replicate.
        
         | Szpadel wrote:
        | For sure they are no longer clear winners, but they try to be
        | just barely on top of the others.
        | 
        | Right now the new Gemini surpassed their o3 (barely) in
        | benchmarks for significantly less money, so they cut pricing to
        | stay competitive.
        | 
        | I bet they haven't released o4 not because it isn't competitive,
        | but because they are playing the Nvidia game: release a new
        | product that is just enough better to convince people to buy
        | it. So IMO they are holding the full o4 model to have something
        | to release after the competition releases something better than
        | their top horse.
        
       | ninetyninenine wrote:
        | You know, because LLMs can only be built by corporations... but
        | because they're so easy to build, I see the price going down
        | massively thanks to competition. Consumers benefit because all
        | the companies are trying to outrun each other.
        
         | codr7 wrote:
         | And then they all go out of business, since models cost a
         | fortune to build, and their fan club is left staring at their
         | computers trying to remember how to do anything without getting
         | it served on a silver plate.
        
           | merth wrote:
            | With investors pouring money in, it's probably impossible to
            | go out of business, at least for the big ones, until investors
            | realise this is the wrong hill to die on.
        
             | codr7 wrote:
             | Which they will eventually; so the point stands, no matter
             | how unpopular with the AI excusers out there.
        
           | wrsh07 wrote:
           | I expect they don't go out of business: at worst they don't
           | start their next training run quite as aggressively and
           | instead let their new very good model be profitable for a
           | minute
           | 
           | Many many companies are currently thrilled to pay the current
           | model prices for no performance improvement for 2-3 years
           | 
           | We still have so many features to build on top of current
           | capabilities
        
         | croes wrote:
         | Easy doesn't mean cheap.
         | 
         | They need lots of energy and customers don't pay much, if they
         | pay at all
        
           | briian wrote:
           | Exactly,
           | 
           | The developers of AI models do have a moat, the cost of
           | training the model in the first place.
           | 
            | It's the 90% of low-effort AI wrappers with little to no
            | value add that have no moat.
        
       | koakuma-chan wrote:
       | OpenAI dropped the price by so much that the server also went
       | down.
        
         | pbasista wrote:
         | Is the price drop really the reason for their recent outage?
         | 
         | Or is the price drop an attempt to cover up bad news about the
         | outage with news about the price drop?
        
           | johanyc wrote:
           | > Or is the price drop an attempt to cover up bad news about
           | the outage with news about the price drop?
           | 
           | This makes no sense. No way a global outage will get less
           | coverage than the price drop.
           | 
           | Also the earliest sign of price drop is this tweet 20 hrs ago
           | (https://x.com/OpenAIDevs/status/1932248668469445002), which
           | is earlier than the earliest outage reports 13hrs ago on
           | https://downdetector.com/status/openai/
        
             | koakuma-chan wrote:
             | > No way a global outage will get less coverage than the
             | price drop.
             | 
             | Have you seen today's outage on any news outlet? I have
             | not. Is there an HN thread?
        
       | biophysboy wrote:
       | I don't know if this is OpenAI's intention, but the little
       | message "you've reached your usage limit!" is actively
       | disincentivizing me from subscribing. For my purposes, the free
       | model is more than good enough; the difference before and after
       | is negligible. I honestly wouldn't pay a dollar.
       | 
       | That said, I'm absolutely willing to hear people out on "value-
       | adds" I am missing out on; I'm not a knee-jerk hater (For
       | context, I work with large, complex & private
        | databases/platforms, so it's not really possible for me to do
       | anything but ask for scripting suggestions).
       | 
       | Also, I am 100% expecting a sad day when I'll be forced to
        | subscribe, unless I want to read dick pill ads shoehorned into
       | the answers (looking at you, YouTube). I do worry about getting
       | dependent on this tool and watching it become enshittified.
        
         | Traubenfuchs wrote:
         | > "you've reached your usage limit!"
         | 
         | Just switch to a competitors free offering. There are enough to
         | cycle through not to be hindered by limits. I wonder how much
         | money I have cost those companies by now?
         | 
         | How anyone believes there is any moat for anyone here is beyond
         | me.
        
           | wrsh07 wrote:
           | I expect the answer is <$1 as someone who shares a discord
           | server with a friend where we egregiously ping the models
        
         | wrsh07 wrote:
         | o3 is so good it's worth paying for a minute (just for plus)
         | just to see what it's like
         | 
         | I've never used anything like it. I think new Claude is
         | similarly capable
        
       | lvl155 wrote:
       | Google has been catching up. Funny how fast this space is
       | evolving. Just a few months ago, it was all about DeepSeek.
        
         | bitpush wrote:
         | Many would say Google's Gemini models are SOTA, although Claude
         | seems to be doing well with coding tasks.
        
           | snarf21 wrote:
           | Gemini has been better than Claude for me on a coding
            | project. Claude kept telling me it updated some code but the
           | update wasn't in the output. Like, I had to re-prompt just
           | for updated output 5 times in a row.
        
             | jacob019 wrote:
             | I break out Gemini 2.5 pro when Claude gets stuck, it's
             | just so slow and verbose. Claude follows instructions
              | better and seems to better understand its role in agentic
             | workflows. Gemini does something different with the
             | context, it has a deeper understanding of the control flow
             | and can uncover edge case bugs that Claude misses. o3 seems
             | better at high level thinking and planning, questioning if
              | it should be done and whether the challenge actually
             | matches the need. They're kind of like colleagues with
             | unique strengths. o3 does well with a lot of things, I just
             | haven't used it as much because of the cost. Will probably
             | use it more now.
        
         | johan914 wrote:
         | I have been using Google's models the past couple months, and
         | was surprised to see how sycophantic chatGPT is now. It's not
          | just at the start or end of responses, it's interspersed within
         | the markdown, with little substance. Asking it to change its
         | style makes it overuse technical terms.
        
       | lxgr wrote:
       | Is there also a corresponding increase in weekly messages for
       | ChatGPT Plus users with o3?
       | 
       | In my experience, o4-mini and o4-mini-high are far behind o3 in
       | utility, but since I'm rate-limited for the latter, I end up
       | primarily using the former, which has kind of reinforced the
       | perception that OpenAI's thinking models are behind the
       | competition altogether.
        
         | el_benhameen wrote:
         | My usage has also reflected the pretty heavy rate limits on o3.
         | I find o4-mini-high to be quite good, but I agree that I would
         | much rather use o3. Hoping this means an increase in the
         | limits.
        
       | coffeecoders wrote:
       | Despite the popular take that LLMs have no moat and are burning
       | cash, I find OpenAI's situation really promising.
       | 
       | Just yesterday, they reported an annualized revenue run rate of
       | 10B. Their last funding round in March valued them at 300B.
        | Despite losing 5B last year, they are growing really fast -
        | valued at 30x revenue, with over 500M active users.
       | 
       | It reminds me a lot of Uber in its earlier years--fast growth,
       | heavy investment, but edging closer to profitability.
        
         | rgavuliak wrote:
         | I don't think the no moat approach makes sense. In a world
          | where more and more content and interaction is done with and via
         | LLMs, the data of your users chatting with your LLM is a super
         | valuable dataset.
        
         | bitpush wrote:
          | The problem is your costs also scale with revenue. Ideally you
          | want to control costs as you scale (the first unit you build
          | is expensive, but as you make more your costs come down).
          | 
          | For OpenAI, the more people use the product, the more they
          | spend on compute, unless they can supplement it with other
          | ways of generating revenue.
         | 
          | Unfortunately I don't think OpenAI will be able to hit sustained
         | profitability (see Netflix for another example)
        
           | Legend2440 wrote:
           | >(see Netflix for another example)
           | 
           | Netflix has been profitable for over a decade though? They
           | reported $8.7 billion in profit in 2024.
        
             | amazingamazing wrote:
             | They increased prices and are not selling a pure commodity
             | tho
        
           | aizk wrote:
           | > sustained profitability (see Netflix for another example)
           | 
           | What? Netflix is incredibly profitable.
        
             | bitpush wrote:
              | Probably a bad example on my part, but also because of
              | increasing the costs and offering a tier with ads. I was
              | mostly talking about Netflix as it was originally
              | conceived: "give access to unlimited content at a flat
              | fee", which didn't scale very well.
        
               | whiplash451 wrote:
               | Isn't this exactly what they offer today?
        
           | tptacek wrote:
           | All costs are not equal. There is a classic pattern of
           | dogfights for winner-take-most product categories where the
           | long term winner does the best job of _acquiring customers_
           | at the expense of things like  "engineering to reduce costs".
           | I have no idea how the AI space is going to shake out, but if
           | I had to pick between OpenAI's mindshare in the broadest
           | possible cohort of users vs. best/most efficient model, I'd
           | pick the customers.
           | 
           | Obviously, lots of nerds on HN have preferences for Gemini
           | and Claude, and having used all three I completely get why
           | that is. But we should remember we're not representative of
           | the whole addressable market. There were probably nerds on
           | like ancient dial-up bulletin boards explaining why Betamax
           | was going to win, too.
        
             | awongh wrote:
             | We don't even know yet if the model is the product though,
             | and if OpenAI is the company that will make _the_ AI
              | product/model (chat that keeps expanding into other
              | functionalities and capabilities), or will it be 10,000
              | companies using the OpenAI models? (Well, it's probably
              | both, but in what proportion of revenue?)
        
               | tptacek wrote:
               | Right, but it might not even matter if all the
               | competitors are in the ballpark of the final
               | product/market fit and OpenAI holds a commanding lead in
               | customer acquisition.
               | 
               | Again: I don't know. I've got no predictions. I'm just
               | saying that the logic where OpenAI is outcompeted on
               | models themselves and thus automatically lose does not
               | hold automatically.
        
           | Magmalgebra wrote:
           | Anyone concerned about cost should remember that those costs
            | are dropping exponentially.
           | 
           | Similarly, nearly all AI products but especially OpenAI are
           | heavily _under_ monetized. OpenAI is an excellent personal
           | shopper - the ad revenue that could be generated from that
           | rivals Facebook or Google.
        
             | smelendez wrote:
             | It wouldn't surprise me if they try, but ironically if GPT
             | is a good personal shopper, it might make it harder to
             | monetize with ads because people will trust the bot's
             | organic responses more than the ads.
             | 
             | You could override its suggestions with paid ones, or nerf
             | the bot's shopping abilities so it doesn't overshadow the
             | sponsors, but that will destroy trust in the product in a
             | very competitive industry.
             | 
             | You could put user-targeted ads on the site not necessarily
             | related to the current query, like ads you would see on
             | Facebook, but if the bot is really such a good personal
             | shopper, people are literally at a ChatGPT prompt when they
             | see the ads and will use it to comparison shop.
        
               | whiplash451 wrote:
               | Alternative: let users reduce their monthly bill by
               | accepting a sponsored answer with a dedicated button in
               | the UI
               | 
               | (with many potential variants)
        
           | simonw wrote:
           | "... as you make more your costs come down"
           | 
           | I'd say dropping the price of o3 by 80% due to "engineers
           | optimizing inferencing" is a strong sign that they're doing
           | exactly that.
        
           | marsten wrote:
           | You raise a good point that this isn't a low marginal cost
           | business like software, telecom, or (most of) the web.
           | Efficiency will be a big advantage for companies that can
           | achieve it, in part because it will let them scale to new AI
           | use cases.
           | 
           | With the race to get new models out the door, I doubt any of
           | these companies have done much to optimize cost so far.
           | Google is a partial exception - they began developing the TPU
           | ten years ago and the rest of their infrastructure has been
           | optimized over the years to serve computationally expensive
           | products (search, gmail, youtube, etc.).
        
         | ToucanLoucan wrote:
         | I mean sure, it's very promising if OpenAI's future is your
         | only metric. It gets notably darker if you look at the broader
         | picture of ChatGPT (and company)'s impact on our society.
         | 
         | * We have people uploading tons of zero-effort slop pieces to
         | all manner of online storefronts, and making people less likely
         | to buy overall because they assume everything is AI now
         | 
         | * We have an uncomfortable community of, to be blunt, actual
         | cultists emerging around ChatGPT, doing all kinds of shit from
         | annoying their friends and family all the way up to divorcing
         | their spouses
         | 
         | * Education is struggling in all kinds of ways due to students
         | using (and abusing) the tech, with already strained
         | administrations struggling to figure out how to navigate it
         | 
         | Like yeah if your only metric is OpenAI's particular line going
          | up, it's looking alright. And much like Uber, its success
         | seems to be corrosive to the society in which it operates. Is
         | this supposed to be good news?
        
           | arealaccount wrote:
           | Dying for a reference on the cult stuff, a quick search
           | didn't provide anything interesting.
        
             | ToucanLoucan wrote:
             | Scroll through the ChatGPT subreddit right now and tell me
             | there isn't a TON of people in there who are legitimately
             | unwell. Reads like the back page notes of a dystopian
             | novel.
        
               | arandomhuman wrote:
               | I think this is less caused by ChatGPT/LLMs and more of a
               | phenomenon in social media circles where people flock to
               | "the thing" and have poor social skills and mental health
               | generally speaking.
        
             | wizzwizz4 wrote:
             | https://futurism.com/chatgpt-mental-health-crises, which
             | references the more famous
             | https://www.rollingstone.com/culture/culture-features/ai-
             | spi... but is a newer article.
        
             | MangoToupe wrote:
             | In addition to what the parent commenter was likely
             | referring to, there are also the Zizians:
             | https://en.wikipedia.org/wiki/Zizians
        
           | SlowTao wrote:
            | Yes, but in a typical western business sense they are merely
            | optimizing for user engagement and profits. What happens to
            | society a decade from now because of all the slop being
            | produced is not their concern. Facebook is just about
            | connecting friends, right? It totally won't become a series
            | of information moats and bubbles controlled by the
            | algorithms...
            | 
            | A great communicator on the risks of AI being too heavily
            | integrated into society is Zak Stein. As someone who works
            | in education, he sees first hand how people are becoming
            | dependent on this stuff rather than pursuing any kind of
            | self improvement, just handing over all their thinking to
            | the machine. It is very bizarre and I am seeing it a lot
            | more in my personal experience over the last few months.
        
         | seydor wrote:
          | Their moat is leaky because LLM prices will be dropping
          | forever and the only viable model will be a free one.
          | Eventually everyone will catch up.
          | 
          | Plus there is the fact that "thinking models" can't really
          | solve complex tasks / aren't really as good as they are
          | believed to be.
        
           | Zaheer wrote:
           | I would wager most of their revenue is from the subscriptions
           | - both consumer and business. That pricing is detached from
           | the API pricing. The heavy emphasis on applications more
           | recently is because they realize this as well.
        
         | therealdrag0 wrote:
          | As an anecdote, they have first-mover advantage on me. I pay
          | monthly, mostly because it's good enough and I can't be
          | bothered to try a bunch out and switch. But if the dust
          | settles and prices drop I would be motivated to switch. How
          | much that matters probably depends on whether their revenue
          | comes from app users or API plans. And first mover only works
          | once. Now they may be coasting on name recognition, but
          | otherwise new users may be load-balanced among all the
          | options.
        
       | unraveller wrote:
       | I have no moat and I must make these GPUs scream.
        
       | blueblisters wrote:
       | This is the best model out there, priced level with or lower
       | than Claude and Gemini.
       | 
       | They're not letting the competition breathe
        
       | seydor wrote:
       | when the race to the bottom reaches the bottom, the foundation
       | model companies will be bought by ... energy companies. You'll
       | be paying for AI with your electricity bill
        
         | paxys wrote:
         | It'll be the opposite. Large tech companies are already running
         | their own power plants.
        
       | ramesh31 wrote:
       | Anthropic will need to follow suit with Opus soon. It is simply
       | too expensive, by an order of magnitude, for just about anything.
        
       | madebywelch wrote:
       | They could drop the price 100% and I still wouldn't use it, so
       | long as they're retaining my data.
        
         | simonw wrote:
         | Sounds like you want their Zero Data Retention plan:
         | https://platform.openai.com/docs/guides/your-data#zero-data-...
         | 
         | (It's "contact us" pricing, so I have no idea how much that
         | would set you back. I'm guessing it's not cheap.)
        
           | scudsworth wrote:
            | it doesn't seem like this would supersede a court order
        
             | tech234a wrote:
             | Actually it does according to
             | https://openai.com/index/response-to-nyt-data-demands/
        
       | sschueller wrote:
       | Has anyone noticed that OpenAI has become "lazy"? When I ask
       | questions now it will not give me a complete file or fix. Instead
       | it tells me what I should do and I need to ask a second or third
       | time to just do the thing I asked.
       | 
       | I don't see this happening with, for example, DeepSeek.
       | 
       | Is it possible they are saving on resources by having it answer
       | that way?
        
         | tedsanders wrote:
         | Yeah, our models are sometimes too lazy. It's not intentional,
         | and future models will be less lazy.
         | 
         | When I worked at Netflix I sometimes heard the same speculation
         | about intentionally bad recommendations, which people theorized
         | would lower streaming and increase profit margins. It made even
         | less sense there as streaming costs are usually less than a
         | penny. In reality, it's just hard to make perfect products!
         | 
         | (I work at OpenAI.)
        
           | ukblewis wrote:
           | Please be careful about the alternative. I've seen o3 doing
           | excessive tool calls and research for relatively simple
           | problems.
        
       | polskibus wrote:
       | Is this a reaction to Apple paper showing that reasoning models
       | don't really reason?
        
         | anothermathbozo wrote:
         | Why would that be?
        
       | nikcub wrote:
       | fyi the price drop has been updated in Cursor:
       | 
       | https://x.com/cursor_ai/status/1932484008816050492
        
       | BeetleB wrote:
       | Why does OpenAI require me to verify my "organization" (which
       | requires my state issued ID) to use o3?
        
         | bearjaws wrote:
         | Prevent Deepseek R2 being trained on it
        
           | piskov wrote:
           | If only there were people with multiple passports or, I don't
           | know, Kyrgyzstan.
           | 
           | How exactly will passport check prevent any training?
           | 
           | At most this will block API access to your average Ivan, not
           | a state actor
        
             | BeetleB wrote:
             | Yeah, I just don't see myself using o3 when I have
             | Gemini-2.5 Pro. I don't recall if Google Cloud verified my
             | ID in the past, though. Still, no need to let yet another
             | organization have my data if I'm not getting something
             | _better_ in return.
        
         | valleyer wrote:
         | Don't bother anyway. There are lots of cases of people trying
         | and failing to go through the process, and there is no way to
         | try a second time.
         | 
         | https://community.openai.com/t/session-expired-verify-organi...
         | 
         | https://community.openai.com/t/callback-from-persona-id-chec...
         | 
         | https://community.openai.com/t/verification-issue-on-second-...
         | 
         | https://community.openai.com/t/verification-not-working-and-...
        
       | OutOfHere wrote:
       | o3 is very much needed in VS Code GitHub Copilot for
       | Ask/Edit/Agent modes. It is sorely missing there.
        
       | alliao wrote:
       | it used to take decades of erosion to make Google search a hot
       | mess; now that everything's happening at light speed, it takes
       | only days for AI models to decay to the point of hot mess again..
        
       | godelski wrote:
       | For those wondering
       | 
       |     Yesterday:              Today
       |     -------------           -------------
       |     Price                   Price
       |     Input:                  Input:
       |     $10.00 / 1M tokens      $2.00 / 1M tokens
       |     Cached input:           Cached input:
       |     $2.50 / 1M tokens       $0.50 / 1M tokens
       |     Output:                 Output:
       |     $40.00 / 1M tokens      $8.00 / 1M tokens
       | 
       | https://archive.is/20250610154009/https://openai.com/api/pri...
       | 
       | https://openai.com/api/pricing/
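       | 
       | A minimal back-of-the-envelope sketch in Python of what those
       | per-1M-token prices mean for a single request (the token counts
       | below are hypothetical, just to illustrate the arithmetic):
       | 
       |     # Old vs. new o3 pricing, in $ per 1M tokens (table above)
       |     OLD = {"input": 10.00, "cached_input": 2.50, "output": 40.00}
       |     NEW = {"input": 2.00, "cached_input": 0.50, "output": 8.00}
       | 
       |     def cost(prices, input_toks, cached_toks, output_toks):
       |         """Dollar cost of one request, given $/1M-token prices."""
       |         return (input_toks * prices["input"]
       |                 + cached_toks * prices["cached_input"]
       |                 + output_toks * prices["output"]) / 1_000_000
       | 
       |     # Hypothetical request: 20k fresh input, 80k cached input,
       |     # 5k output tokens (output includes reasoning tokens)
       |     print(cost(OLD, 20_000, 80_000, 5_000))  # 0.6
       |     print(cost(NEW, 20_000, 80_000, 5_000))  # 0.12 -> an 80% drop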
        
       | JojoFatsani wrote:
       | O3 is really good. I haven't had the same results with o4
       | unfortunately
        
       ___________________________________________________________________
       (page generated 2025-06-10 23:00 UTC)