[HN Gopher] OpenAI dropped the price of o3 by 80%
___________________________________________________________________
OpenAI dropped the price of o3 by 80%
Author : mfiguiere
Score : 222 points
Date : 2025-06-10 17:41 UTC (5 hours ago)
(HTM) web link (twitter.com)
(TXT) w3m dump (twitter.com)
| minimaxir wrote:
| ...how? I'd understand a 20-30% price drop from infra
| improvements for a model as-is, but 80%?
|
| I wonder if "we quantized it lol" would classify as false
| advertising for modern LLMs.
| tofof wrote:
| Presumably because the price was about 5x higher to begin with
| than any of the competitors at the same tier of performance?
| Perhaps it's better to get paid anything at all than to just
| lose 100% of the customers.
| drexlspivey wrote:
| Deepseek made a few major innovations allowing them to achieve
| major compute efficiency and then published them. My guess is
| that OpenAI just implemented these themselves.
| vitaflo wrote:
| Wouldn't surprise me. And even with this price cut it's still
| 4x more expensive than Deepseek R1 is.
| ilaksh wrote:
| Maybe because they also are releasing o3-pro.
| MallocVoidstar wrote:
| Note that they have not actually dropped the price yet:
| https://x.com/OpenAIDevs/status/1932463601119637532
|
| > We'll post to @openaidevs once the new pricing is in full
| effect. In $10... 9... 8...
|
| There is also speculation that they are only dropping the input
| price, not the output price (which includes the reasoning
| tokens).
| sunaookami wrote:
| I think that was a joke. New pricing is already in place:
|
| Input: $2.00 / 1M tokens
|
| Cached input: $0.50 / 1M tokens
|
| Output: $8.00 / 1M tokens
|
| https://openai.com/api/pricing/
|
| Now cheaper than gpt-4o and same price as gpt-4.1 (!).
| rvnx wrote:
| It is slower though
| MallocVoidstar wrote:
| No, people had tested it after Altman's announcement and had
| confirmed that they were still being billed at the original
| price. And I checked the docs ~1h after and they still showed
| the original price.
|
| The speculation of only input pricing being lowered was
| because yesterday they gave out vouchers for 1M free _input_
| tokens while output tokens were still billed.
| runako wrote:
| > Now cheaper than gpt-4o and same price as gpt-4.1 (!).
|
| This is where the naming choices get confusing. "Should" o3
| cost more or less than GPT-4.1? Which is more capable? A
| generation 3 of tech intuitively feels less advanced than a
| 4.1 of a (similar) tech.
| jacob019 wrote:
| Do we know parameter counts? The reasoning models have
| typically been cheaper per token, but use more tokens.
| Latency is annoying. I'll keep using gpt-4.1 for day-to-
| day.
| koakuma-chan wrote:
| o3 is a reasoning model, GPT-4.1 is not. They are
| orthogonal.
| runako wrote:
| My quibble is with naming choices and differentiating.
| Even here they are confusing:
|
| - o4 is reasoning
|
| - 4o is not
|
| They simply do not do a good job of differentiating.
| Unless you work directly in the field, it is likely not
| obvious what is the difference between "our most powerful
| reasoning model" and "our flagship model for complex
| tasks."
|
| "Does my complex task need reasoning or not?" seems to be
| how one would choose. (What type of task is complex but
| does not require any reasoning?) This seems less than
| ideal!
| koakuma-chan wrote:
| This is true, and I believe apps automatically route
| requests to appropriate models for normie users.
| agsqwe wrote:
| Thinking models produce a lot of internal output tokens,
| making them more expensive than non-reasoning models for
| similar prompt and visible output lengths.
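|
| A rough sketch of the arithmetic, with hypothetical token counts
| (o3's post-cut rates of $2 / 1M input and $8 / 1M output, with
| reasoning tokens billed as output):
|
|     # hypothetical request against o3's new pricing
|     input_tokens = 1_000        # prompt
|     reasoning_tokens = 5_000    # hidden chain of thought (made-up figure)
|     visible_tokens = 500        # answer the user actually sees
|
|     cost = (input_tokens * 2.00
|             + (reasoning_tokens + visible_tokens) * 8.00) / 1_000_000
|     print(f"${cost:.3f}")  # ~$0.046, mostly spent on tokens you never see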
| vitaflo wrote:
| Still 4x more expensive than Deepseek R1 tho.
| teaearlgraycold wrote:
| Personally I've found these bigger models (o3/Claude 4 Opus) to
| be disappointing for coding.
| apwell23 wrote:
| I found them all disappointing in their own ways. At least the
| Deepseek models actually listen to what I say instead of
| ignoring me and doing their own thing like a toddler.
| rvnx wrote:
| Opus is really great, but only through Claude Code. If you used
| Cursor or RooCode, it wouldn't be surprising that you were
| disappointed.
| bitpush wrote:
| This matches my experience, but I can't explain it. Do you know
| what's going on?
| eunoia wrote:
| My understanding is context size. Companies like Cursor are
| trying to minimize the amount of context sent to the models
| to keep their own costs down. Claude Code seems to send a
| lot more context with every request and that seems to make
| the difference.
| supermdguy wrote:
| Just guessing, but the new Opus was probably RL tuned to
| work better with Claude Code's tool calls
| jedisct1 wrote:
| I got the opposite experience. Not with Opus (too expensive),
| but with Sonnet. I got things done way more efficiently when
| using Sonnet with Roo than with Claude Code.
| rgbrenner wrote:
| Same. I ran a few tests ($100 worth of API calls) with Opus 4
| and didn't see any difference compared to Sonnet 4 other than
| the price.
|
| Also, no idea why he thinks Roo is handicapped when Claude Code
| nerfs the thinking output and requires typing "think"/"think
| hard"/"think harder"/"ultrathink" just to expand the max
| thinking tokens... and ultrathink only sets it to 32k, while
| the max in Roo is 51200 and it's just a setting.
| behnamoh wrote:
| how do we know it's not a quantized version of o3? what's
| stopping these firms from announcing the full model to perform
| well on the benchmarks and then gradually quantizing it (first at
| Q8 so no one notices, then Q6, then Q4, ...).
|
| I have a suspicion that's how they were able to get gpt-4-turbo
| so fast. In practice, I found it inferior to the original GPT-4
| but the company probably benchmaxxed the hell out of the turbo
| and 4o versions so even though they were worse models, users
| found them more pleasing.
| esafak wrote:
| Are there any benchmarks that track historical performance?
| behnamoh wrote:
| good question, and I don't know of any, although it's a no
| brainer that someone should make it.
|
| a proxy to that may be the anecdotal evidence of users who
| report back in a month that model X has gotten dumber
| (started with gpt-4 and keeps happening, esp. with Anthro and
| OpenAI models). I haven't heard such anecdotal stories about
| Gemini, R1, etc.
| SparkyMcUnicorn wrote:
| Aider has one, but it hasn't been updated in months. People
| kept claiming models were getting worse, but the results
| proved that they weren't.
| esafak wrote:
| https://aider.chat/docs/leaderboards/by-release-date.html
| __mharrison__ wrote:
| Updated yesterday... https://aider.chat/docs/leaderboards/
| vitaflo wrote:
| That Deepseek price is always hilarious to see in these
| charts.
| SparkyMcUnicorn wrote:
| That's not the one I'm referring to. See my other
| comments or your sibling comment.
| benterix wrote:
| > users found them more pleasing.
|
| _Some_ users. For me the drop was so huge it became almost
| unusable for the things I had used it for.
| behnamoh wrote:
| Same here. One of my apps outright stopped working because
| the gpt-4o outputs were noticeably worse than those of the
| gpt-4 I originally built the app against.
| lispisok wrote:
| I swear every time a new model is released it's great at first
| but then performance gets worse over time. I figured they were
| fine-tuning it to get rid of bad output which also nerfed the
| really good output. Now I'm wondering if they were quantizing
| it.
| nabla9 wrote:
| It seems that at least Google is overselling their compute
| capacity.
|
| You pay a monthly fee, but Gemini is completely jammed for 5-6
| hours while North America is working.
| baq wrote:
| Gemini is simply that good. I'm trying out Claude 4 every
| now and then and go back to Gemini to fix its mess...
| fasterthanlime wrote:
| Funny, I have the exact opposite experience! I use Claude
| to fix Gemini's mess.
| symfoniq wrote:
| Maybe LLMs just make messes.
| hgomersall wrote:
| I heard that, but I'm getting consistent garbage from
| Gemini.
| dayjah wrote:
| For code? Use the context7 mcp.
| energy123 wrote:
| Gemini is the best model in the world. Gemini is the
| worst web app in the world. Somehow those two things are
| coexisting. The web devs in their UI team have really
| betrayed the hard work of their ML and hardware
| colleagues. I don't say this lightly - I say this after
| having paid attention to critical bugs, more than I can
| count on one hand, that persisted for over a year. They
| either don't care or are grossly incompetent.
| thorum wrote:
| Try AI Studio if you haven't already:
| https://aistudio.google.com/
| koakuma-chan wrote:
| https://ai.dev
| nabla9 wrote:
| Well said.
|
| Google is best in pure AI research, both quality and
| volume. They have sucked at productization for years, and
| not just with AI but with other products as well. Real mystery.
| energy123 wrote:
| I don't understand why they can't just make it fast and
| go through the bug reports from a year ago and fix them.
| Is it that hard to build a box for users to type text
| into without it lagging for 5 seconds or throwing a bunch
| of errors?
| edzitron wrote:
| When you say "jammed," how do you mean?
| solfox wrote:
| I have seen this behavior as well.
| mhitza wrote:
| That was my suspicion when I first deleted my account: it felt
| like ChatGPT's output had gotten worse, and I found it highly
| suspicious when I saw an errant davinci model keyword in the
| ChatGPT URL.
|
| Now I'm feeling similarly about their image generation (which
| is the only reason I created a paid account two months ago,
| and the output looks more generic by default now).
| Tiberium wrote:
| I've heard lots of people say that, but no objective
| reproducible benchmarks confirm such a thing happening often.
| Could this simply be a case of novelty/excitement for a new
| model fading away as you learn more about its shortcomings?
| 85392_school wrote:
| I think it's an illusion. People have been claiming it
| since the GPT-4 days, but nobody's ever posted any good
| evidence to the "model-changes" channel in Anthropic's
| Discord. It's probably just nostalgia.
| herval wrote:
| there are definitely measurements (eg
| https://hdsr.mitpress.mit.edu/pub/y95zitmz/release/2 ) but
| I imagine they're rare because those benchmarks are
| expensive, so nobody keeps running them all the time?
|
| Anecdotally, it's quite clear that some models are
| throttled during the day (eg Claude sometimes falls back to
| "concise mode" - with and without a warning on the app).
|
| You can tell if you're using Windsurf/Cursor too - there
| are times of the day where the models constantly fail to do
| tool calling, and other times they "just work" (for the
| same query).
|
| Finally, there are cases where it was confirmed by the
| company, like GPT-4o's sycophancy episode that very clearly
| impacted its output
| (https://openai.com/index/sycophancy-in-gpt-4o/)
| drewnick wrote:
| I feel this too. I swear some of the coding Claude Code
| does on weekends is superior to the weekdays. It just has
| these eureka moments every now and then.
| herval wrote:
| Claude has been particularly bad since they released 4.0.
| The push to remove 3.7 from Windsurf hasn't helped
| either. Pretty evident they're trying to force people to
| pay for Claude Code...
|
| Trusting these LLM providers today is as risky as
| trusting Facebook as a platform, when they were pushing
| their "opensocial" stuff
| Deathmax wrote:
| Your linked article is specifically comparing two
| different versioned snapshots of a model and not
| comparing the same model across time.
|
| You've also made the mistake of conflating what's served
| via API platforms which are meant to be stable, and
| frontends which have no stability guarantees, and are
| very much iterated on in terms of the underlying model
| and system prompts. The GPT-4o sycophancy debacle was
| only on the specific model that's served via the ChatGPT
| frontend and never impacted the stable snapshots on the
| API.
|
| I have never seen any sort of compelling evidence that
| any of the large labs tinkers with their stable,
| versioned model releases that are served via their API
| platforms.
| herval wrote:
| Please read it again. The article is clearly comparing
| gpt4 to gpt4, and gpt3.5 to gpt3.5, in march vs june 2023
| Deathmax wrote:
| I did read it, and I even went to their eval repo.
|
| > At the time of writing, there are two major versions
| available for GPT-4 and GPT-3.5 through OpenAI's API, one
| snapshotted in March 2023 and another in June 2023.
|
| openaichat/gpt-3.5-turbo-0301 vs
| openaichat/gpt-3.5-turbo-0613, openaichat/gpt-4-0314 vs
| openaichat/gpt-4-0613. Two _distinct_ versions of the
| model, and not the _same_ model over time like how people
| like to complain that a model gets "nerfed" over time.
| glitch253 wrote:
| Cursor / Windsurf's degraded functionality is exactly why
| I created my own system:
|
| https://github.com/mpfaffenberger/code_puppy
| Kranar wrote:
| I used to think the models got worse over time as well but
| then I checked my chat history and what I noticed isn't
| that ChatGPT gets worse, it's that my standards and
| expectations increase over time.
|
| When a new model comes out I test the waters a bit with
| some more ambitious queries and get impressed when it can
| handle them reasonably well. Over time I take it for
| granted and then just expect it to be able to handle ever
| more complex queries and get disappointed when I hit a new
| limit.
| echelon wrote:
| Re-run your historical queries, or queries that are
| similarly shaped.
| throwaway314155 wrote:
| Sounds like a _whole_ thing.
| sakesun wrote:
| They could cache that :)
| bobxmax wrote:
| My suspicion is it's the personalization. Most people have
| things like 'memory' on, and as the models increasingly
| personalize towards you, that personalization is hurting
| quality rather than helping it.
|
| Which is why the base model wouldn't necessarily show
| differences when you benchmarked them.
| cainxinth wrote:
| I assumed it was because the first week revealed a ton of
| safety issues that they then "patched" by adjusting the
| system prompt, and thus using up more inference tokens on
| things other than the user's request.
| JamesBarney wrote:
| I'm pretty sure this is just a psychological phenomenon. When
| a new model is released all the capabilities the new model
| has that the old model lacks are very salient. This makes it
| seem amazing. Then you get used to the model, push it to the
| frontier, and suddenly the most salient memories of the new
| model are its failures.
|
| There are tons of benchmarks that don't show any regressions.
| Even small and unpublished ones rarely show regressions.
| risho wrote:
| Quantization is a massive efficiency gain for near negligible
| drop in quality. If the tradeoff is quantization for an 80
| percent price drop I would take that any day of the week.
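|
| Back-of-the-envelope on why quantization cuts serving cost (the
| parameter count below is purely hypothetical; o3's size is not
| public):
|
|     params = 200e9                          # hypothetical 200B-parameter model
|     bytes_per_param = {"fp16": 2, "q8": 1, "q4": 0.5}
|     for fmt, b in bytes_per_param.items():
|         print(f"{fmt}: {params * b / 1e9:.0f} GB of weights")
|     # fp16: 400 GB, q8: 200 GB, q4: 100 GB -- fewer GPUs per replica,
|     # and more headroom for batching, so cost per token drops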
| behnamoh wrote:
| > for near negligible drop in quality
|
| Hmm, both the evidence and the anecdotes say otherwise:
|
| https://github.com/ggml-org/llama.cpp/discussions/4110
| spiderice wrote:
| You may be right that the tradeoff is worth it, but it should
| be advertised as such. You shouldn't think you're paying for
| full o3, even if they're heavily discounting it.
| CSMastermind wrote:
| This is almost certainly what they're doing and rebranding the
| original o3 model as "o3-pro"
| behnamoh wrote:
| > rebranding the original o3 model as "o3-pro"
|
| interesting take, I wouldn't be surprised if they did that.
| anticensor wrote:
| -pro models appear to be a best-of-10 sampling of the
| original full size model
| Szpadel wrote:
| How do you sample it behind the scenes? Usually best-of-X
| means you generate X outputs and choose the best result.
|
| If you could do this automatically, it would be a game changer:
| you could run the top 5 models in parallel and select the best
| answer every time.
|
| But it's not practical, because you are the bottleneck: you
| have to read all 5 solutions and compare them.
| joshstrange wrote:
| I think the idea is they use another/same model to judge
| all the results and only return the best one to the user.
| anticensor wrote:
| > if you could do this automatically, it would be game
| changer as you could run top 5 best models in parallel
| and select best answer every time
|
| remember they have access to the RLHF reward model,
| against which they can evaluate all N outputs and have
| the most "rewarded" answer picked and sent
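|
| A minimal sketch of that idea (generate and
| score_with_reward_model are placeholders; nothing about
| OpenAI's actual pipeline is confirmed here):
|
|     def best_of_n(prompt, generate, score_with_reward_model, n=10):
|         # sample n candidate answers independently
|         candidates = [generate(prompt) for _ in range(n)]
|         # return the one the reward model scores highest, no human needed
|         return max(candidates, key=lambda c: score_with_reward_model(prompt, c))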
| tedsanders wrote:
| Nope, not what we're doing.
|
| o3 is still o3 (no nerfing) and o3-pro is new and better than
| o3.
|
| If we were lying about this, it would be really easy to catch
| us - just run evals.
|
| (I work at OpenAI.)
| bn-l wrote:
| Not quantized?
| tedsanders wrote:
| Not quantized. Weights are the same.
|
| If we did change the model, we'd release it as a new
| model with a new name (e.g., o3-turbo-2025-06-10). It
| would be very annoying to customers if we ever silently
| changed models, so we never do this [1].
|
| [1] `chatgpt-4o-latest` being an explicit exception
| MattDaEskimo wrote:
| What's with the dropped benchmark performance compared to
| the original o3 release? It was disappointing to not see
| o4-mini on it as well
| ants_everywhere wrote:
| Is this what happened to Gemini 2.5 Pro? It used to be very
| good, but it's started struggling on basic tasks.
|
| The thing that gets me is it seems to be lying about fetching a
| web page. It will say things are there that were never on any
| version of the page and it sometimes takes multiple screenshots
| of the page to convince it that it's wrong.
| SparkyMcUnicorn wrote:
| The Aider discord community has proposed and disproven the
| theory that 2.5 Pro became worse, several times, through many
| benchmark runs.
|
| It had a few bugs here or there when they pushed updates, but
| it didn't get worse.
| ants_everywhere wrote:
| Gemini is objectively exhibiting new behavior with the same
| prompts and that behavior is unwelcome. It includes
| hallucinating information and refusing to believe it's
| wrong.
|
| My question is not whether this is true (it is) but why
| it's happening.
|
| I am willing to believe the aider community has found that
| Gemini has maintained approximately equivalent performance
| on fixed benchmarks. That's reasonable considering they
| probably use a/b testing on benchmarks to tell them whether
| training or architectural changes need to be reverted.
|
| But all versions of aider I've tested, including the most
| recent one, don't handle Gemini correctly so I'm skeptical
| that they're the state of the art with respect to
| benchmarking Gemini.
| SparkyMcUnicorn wrote:
| Gemini 2.5 Pro is the highest ranking model on the aider
| benchmarks leaderboard.
|
| For benchmarks, either Gemini writes code that adheres to
| the required edit format, builds successfully, and passes
| unit tests, or it doesn't.
|
| I primarily use aider + 2.5 pro for planning/spec files,
| and occasionally have it do file edits directly. Works
| great, other than stopping it mid-execution once in a
| while.
| jstummbillig wrote:
| You can just give it a go for very little money (in Windsurf
| it's 1x right now), and see what it does. There is no room for
| conspiracy here, because you can simply look at what it does.
| If you don't like it, neither will others, and then people will
| not use it. People are obviously very capable of (collectively)
| forming opinions on models and voting with their wallets.
| resters wrote:
| It's probably optimized in some way, but if the optimizations
| degrade performance, let's hope it is reflected in various
| benchmarks. One alternative hypothesis is that it's the same
| model, but in the early days they make it think "harder" and
| run a meta-process to collect training data for reinforcement
| learning for use on future models.
| SparkyMcUnicorn wrote:
| It's a bit dated now, but it would be cool if people
| submitted PRs for this one:
| https://aider.chat/docs/leaderboards/by-release-date.html
| __mharrison__ wrote:
| Dated? This was updated yesterday
| https://aider.chat/docs/leaderboards/
| SparkyMcUnicorn wrote:
| My link is to the benchmark results _over time_.
|
| The main leaderboard page that you linked to is updated
| quite frequently, but it doesn't contain multiple
| benchmarks for the same exact model.
| carter-0 wrote:
| An OpenAI researcher claims it's the exact same model on X:
| https://x.com/aidan_mclau/status/1932507602216497608
| segmondy wrote:
| you don't, so run your own model.
| EnPissant wrote:
| The API lists o3 and o3-2025-04-16 as the same thing with the
| same price. The date based models are set in stone.
| hyperknot wrote:
| I got 700+ tokens/sec on o3 after the announcement, I suspect
| it's very much a quantized version.
|
| https://x.com/hyperknot/status/1932476190608036243
| dist-epoch wrote:
| Or maybe they just brought online much faster much cheaper
| hardware.
| zackangelo wrote:
| Is that input tokens or output tokens/s?
| Bjorkbat wrote:
| Related, when o3 finally came out ARC-AGI updated their graph
| because it didn't perform nearly as well as the version of o3
| that "beat" the benchmark.
|
| https://arcprize.org/blog/analyzing-o3-with-arc-agi
| ctoth wrote:
| From the announcement email:
|
| > Today, we dropped the price of OpenAI o3 by 80%, bringing the
| cost down to $2 / 1M input tokens and $8 / 1M output tokens.
|
| > We optimized our inference stack that serves o3--this is the
| same exact model, just cheaper.
| smusamashah wrote:
| How about testing the same input with the same seed on
| different dates? If it's a different model it will return
| different output.
| zomnoys wrote:
| Isn't this not true since these models run with a non-zero
| temperature?
| smusamashah wrote:
| You can set the temperature too.
| luke-stanley wrote:
| I think the API has some special IDs to check for
| reproducibility of the environment.
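|
| A minimal sketch of that kind of check (the Chat Completions
| seed parameter and system_fingerprint field do exist, but
| determinism is only best-effort, so treat this as a heuristic
| rather than proof of a model change):
|
|     from openai import OpenAI
|
|     client = OpenAI()
|     resp = client.chat.completions.create(
|         model="gpt-4.1",
|         messages=[{"role": "user", "content": "Summarize the Collatz conjecture."}],
|         temperature=0,
|         seed=42,
|     )
|     # log these and diff them across dates
|     print(resp.system_fingerprint)
|     print(resp.choices[0].message.content)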
| visiondude wrote:
| always seemed to me that efficient caching strategies could
| greatly reduce costs... wonder if they cooked up something new
| xmprt wrote:
| How are LLMs cached? Every prompt would be different so it's
| not clear how that would work. Unless you're talking about
| caching the model weights...
| koakuma-chan wrote:
| A lot of the prompt is always the same: the instructions, the
| context, the codebase (if you are coding), etc.
| amanda99 wrote:
| You would use a KV cache to cache a significant chunk of the
| inference work.
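|
| A toy sketch of the idea (real serving stacks cache per-layer
| attention tensors, not Python lists, but the shape of the
| saving is the same):
|
|     def compute_kv(tokens):
|         # stand-in for the expensive per-token key/value computation
|         return [("kv", t) for t in tokens]
|
|     kv_cache = {}  # shared prompt prefix -> precomputed KV
|
|     def prefill(prompt_tokens, prefix_len):
|         prefix = tuple(prompt_tokens[:prefix_len])
|         if prefix not in kv_cache:
|             kv_cache[prefix] = compute_kv(prefix)   # pay prefill cost once
|         # only the new suffix needs fresh computation on later requests
|         return kv_cache[prefix] + compute_kv(prompt_tokens[prefix_len:])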
| biophysboy wrote:
| Do you mean that they provide the same answer to verbatim-
| equivalent questions, and pull the answer out of storage
| instead of recalculating each time? I've always wondered if
| they did this.
| koakuma-chan wrote:
| The prompt may be the same but the seed is different
| every time.
| biophysboy wrote:
| Could you not cache the top k outputs given a provided
| input token set? I thought the randomness was applied at
| the end by sampling the output distribution.
| Traubenfuchs wrote:
| I bet there is a set of repetitive one- or two-question
| user requests that makes up a sizeable share of all
| requests. The models are so expensive to run that 1%
| would be enough - much less than 1%. To make it less
| obvious they probably have a big set of response
| variants. I don't see how they would not do this.
|
| They probably also have cheap code or cheap models that
| normalize requests to increase cache hit rate.
| hadlock wrote:
| I've asked it a question not in its dataset three different
| ways and I see the same three sentences in the response, word
| for word, which could imply it's caching the core answer. I
| hadn't seen this behavior before this last week.
| HugoDias wrote:
| This document explains the process very well. It's a good
| read: https://platform.openai.com/docs/guides/prompt-caching
| xmprt wrote:
| That link explains how OpenAI uses it, but doesn't really
| walk through how it's any faster. I thought the whole point
| of transformers was that inference speed no longer depended
| on prompt length. So how does caching the prompt help
| reduce latency if the outputs aren't being cached.
|
| > Regardless of whether caching is used, the output
| generated will be identical. This is because only the
| prompt itself is cached, while the actual response is
| computed anew each time based on the cached prompt
| tasuki wrote:
| > Every prompt would be different
|
| No? Eg "how to cook pasta" is probably asked a lot.
| candiddevmike wrote:
| It's going to be a race to the bottom, they have no moat.
| rvnx wrote:
| Especially now that they are second in the race (behind
| Anthropic) and lot of free-to-download and free-to-use models
| are now starting to be viable competitors.
|
| Once new MacBooks and iPhones have enough memory onboard this
| is going to be a disaster for OpenAI and other providers.
| mattnewton wrote:
| I'm not sure they're scared of Anthropic - they're doing
| great work but afaict running into some scaling issues and
| really focused on winning over developers at the moment.
|
| If I was OpenAI (or Anthropic for that matter) I would remain
| scared of Google, who is now awake and able to dump Gemini
| 2.5 pro on the market at costs that I'm not sure people
| without their own hardware can compete with, and with the
| infrastructure to handle everyone switching to them tomorrow.
| itomato wrote:
| Codex Research Preview appeared in my account in the early
| AM.
| piuantiderp wrote:
| Google is going to lap them. The hardware muscle they have
| has not even started flexing
| koakuma-chan wrote:
| What do you mean, Google is number 1
| aerhardt wrote:
| OpenAI are second in the race to Anthropic in some benchmarks
| (maybe?), but OpenAI still dwarfs Anthropic in distribution
| and popularity.
| ratedgene wrote:
| That's slowly changing. I know some relatively non-tech
| savvy young people using things like Claude for various
| reasons, so people are exploring options.
| jstummbillig wrote:
| Very, _very_ slowly.
|
| OpenAI vs Anthropic on Google Trends
|
| https://trends.google.com/trends/explore?date=today%203-m
| &q=...
|
| ChatGPT vs Claude on Google Trends
|
| https://trends.google.com/trends/explore?date=today%203-m
| &q=...
| rvnx wrote:
| This is such a big difference, thank you for sharing it,
| I didn't expect the gap to be _that_ huge
| sndean wrote:
| I wonder how much of this is brand name? Like Kleenex.
| Non-tech people might not search for LLM, generative AI,
| etc. ChatGPT may just be what people have heard of. I'm
| assuming OpenAI has a large advantage over Anthropic, and
| the name helps, but I bet the name is exaggerating the
| difference here a bit. Not everyone buys Kleenex branded
| Kleenex.
| jdprgm wrote:
| While Mac unified-RAM inference is great for prosumers+, I
| really don't foresee Apple making 128GB+ options affordable
| enough to be attractive for inference for the general public.
| iPhone even less so considering the latest is only at 8GB.
| Meanwhile the best model sizes will just keep growing.
| paxys wrote:
| Third behind Anthropic/Google. People are too quick to
| discount mindshare though. For the vast majority of the
| world's population AI = LLM = ChatGPT, and that itself will
| keep OpenAI _years_ ahead of the competition as long as they
| don't blunder away that audience.
| slashdev wrote:
| Third for coding, after Anthropic and Gemini; Gemini was
| leading last I checked.
| joshuajooste05 wrote:
| There was an article on here a week or two ago on batch
| inference.
|
| Do you not think that batch inference gives at least a bit of a
| moat whereby unit costs fall with more prompts per unit of
| time, especially if models get more complicated and larger in
| the future?
| minimaxir wrote:
| Batch inference is not exclusive to OpenAI.
| mrweasel wrote:
| My understanding was that OpenAI couldn't make money at their
| previous price point, and I don't think operation and training
| costs have gone down sufficiently to make up for those
| shortcomings. So how are they going to make money by lowering the
| price by 80%?
|
| I get that the point is to be the last man standing, poaching
| customers by lowering the price and perhaps attracting a few
| people who wouldn't have bought a subscription at the higher
| price. I just question how long investors can justify pouring
| money into OpenAI. OpenAI is also the poster child for modern
| AI, so if they fail the market will react badly.
|
| Mostly I don't understand Silicon Valley venture capital, but
| between dumping prices, making wild purchases with investor
| money, and leading mostly on branding, why isn't this a sign
| that OpenAI is failing?
| simonw wrote:
| OpenAI's Adam Groth credits "engineers optimizing
| inferencing" for the price drop:
| https://twitter.com/TheRealAdamG/status/1932440328293806321
|
| That seems likely to me, all of the LLM providers have been
| consistently finding new optimizations for the past couple of
| years.
| m3kw9 wrote:
| LLM inference is a race to the bottom, but the service layers on
| top aren't. People always pay much more for convenience; those
| are the things OpenAI focuses on, and they are harder to replicate.
| Szpadel wrote:
| For sure they are no longer the clear winner, but they try to
| stay just barely on top of the others.
|
| Right now the new Gemini surpassed their o3 (barely) in
| benchmarks for significantly less money, so they cut pricing to
| stay competitive.
|
| I bet they didn't release o4 not because it isn't competitive,
| but because they are playing the Nvidia game: release a new
| product that is just enough better to convince people to buy
| it. So IMO they are holding back the full o4 model so they have
| something to release after the competition ships something
| better than their current top horse.
| ninetyninenine wrote:
| You know, LLMs can only be built by corporations... but because
| they're so easy to build, I see the price going down massively
| thanks to competition. Consumers benefit because all the
| companies are trying to outrun each other.
| codr7 wrote:
| And then they all go out of business, since models cost a
| fortune to build, and their fan club is left staring at their
| computers trying to remember how to do anything without getting
| it served on a silver plate.
| merth wrote:
| With investors pouring money in, it's probably impossible to go
| out of business, at least for the big ones, until investors
| realise this is the wrong hill to die on.
| codr7 wrote:
| Which they will eventually; so the point stands, no matter
| how unpopular with the AI excusers out there.
| wrsh07 wrote:
| I expect they don't go out of business: at worst they don't
| start their next training run quite as aggressively and
| instead let their new very good model be profitable for a
| minute
|
| Many many companies are currently thrilled to pay the current
| model prices for no performance improvement for 2-3 years
|
| We still have so many features to build on top of current
| capabilities
| croes wrote:
| Easy doesn't mean cheap.
|
| They need lots of energy and customers don't pay much, if they
| pay at all
| briian wrote:
| Exactly,
|
| The developers of AI models do have a moat: the cost of
| training the model in the first place.
|
| It's the 90% of low-effort AI wrappers with little to no
| value add that have no moat.
| koakuma-chan wrote:
| OpenAI dropped the price by so much that the server also went
| down.
| pbasista wrote:
| Is the price drop really the reason for their recent outage?
|
| Or is the price drop an attempt to cover up bad news about the
| outage with news about the price drop?
| johanyc wrote:
| > Or is the price drop an attempt to cover up bad news about
| the outage with news about the price drop?
|
| This makes no sense. No way a global outage will get less
| coverage than the price drop.
|
| Also the earliest sign of price drop is this tweet 20 hrs ago
| (https://x.com/OpenAIDevs/status/1932248668469445002), which
| is earlier than the earliest outage reports 13hrs ago on
| https://downdetector.com/status/openai/
| koakuma-chan wrote:
| > No way a global outage will get less coverage than the
| price drop.
|
| Have you seen today's outage on any news outlet? I have
| not. Is there an HN thread?
| biophysboy wrote:
| I don't know if this is OpenAI's intention, but the little
| message "you've reached your usage limit!" is actively
| disincentivizing me from subscribing. For my purposes, the free
| model is more than good enough; the difference before and after
| is negligible. I honestly wouldn't pay a dollar.
|
| That said, I'm absolutely willing to hear people out on "value-
| adds" I am missing out on; I'm not a knee-jerk hater (For
| context, I work with large, complex & private
| databases/platforms, so it's not really possible for me to do
| anything but ask for scripting suggestions).
|
| Also, I am 100% expecting a sad day when I'll be forced to
| subscribe, unless I want to read dick pill ads shoehorned in to
| the answers (looking at you, YouTube). I do worry about getting
| dependent on this tool and watching it become enshittified.
| Traubenfuchs wrote:
| > "you've reached your usage limit!"
|
| Just switch to a competitors free offering. There are enough to
| cycle through not to be hindered by limits. I wonder how much
| money I have cost those companies by now?
|
| How anyone believes there is any moat for anyone here is beyond
| me.
| wrsh07 wrote:
| I expect the answer is <$1 as someone who shares a discord
| server with a friend where we egregiously ping the models
| wrsh07 wrote:
| o3 is so good it's worth paying for a minute (just for plus)
| just to see what it's like
|
| I've never used anything like it. I think new Claude is
| similarly capable
| lvl155 wrote:
| Google has been catching up. Funny how fast this space is
| evolving. Just a few months ago, it was all about DeepSeek.
| bitpush wrote:
| Many would say Google's Gemini models are SOTA, although Claude
| seems to be doing well with coding tasks.
| snarf21 wrote:
| Gemini has been better than Claude for me on a coding
| project. Claude kept telling me it updated some code but the
| update wasn't in the output. Like, I had to re-prompt just
| for updated output 5 times in a row.
| jacob019 wrote:
| I break out Gemini 2.5 pro when Claude gets stuck, it's
| just so slow and verbose. Claude follows instructions
| better and seems to better understand its role in agentic
| workflows. Gemini does something different with the
| context, it has a deeper understanding of the control flow
| and can uncover edge case bugs that Claude misses. o3 seems
| better at high level thinking and planning, questioning whether
| it should be done at all and whether the challenge actually
| matches the need. They're kind of like colleagues with
| unique strengths. o3 does well with a lot of things, I just
| haven't used it as much because of the cost. Will probably
| use it more now.
| johan914 wrote:
| I have been using Google's models the past couple months, and
| was surprised to see how sycophantic chatGPT is now. It's not
| just at the start or end of responses; it's interspersed within
| the markdown, with little substance. Asking it to change its
| style makes it overuse technical terms.
| lxgr wrote:
| Is there also a corresponding increase in weekly messages for
| ChatGPT Plus users with o3?
|
| In my experience, o4-mini and o4-mini-high are far behind o3 in
| utility, but since I'm rate-limited for the latter, I end up
| primarily using the former, which has kind of reinforced the
| perception that OpenAI's thinking models are behind the
| competition altogether.
| el_benhameen wrote:
| My usage has also reflected the pretty heavy rate limits on o3.
| I find o4-mini-high to be quite good, but I agree that I would
| much rather use o3. Hoping this means an increase in the
| limits.
| coffeecoders wrote:
| Despite the popular take that LLMs have no moat and are burning
| cash, I find OpenAI's situation really promising.
|
| Just yesterday, they reported an annualized revenue run rate of
| 10B. Their last funding round in March valued them at 300B.
| Despite losing 5B last year, they are growing really fast -
| valued at 30x revenue, with over 500M active users.
|
| It reminds me a lot of Uber in its earlier years--fast growth,
| heavy investment, but edging closer to profitability.
| rgavuliak wrote:
| I don't think the no moat approach makes sense. In a world
| where more and more content and interaction is done with and via
| LLMs, the data of your users chatting with your LLM is a super
| valuable dataset.
| bitpush wrote:
| The problem is your costs also scale with revenue. Ideally you
| want to control costs as you scale (the first unit you build is
| expensive, but as you make more your costs come down).
|
| For OpenAI, the more people use the product, the more they spend
| on compute, unless they can supplement it with other ways of
| generating revenue.
|
| I unfortunately don't think OpenAI will be able to hit sustained
| profitability (see Netflix for another example).
| Legend2440 wrote:
| >(see Netflix for another example)
|
| Netflix has been profitable for over a decade though? They
| reported $8.7 billion in profit in 2024.
| amazingamazing wrote:
| They increased prices and are not selling a pure commodity
| tho
| aizk wrote:
| > sustained profitability (see Netflix for another example)
|
| What? Netflix is incredibly profitable.
| bitpush wrote:
| Probably a bad example on my part, and also because they've
| been raising prices and offering a tier with ads. I was mostly
| talking about Netflix as it was originally conceived: "give
| access to unlimited content at a flat fee", which didn't scale
| well.
| whiplash451 wrote:
| Isn't this exactly what they offer today?
| tptacek wrote:
| All costs are not equal. There is a classic pattern of
| dogfights for winner-take-most product categories where the
| long term winner does the best job of _acquiring customers_
| at the expense of things like "engineering to reduce costs".
| I have no idea how the AI space is going to shake out, but if
| I had to pick between OpenAI's mindshare in the broadest
| possible cohort of users vs. best/most efficient model, I'd
| pick the customers.
|
| Obviously, lots of nerds on HN have preferences for Gemini
| and Claude, and having used all three I completely get why
| that is. But we should remember we're not representative of
| the whole addressable market. There were probably nerds on
| like ancient dial-up bulletin boards explaining why Betamax
| was going to win, too.
| awongh wrote:
| We don't even know yet if the model is the product though,
| and if OpenAI is the company that will make _the_ AI
| product /model, (chat that keeps expanding into other
| functionalities and capabilities) or will it be 10,000
| companies using the OpenAI models. (well, it's probably
| both, but in what proportion of revenue)
| tptacek wrote:
| Right, but it might not even matter if all the
| competitors are in the ballpark of the final
| product/market fit and OpenAI holds a commanding lead in
| customer acquisition.
|
| Again: I don't know. I've got no predictions. I'm just
| saying that the logic where OpenAI is outcompeted on
| the models themselves and thus automatically loses does not
| necessarily hold.
| Magmalgebra wrote:
| Anyone concerned about cost should remember that those costs
| are dropping exponentially.
|
| Similarly, nearly all AI products but especially OpenAI are
| heavily _under_ monetized. OpenAI is an excellent personal
| shopper - the ad revenue that could be generated from that
| rivals Facebook's or Google's.
| smelendez wrote:
| It wouldn't surprise me if they try, but ironically if GPT
| is a good personal shopper, it might make it harder to
| monetize with ads because people will trust the bot's
| organic responses more than the ads.
|
| You could override its suggestions with paid ones, or nerf
| the bot's shopping abilities so it doesn't overshadow the
| sponsors, but that will destroy trust in the product in a
| very competitive industry.
|
| You could put user-targeted ads on the site not necessarily
| related to the current query, like ads you would see on
| Facebook, but if the bot is really such a good personal
| shopper, people are literally at a ChatGPT prompt when they
| see the ads and will use it to comparison shop.
| whiplash451 wrote:
| Alternative: let users reduce their monthly bill by
| accepting a sponsored answer with a dedicated button in
| the UI
|
| (with many potential variants)
| simonw wrote:
| "... as you make more your costs come down"
|
| I'd say dropping the price of o3 by 80% due to "engineers
| optimizing inferencing" is a strong sign that they're doing
| exactly that.
| marsten wrote:
| You raise a good point that this isn't a low marginal cost
| business like software, telecom, or (most of) the web.
| Efficiency will be a big advantage for companies that can
| achieve it, in part because it will let them scale to new AI
| use cases.
|
| With the race to get new models out the door, I doubt any of
| these companies have done much to optimize cost so far.
| Google is a partial exception - they began developing the TPU
| ten years ago and the rest of their infrastructure has been
| optimized over the years to serve computationally expensive
| products (search, gmail, youtube, etc.).
| ToucanLoucan wrote:
| I mean sure, it's very promising if OpenAI's future is your
| only metric. It gets notably darker if you look at the broader
| picture of ChatGPT (and company)'s impact on our society.
|
| * We have people uploading tons of zero-effort slop pieces to
| all manner of online storefronts, and making people less likely
| to buy overall because they assume everything is AI now
|
| * We have an uncomfortable community of, to be blunt, actual
| cultists emerging around ChatGPT, doing all kinds of shit from
| annoying their friends and family all the way up to divorcing
| their spouses
|
| * Education is struggling in all kinds of ways due to students
| using (and abusing) the tech, with already strained
| administrations struggling to figure out how to navigate it
|
| Like yeah if your only metric is OpenAI's particular line going
| up, it's looking alright. And much like Uber, its success
| seems to be corrosive to the society in which it operates. Is
| this supposed to be good news?
| arealaccount wrote:
| Dying for a reference on the cult stuff, a quick search
| didn't provide anything interesting.
| ToucanLoucan wrote:
| Scroll through the ChatGPT subreddit right now and tell me
| there isn't a TON of people in there who are legitimately
| unwell. Reads like the back page notes of a dystopian
| novel.
| arandomhuman wrote:
| I think this is less caused by ChatGPT/LLMs and more of a
| phenomenon in social media circles where people flock to
| "the thing" and have poor social skills and mental health
| generally speaking.
| wizzwizz4 wrote:
| https://futurism.com/chatgpt-mental-health-crises, which
| references the more famous
| https://www.rollingstone.com/culture/culture-features/ai-
| spi... but is a newer article.
| MangoToupe wrote:
| In addition to what the parent commenter was likely
| referring to, there are also the Zizians:
| https://en.wikipedia.org/wiki/Zizians
| SlowTao wrote:
| Yes, but in a typical western business sense they are merely
| optimizing for user engagement and profits. What happens to
| society a decade from now because of all the slop being
| produced is not their concern. Facebook is just about
| connecting friends, right? It totally won't become a series of
| information moats and bubbles controlled by the algorithms...
|
| A great communicator on the risks of AI being too heavily
| integrated into society is Zak Stein. Working in education, he
| sees firsthand how people are becoming dependent on this stuff
| rather than pursuing any kind of self-improvement - people who
| are just handing over all their thinking to the machine. It is
| very bizarre and I am seeing
| it in my personal experience a lot more over the last few
| months.
| seydor wrote:
| Their moat is leaky because LLM prices will be dropping forever
| and the only viable model will be a free model. Eventually
| everyone will catch up.
|
| Plus there is the fact that "thinking models" can't really
| solve complex tasks / aren't really as good as they are
| believed to be.
| Zaheer wrote:
| I would wager most of their revenue is from the subscriptions
| - both consumer and business. That pricing is detached from
| the API pricing. The heavy emphasis on applications more
| recently is because they realize this as well.
| therealdrag0 wrote:
| As an anecdote they have first mover advantage on me. I pay
| monthly but mostly because it's good enough and I can't be
| bothered to try a bunch out and switch. But if the dust settles
| and prices drop I would be motivated to switch. How much that
| matters may depend on whether their revenue comes from app
| users or API plans. And first mover only works once. Now they
| may be coasting on name recognition, but otherwise new users
| may be load-balanced among all the options.
| unraveller wrote:
| I have no moat and I must make these GPUs scream.
| blueblisters wrote:
| This is the best model out there, priced level with or lower
| than Claude and Gemini.
|
| They're not letting the competition breathe
| seydor wrote:
| when the race to the bottom reaches the bottom, the foundation
| model companies will be bought by ... energy companies. You'll
| be paying for AI with your electricity bill
| paxys wrote:
| It'll be the opposite. Large tech companies are already running
| their own power plants.
| ramesh31 wrote:
| Anthropic will need to follow suit with Opus soon. It is simply
| too expensive for anything by an order of magnitude.
| madebywelch wrote:
| They could drop the price 100% and I still wouldn't use it, so
| long as they're retaining my data.
| simonw wrote:
| Sounds like you want their Zero Data Retention plan:
| https://platform.openai.com/docs/guides/your-data#zero-data-...
|
| (It's "contact us" pricing, so I have no idea how much that
| would set you back. I'm guessing it's not cheap.)
| scudsworth wrote:
| it doesn't seem like this would supersede a court order
| tech234a wrote:
| Actually it does according to
| https://openai.com/index/response-to-nyt-data-demands/
| sschueller wrote:
| Has anyone noticed that OpenAI has become "lazy"? When I ask
| questions now it will not give me a complete file or fix. Instead
| it tells me what I should do and I need to ask a second or third
| time to just do the thing I asked.
|
| I don't see this happening with for example deepseek.
|
| Is it possible they are saving on resources by having it answer
| that way?
| tedsanders wrote:
| Yeah, our models are sometimes too lazy. It's not intentional,
| and future models will be less lazy.
|
| When I worked at Netflix I sometimes heard the same speculation
| about intentionally bad recommendations, which people theorized
| would lower streaming and increase profit margins. It made even
| less sense there as streaming costs are usually less than a
| penny. In reality, it's just hard to make perfect products!
|
| (I work at OpenAI.)
| ukblewis wrote:
| Please be careful about the alternative. I've seen o3 doing
| excessive tool calls and research for relatively simple
| problems.
| polskibus wrote:
| Is this a reaction to Apple paper showing that reasoning models
| don't really reason?
| anothermathbozo wrote:
| Why would that be?
| nikcub wrote:
| fyi the price drop has been updated in Cursor:
|
| https://x.com/cursor_ai/status/1932484008816050492
| BeetleB wrote:
| Why does OpenAI require me to verify my "organization" (which
| requires my state issued ID) to use o3?
| bearjaws wrote:
| To prevent Deepseek R2 from being trained on it.
| piskov wrote:
| If only there were people with multiple passports or, I don't
| know, Kyrgyzstan.
|
| How exactly will passport check prevent any training?
|
| At most this will block API access to your average Ivan, not
| a state actor
| BeetleB wrote:
| Yeah, I just don't see myself using o3 when I have
| Gemini-2.5 Pro. I don't recall if Google Cloud verified my
| ID in the past, though. Still, no need to let yet another
| organization have my data if I'm not getting something
| _better_ in return.
| valleyer wrote:
| Don't bother anyway. There are lots of cases of people trying
| and failing to go through the process, and there is no way to
| try a second time.
|
| https://community.openai.com/t/session-expired-verify-organi...
|
| https://community.openai.com/t/callback-from-persona-id-chec...
|
| https://community.openai.com/t/verification-issue-on-second-...
|
| https://community.openai.com/t/verification-not-working-and-...
| OutOfHere wrote:
| o3 is very much needed in VSCode GitHub CoPilot for
| Ask/Edit/Agent modes. It is sorely missing there.
| alliao wrote:
| It used to take decades of erosion to make Google search a hot
| mess; now that everything's happening at light speed, it takes
| days for AI models to decay to the point of hot mess again...
| godelski wrote:
| For those wondering:
|
|     Yesterday                           Today
|     -------------                       -------------
|     Input:        $10.00 / 1M tokens    Input:        $2.00 / 1M tokens
|     Cached input: $2.50 / 1M tokens     Cached input: $0.50 / 1M tokens
|     Output:       $40.00 / 1M tokens    Output:       $8.00 / 1M tokens
|
| https://archive.is/20250610154009/https://openai.com/api/pri...
|
| https://openai.com/api/pricing/
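|
| Sanity-checking the "80%" on a hypothetical request:
|
|     old = {"input": 10.00, "output": 40.00}   # $ per 1M tokens
|     new = {"input": 2.00,  "output": 8.00}
|
|     def cost(p, prompt=10_000, completion=2_000):  # made-up token counts
|         return (prompt * p["input"] + completion * p["output"]) / 1e6
|
|     print(cost(old), cost(new))   # $0.18 -> $0.036, i.e. an 80% cut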
| JojoFatsani wrote:
| O3 is really good. I haven't had the same results with o4
| unfortunately
___________________________________________________________________
(page generated 2025-06-10 23:00 UTC)