[HN Gopher] GPT-4.5: "Not a frontier model"?
___________________________________________________________________
GPT-4.5: "Not a frontier model"?
Author : pama
Score : 140 points
Date : 2025-03-02 14:47 UTC (8 hours ago)
(HTM) web link (www.interconnects.ai)
(TXT) w3m dump (www.interconnects.ai)
| tsunego wrote:
| GPT-4.5 feels like OpenAI's way of discovering just how much
| we'll pay for diminishing returns.
|
| The leap from GPT-4o to 4.5 isn't a leap--it's an expensive
| tiptoe toward incremental improvements, priced like a luxury item
| without the luxury payoff.
|
| With pricing at 15x GPT-4o, they're practically daring us not to
| use it. Given this, I wouldn't be surprised if GPT-4.5 quietly
| disappears from the API once OpenAI finishes squeezing insights
| (and cash) out of this experiment.
| hooverd wrote:
| They should have called it "ChatGPT Enterprise".
| tsunego wrote:
      | Exactly! Designed specifically for people who love burning
      | corporate budgets.
| numba888 wrote:
      | OpenAI is going to add it to Plus subscriptions, i.e.
      | available for many at no additional cost. Likely with
      | restrictions like N prompts/hour.
|
      | As for the API price, when it matters, businesses and people
      | are willing to pay much more for just slightly better
      | results. OpenAI doesn't take the other options away, so we
      | don't lose anything.
| fodkodrasz wrote:
    | IMO the 4o output is a lot more Enterprise-compatible; 4.5,
    | being straight to the point and more natural, is quite the
    | opposite. Pricing-wise your point stands.
|
| Disclaimer: have not tried 4.5 yet, just skimmed through the
| announcement, using 4o regularly.
| TZubiri wrote:
| Time to enter the tick cycle.
|
    | I ask ChatGPT to give me a map highlighting all Spanish-
    | speaking countries, and it gives me Stable Diffusion trash.
|
    | Just gotta do the grunt work: add a tool with a map API,
    | integrate with Google Maps for transit stuff.
|
    | It's a good LLM already; it doesn't need to be Einstein and
    | solve aerospace equations. We just need to wait until they
    | realize their limits and find the humility to build yet
    | another useful product that won't conquer the world.
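    |
    | A minimal sketch of the "add a tool with a map API" idea above,
    | using the chat-completions tools parameter; the
    | render_highlighted_map function and its schema are made up here
    | for illustration, not an existing API:
    |
    |   from openai import OpenAI
    |
    |   client = OpenAI()  # reads OPENAI_API_KEY from the environment
    |
    |   # Hypothetical tool: let the model request a rendered map
    |   # instead of trying to draw one itself.
    |   tools = [{
    |       "type": "function",
    |       "function": {
    |           "name": "render_highlighted_map",  # made-up name
    |           "description": "Render a world map with the given countries shaded.",
    |           "parameters": {
    |               "type": "object",
    |               "properties": {
    |                   "countries": {
    |                       "type": "array",
    |                       "items": {"type": "string"},
    |                       "description": "Country names to highlight",
    |                   },
    |               },
    |               "required": ["countries"],
    |           },
    |       },
    |   }]
    |
    |   resp = client.chat.completions.create(
    |       model="gpt-4o",
    |       messages=[{"role": "user", "content":
    |                  "Show me a map of all Spanish-speaking countries."}],
    |       tools=tools,
    |   )
    |   print(resp.choices[0].message.tool_calls)  # structured call, if any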
| Willingham wrote:
      | I've thought of LLMs as Google 2.0 for some time now. Truly
      | a world-changing technology, similar to how Google changed
      | the world, and likely to have an even larger impact than
      | Google did as we create highly specialized implementations
      | of the technology in the coming decade... but it's not
      | energy-positive nuclear fusion or a polynomial-time NP
      | solver; it's just Google 2.0.
| dingnuts wrote:
| Google 2.0 where you have to check every answer it gives
| you because it's authoritative about nothing.
|
| Works great when the output is small enough to unit test or
| immediately try in situations with no possible negative
| outcomes.
|
| Anything larger? Skip the LLM slop and go to the source.
| You have to go to the source, anyway.
| CamperBob2 wrote:
| _You have to go to the source, anyway._
|
| Yeah, and then check that. I don't get this argument at
| all.
|
| People who uncritically swallow the first answer or two
| they get from Google have a name... but that would just
| derail the thread into politics.
| LPisGood wrote:
| There is something to be said for trusting people's (or
| systems of people's) authority.
|
| For example, have you ever personally verified that
| humans went to the moon? Have you ever done the
| experiments to prove the Earth is round?
| sillyfluke wrote:
          | This is not a helpful phrasing, I think. Sources allow
          | the reader to go as far down the rabbit hole as they
          | are willing or knowledgeable enough to go.
|
| For example, if I'm looking for some medical finding and
| I get to a source that's a clinical study from a
| reputable publication, I may be satisfied and stop there
| since this is not my area of expertise. However, a person
| with knowledge of the field may be able to parse the
| study and pick it apart better than I could. Hence, their
| search would not end there since they would be
| unsatisfied with just the source I was satisfied with.
|
| On the other hand, having no verifiable sources should
| leave everyone unsatisfied.
| LPisGood wrote:
| Of course, that verifiability is a big part of that
| trust. I'm not sure why you think my phrasing is not
| helpful; we seem to agree.
| rwiggins wrote:
| > Have you ever done the experiments to prove the Earth
| is round?
|
| I have, actually! Thanks, astronomy class!
|
| I've even estimated the earth's diameter, and I was only
| like 30% off (iirc). Pretty good for the simplistic
| method and rough measurements we used.
|
| Sometimes authorities are actually authoritative, though,
| particularly for technical, factual material. If I'm
| reading a published release date for a video game,
| directly from the publisher -- what is there to contest?
| Meanwhile, ask an LLM and you may have... mixed results,
| even if the date is within its knowledge cutoff.
| Spooky23 wrote:
| Have you provided documentation that you are human?
| Perhaps you are a lizard person sowing misinformation to
| firm up dominance of humankind.
| sillyfluke wrote:
| There is a truth in the grandparent's comment that
| doesn't necessarily conflict with this view. The Google
| 2.0 effect is not necessarily that it gives you a better
| correct answer faster than google. I think it never
| dawned on people how bad they were at searching about
| topics they didn't know much about or how bad google was
| at pointing them in the right direction prior to chatgpt.
| Or putting it another way, they never realized how much
| utility they would get out of something that pointed them
| in the correct direction even though they couldn't trust
| the details.
|
| It turns out that going from not knowing what you don't
          | know to _knowing_ what you don't know adds an order of
| magnitude improvement to people's experience.
| TZubiri wrote:
          | And the LLM by design does not save or provide
          | sources, unlike Google or Wikipedia, which are
          | transparent about sources.
| CamperBob2 wrote:
| It most certainly does, if you are using the latest
| models, which people making comments like this never are
| as a rule.
| Chilko wrote:
| All while using far more energy than a normal google
| search
| glenneroo wrote:
| I keep wondering what the long-game (if any) of LLMs
| is... to make the world dependent on various models then
| jack the rates up to cover the costs? The gravy-train of
| SV funding has to end eventually... right?
| blharr wrote:
| Giving ChatGPT stupid AI image generation was a huge nerf. I
| get frustrated with this all the time.
| SketchySeaBeast wrote:
| Oh, I think it's great they did that. It's super helpful
| for visualizing ChatGPT's limitations. Ask it for an
| absolutely full, overflowing glass of wine or a wrist watch
| whose time is 6:30 and it's obvious what it actually does.
| It's educational.
| bee_rider wrote:
| LLMs could make some nice little tools.
|
| However they'll need to replace vast swathes of the economy
| to justify these AI companies' market caps.
| tiahura wrote:
      | I asked Claude to give me a script in Python to create a map
      | highlighting all Spanish-speaking countries. It took 3 tries
      | and then gave me a perfect SVG and PNG.
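      |
      | For reference, a rough sketch of what such a script might
      | look like, assuming geopandas/matplotlib and the
      | naturalearth_lowres sample dataset bundled with geopandas
      | (<1.0); the country list and name matching depend on that
      | dataset's labels, and this is not the script Claude produced:
      |
      |   import geopandas as gpd
      |   import matplotlib.pyplot as plt
      |
      |   # Names as they appear in the naturalearth_lowres dataset.
      |   spanish = {
      |       "Spain", "Mexico", "Colombia", "Argentina", "Peru",
      |       "Venezuela", "Chile", "Ecuador", "Guatemala", "Cuba",
      |       "Bolivia", "Dominican Rep.", "Honduras", "Paraguay",
      |       "El Salvador", "Nicaragua", "Costa Rica", "Panama",
      |       "Uruguay", "Puerto Rico", "Eq. Guinea",
      |   }
      |
      |   world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
      |   ax = world.plot(color="lightgrey", figsize=(12, 6))
      |   world[world["name"].isin(spanish)].plot(ax=ax, color="crimson")
      |   ax.set_axis_off()
      |   plt.savefig("spanish_speaking.svg")
      |   plt.savefig("spanish_speaking.png", dpi=150)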
| zamadatix wrote:
  | Even this is a bit overly complicated/optimistic to me. Why not
  | something as simple as: OpenAI has been building larger and
  | larger models to great success for a long time. As a result,
  | they were excited this one was going to be so much larger, and
  | therefore so much better, that the price to run it would be well
  | worth the huge jump they were planning to get from it. What
  | really happened is that this method of scaling hit a wall, and
  | they were left with an expensive dud they won't get much out of,
  | but they have to release something for now or they start falling
  | well behind on the leaderboards over the next few months.
  | Meanwhile they scramble to find other means of scaling, like the
  | gains "chain of thought + runtime" provided.
| hn_throwaway_99 wrote:
| Thank you so much for this comment. I don't really understand
| the need for people to go straight to semi-conspiratorial
| hypotheses, when the simpler explanation makes so much more
| sense. All the evidence is that this model is _much_ larger
| than previous ones, so they must charge a lot more for
| inference because it costs so much more to run. OpenAI were
    | the OGs when it came to scaling, so it's not surprising they
| went this route and eventually hit a wall.
|
| I don't at all blame OpenAI for going down this path (indeed,
| I laud them for making expensive bets), but I do blame all
    | the quote-unquote "thought leaders" who were writing
| breathless posts about how AGI was just around the corner
| because things would just scale linearly forever. It was
| classic "based on historical data, this 10 year old will be
| 20 feet tall by the time he's 30" thinking, and lots of
| people called them out on this, and they either just ignored
| it or responded with "oh, simple not-in-the-know peons"
| dismissiveness.
| bee_rider wrote:
      | It is weird because this is a board for working programmers
      | for the most part. So like, who's seen a grand conspiracy
      | actually be accomplished? Probably not many. A lackluster
      | product that gets released even though it sucks because too
      | many people are highly motivated not to notice that it
      | sucks? Everybody has experienced that, right?
| glenstein wrote:
        | Exactly. Although I wouldn't even say they have blinders;
        | it seems like OpenAI understands quite well what 4.5 can
        | and can't do, hence the modesty in their messaging.
        |
        | To your point, though, I would add: not only who has seen
        | a grand conspiracy actually be accomplished, but who has
        | seen one even attempted and kept under wraps? Such that
        | the absence of corroborating sources was more consistent
        | with an effectively executed conspiracy than with the
        | simple absence of such a plan.
| danielbln wrote:
| It works until it doesn't and hindsight is 20/20.
| hn_throwaway_99 wrote:
| > It works until it doesn't
|
| Of course, that's my point. Again, I think it's great
| that OpenAI swung for the fences. My beef is again with
| these "thought leaders" who would write this blather
| about AGI being just around the corner in the most
| uncritical manner possible (e.g.
| https://news.ycombinator.com/item?id=40576324). These
| folks tended to be in one of two buckets:
|
| 1. "AGI cultists" as I called them, the "we're entering a
| new phase of human evolution"-type people.
|
| 2. People who had a motive to try and sell something.
|
| And it's not about one side or the other being "right" or
| "wrong" after the fact, it's that so much of this just
| sounded like magical thinking and unwarranted
| extrapolations from the get go. The _actual_ experts in
| the area, if they were free to be honest, were much, much
| more cautious in their pronouncements.
| danielbln wrote:
| Definitely, the grifters and hypesters are always
| spoiling things, but even with a sober look it felt like
          | AGI _could_ be around the corner. With all these novel
          | and somewhat unexpected capabilities emerging as we
          | pushed more data through training, you'd think maybe
          | that's enough? It wasn't, and test-time compute alone
          | isn't either, but that's also hindsight to a degree.
|
| Either way, AGI or not, LLMs are pretty magical.
| snovv_crash wrote:
| If you've been around long enough to witness a previous
| hype bubble (and we've literally just come out of the
| crypto bubble), you should really know better by now.
| Pets.com, literally an online shop selling pet food,
          | almost IPO'd for $300M in early 2000, just before the
| whole dot-com bubble burst.
|
| And yeah, LLMs are awesome. But you can't predict
| scientific discovery, and all future AI capabilities are
| literally still a research project.
|
| I've had this on my HN user page since 2017, and it's
| just as true as ever: In the real world, exponentials are
| actually early stage sigmoids, or even gaussians.
| baxtr wrote:
| Well that's only because YOU don't understand exponential
| growth! No human can! /s
| Kye wrote:
| In fundamental science terms, it also proves once and for
| all that more model doesn't mean more better. Any forces
| within OpenAI pushing to move past just growing the model
| for gains now have a strong argument for going all-in on
| new processes.
| Kerbonut wrote:
  | Apparently, OpenAI API "credits" expire after a year. I stupidly
  | put in another $20 and am trying to blow through them; 4.5 is the
  | easiest way, considering 4o has recently fallen out of favor
  | relative to other models, and I don't want to just let them
  | expire again. An expiry after only one year is asinine.
| Chance-Device wrote:
| Yes. I also discovered this, and was also forced to blow
| through my credits in a rush. Terrible policy.
| glenstein wrote:
      | I'm learning this for the first time now. I don't
      | appreciate having to anticipate how many credits I'll use
      | like it's an FSA account.
| heed wrote:
| >Terrible policy.
|
| And unfortunately one not exclusive to OpenAI. Anthropic
| credits also expire after 1 year.
| jstummbillig wrote:
| This is how pricing on human labour works. Nobody expects an
| employee that costs twice as much to produce twice the output
  | for any given task. All that is expected is that they can do a
  | narrow set of things that another person can't.
| sampton wrote:
  | Too much money, not enough new ideas.
| siva7 wrote:
  | It's marketed as being slightly better at "creative writing".
  | This isn't the problem most businesses have with
  | current-generation LLMs. On the other side, Anthropic released
  | a new model at nearly the same time which solves more practical
  | problems for businesses, to the point that many insiders no
  | longer use OpenAI models for coding tasks.
| dingnuts wrote:
| I think it should be illegal to trick humans into reading
| "creative" machine output.
|
| It strikes me as a form of fraud that steals my most precious
| resources: time and attention. I read creative writing to feel
| a human connection to the author. If the author is a machine
| and this is not disclosed, that's a huge lie.
|
| It should be required that publishers label AI generated
| content.
| CuriouslyC wrote:
| I'm pretty sure you read for pleasure, and feeling a human
| connection is one way that you derive pleasure. If it's the
| only way that you derive pleasure from reading, my
| condolences.
| becquerel wrote:
| Pretty much where my thoughts on this are. I rarely feel
| any particular sense of connection to the actual author
| when I read their books. And I have taken great pleasure
| from some AI stories (to the degree I put them up on my
| personal website as a way to keep them around).
| Philpax wrote:
| Under the dingnuts regime, Dwarf Fortress will be illegal.
| Actually, any game with a procedural story? You better
| believe that's a crime: we can't have a machine generate text
| a _human_ might enjoy.
| glenneroo wrote:
        | Dingnuts' point was that it should be disclosed. Everyone
        | knows Dwarf Fortress stories are procedurally/AI
        | generated; the authors aren't trying to hide that fact.
| Philpax wrote:
| Actually, fair enough. I still disagree with their
| argument, but this was the wrong tack for me to use.
| Hoasi wrote:
| > I think it should be illegal to trick humans into reading
| "creative" machine output.
|
| Creativity has lost its meaning. Should it be illegal? The
| courts will take a long time to settle the matter. Reselling
| people's work against their will as creative machine output
| seems unethical, to say the least.
|
| > It should be required that publishers label AI-generated
| content.
|
| Strongly agree.
| buyucu wrote:
| OpenAI has been irrelevant for a while now. All of the new and
| exciting developments on AI are coming from other places.
| ClosedAI is no longer the driver of change and innovation.
| nickthegreek wrote:
| The other models are literally distilling OpenAI's models into
| theirs.
| demosthanos wrote:
| So it's been claimed, but has it been proven yet?
|
| I'm not even sure what is being alleged there--o1's reasoning
| tokens are kept secret precisely to avoid the kind of
| distillation that's being alleged. How can you distill a
| reasoning process given only the final output?
| ipaddr wrote:
        | DeepSeek's models outputting that they are ChatGPT is a
        | big clue.
| orbital-decay wrote:
| Do they? Why doesn't this happen to Claude then? I've been
| hearing this for a while, but never saw any evidence beyond
| the contamination of the dataset with GPT slop that is all
      | over the web. Just by sending anything to the competitors
      | you're giving up a large part of your know-how before you
      | even finish your product; that's a big incentive against
      | doing that.
| nickthegreek wrote:
| Who said it isn't happening to Claude?
|
| Companies are 100% using these big players to generate
| synthetic data. Distillation is extremely powerful. How is
| this even in question?
| nullc wrote:
          | OpenAI conceals probabilities, so how is anyone distilling
| from it?
| kossTKR wrote:
      | And OpenAI based their tech on a Google paper, again building
      | on years of public academic research, so what's the point
      | exactly here?
      |
      | OpenAI was just first out of the gates; there'll always be
      | some company that's first. The essence is how they handle
      | their leadership, and they've sadly been absolutely terrible
      | and scummy.
      |
      | Actually I think Google was a pretty good example of the
      | exact opposite: decades of "actually not being evil", while
      | OpenAI switched up one second after launch.
| nickthegreek wrote:
| > so what's the point exactly here.
|
| What is your point? OpenAI wasn't the first out of that
| gate as your own argument cites Google prior. All these
| companies are predatory, who is arguing against that? OP
        | said OpenAI was irrelevant. That's just dumb. They are not.
| Feel free to advance an argument in favor of that narrative
| if you wish as I was just trying to provide a single
| example that shows that some of these lightweight models
| are building directly off the backs of giants spending the
| big money. I find nothing wrong with distillation and am
| excited about companies like DeepSeek.
| ipaddr wrote:
          | Google wasn't the first search engine, but they were
          | the best at marketing Google = search. That's where we
          | are with OpenAI. Google Search was a better product at
          | the time, and ChatGPT 3.5 was a breakthrough the public
          | used. Fast forward and some will say Google isn't the
          | best search engine anymore (Kagi, DuckDuckGo, Yandex
          | offer different experiences), but people still think of
          | Google = search. Same with ChatGPT. Claude may be better
          | for coding, or Gemini better at searching, or DeepSeek
          | cheaper but equal, but ChatGPT is a verb and will live
          | on like "Intel Inside" long after its actual value has
          | declined.
| williamcotton wrote:
| Google was so much better than AltaVista that I just
| can't buy that it was marketing that pushed them to the
| forefront of search.
| Jweb_Guru wrote:
| > Google wasn't the first search engine but they were the
| best marketing google = search
|
| Google's overwhelming victory in search had ~ nothing to
| do with marketing.
| mrcwinn wrote:
| That's quite a world you've constructed!
| jug wrote:
| I think OpenAI is currently in this position where they are
| still industry standard, but also not leading. Deepseek R1 beat
| o1 on perf/cost with similar perf at a fraction of the cost.
    | o3-mini is judged as "weird" and quite hit-and-miss on coding
    | (basically the sole reason for its existence), with a sky-high
    | SimpleQA hallucination rate due to its limited scope, and it is
    | probably beaten by Sonnet 3.7 by a fairly large margin.
    |
    | Still, being early with a product that is often "good enough"
    | takes them a long way. I think GPT-5, and where their
    | competition will be then, will be quite important for OpenAI
    | though. I think the signs on the horizon are that everyone will
    | close in on each other as we hit diminishing returns, so the
    | underlying business model, integrations, enterprise reach,
    | marketing and market share will probably be king rather than
    | the underlying LLM in 2026.
|
| Since GPT-5 is meant to select the best model behind the
| scenes, one issue might be that users won't have the same
| confidence in the model, feeling like it's deciding for them or
| OpenAI tuning it to err on the side of being cheap.
| yimby2001 wrote:
  | It seems like there's a misunderstanding as to why this
  | happened. They've been baking this model for months, long before
  | DeepSeek came out with fundamentally new ways of distilling
  | models. And even given that it's not great in its large form,
  | they're going to distill from this going forward... so it likely
  | makes sense for them to periodically train these very large
  | models as a basis.
| lhl wrote:
| I think this framing isn't quite right either. DeepSeek's R1
| isn't very different from what OpenAI has already been doing
| with o1 (and that other groups have been doing as well). As for
| distilling - the R1 "distilled" models they released aren't
| even proper (logit) distillations, but just SFTs, not
| fundamentally new at all. But it's great that they published
| their full recipes and it's also great to see that it's
| effective. In fact we've seen now with LIMO, s1/s1.1, that even
| as few as 1K reasoning traces can get most LLMs to near SOTA
| math benchmarks. This mirrors the "Alpaca" moment in a lot of
| ways (and you could even directly mirror say LIMO w/ LIMA).
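    |
    | To make that distinction concrete, a rough PyTorch-style sketch
    | of the two objectives - illustration only, not anyone's actual
    | recipe; shapes and temperature handling are simplified:
    |
    |   import torch.nn.functional as F
    |
    |   # SFT on teacher-generated traces: plain next-token
    |   # cross-entropy against the teacher's sampled tokens.
    |   def sft_loss(student_logits, teacher_token_ids):
    |       return F.cross_entropy(
    |           student_logits.view(-1, student_logits.size(-1)),
    |           teacher_token_ids.view(-1),
    |       )
    |
    |   # Logit ("proper") distillation: match the teacher's full
    |   # output distribution, which requires the teacher's logits.
    |   def distill_loss(student_logits, teacher_logits, T=2.0):
    |       t_probs = F.softmax(teacher_logits / T, dim=-1)
    |       s_logp = F.log_softmax(student_logits / T, dim=-1)
    |       return F.kl_div(s_logp, t_probs,
    |                       reduction="batchmean") * T * T
    |
    | The released R1 "distills" only need the first kind of data,
    | which is why they are SFTs rather than distillations in the
    | strict sense.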
|
| I think the main takeaway of GPT4.5 (Orion) is that it
| basically gives a perspective to all the "hit a wall" talk from
| the end of last year. Here we have a model that has been
    | trained on, by many accounts, 10-100X the compute of GPT-4, is
| likely several times larger in parameter count, but is only...
| subtly better, certainly not super-intelligent. I've been
| playing around w/ it a lot the past few days, both with several
| million tokens worth of non-standard benchmarks and talking to
| it and it _is_ better than previous GPTs (in particular, it
    | makes a big jump in humor), but I think it's clear that the
| "easy" gains in the near future are going to be figuring out
| how as many domains as possible can be approximately
| verified/RL'd.
|
| As for the release? I suppose they could just have kept it
| internally for distillation/knowledge transfer, so I'm actually
| happy that they released it, even if it ends up not being a
| really "useful" model.
| modeless wrote:
| I've been using 4.5 instead of 4o for quick questions. I don't
| mind the slowness for short answers. I feel like it is less
| likely to hallucinate than other models.
| kubb wrote:
| Seems like we're hitting the limits of the technology...
| xmichael909 wrote:
    | Yes, I believe the sprint is over; now it's going to be slow
    | cycles, maybe 18 months to see a 5% increase in ability, and
    | even that 5% increase will be highly subjective. Claude's new
    | release is about the same: 3.7 is arguably worse at some things
    | than 3.5 and better at others. Based on the previous pace of
    | release, in about 6 months or so - if the next release from any
    | of the leaders is about the same "kinda better, kinda worse" -
    | then we'll know. Imagine how much money is going to evaporate
    | from the stock market if this is the limit!!!
| ANewFormation wrote:
| You can keep getting rich off shovels long after the gold has
| run dry.
| TIPSIO wrote:
| I also hate waiting on reasoning.
|
    | I would much prefer a super lightning-fast model that is
    | cheaper but the same quality as these frontier models.
|
| Let me query these things to death.
| ljlolel wrote:
| try groq (hyperfast chips) https://groq.com/
| apwell23 wrote:
    | Does it mean we get a reprieve from "this is just the
    | beginning" comments?
| kubb wrote:
| I wouldn't count on it.
| thfuran wrote:
| Maybe if it takes many years before the next major
| architectural advancement.
| retskrad wrote:
  | Sam Altman views Steve Jobs as one of his inspirations (he
  | called the iPhone the greatest product of all time). So if you
  | look at OpenAI through the lens of Apple, where you think about
  | making the product enjoyable to use at all costs, then it makes
  | perfect sense why you'd spend so much money to go from 4o to 4.5,
  | which brings such subtle differences to power users.
  |
  | The vast majority of users, of whom there are over 300 million
  | weekly, will mainly use 4o and whatever is the default. In the
  | future they'll use 4.5 and think it's more human-like and less
  | robotic.
| bravura wrote:
| Yes but Steve Jobs also understood the paradox of choice, and
| the importance of having incredibly clear delineation between
| every different product in your line.
| ipaddr wrote:
      | Do models matter to the regular user over brand? People talk
      | about using ChatGPT over Google's AI or DeepSeek, not 4o-mini
      | vs Gemini 2.
      |
      | OpenAI has done a good job of making the model less important
      | and the domain chatgpt.com more important.
      |
      | Most of the time the model rarely matters. When you find
      | something incorrect you may switch models, but that rarely
      | fixes the problem. Rewording a prompt has more value than
      | changing a model.
| esafak wrote:
| If the model did not matter they would be spending their
| money on marketing or sales instead of improving the model.
| pzo wrote:
  | Long term it might be hard to monetise that infrastructure
  | considering their competition:
  |
  | 1) For coding (API), most will probably stick to Claude 3.5/3.7
  | - a big market but still small compared to all worldwide problems
  |
  | 2) For non-coding API use, IMHO Gemini 2.0 Flash is the winner -
  | dirt cheap (cheaper than 4o-mini), good enough and even better
  | than gpt-4o, with cheap audio and image input.
  |
  | 3) For the subscription app, ChatGPT is probably still the best,
  | but only slightly - they have the best advanced voice
  | conversation, but Grok will probably be eating their lunch here
| anukin wrote:
    | The Sesame model for voice IMO is better than ChatGPT's voice
    | conversation. They are going to open source it as well.
| bckr wrote:
| Sure but is there an app I can talk to / work with? It seems
| they're a voice synthesis model company, not a chatbot app /
| tool company.
| OsrsNeedsf2P wrote:
| > They are going to open source it as well.
|
| Means nothing until they do
| ipaddr wrote:
    | For the rest of us using free tiers, ChatGPT is hands down the
    | winner, allowing limited image generation, unlimited usage of
    | some model, and limited usage of 4o.
    |
    | Claude is still stuck at 10 messages per day, and Gemini is
    | less accurate/useful.
| dingnuts wrote:
| 10 messages a day? How are people "vibe coding" with that?
| irishloop wrote:
| They're paying for Pro
| dingnuts wrote:
| Ah thank you; I had heard the paid ones had daily limits
| too so I was confused
| danielbln wrote:
| They do, I subscribe to pro. All of my vibe coding
| however is done via the API.
| Layvier wrote:
    | We were using gpt-4o for our chat agent, and after some
    | experiments I think we'll move to Flash 2.0. Faster, cheaper
    | and even a bit more reliable. I also experimented with the
    | experimental thinking version, and there a single-node
    | architecture seemed to work well enough (instead of multiple
    | specialised sub-agent nodes). It did better than DeepSeek
    | actually. Now I'm waiting for the official release before
    | spending more time on it.
| HarHarVeryFunny wrote:
| GPT 4.5 also has a knowledge cutoff date of 10-2023.
|
| https://www.reddit.com/r/singularity/comments/1izpb8t/gpt45_...
|
| I'm guessing that this model was finished pre-training at least a
| year ago (it's been 2 years since GPT 4.0 was released) and they
| just didn't see the hoped-for performance gains to think it
| warranted releasing at the time, and so put all their effort into
| the Q-star/strawberry = eventual O1 reasoning effort instead.
|
| It seems that OpenAI's reasoning model lead isn't perhaps what
| they thought it was, and the recent slew of strong non-reasoning
| models (Gemini 2.0 Flash, Grok 3, Sonnet 3.7) made them feel the
  | need to release something themselves for appearances' sake, so
| they dusted off this model, perhaps did a bit of post-training on
| it for EQ, and here we are.
|
| The price is a bit of a mystery - perhaps just a reflection of an
| older model without all the latest efficiency tricks to make it
| cheaper. Maybe it's dense rather than MoE - who knows.
| sigmoid10 wrote:
| Rumors said that GPT4.5 is an order of magnitude larger. Around
| 12 trillion parameters total (compared to GPT4's 1.2 trillion).
| It's almost certainly MoE as well, just a scaled up version.
| That would explain the cost. OpenAI also said that this is what
| they originally developed as "Omni" - the model supposed to
| succeed GPT4 but which fell behind expectations. So they
| renamed it 4.5 and shoehorned it in to remain in the news among
| all those competitor releases.
| ljlolel wrote:
| the gpt-4o ("omni") is probably a distilled 4.5; hence why
| not much quality difference
| sigmoid10 wrote:
| 4o has been out since May last year, while omni (now
| rechristened as 4.5) only finished training in
| October/November.
| cubefox wrote:
| 4.5 was called Orion, not Omni.
| Leary wrote:
| How does this compare with Grok 3's parameter count? I know
| Grok 3 was trained on a larger cluster (100k-200k) but GPT
| 4.5 used distributed training.
| glenstein wrote:
      | This is all excellent detail. Wondering if there are any good
| suggestions for further reading on the inside baseball of
| what happened with GPT 4.5?
| qeternity wrote:
| Well, it's not...it gets most details wrong.
| glenstein wrote:
| Can you elaborate?
| qeternity wrote:
| GPT-4 was rumored to be 1.8T params...not 1.2
|
| And the successor model was called "Orion", not "Omni".
| glenstein wrote:
| Appreciate the corrections, but I'm still a bit puzzled.
| Are they wrong about 4.5 having 12 trillion parameters,
| it originally intending to be Orion (not omni), or an
| expected successor to GPT 4? And do you have any related
| reading that speaks to any of this?
| qeternity wrote:
| GPT-4 was rumored to be 1.8T params...not 1.2
|
| And the successor model was called "Orion", not "Omni".
| Chance-Device wrote:
| Releasing it was probably a mistake. In context what the model
| is could have been understood, but they haven't really
| presented that context. Also it would be lost on a general
| audience.
|
| The general public will naturally expect it to be the next big
| thing. Wasn't that the point of releasing it? To seem like
| progress is being made? To try to make that point with a model
| that doesn't deliver is a misstep.
|
| If I were Sam Altman, I'd be pulling this back before it goes
| on general release, saying something like it was experimental
| and after user feedback the costs weren't worth it and they're
| working on something else as a replacement. Then o3 or whatever
| they actually are working on instead can be the "replacement"
| even if it's much later.
| datadrivenangel wrote:
| or just say it was too good and thus too dangerous to
| release...
| simonw wrote:
    | I don't think the October 2023 training cut-off means the model
| finished pre-training a year ago. All of OpenAI's models share
| that same cut-off date.
|
| One theory is that they're worried about the increasing tide of
| LLM-generated slop that's been posted online since that date. I
| don't know if I buy that or not - other model providers (such
| as Anthropic, Gemini) don't seem worried about that.
| glenstein wrote:
| >The price is a bit of a mystery
|
| I think it at least is somewhat analogous to what happened with
| pricing on previous models. GPT 4, despite being less capable
| than 4o, is an order of magnitude more expensive, and
| comparably expensive to o1. It seems like once the model is
| out, the price is the price, and the performance gains emerge
| but they emerge attached to new minified variations of previous
| models.
| bilater wrote:
| I sort of believed this but also 4.5 coming out last year would
| absolutely have been a big deal compared to what was out there
    | at the time? I just don't understand why they would not launch
    | it then.
| numba888 wrote:
| > slew of strong non-reasoning models (Gemini 2.0 Flash, Grok
| 3, Sonnet 3.7)
|
    | Sonnet 3.7 is actually a reasoning model.
| LaurensBER wrote:
| It's my understanding that reasoning in Sonnet 3.7 is
| optional and configurable.
|
| I might be wrong but I couldn't find a source that indicates
| that the "base" model also implements reasoning.
| wegfawefgawefg wrote:
      | So is Grok 3.
| phillipcarter wrote:
| My take from using it a bit is that they seem to have genuinely
| innovated on:
|
| - Not writing things that go off in weird directions / staying
| grounded in "reality"
|
| - Responding very well to tone preferences and catching nuance in
| what I say
|
| It seems like it's less that it has a great "personality" like
| Claude, but that it's capable of adapting towards being the
| "personality" I want and "understanding" what I'm saying in ways
| that other models haven't been able to do for me.
| XenophileJKO wrote:
| So this kind of mirrors my feelings after using GPT-4.5 on
| general conversation and song writing.
|
| GPT picked up on unspecified requirements almost instantly. It
| is subtle (and may be undesirable in some contexts). For
    | example, in my songs I have to bracket the section headings;
    | it picked up on that from my original input. All the other
    | frontier models generally have to be reminded. Additionally, I
| separately asked for an edit to a music style description. When
| I asked GPT-4.5 to write a song all by itself, it included a
| music style description. No other model I have worked with has
| done this.
|
| These are subtle differences, but in aggregate the model just
| generally needs less nudging to create what is required.
| torginus wrote:
| I haven't used 4.5 but have some experience using Claude for
| creative writing, and in my experience it sometimes has the
| uncanny ability to get to the core of my ideas, rephrasing my
| paragraph long descriptions into just a sentence or two, or
| both improving and concretizing my vague ideas into something
| that's insightful and tasteful.
|
| Other times it locks itself into a dull style and ignores
| what I ask of it and just produces boring generic garbage,
| and I have to wrangle it hard to get some of the spark back.
|
| I have no idea what's going on inside, but just like with
| Stable Diffusion, it's fairly easy to make something that has
| the spark of genius, and is very close to being perfect, but
| getting the last 10% there, and maintaining the quality seems
| almost impossible.
|
| It's a very weird feeling, it's hard to put into words what
| is exactly going on, and probably even harder to make it into
      | a benchmark, but it makes me constantly flip-flop between
      | being scared of how good the AI is and questioning why I
      | ever bothered using it in the first place, as I would've
      | progressed much faster without it.
| neom wrote:
  | 4.5 can extremely quickly distill and work with what I, at
  | least, consider complex, nuanced thought. 4.5 is night and day
  | better than every other AI for my work; it's quite clever and I
  | like it.
|
| Very quick mvp comparison for the show me what you mean crew:
| https://chatgpt.com/share/67c48fcc-db24-800f-865b-c0485efd7f... &
| https://chatgpt.com/share/67c48fe2-0830-800f-a370-7a18586e8b...
| (~30 seconds vs ~3 minutes)
| ttul wrote:
| I believe 4.5 is a very large and rich model. The price is high
| because it's costly to inference; however, the bigger reason is
| to ensure that others don't distill from it. Big models have a
| rich latent space, but it takes time to squeeze the juice out.
| esafak wrote:
| That also means people won't use it. Way to shoot yourself in
| the foot.
|
      | The irony of a company that has distilled the world's
      | information complaining about another company distilling
      | their model...
| cscurmudgeon wrote:
| My assumption: There will be use cases where cost of using
| this will be smaller than the gain from it. Data from this
| will make the next version better and cheaper.
| ttul wrote:
| The small number of use cases that do pay are providing
| gross margins as well as feedback that helps OpenAI in
| various ways. I don't think it's a stupid move at all.
| nyrikki wrote:
| The 4.5 has better 'vibes' but isn't 'better', as a concrete
| example:
|
| > Mission is the operationalized version of vision; it
| translates aspiration into clear, achievable action.
|
| The "Mission is the operationalized version of vision" is not
| in the corpus that I am find and is obviously a confabulated
| mixture of classic Taylorist like "strategic planning"
|
| SOPs and metrics, which will be tied to compensation and the
| unfortunate ubiquitous nature of Taylorism would not result in
| shared purpose, but a bunch of Gantt charts past the planning
| horizon.
|
    | IMHO I would consider "complex nuanced thought" to mean
    | understanding the historical issues and at least respecting
    | the divide between classical and neo-classical org theory, or
    | at least avoiding pollution of more modern theories with
    | classical baggage that is a significant barrier to delivering
    | value.
    |
    | Mission statements need to share strategic intent in an
    | actionable way; strategy is not operationalization.
| neom wrote:
| The statement "Mission is the operationalized version of
| vision; it translates aspiration into clear, achievable
| action" isn't a Taylorist reduction of mission to mechanical
| processes - it's actually a nuanced understanding of how
| these organizational elements relate. You're misinterpreting
| what "operationalized" means in this context. From what i can
| tell, the 4.5 response isn't suggesting Taylorist
| implementation with Gantt charts etc it's describing how
| missions translate vision into actionable direction while
| remaining strategic. Instead of jargon, it's recognizing that
| founders need something between abstract vision and tactical
| execution. Missions serve this critical bridging function.
      | The CEO has vision, orgs capture the vision into their
      | missions, and people find their purpose when aligned via the
      | two. Without it,
| founders either get stuck in aspirational thinking or jump
| straight to implementation details without strategic
| guidance. The distinction matters exactly because it helps
| avoid the dysfunction that prevents startups from scaling
| effectively. I think you're assuming "operationalized" means
| tactical implementation (Gantt charts, SOPs) when in this
| context it means "made operational/actionable at a strategic
| level". Missions != mission statements. Also, you're creating
| a false dichotomy between "strategic intent" and
| "operationalization" when they very much, exist on a
| spectrum. (If anything, connecting employees to mission and
| purpose is the opposite of Tayloristic thinking, which viewed
| workers more as interchangeable parts than as stakeholders in
| a shared mission towards responding to a shared vision of
| global change) - You are doing what o1 pro did, and as I
| said: As a tool for teaching business to founders,
| personally, I find the 4.5 response to be better.
| nyrikki wrote:
        | An example of a typical naive definition of a mission
        | statement is:
|
| Concise, clear, and memorable statement that outlines a
| company's core purpose, values, and target audience.
|
| > "made operational/actionable at a strategic level".
|
        | Taking the common definition from the first part of this
        | plan, what do you think the average manager would do,
        | given that in the social sciences operationalization is
        | explicitly about measuring abstract qualities? [1]
|
| "operationalization" is a compromise, trying to quantify
| qualitative properties, it is not typically subject to
| methods like MECE principal, because there are too many
| unknown unknowns.
|
| You are correct that "operationalization" and "strategic
| intent" are not mutually exclusive in all aspects, but they
| are for mission statements that need to be durable across
| changes that no CEO can envision.
|
| The "made operational/actionable at a strategic level" is
        | the exact claim of pseudo-scientific management theory
| (Greater Taylorism) that Japan directly targeted to destroy
| the US manufacturing sector. You can look at the former CEO
| of Komatsu if you want direct evidence.
|
        | GM's failure to learn from Toyota at NUMII (sp?) is
        | another.
|
        | The planning process needs to be informed by strategy,
        | but planning is not strategic; it has a limited horizon.
|
| But you are correct that it is more nuanced and neither
| Taylor nor Tolstoy allowed for that.
|
| Neo-classical org theory is when bounded rationality was
| first acknowledged, although the Prussian military figured
| that out long before Taylor grabbed his stopwatch to time
| people loading pig iron into train cars.
|
| I encourage you to read:
|
        | Strategy: A History by Sir Lawrence Freedman
|
| For a more in depth discussion.
|
| [1] https://socialsci.libretexts.org/Bookshelves/Sociology/
| Intro...
| neom wrote:
| Your responses are interesting because they drive me to
| feel reinforced about my opinion. This conversation is
| precisely why I rate 4.5 over o1 pro. I prompted in a
| very very very specific way. I'm afraid to say your
          | comments are highly disengaged from the realities of
| business and business building. Appreciate the historical
| context and recommended reading (although I assure you, I
| am extremely well versed). The term 'operationalized'
| here refers to strategic alignment, not Taylorist
| quantification, think guiding principles over rigid
| metrics. You are badly conflating operationalization in
| social sciences (which is about measurement) with
          | strategic operationalization in management, which is
          | not the same. Again: operationalized in this context
          | means making
| the mission actionable at a strategic level, not about
| quantification. Modern mission frameworks prioritize
| adaptability within durable purpose, avoiding the
| pitfalls you've rightly flagged. Successful founders
| don't get caught in these theoretical distinctions.
          | Founders taught by me, and I guess by GPT-4.5, correctly
          | understand mission as the bridge between aspirational
          | vision and practical action. This isn't "Greater
| Taylorism" but pragmatic leadership. While your
| historical references (NUMMI, not NUMII) demonstrate
| academic knowledge, they miss how effective missions
| actually guide organizations while remaining adaptable.
          | The 4.5 response captured this practical reality well:
          | it pointed to, but did not create, artificial boundaries
          | between interconnected concepts. If we had some founders
          | trained by you (o1 Pro) and me (GPT-4.5) - I would be
          | willing to bet my founders would outperform yours any
| day of the week.
| nyrikki wrote:
| Tuckman as a 'real' framework is a belief so that is
| fair.
|
| He clearly communicated in 1977 that his ideas were never
| formally validated and that he cautioned about their use
| in other contexts.
|
          | I think that the concepts can be useful, so long as you
          | don't take them as anything more than a guiding
          | framework that may or may not be appropriate for a
          | particular need.
|
| https://core.ac.uk/download/pdf/36725856.pdf
|
| I personally find value in team and org mission
| statements, especially for building a shared purpose, but
| to be honest, any of the studies on that are more about
          | manager satisfaction than anything else.
|
| There is far more data on the failure of strategy
| execution, and linking strategy with purpose as well as
| providing runways and goals is one place I find vision
| and mission statements useful.
|
| As up to 90% of companies fail on strategy execution, and
| because employee engagement is in free fall, the fact
| that companies are still in business means little.
|
| Context is king, and this is horses for courses, but I
| would caution against ignoring more recent, Nobel winning
| theories like Holmstrom's theorem.
|
| Most teams don't experience the literal steps Tuckman
| suggested, rarely all at once, and never as one time
| singular events. As the above link demonstrated, some
| portions like the _storming_ can be problematic.
|
| Make them operationalize their mission statement, and
          | they will - and it will be set in concrete.
|
          | Remember von Moltke: "No plan of operations extends with
| certainty beyond the first encounter with the enemy's
| main strength."
|
          | There is a balance between C2 and mission-command
          | styles; the risk is trying to force, or worse,
          | intentionally causing people to resort to C2 when you
          | almost always need a shifting balance between command-
          | and intent-based solutions.
|
| The Feudal Mode of Production was _sufficient_ for
| centuries, but far from optimal.
|
          | The NUMMI reference was exactly related to the same
          | reason Amazon's profits historically rose higher than
          | its head-count increases should have allowed.
|
| Small cross functional teams, with clearly communicated
| tasks, and enough freedom to accomplish those tasks
| efficiently.
|
| You can look at Trist's study about the challenges with
| incentivizing teams to game the system. Same problem
          | happened under Ballmer at MS, and DEC failed the opposite
| way, trying to do everything at once and please everyone.
|
| https://www.uv.es/=gonzalev/PSI%20ORG%2006-07/ARTICULOS%2
| 0RR...
|
          | The reality is that the popularity of frameworks rarely
          | relates to their effectiveness; building teams is hard,
          | and making teams work as teams across teams is even
          | harder.
|
          | Tuckman may be useful in that... but this claim is
| wrong:
|
| > "Modern mission frameworks prioritize adaptability
| within durable purpose, avoiding the pitfalls you've
| rightly flagged"
|
          | Modern _frameworks_ prioritize _adoption_, and
          | depending on the _framework_ to solve your company's
          | needs will always fail. You need to _choose_ a framework
          | that fits your strategy and objectives, and adapt it to
          | fit _your needs_.
|
| Learn from others, but don't ignore the reality on the
| ground.
| neom wrote:
| Regarding Tuckman's model, there are actually numerous
| studies validating its relevance and practical
| application: Gren et al. (2017) validated it specifically
| for agile teams across eight large companies. Natvig &
| Stark (2016) confirmed its accuracy in team development
| contexts. Bonebright's (2010) historical review
| demonstrated its ongoing relevance across four decades of
| application.
|
| I feel we're talking past each other here. My original
| point was about which AI model is better for MY WORK. (I
          | run a startup accelerator for first-time founders.)
          | 4.5, in 30 seconds versus minutes, provided more
          | practical value to founders building actual businesses,
          | and saved me time.
| While I appreciate your historical references and
| academic perspectives, they don't address my central
| argument about GPT-4.5's response being more
| pragmatically useful. The distinction between academic
| precision and practical utility is exactly what I'm
| highlighting. Founders don't need perfect theoretical
| models - they need frameworks that help them bridge
| vision and execution in the real world. When you bring up
| feudal production modes and von Moltke, we're moving
| further from the practical question of which AI response
| would better guide someone trying to align teams around a
| meaningful mission that drives business results. It's
          | exactly why I formed the two prompts in the manner I
          | did: I wanted to see if it was an academic or an expert.
|
          | My assessment stands: GPT-4.5's 30 seconds of thinking
          | about how mission operationalizes vision reflects how
          | successful businesses actually work, not how academics
          | might describe them in theoretical papers. I've
| read the papers, I've studied the theory deeply, but I
| also have NYSE and NASDAQ ticker symbols under my belt,
          | from seed. That is the whole point here.
| ewoodrich wrote:
| I have been experimenting with 4.5 for a journaling app I am
| developing for my own personal needs, for example, turning
| bullet/unstructured thoughts into a consistent diary
| format/voice.
|
    | The quality of writing can be much better than Claude 3.5/3.7
    | at times, but it struggles with similar confabulation of
    | information that is not in the original text but "sounds
    | good/flows well". Which isn't ideal for a personal journal...
| I am still playing around with the system prompt but given
| the astronomical cost (even with me as the only user) with
| marginal benefits I am probably going to end up sticking with
| Claude for now.
|
| Unless others have a recommendation for a less robot-y
| sounding model (that will, however, follow instructions
| precisely) with API access other than the mainstream
| Claude/OpenAI/Gemini models?
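    |
    | A minimal sketch of the kind of system-prompt constraint being
    | described, over the plain chat-completions API; the model
    | name, wording, and temperature are placeholders, not a claim
    | about what actually stops the confabulation:
    |
    |   from openai import OpenAI
    |
    |   client = OpenAI()  # reads OPENAI_API_KEY from the environment
    |
    |   SYSTEM = (
    |       "Rewrite the user's notes as a first-person diary entry. "
    |       "Use only facts present in the input; do not invent "
    |       "events, names, feelings, or details. If something is "
    |       "ambiguous, keep it ambiguous."
    |   )
    |
    |   def to_diary(bullets: str, model: str = "gpt-4.5-preview") -> str:
    |       resp = client.chat.completions.create(
    |           model=model,  # placeholder; use whichever model you settle on
    |           temperature=0.3,  # lower tends to mean less embellishment
    |           messages=[
    |               {"role": "system", "content": SYSTEM},
    |               {"role": "user", "content": bullets},
    |           ],
    |       )
    |       return resp.choices[0].message.content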
| neom wrote:
| I've found this on par with 4.5 in tone, but not as nuanced
| in connecting super wide ideas in systems, 4.5 still does
| that best: https://ai.google.dev/gemini-api/docs/thinking
|
| (also: the person you are responding to is doing exactly
      | what you're saying you don't want done: taking something
      | unrelated to the original text (Taylorism) that could sound
      | good, and jamming it in)
| EcommerceFlow wrote:
| I've found 4.5 to be quite good at "business decisions", much
| better than other models. It does have some magic to it, similar
| to Grok 3, but maybe a bit smarter?
| sunami-ai wrote:
| Meanwhile all GPT4o models on Azure are set to be deprecated in
  | May and there are no alternative models yet. Should we start
  | moving to Anthropic? DS is too slow, melting under its own
  | success. Does anyone on GPT-4o/Azure have any idea when they'll
  | release the next "o" model?
| Uvix wrote:
| Only an older version of GPT-4o has been deprecated and will be
| removed in May. The newest version will be supported through at
| least 20 November 2025.
|
| https://learn.microsoft.com/en-us/azure/ai-services/openai/c...
| sunami-ai wrote:
| The Nov 2024 release, which is due to be deprecated in Nov
| 2025, I was told has degraded performance compared to the Aug
| 2024 release. In fact, OpenAI Models page says their current
| GPT4o API is serving the Aug release.
| https://platform.openai.com/docs/models#gpt-4o
|
| So I'm still on the Aug 24 release, which, with your
| reminding me, is not to be deprecated till Aug 2025, but
| that's less than 5 months from now, and we're skipping the
| Nov 2024 release just as OpenAI themselves have chosen to do.
| ein0p wrote:
| I have access to it. It is better, but not where most techies
| would care. It knows more, it writes better, it's more pleasant
| to talk to. I think they might have studied the traffic their
| hundreds of millions of users generate and realized where they
| need to improve, then did exactly that for their _non thinking_
| model. They understand that a non-thinking model is not going to
| blow the doors off on coding no matter what they do, but it can
| do writing and "associative memory" tasks quite well, and having
| a lot more weights helps there. I also predict that they will
| fine tune their future distilled, thinking models for coding,
| based on the same logic, distilling from 4.5 this time. Those
| models have to be fast, and therefore they have to be smaller.
| ghostly_s wrote:
| I don't get it. Aren't these two sentences in the same paragraph
| contradictory?
|
| >"Scaling to this size of model did NOT make a clear jump in
| capabilities we are measuring."
|
| > "The jump from GPT-4o (where we are now) to GPT-4.5 made the
| models go from great to really great."
| XenophileJKO wrote:
| No, it means that it got better on things orthogonal to what we
| have mostly been measuring. On the last few rounds, we have
| been mostly focusing on reasoning, not as much on knowledge,
| "creativity", or emotional resonance.
| johnecheck wrote:
| "It's better. We can't measure it, but we're pretty sure it's
| better. We also desperately need it to be better because we
| just spent a boat-load of money on it."
| Artgodma wrote:
| No general model is the frontier.
|
| Thousands of small, specific models are infinitely more efficient
| than a general one.
|
  | The narrower the task, the better algorithms work.
|
| That's obvious.
|
  | Why are general models pushed so hard by their creators?
|
| Their enormous valuations are based on total control over user
| experience.
|
| This total control is justified by computational requirements.
|
| Users can't run general models locally.
|
  | Giant data centers costing billions are the moat for model
  | creators and the corporations behind them.
| azan_ wrote:
    | It's neither obvious nor true; generalist models outperform
    | specialized ones all the time (so frequently that the
    | phenomenon even has its own name: the bitter lesson).
| maleldil wrote:
    | Certain desirable capabilities are available only in bigger
    | models because it takes a certain size for some behaviours to
    | emerge.
| mvkel wrote:
| I think this release is for the researchers who worked on it and
| would quit if it never saw daylight
| mirekrusin wrote:
| Is somebody actually looking at those last percentages on
| benchmarks?
|
  | Aren't we making the mistake of assuming benchmarks are purely
  | 100% correct?
___________________________________________________________________
(page generated 2025-03-02 23:01 UTC)