[HN Gopher] GPT-4.5: "Not a frontier model"?
       ___________________________________________________________________
        
       GPT-4.5: "Not a frontier model"?
        
       Author : pama
       Score  : 140 points
       Date   : 2025-03-02 14:47 UTC (8 hours ago)
        
 (HTM) web link (www.interconnects.ai)
 (TXT) w3m dump (www.interconnects.ai)
        
       | tsunego wrote:
       | GPT-4.5 feels like OpenAI's way of discovering just how much
       | we'll pay for diminishing returns.
       | 
       | The leap from GPT-4o to 4.5 isn't a leap--it's an expensive
       | tiptoe toward incremental improvements, priced like a luxury item
       | without the luxury payoff.
       | 
       | With pricing at 15x GPT-4o, they're practically daring us not to
       | use it. Given this, I wouldn't be surprised if GPT-4.5 quietly
       | disappears from the API once OpenAI finishes squeezing insights
       | (and cash) out of this experiment.
        
         | hooverd wrote:
         | They should have called it "ChatGPT Enterprise".
        
           | tsunego wrote:
            | Exactly! Designed specifically for people who love
            | burning corporate budgets.
        
             | numba888 wrote:
              | OpenAI is going to add it to Plus subscriptions, i.e.
              | available to many at no additional cost, likely with
              | restrictions like N prompts/hour.
              | 
              | As for the API price: when it matters, businesses and
              | people are willing to pay much more for even slightly
              | better results. OpenAI doesn't take the other options
              | away, so we don't lose anything.
        
           | fodkodrasz wrote:
            | IMO the 4o output is a lot more Enterprise-compatible;
            | 4.5, being straight to the point and more natural, is
            | quite the opposite. Pricing-wise your point stands.
           | 
           | Disclaimer: have not tried 4.5 yet, just skimmed through the
           | announcement, using 4o regularly.
        
         | TZubiri wrote:
         | Time to enter the tick cycle.
         | 
          | I ask ChatGPT to give me a map highlighting all Spanish-
          | speaking countries; it gives me Stable Diffusion trash.
          | 
          | Just gotta do the grunt work: add a tool with a map API,
          | integrate with Google Maps for transit stuff.
          | 
          | It's a good LLM already; it doesn't need to be Einstein
          | and solve aerospace equations. We just need to wait until
          | they realize their limits and find the humility to build
          | yet another useful product that won't conquer the world.
        
           | Willingham wrote:
            | I've thought of LLMs as Google 2.0 for some time now.
            | Truly a world-changing technology, similar to how Google
            | changed the world, and likely to have an even larger
            | impact than Google had as we create highly specialized
            | implementations of the technology in the coming
            | decade... but it's not energy-positive nuclear fusion or
            | a polynomial-time NP solver; it's just Google 2.0.
        
             | dingnuts wrote:
             | Google 2.0 where you have to check every answer it gives
             | you because it's authoritative about nothing.
             | 
             | Works great when the output is small enough to unit test or
             | immediately try in situations with no possible negative
             | outcomes.
             | 
             | Anything larger? Skip the LLM slop and go to the source.
             | You have to go to the source, anyway.
        
               | CamperBob2 wrote:
               | _You have to go to the source, anyway._
               | 
               | Yeah, and then check that. I don't get this argument at
               | all.
               | 
               | People who uncritically swallow the first answer or two
               | they get from Google have a name... but that would just
               | derail the thread into politics.
        
               | LPisGood wrote:
               | There is something to be said for trusting people's (or
               | systems of people's) authority.
               | 
               | For example, have you ever personally verified that
               | humans went to the moon? Have you ever done the
               | experiments to prove the Earth is round?
        
               | sillyfluke wrote:
                | This is not helpful phrasing, I think. Sources
                | allow the reader to go as far down the rabbit hole
                | as they are willing or knowledgeable enough to go.
               | 
               | For example, if I'm looking for some medical finding and
               | I get to a source that's a clinical study from a
               | reputable publication, I may be satisfied and stop there
               | since this is not my area of expertise. However, a person
               | with knowledge of the field may be able to parse the
               | study and pick it apart better than I could. Hence, their
               | search would not end there since they would be
               | unsatisfied with just the source I was satisfied with.
               | 
               | On the other hand, having no verifiable sources should
               | leave everyone unsatisfied.
        
               | LPisGood wrote:
               | Of course, that verifiability is a big part of that
               | trust. I'm not sure why you think my phrasing is not
               | helpful; we seem to agree.
        
               | rwiggins wrote:
               | > Have you ever done the experiments to prove the Earth
               | is round?
               | 
               | I have, actually! Thanks, astronomy class!
               | 
               | I've even estimated the earth's diameter, and I was only
               | like 30% off (iirc). Pretty good for the simplistic
               | method and rough measurements we used.
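                | 
                | The arithmetic is the classic Eratosthenes-style
                | estimate; a minimal sketch of it in Python (numbers
                | here are illustrative, not the ones we measured):
                | 
                |     import math
                | 
                |     # Difference in the sun's angle between two
                |     # sites on the same meridian, and the distance
                |     # between those sites.
                |     angle_deg = 7.2
                |     dist_km = 800.0
                | 
                |     # A full circle is 360 degrees, so scale up,
                |     # then convert circumference to diameter.
                |     circ_km = dist_km * 360.0 / angle_deg
                |     diameter_km = circ_km / math.pi
                |     print(f"diameter ~ {diameter_km:.0f} km")
                |     # ~12,732 km vs. the true ~12,742 km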
               | 
               | Sometimes authorities are actually authoritative, though,
               | particularly for technical, factual material. If I'm
               | reading a published release date for a video game,
               | directly from the publisher -- what is there to contest?
               | Meanwhile, ask an LLM and you may have... mixed results,
               | even if the date is within its knowledge cutoff.
        
               | Spooky23 wrote:
               | Have you provided documentation that you are human?
               | Perhaps you are a lizard person sowing misinformation to
               | firm up dominance of humankind.
        
               | sillyfluke wrote:
               | There is a truth in the grandparent's comment that
               | doesn't necessarily conflict with this view. The Google
               | 2.0 effect is not necessarily that it gives you a better
                | correct answer faster than Google. I think it never
               | dawned on people how bad they were at searching about
               | topics they didn't know much about or how bad google was
                | at pointing them in the right direction prior to ChatGPT.
               | Or putting it another way, they never realized how much
               | utility they would get out of something that pointed them
               | in the correct direction even though they couldn't trust
               | the details.
               | 
                | It turns out that going from not knowing what you
                | don't know to _knowing_ what you don't know adds an
                | order of magnitude improvement to people's
                | experience.
        
               | TZubiri wrote:
                | And the LLM by design does not save or provide
                | sources, unlike Google or Wikipedia, which are
                | transparent about sources.
        
               | CamperBob2 wrote:
               | It most certainly does, if you are using the latest
               | models, which people making comments like this never are
               | as a rule.
        
               | Chilko wrote:
                | All while using far more energy than a normal
                | Google search.
        
               | glenneroo wrote:
               | I keep wondering what the long-game (if any) of LLMs
               | is... to make the world dependent on various models then
               | jack the rates up to cover the costs? The gravy-train of
               | SV funding has to end eventually... right?
        
           | blharr wrote:
           | Giving ChatGPT stupid AI image generation was a huge nerf. I
           | get frustrated with this all the time.
        
             | SketchySeaBeast wrote:
             | Oh, I think it's great they did that. It's super helpful
             | for visualizing ChatGPT's limitations. Ask it for an
             | absolutely full, overflowing glass of wine or a wrist watch
             | whose time is 6:30 and it's obvious what it actually does.
             | It's educational.
        
           | bee_rider wrote:
           | LLMs could make some nice little tools.
           | 
           | However they'll need to replace vast swathes of the economy
           | to justify these AI companies' market caps.
        
           | tiahura wrote:
            | I asked Claude to give me a Python script to create a
            | map highlighting all Spanish-speaking countries. It
            | took 3 tries and then gave me a perfect SVG and PNG.
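            | 
            | For reference, a minimal sketch of what such a script
            | tends to look like (my illustration, not Claude's
            | output; assumes geopandas < 1.0, which bundled the
            | Natural Earth data, and the country list is
            | abbreviated):
            | 
            |     import geopandas as gpd
            |     import matplotlib.pyplot as plt
            | 
            |     # Names must match the dataset's own spellings,
            |     # e.g. "Dominican Rep." rather than the full name.
            |     SPANISH_SPEAKING = {
            |         "Spain", "Mexico", "Colombia", "Argentina",
            |         "Peru", "Venezuela", "Chile", "Ecuador",
            |         "Guatemala", "Cuba", "Bolivia", "Honduras",
            |         "Paraguay", "El Salvador", "Nicaragua",
            |         "Costa Rica", "Panama", "Uruguay",
            |         "Dominican Rep.",
            |     }
            | 
            |     world = gpd.read_file(
            |         gpd.datasets.get_path("naturalearth_lowres"))
            |     world["es"] = world["name"].isin(SPANISH_SPEAKING)
            | 
            |     # Highlight Spanish-speaking countries and export
            |     # both formats.
            |     ax = world.plot(column="es", figsize=(12, 6))
            |     ax.set_axis_off()
            |     plt.savefig("spanish_speaking.svg")
            |     plt.savefig("spanish_speaking.png", dpi=150)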
        
         | zamadatix wrote:
          | Even this is a bit overly complicated/optimistic to me.
          | Why not something as simple as: OpenAI has been building
          | larger and larger models to great success for a long
          | time. As a result, they were excited that this one was
          | going to be so much larger, and therefore so much better,
          | that the price to run it would be well worth the huge
          | jump they were planning to get from it. What really
          | happened is that this method of scaling hit a wall, and
          | they were left with an expensive dud they won't get much
          | out of, but they had to release something for now,
          | otherwise they'd start falling well behind on the
          | leaderboards over the next few months. Meanwhile they
          | scrambled to find other means of scaling, like the gains
          | "chain of thought + runtime" provided.
        
           | hn_throwaway_99 wrote:
           | Thank you so much for this comment. I don't really understand
           | the need for people to go straight to semi-conspiratorial
           | hypotheses, when the simpler explanation makes so much more
           | sense. All the evidence is that this model is _much_ larger
           | than previous ones, so they must charge a lot more for
            | inference because it costs so much more to run. OpenAI
            | were the OGs when it came to scaling, so it's not
            | surprising they went this route and eventually hit a
            | wall.
           | 
           | I don't at all blame OpenAI for going down this path (indeed,
           | I laud them for making expensive bets), but I do blame all
            | the quote-unquote "thought leaders" who were writing
           | breathless posts about how AGI was just around the corner
           | because things would just scale linearly forever. It was
           | classic "based on historical data, this 10 year old will be
           | 20 feet tall by the time he's 30" thinking, and lots of
           | people called them out on this, and they either just ignored
           | it or responded with "oh, simple not-in-the-know peons"
           | dismissiveness.
        
             | bee_rider wrote:
              | It is weird, because this is a board for working
              | programmers for the most part. So, like, who's seen a
              | grand conspiracy actually be accomplished? Probably
              | not many. A lackluster product that gets released
              | even though it sucks because too many people are
              | highly motivated not to notice that it sucks?
              | Everybody has experienced that, right?
        
               | glenstein wrote:
                | Exactly. Although I wouldn't even say they have
                | blinders; it seems like OpenAI understands quite
                | well what 4.5 can do and what it can't, hence the
                | modesty in their messaging.
                | 
                | To your point, though, I would ask not only who has
                | seen any grand conspiracy actually be accomplished,
                | but who has seen one even attempted and kept under
                | wraps, such that the absence of corroborating
                | sources was more consistent with an effectively
                | executed conspiracy than with the simple absence of
                | such a plan?
        
             | danielbln wrote:
             | It works until it doesn't and hindsight is 20/20.
        
               | hn_throwaway_99 wrote:
               | > It works until it doesn't
               | 
               | Of course, that's my point. Again, I think it's great
               | that OpenAI swung for the fences. My beef is again with
               | these "thought leaders" who would write this blather
               | about AGI being just around the corner in the most
               | uncritical manner possible (e.g.
               | https://news.ycombinator.com/item?id=40576324). These
               | folks tended to be in one of two buckets:
               | 
               | 1. "AGI cultists" as I called them, the "we're entering a
               | new phase of human evolution"-type people.
               | 
               | 2. People who had a motive to try and sell something.
               | 
               | And it's not about one side or the other being "right" or
               | "wrong" after the fact, it's that so much of this just
               | sounded like magical thinking and unwarranted
               | extrapolations from the get go. The _actual_ experts in
               | the area, if they were free to be honest, were much, much
               | more cautious in their pronouncements.
        
               | danielbln wrote:
                | Definitely, the grifters and hypesters are always
                | spoiling things, but even with a sober look it felt
                | like AGI _could_ be around the corner. With all
                | these novel and somewhat unexpected emergent
                | capabilities as we pushed more data through
                | training, you'd think maybe that's enough? It
                | wasn't, and test-time compute alone isn't either,
                | but that's also hindsight to a degree.
               | 
               | Either way, AGI or not, LLMs are pretty magical.
        
               | snovv_crash wrote:
               | If you've been around long enough to witness a previous
               | hype bubble (and we've literally just come out of the
               | crypto bubble), you should really know better by now.
                | Pets.com, literally an online shop selling pet
                | food, almost IPO'd for $300M in early 2000, just
                | before the whole dot-com bubble burst.
               | 
               | And yeah, LLMs are awesome. But you can't predict
               | scientific discovery, and all future AI capabilities are
               | literally still a research project.
               | 
               | I've had this on my HN user page since 2017, and it's
               | just as true as ever: In the real world, exponentials are
               | actually early stage sigmoids, or even gaussians.
        
               | baxtr wrote:
               | Well that's only because YOU don't understand exponential
               | growth! No human can! /s
        
             | Kye wrote:
             | In fundamental science terms, it also proves once and for
             | all that more model doesn't mean more better. Any forces
             | within OpenAI pushing to move past just growing the model
             | for gains now have a strong argument for going all-in on
             | new processes.
        
         | Kerbonut wrote:
          | Apparently, OpenAI API "credits" expire after a year. I
          | stupidly put in another $20 and am now trying to blow
          | through it; 4.5 is the easiest way, considering the
          | recent 4o has fallen out of favor relative to other
          | models, and I don't want to just let the credits expire
          | again. An expiry after only one year is asinine.
        
           | Chance-Device wrote:
           | Yes. I also discovered this, and was also forced to blow
           | through my credits in a rush. Terrible policy.
        
             | glenstein wrote:
              | I'm learning this for the first time now. I don't
              | appreciate having to anticipate how many credits I'll
              | use like it's an FSA account.
        
             | heed wrote:
             | >Terrible policy.
             | 
             | And unfortunately one not exclusive to OpenAI. Anthropic
             | credits also expire after 1 year.
        
         | jstummbillig wrote:
          | This is how pricing on human labour works. Nobody expects
          | an employee who costs twice as much to produce twice the
          | output for any given task. All that is expected is that
          | they can do a narrow set of things that another person
          | can't.
        
       | sampton wrote:
        | Too much money, not enough new ideas.
        
       | siva7 wrote:
        | It's marketed as being slightly better at "creative
        | writing". That isn't the problem most businesses have with
        | current-generation LLMs. On the other side, Anthropic
        | released at nearly the same time a new model that solves
        | more practical problems for businesses, to the point that
        | for coding many insiders don't use OpenAI models for such
        | tasks anymore.
        
         | dingnuts wrote:
         | I think it should be illegal to trick humans into reading
         | "creative" machine output.
         | 
         | It strikes me as a form of fraud that steals my most precious
         | resources: time and attention. I read creative writing to feel
         | a human connection to the author. If the author is a machine
         | and this is not disclosed, that's a huge lie.
         | 
         | It should be required that publishers label AI generated
         | content.
        
           | CuriouslyC wrote:
           | I'm pretty sure you read for pleasure, and feeling a human
           | connection is one way that you derive pleasure. If it's the
           | only way that you derive pleasure from reading, my
           | condolences.
        
             | becquerel wrote:
             | Pretty much where my thoughts on this are. I rarely feel
             | any particular sense of connection to the actual author
             | when I read their books. And I have taken great pleasure
             | from some AI stories (to the degree I put them up on my
             | personal website as a way to keep them around).
        
           | Philpax wrote:
           | Under the dingnuts regime, Dwarf Fortress will be illegal.
           | Actually, any game with a procedural story? You better
           | believe that's a crime: we can't have a machine generate text
           | a _human_ might enjoy.
        
             | glenneroo wrote:
              | Dingnuts' point was that it should be disclosed.
              | Everyone knows Dwarf Fortress stories are
              | procedural/AI-generated; the authors aren't trying to
              | hide that fact.
        
               | Philpax wrote:
               | Actually, fair enough. I still disagree with their
               | argument, but this was the wrong tack for me to use.
        
           | Hoasi wrote:
           | > I think it should be illegal to trick humans into reading
           | "creative" machine output.
           | 
           | Creativity has lost its meaning. Should it be illegal? The
           | courts will take a long time to settle the matter. Reselling
           | people's work against their will as creative machine output
           | seems unethical, to say the least.
           | 
           | > It should be required that publishers label AI-generated
           | content.
           | 
           | Strongly agree.
        
       | buyucu wrote:
       | OpenAI has been irrelevant for a while now. All of the new and
       | exciting developments on AI are coming from other places.
       | ClosedAI is no longer the driver of change and innovation.
        
         | nickthegreek wrote:
         | The other models are literally distilling OpenAI's models into
         | theirs.
        
           | demosthanos wrote:
           | So it's been claimed, but has it been proven yet?
           | 
           | I'm not even sure what is being alleged there--o1's reasoning
           | tokens are kept secret precisely to avoid the kind of
           | distillation that's being alleged. How can you distill a
           | reasoning process given only the final output?
        
             | ipaddr wrote:
              | DeepSeek models outputting that they are ChatGPT is a
              | big clue.
        
           | orbital-decay wrote:
            | Do they? Why doesn't this happen to Claude then? I've
            | been hearing this for a while, but never saw any
            | evidence beyond the contamination of the dataset with
            | GPT slop that is all over the web. Just by sending
            | anything to the competitors you're giving up a large
            | part of your know-how before you even finish your
            | product; that's a big incentive against doing that.
        
             | nickthegreek wrote:
             | Who said it isn't happening to Claude?
             | 
             | Companies are 100% using these big players to generate
             | synthetic data. Distillation is extremely powerful. How is
             | this even in question?
        
               | nullc wrote:
                | OpenAI conceals token probabilities, so how is
                | anyone distilling from it?
        
           | kossTKR wrote:
            | And OpenAI based their tech on a Google paper, itself
            | building on years of public academic research, so
            | what's the point exactly here?
            | 
            | OpenAI was just first out of the gates; there'll always
            | be some company that's first. The essence is how they
            | handle their leadership, and they've sadly been
            | absolutely terrible and scummy.
            | 
            | Actually, I think Google was a pretty good example of
            | the exact opposite: decades of "actually not being
            | evil", while OpenAI switched up one second after
            | launch.
        
             | nickthegreek wrote:
              | > so what's the point exactly here?
              | 
              | What is your point? OpenAI wasn't the first out of
              | that gate, as your own argument cites Google prior.
              | All these companies are predatory; who is arguing
              | against that? OP said OpenAI was irrelevant. That's
              | just dumb. They are not. Feel free to advance an
              | argument in favor of that narrative if you wish, as I
              | was just trying to provide a single example showing
              | that some of these lightweight models are building
              | directly off the backs of giants spending the big
              | money. I find nothing wrong with distillation and am
              | excited about companies like DeepSeek.
        
             | ipaddr wrote:
              | Google wasn't the first search engine, but they had
              | the best marketing: google = search. That's where we
              | are with OpenAI. Google Search was the better product
              | at the time, and ChatGPT 3.5 was a breakthrough the
              | public used. Fast forward, and some will say Google
              | isn't the best search engine anymore (Kagi,
              | DuckDuckGo, Yandex offer different experiences), but
              | people still think of google = search. Same with
              | ChatGPT. Claude may be better for coding, or Gemini
              | better at searching, or DeepSeek cheaper but equal,
              | but ChatGPT is a verb and will live on like Intel
              | Inside long after its actual value has declined.
        
               | williamcotton wrote:
               | Google was so much better than AltaVista that I just
               | can't buy that it was marketing that pushed them to the
               | forefront of search.
        
               | Jweb_Guru wrote:
               | > Google wasn't the first search engine but they were the
               | best marketing google = search
               | 
               | Google's overwhelming victory in search had ~ nothing to
               | do with marketing.
        
         | mrcwinn wrote:
         | That's quite a world you've constructed!
        
         | jug wrote:
         | I think OpenAI is currently in this position where they are
         | still industry standard, but also not leading. Deepseek R1 beat
         | o1 on perf/cost with similar perf at a fraction of the cost.
          | o3-mini is judged as "weird" and quite hit-and-miss on
          | coding (basically the sole reason for its existence),
          | with a sky-high SimpleQA hallucination rate due to its
          | limited scope, and is probably beaten by Sonnet 3.7 by a
          | fairly large margin.
         | 
          | Still, being early with a product that is often "good
          | enough" still takes them a long way. I think GPT-5, and
          | where their competition will be then, will be quite
          | important for OpenAI though. The signs on the horizon are
          | that everyone will close in on each other as we hit
          | diminishing returns, so the underlying business model,
          | integrations, enterprise reach, marketing and market
          | share will probably be king in 2026, rather than the
          | underlying LLM.
         | 
         | Since GPT-5 is meant to select the best model behind the
         | scenes, one issue might be that users won't have the same
         | confidence in the model, feeling like it's deciding for them or
         | OpenAI tuning it to err on the side of being cheap.
        
       | yimby2001 wrote:
        | It seems like there's a misunderstanding as to why this
        | happened. They've been baking this model for months, long
        | before DeepSeek came out with fundamentally new ways of
        | distilling models. And even given that it's not great in
        | its large form, they're going to distill from it going
        | forward, so it likely makes sense for them to periodically
        | train these very large models as a basis.
        
         | lhl wrote:
         | I think this framing isn't quite right either. DeepSeek's R1
         | isn't very different from what OpenAI has already been doing
         | with o1 (and that other groups have been doing as well). As for
         | distilling - the R1 "distilled" models they released aren't
         | even proper (logit) distillations, but just SFTs, not
         | fundamentally new at all. But it's great that they published
         | their full recipes and it's also great to see that it's
         | effective. In fact we've seen now with LIMO, s1/s1.1, that even
         | as few as 1K reasoning traces can get most LLMs to near SOTA
         | math benchmarks. This mirrors the "Alpaca" moment in a lot of
         | ways (and you could even directly mirror say LIMO w/ LIMA).
         | 
         | I think the main takeaway of GPT4.5 (Orion) is that it
         | basically gives a perspective to all the "hit a wall" talk from
         | the end of last year. Here we have a model that has been
         | trained on by many accounts 10-100X the compute of GPT4, is
         | likely several times larger in parameter count, but is only...
         | subtly better, certainly not super-intelligent. I've been
         | playing around w/ it a lot the past few days, both with several
         | million tokens worth of non-standard benchmarks and talking to
          | it, and it _is_ better than previous GPTs (in particular,
          | it makes a big jump in humor), but I think it's clear
          | that the
         | "easy" gains in the near future are going to be figuring out
         | how as many domains as possible can be approximately
         | verified/RL'd.
         | 
         | As for the release? I suppose they could just have kept it
         | internally for distillation/knowledge transfer, so I'm actually
         | happy that they released it, even if it ends up not being a
         | really "useful" model.
        
       | modeless wrote:
       | I've been using 4.5 instead of 4o for quick questions. I don't
       | mind the slowness for short answers. I feel like it is less
       | likely to hallucinate than other models.
        
       | kubb wrote:
       | Seems like we're hitting the limits of the technology...
        
         | xmichael909 wrote:
          | Yes, I believe the sprint is over; now it's going to be
          | slow cycles, maybe 18 months to see a 5% increase in
          | ability, and even that 5% increase will be highly
          | subjective. Claude's new release is about the same: 3.7
          | is arguably worse at some things than 3.5 and better at
          | others. Based on the previous pace of release, in about 6
          | months or so, if the next release from any of the leaders
          | is about the same "kinda better, kinda worse", then we'll
          | know. Imagine how much money is going to evaporate from
          | the stock market if this is the limit!!!
        
           | ANewFormation wrote:
           | You can keep getting rich off shovels long after the gold has
           | run dry.
        
         | TIPSIO wrote:
         | I also hate waiting on reasoning.
         | 
          | I would much prefer a super lightning-fast model that is
          | cheaper but the same quality as these frontier models.
         | 
         | Let me query these things to death.
        
           | ljlolel wrote:
            | Try Groq (hyperfast chips): https://groq.com/
        
         | apwell23 wrote:
          | Does it mean we get a reprieve from "this is just the
          | beginning" comments?
        
           | kubb wrote:
           | I wouldn't count on it.
        
           | thfuran wrote:
           | Maybe if it takes many years before the next major
           | architectural advancement.
        
       | retskrad wrote:
        | Sam Altman views Steve Jobs as one of his inspirations (he
        | called the iPhone the greatest product of all time). So if
        | you look at OpenAI through the lens of Apple, where you
        | think about making the product enjoyable to use at all
        | costs, then it makes perfect sense why you'd spend so much
        | money to go from 4o to 4.5, which brings such subtle
        | differences to power users.
        | 
        | The vast majority of users, of which there are over 300
        | million weekly, will mainly use 4o and whatever is the
        | default. In the future they'll use 4.5 and think it's more
        | human-like and less robotic.
        
         | bravura wrote:
         | Yes but Steve Jobs also understood the paradox of choice, and
         | the importance of having incredibly clear delineation between
         | every different product in your line.
        
           | ipaddr wrote:
            | Do models matter to the regular user over brand? People
            | talk about using ChatGPT over Google's AI or DeepSeek,
            | not 4o-mini vs Gemini 2.
            | 
            | OpenAI has done a good job of making the model less
            | important and the domain chatgpt.com more important.
            | 
            | Most of the time the model rarely matters. When you
            | find something incorrect you may switch models, but
            | that rarely fixes the problem. Rewording a prompt has
            | more value than changing a model.
        
             | esafak wrote:
             | If the model did not matter they would be spending their
             | money on marketing or sales instead of improving the model.
        
       | pzo wrote:
        | Long term it might be hard to monetize this infrastructure
        | considering the competition:
        | 
        | 1) For coding (API), most will probably stick to Claude 3.5
        | / 3.7 - a big market, but still small compared to all
        | worldwide problems.
        | 
        | 2) For non-coding API use, IMHO Gemini 2.0 Flash is the
        | winner - dirt cheap (cheaper than 4o-mini), good enough and
        | even better than gpt-4o, with cheap audio and image input.
        | 
        | 3) For the subscription app, ChatGPT is probably still the
        | best, but only slightly - they have the best advanced voice
        | conversation, but Grok will probably be eating their lunch
        | here.
        
         | anukin wrote:
          | The Sesame model for voice IMO is better than ChatGPT's
          | voice conversation. They are going to open source it as
          | well.
        
           | bckr wrote:
           | Sure but is there an app I can talk to / work with? It seems
           | they're a voice synthesis model company, not a chatbot app /
           | tool company.
        
           | OsrsNeedsf2P wrote:
           | > They are going to open source it as well.
           | 
           | Means nothing until they do
        
         | ipaddr wrote:
          | For the rest of us using free tiers, ChatGPT is hands
          | down the winner, allowing limited image generation,
          | unlimited usage of some model, and limited usage of 4o.
          | 
          | Claude is still stuck at 10 messages per day, and Gemini
          | is less accurate/useful.
        
           | dingnuts wrote:
           | 10 messages a day? How are people "vibe coding" with that?
        
             | irishloop wrote:
             | They're paying for Pro
        
               | dingnuts wrote:
               | Ah thank you; I had heard the paid ones had daily limits
               | too so I was confused
        
               | danielbln wrote:
               | They do, I subscribe to pro. All of my vibe coding
               | however is done via the API.
        
         | Layvier wrote:
          | We were using gpt-4o for our chat agent, and after some
          | experiments I think we'll move to Flash 2.0. Faster,
          | cheaper, and even a bit more reliable. I also
          | experimented with the experimental thinking version, and
          | there a single-node architecture seemed to work well
          | enough (instead of multiple specialized sub-agent nodes).
          | It did better than DeepSeek, actually. Now I'm waiting
          | for the official release before spending more time on it.
        
       | HarHarVeryFunny wrote:
       | GPT 4.5 also has a knowledge cutoff date of 10-2023.
       | 
       | https://www.reddit.com/r/singularity/comments/1izpb8t/gpt45_...
       | 
       | I'm guessing that this model was finished pre-training at least a
       | year ago (it's been 2 years since GPT 4.0 was released) and they
       | just didn't see the hoped-for performance gains to think it
       | warranted releasing at the time, and so put all their effort into
       | the Q-star/strawberry = eventual O1 reasoning effort instead.
       | 
       | It seems that OpenAI's reasoning model lead isn't perhaps what
       | they thought it was, and the recent slew of strong non-reasoning
       | models (Gemini 2.0 Flash, Grok 3, Sonnet 3.7) made them feel the
        | need to release something themselves for appearances' sake, so
       | they dusted off this model, perhaps did a bit of post-training on
       | it for EQ, and here we are.
       | 
       | The price is a bit of a mystery - perhaps just a reflection of an
       | older model without all the latest efficiency tricks to make it
       | cheaper. Maybe it's dense rather than MoE - who knows.
        
         | sigmoid10 wrote:
         | Rumors said that GPT4.5 is an order of magnitude larger. Around
         | 12 trillion parameters total (compared to GPT4's 1.2 trillion).
         | It's almost certainly MoE as well, just a scaled up version.
         | That would explain the cost. OpenAI also said that this is what
         | they originally developed as "Omni" - the model supposed to
         | succeed GPT4 but which fell behind expectations. So they
         | renamed it 4.5 and shoehorned it in to remain in the news among
         | all those competitor releases.
        
           | ljlolel wrote:
            | gpt-4o ("omni") is probably a distilled 4.5; hence the
            | lack of much quality difference.
        
             | sigmoid10 wrote:
             | 4o has been out since May last year, while omni (now
             | rechristened as 4.5) only finished training in
             | October/November.
        
               | cubefox wrote:
               | 4.5 was called Orion, not Omni.
        
           | Leary wrote:
           | How does this compare with Grok 3's parameter count? I know
           | Grok 3 was trained on a larger cluster (100k-200k) but GPT
           | 4.5 used distributed training.
        
           | glenstein wrote:
            | This is all excellent detail. Wondering if there are
            | any good suggestions for further reading on the inside
            | baseball of what happened with GPT-4.5?
        
             | qeternity wrote:
             | Well, it's not...it gets most details wrong.
        
               | glenstein wrote:
               | Can you elaborate?
        
               | qeternity wrote:
               | GPT-4 was rumored to be 1.8T params...not 1.2
               | 
               | And the successor model was called "Orion", not "Omni".
        
               | glenstein wrote:
                | Appreciate the corrections, but I'm still a bit
                | puzzled. Are they wrong about 4.5 having 12
                | trillion parameters, about it originally being
                | intended as Orion (not Omni), or about it being the
                | expected successor to GPT-4? And do you have any
                | related reading that speaks to any of this?
        
           | qeternity wrote:
           | GPT-4 was rumored to be 1.8T params...not 1.2
           | 
           | And the successor model was called "Orion", not "Omni".
        
         | Chance-Device wrote:
         | Releasing it was probably a mistake. In context what the model
         | is could have been understood, but they haven't really
         | presented that context. Also it would be lost on a general
         | audience.
         | 
         | The general public will naturally expect it to be the next big
         | thing. Wasn't that the point of releasing it? To seem like
         | progress is being made? To try to make that point with a model
         | that doesn't deliver is a misstep.
         | 
         | If I were Sam Altman, I'd be pulling this back before it goes
         | on general release, saying something like it was experimental
         | and after user feedback the costs weren't worth it and they're
         | working on something else as a replacement. Then o3 or whatever
         | they actually are working on instead can be the "replacement"
         | even if it's much later.
        
           | datadrivenangel wrote:
           | or just say it was too good and thus too dangerous to
           | release...
        
         | simonw wrote:
          | I don't think the October 2023 training cut-off means the
          | model finished pre-training a year ago. All of OpenAI's
          | models share that same cut-off date.
         | 
         | One theory is that they're worried about the increasing tide of
         | LLM-generated slop that's been posted online since that date. I
         | don't know if I buy that or not - other model providers (such
         | as Anthropic, Gemini) don't seem worried about that.
        
         | glenstein wrote:
         | >The price is a bit of a mystery
         | 
         | I think it at least is somewhat analogous to what happened with
         | pricing on previous models. GPT 4, despite being less capable
         | than 4o, is an order of magnitude more expensive, and
         | comparably expensive to o1. It seems like once the model is
         | out, the price is the price, and the performance gains emerge
         | but they emerge attached to new minified variations of previous
         | models.
        
         | bilater wrote:
          | I sort of believed this, but 4.5 coming out last year
          | would absolutely have been a big deal compared to what
          | was out there at the time. I just don't understand why
          | they would not launch it then.
        
         | numba888 wrote:
         | > slew of strong non-reasoning models (Gemini 2.0 Flash, Grok
         | 3, Sonnet 3.7)
         | 
          | Sonnet 3.7 is actually a reasoning model.
        
           | LaurensBER wrote:
           | It's my understanding that reasoning in Sonnet 3.7 is
           | optional and configurable.
           | 
           | I might be wrong but I couldn't find a source that indicates
           | that the "base" model also implements reasoning.
        
           | wegfawefgawefg wrote:
            | So is Grok 3.
        
       | phillipcarter wrote:
       | My take from using it a bit is that they seem to have genuinely
       | innovated on:
       | 
       | - Not writing things that go off in weird directions / staying
       | grounded in "reality"
       | 
       | - Responding very well to tone preferences and catching nuance in
       | what I say
       | 
       | It seems like it's less that it has a great "personality" like
       | Claude, but that it's capable of adapting towards being the
       | "personality" I want and "understanding" what I'm saying in ways
       | that other models haven't been able to do for me.
        
         | XenophileJKO wrote:
          | This kind of mirrors my feelings after using GPT-4.5 for
          | general conversation and songwriting.
          | 
          | GPT-4.5 picked up on unspecified requirements almost
          | instantly. It is subtle (and may be undesirable in some
          | contexts). For example, in my songs I have to bracket the
          | section headings; it picked up on that from my original
          | input. All the other frontier models generally have to be
          | reminded. Additionally, I separately asked for an edit to
          | a music style description. When I asked GPT-4.5 to write
          | a song all by itself, it included a music style
          | description. No other model I have worked with has done
          | this.
          | 
          | These are subtle differences, but in aggregate the model
          | just generally needs less nudging to create what is
          | required.
        
           | torginus wrote:
           | I haven't used 4.5 but have some experience using Claude for
           | creative writing, and in my experience it sometimes has the
           | uncanny ability to get to the core of my ideas, rephrasing my
           | paragraph long descriptions into just a sentence or two, or
           | both improving and concretizing my vague ideas into something
           | that's insightful and tasteful.
           | 
           | Other times it locks itself into a dull style and ignores
           | what I ask of it and just produces boring generic garbage,
           | and I have to wrangle it hard to get some of the spark back.
           | 
           | I have no idea what's going on inside, but just like with
           | Stable Diffusion, it's fairly easy to make something that has
           | the spark of genius, and is very close to being perfect, but
           | getting the last 10% there, and maintaining the quality seems
           | almost impossible.
           | 
            | It's a very weird feeling; it's hard to put into words
            | what exactly is going on, and probably even harder to
            | make it into a benchmark, but it makes me constantly
            | flip-flop between being scared of how good the AI is
            | and questioning why I ever bothered using it in the
            | first place, as I would've progressed much faster
            | without it.
        
       | neom wrote:
        | 4.5 can extremely quickly distill and work with what I, at
        | least, consider complex, nuanced thought. 4.5 is night and
        | day better than every other AI for my work; it's quite
        | clever and I like it.
       | 
       | Very quick mvp comparison for the show me what you mean crew:
       | https://chatgpt.com/share/67c48fcc-db24-800f-865b-c0485efd7f... &
       | https://chatgpt.com/share/67c48fe2-0830-800f-a370-7a18586e8b...
       | (~30 seconds vs ~3 minutes)
        
         | ttul wrote:
         | I believe 4.5 is a very large and rich model. The price is high
         | because it's costly to inference; however, the bigger reason is
         | to ensure that others don't distill from it. Big models have a
         | rich latent space, but it takes time to squeeze the juice out.
        
           | esafak wrote:
           | That also means people won't use it. Way to shoot yourself in
           | the foot.
           | 
              | The irony of a company that has distilled the world's
              | information complaining about another company
              | distilling their model...
        
             | cscurmudgeon wrote:
              | My assumption: there will be use cases where the cost
              | of using this will be smaller than the gain from it.
              | Data from this will make the next version better and
              | cheaper.
        
             | ttul wrote:
             | The small number of use cases that do pay are providing
             | gross margins as well as feedback that helps OpenAI in
             | various ways. I don't think it's a stupid move at all.
        
         | nyrikki wrote:
         | The 4.5 has better 'vibes' but isn't 'better', as a concrete
         | example:
         | 
         | > Mission is the operationalized version of vision; it
         | translates aspiration into clear, achievable action.
         | 
         | The "Mission is the operationalized version of vision" is not
         | in the corpus that I am find and is obviously a confabulated
         | mixture of classic Taylorist like "strategic planning"
         | 
         | SOPs and metrics, which will be tied to compensation and the
         | unfortunate ubiquitous nature of Taylorism would not result in
         | shared purpose, but a bunch of Gantt charts past the planning
         | horizon.
         | 
         | IMHO I would consider "complex nuanced thought" as
         | understanding the historical issues and at least respect the
         | divide between classical and neo-classical org theory. Or at
         | least avoid pollution of more modern theories with classical
         | baggage that is a significant barrier to delivering value.
         | 
         | Mission statements need to share strategic intent in an
         | actionable way, strategy is not operationalization.
        
           | neom wrote:
           | The statement "Mission is the operationalized version of
           | vision; it translates aspiration into clear, achievable
           | action" isn't a Taylorist reduction of mission to mechanical
           | processes - it's actually a nuanced understanding of how
           | these organizational elements relate. You're misinterpreting
           | what "operationalized" means in this context. From what i can
           | tell, the 4.5 response isn't suggesting Taylorist
           | implementation with Gantt charts etc it's describing how
           | missions translate vision into actionable direction while
           | remaining strategic. Instead of jargon, it's recognizing that
           | founders need something between abstract vision and tactical
           | execution. Missions serve this critical bridging function.
           | CEO has vision, orgs capture the vision into their missions,
           | people find their purpose when aligned via the 2. Without it,
           | founders either get stuck in aspirational thinking or jump
           | straight to implementation details without strategic
           | guidance. The distinction matters exactly because it helps
           | avoid the dysfunction that prevents startups from scaling
           | effectively. I think you're assuming "operationalized" means
           | tactical implementation (Gantt charts, SOPs) when in this
           | context it means "made operational/actionable at a strategic
           | level". Missions != mission statements. Also, you're creating
           | a false dichotomy between "strategic intent" and
           | "operationalization" when they very much, exist on a
           | spectrum. (If anything, connecting employees to mission and
           | purpose is the opposite of Tayloristic thinking, which viewed
           | workers more as interchangeable parts than as stakeholders in
           | a shared mission towards responding to a shared vision of
           | global change) - You are doing what o1 pro did, and as I
           | said: As a tool for teaching business to founders,
           | personally, I find the 4.5 response to be better.
        
             | nyrikki wrote:
              | An example of a typical naive definition of a mission
              | statement is:
             | 
             | Concise, clear, and memorable statement that outlines a
             | company's core purpose, values, and target audience.
             | 
             | > "made operational/actionable at a strategic level".
             | 
              | Taking the common definition from the first part of
              | this reply, what do you think the average manager
              | would do, given that in the social sciences
              | operationalization is explicitly about measuring
              | abstract qualities? [1]
             | 
             | "operationalization" is a compromise, trying to quantify
             | qualitative properties, it is not typically subject to
             | methods like MECE principal, because there are too many
             | unknown unknowns.
             | 
             | You are correct that "operationalization" and "strategic
             | intent" are not mutually exclusive in all aspects, but they
             | are for mission statements that need to be durable across
             | changes that no CEO can envision.
             | 
             | The "made operational/actionable at a strategic level" is
              | the exact claim of pseudo-scientific management theory
             | (Greater Taylorism) that Japan directly targeted to destroy
             | the US manufacturing sector. You can look at the former CEO
             | of Komatsu if you want direct evidence.
             | 
              | GM's failure to learn from Toyota at NUMII (sp?) is
              | another.
             | 
              | The planning process needs to be informed by
              | strategy, but planning is not strategic; it has a
              | limited horizon.
             | 
             | But you are correct that it is more nuanced and neither
             | Taylor nor Tolstoy allowed for that.
             | 
             | Neo-classical org theory is when bounded rationality was
             | first acknowledged, although the Prussian military figured
             | that out long before Taylor grabbed his stopwatch to time
             | people loading pig iron into train cars.
             | 
             | I encourage you to read:
             | 
              | Strategy: A History by Sir Lawrence Freedman
              | 
              | for a more in-depth discussion.
             | 
             | [1] https://socialsci.libretexts.org/Bookshelves/Sociology/
             | Intro...
        
               | neom wrote:
                | Your responses are interesting because they
                | reinforce my opinion. This conversation is
                | precisely why I rate 4.5 over o1 pro. I prompted in
                | a very, very, very specific way. I'm afraid to say
                | your comments are highly disengaged from the
                | realities of business and business building. I
                | appreciate the historical context and recommended
                | reading (although I assure you, I am extremely well
                | versed).
                | 
                | The term "operationalized" here refers to strategic
                | alignment, not Taylorist quantification - think
                | guiding principles over rigid metrics. You are
                | badly conflating operationalization in the social
                | sciences (which is about measurement) with
                | strategic operationalization in management, which
                | is not the same. Again: operationalized in this
                | context means making the mission actionable at a
                | strategic level, not quantification. Modern mission
                | frameworks prioritize adaptability within durable
                | purpose, avoiding the pitfalls you've rightly
                | flagged.
                | 
                | Successful founders don't get caught in these
                | theoretical distinctions. Founders taught by me,
                | and I guess by GPT-4.5, correctly understand
                | mission as the bridge between aspirational vision
                | and practical action. This isn't "Greater
                | Taylorism" but pragmatic leadership. While your
                | historical references (NUMMI, not NUMII)
                | demonstrate academic knowledge, they miss how
                | effective missions actually guide organizations
                | while remaining adaptable. The 4.5 response
                | captured this practical reality well - it pointed
                | to, but did not create, artificial boundaries
                | between interconnected concepts. If we had some
                | founders trained by you (o1 pro) and me (GPT-4.5),
                | I would be willing to bet my founders would
                | outperform yours any day of the week.
        
               | nyrikki wrote:
                | Tuckman as a 'real' framework is a belief, so that
                | is fair.
               | 
               | He clearly communicated in 1977 that his ideas were never
               | formally validated and that he cautioned about their use
               | in other contexts.
               | 
                | I think that the concepts can be useful, so long as
                | you don't take them as anything more than a guiding
                | framework that may or may not be appropriate for a
                | particular need.
               | 
               | https://core.ac.uk/download/pdf/36725856.pdf
               | 
               | I personally find value in team and org mission
               | statements, especially for building a shared purpose, but
               | to be honest, any of the studies on that are more about
               | manager satisfaction then anything else.
               | 
               | There is far more data on the failure of strategy
               | execution, and linking strategy with purpose as well as
               | providing runways and goals is one place I find vision
               | and mission statements useful.
               | 
               | As up to 90% of companies fail on strategy execution, and
               | because employee engagement is in free fall, the fact
               | that companies are still in business means little.
               | 
               | Context is king, and this is horses for courses, but I
               | would caution against ignoring more recent, Nobel winning
               | theories like Holmstrom's theorem.
               | 
               | Most teams don't experience the literal steps Tuckman
               | suggested, rarely all at once, and never as one-time,
               | singular events. As the above link demonstrated, some
               | portions like the _storming_ can be problematic.
               | 
               | Make them operationalize their mission statement, and
               | they will, and it will be set in concrete.
               | 
               | Remember von Moltke: "No plan of operations extends with
               | certainty beyond the first encounter with the enemy's
               | main strength."
               | 
               | There is a balance between C2 and mission-command
               | styles; the risk is forcing, or worse, intentionally
               | causing people to resort to C2 when you almost always
               | need a shifting balance between command-based and
               | intent-based solutions.
               | 
               | The Feudal Mode of Production was _sufficient_ for
               | centuries, but far from optimal.
               | 
               | The NUMMI reference was exactly related to the same
               | reason Amazon's profits historically rose higher than
               | its headcount increases alone should have allowed:
               | 
               | small cross-functional teams, with clearly communicated
               | tasks, and enough freedom to accomplish those tasks
               | efficiently.
               | 
               | You can look at Trist's study on how incentivizing
               | teams can lead them to game the system. The same problem
               | happened under Ballmer at MS, and DEC failed the
               | opposite way, trying to do everything at once and please
               | everyone.
               | 
               | https://www.uv.es/=gonzalev/PSI%20ORG%2006-07/ARTICULOS%2
               | 0RR...
               | 
               | The reality is that the popularity of frameworks rarely
               | relates to their effectiveness; building teams is hard,
               | and making teams work as teams across teams is even
               | harder.
               | 
               | Tuckman may be useful in that... but this claim is
               | wrong:
               | 
               | > "Modern mission frameworks prioritize adaptability
               | within durable purpose, avoiding the pitfalls you've
               | rightly flagged"
               | 
               | Modern _frameworks_ prioritize _adoption_, and depending
               | on the _framework_ to solve your company's needs will
               | always fail. You need to _choose_ a framework that fits
               | your strategy and objectives, and adapt it to fit _your
               | needs_.
               | 
               | Learn from others, but don't ignore the reality on the
               | ground.
        
               | neom wrote:
               | Regarding Tuckman's model, there are actually numerous
               | studies validating its relevance and practical
               | application: Gren et al. (2017) validated it specifically
               | for agile teams across eight large companies. Natvig &
               | Stark (2016) confirmed its accuracy in team development
               | contexts. Bonebright's (2010) historical review
               | demonstrated its ongoing relevance across four decades of
               | application.
               | 
               | I feel we're talking past each other here. My original
               | point was about which AI model is better for MY WORK. (I
               | run a startup accelerator for first-time founders.) 4.5,
               | in 30 seconds rather than minutes, provided more
               | practical value to founders building actual businesses,
               | and saved me time.
               | While I appreciate your historical references and
               | academic perspectives, they don't address my central
               | argument about GPT-4.5's response being more
               | pragmatically useful. The distinction between academic
               | precision and practical utility is exactly what I'm
               | highlighting. Founders don't need perfect theoretical
               | models - they need frameworks that help them bridge
               | vision and execution in the real world. When you bring up
               | feudal production modes and von Moltke, we're moving
               | further from the practical question of which AI response
               | would better guide someone trying to align teams around a
               | meaningful mission that drives business results. It's
               | exactly why I formed the 2 prompts in the manner I did:
               | I wanted to see if it was an academic or an expert.
               | 
               | My assessment stands: GPT-4.5's 30 seconds of thinking
               | captured how mission operationalizes vision in the way
               | successful businesses actually work, not how academics
               | might describe them in theoretical papers. I've
               | read the papers, I've studied the theory deeply, but I
               | also have NYSE and NASDAQ ticker symbols under my belt,
               | from seed. That is the whole point here.
        
           | ewoodrich wrote:
           | I have been experimenting with 4.5 for a journaling app I am
           | developing for my own personal needs, for example, turning
           | bullet/unstructured thoughts into a consistent diary
           | format/voice.
           | 
           | The quality of writing can be much better than Claude
           | 3.5/3.7 at times, but it struggles with a similar
           | confabulation of information that is not in the original
           | text but "sounds good/flows well". Which isn't ideal for a
           | personal journal... I am still playing around with the
           | system prompt, but given the astronomical cost (even with
           | me as the only user) and the marginal benefits, I am
           | probably going to end up sticking with Claude for now.
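           | 
           | For reference, this is roughly the shape of what I am
           | testing, a minimal sketch using the official OpenAI Python
           | SDK. The exact system prompt and settings here are
           | placeholders for my setup, not a recommendation:
           | 
           |   # pip install openai
           |   from openai import OpenAI
           | 
           |   client = OpenAI()  # reads OPENAI_API_KEY from the env
           | 
           |   SYSTEM_PROMPT = (
           |       "Rewrite the user's bullet points as a first-person "
           |       "diary entry in a consistent voice. Use only facts "
           |       "present in the input; do not add details, however "
           |       "plausible they sound."
           |   )
           | 
           |   def to_diary_entry(bullets: str) -> str:
           |       resp = client.chat.completions.create(
           |           model="gpt-4.5-preview",
           |           messages=[
           |               {"role": "system", "content": SYSTEM_PROMPT},
           |               {"role": "user", "content": bullets},
           |           ],
           |           temperature=0.4,  # lower temp curbs embellishment
           |       )
           |       return resp.choices[0].message.content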
           | 
           | Unless others have a recommendation for a less robot-y
           | sounding model (that will, however, follow instructions
           | precisely) with API access other than the mainstream
           | Claude/OpenAI/Gemini models?
        
             | neom wrote:
             | I've found this on par with 4.5 in tone, but not as nuanced
             | in connecting super wide ideas in systems, 4.5 still does
             | that best: https://ai.google.dev/gemini-api/docs/thinking
             | 
             | (also: the person you are responding to is doing exactly
             | what you're saying you don't want done: taking something
             | unrelated to the original text (Taylorism) that could
             | sound good, and jamming it in)
        
       | EcommerceFlow wrote:
       | I've found 4.5 to be quite good at "business decisions", much
       | better than other models. It does have some magic to it, similar
       | to Grok 3, but maybe a bit smarter?
        
       | sunami-ai wrote:
       | Meanwhile all GPT-4o models on Azure are set to be deprecated
       | in May and there are no alternative models yet. Should we
       | start moving to Anthropic? DS is too slow, melting under its
       | own success. Does anyone on GPT-4o/Azure have any idea when
       | they'll release the next "o" model?
        
         | Uvix wrote:
         | Only an older version of GPT-4o has been deprecated and will be
         | removed in May. The newest version will be supported through at
         | least 20 November 2025.
         | 
         | https://learn.microsoft.com/en-us/azure/ai-services/openai/c...
        
           | sunami-ai wrote:
           | The Nov 2024 release, which is due to be deprecated in Nov
           | 2025, I was told has degraded performance compared to the
           | Aug 2024 release. In fact, the OpenAI Models page says
           | their current GPT-4o API is serving the Aug release.
           | https://platform.openai.com/docs/models#gpt-4o
           | 
           | So I'm still on the Aug 2024 release, which, now that you
           | remind me, is not to be deprecated till Aug 2025. But
           | that's less than 5 months from now, and we're skipping the
           | Nov 2024 release just as OpenAI themselves have chosen to
           | do.
        
       | ein0p wrote:
       | I have access to it. It is better, but not where most techies
       | would care. It knows more, it writes better, it's more pleasant
       | to talk to. I think they might have studied the traffic their
       | hundreds of millions of users generate and realized where they
       | need to improve, then did exactly that for their _non thinking_
       | model. They understand that a non-thinking model is not going to
       | blow the doors off on coding no matter what they do, but it can
       | do writing and "associative memory" tasks quite well, and having
       | a lot more weights helps there. I also predict that they will
       | fine-tune their future distilled thinking models for coding,
       | based on the same logic, distilling from 4.5 this time. Those
       | models have to be fast, and therefore they have to be smaller.
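       | 
       | (To make "distilling" concrete: the common sequence-level
       | recipe is to sample outputs from the big teacher and fine-tune
       | the small student on them with an ordinary next-token loss. A
       | rough sketch below; the checkpoint names and data are
       | placeholders, not OpenAI's actual pipeline, and it assumes
       | teacher and student share a tokenizer.)
       | 
       |   import torch
       |   from transformers import AutoModelForCausalLM, AutoTokenizer
       | 
       |   teacher = AutoModelForCausalLM.from_pretrained("teacher-ckpt")
       |   student = AutoModelForCausalLM.from_pretrained("student-ckpt")
       |   tok = AutoTokenizer.from_pretrained("student-ckpt")
       | 
       |   prompt = "Write a function that reverses a linked list."
       | 
       |   # 1) Teacher generates the target completion.
       |   with torch.no_grad():
       |       ids = tok(prompt, return_tensors="pt").input_ids
       |       sample = teacher.generate(ids, max_new_tokens=256)
       |   target = tok.decode(sample[0], skip_special_tokens=True)
       | 
       |   # 2) Student is fine-tuned on the teacher's sample.
       |   opt = torch.optim.AdamW(student.parameters(), lr=1e-5)
       |   batch = tok(target, return_tensors="pt").input_ids
       |   loss = student(batch, labels=batch).loss  # causal LM loss
       |   loss.backward()
       |   opt.step()  # (in practice: loop over many prompts/epochs)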
        
       | ghostly_s wrote:
       | I don't get it. Aren't these two sentences in the same paragraph
       | contradictory?
       | 
       | >"Scaling to this size of model did NOT make a clear jump in
       | capabilities we are measuring."
       | 
       | > "The jump from GPT-4o (where we are now) to GPT-4.5 made the
       | models go from great to really great."
        
         | XenophileJKO wrote:
         | No, it means that it got better on things orthogonal to what we
         | have mostly been measuring. On the last few rounds, we have
         | been mostly focusing on reasoning, not as much on knowledge,
         | "creativity", or emotional resonance.
        
           | johnecheck wrote:
           | "It's better. We can't measure it, but we're pretty sure it's
           | better. We also desperately need it to be better because we
           | just spent a boat-load of money on it."
        
       | Artgodma wrote:
       | No general model is the frontier.
       | 
       | Thousands of small, specific models are infinitely more efficient
       | than a general one.
       | 
       | The narrower the task, the better algorithms work.
       | 
       | That's obvious.
       | 
       | Why are general models pushed so hard by their creators?
       | 
       | Their enormous valuations are based on total control over user
       | experience.
       | 
       | This total control is justified by computational requirements.
       | 
       | Users can't run general models locally.
       | 
       | Giant data centers costing billions are the moat for model
       | creators and the corporations behind them.
        
         | azan_ wrote:
         | It's neither obvious nor true; generalist models outperform
         | specialized ones all the time (so frequently that it even has
         | its own name - the bitter lesson)
        
         | maleldil wrote:
         | Certain desirable capabilities are available only in bigger
         | models because it takes a certain size for some behaviours
         | to emerge.
        
       | mvkel wrote:
       | I think this release is for the researchers who worked on it and
       | would quit if it never saw daylight
        
       | mirekrusin wrote:
       | Is anybody actually looking at those last few percentage
       | points on benchmarks?
       | 
       | Aren't we making the mistake of assuming benchmarks are 100%
       | correct?
        
       ___________________________________________________________________
       (page generated 2025-03-02 23:01 UTC)