[HN Gopher] Open models by OpenAI
       ___________________________________________________________________
        
       Open models by OpenAI
        
       https://openai.com/index/introducing-gpt-oss/
        
       Author : lackoftactics
       Score  : 2047 points
        Date   : 2025-08-05 17:02 UTC (1 day ago)
        
 (HTM) web link (openai.com)
 (TXT) w3m dump (openai.com)
        
       | thimabi wrote:
       | Open weight models from OpenAI with performance comparable to
       | that of o3 and o4-mini in benchmarks... well, I certainly wasn't
       | expecting that.
       | 
       | What's the catch?
        
         | coreyh14444 wrote:
         | Because GPT-5 comes out later this week?
        
           | thimabi wrote:
           | It could be, but there's so much hype surrounding the GPT-5
           | release that I'm not sure whether their internal models will
           | live up to it.
           | 
            | For GPT-5 to dwarf these just-released models in importance,
            | it would have to be a huge step forward, and I still have
            | doubts about OpenAI's capacity and infrastructure to handle
            | demand at the moment.
        
             | sebzim4500 wrote:
             | Surely OpenAI would not be releasing this now unless GPT-5
             | was much better than it.
        
             | jona777than wrote:
             | As a sidebar, I'm still not sure if GPT-5 will be
             | transformative due to its capabilities as much as its
             | accessibility. All it really needs to do to be highly
             | impactful is lower the barrier of entry for the more
             | powerful models. I could see that contributing to it being
             | worth the hype. Surely it will be better, but if more
             | people are capable of leveraging it, that's just as
             | revolutionary, if not more.
        
             | rrrrrrrrrrrryan wrote:
             | It seems like a big part of GPT-5 will be that it will be
             | able to intelligently route your request to the appropriate
             | model variant.
        
               | Shank wrote:
               | That doesn't sound good. It sounds like OpenAI will route
               | my request to the cheapest model to them and the most
               | expensive for me, with the minimum viable results.
        
               | Invictus0 wrote:
               | Sounds just like what a human would do. Or any business
               | for that matter.
        
               | Shank wrote:
               | That may be true but I thought the promise was moving in
               | the direction of AGI/ASI/whatever and that models would
               | become more capable over time.
        
         | logicchains wrote:
          | The catch is that it only has ~5 billion active params, so it
          | should perform worse than the top DeepSeek and Qwen models,
          | which have around 20-30 billion, unless OpenAI pulled off a
          | miracle.
        
         | NitpickLawyer wrote:
         | > What's the catch?
         | 
          | Probably GPT-5 will be way, way better. If Horizon Alpha/Beta
          | are early previews of GPT-5 family models, then coding should
          | be > Opus 4 for modern frontend stuff.
        
         | int_19h wrote:
         | The catch is that performance is not actually comparable to
         | o4-mini, never mind o3.
         | 
         | When it comes to LLMs, benchmarks are bullshit. If they sound
         | too good to be true, it's because they are. The only thing
         | benchmarks are useful for is preliminary screening - if the
          | model does especially badly in them, it's probably not good in
          | general. But if it does well in them, that doesn't really tell
          | you anything.
        
           | ewoodrich wrote:
           | It's definitely _interesting_ how the comments from right
           | after the models were released were ecstatic about  "SOTA
           | performance" and how it is "equivalent to o3" and then
            | comments like yours, hours later, after having actually
            | tested it, keep pointing out how it's garbage compared to
            | even the current batch of open models, let alone proprietary
            | foundation models.
           | 
           | Yet another data point for benchmarks being utterly useless
           | and completely gamed at this stage in the game by all the
           | major AI developers.
           | 
            | These companies are all clearly very aware that the initial
            | wave of hype at release is "sticky" and drives buzz/tech
            | news coverage, while real-world tests take much longer
            | before that impression slowly starts to be undermined by
            | practical usage and comparison to other models. Benchmarks
            | with wildly overconfident naming like "Humanity's Last Exam"
            | aren't exactly helping with objectivity either.
        
       | DSingularity wrote:
       | Ha. Secure funding and proceed to immediately make a decision
       | that would likely conflict viscerally with investors.
        
         | hnuser123456 wrote:
          | Maybe someone got tired of waiting and paid them to release
          | something actually open.
        
         | 4b6442477b1280b wrote:
         | their promise to release an open weights model predates this
         | round of funding by, iirc, over half a year.
        
           | DSingularity wrote:
           | Yeah but they never released until now.
        
         | SV_BubbleTime wrote:
         | Undercutting other frontier models with your open source one is
         | not an anti-investor move.
         | 
         | It is what China has been doing for a year plus now. And the
          | Chinese models are popular and effective; I assume companies
          | are paying for better models.
         | 
         | Releasing open models for free doesn't have to be charity.
        
       | hnuser123456 wrote:
       | Text only, when local multimodal became table stakes last year.
        
         | ebiester wrote:
          | Honestly, it's a tradeoff. If you can reduce the size and get
          | higher quality on specific tasks, that's better than a
          | generalist that can't run on a laptop or can't compete at any
          | one task.
          | 
          | We'll learn the actual quality soon enough as people use it.
        
           | greenavocado wrote:
           | That's what I thought too until Qwen-Image was released
        
             | SV_BubbleTime wrote:
              | When Qwen-Image was released... like yesterday? And what?
              | What point are you making? Qwen-Image was released
              | yesterday, and like every image model, its base model
              | shows potential over older ones, but the real factor is
              | whether it will be flexible enough for fine-tunes or
              | additional training like LoRAs.
        
         | BoorishBears wrote:
         | The community can always figure out hooking it up to other
         | modalities.
         | 
         | Native might be better, but no native multimodal model is very
         | competitive yet, so better to take a competitive model and
         | latch on vision/audio
        
           | tarruda wrote:
           | > so better to take a competitive model and latch on
           | vision/audio
           | 
           | Can this be done by a third party or would it have to be
           | OpenAI?
        
             | BoorishBears wrote:
             | No, anyone can do it: https://github.com/haotian-liu/LLaVA
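              | 
              | For a sense of what that looks like: in the LLaVA-style
              | recipe, a small projector maps frozen vision-encoder
              | features into the LLM's token-embedding space, and only
              | that projector (plus, optionally, the LLM) is trained on
              | image-text pairs. A minimal PyTorch sketch, with
              | placeholder dimensions rather than real gpt-oss or CLIP
              | values:
              | 
              |     import torch.nn as nn
              | 
              |     class VisionProjector(nn.Module):
              |         # maps [batch, patches, vision_dim] features to
              |         # pseudo "image tokens" of width llm_dim
              |         def __init__(self, vision_dim=1024,
              |                      llm_dim=4096):
              |             super().__init__()
              |             self.proj = nn.Sequential(
              |                 nn.Linear(vision_dim, llm_dim),
              |                 nn.GELU(),
              |                 nn.Linear(llm_dim, llm_dim),
              |             )
              | 
              |         def forward(self, patch_feats):
              |             return self.proj(patch_feats)
              | 
              | The projected tokens get concatenated with the text
              | embeddings before the usual forward pass; the heavy
              | lifting stays in the pretrained LLM.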
        
       | IceHegel wrote:
       | Listed performance of ~5 points less than o3 on benchmarks is
       | pretty impressive.
       | 
        | Wonder if they expect the bar to be raised soon (GPT-5) and so
        | feel more comfortable releasing something this strong.
        
       | johntiger1 wrote:
       | Wow, this will eat Meta's lunch
        
         | seydor wrote:
          | I believe their competition has been coming from Chinese
          | companies for some time now.
        
         | mhh__ wrote:
         | They will clone it
        
         | BoorishBears wrote:
         | Maverick and Scout were not great, even with post-training in
         | my experience, and then several Chinese models at multiple
         | sizes made them kind of irrelevant (dots, Qwen, MiniMax)
         | 
         | If anything this helps Meta: another model to inspect/learn
         | from/tweak etc. generally helps anyone making models
        
           | redox99 wrote:
           | There's nothing new here in terms of architecture. Whatever
           | secret sauce is in the training.
        
             | BoorishBears wrote:
              | Part of the secret sauce since o1 has been access to the
              | real reasoning traces, not the summaries.
              | 
              | If you even glance at the model card you'll see this was
              | trained on the same CoT RL pipeline as o3, and it shows
              | when using the model: this is the most coherent and
              | structured CoT of any open model so far.
             | 
             | Having full access to a model trained on that pipeline is
             | valuable to anyone doing post-training, even if it's just
             | to observe, but especially if you use it as cold start data
             | for your own training.
        
               | anticensor wrote:
                | Its CoT is sadly closer to those sanitised o3 summaries
                | than to R1-style traces.
        
               | BoorishBears wrote:
               | It has both raw and summarized traces.
        
               | anticensor wrote:
               | I mean raw GPT-OSS is close to summarised o3.
        
         | asdev wrote:
            | Meta is so cooked. I think most enterprises will opt for
            | OpenAI or Anthropic, and others will host OSS models
            | themselves or on AWS/infra providers.
        
           | a_wild_dandan wrote:
           | I'll accept Meta's frontier AI demise if they're in their
           | current position a year from now. People killed Google
           | prematurely too (remember Bard?), because we severely
           | underestimate the catch-up power bought with ungodly piles of
           | cash.
        
             | asdev wrote:
              | Catching up gets exponentially harder as time passes. It's
              | way harder to catch up to current models than it was to
              | the first iteration of GPT-4.
        
             | atonse wrote:
             | And boy, with the $250m offers to people, Meta is
             | definitely throwing ungodly piles of cash at the problem.
             | 
             | But Apple is waking up too. So is Google. It's absolutely
             | insane, the amount of money being thrown around.
        
               | a_vanderbilt wrote:
                | It's insane numbers like that that give me some concern
                | about a bubble. Not because AI hits some dead end, but
                | because a plateau could shift things from aggressive
                | investment to passive-but-steady improvement.
        
       | Workaccount2 wrote:
       | Wow, today is a crazy AI release day:
       | 
       | - OAI open source
       | 
       | - Opus 4.1
       | 
       | - Genie 3
       | 
       | - ElevenLabs Music
        
         | orphea wrote:
          | > OAI open source
         | 
         | Yeah. This certainly was not on my bingo card.
        
           | wahnfrieden wrote:
           | They announced it months ago...
        
         | satyrun wrote:
         | wow I just listened to Eleven Music do flamenco singing. That
         | is incredible.
         | 
            | Edit: I just tried it myself, though, and I'm less impressed
            | now. We are really going to need major music software to get
            | on board before we have actual creative audio tools. These
            | all seem made for non-musicians to make a very cookie cutter
            | song from a specific genre.
        
           | tmikaeld wrote:
            | I also tried it, for a full 100K credits (wasted in 2
            | hours, btw, which is silly!).
            | 
            | Compared to both Udio and Suno, it's very, very bad, both
            | at composition, matching lyrics to music, and keeping
            | tempo, and as soon as there are any distorted instruments
            | like guitars or anything live, the quality drops to radio
            | level.
        
           | BoxOfRain wrote:
           | >These all seem made for non-musicians to make a very cookie
           | cutter song from a specific genre.
           | 
            | This is my main problem with AI music at the moment. If I
            | had proper creative control as a musician, that'd be
            | amazing, but a lot of the time it's just straight-up slop
            | generation.
        
       | deviation wrote:
       | So this confirms a best-in-class model release within the next
       | few days?
       | 
       | From a strategic perspective, I can't think of any reason they'd
       | release this unless they were about to announce something which
       | totally eclipses it?
        
         | og_kalu wrote:
          | Even before today, it's been clear for the last week or so,
          | for a couple of reasons, that GPT-5's release was imminent.
        
         | ticulatedspline wrote:
          | Even without an imminent release it's a good strategy. They're
          | getting pressure from Qwen and other high-performing open-
          | weight models; without a horse in the race they could fall
          | behind in an entire segment.
          | 
          | There's future opportunity in licensing, tech support, agents,
          | or even simply to dominate and eliminate. Not to mention brand
          | awareness: if you like these, you might be more likely to
          | approach their brand for larger models.
        
         | winterrx wrote:
         | GPT-5 coming Thursday.
        
           | boringg wrote:
              | How much hype should we anticipate with the release of
              | GPT-5, or whatever it ends up being named? And how many
              | new features?
        
             | selectodude wrote:
              | Excited to have to send them a copy of my driver's
              | license to try and use it. That'll take the hype down a
              | notch.
        
             | XCSme wrote:
             | Imagine if it's called GPT-4.5o
        
           | ciaranmca wrote:
            | Are these the stealth models Horizon Alpha and Beta? I was
            | generally impressed with them (although I really only used
            | them in chat rather than any code tasks). In terms of chat,
            | I increasingly see very little difference between the
            | current SOTA closed models and their open-weight
            | counterparts.
        
             | deviation wrote:
             | Their tokenization suggests they're new Qwen models AFAIK.
             | They tokenize input to the exact same # of tokens that Qwen
             | models do.
        
         | logicchains wrote:
         | > I can't think of any reason they'd release this unless they
         | were about to announce something which totally eclipses it
         | 
         | Given it's only around 5 billion active params it shouldn't be
         | a competitor to o3 or any of the other SOTA models, given the
         | top Deepseek and Qwen models have around 30 billion active
         | params. Unless OpenAI somehow found a way to make a model with
         | 5 billion active params perform as well as one with 4-8 times
         | more.
        
         | bredren wrote:
         | Undoubtedly. It would otherwise reduce the perceived value of
         | their current product offering.
         | 
          | The question is how much better the new model(s) will need to
          | be on the metrics given here for them to feel comfortable
          | making these available.
          | 
          | Despite the loss of face over the lack of open model releases,
          | I do not think that was a big enough problem to undercut
          | commercial offerings.
        
         | FergusArgyll wrote:
         | Thursday
         | 
         | https://manifold.markets/Bayesian/on-what-day-will-gpt5-be-r...
        
       | artembugara wrote:
        | Disclaimer: probably dumb questions
       | 
       | so, the 20b model.
       | 
       | Can someone explain to me what I would need to do in terms of
       | resources (GPU, I assume) if I want to run 20 concurrent
       | processes, assuming I need 1k tokens/second throughput (on each,
       | so 20 x 1k)
       | 
        | Also, is this model better than or comparable to gpt-4.1-nano
        | for information extraction, and would it be cheaper to host the
        | 20b myself?
        
         | mythz wrote:
         | gpt-oss:20b is ~14GB on disk [1] so fits nicely within a 16GB
         | VRAM card.
         | 
         | [1] https://ollama.com/library/gpt-oss
        
           | artembugara wrote:
           | thanks, this part is clear to me.
           | 
           | but I need to understand 20 x 1k token throughput
           | 
           | I assume it just might be too early to know the answer
        
             | Tostino wrote:
              | I legitimately cannot think of any hardware I know of
              | that will get you to that throughput over that many
              | streams (I don't work in the server space, so there may
              | be some new stuff I'm unaware of).
        
               | artembugara wrote:
               | oh, I totally understand that I'd need multiple GPUs. I'd
               | just want to know what GPU specifically and how many
        
               | Tostino wrote:
                | I don't think you can get 1k tokens/sec on a single
                | stream using any consumer-grade GPU with a 20b model.
                | Maybe you could with an H100 or better, but I somewhat
                | doubt that.
                | 
                | My 2x 3090 setup will get me ~6-10 streams of ~20-40
                | tokens/sec (generation) and ~700-1000 tokens/sec
                | (input) with a 32b dense model.
        
           | dragonwriter wrote:
           | You also need space in VRAM for what is required to support
           | the context window; you might be able to do a model that is
           | 14GB in parameters with a small (~8k maybe?) context window
           | on a 16GB card.
        
         | petuman wrote:
         | > assuming I need 1k tokens/second throughput (on each, so 20 x
         | 1k)
         | 
         | 3.6B activated at Q8 x 1000 t/s = 3.6TB/s just for activated
         | model weights (there's also context). So pretty much straight
          | to a B200 and the like. 1000 t/s per user/agent is way too
          | fast; make it 300 t/s and you could get away with a 5090/RTX
          | PRO 6000.
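          | 
          | A back-of-envelope version of that arithmetic (the numbers
          | are assumptions from this thread, not measurements),
          | assuming the active weights have to stream from memory once
          | per generated token:
          | 
          |     active_params = 3.6e9   # activated params per token
          |     bytes_per_param = 1.0   # ~Q8; MXFP4 would be ~0.5
          |     tokens_per_sec = 1000   # requested per-stream speed
          | 
          |     bw = active_params * bytes_per_param * tokens_per_sec
          |     print(f"{bw / 1e12:.1f} TB/s")  # ~3.6 TB/s per stream
          | 
          | Batching can amortize weight reads across the 20 streams
          | (one forward pass advances every stream by a token), but KV
          | cache reads and MoE experts fanning out across the batch eat
          | into that, so the per-stream figure is still a useful sanity
          | check against a card's memory bandwidth.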
        
         | mlyle wrote:
         | An A100 is probably 2-4k tokens/second on a 20B model with
         | batched inference.
         | 
          | Multiply up the number of A100s as necessary.
          | 
          | Here, you don't really need the RAM. If you could accept fewer
          | tokens/second, you could do it much more cheaply with consumer
          | graphics cards.
          | 
          | Even with an A100, the sweet spot in batching is not going to
          | give you 1k tokens/process/second. Of course, you could go up
          | to an H100...
        
           | d3m0t3p wrote:
           | You can batch only if you have distinct chat in parallel,
        
             | mlyle wrote:
             | > > if I want to run _20 concurrent processes_ , assuming I
             | need 1k tokens/second throughput _(on each)_
        
         | spott wrote:
         | Groq is offering 1k tokens per second for the 20B model.
         | 
          | You are unlikely to match Groq on off-the-shelf hardware, as
          | far as I'm aware.
        
         | PeterStuer wrote:
          | (Answer for 1 inference.) It all depends on the context
          | length you want to support, as the activation memory will
          | dominate the requirements. For 4096 tokens you will get away
          | with 24GB (or even 16GB), but if you want to go for the full
          | 131072 tokens you are not going to get there with a 32GB
          | consumer GPU like the 5090. You'll need to spring for at the
          | minimum an A6000 (48GB) or preferably an RTX 6000 Pro (96GB).
          | 
          | Also keep in mind this model uses 4-bit layers for the MoE
          | parts. Unfortunately, native accelerated 4-bit support only
          | started with Blackwell on NVIDIA, so your
          | 3090/4090/A6000/A100s are not going to be fast. An RTX 5090
          | will be your best starting point in the traditional card
          | space. Maybe the unified-memory mini PCs like the Spark
          | systems or the Mac mini could be an alternative, but I don't
          | know them well enough.
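          | 
          | A rough way to see why context length dominates: the KV
          | cache grows linearly with it. A sketch with placeholder
          | config values (substitute the real ones from the model's
          | config.json), assuming standard grouped-query attention:
          | 
          |     def kv_cache_gb(ctx_len, n_layers=24, n_kv_heads=8,
          |                     head_dim=64, bytes_per_elem=2):
          |         # 2x for K and V, per layer, per KV head
          |         per_token = (2 * n_layers * n_kv_heads
          |                      * head_dim * bytes_per_elem)
          |         return per_token * ctx_len / 1e9
          | 
          |     print(kv_cache_gb(4096))    # ~0.2 GB with these values
          |     print(kv_cache_gb(131072))  # 32x that, per sequence
          | 
          | Multiply by the number of concurrent sequences and the
          | 131072-token case quickly dwarfs the quantized weights.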
        
           | vl wrote:
            | How do Macs compare to RTXs for this? I.e. what numbers can
            | be expected from a Mac mini/Mac Studio with 64/128/256/512GB
            | of unified memory?
        
         | coolspot wrote:
         | https://apxml.com/tools/vram-calculator
        
       | hubraumhugo wrote:
       | Meta's goal with Llama was to target OpenAI with a "scorched
       | earth" approach by releasing powerful open models to disrupt the
       | competitive landscape. Looks like OpenAI is now using the same
       | playbook.
        
         | tempay wrote:
         | It seems like the various Chinese companies are far outplaying
         | Meta at that game. It remains to be seen if they're able to
         | throw money at the problem to turn things around.
        
           | SV_BubbleTime wrote:
            | Good move for China. No one was going to trust their models
            | outright; now they not only have a track record, but they
            | were also able to undercut the value of US models at the
            | same time.
        
       | k2xl wrote:
        | Are there any details about hardware requirements for sensible
        | tokens per second for each size of these models?
        
       | minimaxir wrote:
       | I'm disappointed that the smallest model size is 21B parameters,
       | which strongly restricts how it can be run on personal hardware.
       | Most competitors have released a 3B/7B model for that purpose.
       | 
       | For self-hosting, it's smart that they targeted a 16GB VRAM
       | config for it since that's the size of the most cost-effective
       | server GPUs, but I suspect "native MXFP4 quantization" has
       | quality caveats.
        
         | moffkalast wrote:
          | Eh, 20B is pretty manageable; 32GB of regular RAM and some
          | VRAM will run you a 30B with partial offloading. After that it
          | gets tricky.
        
         | 4b6442477b1280b wrote:
         | with quantization, 20B fits effortlessly in 24GB
         | 
         | with quantization + CPU offloading, non-thinking models run
         | kind of fine (at about 2-5 tokens per second) even with 8 GB of
         | VRAM
         | 
         | sure, it would be great if we could have models in all sizes
         | imaginable (7/13/24/32/70/100+/1000+), but 20B and 120B are
         | great.
        
         | Tostino wrote:
          | I am not at all disappointed. I'm glad they decided to go for
          | models that are somewhat large but reasonable to run on
          | everything but phones.
         | 
         | Quite excited to give this a try
        
         | strangecasts wrote:
         | A small part of me is considering going from a 4070 to a 16GB
         | 5060 Ti just to avoid having to futz with offloading
         | 
         | I'd go for an ..80 card but I can't find any that fit in a
         | mini-ITX case :(
        
           | SV_BubbleTime wrote:
           | I wouldn't stop at 16GB right now.
           | 
           | 24 is the lowest I would go. Buy a used 3090. Picked one up
           | for $700 a few months back, but I think they were on the rise
           | then.
           | 
           | The 3000 series can't do FP8fast, but meh. It's the OOM
           | that's tough, not the speed so much.
        
             | strangecasts wrote:
             | Are there any 24GB cards/3090s which fit in ~300mm without
             | an angle grinder?
        
               | metalliqaz wrote:
               | if you're going to get that kind of hardware, you need a
               | larger case. IMHO this is not an unreasonable thing if
               | you are doing heavy computing
        
               | strangecasts wrote:
               | Noted for my next build - I am aware this is a problem
               | I've made for myself, _otherwise_ I like the mini-ITX
               | form factor a lot
        
               | SV_BubbleTime wrote:
                | Which do you like more: OOM for local AI, or an
                | itty-bitty case?
        
               | zigzag312 wrote:
               | https://skinflint.co.uk/?cat=gra16_512&hloc=uk&v=e&hloc=a
               | t&h...
               | 
               | 5070 Ti Super will also have 24GB.
        
               | strangecasts wrote:
               | Oh nice, thank you :)
               | 
               | Admittedly a little tempting to see how the 5070 Ti Super
               | shakes out!
        
               | zigzag312 wrote:
               | I'm waiting too :)
               | 
               | 50xx series supports MXFP4 format, but I'm not sure about
               | 3090.
        
         | hnuser123456 wrote:
         | Native FP4 quantization means it requires half as many bytes as
         | parameters, and will have next to zero quality loss (on the
         | order of 0.1%) compared to using twice the VRAM and
         | exponentially more expensive hardware. FP3 and below gets
         | messier.
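          | 
          | The footprint arithmetic behind that, as a rough sketch
          | (ignoring embeddings and any layers kept at higher
          | precision, which is why the real files don't match exactly):
          | 
          |     def weight_gb(params, bits):
          |         return params * bits / 8 / 1e9
          | 
          |     for bits in (16, 8, 4):
          |         print(bits, "bit:", weight_gb(21e9, bits), "GB")
          |     # 16 bit: 42.0 GB / 8 bit: 21.0 GB / 4 bit: 10.5 GB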
        
       | Disposal8433 wrote:
        | Please don't use the open-source term unless you ship the TBs of
        | data downloaded from Anna's Archive that are required to build
        | it yourself. And don't forget all the system prompts to censor
        | the multiple topics that they don't want you to see.
        
         | rvnx wrote:
          | I don't know why you got downvoted so much; these models are
          | not open-source/open-recipe. They are censored open-weight
          | models. Better than nothing, but far from being open.
        
           | a_vanderbilt wrote:
           | Most people don't really care all that much about the
           | distinction. It comes across to them as linguistic pedantry
           | and they downvote it to show they don't want to hear/read it.
        
         | outlore wrote:
         | by your definition most of the current open weight models would
         | not qualify
        
           | layer8 wrote:
           | That's why they are called open weight and not open source.
        
           | robotmaxtron wrote:
           | Correct. I agree with them, most of the open weight models
           | are not open source.
        
         | someperson wrote:
          | Keep fighting the "open weights" terminology fight. A blob of
          | neural network weights is not open source (even if the
          | inference code is), and calling it that dilutes the term.
        
         | mhh__ wrote:
         | The system prompt is an inference parameter, no?
        
         | Quarrel wrote:
          | Is your point really that "I need to see all the data
          | downloaded to make this model before I can know it is open"?
          | Do you have $XXB worth of GPU time to ingest that data with a
          | state-of-the-art framework to make a model? I don't. Even if
          | I did, I'm not
         | sure FB or Google are in any better position to claim this
         | model is or isn't open beyond the fact that the weights are
         | there.
         | 
         | They're giving you a free model. You can evaluate it. You can
         | sue them. But the weights are there. If you dislike the way
         | they license the weights, because the license isn't open
         | enough, then sure, speak up, but because you can't see all the
         | training data??! Wtf.
        
           | ticulatedspline wrote:
           | To many people there's an important distinction between "open
           | source" and "open weights". I agree with the distinction,
           | open source has a particular meaning which is not really here
           | and misuse is worth calling out in order to prevent erosion
           | of the terminology.
           | 
           | Historically this would be like calling a free but closed-
           | source application "open source" simply because the
           | application is free.
        
           | layer8 wrote:
           | The parent's point is that open weight is not the same as
           | open source.
           | 
           | Rough analogy:
           | 
           | SaaS = AI as a service
           | 
           | Locally executable closed-source software = open-weight model
           | 
           | Open-source software = open-source model (whatever allows to
           | reproduce the model from training data)
        
           | NicuCalcea wrote:
           | I don't have the $XXbn to train a model, but I certainly
           | would like to know what the training data consists of.
        
           | seba_dos1 wrote:
           | Do you need to see the source code used to compile this
           | binary before you can know it is open? Do you have enough
           | disk storage and RAM available to compile Chromium on your
           | laptop? I don't.
        
           | nexttk wrote:
          | I agree with OP - the weights are more akin to the binary
          | output from a compiler. You can't see how it works or how it
          | was made; you can't freely manipulate it, improve it, extend
          | it, etc. It's like having a binary of a program. The source
          | code for the model was the training data. The compiler is the
          | tooling that can train a model based on a given set of
          | training data. For me it is not critical for an open source
          | model that it is ONLY distributed in source code form. It is
          | fine that you can also download just the weights. But it
          | should be possible to reproduce the weights - either there
          | should be a tar.gz ball with all the training data, or there
          | needs to be a description/scripts of how one could obtain the
          | training data. It must be reproducible for someone willing to
          | invest the time and compute into it, even if 99.999% use only
          | the binary. This is completely analogous to what is normally
          | understood by open source.
        
         | NitpickLawyer wrote:
         | It's apache2.0, so by definition it's open source. Stop pushing
         | for training data, it'll never happen, and there's literally 0
         | reason for it to happen (both theoretical and practical).
         | Apache2.0 _IS_ opensource.
        
           | organsnyder wrote:
           | What is the source that's open? Aren't the models themselves
           | more akin to compiled code than to source code?
        
             | NitpickLawyer wrote:
             | No, not compiled code. Weights are hardcoded values. Code
             | is the combination of model architecture + config +
             | inferencing engine. You run inference based on the
             | architecture (what and when to compute), using some
             | hardcoded values (weights).
        
               | seba_dos1 wrote:
               | JVM bytecode is hardcoded values. Code is the virtual
               | machine implementation + config + operating system it
               | runs on. You run classes based on the virtual machine,
               | using some hardcoded input data generated by javac.
        
           | _flux wrote:
           | No, it's open weight. You wouldn't call applications with
           | only Apache 2.0-licensed binaries "open source". The weights
           | are not the "source code" of the model, they are the
           | "compiled" binary, therefore they are not open source.
           | 
           | However, for the sake of argument let's say this release
           | should be called open source.
           | 
           | Then what do you call a model that also comes with its
           | training material and tools to reproduce the model? Is it
           | also called open source, and there is no material difference
           | between those two releases? Or perhaps those two different
           | terms should be used for those two different kind of
           | releases?
           | 
           | If you say that actually open source releases are impossible
           | now (for mostly copyright reasons I imagine), it doesn't mean
           | that they will be perpetually so. For that glorious future,
           | we can leave them space in the terminology by using the term
           | open weight. It is also the term that should not be
           | misleading to anyone.
        
           | WhyNotHugo wrote:
           | It's open source, but it's a binary-only release.
           | 
            | It's like getting compiled software with an Apache license.
           | Technically open source, but you can't modify and recompile
           | since you don't have the source to recompile. You can still
           | tinker with the binary tho.
        
             | NitpickLawyer wrote:
             | Weights are not binary. I have no idea why this is so often
             | spread, it's simply not true. You can't do anything with
             | the weights themselves, you can't "run" the weights.
             | 
              | You run inference (via a library) on a model using its
              | architecture (config file) and tokenizer (what and when to
              | compute), based on weights (hardcoded values). That's it.
             | 
             | > but you can't modify
             | 
             | Yes, you can. It's called finetuning. And, most
             | importantly, that's _exactly_ how the model creators
             | themselves are  "modifying" the weights! No sane lab is
             | "recompiling" a model every time they change something.
             | They perform a pre-training stage (feed everything and the
             | kitchen sink), they get the hardcoded values (weights), and
             | then they post-train using "the same" (well, maybe their
             | techniques are better, but still the same concept) as you
             | or I would. Just with more compute. That's it. You can do
             | the exact same modifications, using basically the same
             | concepts.
             | 
             | > don't have the source to recompile
             | 
             | In pure practical ways, neither do the labs. Everyone that
             | has trained a big model can tell you that the process is so
             | finicky that they'd eat a hat if a big train session can be
             | somehow made reproducible to the bit. Between nodes
             | failing, datapoints balooning your loss and having to go
             | back, and the myriad of other problems, what you get out of
             | a big training run is not guaranteed to be the same even
             | with 100 - 1000 more attempts, in practice. It's simply the
             | nature of training large models.
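              | 
              | As a concrete example of what "modifying the weights"
              | looks like in practice, a sketch of a LoRA fine-tune
              | with the transformers and peft libraries; the repo id
              | and hyperparameters here are assumptions, not a recipe
              | from the release:
              | 
              |     from transformers import AutoModelForCausalLM
              |     from peft import LoraConfig, get_peft_model
              | 
              |     # the released weights ("hardcoded values")
              |     base = AutoModelForCausalLM.from_pretrained(
              |         "openai/gpt-oss-20b")
              | 
              |     # attach small trainable low-rank deltas
              |     cfg = LoraConfig(r=16, lora_alpha=32,
              |                      target_modules="all-linear")
              |     model = get_peft_model(base, cfg)
              |     # ...train on your own data, then ship the adapters
              |     # or merge them back into the base weights.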
        
               | koolala wrote:
               | You can do a lot with a binary also. That's what game
               | mods are all about.
        
               | squeaky-clean wrote:
                | A binary does not mean an executable. A PNG is a binary.
                | I could have an SVG file, render it as a PNG, and
                | release that under CC0; it doesn't make my PNG open
                | source. Model weights are binary files.
        
             | seba_dos1 wrote:
             | Slapping an open license onto a binary can be a valid use
             | of such license, but does not make your project open
             | source.
        
           | jlokier wrote:
           | _> It 's apache2.0, so by definition it's open source._
           | 
           | That's not true by any of the open source definitions in
           | common use.
           | 
           |  _Source code_ (and, optionally, derived binaries) under the
           | Apache 2.0 license are open source.
           | 
           | But _compiled binaries_ (without access to source) under the
           | Apache 2.0 license are not open source, even though the
           | license does give you some rights over what you can do with
           | the binaries.
           | 
           | Normally the question doesn't come up, because it's so
           | unusual, strange and contradictory to ship closed-source
           | binaries with an open source license. Descriptions of which
           | licenses qualify as open source licenses assume the context
           | that _of course_ you have the source or could get it, and it
           | 's a question of what you're allowed to do with it.
           | 
           | The distinction is more obvious if you ask the same question
           | about other open source licenses such as GPL or MPL. A
           | compiled binary (without access to source) shipped with a GPL
           | license is not by any stretch open source. Not only is it not
           | in the "preferred form for editing" as the license requires,
           | it's not even permitted for someone who receives the file to
           | give it to someone else and comply with the license. If
           | someone who receives the file can't give it to anyone else
           | (legally), then it's obvioiusly not open source.
        
             | NitpickLawyer wrote:
                | Please see the detailed response to a sibling post. tl;dr:
             | weights are not binaries.
        
               | jlokier wrote:
               | "Compiled binaries" are just meant to be an example. For
               | the purpose of whether something is open source, it
               | doesn't matter whether something is a "binary" or
               | something completely different.
               | 
               | What matters (for all common definitions of open source):
               | Are the files in "source form" (which has a definition),
               | or are they "derived works" of the source form?
               | 
               | Going back to Apache 2.0. Although that doesn't define
               | "open source", it provides legal definitions of source
               | and non-source, which are similar to the definitions used
               | in other open source licenses.
               | 
               | As you can see below, for Apache 2.0 it doesn't matter
               | whether something is a "binary", "weights" or something
               | else. What matters is whether it's the "preferred form
               | for making modifications" or a "form resulting from
               | mechanical transformation or translation". My highlights
               | are capitalized:
               | 
               | - Apache License Version 2.0, January 2004
               | 
               | - 1. Definitions:
               | 
               | - "Source" form shall mean the _PREFERRED FORM FOR MAKING
               | MODIFICATIONS_ , including _BUT NOT LIMITED TO_ software
               | source code, documentation source, and configuration
               | files.
               | 
               | - "Object" form shall mean any form resulting from
               | _MECHANICAL TRANSFORMATION OR TRANSLATION_ of a Source
               | form, including _BUT NOT LIMITED TO_ compiled object
               | code, generated documentation, and conversions to other
               | media types.
        
               | NitpickLawyer wrote:
               | > "Source" form shall mean the PREFERRED FORM FOR MAKING
               | MODIFICATIONS, including BUT NOT LIMITED TO software
               | source code, documentation source, and configuration
               | files.
               | 
                | Yes, weights are the PREFERRED FORM FOR MAKING
               | MODIFICATIONS!!! You, the labs, and anyone sane modifies
               | the weights via post-training. This is the point. The
               | labs don't re-train every time they want to change the
               | model. They finetune. You can do that as well, with the
               | same tools/concepts, AND YOU ARE ALLOWED TO DO THAT by
               | the license. And redistribute. And all the other stuff.
        
       | x187463 wrote:
       | Running a model comparable to o3 on a 24GB Mac Mini is absolutely
       | wild. Seems like yesterday the idea of running frontier (at the
       | time) models locally or on a mobile device was 5+ years out. At
       | this rate, we'll be running such models in the next phone cycle.
        
         | tedivm wrote:
         | It only seems like that if you haven't been following other
         | open source efforts. Models like Qwen perform ridiculously well
         | and do so on very restricted hardware. I'm looking forward to
         | seeing benchmarks to see how these new open source models
         | compare.
        
           | Rhubarrbb wrote:
            | Agreed, these models seem relatively mediocre compared to
            | Qwen3 / GLM 4.5
        
             | modeless wrote:
             | Nah, these are much smaller models than Qwen3 and GLM 4.5
             | with similar performance. Fewer parameters and fewer bits
             | per parameter. They are much more impressive and will run
             | on garden variety gaming PCs at more than usable speed. I
             | can't wait to try on my 4090 at home.
             | 
             | There's basically no reason to run other open source models
             | now that these are available, at least for non-multimodal
             | tasks.
        
               | tedivm wrote:
                | Qwen3 has multiple variants, ranging from larger (230B)
                | than these models to significantly smaller (0.6B), with
                | a huge number of options in between. For each of those
                | models they also release quantized versions (your "fewer
                | bits per parameter").
               | 
               | I'm still withholding judgement until I see benchmarks,
               | but every point you tried to make regarding model size
               | and parameter size is wrong. Qwen has more variety on
               | every level, and performs extremely well. That's before
               | getting into the MoE variants of the models.
        
               | modeless wrote:
               | The benchmarks of the OpenAI models are comparable to the
               | largest variants of other open models. The smaller
               | variants of other open models are much worse.
        
               | mrbungie wrote:
               | I would wait for neutral benchmarks before making any
               | conclusions.
        
               | bigyabai wrote:
               | With all due respect, you need to actually test out Qwen3
               | 2507 or GLM 4.5 before making these sorts of claims. Both
               | of them are comparable to OpenAI's largest models and
               | even bench favorably to Deepseek and Opus: https://cdn-
               | uploads.huggingface.co/production/uploads/62430a...
               | 
               | It's cool to see OpenAI throw their hat in the ring, but
               | you're smoking straight hopium if you think there's "no
               | reason to run other open source models now" in earnest.
               | If OpenAI never released these models, the state-of-the-
               | art would not look significantly different for local
               | LLMs. This is almost a nothingburger if not for the
               | simple novelty of OpenAI releasing an Open AI for once in
               | their life.
        
               | modeless wrote:
               | > Both of them are comparable to OpenAI's largest models
               | and even bench favorably to Deepseek and Opus
               | 
               | So are/do the new OpenAI models, except they're much
               | smaller.
        
               | UrineSqueegee wrote:
                | I'd really wait for additional neutral benchmarks; I
                | asked the 20b model, on low reasoning effort, which
                | number is larger, 9.9 or 9.11, and it got it wrong.
               | 
               | Qwen-0.6b gets it right.
        
               | bigyabai wrote:
               | According to the early benchmarks, it's looking like
               | you're just flat-out wrong:
               | https://blog.brokk.ai/a-first-look-at-gpt-oss-120bs-
               | coding-a...
        
               | sourcecodeplz wrote:
                | From my initial web developer test on https://www.gpt-
                | oss.com/ the 120b is kind of meh. Even qwen3-coder
                | 30b-a3b is better. Have to test more.
        
               | thegeomaster wrote:
               | They have worse scores than recent open source releases
               | on a number of agentic and coding benchmarks, so if
               | absolute quality is what you're after and not just
               | cost/efficiency, you'd probably still be running those
               | models.
               | 
               | Let's not forget, this is a thinking model that has a
               | significantly worse scores on Aider-Polyglot than the
               | non-thinking Qwen3-235B-A22B-Instruct-2507, a worse
               | TAUBench score than the smaller GLM-4.5 Air, and a worse
               | SWE-Bench verified score than the (3x the size) GLM-4.5.
               | So the results, at least in terms of benchmarks, are not
               | really clear-cut.
               | 
               | From a vibes perspective, the non-reasoners
               | Kimi-K2-Instruct and the aforementioned non-thinking
               | Qwen3 235B are much better at frontend design. (Tested
               | privately, but fully expecting DesignArena to back me up
               | in the following weeks.)
               | 
               | OpenAI has delivered something astonishing for the size,
               | for sure. But your claim is just an exaggeration. And
               | OpenAI have, unsurprisingly, highlighted only the
               | benchmarks where they do _really_ well.
        
             | moralestapia wrote:
             | You can always get your $0 back.
        
               | Imustaskforhelp wrote:
                | I have never agreed with a comment so much, but we are all
               | addicted to open source models now.
        
               | recursive wrote:
               | Not all of us. I've yet to get much use out of any of the
               | models. This may be a personal failing. But still.
        
               | satvikpendem wrote:
               | Depends on how much you paid for the hardware to run em
               | on
        
             | cvadict wrote:
             | Yes, but they are suuuuper safe. /s
             | 
             | So far I have mixed impressions, but they do indeed seem
             | noticeably weaker than comparably-sized Qwen3 / GLM4.5
             | models. Part of the reason may be that the oai models do
             | appear to be much more lobotomized than their Chinese
             | counterparts (which are surprisingly uncensored). There's
             | research showing that "aligning" a model makes it dumber.
        
               | xwolfi wrote:
               | The censorship here in China is only about public
               | discussions / spaces. You cannot like have a website
               | telling you about the crimes of the party. But
               | downloading some compressed matrix re-spouting the said
               | crimes, nobody gives a damn.
               | 
                | We seem to censor organized large-scale complaints and
                | viral mind virii, but we never quite forbid people at
                | home from reading some generated knowledge from an
                | obscure, hard-to-use piece of software.
        
           | echelon wrote:
           | This might mean there's no moat for anything.
           | 
           | Kind of a P=NP, but for software deliverability.
        
             | CamperBob2 wrote:
             | On the subject of who has a moat and who doesn't, it's
              | interesting to look at the role of patents in the early
             | development of wireless technology. There was WWI, and
             | there was WWII, but the players in the nascent radio
             | industry had _serious_ beef with each other.
             | 
             | I imagine the same conflicts will ramp up over the next few
             | years, especially once the silly money starts to dry up.
        
         | a_wild_dandan wrote:
         | Right? I still remember the safety outrage of releasing Llama.
         | Now? My 96 GB of (V)RAM MacBook will be running a 120B
         | parameter frontier lab model. So excited to get my hands on the
         | MLX quants and see how it feels compared to GLM-4.5-air.
        
           | 4b6442477b1280b wrote:
           | in that era, OpenAI and Anthropic were still deluding
           | themselves into thinking they would be the "stewards" of
           | generative AI, and the last US administration was very keen
           | on regoolating everything under the sun, so "safety" was just
           | an angle for regulatory capture.
           | 
           | God bless China.
        
             | narrator wrote:
             | Yeah, China is e/acc. Nice cheap solar panels too. Thanks
             | China. The problem is their ominous policies like not
             | allowing almost any immigration, and their domestic Han
             | Supremacist propaganda, and all that make it look a bit
             | like this might be Han Supremacy e/acc. Is it better than
              | western/decel? Hard to say, but at least the western/decel
             | people are now starting to talk about building power
             | plants, at least for datacenters, and things like that
             | instead of demanding whole branches of computer science be
             | classified, as they were threatening to Marc Andreessen
             | when he visited the Biden admin last year.
        
               | 01HNNWZ0MV43FF wrote:
               | I wish we had voter support for a hydrocarbon tax,
               | though. It would level out the prices and then the AI
               | companies can decide whether they want to pay double to
               | burn pollutants or invest in solar and wind and batteries
        
               | AtlasBarfed wrote:
               | Oh poor oppressed marc andreesen. Someone save him!
        
             | a_wild_dandan wrote:
             | Oh absolutely, AI labs certainly talk their books,
             | including any safety angles. The controversy/outrage
             | extended far beyond those incentivized companies too. Many
             | people had good faith worries about Llama. Open-weight
             | models are now _vastly_ more powerful than Llama-1, yet the
              | sky hasn't fallen. It's just fascinating to me how
             | apocalyptic people are.
             | 
             | I just feel lucky to be around in what's likely the most
             | important decade in human history. Shit odds on that, so
             | I'm basically a lotto winner. Wild times.
        
               | 4b6442477b1280b wrote:
               | >Many people had good faith worries about Llama.
               | 
               | ah, but that begs the question: did those people develop
               | their worries organically, or did they simply consume the
               | narrative heavily pushed by virtually every mainstream
               | publication?
               | 
               | the journos are _heavily_ incentivized to spread FUD
               | about it. they saw the writing on the wall that the days
               | of making a living by producing clickbait slop were
               | coming to an end and deluded themselves into thinking
               | that if they kvetch enough, the genie will crawl back
               | into the bottle. scaremongering about sci-fi skynet
                | bullshit didn't work, so now they kvetch about joules
               | and milliliters consumed by chatbots, as if data centers
               | did not exist until two years ago.
               | 
               | likewise, the bulk of other "concerned citizens" are
               | creatives who use their influence to sway their
               | followers, still hoping against hope to kvetch this
               | technology out of existence.
               | 
               | honest-to-God yuddites are as few and as retarded as
               | honest-to-God flat earthers.
        
               | kridsdale3 wrote:
               | I've been pretty unlucky to have encountered more than my
               | fair share of IRL Yuddites. Can't stand em.
        
               | ipaddr wrote:
               | "the most important decade in human history."
               | 
                | Lol. To be young and foolish again. This covid-laced
                | decade is more of a placeholder. The current decade is
                | always the most meaningful until the next one. The
                | personal computer era, the first cars or planes, and
                | ending slavery all need to take a backseat to the best
                | search engine ever. We are at the point where everyone
                | is
               | planning on what they are going to do with their
               | hoverboards.
        
               | graemep wrote:
               | > ending slavery
               | 
               | happened over many centuries, not in a given decade.
               | Abolished and reintroduced in many places: https://en.wik
               | ipedia.org/wiki/Timeline_of_abolition_of_slave...
        
               | dingnuts wrote:
               | you can say the same shit about machine learning but
               | ChatGPT was still the Juneteenth of AI
        
               | hedora wrote:
               | Slavery is still legal and widespread in most of the US,
               | including California.
               | 
               | There was a ballot measure to actually abolish slavery a
               | year or so back. It failed miserably.
        
               | BizarroLand wrote:
               | The slavery of free humans is illegal in America, so now
               | the big issue is figuring out how to convince voters that
               | imprisoned criminals deserve rights.
               | 
               | Even in liberal states, the dehumanization of criminals
               | is an endemic behavior, and we are reaching the point in
               | our society where ironically having the leeway to discuss
               | the humane treatment of even our worst criminals is
               | becoming an issue that affects how we see ourselves as a
               | society before we even have a framework to deal with the
               | issue itself.
               | 
               | What one side wants is for prisons to be for
               | rehabilitation and societal reintegration, for prisoners
               | to have the right to decline to work and to be paid fair
               | wages from their labor. They further want to remove for-
               | profit prisons from the equation completely.
               | 
               | What the other side wants is the acknowledgement that
               | prisons are not free, they are for punishment, and that
               | prisoners have lost some of their rights for the duration
               | of their incarceration and that they should be required
               | to provide labor to offset the tax burden of their
               | incarceration on the innocent people that have to pay for
               | it. They also would like it if all prisons were for-
               | profit as that would remove the burden from the tax
               | payers and place all of the costs of incarceration onto
               | the shoulders of the incarcerated.
               | 
               | Both sides have valid and reasonable wants from their
               | vantage point while overlooking the valid and reasonable
               | wants from the other side.
        
               | recursive wrote:
               | > slavery of free humans is illegal
               | 
               | That's kind of vacuously true though, isn't it?
        
               | chromatin wrote:
               | I think his point is that slavery is not outlawed by the
               | 13th amendment as most people assume (even the Google AI
               | summary reads: "The 13th Amendment to the United States
               | Constitution, ratified in 1865, officially abolished
               | slavery and involuntary servitude in the United
               | States.").
               | 
               | However, if you actually read it, the 13th amendment
               | makes an explicit allowance for slavery (i.e. expressly
               | allows it):
               | 
               | "Neither slavery nor involuntary servitude, *except as a
               | punishment for crime whereof the party shall have been
               | duly convicted*" (emphasis mine obviously since Markdown
               | didn't exist in 1865)
        
               | SR2Z wrote:
               | Prisoners themselves are the ones choosing to work most
               | of the time, and generally none of them are REQUIRED to
               | work (they are required to either take job training or
               | work).
               | 
               | They choose to because extra money = extra commissary
               | snacks and having a job is preferable to being bored out
               | of their minds all day.
               | 
               | That's the part that's frequently not included in the
               | discussion of this whenever it comes up. Prison jobs
               | don't pay minimum wage, but given that prisoners are
               | wards of the state that seems reasonable.
        
               | BizarroLand wrote:
               | I have heard anecdotes that the choice of doing work is a
               | choice between doing work and being in solitary
               | confinement or becoming the target of the guards who do
               | not take kindly to prisoners who don't volunteer for work
               | assignments.
        
               | graemep wrote:
               | I do not think you can equate making prisoners work with
               | slavery. Other countries do the same, and it is not
               | regarded as slavery in general.
               | 
               | If people were sold into slavery as a punishment (so they
               | became some one else's property) as some ancient
               | societies did, then that would clearly be slavery.
               | 
               | The most shocking thing about prisons in the US is how
               | common prison rape is, and the extent to which it seems
               | to be regarded as a joke. The majority of rapes in the US
                | are prison rapes. How can that be anything but an
                | appalling problem?
               | 
               | https://en.wikipedia.org/wiki/Prison_rape_in_the_United_S
               | tat...
               | 
               | Rape is also something slaves are casually subject to in
                | most slave societies. It was definitely accepted that Roman
               | slave owners were free to rape men, women and children
               | they owned.
        
               | ninjagoo wrote:
               | The US Constitution's 13th Amendment abolishing slavery
               | specifically allows it for convicted people. [1]
               | 
               | You'll see from the definition of a "slave" [2] that
               | prisoner labor specifically fits the definition of a
               | slave, hence why the constitution makes an exception for
               | it.
               | 
               | [1] https://constitutioncenter.org/the-
               | constitution/amendments/a... [2]
               | https://www.oed.com/dictionary/slave_n?tl=true
        
               | vlmutolo wrote:
               | About 7% of people who have ever lived are alive today.
               | Still pretty lucky, but not quite winning the lottery.
        
               | foltik wrote:
               | Much luckier if you consider everyone who ever will live,
               | assuming we don't destroy ourselves.
        
           | SchemaLoad wrote:
           | I feel like most of the safety concerns ended up being proven
           | correct, but there's so much money in it that they decided to
           | push on anyway full steam ahead.
           | 
           | AI did get used for fake news, propaganda, mass surveillance,
           | erosion of trust and sense of truth, and mass spamming social
           | media.
        
         | bogtog wrote:
         | When people talk about running a (quantized) medium-sized model
         | on a Mac Mini, what types of latency and throughput times are
         | they talking about? Do they mean like 5 tokens per second or at
         | an actually usable speed?
        
           | n42 wrote:
           | here's a quick recording from the 20b model on my 128GB M4
           | Max MBP: https://asciinema.org/a/AiLDq7qPvgdAR1JuQhvZScMNr
           | 
           | and the 120b:
           | https://asciinema.org/a/B0q8tBl7IcgUorZsphQbbZsMM
           | 
           | I am, um, floored
        
             | Davidzheng wrote:
             | the active param count is low so it should be fast.
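              |
              | Rough back-of-the-envelope for why (a sketch; the bandwidth
              | figure and the MXFP4 assumption are mine, not measured):
              | decode is mostly memory-bandwidth bound, so tokens/sec is
              | roughly bandwidth divided by the active weights read per
              | token.
              |
              | ```
              | # Assumed numbers: ~5.1B active params at ~4.25 bits each
              | # (MXFP4), M4 Max bandwidth ~546 GB/s.
              | active_params = 5.1e9
              | bits_per_param = 4.25
              | bandwidth = 546e9                      # bytes/s
              |
              | bytes_per_token = active_params * bits_per_param / 8
              | print(bandwidth / bytes_per_token)     # ~200 tok/s ceiling
              | # Real numbers are lower: KV cache reads, activations,
              | # and framework overhead are ignored here.
              | ```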
        
             | Rhubarrbb wrote:
             | Generation is usually fast, but prompt processing is the
             | main limitation with local agents. I also have a 128 GB M4
             | Max. How is the prompt processing on long prompts?
             | processing the system prompt for Goose always takes quite a
             | while for me. I haven't been able to download the 120B yet,
             | but I'm looking to switch to either that or the GLM-4.5-Air
             | for my main driver.
        
               | anonymoushn wrote:
               | it's odd that the result of this processing cannot be
               | cached.
        
               | lostmsu wrote:
               | It can be and it is by most good processing frameworks.
        
               | ghc wrote:
               | Here's a sample of running the 120b model on Ollama with
               | my MBP:
               | 
               | ```
               | 
               | total duration: 1m14.16469975s
               | 
               | load duration: 56.678959ms
               | 
               | prompt eval count: 3921 token(s)
               | 
               | prompt eval duration: 10.791402416s
               | 
               | prompt eval rate: 363.34 tokens/s
               | 
               | eval count: 2479 token(s)
               | 
               | eval duration: 1m3.284597459s
               | 
               | eval rate: 39.17 tokens/s
               | 
               | ```
        
               | andai wrote:
               | You mentioned "on local agents". I've noticed this too.
               | How do ChatGPT and the others get around this, and
               | provide instant responses on long conversations?
        
               | bluecoconut wrote:
               | Not getting around it, just benefiting from parallel
               | compute / huge flops of GPUs. Fundamentally, it's just
               | that prefill compute is itself highly parallel and HBM is
               | just that much faster than LPDDR. Effectively H100s and
               | B100s can chew through the prefill in under a second at
               | ~50k token lengths, so the TTFT (Time to First Token) can
               | feel amazingly fast.
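                |
                | A rough sketch of that arithmetic (every number here is
                | an assumption, and the quadratic attention term is
                | ignored):
                |
                | ```
                | active_params = 5.1e9       # per-token active weights
                | prompt_tokens = 50_000
                | gpu_flops = 1e15             # optimistic sustained H100
                |
                | prefill_flops = 2 * active_params * prompt_tokens
                | print(prefill_flops / gpu_flops)  # ~0.5 s to first token
                | ```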
        
               | mike_hearn wrote:
               | They cache the intermediate data (KV cache).
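                |
                | A toy numpy sketch of what gets cached (single head, no
                | batching, made-up sizes -- just to show that the prompt's
                | K/V are computed once and each new token only appends one
                | row instead of re-encoding the whole prefix):
                |
                | ```
                | import numpy as np
                |
                | d = 64
                | rng = np.random.default_rng(0)
                | Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.02
                |               for _ in range(3))
                |
                | def attend(q, K, V):
                |     s = K @ q / np.sqrt(d)
                |     w = np.exp(s - s.max()); w /= w.sum()
                |     return w @ V
                |
                | # "Prefill": run the prompt once, keep its K/V around.
                | prompt = rng.standard_normal((1000, d))
                | K_cache, V_cache = prompt @ Wk, prompt @ Wv
                |
                | # "Decode": each step reuses the cache, appends one row.
                | x = rng.standard_normal(d)
                | for _ in range(5):
                |     out = attend(x @ Wq, K_cache, V_cache)
                |     K_cache = np.vstack([K_cache, x @ Wk])
                |     V_cache = np.vstack([V_cache, x @ Wv])
                |     x = out    # stand-in for the next token's state
                | ```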
        
           | phonon wrote:
           | Here's a 4bit 70B parameter model,
           | https://www.youtube.com/watch?v=5ktS0aG3SMc (deepseek-r1:70b
           | Q4_K_M) on a M4 Max 128 GB. Usable, but not very performant.
        
           | a_wild_dandan wrote:
           | GLM-4.5-air produces tokens far faster than I can read on my
           | MacBook. That's plenty fast enough for me, but YMMV.
        
           | davio wrote:
           | On a M1 MacBook Air with 8GB, I got this running Gemma 3n:
           | 
            | 12.63 tok/sec · 860 tokens · 1.52s to first token
           | 
           | I'm amazed it works at all with such limited RAM
        
             | v5v3 wrote:
             | I have started a crowdfunding to get you a MacBook air with
             | 16gb. You poor thing.
        
               | bookofjoe wrote:
               | Up the ante with an M4 chip
        
               | backscratches wrote:
               | not meaningfully different, m1 virtually as fast as m4
        
               | wahnfrieden wrote:
               | https://github.com/devMEremenko/XcodeBenchmark M4 is
               | almost twice as fast as M1
        
               | andai wrote:
               | In this table, M4 is also twice as fast as M4.
        
               | wahnfrieden wrote:
               | You're comparing across vanilla/Pro/Max tiers. within
               | equivalent tier, M4 is almost 2x faster than M1
        
               | v5v3 wrote:
               | Twice the cost too.
        
               | wahnfrieden wrote:
               | ?
        
               | AtlasBarfed wrote:
               | Y not meeee?
               | 
               | After considering my sarcasm for the last 5 minutes, I am
               | doubling down. The government of the United States of
               | America should enhance its higher IQ people by donating
               | AI hardware to them immediately.
               | 
               | This is critical for global competitive economic power.
               | 
               | Send me my hardware US government
        
               | xwolfi wrote:
               | higher IQ people <-- well you have to prove that first,
               | so let me ask you a test question to prove them: how can
               | you mix collaboration and competition in society to
               | produce the optimal productivity/conflict ratio ?
        
         | tyho wrote:
         | What's the easiest way to get these local models browsing the
         | web right now?
        
           | dizhn wrote:
           | aider uses Playwright. I don't know what everybody is using
           | but that's a good starting point.
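            |
            | For a rough idea of the wiring, here's a sketch (assumes
            | `pip install playwright openai`, `playwright install
            | chromium`, and a local Ollama serving its OpenAI-compatible
            | endpoint; model name and URL are whatever your setup uses):
            |
            | ```
            | from playwright.sync_api import sync_playwright
            | from openai import OpenAI
            |
            | url = "https://news.ycombinator.com"
            | with sync_playwright() as p:
            |     browser = p.chromium.launch(headless=True)
            |     page = browser.new_page()
            |     page.goto(url)
            |     text = page.inner_text("body")[:8000]  # crude truncation
            |     browser.close()
            |
            | client = OpenAI(base_url="http://localhost:11434/v1",
            |                 api_key="unused")
            | resp = client.chat.completions.create(
            |     model="gpt-oss:20b",
            |     messages=[{"role": "user",
            |                "content": f"Summarize this page:\n\n{text}"}],
            | )
            | print(resp.choices[0].message.content)
            | ```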
        
         | Imustaskforhelp wrote:
          | Okay, I will be honest: I was so hyped up about this model, but
          | then I went to LocalLLaMA and saw that the:
         | 
         | 120 B model is worse at coding compared to qwen 3 coder and
         | glm45 air and even grok 3... (https://www.reddit.com/r/LocalLLa
         | MA/comments/1mig58x/gptoss1...)
        
           | logicchains wrote:
           | It's only got around 5 billion active parameters; it'd be a
           | miracle if it was competitive at coding with SOTA models that
           | have significantly more.
        
             | jph00 wrote:
             | On this bench it underperforms vs glm-4.5-air, which is an
             | MoE with fewer total params but more active params.
        
           | ascorbic wrote:
           | That's SVGBench, which is a useful benchmark but isn't much
           | of a test of general coding
        
             | Imustaskforhelp wrote:
             | Hm alright, I will see how this model actually plays around
             | instead of forming quick opinions..
             | 
             | Thanks.
        
           | pxc wrote:
           | Qwen3 Coder is 4x its size! Grok 3 is over 22x its size!
           | 
           | What does the resource usage look like for GLM 4.5 Air? Is
           | that benchmark in FP16? GPT-OSS-120B will be using between
           | 1/4 and 1/2 the VRAM that GLM-4.5 Air does, right?
           | 
           | It seems like a good showing to me, even though Qwen3 Coder
           | and GLM 4.5 Air might be preferable for some use cases.
        
         | larodi wrote:
         | We be running them in PIs off spare juice in no time, and they
         | be billions given how chips and embedded spreads...
        
       | emehex wrote:
       | So 120B was Horizon Alpha and 20B was Horizon Beta?
        
         | ImprobableTruth wrote:
         | Unfortunately not, this model is noticeably worse. I imagine
         | horizon is either gpt 5 nano/mini.
        
       | Leary wrote:
       | GPQA Diamond: gpt-oss-120b: 80.1%, Qwen3-235B-A22B-Thinking-2507:
       | 81.1%
       | 
       | Humanity's Last Exam: gpt-oss-120b (tools): 19.0%, gpt-oss-120b
       | (no tools): 14.9%, Qwen3-235B-A22B-Thinking-2507: 18.2%
        
         | jasonjmcghee wrote:
         | Wow - I will give it a try then. I'm cynical about OpenAI
         | minmaxing benchmarks, but still trying to be optimistic as this
         | in 8bit is such a nice fit for apple silicon
        
           | modeless wrote:
           | Even better, it's 4 bit
        
         | amarcheschi wrote:
         | Glm 4.5 seems on par as well
        
           | thegeomaster wrote:
           | GLM-4.5 seems to outperform it on TauBench, too. And it's
           | suspicious OAI is not sharing numbers for quite a few useful
           | benchmarks (nothing related to coding, for example).
           | 
           | One positive thing I see is the number of parameters and size
           | --- it will provide more economical inference than current
           | open source SOTA.
        
         | lcnPylGDnU4H9OF wrote:
         | Was the Qwen model using tools for Humanity's Last Exam?
        
       | chown wrote:
       | Shameless plug: if someone wants to try it in a nice ui, you
       | could give Msty[1] a try. It's private and local.
       | 
       | [1]: https://msty.ai
        
       | dsco wrote:
       | Does anyone get the demos at https://www.gpt-oss.com to work, or
       | are the servers down immediately after launch? I'm only getting
       | the spinner after prompting.
        
         | eliseumds wrote:
         | Getting lots of 502s from `https://api.gpt-oss.com/chatkit` at
         | the moment.
        
         | lukasgross wrote:
         | (I helped build the microsite)
         | 
         | Our backend is falling over from the load, spinning up more
         | resources!
        
           | anticensor wrote:
           | Why isn't GPT-OSS also offered on the free tier of ChatGPT?
        
         | lukasgross wrote:
         | Update: try now!
        
       | MutedEstate45 wrote:
       | The repeated safety testing delays might not be purely about
       | technical risks like misuse or jailbreaks. Releasing open weights
       | means relinquishing the control OpenAI has had since GPT-3. No
       | rate limits, no enforceable RLHF guardrails, no audit trail.
       | Unlike API access, open models can't be monitored or revoked. So
       | safety may partly reflect OpenAI's internal reckoning with that
       | irreversible shift in power, not just model alignment per se.
       | What do you guys think?
        
         | BoorishBears wrote:
         | I think it's pointless: if you SFT even their closed source
         | models on a specific enough task, the guardrails disappear.
         | 
         | AI "safety" is about making it so that a journalist can't get
         | out a recipe for Tabun just by asking.
        
           | MutedEstate45 wrote:
           | True, but there's still a meaningful difference in friction
           | and scale. With closed APIs, OpenAI can monitor for misuse,
           | throttle abuse and deploy countermeasures in real-time. With
           | open weights, a single prompt jailbreak or exploit spreads
           | instantly. No need for ML expertise, just a Reddit post.
           | 
           | The risk isn't that bad actors suddenly become smarter. It's
           | that anyone can now run unmoderated inference and OpenAI
           | loses all visibility into how the model's being used or
           | misused. I think that's the control they're grappling with
           | under the label of safety.
        
             | BoorishBears wrote:
             | OpenAI and Azure both have zero retention options, and the
             | NYT saga has given pretty strong confirmation they meant it
             | when they said zero.
        
               | MutedEstate45 wrote:
               | I think you're conflating real-time monitoring with data
               | retention. Zero retention means OpenAI doesn't store user
               | data, but they can absolutely still filter content, rate
               | limit and block harmful prompts in real-time without
               | retaining anything. That's processing requests as they
               | come in, not storing them. The NYT case was about data
               | storage for training/analysis not about real-time safety
               | measures.
        
               | BoorishBears wrote:
               | Ok you're off in the land of "what if" and I can just
               | flat out say: If you have a ZDR account there is no
               | filtering on inference, no real-time moderation, no
               | blocking.
               | 
               | If you use their training infrastructure there's
               | moderation on training examples, but SFT on non-harmful
               | tasks still leads to a complete breakdown of guardrails
               | very quickly.
        
             | SV_BubbleTime wrote:
             | Given that the best jailbreak for an off-line model is
             | still simple prompt injection, which is a solved issue for
             | the closed source models... I honestly don't know why they
             | are talking about safety much at all for open source.
        
       | ahmedhawas123 wrote:
       | Exciting as this is to toy around with...
       | 
       | Perhaps I missed it somewhere, but I find it frustrating that,
       | unlike most other open weight models and despite this being an
       | open release, OpenAI has chosen to provide pretty minimal
       | transparency regarding model architecture and training. It's
       | become the norm for LLama, Deepseek, Qwenn, Mistral and others to
       | provide a pretty detailed write up on the model which allows
       | researchers to advance and compare notes.
        
         | sebzim4500 wrote:
         | The model files contain an exact description of the
         | architecture of the network, there isn't anything novel.
         | 
         | Given these new models are closer to the SOTA than they are to
         | competing open models, this suggests that the 'secret sauce' at
         | OpenAI is primarily about training rather than model
         | architecture.
         | 
         | Hence why they won't talk about the training.
        
         | gundawar wrote:
         | Their model card [0] has some information. It is quite a
         | standard architecture though; it's always been that their alpha
         | is in their internal training stack.
         | 
         | [0]
         | https://cdn.openai.com/pdf/419b6906-9da6-406c-a19d-1bb078ac7...
        
           | ahmedhawas123 wrote:
           | This is super helpful and I had not seen it, thanks so much
           | for sharing! And I hear you on training being an alpha, at
           | the size of the model I wonder how much of this is
           | distillation and using o3/o4 data.
        
       | sadiq wrote:
       | Looks like Groq (at 1k+ tokens/second) and Fireworks are already
       | live on openrouter: https://openrouter.ai/openai/gpt-oss-120b
       | 
        | $0.15/M in / $0.60-0.75/M out
       | 
        | edit: Now Cerebras too at 3,815 tps for $0.25/M in / $0.69/M out.
        
         | podnami wrote:
         | Wow this was actually blazing fast. I prompted "how can the
         | 45th and 47th presidents of america share the same parents?"
         | 
         | On ChatGPT.com o3 thought for for 13 seconds, on OpenRouter GPT
         | OSS 120B thought for 0.7 seconds - and they both had the
         | correct answer.
        
           | Imustaskforhelp wrote:
           | Not gonna lie but I got sorta goosebumps
           | 
           | I am not kidding but such progress from a technological point
           | of view is just fascinating!
        
           | swores wrote:
           | I'm not sure that's a particularly good question for
           | concluding something positive about the "thought for 0.7
           | seconds" - it's such a simple answer, ChatGPT 4o (with no
           | thinking time) immediately answered correctly. The only
           | surprising thing in your test is that o3 wasted 13 seconds
           | thinking about it.
        
             | Workaccount2 wrote:
             | A current major outstanding problem with thinking models is
             | how to get them to think an appropriate amount.
        
               | dingnuts wrote:
                | The providers disagree. You pay per token. Verbose
               | models are the most profitable. Have fun!
        
               | willy_k wrote:
               | For API users, yes, but for the average person with a
               | subscription or using the free tier it's the inverse.
        
               | conradkay wrote:
               | Nowadays it must be pretty large % of usage going through
               | monthly subscriptions
        
           | nisegami wrote:
           | Interesting choice of prompt. None of the local models I have
           | in ollama (consumer mid range gpu) were able to get it right.
        
           | golergka wrote:
           | When I pay attention to o3 CoT, I notice it spends a few
           | passes thinking about my system prompt. Hard to imagine this
           | question is hard enough to spend 13 seconds on.
        
           | xpe wrote:
           | How many people are discussing this after one person did 1
           | prompt with 1 data point for each model and wrote a comment?
           | 
           | What is being measured here? For end-to-end time, one model
           | is:
           | 
           | t_total = t_network + t_queue + t_batch_wait + t_inference +
           | t_service_overhead
        
         | sigmar wrote:
         | Non-rhetorically, why would someone pay for o3 api now that I
         | can get this open model from openai served for cheaper?
         | Interesting dynamic... will they drop o3 pricing next week
         | (which is 10-20x the cost[1])?
         | 
          | [1] currently $3/M in / $8/M out
         | https://platform.openai.com/docs/pricing
        
           | gnulinux wrote:
           | Not even that, even if o3 being marginally better is
           | important for your task (let's say) why would anyone use
           | o4-mini? It seems almost 10x the price and same performance
           | (maybe even less): https://openrouter.ai/openai/o4-mini
        
             | Invictus0 wrote:
             | Probably because they are going to announce gpt 5
             | imminently
        
         | gnulinux wrote:
          | Wow, that's significantly cheaper than o4-mini, which seems to
          | be on par with gpt-oss-120b. ($1.10/M input tokens, $4.40/M
         | output tokens) Almost 10x the price.
         | 
         | LLMs are getting cheaper much faster than I anticipated. I'm
         | curious if it's still the hype cycle and
         | Groq/Fireworks/Cerebras are taking a loss here, or whether
          | things are actually getting cheaper. At this rate we'll be able
          | to run Qwen3-32B level models on phones/embedded soon.
        
           | mikepurvis wrote:
           | Are the prices staying aligned to the fundamentals (hardware,
           | energy), or is this a VC-funded land grab pushing prices to
           | the bottom?
        
           | tempaccount420 wrote:
           | It's funny because I was thinking the opposite, the pricing
           | seems way too high for a 5B parameter activation model.
        
             | gnulinux wrote:
             | Sure you're right, but if I can squeeze out o4-mini level
             | utility out of it, but its less than quarter the price,
             | does it really matter?
        
               | wahnfrieden wrote:
               | Yes
        
         | spott wrote:
         | It is interesting that openai isn't offering any inference for
         | these models.
        
           | bangaladore wrote:
           | Makes sense to me. Inference on these models will be a race
           | to the bottom. Hosting inference themselves will be a waste
           | of compute / dollar for them.
        
         | tekacs wrote:
         | I apologize for linking to Twitter, but I can't post a video
         | here, so:
         | 
         | https://x.com/tekacs/status/1952788922666205615
         | 
         | Asking it about a marginally more complex tech topic and
         | getting an excellent answer in ~4 seconds, reasoning for 1.1
         | seconds...
         | 
         | I am _very_ curious to see what GPT-5 turns out to be, because
         | unless they're running on custom silicon / accelerators, even
         | if it's very smart, it seems hard to justify not using these
         | open models on Groq/Cerebras for a _huge_ fraction of use-
         | cases.
        
           | tekacs wrote:
           | Cleanshot link for those who don't want to go to X:
           | https://share.cleanshot.com/bkHqvXvT
        
           | tekacs wrote:
           | A few days ago I posted a slowed-down version of the video
           | demo on someone's repo because it was unreadably fast due to
           | being sped up.
           | 
           | https://news.ycombinator.com/item?id=44738004
           | 
           | ... today, this is a real-time video of the OSS thinking
           | models by OpenAI on Groq and I'd have to slow it down to be
           | able to read it. Wild.
        
         | modeless wrote:
         | I really want to try coding with this at 2600 tokens/s (from
         | Cerebras). Imagine generating thousands of lines of code as
         | fast as you can prompt. If it doesn't work who cares, generate
         | another thousand and try again! And at $.69/M tokens it would
         | only cost $6.50 an hour.
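          |
          | The arithmetic, for anyone checking (output tokens only,
          | sustained generation assumed):
          |
          | ```
          | tok_per_s = 2600
          | price_per_m_out = 0.69                 # USD per 1M out tokens
          | tokens_per_hour = tok_per_s * 3600     # 9.36M tokens
          | print(tokens_per_hour / 1e6 * price_per_m_out)   # ~6.46 USD
          | ```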
        
           | andai wrote:
           | I tried this (gpt-oss-120b with Cerebras) with Roo Code. It
           | repeatedly failed to use the tools correctly, and then I got
           | 429 too many requests. So much for the "as fast as I can
           | think" idea!
           | 
           | I'll have to try again later but it was a bit underwhelming.
           | 
           | The latency also seemed pretty high, not sure why. I think
            | with the latency the throughput ends up not making much
           | difference.
           | 
           | Btw Groq has the 20b model at 4000 TPS but I haven't tried
           | that one.
        
       | modeless wrote:
       | Can't wait to see third party benchmarks. The ones in the blog
       | post are quite sparse and it doesn't seem possible to fully
       | compare to other open models yet. But the few numbers available
       | seem to suggest that this release will make all other non-
       | multimodal open models obsolete.
        
       | incomingpain wrote:
       | I dont see the unsloth files yet but they'll be here:
       | https://huggingface.co/unsloth/gpt-oss-20b-GGUF
       | 
       | Super excited to test these out.
       | 
       | The benchmarks from 20B are blowing away major >500b models.
       | Insane.
       | 
       | On my hardware.
       | 
       | 43 tokens/sec.
       | 
        | I got an error with flash attention turned on. Can't run it with
       | flash attention?
       | 
        | 31,000 context is the max it will allow or the model won't load.
       | 
       | no kv or v quantization.
        
       | rmonvfer wrote:
       | What a day! Models aside, the Harmony Response Format[1] also
       | seems pretty interesting and I wonder how much of an impact it
       | might have in performance of these models.
       | 
       | [1] https://github.com/openai/harmony
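        |
        | For anyone who hasn't looked at the repo: as I understand the
        | format (hand-written approximation from memory -- verify the
        | exact token names against openai/harmony), a rendered exchange
        | looks roughly like this, with the chain of thought and the
        | user-visible answer on separate channels:
        |
        | ```
        | # Hand-written approximation of a Harmony-rendered exchange;
        | # not output from the official renderer.
        | prompt = (
        |     "<|start|>system<|message|>You are a helpful assistant."
        |     "<|end|>"
        |     "<|start|>user<|message|>What is 2+2?<|end|>"
        |     # reasoning goes to the "analysis" channel
        |     "<|start|>assistant<|channel|>analysis<|message|>"
        |     "Trivial arithmetic.<|end|>"
        |     # the answer shown to the user goes to "final"
        |     "<|start|>assistant<|channel|>final<|message|>4<|return|>"
        | )
        | print(prompt)
        | ```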
        
         | incomingpain wrote:
         | Seems to be breaking every agentic tool I've tried so far.
         | 
         | Im guessing it's going to very rapidly be patched into the
         | various tools.
        
       | mikert89 wrote:
       | ACCELERATE
        
       | jakozaur wrote:
       | The coding seems to be one of the strongest use cases for LLMs.
       | Though currently they are eating too many tokens to be
       | profitable. So perhaps these local models could offload some
       | tasks to local computers.
       | 
       | E.g. Hybrid architecture. Local model gathers more data, runs
       | tests, does simple fixes, but frequently asks the stronger model
       | to do the real job.
       | 
       | Local model gathers data using tools and sends more data to the
       | stronger model.
       | 
       | It
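        |
        | One way that split could look in practice (just a sketch: a
        | local gpt-oss behind an OpenAI-compatible endpoint such as
        | Ollama's, escalating to a hosted model when it isn't confident;
        | the model names and URLs are placeholders):
        |
        | ```
        | from openai import OpenAI
        |
        | local = OpenAI(base_url="http://localhost:11434/v1",
        |                api_key="unused")
        | remote = OpenAI()   # reads OPENAI_API_KEY from the env
        |
        | def ask(task: str) -> str:
        |     gate = ("Answer the task if it is simple. If it needs "
        |             "deeper reasoning, reply with exactly ESCALATE."
        |             "\n\nTask: " + task)
        |     draft = local.chat.completions.create(
        |         model="gpt-oss:20b",
        |         messages=[{"role": "user", "content": gate}],
        |     ).choices[0].message.content or ""
        |     if draft.strip() != "ESCALATE":
        |         return draft
        |     return remote.chat.completions.create(
        |         model="o3",   # placeholder for "the stronger model"
        |         messages=[{"role": "user", "content": task}],
        |     ).choices[0].message.content
        |
        | print(ask("Rename variable foo to bar in: foo = 1"))
        | ```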
        
         | Imustaskforhelp wrote:
         | I have always thought that if we can somehow get an AI which is
         | insanely good at coding, so much so that It can improve itself,
         | then through continuous improvements, they will get better
         | models of everything else idk
         | 
         | Maybe you guys call it AGI, so anytime I see progress in
         | coding, I think it goes just a tiny bit towards the right
         | direction
         | 
         | Plus it also helps me as a coder to actually do some stuff just
         | for the fun. Maybe coding is the only truly viable use of AI
         | and all others are negligible increases.
         | 
         | There is so much polarization in the use of AI on coding but I
         | just want to say this, it would be pretty ironic that an
         | industry which automates others job is this time the first to
         | get their job automated.
         | 
          | But I don't see that happening, far from it. But still
         | each day something new, something better happens back to back.
         | So yeah.
        
           | hooverd wrote:
           | Optimistically, there's always more crap to get done.
        
             | jona777than wrote:
             | I agree. It's not improbable for there to be _more_ needs
             | to meet in the future, in my opinion.
        
           | NitpickLawyer wrote:
           | Not to open _that_ can of worms, but in most definitions
            | self-improvement is not an AGI requirement. That's already
           | ASI territory (Super Intelligence). That's the proverbial
           | skynet (pessimists) or singularity (optimists).
        
             | Imustaskforhelp wrote:
              | Hmm, my bad. Yeah, I always thought that it was the endgame
              | of humanity, but isn't AGI supposed to be that (the
              | endgame)?
             | 
             | What would AGI mean, solving some problem that it hasn't
             | seen? or what exactly? I mean I think AGI is solved, no?
             | 
             | If not, I see people mentioning that horizon alpha is
              | actually a GPT-5 model and it's predicted to release on
              | Thursday per some betting market, so maybe that fits the AGI
              | definition?
        
         | mattfrommars wrote:
         | Anyone know how long does the context last for running model
         | locally vs running via OpenAPI or Cursor? My understanding is
         | the model that run on the cloud have much greater context
         | window that what we can have running locally.
        
       | Imustaskforhelp wrote:
       | Is this the same model (Horizon Beta) on openrouter or not?
       | Because I still see Horizon beta available with its codename on
       | openrouter
        
       | abidlabs wrote:
       | Test it with a web UI:
       | https://huggingface.co/spaces/abidlabs/openai-gpt-oss-120b-t...
        
       | ArtTimeInvestor wrote:
       | Why do companies release open source LLMs?
       | 
       | I would understand it, if there was some technology lock-in. But
       | with LLMs, there is no such thing. One can switch out LLMs
       | without any friction.
        
         | gnulinux wrote:
         | Name recognition? Advertisement? Federal grant to beat Chinese
         | competition?
         | 
         | There could be many legitimate reasons, but yeah I'm very
         | surprised by this too. Some companies take it a bit too
         | seriously and go above and beyond too. At this point unless you
         | need the absolute SOTA models because you're throwing LLM at an
         | extremely hard problem, there is very little utility using
         | larger providers. In OpenRouter, or by renting your own GPU you
         | can run on-par models for much cheaper.
        
         | TrackerFF wrote:
         | LLMs are terrible, purely speaking from the business economic
         | side of things.
         | 
          | Frontier / SOTA models are barely profitable. Previous-gen
          | models lose 90% of their value. Two gens back and they're
         | worthless.
         | 
         | And given that their product life cycle is something like 6-12
         | months, you might as well open source them as part of
         | sundowning them.
        
           | spongebobstoes wrote:
           | inference runs at a 30-40% profit
        
         | mclau157 wrote:
         | Partially because using their own GPUs is expensive, so maybe
         | offloading some GPU usage
        
         | koolala wrote:
          | They don't because it would kill their data-scraping
          | business's competitive advantage.
        
         | LordDragonfang wrote:
         | Zuckerberg explains a few of the reasons here:
         | 
         | https://www.dwarkesh.com/p/mark-zuckerberg#:~:text=As%20long...
         | 
          | The short version is that if you give a product to open source,
         | they can and will donate time and money to improving your
         | product, and the ecosystem around it, for free, and you get to
         | reap those benefits. Llama has already basically won that space
         | (the standard way of running open models _is_ llama.cpp), so
          | OpenAI have finally realized they're playing catch-up (and
         | last quarter's SOTA isn't worth much revenue to them when
         | there's a new SOTA, so they may as well give it away while it
         | can still crack into the market)
        
         | a_vanderbilt wrote:
         | At least in OpenAI's case, it raises the bar for potential
         | competition while also implying that what they have behind the
         | scenes is far better.
        
         | __alexs wrote:
         | I believe it's to create barriers to entry and make the space
         | harder to compete in.
         | 
         | There's still a ton of value in the lower end of the market by
         | capability, and it's easier for more companies to compete in.
         | If you make the cost floor for that basically free you
         | eliminate everyone else's ability to make any profit there and
         | then leverage that into building a product that can also
         | compete at the higher end. This makes it harder for a new
         | market entrant to compete by increasing the minimum capability
         | and capital investment required to make a profit in this space.
        
       | HanClinto wrote:
       | Holy smokes, there's already llama.cpp support:
       | 
       | https://github.com/ggml-org/llama.cpp/pull/15091
        
         | carbocation wrote:
         | And it's already on ollama, it appears:
         | https://ollama.com/library/gpt-oss
        
         | incomingpain wrote:
         | lm studio immediately released the new appimage with support.
        
       | jp1016 wrote:
        | i wish these models had minimum ram, cpu and gpu sizes listed
        | on the site instead of just "high end" and "medium end" pc.
        
         | phh wrote:
          | You can technically run it on an 8086, assuming you can get
          | access to big enough storage.
         | 
         | More reasonably, you should be able to run the 20B at non-
          | stupidly-slow speed with a 64-bit CPU, 8GB RAM, 20GB SSD.
        
       | n42 wrote:
       | my very early first impression of the 20b model on ollama is that
       | it is quite good, at least for the code I am working on; arguably
       | good enough to drop a subscription or two
        
       | pamelafox wrote:
       | Anyone tried running on a Mac M1 with 16GB RAM yet? I've never
       | run higher than an 8GB model, but apparently this one is
       | specifically designed to work well with 16 GB of RAM.
        
         | thimabi wrote:
         | It works fine, although with a bit more latency than non-local
         | models. However, swap usage goes way beyond what I'm
         | comfortable with, so I'll continue to use smaller models for
         | the foreseeable future.
         | 
         | Hopefully other quantizations of these OpenAI models will be
         | available soon.
        
         | pamelafox wrote:
         | Update: I tried it out. It took about 8 seconds per token, and
         | didn't seem to be using much of my GPU (MPU), but was using a
         | lot of RAM. Not a model that I could use practically on my
         | machine.
        
           | steinvakt2 wrote:
           | Did you run it the best way possible? im no expert, but I
           | understand it can affect inference time greatly (which
           | format/engine is used)
        
             | pamelafox wrote:
             | I ran it via Ollama, which I assume uses the best way.
             | Screenshot in my post here: https://bsky.app/profile/pamela
             | fox.bsky.social/post/3lvobol3...
             | 
             | I'm still wondering why my MPU usage was so low.. maybe
             | Ollama isn't optimized for running it yet?
        
               | wahnfrieden wrote:
               | Might need to wait on MLX
        
           | turnsout wrote:
           | To clarify, this was the 20B model?
        
             | pamelafox wrote:
             | Yep, 20B model, via Ollama: ollama run gpt-oss:20b
             | 
             | Screenshot here with Ollama running and asitop in other
             | terminal:
             | 
             | https://bsky.app/profile/pamelafox.bsky.social/post/3lvobol
             | 3...
        
         | roboyoshi wrote:
         | M2 with 16GB: It's slow for me. ~13GB RAM usage, not locking up
         | my mac, but took a very long time thinking and slowly
         | outputting tokens.. I'd not consider this usable for everyday
         | usage.
        
       | shpongled wrote:
       | I looked through their torch implementation and noticed that they
       | are applying RoPE to both query and key matrices in every layer
       | of the transformer - is this standard? I thought positional
       | encodings were usually just added once at the first layer
        
         | m_ke wrote:
         | No they're usually done at each attention layer.
        
           | shpongled wrote:
           | Do you know when this was introduced (or which paper)? AFAIK
           | it's not that way in the original transformer paper, or
           | BERT/GPT-2
        
             | Scene_Cast2 wrote:
              | Should be in the RoPE paper. The OG transformer used
              | additive sinusoidal position embeddings, while RoPE does a
             | pairwise rotation.
             | 
             | There's also NoPE, I think SmolLM3 "uses NoPE" (aka doesn't
             | use any positional stuff) every fourth layer.
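              |
              | For intuition, a tiny numpy sketch of that rotation
              | (half-split convention a la GPT-NeoX/Llama; not taken
              | from the gpt-oss code):
              |
              | ```
              | import numpy as np
              |
              | def rope(x, pos, base=10000.0):
              |     # x: (head_dim,) vector for one token at `pos`
              |     half = x.shape[-1] // 2
              |     freqs = base ** (-np.arange(half) / half)
              |     a = pos * freqs                # one angle per pair
              |     x1, x2 = x[:half], x[half:]
              |     return np.concatenate(
              |         [x1 * np.cos(a) - x2 * np.sin(a),
              |          x1 * np.sin(a) + x2 * np.cos(a)])
              |
              | q = np.ones(8)
              | print(rope(q, pos=0))   # position 0: unchanged
              | print(rope(q, pos=5))   # position 5: rotated pairwise
              | ```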
        
             | Nimitz14 wrote:
             | This is normal. Rope was introduced after bert/gpt2
        
             | spott wrote:
             | All the Llamas have done it (well, 2 and 3, and I believe
             | 1, I don't know about 4). I think they have a citation for
             | it, though it might just be the RoPE paper
             | (https://arxiv.org/abs/2104.09864).
             | 
              | I'm not actually aware of any model that _doesn't_ do
             | positional embeddings on a per-layer basis (excepting BERT
             | and the original transformer paper, and I haven't read the
             | GPT2 paper in a while, so I'm not sure about that one
             | either).
        
               | shpongled wrote:
               | Thanks! I'm not super up to date on all the ML stuff :)
        
       | jstummbillig wrote:
       | Shoutout to the hn consensus regarding an OpenAI open model
       | release from 4 days ago:
       | https://news.ycombinator.com/item?id=44758511
        
       | kingkulk wrote:
       | Welcome to the future!
        
       | jedisct1 wrote:
        | For some reason I'm less excited about this than I was about the
       | Qwen models.
        
       | timmg wrote:
       | Orthogonal, but I just wanted to say how awesome Ollama is. It
       | took 2 seconds to find the model and a minute to download and now
       | I'm using it.
       | 
       | Kudos to that team.
        
         | _ache_ wrote:
         | To be fair, it's with the help of OpenAI. They did it together,
         | before the official release.
         | 
         | https://ollama.com/blog/gpt-oss
        
           | aubanel wrote:
           | From experience, it's much more engineering work on the
           | integrator's side than on OpenAI's. Basically they provide
           | you their new model in advance, but they don't know the
           | specifics of your system, so it's normal that you do most of
           | the work. Thus I'm particularly impressed by Cerebras: they
           | only have a few models supported for their extreme perf
           | inference, it must have been huge bespoke work to integrate.
        
         | Shopper0552 wrote:
         | I remember reading Ollama is going closed source now?
         | 
         | https://www.reddit.com/r/LocalLLaMA/comments/1meeyee/ollamas...
        
         | int_19h wrote:
         | It's just as easy with LM Studio.
         | 
         | All the real heavy lifting is done by llama.cpp, and for the
         | distribution, by HuggingFace.
        
       | PeterStuer wrote:
       | I love how they frame High-end desktops and laptops as having "a
       | single H100 GPU".
        
         | organsnyder wrote:
         | I read that as it runs in data centers (H100 GPUs) or high-end
         | desktops/laptops (Strix Halo?).
        
           | xyc wrote:
           | I'm running it with ROG Flow Z13 128GB Strix Halo and getting
           | 50 tok/s for 20B model and 12 tok/s for 120B model. I'd say
           | it's pretty usable.
        
             | organsnyder wrote:
             | Excellent! I have a Framework Desktop with 128GB on
             | preorder--really looking forward to getting it.
        
         | robertheadley wrote:
         | I actually tried to ask the Model about that, then I asked
         | ChatGPT, both times, they just said that it was marketing
         | speak.
         | 
         | I was like no. It is false advertising.
        
         | phh wrote:
         | Well if nVidia wasn't late, it would be runnable on nVidia
         | project Digits.
        
           | PeterStuer wrote:
           | Yes, they are late to the party. Maybe they do not want to
           | eat into the RTX Pro 6000 sales. In the meantime, there is
            | the AMD Ryzen(tm) AI Max+ 395.
        
         | piskov wrote:
         | Don't forget about mac studio
        
       | kgwgk wrote:
       | It may be useless for many use cases given that its policy
       | prevents it for example from providing "advice or instructions
       | about how to buy something."
       | 
       | (I included details about its refusal to answer even after using
       | tools for web searching but hopefully shorter comment means fewer
       | downvotes.)
        
       | isoprophlex wrote:
       | Can these do image inputs as well? I can't find anything about
       | that on the linked page, so I guess not..?
        
         | cristoperb wrote:
         | No, they're text only
        
       | pu_pe wrote:
       | Very sparse benchmarking results released so far. I'd bet the
       | Chinese open source models beat them on quite a few of them.
        
       | foundry27 wrote:
       | Model cards, for the people interested in the guts:
       | https://cdn.openai.com/pdf/419b6906-9da6-406c-a19d-1bb078ac7...
       | 
       | In my mind, I'm comparing the model architecture they describe to
       | what the leading open-weights models (Deepseek, Qwen, GLM, Kimi)
       | have been doing. Honestly, it just seems "ok" at a technical
       | level:
       | 
       | - both models use standard Grouped-Query Attention (64 query
       | heads, 8 KV heads). The card talks about how they've used an
       | older optimization from GPT3, which is alternating between banded
       | window (sparse, 128 tokens) and fully dense attention patterns.
       | It uses RoPE extended with YaRN (for a 131K context window). So
       | they haven't been taking advantage of the special-sauce Multi-
       | head Latent Attention from Deepseek, or any of the other similar
        | improvements over GQA (toy shape sketch below).
       | 
       | - both models are standard MoE transformers. The 120B model
       | (116.8B total, 5.1B active) uses 128 experts with Top-4 routing.
       | They're using some kind of Gated SwiGLU activation, which the
        | card talks about as being "unconventional" because of the clamping
       | and whatever residual connections that implies. Again, not using
       | any of Deepseek's "shared experts" (for general patterns) +
       | "routed experts" (for specialization) architectural improvements,
       | Qwen's load-balancing strategies, etc.
       | 
       | - the most interesting thing IMO is probably their quantization
       | solution. They did something to quantize >90% of the model
       | parameters to the MXFP4 format (4.25 bits/parameter) to let the
        | 120B model fit on a single 80GB GPU, which is pretty cool. But
       | we've also got Unsloth with their famous 1.58bit quants :)
       | 
       | All this to say, it seems like even though the training they did
       | for their agentic behavior and reasoning is undoubtedly very
       | good, they're keeping their actual technical advancements "in
       | their pocket".
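        |
        | Toy illustration of the GQA shape math from the first bullet
        | (head counts are from the card; the code itself is just a
        | shape sketch, causal mask omitted):
        |
        | ```
        | import numpy as np
        |
        | n_q, n_kv, hd, seq = 64, 8, 64, 16
        | group = n_q // n_kv        # 8 query heads share each KV head
        |
        | rng = np.random.default_rng(0)
        | q = rng.standard_normal((n_q, seq, hd))
        | k = rng.standard_normal((n_kv, seq, hd))
        | v = rng.standard_normal((n_kv, seq, hd))
        |
        | out = np.empty_like(q)
        | for h in range(n_q):
        |     kv = h // group        # which KV head this Q head uses
        |     s = q[h] @ k[kv].T / np.sqrt(hd)
        |     w = np.exp(s - s.max(-1, keepdims=True))
        |     w /= w.sum(-1, keepdims=True)
        |     out[h] = w @ v[kv]
        |
        | # KV cache is 8x smaller than with 64 full KV heads.
        | print(out.shape)           # (64, 16, 64)
        | ```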
        
         | rfoo wrote:
         | Or, you can say, OpenAI has some real technical advancements on
         | stuff _besides_ attn architecture. GQA8, alternating SWA 128  /
         | full attn do all seem conventional. Basically they are showing
         | us that "no secret sauce in model arch you guys just sucks at
         | mid/post-training", or they want us to believe this.
         | 
         | The model is pretty sparse tho, 32:1.
        
           | liuliu wrote:
           | Kimi K2 paper said that the model sparsity scales up with
           | parameters pretty well (MoE sparsity scaling law, as they
           | call, basically calling Llama 4 MoE "done wrong"). Hence K2
           | has 128:1 sparsity.
        
             | throwdbaaway wrote:
             | I thought Kimi K2 uses 8 active experts out of 384?
             | Sparsity should be 48:1. Indeed Llama4 Maverick is the only
             | one that has 128:1 sparsity.
        
               | liuliu wrote:
               | You are right. I mis-remembered the sparsity part of K2.
               | The "done wrong" part I was thinking about how the scout
               | -> maverick -> behemoth doesn't scale sparsity according
               | to any formula (less sparse -> sparse -> less sparse).
        
           | nxobject wrote:
           | It's convenient to be able to attribute success to things
           | only OpenAI could've done with the combo of their early start
           | and VC money - licensing content, hiring subject matter
           | experts, etc. Essentially the "soft" stuff that a mature
           | organization can do.
        
         | logicchains wrote:
         | >They did something to quantize >90% of the model parameters to
          | the MXFP4 format (4.25 bits/parameter) to let the 120B model
         | fit on a single 80GB GPU, which is pretty cool
         | 
         | They said it was native FP4, suggesting that they actually
         | trained it like that; it's not post-training quantisation.
        
           | rushingcreek wrote:
           | The native FP4 is one of the most interesting architectural
           | aspects here IMO, as going below FP8 is known to come with
           | accuracy tradeoffs. I'm curious how they navigated this and
            | how the FP8 weights (if they exist) would perform.
        
             | buildbot wrote:
             | One thing to note is that MXFP4 is a block scaled format,
             | with 4.25 bits per weight. This lets it represent a lot
             | more numbers than just raw FP4 would with say 1 mantissa
             | and 2 exponent bits.
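              |
              | Where the 4.25 comes from, and why the 120B fits on one
              | 80GB card (per the OCP MX spec as I understand it: 4-bit
              | E2M1 values in blocks of 32 sharing one 8-bit scale; the
              | last line ignores the ~10% of params kept at higher
              | precision):
              |
              | ```
              | elem_bits, block, scale_bits = 4, 32, 8
              | bits_per_weight = elem_bits + scale_bits / block
              | print(bits_per_weight)                 # 4.25
              |
              | total_params = 116.8e9
              | print(total_params * bits_per_weight / 8 / 1e9)  # ~62 GB
              | ```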
        
         | danieldk wrote:
         | Also: attention sinks (although implemented as extra trained
         | logits used in attention softmax rather than attending to e.g.
         | a prepended special token).
        
         | mclau157 wrote:
         | You can get similar insights looking at the github repo
         | https://github.com/openai/gpt-oss
        
         | tgtweak wrote:
         | I think their MXFP4 release is a bit of a gift since they
         | obviously used and tuned this extensively as a result of cost-
         | optimization at scale - something the open source model
         | providers aren't doing too much, and also somewhat of a
         | competitive advantage.
         | 
         | Unsloth's special quants are amazing but I've found there to be
          | lots of trade-offs vs full quantization, particularly when
         | striving for best first-shot attempts - which is by far the
         | bulk of LLM use cases. Running a better (larger, newer) model
         | at lower quantization to fit in memory, or with reduced
          | accuracy/detail to speed it up both have value, but in the
          | pursuit of first-shot accuracy there don't seem to be many
         | companies running their frontier models on reduced
          | quantization. If OpenAI is doing this in production, that is
         | interesting.
        
         | highfrequency wrote:
         | I would guess the "secret sauce" here is distillation:
         | pretraining on an extremely high quality synthetic dataset from
         | the prompted output of their state of the art models like o3
         | rather than generic internet text. A number of research results
         | have shown that highly curated technical problem solving data
         | is unreasonably effective at boosting smaller models'
         | performance.
         | 
         | This would be much more efficient than relying purely on RL
         | post-training on a small model; with low baseline capabilities
         | the insights would be very sparse and the training very
         | inefficient.
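          |
          | The kind of pipeline being described, as a sketch (sequence-
          | level distillation: sample a strong teacher on curated
          | problems, keep the outputs as SFT data for the student;
          | purely illustrative, nothing here is OpenAI's actual recipe,
          | and the teacher model name is a placeholder):
          |
          | ```
          | import json
          | from openai import OpenAI
          |
          | teacher = OpenAI()    # any strong model behind an API
          | problems = [
          |     "Prove that the sum of two even integers is even.",
          |     "Write a Python function merging two sorted lists.",
          | ]
          |
          | with open("distill.jsonl", "w") as f:
          |     for p in problems:
          |         ans = teacher.chat.completions.create(
          |             model="o3",   # placeholder teacher
          |             messages=[{"role": "user", "content": p}],
          |         ).choices[0].message.content
          |         # one supervised example per line for student SFT
          |         f.write(json.dumps(
          |             {"prompt": p, "completion": ans}) + "\n")
          | ```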
        
           | asadm wrote:
           | > research results have shown that highly curated technical
           | problem solving data is unreasonably effective at boosting
           | smaller models' performance.
           | 
           | same seems to be true for humans
        
             | tempaccount420 wrote:
             | Wish they gave us access to learn from those grandmother
             | models instead of distilled slop.
        
               | ashdksnndck wrote:
               | It behooves them to keep the best stuff internal, or at
               | least greatly limit any API usage to avoid giving the
               | goods away to other labs they are racing with.
        
               | saurik wrote:
               | Which, presumably, is the reason they removed 4.5 from
               | the API... mostly the only people willing to pay that
               | much for that model were their competitors. (I mean, I
               | would pay even more than they were charging, but I
               | imagine even if I scale out my use cases--which, for just
               | me, are mostly satisfied by being trapped in their UI--it
               | would be a pittance vs. the simpler stuff people keep
               | using.)
        
             | throw310822 wrote:
             | Yes, if I understand correctly, what it means is "a very
             | smart teacher can do wonders for their pupils' education".
        
         | unethical_ban wrote:
         | I don't know how to ask this without being direct and dumb:
         | Where do I get a layman's introduction to LLMs that could work
         | me up to understanding every term and concept you just
         | discussed? Either specific videos, or if nothing else, a
         | reliable Youtube channel?
        
           | srigi wrote:
           | Start with the YT series on neural nets and LLMs from
           | 3blue1brown
        
           | umgefahren wrote:
           | There is a great 3blue1brown video, but it's pretty much
           | impossible by now to cover the entire landscape of research.
           | I bet gpt-oss has some great explanations though ;)
        
           | CanuckPro wrote:
           | Try Andrej Karpathy's YouTube videos. I also really liked the
           | Dive into Deep Learning book at d2l.ai
        
           | tkgally wrote:
           | What I've sometimes done when trying to make sense of recent
           | LLM research is give the paper and related documents to
           | ChatGPT, Claude, or Gemini and ask them to explain the
           | specific terms I don't understand. If I don't understand
           | their explanations or want to know more, I ask follow-ups.
           | Doing this in voice mode works better for me than text chat
           | does.
           | 
           | When I just want a full summary without necessarily
           | understanding all the details, I have an audio overview made
           | on NotebookLM and listen to the podcast while I'm exercising
           | or cleaning. I did that a few days ago with the recent
           | Anthropic paper on persona vectors, and it worked great.
        
             | tshannon wrote:
             | So probably another stupid question, but how do you know
             | what it's spitting out is accurate?
        
               | tkgally wrote:
               | One has to be aware of the possibility of hallucinations,
               | of course. But I have not encountered any hallucinations
               | in these sorts of interactions with the current leading
               | models. Questions like "what does 'embedding space' mean
               | in the abstract of this paper?" yield answers that, in my
               | experience, make sense in the context and check out when
               | compared with other sources. I would be more cautious if
               | I were using a smaller model or if I were asking
               | questions about obscure information without supporting
               | context.
               | 
               | Also, most of my questions are not about specific facts
               | but about higher-level concepts. For research about ML,
               | at least, the responses check out.
        
           | nonfamous wrote:
           | Try Microsoft's "Generative AI for Beginners" repo on GitHub.
           | The early chapters in particular give a good grounding of LLM
           | architecture without too many assumptions of background
           | knowledge. The video version of the series is good too.
        
           | reilly3000 wrote:
           | Ask Gemini. Give it a link here in fact.
        
           | cwyers wrote:
           | This is a great book (parts of it are available as blog posts
           | from the author if you want to get a taste of it):
           | 
           | https://www.manning.com/books/build-a-large-language-
           | model-f...
        
       | user_7832 wrote:
       | Newbie question: I remember folks talking about how Kimi K2's
       | launch might have pushed OpenAI to launch their model later.
       | Now that we know (or shortly will) how this model performs, how
       | do they stack up? In retrospect, did OpenAI actually hold off
       | on releasing the weights because of Kimi?
        
       | ClassAndBurn wrote:
       | Open models are going to win long-term. Anthropic's own research
       | has to use OSS models [0]. China is demonstrating how quickly
       | companies can iterate on open models, allowing smaller teams
       | access and augmentation to the abilities of a model without
       | paying the training cost.
       | 
       | My personal prediction is that the US foundational model makers
       | will OSS something close to N-1 for the next 1-3 iterations. The
       | CAPEX for the foundational model creation is too high to justify
       | OSS for the current generation, unless the US Gov steps up and
       | starts subsidizing power or Stargate delivers 10x what is
       | currently planned.
       | 
       | N-1 model value depreciates insanely fast. Making an OSS release
       | of them and allowing specialized use cases and novel developments
       | allows potential value to be captured and integrated into future
       | model designs. It's medium risk, as you may lose market share.
       | But also high potential value, as the shared discoveries could
       | substantially increase the velocity of next-gen development.
       | 
       | There will be a plethora of small OSS models. Iteration on the
       | OSS releases is going to be biased towards local development,
       | creating more capable and specialized models that work on smaller
       | and smaller devices. In an agentic future, every different agent
       | in a domain may have its own model. Distilled and customized for
       | its use case without significant cost.
       | 
       | Everyone is racing to AGI/SGI. The models along the way are to
       | capture market share and use data for training and evaluations.
       | Once someone hits AGI/SGI, the consumer market is nice to have,
       | but the real value is in novel developments in science,
       | engineering, and every other aspect of the world.
       | 
       | [0] https://www.anthropic.com/research/persona-vectors > We
       | demonstrate these applications on two open-source models, Qwen
       | 2.5-7B-Instruct and Llama-3.1-8B-Instruct.
        
         | lechatonnoir wrote:
         | I'm pretty sure there's no reason that Anthropic _has_ to do
         | research on open models; they just produced their result on
         | open models so that you can reproduce it without having
         | access to theirs.
        
         | Adrig wrote:
         | I'm a layman, but it seems to me that the industry is moving
         | toward robust foundational models onto which we plug tools,
         | databases, and processes to expand their capabilities.
         | 
         | In that setup OSS models could be more than enough and
         | capture the market, but I don't see where the value would be
         | in a multitude of specialized models we'd have to train.
        
         | renmillar wrote:
         | There's no reason that models too large for consumer hardware
         | wouldn't keep a huge edge, is there?
        
           | AtlasBarfed wrote:
           | That is fundamentally a big O question.
           | 
           | I have this theory that we simply got over a hump by
           | utilizing a massive processing boost from GPUs as opposed
           | to CPUs. That might have been two to three orders of
           | magnitude more processing power.
           | 
           | But that's a one-time success. I don't think hardware has
           | any large-scale improvements coming, because 3D gaming
           | already drove most of that vector-processing hardware
           | development over the last 30 years.
           | 
           | So will software and better training models produce another
           | couple orders of magnitude?
           | 
           | Fundamentally we're talking about nines of accuracy. What
           | is the processing power required for each additional nine
           | of accuracy? Is it linear? Is it polynomial? Is it
           | exponential?
           | 
           | It just seems strange to me that, with all the AI
           | knowledge sloshing through academia, I haven't seen any
           | basic analysis at that level, which is something that's
           | absolutely going to be necessary for AI applications like
           | self-driving once the insurance companies get involved.
        
         | xpe wrote:
         | > Open models are going to win long-term.
         | 
         | [1 of 3] For the sake of argument here, I'll grant the premise.
         | If this turns out to be true, it glosses over other key
         | questions, including:
         | 
         | For a frontier lab, what is a _rational_ period of time
         | (according to your organizational mission  / charter /
         | shareholder motivations*) to wait before:
         | 
         | 1. releasing a new version of an open-weight model; and
         | 
         | 2. deciding how much secret sauce to hold back?
         | 
         | * Take your pick. These don't align perfectly with each other,
         | much less the interests of a nation or world.
        
         | xpe wrote:
         | > Open models are going to win long-term.
         | 
         | [2 of 3] Assuming we pin down what _win_ means... (which is
         | definitely not easy)... What would it take for this to _not_ be
         | true? There are many ways, including but not limited to:
         | 
         | - publishing open weights helps your competitors catch up
         | 
         | - publishing open weights doesn't improve your own research
         | agenda
         | 
         | - publishing open weights leads to a race dynamic where only
         | the latest and greatest matters; leading to a situation where
         | the resources sunk exceed the gains
         | 
         | - publishing open weights distracts your organization from
         | attaining a sustainable business model / funding stream
         | 
         | - publishing open weights leads to significant negative
         | downstream impacts (there are a variety of uncertain outcomes,
         | such as: deepfakes, security breaches, bioweapon development,
         | unaligned general intelligence, humans losing control [1] [2],
         | and so on)
         | 
         | [1]: "What failure looks like" by Paul Christiano :
         | https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-...
         | 
         | [2]: "An AGI race is a suicide race." - quote from Max Tegmark;
         | article at https://futureoflife.org/statement/agi-manhattan-
         | project-max...
        
         | xpe wrote:
         | > Open models are going to win long-term.
         | 
         | [3 of 3] What would it take for this statement to be _false_ or
         | _missing the point_?
         | 
         | Maybe we find ourselves in a future where:
         | 
         | - Yes, open models are widely used as base models, but they are
         | also highly customized in various ways (perhaps by industry,
         | person, attitude, or something else). In other words, this
         | would be a blend of open and closed.
         | 
         | - Maybe publishing open weights of a model is more-or-less
         | irrelevant, because it is "table stakes" ... because all the
         | key differentiating advantages have to do with other factors,
         | such as infrastructure, non-LLM computational aspects,
         | regulatory environment, affordable energy, customer base,
         | customer trust, and probably more.
         | 
         | - The future might involve thousands or millions of highly
         | tailored models
        
         | albertzeyer wrote:
         | > Once someone hits AGI/SGI
         | 
         | I don't think there will be such a unique event. There is no
         | clear boundary. This is a continuous process; models get
         | slightly better than before.
         | 
         | Also, another dimension is the inference cost to run those
         | models. It has to be cheap enough to really take advantage of
         | it.
         | 
         | Also, I wonder: what would be a good target for making a
         | profit while developing new things? Isomorphic Labs seems
         | like a good example; that company already exists, and people
         | are working on it. What else?
        
           | dom96 wrote:
           | > I don't think there will be such a unique event.
           | 
           | I guess it depends on your definition of AGI, but if it means
           | human level intelligence then the unique event will be the AI
           | having the ability to act on its own without a "prompt".
        
             | rossant wrote:
             | And the ability to improve itself.
        
             | seba_dos1 wrote:
             | > the unique event will be the AI having the ability to act
             | on its own without a "prompt"
             | 
             | That's super easy. The reason they need a prompt is that
             | this is the way we make them useful. We don't need LLMs to
             | generate an endless stream of random "thoughts" otherwise,
             | but if you really wanted to, just hook one up to a webcam
             | and microphone stream in a loop and provide it some storage
             | for "memories".
        
         | teaearlgraycold wrote:
         | > N-1 model value depreciates insanely fast
         | 
         | This implies LLM development hasn't plateaued. Sure, the
         | researchers are busting their asses quantizing, adding
         | features like tool calls and structured outputs, etc. But
         | soon enough N-1 ~= N.
        
         | swalsh wrote:
         | To me it depends on two factors: hardware becoming more
         | accessible, and the closed-source offerings becoming more
         | expensive. Right now (1) it's difficult to get enough GPUs to
         | do local inference at production scale, and (2) it's more
         | expensive to run your own GPUs than to use the closed-source
         | models.
        
       | mythz wrote:
       | Getting great performance running gpt-oss on 3x A4000s:
       | 
       |     gpt-oss:20b          = ~46 tok/s
       | 
       | More than 2x faster than my previous leading OSS models:
       | 
       |     mistral-small3.2:24b = ~22 tok/s
       |     gemma3:27b           = ~19.5 tok/s
       | 
       | Strangely, I get nearly the opposite running on 1x 5070 Ti:
       | 
       |     mistral-small3.2:24b = ~39 tok/s
       |     gpt-oss:20b          = ~21 tok/s
       | 
       | where gpt-oss is nearly 2x slower than mistral-small 3.2.
        
         | genpfault wrote:
         | Seeing ~70 tok/s on a 7900 XTX using Ollama.
        
           | Matsta wrote:
           | I'm getting around 90 tok/s on a 3090 using Ollama.
           | 
           | Pretty impressive
        
         | mythz wrote:
         | OK, the issue is with Ollama: gpt-oss 20B runs much faster on
         | 1x 5070 Ti with llama.cpp and LM Studio:
         | 
         |     llama-server     = ~181 tok/s
         |     LM Studio        = ~46 tok/s  (default)
         |     LM Studio Custom = ~158 tok/s (offload to GPU, switch to
         |                                    the CUDA llama.cpp engine)
         | 
         | And llama-server on my 3x A4000 GPU server is getting 90
         | tok/s vs 46 tok/s on Ollama.
        
       | anonymoushn wrote:
       | guys, what does OSS stand for?
        
         | thejazzman wrote:
         | it's a marketing term that modern companies use to grow market
         | share
        
         | ayakaneko wrote:
         | It should stand for "open source software", but since this is
         | a model, I'm not sure whether they chose the name with the
         | last S meaning something else.
        
       | Robdel12 wrote:
       | I'm on my phone and haven't been able to break away to check, but
       | anyone plug these into Codex yet?
        
       | jcmontx wrote:
       | I'm out of the loop for local models. For my M3 24gb ram macbook,
       | what token throughput can I expect?
       | 
       | Edit: I tried it out. I have no idea in terms of tokens, but it
       | was fluid enough for me. A bit slower than using o3 in the
       | browser but definitely tolerable. I think I will set it up on
       | my GF's machine so she can stop paying for the full
       | subscription (she's a non-tech professional).
        
         | steinvakt2 wrote:
         | Wondering about the same for my M4 max 128 gb
        
           | jcmontx wrote:
           | It should fly on your machine
        
             | steinvakt2 wrote:
             | Yeah, was super quick and easy to set up using Ollama. I
             | had to kill some processes first to avoid memory swap
             | though (even with 128gb memory). So a slightly more
             | quantized version is maybe ideal, for me at least.
             | 
             | Edit: I'm talking about the 120B model of course
        
         | coolspot wrote:
         | 40 t/s
        
         | dantetheinferno wrote:
         | Apple M4 Pro w/ 48GB running the smaller version. I'm getting
         | 43.7t/s
        
         | albertgoeswoof wrote:
         | 3 year old M1 MacBook Pro 32gb, 42 tokens/sec on lm studio
         | 
         | Very much usable
        
         | ivape wrote:
         | Curious if anyone is running this on an AMD Ryzen AI Max+ 395
         | and knows the t/s.
        
       | Rhubarrbb wrote:
       | What's the best agent to run this on? Is it compatible with
       | Codex? For OSS agents, I've been using Qwen Code (clunky fork of
       | Gemini), and Goose.
        
         | wahnfrieden wrote:
         | Why not Claude Code?
        
           | objektif wrote:
           | I keep hitting the limit within an hour.
        
             | wahnfrieden wrote:
             | Meant with your own model
        
       | henriquegodoy wrote:
       | Seeing a 20B model competing with o3's performance is mind-
       | blowing. Just a year ago most of us would've called this
       | impossible: not just the intelligence leap, but getting this
       | level of capability in such a compact size.
       | 
       | The point that excites me most is that we can train trillion-
       | parameter giants and distill them down to just billions of
       | parameters without losing the magic. Imagine coding with Claude
       | 4 Opus-level intelligence packed into a 10B model running
       | locally at 2000 tokens/sec: instant AI collaboration. That
       | would fundamentally change how we develop software.
        
         | coolspot wrote:
         | 10B * 2000 t/s = 20,000 GB/s of memory bandwidth. Apple
         | hardware can do ~1k GB/s.
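         | 
         | Rough back-of-envelope, assuming roughly 1 byte per weight
         | (heavy quantization) and every weight read once per token:
         | 
         |     # hypothetical numbers, not a measurement
         |     params = 10e9          # 10B dense parameters
         |     bytes_per_param = 1    # ~8-bit quantization (assumption)
         |     tokens_per_sec = 2000
         |     needed = params * bytes_per_param * tokens_per_sec
         |     print(needed / 1e12, "TB/s")   # -> 20.0 TB/s needed
         |     # vs roughly 0.5-1 TB/s on current Apple M-series parts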
        
           | oezi wrote:
           | That's why MoE is needed.
        
         | int_19h wrote:
         | It's not even a 20b model. It's 20b MoE with 3.6b active
         | params.
         | 
         | But it does not actually compete with o3 performance. Not even
         | close. As usual, the metrics are bullshit. You don't know how
         | good the model actually is until you grill it yourself.
        
       | Nimitz14 wrote:
       | I'm surprised at the model dim being 2.8k with an output size of
       | 200k. My gut feeling had told me you don't want too large of a
       | gap between the two, seems I was wrong.
        
       | ukprogrammer wrote:
       | > we also introduced an additional layer of evaluation by testing
       | an adversarially fine-tuned version of gpt-oss-120b
       | 
       | What could go wrong?
        
       | nirav72 wrote:
       | I don't exactly have the ideal hardware to run locally - but just
       | ran the 20b in LMStudio with a 3080 Ti (12gb vram) with some
       | offloading to CPU. Ran a couple of quick code generation tests.
       | On average about 20 t/sec. But response quality was very
       | similar to or on par with ChatGPT o3 for the same code it
       | output. So it's not bad.
        
       | nodesocket wrote:
       | Anybody got this working in Ollama? I'm running latest version
       | 0.11.0 with WebUI v0.6.18 but getting:
       | 
       | > List the US presidents in order starting with George Washington
       | and their time in office and year taken office.
       | 
       | >> 00: template: :3: function "currentDate" not defined
        
         | genpfault wrote:
         | https://github.com/ollama/ollama/issues/11673
        
         | jmorgan wrote:
         | Sorry about this. Re-downloading Ollama should fix the error
        
           | nodesocket wrote:
           | Thanks for the reply and speedy patch Jeffery. Seems to be
           | working now, except my 4060ti can't hang lacking enough vram.
        
       | ahmetcadirci25 wrote:
       | I started downloading, I'm eager to test it. I will share my
       | personal experiences. https://ahmetcadirci.com/2025/gpt-oss/
        
       | koolala wrote:
       | Calls them open-weight. Names them 'oss'. What does oss stand
       | for?
        
       | incomingpain wrote:
       | First coding test, just copying and pasting out of the chat: it
       | aced it in 5 seconds... this is amazing. It's really good at
       | coding.
       | 
       | Trying to use it for agentic coding...
       | 
       | lots of fail. This harmony formatting? Anyone have a working
       | agentic tool?
       | 
       | openhands and void ide are failing due to the new tags.
       | 
       | Aider worked, but the file it was supposed to edit was
       | untouched and instead it created a new one:
       | 
       | Create new file? (Y)es/(N)o [Yes]:
       | 
       | Applied edit to
       | <|end|><|start|>assistant<|channel|>final<|message|>main.py
       | 
       | so the file name is
       | '<|end|><|start|>assistant<|channel|>final<|message|>main.py'
       | lol. quick rename and it was fantastic.
       | 
       | I think Qwen Code is the best choice so far, but it's
       | unreliable. The new tags still come through, and it only works
       | properly sometimes.
       | 
       | Only one of my tests so far got 20b to fail on the first
       | iteration, and with a small follow-up it was able to completely
       | fix it right away.
       | 
       | Very impressive model for 20B.
        
       | bobsmooth wrote:
       | Hopefully the dolphin team will work their magic and uncensor
       | this model
        
       | siliconc0w wrote:
       | It seems like OSS will win; I can't see people willing to pay
       | 10x the price for what seems like 10% more performance.
       | Especially once we get better at routing the hardest questions to
       | the better models and then using that response to augment/fine-
       | tune the OSS ones.
        
         | n42 wrote:
         | To me it seems like the market is breaking into an 80/20
         | split of B2C/B2B: the B2C use case is served by OSS models
         | (with the market shifting to devices that can support them),
         | while the B2B market, as the cloud offering, is priced for
         | businesses that require the last 20% of absolute cutting-edge
         | performance.
        
       | seydor wrote:
       | This is good for China
        
       | chromaton wrote:
       | This has been available (20b version, I'm guessing) for the past
       | couple of days as "Horizon Alpha" on Openrouter. My benchmarking
       | runs with TianshuBench for coding and fluid intelligence were
       | rate limited, but the initial results are worse than DeepSeek
       | R1 and Kimi K2.
        
       | lukax wrote:
       | Inference in Python uses harmony [1] (for request and response
       | format), which is written in Rust with Python bindings. Another
       | OpenAI Rust library is tiktoken [2], used for all tokenization
       | and detokenization. OpenAI Codex [3] is also written in Rust. It
       | looks like OpenAI is increasingly adopting Rust (at least for
       | inference).
       | 
       | [1] https://github.com/openai/harmony
       | 
       | [2] https://github.com/openai/tiktoken
       | 
       | [3] https://github.com/openai/codex
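       | 
       | For anyone who wants to poke at the tokenizer side from Python,
       | a quick sketch (the o200k_base encoding here is the GPT-4o-
       | family BPE and is used purely for illustration; gpt-oss's own
       | tokenizer is a harmony-flavored o200k variant, as I understand
       | it):
       | 
       |     # pip install tiktoken
       |     import tiktoken
       | 
       |     enc = tiktoken.get_encoding("o200k_base")
       |     tokens = enc.encode("Hello from gpt-oss!")
       |     print(tokens)                 # list of integer token ids
       |     print(enc.decode(tokens))     # round-trips to the input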
        
         | chilipepperhott wrote:
         | As an engineer that primarily uses Rust, this is a good omen.
        
         | Philpax wrote:
         | The less Python in the stack, the better!
        
       | fnands wrote:
       | Mhh, I wonder if these are distilled from GPT4-Turbo.
       | 
       | I asked it some questions and it seems to think it is based on
       | GPT4-Turbo:
       | 
       | > Thus we need to answer "I (ChatGPT) am based on GPT-4 Turbo;
       | number of parameters not disclosed; GPT-4's number of parameters
       | is also not publicly disclosed, but speculation suggests maybe
       | around 1 trillion? Actually GPT-4 is likely larger than 175B;
       | maybe 500B. In any case, we can note it's unknown.
       | 
       | As well as:
       | 
       | > GPT-4 Turbo (the model you're talking to)
        
         | fnands wrote:
         | Also:
         | 
         | > The user appears to think the model is "gpt-oss-120b", a new
         | open source release by OpenAI. The user likely is
         | misunderstanding: I'm ChatGPT, powered possibly by GPT-4 or
         | GPT-4 Turbo as per OpenAI. In reality, there is no "gpt-
         | oss-120b" open source release by OpenAI
        
         | christianqchung wrote:
         | A little bit of training data certainly has gotten in there,
         | but I don't see any reasons for them to deliberately distill
         | from such an old model. Models have always been really bad at
         | telling you what model they are.
        
         | seba_dos1 wrote:
         | Just stop and think a bit about where a model may get the
         | knowledge of its own name from.
        
       | sabakhoj wrote:
       | Super excited to see these released!
       | 
       | Major points of interest for me:
       | 
       | - In the "Main capabilities evaluations" section, the 120b
       | outperforms o3-mini and approaches o4 on most evals. The 20b
       | model is also decent, passing o3-mini on one of the tasks.
       | 
       | - AIME 2025 is nearly saturated with large CoT
       | 
       | - CBRN threat levels kind of on par with other SOTA open source
       | models. Plus, demonstrated good refusals even after adversarial
       | fine tuning.
       | 
       | - Interesting to me how a lot of the safety benchmarking runs on
       | trust, since methodology can't be published too openly due to
       | counterparty risk.
       | 
       | Model cards with some of my annotations:
       | https://openpaper.ai/paper/share/7137e6a8-b6ff-4293-a3ce-68b...
        
       | davidw wrote:
       | Big picture, what's the balance going to look like, going forward
       | between what normal people can run on a fancy computer at home vs
       | heavy duty systems hosted in big data centers that are the
       | exclusive domain of Big Companies?
       | 
       | This is something about AI that worries me, as a 'child' of the
       | open-source coming-of-age era in the '90s. I don't want to be
       | forced to rely on those big companies to do my job efficiently,
       | if AI becomes part of the day-to-day workflow.
        
         | sipjca wrote:
         | Isn't it that hardware catches up and becomes cheaper? The
         | margin on these chips right now is outrageous, but what happens
         | as there is more competition? What happens when there is more
         | supply? Are we overbuilding? Apple M series chips already
         | perform phenomenally for this class of models and you bet both
         | AMD and NVIDIA are playing with unified memory architectures
         | too for the memory bandwidth. It seems like today's really
         | expensive stuff may become the norm rather than the exception.
         | Assuming architectures lately stay similar and require large
         | amounts of fast memory.
        
       | maxloh wrote:
       | > We introduce gpt-oss-120b and gpt-oss-20b, two open-weight
       | reasoning models available under the Apache 2.0 license and our
       | gpt-oss usage policy. [0]
       | 
       | Is it even valid to have additional restriction on top of Apache
       | 2.0?
       | 
       | [0]: https://openai.com/index/gpt-oss-model-card/
        
         | qntmfred wrote:
         | you can just do things
        
           | maxloh wrote:
           | Not for all licenses.
           | 
           | For example, GPL has a "no-added-restrictions" clause, which
           | allows the recipient of the software to ignore any additional
           | restrictions added alongside the license.
           | 
           | > All other non-permissive additional terms are considered
           | "further restrictions" within the meaning of section 10. If
           | the Program as you received it, or any part of it, contains a
           | notice stating that it is governed by this License along with
           | a term that is a further restriction, you may remove that
           | term. If a license document contains a further restriction
           | but permits relicensing or conveying under this License, you
           | may add to a covered work material governed by the terms of
           | that license document, provided that the further restriction
           | does not survive such relicensing or conveying.
        
         | ninjin wrote:
         | > Is it even valid to have additional restriction on top of
         | Apache 2.0?
         | 
         | You can legally do whatever you want; the question is whether
         | you will then, for your own benefit, appropriate a term like
         | open source (like Facebook) while adding restrictions not in
         | line with how the term is traditionally used, or whether you
         | are actually honest about it and call it something like
         | "weights available".
         | 
         | In the case of OpenAI here, I am not a lawyer, and I am _also_
         | not sure if the gpt-oss usage policy runs afoul of open source
         | as a term. They did not bother linking the policy from the
         | announcement, which was odd, but here it is:
         | 
         | https://huggingface.co/openai/gpt-oss-120b/blob/main/USAGE_P...
         | 
         | Compared to the wall of text that Facebook throws at you, let
         | me post it here as it is rather short: "We aim for our tools to
         | be used safely, responsibly, and democratically, while
         | maximizing your control over how you use them. By using OpenAI
         | gpt-oss-120b, you agree to comply with all applicable law."
         | 
         | I suspect this sentence still is too much to add and _may_
         | invalidate the Open Source Initiative (OSI) definition, but at
         | this point I would want to ask a lawyer and preferably one from
         | OSI. Regardless, credit to OpenAI for moving the status quo in
         | the right direction as the only further step we really can take
         | is to remove the usage policy entirely (as is the standard for
         | open source software anyway).
        
       | pbkompasz wrote:
       | where gpt-5
        
       | ramoz wrote:
       | This is a solid enterprise strategy.
       | 
       | Frontier labs are incentivized to start breaching these
       | distribution paths. This will evolve into large scale
       | "intelligent infra" plays.
        
       | matznerd wrote:
       | Thanks, OpenAI, for being open ;) Surprised there are no
       | official MLX versions and only one mention of MLX in this
       | thread. MLX basically converts the models to take advantage of
       | Mac unified memory for a 2-5x increase in performance, enabling
       | Macs to run what would otherwise take expensive GPUs (within
       | limits).
       | 
       | So FYI to anyone on a Mac: the easiest way to run these models
       | right now is using LM Studio (https://lmstudio.ai/); it's free.
       | You just search for the model; usually the third-party groups
       | mlx-community or lmstudio-community have MLX versions within a
       | day or two of a release. I go for the 8-bit quantizations
       | (4-bit is faster, but quality drops). You can also convert to
       | MLX yourself...
       | 
       | Once you have it running in LM Studio, you can chat there in
       | their chat interface, or you can run it through the local API,
       | which defaults to http://127.0.0.1:1234
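       | 
       | A minimal sketch of hitting that local endpoint with the
       | standard openai Python client (the model id is a placeholder;
       | use whatever LM Studio lists, and the api_key can be any
       | string since the local server doesn't check it):
       | 
       |     # pip install openai
       |     from openai import OpenAI
       | 
       |     client = OpenAI(base_url="http://127.0.0.1:1234/v1",
       |                     api_key="lm-studio")
       |     resp = client.chat.completions.create(
       |         model="openai/gpt-oss-20b",   # placeholder model id
       |         messages=[{"role": "user",
       |                    "content": "Say hi in five words."}],
       |     )
       |     print(resp.choices[0].message.content)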
       | 
       | You can run multiple models that hot swap and load instantly and
       | switch between them etc.
       | 
       | It's surprisingly easy, and fun. There are actually a lot of
       | cool niche models coming out, like this tiny high-quality
       | search model released today as well (which has an official MLX
       | version): https://huggingface.co/Intelligent-Internet/II-Search-4B
       | 
       | Other fun ones are Gemma 3n, which is multi-modal; the new
       | Qwen3 30B A3B (Coder and Instruct), a larger one that is
       | actually a solid model but takes more memory; Pixtral (Mixtral
       | vision with full-resolution images); etc. I look forward to
       | playing with this model and seeing how it compares.
        
         | umgefahren wrote:
         | Regarding MLX:
         | 
         | In the repo is a metal port they made, that's at least
         | something... I guess they didn't want to cooperate with Apple
         | before the launch but I am sure it will be there tomorrow.
        
         | matznerd wrote:
         | Here are the LM Studio MLX models:
         | 
         | LM Studio community:
         | 
         | 20b: https://huggingface.co/lmstudio-community/gpt-oss-20b-MLX-8b...
         | 
         | 120b: https://huggingface.co/lmstudio-community/gpt-oss-120b-MLX-8...
        
       | NicoJuicy wrote:
       | Ran gpt-oss:20b on a RTX 3090 24 gb vram through ollama, here's
       | my experience:
       | 
       | Basic Ollama calling through a POST endpoint works fine.
       | However, structured output doesn't work. The model is insanely
       | fast and good at reasoning.
       | 
       | In combination with Cline it appears to be worthless. Tool
       | calling doesn't work (they say it does), it fails to wait for
       | feedback (or to correctly call ask_followup_question), and
       | above 18k of context it runs partially on CPU (weird), since
       | they claim it should work comfortably on a 16 GB VRAM RTX card.
       | 
       | > Unexpected API Response: The language model did not provide any
       | assistant messages. This may indicate an issue with the API or
       | the model's output.
       | 
       | Edit: It also doesn't work with the OpenAI-compatible provider
       | in Cline. There it doesn't detect the prompt.
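       | 
       | For reference, this is the kind of call I mean; a minimal
       | sketch against Ollama's REST API, asking for JSON-formatted
       | output (whether gpt-oss actually honors it is the problem):
       | 
       |     # pip install requests
       |     import requests
       | 
       |     resp = requests.post(
       |         "http://localhost:11434/api/chat",
       |         json={
       |             "model": "gpt-oss:20b",
       |             "messages": [{"role": "user",
       |                           "content": "Return {\"ok\": true}"}],
       |             "format": "json",   # request structured output
       |             "stream": False,
       |         },
       |         timeout=120,
       |     )
       |     print(resp.json()["message"]["content"])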
        
       | alphazard wrote:
       | I wonder if this is a PR thing, to save face after flipping the
       | non-profit. "Look it's more open now". Or if it's more of a
       | recruiting pipeline thing, like Google allowing k8s and bazel to
       | be open sourced so everyone in the industry has an idea of how
       | they work.
        
         | thimabi wrote:
         | I think it's both of them, as well as an attempt to compete
         | with other makers of open-weight models. OpenAI certainly isn't
         | happy about the success of Google, Facebook, Alibaba,
         | DeepSeek...
        
       | CraigJPerry wrote:
       | I just tried it on OpenRouter, but I was served by Cerebras.
       | Holy... 40,000 tokens per second. That was SURREAL.
       | 
       | I got a 1.7k token reply delivered too fast for the human eye to
       | perceive the streaming.
       | 
       | n=1 for this 120b model, but I'd rank the reply #1, just ahead
       | of Claude Sonnet 4, for a boring JIRA-ticket-shuffling type of
       | challenge.
       | 
       | EDIT: The same prompt on gpt-oss, despite being served 1000x
       | slower, wasn't as good but was in a similar vein. It wanted to
       | clarify more and as a result only half responded.
        
       | christianqchung wrote:
       | > Training: The gpt-oss models trained on NVIDIA H100 GPUs using
       | the PyTorch framework [17] with expert-optimized Triton [18]
       | kernels. The training run for gpt-oss-120b required 2.1 million
       | H100-hours to complete, with gpt-oss-20b needing almost 10x
       | fewer.
       | 
       | This makes DeepSeek's very cheap claim on compute cost for r1
       | seem reasonable. Assuming $2/hr for h100, it's really not that
       | much money compared to the $60-100M estimates for GPT 4, which
       | people speculate as a MoE 1.8T model, something in the range of
       | 200B active last I heard.
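       | 
       | Quick sanity check on that, taking the $2/hr rental rate above
       | as an assumption:
       | 
       |     h100_hours = 2.1e6       # from the model card quote
       |     usd_per_hour = 2.0       # assumed H100 rental price
       |     print(h100_hours * usd_per_hour)  # ~4.2 million USD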
        
       | irthomasthomas wrote:
       | I was hoping these were the stealth Horizon models on OpenRouter,
       | impressive but not quite GPT-5 level.
       | 
       | My bet: GPT-5 leans into parallel reasoning via a model
       | consortium, maybe mixing in OSS variants. Spin up multiple
       | reasoning paths in parallel, then have an arbiter synthesize or
       | adjudicate. The new Harmony prompt format feels like
       | infrastructural prep: distinct channels for roles, diversity, and
       | controlled aggregation.
       | 
       | I've been experimenting with this in llm-consortium: assign roles
       | to each member (planner, critic, verifier, toolsmith, etc.) and
       | run them in parallel. The hard part is eval cost :(
       | 
       | Combining models smooths out the jagged frontier. Different
       | architectures and prompts fail in different ways; you get less
       | correlated error than a single model can give you. It also makes
       | structured iteration natural: respond - arbitrate - refine. A lot
       | of problems are "NP-ish": verification is cheaper than
       | generation, so parallel sampling plus a strong judge is a good
       | trade.
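       | 
       | A rough sketch of the pattern, not llm-consortium's actual API:
       | member and arbiter model names are made up, and any OpenAI-
       | compatible endpoint would do:
       | 
       |     # pip install openai
       |     import asyncio
       |     from openai import AsyncOpenAI
       | 
       |     client = AsyncOpenAI()  # or base_url=... for local servers
       |     MEMBERS = ["model-a", "model-b", "model-c"]  # hypothetical
       | 
       |     async def ask(model, prompt):
       |         r = await client.chat.completions.create(
       |             model=model,
       |             messages=[{"role": "user", "content": prompt}])
       |         return r.choices[0].message.content
       | 
       |     async def consortium(question, arbiter="model-a"):
       |         # run every member in parallel, then let the arbiter
       |         # synthesize a single answer from the candidates
       |         answers = await asyncio.gather(
       |             *(ask(m, question) for m in MEMBERS))
       |         combined = "\n\n".join(
       |             f"Candidate {i+1}:\n{a}"
       |             for i, a in enumerate(answers))
       |         return await ask(arbiter,
       |             f"Question: {question}\n\n{combined}\n\n"
       |             "Synthesize the best single answer.")
       | 
       |     print(asyncio.run(consortium("Is 3137 prime?")))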
        
         | andai wrote:
         | Fascinating, thanks for sharing. Are there any specific kind of
         | problems you find this helps with?
         | 
         | I've found that LLMs can handle some tasks very well and some
         | not at all. For the ones they can handle well, I optimize for
         | the smallest, fastest, cheapest model that can handle it. (e.g.
         | using Gemini Flash gave me a much better experience than Gemini
         | Pro due to the iteration speed.)
         | 
         | This "pushing the frontier" stuff would seem to help mostly for
         | the stuff that are "doable but hard/inconsistent" for LLMs, and
         | I'm wondering what those tasks are.
        
           | irthomasthomas wrote:
           | It shines on hard problems that have a definite answer.
           | Google's IMO gold model used parallel reasoning. I don't
           | know exactly what theirs looks like, but their Mind
           | Evolution paper had an approach similar to my
           | llm-consortium. The main difference is that theirs carries
           | on isolated reasoning, while mine in its default mode
           | shares the synthesized answer back to the models. I don't
           | have pockets deep enough to run benchmarks on a consortium,
           | but I did try the example problems from that paper and my
           | method also solved them using gemini-1.5. Those were
           | path-finding problems, like finding the optimal schedule
           | for a trip with multiple people's calendars, locations and
           | transport options.
           | 
           | And it obviously works for code and math problems. My first
           | test was to give the llm-consortium code to a consortium to
           | look for bugs. It identified a serious bug which only one of
           | the three models detected. So in that case it saved me time,
           | as using them on their own would have missed the bug or
           | required multiple attempts.
        
       | zeld4 wrote:
       | Knowledge cutoff: 2024-06
       | 
       | not a big deal, but still...
        
       | bilsbie wrote:
       | Are these multimodal? I can't seem to find that info.
        
       | bilsbie wrote:
       | What's the lowest level laptop this could run on. MacBook Pro
       | from 2012?
        
       | dust42 wrote:
       | The 120B model badly hallucinates facts on the level of a 0.6B
       | model.
       | 
       | My go to test for checking hallucinations is 'Tell me about
       | Mercantour park' (a national park in south eastern France).
       | 
       | Easily half of the facts are invented. Non-existing mountain
       | summits, brown bears (no, there are none), villages that are
       | elsewhere, wrong advice ('dogs allowed' - no they are not).
        
         | hmottestad wrote:
         | I don't think they trained it for fact retrieval.
         | 
         | Would probably do a lot better if you give it tool access for
         | search and web browsing.
        
           | Invictus0 wrote:
           | What is the point of an offline reasoning model that also
           | doesn't know anything and makes up facts? Why would anyone
           | prefer this to a frontier model?
        
             | MuteXR wrote:
             | Data processing? Reasoning on supplied data?
        
         | lukev wrote:
         | This is precisely the wrong way to think about LLMs.
         | 
         | LLMs are _never_ going to have fact retrieval as a strength.
         | Transformer models don't store their training data: they are
         | categorically incapable of telling you _where_ a fact comes
         | from. They also cannot escape the laws of information theory:
         | storing information requires bits. Storing all the world's
         | obscure information requires quite a lot of bits.
         | 
         | What we want out of LLMs is large context, strong reasoning and
         | linguistic facility. Couple these with tool use and data
         | retrieval, and you can start to build useful systems.
         | 
         | From this point of view, the more of a model's total weight
         | footprint is dedicated to "fact storage", the less desirable it
         | is.
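         | 
         | As a minimal sketch of what "couple with tool use" can look
         | like against any OpenAI-compatible server (the lookup tool,
         | model id, and endpoint here are made-up placeholders):
         | 
         |     # pip install openai
         |     import json
         |     from openai import OpenAI
         | 
         |     client = OpenAI(base_url="http://127.0.0.1:1234/v1",
         |                     api_key="local")
         |     tools = [{"type": "function", "function": {
         |         "name": "lookup",  # hypothetical retrieval tool
         |         "description": "Fetch a fact from a knowledge base.",
         |         "parameters": {
         |             "type": "object",
         |             "properties": {"query": {"type": "string"}},
         |             "required": ["query"]}}}]
         | 
         |     resp = client.chat.completions.create(
         |         model="openai/gpt-oss-20b",  # placeholder id
         |         messages=[{"role": "user", "content":
         |                    "Are there brown bears in Mercantour?"}],
         |         tools=tools)
         |     msg = resp.choices[0].message
         |     if msg.tool_calls:  # model chose to call the tool
         |         call = msg.tool_calls[0]
         |         print(call.function.name,
         |               json.loads(call.function.arguments))
         |     else:
         |         print(msg.content)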
        
           | superconduct123 wrote:
           | How can you reason correctly if you don't have any way to
           | know which facts are real vs hallucinated?
        
           | futureshock wrote:
           | I think that sounds very reasonable, but unfortunately
           | these models don't know what they do and don't know. A
           | small model that knew the exact limits of its knowledge
           | would be very powerful.
        
           | energy123 wrote:
           | Hallucinations have characteristics in interpretability
           | studies. That's a foothold into reducing them.
           | 
           | They still won't store much information, but it could mean
           | they're better able to know what they don't know.
        
           | CrackerNews wrote:
           | What are the large context, strong reasoning, and linguistic
           | facility for if there aren't facts underpinning them? Is a
           | priori wholly independent of a posteriori? Is it practical
           | for the former to be wholly independent of the latter?
        
         | pocketarc wrote:
         | Others have already said it, but it needs to be said again:
         | Good god, stop treating LLMs like oracles.
         | 
         | LLMs are not encyclopedias.
         | 
         | Give an LLM the context you want to explore, and it will do a
         | fantastic job of telling you all about it. Give an LLM access
         | to web search, and it will find things for you and tell you
         | what you want to know. Ask it "what's happening in my town this
         | week?", and it will answer that with the tools it is given. Not
         | out of its oracle mind, but out of web search + natural
         | language processing.
         | 
         | Stop expecting LLMs to -know- things. Treating LLMs like all-
         | knowing oracles is exactly the thing that's setting apart those
         | who are finding huge productivity gains with them from those
         | who can't get anything productive out of them.
        
           | diegocg wrote:
           | The problem is that even when you give them context, they
           | just hallucinate at another level. I have tried that example
           | of asking about events in my area, they are absolutely awful
           | at it.
        
           | Salgat wrote:
           | It's fine to expect it to not know things, but the complaint
           | is that it makes zero indication that it's just making up
           | nonsense, which is the biggest issue with LLMs. They do the
           | same thing when creating code.
        
             | dust42 wrote:
             | Exactly this. And that is why I like this question because
             | the amount of correct details and the amount of nonsense
             | give a good idea about the quality of the model.
        
           | dankwizard wrote:
           | I love how with this cutting edge tech people still dress up
           | and pretend to be experts. Pleasure to meet you, pocketarc -
           | Senior AI Gamechanger, 2024-2025 (Current)
        
           | saurik wrote:
           | I am getting huge productivity gains from using models, and I
           | mostly use them as "oracles" (though I am extremely careful
           | with respect to how I have to handle hallucination, of
           | course): I'd even say their true _power_ -- just like a human
           | -- comes from having an ungodly amount of _knowledge_, not
           | merely intelligence. If I just wanted something intelligent,
           | I already had humans!... but merely intelligent humans, even
           | when given months of time to screw around doing Google
           | searches, fail to make the insights that someone--whether
           | they are a human or a model--that actually _knows stuff_ can
           | throw around like it is nothing. I am actually able to use
           | ChatGPT 4.5 as not just an employee, not even just as a
           | coworker, but at times as a mentor or senior advisor: I can
           | tell it what I am trying to do, and it helps me by applying
           | advanced mathematical insights or suggesting things I could
           | use. Using an LLM as a glorified Google-it-for-me monkey
           | seems like such a waste of potential.
        
             | pxc wrote:
             | > I am actually able to use ChatGPT 4.5 as not just an
             | employee, not even just as a coworker, but at times as a
             | mentor or senior advisor: I can tell it what I am trying to
             | do, and it helps me by applying advanced mathematical
             | insights or suggesting things I could use.
             | 
             | You can still do that sort of thing, but just have it
             | perform searches whenever it has to deal with a matter of
             | fact. Just because it's trained for tool use and equipped
             | with search tools doesn't mean you have to change the kinds
             | of things you ask it.
        
               | saurik wrote:
               | If you strip all the facts from a mathematician you get
               | _me_... I don't need another me: I already used Google,
               | and I already failed to find what I need. What I actually
               | need is someone who can realize that my problem is a
               | restatement of an existing known problem, just using
               | words and terms or an occluded structure that don't look
               | anything like how it was originally formulated. You very
               | often simply _can't_ figure that out using Google, no
               | matter how long you sit in a tight loop trying related
               | Google searches; but, it is the kind of thing that an LLM
               | (or a human) excels at (as you can consider "restatement"
               | a form of "translation" between languages), if and only
               | if they have already seen that kind of problem. The same
               | thing comes up with novel application of obscure
               | technology, complex economics, or even interpretation of
               | human history... there is a reason why people who study
               | Classics "waste" a ton of time reading old stories rather
               | than merely knowing the library is around the corner.
               | What makes these AIs so amazing is thinking of them as
               | entirely replacing Google with something closer to a god,
               | not merely trying to wrap it with a mechanical employee
               | whose time is ostensibly less valuable than mine.
        
               | pxc wrote:
               | > What makes these AIs so amazing is thinking of them as
               | entirely replacing Google with something closer to a god
               | 
               | I guess that way of thinking may foster amazement, but it
               | doesn't seem very grounded in how these things work or
               | their current capabilities. Seems a bit manic tbf.
               | 
               | And again, enabling web search in your chats doesn't
               | prevent these models from doing any of the "integrative
               | reasoning", so to speak, that they can purportedly do. It
               | just helps ensure that relevant facts are in context for
               | the model.
        
           | orbital-decay wrote:
           | To be coherent and useful in general-purpose scenarios, an
           | LLM absolutely has to be large enough and know a lot, even
           | if you aren't using it as an oracle.
        
           | CrackerNews wrote:
           | LLMs should at least -know- the semantics of the text they
           | analyze, as opposed to just the syntax.
        
       | numpad0 wrote:
       | Here's a pair of quick sanity check questions I've been asking
       | LLMs: "家系ラーメンについて教えて" and "カレーの作り方教えて".
       | It's a silly test but surprisingly many models fail at it - and
       | Chinese models are especially bad with it. The common factor
       | among models doing okay-ish on these questions seems to be
       | Google-made OR >70b OR straight up commercial (so >200B or
       | whatever).
       | 
       | I'd say gpt-oss-20b is in between Qwen3 30B-A3B-2507 and Gemma
       | 3n E4B (with 30B-A3B at the lower end). This means it's not
       | obsoleting GPT-4o-mini for all purposes.
        
         | mtlynch wrote:
         | For anyone else curious, the Chinese translates to:
         | 
         | > _" Tell me about Iekei Ramen", "Tell me how to make curry"._
        
           | lukax wrote:
           | Japanese, not Chinese
        
             | mtlynch wrote:
             | Ah, my bad. I misread Google Translate when I did auto-
             | detect.
             | 
             | Thanks for the correction!
        
           | magoghm wrote:
           | It's not Chinese, it's Japanese.
        
           | numpad0 wrote:
           | What those texts mean isn't too important; they could just
           | as well be "how to make flatbreads" in Amharic or "what
           | counts as drifting" in Finnish or something like that.
           | 
           | What's interesting is that these questions are simultaneously
           | well understood by most closed models and not so well
           | understood by most open models for some reason, including
           | this one. Even GLM-4.5 full and Air on chat.z.ai(355B-A32B
           | and 106B-A12B respectively) aren't so accurate for the first
           | one.
        
         | hnfong wrote:
         | What does failing those two questions look like?
         | 
         | I don't really know Japanese, so I'm not sure whether I'm
         | missing any nuances in the responses I'm getting...
        
           | numpad0 wrote:
           | The free-beer commercial ChatGPT or Gemini can read them and
           | point out major errors. Larger Gemma models and huge Chinese
           | models like full DeepSeek or Kimi K2 may work too. Sometimes
           | the answer is odd enough that some 7B models can notice it.
           | Technically there's no guarantee that models with the same
           | name in different sizes, like Qwen 3 0.6B and 27B, use the
           | same dataset, but it still tells you a bit about the
           | quality and composition of the dataset their creator owns.
           | 
           | I don't actually need accurate answers to those questions,
           | it's just an expectation adjuster for me, so to speak. There
           | should be better questions for other languages/use cases, but
           | these seem to correlate better with model sizes and scales of
           | companies than flappy birds.
           | 
           | 0: https://gist.github.com/numpad0/abdf0a12ad73ada3b886d2d2ed
           | cc...
           | 
           | 1: https://gist.github.com/numpad0/b1c37d15bb1b19809468c933fa
           | ef...
        
             | hnfong wrote:
             | Thanks for the detailed response.
             | 
             | I'm guessing the issue is just the model size. If you're
             | testing sub-30B models and finding errors, well, they're
             | probably not large enough to remember everything in the
             | training data set, so there are inaccuracies and they
             | might hallucinate a bit regarding factoids that aren't
             | very commonly seen in the training data.
             | 
             | Commercial models are presumably significantly larger than
             | the smaller open models, so it sounds like the issue is
             | just mainly model size...
             | 
             | PS: Okra on curry is pretty good actually :)
        
       | simonw wrote:
       | Just posted my initial impressions, took a couple of hours to
       | write them up because there's a lot in this release!
       | https://simonwillison.net/2025/Aug/5/gpt-oss/
       | 
       | TLDR: I think OpenAI may have taken the medal for best available
       | open weight model back from the Chinese AI labs. Will be
       | interesting to see if independent benchmarks resolve in that
       | direction as well.
       | 
       | The 20B model runs on my Mac laptop using less than 15GB of RAM.
        
         | GodelNumbering wrote:
         | > The 20B model runs on my Mac laptop using less than 15GB of
         | RAM.
         | 
         | I was about to try the same. What TPS are you getting and on
         | which processor? Thanks!
        
           | hrpnk wrote:
           | gpt-oss-20b: 9 threads, 131072 context window, 4 experts -
           | 35-37 tok/s on M2 Max via LM Studio.
        
             | rt1rz wrote:
             | interestingly, i am also on M2 Max, and i get ~66 tok/s in
             | LM Studio on M2 Max, with the same 131072. I have full
             | offload to GPU. I also turned on flash attention in
             | advanced settings.
        
               | hrpnk wrote:
               | Thank you! Flash attention gives me a boost to ~66 tok/s
               | indeed.
        
           | mekpro wrote:
           | i got 70 token/s on m4 max
        
             | mhitza wrote:
             | That M4 Max is really something else, I get also 70
             | tokens/second on eval on a RTX 4000 SFF Ada server GPU.
        
           | mdz4040 wrote:
           | 55 token/s here on m4 pro, turning on flash attention puts it
           | to 60/s.
        
         | coltonv wrote:
         | What did you set the context window to? That's been my main
         | issue with models on my macbook, you have to set the context
         | window so short that they are way less useful than the hosted
         | models. Is there something I'm missing there?
        
           | hrpnk wrote:
           | With LM Studio you can configure context window freely. Max
           | is 131072 for gpt-oss-20b.
        
             | coltonv wrote:
             | Yes but if I set it above ~16K on my 32gb laptop it just
             | OOMs. Am I doing something wrong?
        
               | mekpro wrote:
               | Try enabling flash attention and offloading all layers
               | to the GPU.
        
           | simonw wrote:
           | I punted it up to the maximum in LM Studio - seems to use
           | about 16GB of RAM then, but I've not tried a long prompt yet.
        
         | rmonvfer wrote:
         | I'm also very interested to know how well these models handle
         | tool calling as I haven't been able to make it work after
         | playing with them for a few hours. Looks promising tho.
        
           | rmonvfer wrote:
           | Update: I've tried LM Studio (like the author) and the tool
           | request kept failing due to a mismatch in the prompt
           | template. I guess they'll fix it, but it seems sloppy of LM
           | Studio not to have tested this before release.
        
             | month13 wrote:
             | I was road testing tool calling in LM Studio a week ago
             | against a few models marked with tool support, none worked,
             | so I believe it may be a bug. Had much better luck with
             | llama.cpp's llama-server.
        
         | hrpnk wrote:
         | I tried to generate a streamlit dashboard with MACD, RSI,
         | MA(200). 1:0 for qwen3 here.
         | 
         | qwen3-coder-30b 4-bit mlx took on the task w/o any hiccups with
         | a fully working dashboard, graphs, and recent data fetched from
         | yfinance.
         | 
         | gpt-oss-20b mxfp4's code had a missing datetime import and,
         | when fixed, delivered a dashboard without any data and with a
         | starting date of Aug 2020. Having adjusted the date, the
         | update methods did not work and displayed error messages.
        
           | teitoklien wrote:
           | For now, I wouldn't rank any model from OpenAI in coding
           | benchmarks. Despite all the false messaging they are
           | giving, almost every single model OpenAI has launched, even
           | the expensive high-end o3 models, is monumentally bad at
           | coding tasks. So this is expected.
           | 
           | If it's decent at other tasks, which I do often find OpenAI
           | better than others at, then I think it's a win, especially
           | for the open-source community: even the AI labs that
           | pioneered the Gen AI hype and never wanted to launch open
           | models are now being forced to launch them. That is
           | definitely a win, and not something that was certain
           | before.
        
             | dongobread wrote:
             | It is absolutely awful at writing and general knowledge.
             | IMO coding is its greatest strength by far.
        
               | mplewis wrote:
               | Sure sounds like they're not good at anything in
               | particular, then.
        
               | xwolfi wrote:
               | welcome to 3DTV hype, LLM are useless...
        
             | pxc wrote:
             | NVIDIA will probably give us nice, coding-focused fine-
             | tunes of these models at some point, and those might
             | compare more favorably against the smaller Qwen3 Coder.
        
               | iJohnDoe wrote:
                | What is the best local coder model that can be used
               | with ollama?
               | 
               | Maybe a too opened ended question? I can run the deepseek
               | model locally really nicely.
        
               | bigyabai wrote:
               | Probably Qwen3-Coder 30B, unless you have a titanic
               | enough machine to handle a serious 480B model.
        
               | pxc wrote:
               | Is the DeepSeek model you're running a distill, or is it
               | the 671B parameter model?
        
         | throwdbaaway wrote:
         | There is no way that gpt-oss-120b can beat the much larger
         | Kimi-K2-Instruct, Qwen3 Coder/Instruct/Thinking, or GLM-4.5.
         | How did you arrive at this rather ridiculous conclusion? The
         | current sentiment in r/LocalLLaMA is that gpt-oss-120b is
          | around Llama-4 Scout level. But it is indeed the best at
          | refusals.
        
         | irthomasthomas wrote:
          | My llm agent is currently running an experiment generating many
          | pelicans. It will compare various small-model consortiums
          | against the same model running solo, and it should push new
          | pelicans to the repo after each run. The horizon-beta results
          | are up already (not small or open source, but I tested it
          | anyway), and you can already see an improvement using 2+1 (2
          | models + the arbiter) for that model.
         | 
         | https://irthomasthomas.github.io/Pelicans-consortium/
         | https://github.com/irthomasthomas/Pelicans-consortium
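          | 
          | Not the repo's actual code, but the "2 models + arbiter" idea
          | is roughly the following (the endpoint and model names below
          | are placeholders):
          | 
          | ```
          | from openai import OpenAI
          | 
          | client = OpenAI(base_url="http://localhost:1234/v1",
          |                 api_key="x")
          | 
          | def ask(model, prompt):
          |     r = client.chat.completions.create(
          |         model=model,
          |         messages=[{"role": "user", "content": prompt}])
          |     return r.choices[0].message.content
          | 
          | task = "Generate an SVG of a pelican riding a bicycle"
          | # two workers answer independently
          | drafts = [ask(m, task) for m in ("worker-a", "worker-b")]
          | 
          | # the arbiter picks or merges the better draft
          | verdict = ask("arbiter-model",
          |     "Pick or merge the better answer for this task.\n"
          |     f"TASK: {task}\n\n"
          |     f"ANSWER 1:\n{drafts[0]}\n\nANSWER 2:\n{drafts[1]}")
          | print(verdict)
          | ```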
        
         | markasoftware wrote:
         | The space invaders game seems like a poor benchmark. Both
         | models understood the prompt and generated valid, functional
         | javascript. One just added more fancy graphics. It might just
         | have "use fancy graphics" in its system prompt for all we know.
        
           | simonw wrote:
           | The way I run these prompts excludes a system prompt - I'm
           | hitting the models directly.
        
             | markasoftware wrote:
             | still, if you ask this open model to generate a fancy space
             | invaders game with polish, and then ask the other model to
             | generate a bare-bones space invaders game with the fewest
             | lines of code, I think there's a good chance they'd switch
                | places. This doesn't really test the model's ability to
             | generate a space invaders game, so much as it tests their
             | tendency to make an elaborate vs simple solution.
        
               | simonw wrote:
               | My main goal with that benchmark is to see if it can
               | produce HTML and JavaScript code that runs without errors
               | for a moderately complex challenge.
               | 
               | It's not a comprehensive benchmark - there are many ways
               | you could run it in ways that would be much more
               | informative and robust.
               | 
               | It's great as a quick single sentence prompt to get a
               | feeling for if the model can produce working JavaScript
               | or not.
        
               | dennisy wrote:
                | Not really. The other commenters are correct, I feel, and
                | this is not really proving anything about the fundamental
                | capability of the model. It's just a hello-world
                | benchmark adding no real value, just driving blog traffic
                | for you.
        
               | simonw wrote:
               | The space invaders benchmark proves that the model can
               | implement a working HTML and JavaScript game from a
               | single prompt. That's a pretty fundamental capability for
               | a model.
               | 
               | Comparing them between models is also kind of
               | interesting, even if it's not a flawlessly robust
               | comparison: https://simonwillison.net/tags/space-
               | invaders/
        
               | lossolo wrote:
               | Implement or retrieve? That's an important distinction.
               | When evaluating models, you run a variety of tests, and
               | the benchmarks that aren't publicly disclosed are the
               | most reliable. Your Space Invaders game isn't really a
               | benchmark of anything, just Google it, and you'll find
               | plenty of implementations.
        
               | simonw wrote:
               | I see that criticism a lot - that benchmarks like space
               | invaders don't make sense because they're inevitably in
               | the training data - and I don't buy that at all.
               | 
               | Firstly, 12GB is not enough space to hold a copy of
               | anything that large from the training data and just
               | regurgitate it back out again.
               | 
               | You can also watch the thinking traces on the reasoning
               | models and see them piece together the approach they are
               | going to take. Here's an example from the 20B OpenAI
               | model with reasoning set to medium: https://gist.github.c
               | om/simonw/63d7d8c43ae2ac93c214325bd6d60...
               | 
               | Illustrative extract:
               | 
               | > Edge detection: aliens leftmost or rightmost position
               | relative to canvas width minus alien width.
               | 
               | > When direction changes, move all aliens down by step
               | (e.g., 10 px).
               | 
               | The benchmarks that aren't publicly disclosed tend to be
               | _way_ simpler than this: things like  "What is the
               | embryological origin of the hyoid bone?" (real example
               | from MMLU, it then provides four choices as a multiple-
               | choice challenge).
        
               | lossolo wrote:
                | 12.8 GB is around 110 Gbits. Even at 4.25 bits/weight the
                | network stores ~26 billion "micro weights". A 1.4k-token
                | Space Invaders snippet occupies ~1.1 KB compressed; the
                | model could parametrize thousands of such snippets and
                | still have more than 99% of its capacity left. This paper
                | about LLM memorization is interesting if you'd like to
                | know more: https://arxiv.org/abs/2312.11658 and another
                | recent paper, SWE-Bench Illusion, shows SOTA code-LLM
                | results collapsing once memorised GitHub issues are
                | filtered out: https://arxiv.org/pdf/2506.12286v1
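                | 
                | The arithmetic, as a quick sanity check (the 10k-snippet
                | count below is just an illustration):
                | 
                | ```
                | bits = 12.8 * 2**30 * 8       # ~1.1e11, i.e. ~110 Gbits
                | weights = bits / 4.25         # ~2.6e10, i.e. ~26 billion
                | snippet_bits = 1.1 * 1024 * 8 # one ~1.1 KB snippet
                | share = 10_000 * snippet_bits / bits
                | print(f"{weights/1e9:.0f}B weights, "
                |       f"10k snippets = {share:.3%} of capacity")
                | ```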
               | 
                | Add to this that the Common Crawl slices used for the
                | Pile/C4 mirror much of what you can find on GitHub. So
                | when the training data contains dozens of near-duplicate
                | solutions, the network only needs to interpolate between
                | them.
               | 
                | As for the CoT-style dumps that you showed, they are easy
                | to misinterpret. Apple's "Illusion of Thinking" paper
                | shows that models will happily backfill plausible-
                | sounding rationales that do not correspond to the
                | computation that actually produced the answer, and other
                | evaluation work shows that when you systematically
                | rewrite multiple-choice distractors so that memorisation
                | can't help, accuracy drops by 50-90%, even on "reasoning"
                | models: https://arxiv.org/abs/2502.12896 So a cool-
                | looking bullet list about "edge detection" could just be
                | narrative overspray, not really evidence of algorithmic
                | planning.
               | 
                | If you actually want to know whether a model can plan an
                | arcade game (or whatever) rather than recall it, then you
                | need a real benchmark (metamorphic rewrites, adversarial
                | "none of the others" options, etc.). Until a benchmark
               | controls for leakage in these ways, a perfect space
               | invaders score mostly shows that the model has good
               | pattern matching for code it has already seen.
        
         | mudkipdev wrote:
         | Hasn't nailed the strawberry test yet
        
           | pxc wrote:
           | I found this surprising because that's such an old test that
           | it must certainly be in the training data. I just tried to
           | reproduce and I've been unable to get it (20B model, lowest
           | "reasoning" budget) to fail that test (with a few different
           | words).
        
           | quatonion wrote:
           | I am starting to get the impression the strawberry test is an
           | OpenAI watermark, more than an actual problem.
           | 
           | It is a good way to detect if another model was trained on
           | your data for example, or is a distillation/quant/ablation.
        
         | h4ny wrote:
         | > TLDR: I think OpenAI may have taken the medal for best
         | available open weight model back from the Chinese AI labs.
         | 
          | That's just straight up not the case. Not sure how you can jump
          | to that conclusion, not least when you stated in your post that
          | you haven't tested tool calling.
          | 
          | Many people in the community are finding it substantially
          | lobotomized, to the point that there are "safe" memes everywhere
          | now. Maybe you need to develop better tests and pay more
          | attention to benchmaxxing.
          | 
          | There are good things that came out of this release from
          | OpenAI, but we'd appreciate more objective analyses...
        
           | simonw wrote:
           | If you read my full post, it ends with this:
           | 
           | > I'm waiting for the dust to settle and the independent
           | benchmarks (that are more credible than my ridiculous
           | pelicans) to roll out, but I think it's likely that OpenAI
           | now offer the best available open weights models.
           | 
           | You told me off for jumping to conclusions and in the same
           | comment quoted me saying "I think OpenAI may have taken" -
           | that's not a conclusion, it's tentative speculation.
        
             | h4ny wrote:
              | I did read that, and it doesn't change what I said about
              | your comment on HN: I was calling out the fact that you are
              | making a very bold statement without having done careful
              | analysis.
             | 
             | You know you have a significant audience, so don't act like
             | you don't know what you're doing when you chose to say
             | "TLDR: I think OpenAI may have taken the medal for best
             | available open weight model back from the Chinese AI labs"
             | then defend what I was calling out based on word choices
             | like "conclusions" (I'm sure you have read conclusions in
             | academic journals?), "I think", and "speculation".
        
               | simonw wrote:
               | I'm going to double down on "I think OpenAI may have
               | taken the medal..." not being a "bold statement".
               | 
               | I try to be careful about my choice of words, even in
               | forum comments.
        
               | bavell wrote:
               | > I think OpenAI may have taken the medal for best
               | available open weight model back from the Chinese AI
               | labs.
               | 
               | IMO, the "I think..." bit could be ambiguous and read as,
               | "In my opinion, OpenAI may have...".
               | 
                | I agree with you it's not a hard/bold endorsement, but
                | perhaps _leading_ with the disclaimer that you're
                | reserving final judgement could assuage these concerns.
        
         | kgeist wrote:
         | >I think OpenAI may have taken the medal for best available
         | open weight model back from the Chinese AI labs
         | 
         | I have a bunch of scripts that use tool calling. Qwen-3-32B
         | handles everything flawlessly at 60 tok/sec. Gpt-oss-120B
          | breaks in some cases and runs at a mere 35 tok/sec (it doesn't
          | fit on the GPU).
          | 
          | But I hope there's still some ironing out left to do in
          | llama.cpp and in the quants. So far it feels lackluster
          | compared to Qwen3-32B and GLM-4.5-Air.
        
         | EagnaIonat wrote:
         | Nice write up!
         | 
          | One test I do is to give a common riddle but word it slightly
          | differently, to see if it can actually reason.
         | 
         | For example:
         | 
         | "Bobs dad has five daughters, Lala, Lele, Lili, Lolo and ???"
         | 
          | The 20B model kept picking the answer to the original riddle,
          | even after I gave it extra information.
         | 
         | The original riddle is:
         | 
         | "Janes dad has five daughters, Lala, Lele, Lili, Lolo and ???"
        
           | clbrmbr wrote:
           | I don't get it. Wouldn't it be Lulu in both cases?
        
             | blueplanet200 wrote:
             | Original riddle the answer is Jane.
        
             | throwaway106382 wrote:
             | It's Bob or Jane.
             | 
             | The dad of <Person> has 5 daughters. Four are listed off.
             | So the answer for the fifth is <Person>.
        
               | philipwhiuk wrote:
               | Except having five daughters doesn't prevent them also
               | having 20 sons one of whom is called Bob.
        
               | throwaway106382 wrote:
               | That's why it's a riddle.
        
               | stavros wrote:
               | But that makes no sense, if you ask me that John's dad
               | has two daughters, Mary and ???, there's no way I will
               | say "John" as an answer. I'll simply assume he also has
               | at least one son.
        
             | BrawnyBadger53 wrote:
             | Presumably Jane is a girl and therefore the fifth daughter
             | in the original riddle.
        
           | taikahessu wrote:
           | A Daughter Named Bob, what a great name for AI documentary.
        
           | philipwhiuk wrote:
           | Your riddle assumes the dad doesn't have any sons.
        
           | jfreax wrote:
            | Nice test. I tested it with llama.cpp and the 20b model. It
            | got the answer correct on the first try for me. The
            | internal/thinking dialogue is fun to read - it could be my
            | own inner dialogue ;)
           | 
           | ``` > "Bobs dad has five daughters, Lala, Lele, Lili, Lolo
           | and ??? <|channel|>analysis<|message|>The question: "Bobs dad
           | has five daughters, Lala, Lele, Lili, Lolo and ???". It's a
           | common riddle. The answer is Bob? Actually "Bob's dad has
           | five daughters: Lala, Lele, Lili, Lolo and..." The riddle is
           | to trick: it's "Bob" himself is a daughter? Wait Bob's dad
           | has five daughters: Lala, Lele, Lili, Lolo, and Bob? But Bob
           | is a name often male. But the riddle says Bob's dad. So Bob
           | is a person who has a dad, and that dad has five daughters.
           | But Bob could be the fifth daughter? That would be weird: Bob
           | is likely the name of the fifth daughter. So answer: Bob. But
           | sometimes the answer is "Lulu" or "Bob's sister"? Let's
            | think. The riddle is a trick: It's "Bobs dad has five
            | daughters: Lala, Lele, Lili, Lolo, and ... Bob." Because the
           | riddle begins with "Bobs dad has five daughters" implying Bob
           | is one of the daughters. So the missing daughter is Bob. So
           | answer: "Bob".
           | 
           | Thus answer: Bob. But sometimes the riddle goes "Bob's dad
           | has five daughters, Lala, Lele, Lili, Lolo, and ... Who is
           | the fifth?" The answer: Bob (the name of the person speaking,
           | "Bobs"). But in riddle it's "Bobs dad has five daughters ,
           | Lala, Lele, Lili, Lolo and ???". The missing name is "Bob."
           | So answer: Bob.
           | 
           | Thus the answer: Bob. Or maybe it's "Boba"? But no.
           | 
           | Hence answer: Bob.
           | 
           | We'll answer
           | accordingly.<|start|>assistant<|channel|>final<|message|>The
           | missing daughter is *Bob*. The riddle starts with "Bob's dad
           | ..." so Bob himself is the fifth daughter. ```
        
       | paxys wrote:
       | Has anyone benchmarked their 20B model against Qwen3 30B?
        
       | Mars008 wrote:
        | On the OpenAI demo page, trying to test it: asking about tools to
        | use to repair a mechanical watch, it showed a couple of thinking
        | steps and then went blank. Too much safety training?
        
       | cco wrote:
       | The lede is being missed imo.
       | 
        | gpt-oss:20b is a top-ten model on MMLU (right behind
        | Gemini-2.5-Pro) and I just ran it locally on my MacBook Air M3
        | from last year.
       | 
       | I've been experimenting with a lot of local models, both on my
       | laptop and on my phone (Pixel 9 Pro), and I figured we'd be here
       | in a year or two.
       | 
       | But no, we're here today. A basically frontier model, running for
       | the cost of electricity (free with a rounding error) on my
       | laptop. No $200/month subscription, no lakes being drained, etc.
       | 
       | I'm blown away.
        
         | MattSayar wrote:
         | What's your experience with the quality of LLMs running on your
         | phone?
        
           | NoDoo wrote:
            | I've run qwen3 4B on my phone. It's not the best, but it's
            | better than old gpt-3.5. It also has a reasoning mode,
            | and in reasoning mode it's better than the original gpt-4 and
            | the original gpt-4o, but not the latest gpt-4o. I get usable
            | speed, but it's not really comparable to most cloud-hosted
            | models.
        
             | NoDoo wrote:
              | I'm on Android so I've used termux+ollama, but if you don't
              | want to set that up in a terminal or want a GUI, PocketPal
              | AI is a really good app for both Android and iOS. It lets
              | you run Hugging Face models.
        
           | cco wrote:
            | As others said, around GPT-3.5 level, so three or four years
            | behind SOTA today, at reasonable (but not quick) speed.
        
         | turnsout wrote:
         | The environmentalist in me loves the fact that LLM progress has
         | mostly been focused on doing more with the same hardware,
         | rather than horizontal scaling. I guess given GPU shortages
         | that makes sense, but it really does feel like the value of my
         | hardware (a laptop in my case) is going up over time, not down.
         | 
         | Also, just wanted to credit you for being one of the five
         | people on Earth who knows the correct spelling of "lede."
        
           | twixfel wrote:
           | > Also, just wanted to credit you for being one of the five
           | people on Earth who knows the correct spelling of "lede."
           | 
           | Not in the UK it isn't.
        
             | turnsout wrote:
             | Yes, it is, although it's primarily a US journalistic
             | convention. "Lede" is a publishing industry word referring
             | to the most important leading detail of a story. It's
             | spelled intentionally "incorrectly" to disambiguate it from
             | the metal lead, which was used in typesetting at the time.
        
         | datadrivenangel wrote:
         | Now to embrace jevon's paradox and expand usage until we're
         | back to draining lakes so that your agentic refrigerator can
         | simulate sentience.
        
           | herval wrote:
           | In the future, your Samsung fridge will also need your AI
           | girlfriend
        
             | throw310822 wrote:
             | In the future, while you're away your Samsung fridge will
             | use electricity to chat up the Whirlpool washing machine.
        
               | pryelluw wrote:
                | In Zapp Brannigan's voice:
               | 
               | "I am well versed in the lost art form of delicates
               | seduction."
        
             | hkt wrote:
             | s/need/be/
        
               | herval wrote:
               | I keep my typos organic -- it proves I'm not an LLM
        
               | hkt wrote:
               | Reasonable. I've considered using em dashes for plausible
               | deniability for the opposite reason.
        
             | spauldo wrote:
             | "Now I've been admitted to Refrigerator Heaven..."
        
           | bongodongobob wrote:
           | Yep, it's almost as bad as all the cars' cooling systems
           | using up so much water.
        
             | GrinningFool wrote:
             | Estimated 1.5 billion vehicles in use across the world.
             | Generous assumptions: a) they're all IC engines requiring
             | 16 liters of water each. b) they are changing that water
             | out once a year
             | 
             | That gives 24m cubic meters annual water usage.
             | 
              | Estimated AI usage in 2024: 560m cubic meters.
             | 
             | Projected water usage from AI in 2027: 4bn cubic meters at
             | the low end.
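              | 
              | Checking those numbers (just the arithmetic on the figures
              | above):
              | 
              | ```
              | vehicles = 1.5e9
              | coolant_m3 = 16 / 1000   # 16 litres each, swapped yearly
              | cars = vehicles * coolant_m3 / 1e6
              | print(cars, "million m^3/yr for cars")      # ~24
              | print(560 / cars, "x that for AI in 2024")  # ~23x
              | ```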
        
               | spongebobstoes wrote:
               | what does water usage mean? is that 4bn cubic meters of
               | water permanently out of circulation somehow? is the
               | water corrupted with chemicals or destroyed or displaced
               | into the atmosphere to become rain?
        
               | Eisenstein wrote:
               | The water is used to sink heat and then instead of
               | cooling it back down they evaporate it, which provides
               | more cooling. So the answer is 'it eventually becomes
               | rain'.
        
               | spongebobstoes wrote:
               | I understand. but why this is bad? is there some analysis
               | of the beginning and end locations of the water, and how
               | the utility differs between those locations?
        
               | Arkhaine_kupo wrote:
               | Hot water disrupts marine life for one very very big
               | problem.
               | 
               | Depending on the locatin of the hot water you can cause
               | disruptions to water currents, the north atlantic
               | waterway is being studied to how much global warming is
               | affecting it.
               | 
               | If greenland melts, and the water doesnt get cold up
               | there, then the mexico current to europe ends and England
               | becomes colder than Canada.
               | 
               | If your AI model has a data center in the atlantic, it
               | could be furthering that issue.
               | 
               | (Millions of animals are also dead)
        
               | orra wrote:
               | Water is expensive to move (except by pipes), and
               | expensive to purify from salt water. This is why regional
               | droughts are a bad thing.
               | 
               | Fresh clean water in your area is a wonderful thing.
        
               | bongodongobob wrote:
               | Earth: ~1.4e18 m3 water
               | 
               | Atmosphere: ~1.3e13 m3 vapor
               | 
               | Estimated impact from closed loop systems: 0-ish.
        
             | LinXitoW wrote:
             | If you actually want a gotcha comparison, go for beef. It
             | uses absurd amounts of every relevant resource compared to
             | every alternative. A vegan vibe coder might use less water
             | any given day than a meat loving AI hater.
        
               | bongodongobob wrote:
               | Unless it's in a place where there are aquifer issues,
               | cows drinking water doesn't affect a damn thing.
        
           | cco wrote:
           | What ~IBM~ TSMC giveth, ~Bill Gates~ Sam Altman taketh away.
        
           | ben_w wrote:
           | Why is your laptop (or phone, or refrigerator) plumbed
           | directly into a lake?
        
         | black3r wrote:
          | Can you please give an estimate of how much slower/faster it is
          | on your MacBook compared to comparable models running in the
          | cloud?
        
           | syntaxing wrote:
            | You can get a pretty good estimate from your memory
            | bandwidth. Too many parameters can change with local models
            | (quantization, flash attention, etc.) for a precise number,
            | but the new models are MoE, so they're gonna be pretty fast.
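            | 
            | Rough back-of-envelope for bandwidth-bound decoding (the
            | bandwidth, active-parameter and quantization figures below
            | are assumptions, not measurements):
            | 
            | ```
            | bandwidth_gb_s = 100       # M3 unified memory, roughly
            | active_params = 3.6e9      # gpt-oss-20b active per token
            | bytes_per_weight = 4.25/8  # mxfp4-ish quantization
            | gb_per_token = active_params * bytes_per_weight / 1e9
            | print(f"~{bandwidth_gb_s / gb_per_token:.0f} tok/s bound")
            | ```
            | 
            | Real-world numbers land below that once you add KV-cache
            | reads, prompt processing and overhead.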
        
           | cco wrote:
           | Sure.
           | 
           | This is a thinking model, so I ran it against o4-mini, here
           | are the results:
           | 
           | * gpt-oss:20b
           | 
           | * Time-to-first-token: 2.49 seconds
           | 
           | * Time-to-completion: 51.47 seconds
           | 
           | * Tokens-per-second: 2.19
           | 
           | * o4-mini on ChatGPT
           | 
           | * Time-to-first-token: 2.50 seconds
           | 
           | * Time-to-completion: 5.84 seconds
           | 
           | * Tokens-per-second: 19.34
           | 
           | Time to first token was similar, but the thinking piece was
           | _much_ faster on o4-mini. Thinking took the majority of the
           | 51 seconds for gpt-oss:20b.
        
         | parhamn wrote:
         | I just tested 120B from the Groq API on agentic stuff (multi-
         | step function calling, similar to claude code) and it's not
         | that good. Agentic fine-tuning seems key, hopefully someone
         | drops one soon.
        
           | AmazingTurtle wrote:
            | I'm not sure if Groq uses the proper Harmony template?
        
         | mathiaspoint wrote:
         | It's really training not inference that drains the lakes.
        
           | JKCalhoun wrote:
           | Interesting. I understand that, but I don't know to what
           | degree.
           | 
           | I mean the training, while expensive, is done once. The
           | inference ... besides being done by perhaps millions of
           | clients, is done for, well, the life of the model anyway.
           | Surely that adds up.
           | 
           | It's hard to know, but I assume the user taking up the burden
           | of the inference is perhaps doing so more efficiently? I
           | mean, when I run a local model, it is plodding along -- not
           | as quick as the online model. So, slow and therefore I assume
           | necessarily more power efficient.
        
           | littlestymaar wrote:
            | Training cost has increased a ton exactly because inference
            | cost is the biggest problem: models are now trained on almost
            | three orders of magnitude more data than what is compute-
            | optimal (per the Chinchilla paper), because saving compute on
            | inference makes it worthwhile to overtrain a smaller model,
            | spending more training compute to achieve similar performance
            | from a model that is cheaper to serve.
        
         | syntaxing wrote:
         | Interesting, these models are better than the new Qwen
         | releases?
        
         | captainregex wrote:
          | I'm still trying to understand who the biggest group of people
          | using local AI is (or will be). Students who don't want to
          | pay but somehow have the hardware? Devs who are price-conscious
          | and want free agentic coding?
          | 
          | Local models, in my experience, can't even pull data from an
          | image without hallucinating (Qwen 2.5 VL in that example).
          | Hopefully local/small models keep getting better and devices
          | get better at running bigger ones.
         | 
         | It feels like we do it because we can more than because it
         | makes sense- which I am all for! I just wonder if i'm missing
         | some kind of major use case all around me that justifies
         | chaining together a bunch of mac studios or buying a really
         | great graphics card. Tools like exo are cool and the idea of
         | distributed compute is neat but what edge cases truly need it
         | so badly that it's worth all the effort?
        
           | canvascritic wrote:
           | Healthcare organizations that can't (easily) send data over
           | the wire while remaining in compliance
           | 
           | Organizations operating in high stakes environments
           | 
           | Organizations with restrictive IT policies
           | 
           | To name just a few -- well, the first two are special cases
           | of the last one
           | 
           | RE your hallucination concerns: the issue is overly broad
           | ambitions. Local LLMs are not general purpose -- if what you
           | want is local ChatGPT, you will have a bad time. You should
           | have a highly focused use case, like "classify this free text
           | as A or B" or "clean this up to conform to this standard":
            | this is the sweet spot for a local model.
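            | 
            | That focused shape looks something like this (Ollama's
            | OpenAI-compatible endpoint assumed; the labels and model tag
            | are placeholders):
            | 
            | ```
            | from openai import OpenAI
            | 
            | client = OpenAI(base_url="http://localhost:11434/v1",
            |                 api_key="ollama")
            | 
            | def classify(note: str) -> str:
            |     r = client.chat.completions.create(
            |         model="gpt-oss:20b",
            |         messages=[
            |             {"role": "system", "content":
            |              "Reply with exactly one word: A (urgent) "
            |              "or B (routine)."},
            |             {"role": "user", "content": note},
            |         ],
            |         temperature=0,
            |     )
            |     return r.choices[0].message.content.strip()
            | 
            | print(classify("Chest pain radiating to the left arm."))
            | ```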
        
             | captainregex wrote:
             | Aren't there HIPPA compliant clouds? I thought Azure had an
             | offer to that effect and I imagine that's the type of place
              | they're doing a lot of things now. I've landed roughly
              | where you have, though: text stuff is fine, but don't ask
              | it to interact with files/data you can't copy-paste into
              | the box. If a user doesn't care to go through the trouble
              | to preserve privacy, and I think it's fair to say a lot of
              | people claim to care but their behavior doesn't change,
              | then I just don't see it being a thing people bother with.
              | Maybe something to use offline while on a plane? But even
              | then, I guess United will have Starlink soon, so plane
              | connectivity is gonna get better.
        
               | coredog64 wrote:
               | It's less that the clouds are compliant and more that
               | risk management is paranoid. I used to do AWS consulting,
               | and it wouldn't matter if you could show that some AWS
               | service had attestations out the wazoo or that you could
               | even use GovCloud -- some folks just wouldn't update
               | priors.
        
               | edm0nd wrote:
               | >HIPPA
               | 
               | https://i.pinimg.com/474x/4c/4c/7f/4c4c7fb0d52b21fe118d99
               | 8a8...
        
             | nojito wrote:
             | Pretty much all the large players in healthcare (provider
             | and payer) have model access (OpenAI, Gemini, Anthropic)
        
               | ptero wrote:
                | That access is over a limited API and usually under heavy
                | restrictions on the healthcare org side (e.g., only using
                | a dedicated machine, locked-up software, tracked
                | responses, and so on).
                | 
                | Running a local model is often much easier: if you
                | already have the data on a machine and can run a model
                | without touching the network, you can run it without
                | any new approvals.
        
               | nojito wrote:
                | What? It's a straight connection to the model's API from
                | Azure, AWS, or GCP.
               | 
               | I am literally using Claude opus 4.1 right now.
        
               | canvascritic wrote:
               | Most healthcare systems are not using Azure, AWS, or GCP
        
               | canvascritic wrote:
               | This may be true for some large players in coastal states
               | but definitely not true in general
               | 
               | Your typical non-coastal state run health system does not
               | have model access outside of people using their own
               | unsanctioned/personal ChatGPT/Claude accounts. In
               | particular even if you have model access, you won't
               | automatically have API access. Maybe you have a request
               | for an API key in security review or in the queue of some
               | committee that will get to it in 6 months. This is the
               | reality for my local health system. Local models have
               | been a massive boon in the way of enabling this kind of
               | powerful automation at a fraction of the cost without
               | having to endure the usual process needed to send data
               | over the wire to a third party
        
           | barnabee wrote:
           | ~80% of the basic questions I ask of LLMs[0] work just fine
           | locally, and I'm happy to ask twice for the other 20% of
           | queries for the sake of keeping those queries completely
           | private.
           | 
           | [0] Think queries I'd previously have had to put through a
           | search engine and check multiple results for a one
           | word/sentence answer.
        
           | unethical_ban wrote:
           | Privacy and equity.
           | 
           | Privacy is obvious.
           | 
            | AI is going to be equivalent to all computing in the
            | future. Imagine if only IBM, Apple and Microsoft ever built
           | computers, and all anyone else ever had in the 1990s were
           | terminals to the mainframe, forever.
        
             | captainregex wrote:
              | I am all for the privacy angle, and while I think there's
              | certainly a group of us, myself included, who care deeply
              | about it, I don't think most people or enterprises will.
              | I think most of those will go for the easy button and then
              | wring their hands about privacy and security as they have
              | always done, while continuing to let the big companies do
              | pretty much whatever they want. I would be so happy to be
              | wrong, but aren't we already seeing it? Middle-of-the-night
              | price changes, leaks of data, private things that turned
              | out not to be... and yet!
        
               | robwwilliams wrote:
               | I wring my hands twice a week about internet service
               | providers; Comcast and Starlink. And I live in a
               | nominally well serviced metropolitan area.
        
             | bavell wrote:
             | Did you mean to type equality? As in, "everyone on equal
             | footing"? Otherwise, I'm not sure how to parse your
             | statement.
        
           | wizee wrote:
            | Privacy, both personal and for corporate data protection, is
            | a major reason. Unlimited usage, allowing offline use,
           | supporting open source, not worrying about a good model being
           | taken down/discontinued or changed, and the freedom to use
           | uncensored models or model fine tunes are other benefits
           | (though this OpenAI model is super-censored - "safe").
           | 
           | I don't have much experience with local vision models, but
           | for text questions the latest local models are quite good.
           | I've been using Qwen 3 Coder 30B-A3B a lot to analyze code
           | locally and it has been great. While not as good as the
           | latest big cloud models, it's roughly on par with SOTA cloud
           | models from late last year in my usage. I also run Qwen 3
           | 235B-A22B 2507 Instruct on my home server, and it's great,
           | roughly on par with Claude 4 Sonnet in my usage (but slow of
           | course running on my DDR4-equipped server with no GPU).
        
             | captainregex wrote:
             | I do think Devs are one of the genuine users of local into
             | the future. No price hikes or random caps dropped in the
             | middle of the night and in many instances I think local
             | agentic coding is going to be faster than the cloud. It's a
             | great use case
        
               | exasperaited wrote:
               | I am _extremely_ cynical about this entire development,
               | but even I think that I will eventually have to run stuff
                | locally; I've done some of the reading already (and I am
                | quite interested in the text-to-speech models).
               | 
               | (Worth noting that "run it locally" is already
               | Canva/Affinity's approach for Affinity Photo. Instead of
               | a cloud-based model like Photoshop, their optional AI
               | tools run using a local model you can download. Which I
               | feel is the only responsible solution.)
        
             | M4R5H4LL wrote:
             | +1 - I work in finance, and there's no way we're sending
             | our data and code outside the organization. We have our own
             | H100s.
        
               | filoleg wrote:
               | Add big law to the list as well. There are at least a few
               | firms here that I am just personally aware of running
               | their models locally. In reality, I bet there are way
               | more.
        
               | atlasunshrugged wrote:
               | Add government here too (along with all the firms that
               | service government customers)
        
               | rasmus1610 wrote:
               | Add healthcare. Cannot send our patients data to a cloud
               | provider
        
               | nixgeek wrote:
               | A ton of EMR systems are cloud-hosted these days. There's
               | already patient data for probably a billion humans in the
               | various hyperscalers.
               | 
               | Totally understand that approaches vary but beyond EMR
               | there's work to augment radiologists with computer vision
               | to better diagnose, all sorts of cloudy things.
               | 
               | It's here. It's growing. Perhaps in your jurisdiction
               | it's prohibited? If so I wonder for how long.
        
               | londons_explore wrote:
               | Even if it's possible, there is typically a _lot_ of
               | paperwork to get that stuff approved.
               | 
               | There might be a lot less paperwork to just buy 50 decent
                | GPUs and have the IT guy self-host.
        
               | fineIllregister wrote:
               | In the US, HIPAA requires that health care providers
               | complete a Business Associate Agreement with any other
               | orgs that receive PHI in the course of doing business
               | [1]. It basically says they understand HIPAA privacy
               | protections and will work to fulfill the contracting
               | provider's obligations regarding notification of breaches
               | and deletion. Obviously any EMR service will include this
               | by default.
               | 
               | Most orgs charge a huge premium for this. OpenAI offers
               | it directly [2]. Some EMR providers are offering it as an
               | add-on [3], but last I heard, it's wicked expensive.
               | 
               | 1: https://www.hhs.gov/hipaa/for-professionals/covered-
               | entities...
               | 
               | 2: https://help.openai.com/en/articles/8660679-how-can-i-
               | get-a-...
               | 
               | 3: https://www.ntst.com/carefabric/careguidance-
               | solutions/ai-do...
        
               | dragonwriter wrote:
               | > Most LLM companies might not even offer it.
               | 
               | I'm pretty sure the LLM services of the big general-
               | purpose cloud providers do (I know for sure that Amazon
               | Bedrock is a HIPAA Eligible Service, meaning it is
               | covered within their standard Business Associate Addendum
               | [their name for the Business Associate Agreeement as part
               | of an AWS contract].)
               | 
               | https://aws.amazon.com/compliance/hipaa-eligible-
               | services-re...
        
               | fineIllregister wrote:
               | Sorry to edit snipe you; I realized I hadn't checked in a
               | while so I did a search and updated my comment. It
               | appears OpenAI, Google, and Anthropic also offer BAAs for
               | certain LLM services.
        
               | linuxftw wrote:
                | I worked at a big health care company recently. We were
               | using Azure's private instances of the GPT models. Fully
               | industry compliant.
        
               | kakoni wrote:
               | Europe? US? In Finland doctors can send live patient
               | encounters to azure openai for transcription and
               | summarization.
        
               | filoleg wrote:
               | In the US, it would be unthinkable for a hospital to send
               | patient data to something like ChatGPT or any other
               | public services.
               | 
               | Might be possible with some certain specific
               | regions/environments of Azure tho, because iirc they have
               | a few that support government confidentiality type of
               | stuff, and some that tout HIPAA compliance as well. Not
               | sure about details of those though.
        
               | Foobar8568 wrote:
                | Look at (private) banks in Switzerland; there are enough
                | press releases, and I can confirm most of them.
                | 
                | Managing private clients' direct data is still a concern
                | if it can be directly linked to them.
                | 
                | Only JB, I believe, has on-premise infrastructure for
                | these use cases.
        
               | helsinki wrote:
               | This is not a shared sentiment across the buy side. I'm
               | guessing you work at a bank?
        
               | LinXitoW wrote:
               | Possibly stupid question, but does this apply to things
               | like M365 too? Because just like with Inference
               | providers, the only thing keeping them from
               | reading/abusing your data is a pinky promise contract.
               | 
               | Basically, isn't your data as safe/unsafe in a sharepoint
               | folder as it is sending it to a paid inference provider?
        
               | Bombthecat wrote:
                | Yep, companies are just paranoid because it's new. Just
                | like the cloud back then. Sooner or later everyone will
                | use an AI provider.
        
               | megaloblasto wrote:
               | A lot of people and companies use local storage and
               | compute instead of the cloud. Cloud data is leaked all
               | the time.
        
               | undefuser wrote:
                | Does that mean renting a bare-metal server with H100s
                | is also out of the question for your org?
        
               | arkonrad wrote:
               | Do you have your own platform to run inference?
        
             | robwwilliams wrote:
             | Yes, and help with grant reviews. Not permitted to use web
             | AI.
        
             | mark_l_watson wrote:
              | I agree totally. My only problem is that local models
              | running on my old Mac mini run very much slower than, for
              | example, Gemini-2.5-flash. I have my Emacs set up so I can
              | switch between a local model and one of the much faster
              | commercial models.
             | 
             | Someone else responded to you about working for a financial
             | organization and not using public APIs - another great use
             | case.
        
               | gorbypark wrote:
                | These being mixture-of-experts (MoE) models should help.
                | The 20b model only has 3.6b params active at any one
                | time, so minus a bit of overhead the speed should be like
                | running a 3.6b model (while still requiring the RAM of a
                | 20b model).
                | 
                | Here's the ollama version (4.6-bit quant, I think?) run
                | with --verbose:
                | 
                |     total duration:       21.193519667s
                |     load duration:        94.88375ms
                |     prompt eval count:    77 token(s)
                |     prompt eval duration: 1.482405875s
                |     prompt eval rate:     51.94 tokens/s
                |     eval count:           308 token(s)
                |     eval duration:        19.615023208s
                |     eval rate:            15.70 tokens/s
                | 
                | 15 tokens/s is pretty decent for a low-end MacBook Air
                | (M2, 24GB of RAM). Yes, it's not the ~250 tokens/s of
                | 2.5-flash, but for my use case anything above 10
                | tokens/sec is good enough.
        
           | JKCalhoun wrote:
           | I do it because 1) I am fascinated that I can and 2) at some
           | point the online models will be enshitified -- and I can then
           | permanently fall back on my last good local version.
        
             | captainregex wrote:
             | love the first and am sad you're going to be right about
             | the second
        
               | JKCalhoun wrote:
               | When it was floated about that the DeepSeek model was to
               | be banned in the U.S., I grabbed it as fast as I could.
               | 
               | Funny how that works.
        
               | bavell wrote:
               | I mean, there's always torrents
        
               | JKCalhoun wrote:
               | I expect so. Still, it was easy to not have to even think
               | about that.
        
           | dcreater wrote:
           | Why do any compute locally? Everything can just be cloud
           | based right? Won't that work much better and scale easily?
           | 
           | We are not even at that extreme and you can already see the
           | unequal reality that too much SaaS has engendered
        
             | robwwilliams wrote:
             | Comcast comes to mind ;-)
        
               | benreesman wrote:
                | Real talk. I'm based in San Juan, and while in general
                | having an office job on a beautiful beach is about as
                | good as this life has to offer, the local version of
                | Comcast (Liberty) is juuusst unreliable enough that I'm
                | buying real gear at both the office and the home station
                | after a decade of laptop-and-go: while it goes down
                | roughly as often as Comcast, it's even harder to get
                | resolved. We had Starlink at the office for like 2 weeks;
                | you need a few real computers lying around.
        
           | wubrr wrote:
            | If you're building any kind of product/service that uses
            | AI/LLMs, the answer is the same as why any company would want
            | to run any other kind of OSS infra/service instead of relying
            | on some closed proprietary vendor API:
            | 
            | - Costs.
            | - Rate limits.
            | - Privacy.
            | - Security.
            | - Vendor lock-in.
            | - Stability/backwards-compatibility.
            | - Control.
            | - Etc.
        
             | brookst wrote:
             | Except many OSS products have all of that _and_ equal or
             | better performance.
        
           | adrianwaj wrote:
           | Use Case?
           | 
            | How about running one on this site but making it publicly
            | available? A sort of outranet, calling it HackerBrain?
        
           | danielvaughn wrote:
           | Just imagine the next PlayStation or XBox shipping with these
           | models baked in for developer use. The kinds of things that
           | could unlock.
        
             | pcdoodle wrote:
             | Good point. Take the state of the world and craft npc
             | dialogue for instance.
        
               | danielvaughn wrote:
               | Yep that's my biggest ask tbh. I just imagine the next
               | Elder Scrolls taking advantage of that. Would change the
               | gaming landscape overnight.
        
               | okasaki wrote:
               | Games with LLM characters have been done and it turns out
               | this is a shit idea.
        
               | bavell wrote:
               | There are a ton of ways to do this that haven't been
               | tried yet.
        
               | danielvaughn wrote:
               | I guarantee anything that's already been put out is too
               | early, and is very likely a rushed cash-grab. Which, of
               | course that sucks.
               | 
               | And AI has been in games for a long time. Generated
               | terrain and other sorts of automation have been used as
               | techniques for a hot minute now.
               | 
               | All I'm suggesting is to keep on that same trajectory,
               | now just using an on-device LLM to back intelligence
               | features.
        
               | djeastm wrote:
               | Sounds like a pre-Beatles "guitar groups are on their way
               | out" kind of statement
        
           | cco wrote:
           | > I'm still trying to understand what is the biggest group of
           | people that uses local AI (or will)?
           | 
           | Well, the model makers and device manufacturers of course!
           | 
           | While your Apple, Samsung, and Googles of the world will be
           | unlikely to use OSS models locally (maybe Samsung?), they all
           | have really big incentives to run models locally for a
           | variety of reasons.
           | 
           | Latency, privacy (Apple), cost to run these models on behalf
           | of consumers, etc.
           | 
           | This is why Google started shipping 16GB as the _lowest_
           | amount of RAM you can get on your Pixel 9. That was a clear
           | flag that they're going to be running more and more models
           | locally on your device.
           | 
            | As mentioned, while it seems unlikely that US-based model
            | makers or device manufacturers will use OSS models, they'll
            | certainly be targeting local models heavily on consumer
            | devices in the near future.
           | 
           | Apple's framework of local first, then escalate to ChatGPT if
           | the query is complex will be the dominant pattern imo.
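            | 
            | That pattern is simple enough to sketch (not Apple's actual
            | implementation; the endpoints, model names and the "is it
            | complex?" heuristic here are all placeholders):
            | 
            | ```
            | from openai import OpenAI
            | 
            | local = OpenAI(base_url="http://localhost:1234/v1",
            |                api_key="x")
            | cloud = OpenAI()  # reads OPENAI_API_KEY from the environment
            | 
            | def answer(prompt: str) -> str:
            |     # toy heuristic; a real router would use a classifier
            |     hard = len(prompt) > 500
            |     client, model = ((cloud, "gpt-4o-mini") if hard
            |                      else (local, "gpt-oss-20b"))
            |     r = client.chat.completions.create(
            |         model=model,
            |         messages=[{"role": "user", "content": prompt}])
            |     return r.choices[0].message.content
            | 
            | print(answer("What's the capital of Norway?"))  # stays local
            | ```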
        
             | SchemaLoad wrote:
             | Device makers also get to sell you a new device when you
             | want a more powerful LLM.
        
               | jus3sixty wrote:
               | Bingo!
        
             | MYEUHD wrote:
             | >Google started shipping 16GB as the _lowest_ amount of RAM
             | you can get on your Pixel 9.
             | 
             | The Pixel 9 has 12GB of RAM[0]. You probably meant the
             | Pixel 9 Pro.
             | 
             | [0]: https://www.gsmarena.com/google_pixel_9-13219.php
        
               | username135 wrote:
               | Still an absurd amount of RAM for a phone, imo
        
               | shkkmo wrote:
                | Seems about right; my new laptop has 8x that, which is
                | about the same ratio that my last new laptop had to my
                | phone at the time.
        
               | mrheosuper wrote:
                | Not absurd. The base S21 Ultra from 2021 already shipped
                | with 12GB of RAM. Four years later, the amount of RAM is
                | still the same.
        
           | jedberg wrote:
           | Pornography, or any other "restricted use". They either want
           | privacy or don't want to deal with the filters on commercial
           | products.
           | 
           | I'm sure there are other use cases, but much like "what is
           | BitTorrent for?", the obvious use case is obvious.
        
           | noosphr wrote:
           | Data that can't leave the premises because it is too
           | sensitive. There is a lot of security theater around cloud
           | pretending to be compliant but if you actually care about
           | security a locked server room is the way to do it.
        
           | azinman2 wrote:
           | I'm guessing its largely enthusiasts for now, but as they
           | continue getting better:
           | 
           | 1. App makers can fine tune smaller models and include in
           | their apps to avoid server costs
           | 
           | 2. Privacy-sensitive content can be either filtered out or
           | worked on... I'm using local LLMs to process my health
           | history for example
           | 
           | 3. Edge servers can be running these fine tuned for a given
           | task. Flash/lite models by the big guys are effectively like
           | these smaller models already.
        
           | m463 wrote:
           | One use nobody mentions is hybrid use.
           | 
           | Why not run all the models at home, maybe collaboratively or
           | at least in parallel?
           | 
           | I'm sure there are use cases where the paid models are not
           | allowed to collaborate or ask each other.
           | 
           | also, other open models are gaining mindshare.
        
           | cameronh90 wrote:
           | The cloud AI providers have unacceptable variation in
           | response time for things that need a predictable runtime.
           | 
           | Even if they did offer a defined latency product, you're
           | relying on a lot of infrastructure between your application
           | and their GPU.
           | 
           | That's not always tolerable.
        
           | ineedasername wrote:
            | A local laptop from the past few years without a discrete GPU
            | can run, at practical speeds depending on the task, a
            | gemma/llama model if it's (ime) under 4GB.
            | 
            | For practical RAG processes of narrow scope, even a minimal
            | amount of scaffolding gives a very usable speed for
            | automating tasks, especially as the last-mile/edge-device
            | portion of a more complex process with better models in use
            | upstream. Classification tasks, reasonably intelligent
            | decisions between traditional workflow processes, other use
            | cases -- all of them extremely valuable in enterprise, being
            | built and deployed right now.
        
             | alecfong wrote:
              | If you wanna compare on an H200 and play with trt-llm
              | configs, I set up this link here: https://brev.nvidia.com/laun
             | chable/deploy?launchableID=env-3...
        
           | trenchpilgrim wrote:
           | In some large, lucrative industries like aerospace many of
           | the hosted models are off the table due to regulations such
            | as ITAR. There's a market for models which are run on prem/in
           | GovCloud with a professional support contract for
           | installation and updates.
        
           | m3kw9 wrote:
            | I'd use it for coding on a plane with no network, but
            | otherwise it's just an emergency model for when the internet
            | goes out, basically end-of-the-world scenarios.
        
           | xrmagnum wrote:
           | It's striking how much of the AI conversation focuses on new
           | use cases, while overlooking one of the most serious non-
           | financial costs: privacy.
           | 
           | I try to be mindful of what I share with ChatGPT, but even
           | then, asking it to describe my family produced a response
           | that was unsettling in its accuracy and depth.
           | 
           | Worse, after attempting to delete all chats and disable
           | memory, I noticed that some information still seemed to
           | persist. That left me deeply concerned--not just about this
           | moment, but about where things are headed.
           | 
           | The real question isn't just "what can AI do?"--it's "who is
           | keeping the record of what it does?" And just as importantly:
           | "who watches the watcher?" If the answer is "no one," then
           | maybe we shouldn't have a watcher at all.
        
             | scubbo wrote:
             | > I try to be mindful of what I share with ChatGPT, but
             | even then, asking it to describe my family produced a
             | response that was unsettling in its accuracy and depth.
             | 
             | > Worse, after attempting to delete all chats and disable
             | memory, I noticed that some information still seemed to
             | persist.
             | 
             | Maybe I'm missing something, but why wouldn't that be
             | expected? The chat history isn't their only source of
             | information - these models are trained on scraped public
             | data. Unless there's zero information about you and your
             | family on the public internet (in which case - bravo!), I
             | would expect even a "fresh" LLM to have some information
             | even without you giving it any.
        
               | rcruzeiro wrote:
               | I think you are underestimating how notable a person
               | needs to be for their information to be baked into a
               | model.
        
               | nl wrote:
               | LLMs can learn from a single example.
               | 
               | https://www.fast.ai/posts/2023-09-04-learning-jumps/
        
               | brookst wrote:
               | That doesn't mean they learn from _every_ single example.
        
             | staplers wrote:
              | > Worse, after attempting to delete all chats and disable
             | memory, I noticed that some information still seemed to
             | persist.
             | 
             | Chatgpt was court ordered to save history logs.
             | 
             | https://www.malwarebytes.com/blog/news/2025/06/openai-
             | forced...
        
               | Oreb wrote:
               | That only means that OpenAI have to keep logs of all
               | conversations, not that ChatGPT will retain memories of
               | all conversations.
        
             | ludwik wrote:
             | > Worse, after attempting to delete all chats and disable
             | memory, I noticed that some information still seemed to
             | persist.
             | 
             | I'm fairly sure "seemed" is the key word here. LLMs are
             | excellent at making things up - they rarely say "I don't
             | know" and instead generate the most probable guess. People
             | also famously overestimate their own uniqueness. Most
             | likely, you accidentally recreated a kind of Barnum effect
             | for yourself.
        
           | seany wrote:
           | Jail breaking then running censored questions. Like diy
           | fireworks, or analysis of papers that touch "sensitive
           | topics", nsfw image generation the list is basically endless.
        
           | nfRfqX5n wrote:
           | You're asking the biggest group of people who would want to
           | do this
        
           | deadbabe wrote:
           | We use it locally for deep packet inspection.
        
           | cyanydeez wrote:
           | People who want programmatic solutions that wont be rug
           | pulled
        
           | julianozen wrote:
            | Worth mentioning that today's expensive hardware will be
            | built into the cheapest iPhone in less than 10 years.
            | 
            | That means it runs instantly, offline, and every token is
            | free.
        
           | sturadnidge wrote:
           | If you have capable hardware and kids, a local LLM is great.
           | A simple system prompt customisation (e.g. 'all responses
           | should be written as if talking to a 10 year old') and
           | knowing that everything is private goes a long way for me at
           | least.
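            | 
            | The wiring for that is tiny. A rough sketch with the ollama
            | Python package (the model name and wording below are just
            | examples, not a recommendation):
            | 
            |   import ollama  # pip install ollama; talks to a local server
            | 
            |   KID_MODE = ("All responses should be written as if talking "
            |               "to a 10 year old. Keep answers short and kind.")
            | 
            |   def ask(question: str) -> str:
            |       resp = ollama.chat(
            |           model="gpt-oss:20b",  # any locally pulled model
            |           messages=[{"role": "system", "content": KID_MODE},
            |                     {"role": "user", "content": question}],
            |       )
            |       return resp["message"]["content"]
            | 
            |   print(ask("Why is the sky blue?"))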
        
           | shironandonon_ wrote:
           | air gaps, my man.
        
           | setopt wrote:
           | I'm highly interested in local models for privacy reasons. In
           | particular, I want to give an LLM access to my years of
           | personal notes and emails, and answer questions with
           | references to those. As a researcher, there's lots of
           | unpublished stuff in there that I sometimes either forget or
           | struggle to find again due to searching for the wrong
           | keywords, and a local LLM could help with that.
           | 
           | I pay for ChatGPT and use it frequently, but I wouldn't trust
           | uploading all that data to them even if they let me. I've so
           | far been playing around with Ollama for local use.
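            | 
            | The retrieval half is surprisingly little code. A minimal
            | sketch of what I've been playing with (Ollama's embeddings
            | endpoint plus cosine similarity; the embedding model and
            | file paths below are just placeholders):
            | 
            |   import json, urllib.request
            |   import numpy as np
            | 
            |   URL = "http://localhost:11434/api/embeddings"  # local Ollama
            |   HDRS = {"Content-Type": "application/json"}
            | 
            |   def embed(text: str) -> np.ndarray:
            |       body = json.dumps({"model": "nomic-embed-text",
            |                          "prompt": text}).encode()
            |       req = urllib.request.Request(URL, data=body, headers=HDRS)
            |       with urllib.request.urlopen(req) as r:
            |           vec = np.array(json.load(r)["embedding"])
            |       return vec / np.linalg.norm(vec)  # unit length: dot = cosine
            | 
            |   notes = {p: open(p, encoding="utf-8").read()
            |            for p in ["notes/idea-2019.md", "notes/lab-03.md"]}
            |   vecs = {p: embed(t) for p, t in notes.items()}
            | 
            |   def search(query: str, k: int = 3) -> list[str]:
            |       q = embed(query)
            |       return sorted(vecs, key=lambda p: -float(vecs[p] @ q))[:k]
            | 
            |   print(search("that unpublished derivation on boundary terms"))
            | 
            | The hits then go into the prompt of whatever local chat model
            | answers the question, with file names kept as references.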
        
           | lynnesbian wrote:
           | I can provide a real-world example: Low-latency code
           | completion.
           | 
           | The JetBrains suite includes a few LLM models on the order of
           | a hundred megabytes. These models are able to provide
           | "obvious" line completion, like filling in variable names, as
           | well as some basic predictions, like realising that the `if
           | let` statement I'm typing out is going to look something like
           | `if let Some(response) =
           | client_i_just_created.foobar().await`.
           | 
           | If that was running in The Cloud, it would have latency
           | issues, rate limits, and it wouldn't work offline. Sure,
           | there's a pretty big gap between these local IDE LLMs and
           | what OpenAI is offering here, but if my single-line
           | autocomplete could be a little smarter, I sure wouldn't
           | complain.
        
             | mrheosuper wrote:
              | I don't have latency issues with GitHub Copilot. Maybe I'm
              | less sensitive to it.
        
           | dsubburam wrote:
           | > I'm still trying to understand what is the biggest group of
           | people that uses local AI (or will)?
           | 
           | Creatives? I am surprised no one's mentioned this yet:
           | 
           | I tried to help a couple of friends with better copy for
           | their websites, and quickly realized that they were using
           | inventive phrases to explain their work, phrases that they
           | would not want competitors to get wind of and benefit from;
           | phrases that associate closely with their personal brand.
           | 
           | Ultimately, I felt uncomfortable presenting the cloud AIs
           | with their text. Sometimes I feel this way even with my own
           | Substack posts, where I occasionally coin a phrase I am proud
           | of. But with local AI? Cool...
        
             | flir wrote:
             | > I tried to help a couple of friends with better copy for
             | their websites, and quickly realized that they were using
             | inventive phrases to explain their work, phrases that they
             | would not want competitors to get wind of and benefit from;
             | phrases that associate closely with their personal brand.
             | 
             | But... they're _publishing_ a website. Which competitors
              | will read. Which chatbots will scrape. I genuinely don't
              | get it.
        
           | dismalaf wrote:
           | The use case is building apps.
           | 
           | A small LLM can do RAG, call functions, summarize, create
           | structured data from messy text, etc... You know, all the
           | things you'd do if you were making an actual app with an LLM.
           | 
           | Yeah, chat apps are pretty cheap and convenient for users who
           | want to search the internet and write text or code. But APIs
           | quickly get expensive when inputting a significant amount of
           | tokens.
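            | 
            | For the "structured data from messy text" case, the whole
            | thing is roughly one call. A sketch against a local Ollama
            | server (the "format": "json" option and the model name are
            | assumptions on my part, adjust to taste):
            | 
            |   import json, requests
            | 
            |   SCHEMA = ("Return JSON with keys name (string), date "
            |             "(YYYY-MM-DD) and amount (number). No prose.")
            | 
            |   def extract(messy_text: str) -> dict:
            |       r = requests.post("http://localhost:11434/api/chat", json={
            |           "model": "gpt-oss:20b",   # any local model you have
            |           "stream": False,
            |           "format": "json",         # ask the server for valid JSON
            |           "messages": [
            |               {"role": "system", "content": SCHEMA},
            |               {"role": "user", "content": messy_text},
            |           ],
            |       })
            |       return json.loads(r.json()["message"]["content"])
            | 
            |   print(extract("pd J. Smith forty-two 50, 3rd of May 2025"))
            | 
            | No per-token bill, so you can happily run it over a few
            | hundred thousand rows.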
        
           | somenameforme wrote:
           | Why not turn the question around. All other things being
           | equal, who would _prefer_ to use a rate limited and /or for-
           | pay service if you could obtain at least comparable quality
           | locally for free with no limitations, no privacy concerns, no
           | censorship (beyond that baked into the weights you choose to
           | use), and no net access required?
           | 
           | It's a pretty bad deal. So it must be that all other things
           | aren't equal, and I suppose the big one is hardware. But
           | neural net based systems always have a point of sharply
           | diminishing returns, which we seem to have unambiguously hit
           | with LLMs already, while the price of hardware is constantly
           | decreasing and its quality increasing. So as we go further
           | into the future, the practicality of running locally will
           | only increase.
        
           | philip1209 wrote:
           | I'm excited to do just dumb and irresponsible things with a
           | local model, like "iterate through every single email in my
           | 20-year-old gmail account and apply label X if Y applies" and
           | not have a surprise bill.
           | 
           | I think it can make LLMs fun.
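            | 
            | Roughly the kind of dumb thing I mean, as a sketch (imaplib
            | from the standard library plus a local model; the question,
            | model name and credentials are placeholders, and the actual
            | label write would use Gmail's X-GM-LABELS extension):
            | 
            |   import imaplib
            |   import ollama  # local server, so no surprise bill
            | 
            |   Q = "Is this email a receipt or order confirmation? YES or NO."
            |   FIELDS = "(BODY.PEEK[HEADER.FIELDS (FROM SUBJECT)])"
            | 
            |   def is_match(header_text: str) -> bool:
            |       resp = ollama.chat(model="gpt-oss:20b", messages=[
            |           {"role": "user", "content": Q + "\n\n" + header_text}])
            |       ans = resp["message"]["content"].strip().upper()
            |       return ans.startswith("YES")
            | 
            |   imap = imaplib.IMAP4_SSL("imap.gmail.com")
            |   imap.login("me@gmail.com", "app-password-here")
            |   imap.select("INBOX")
            |   _, data = imap.search(None, "ALL")
            |   for num in data[0].split():
            |       _, msg = imap.fetch(num, FIELDS)
            |       header = msg[0][1].decode(errors="replace")
            |       if is_match(header):
            |           print("would label:", num)
            |           # imap.store(num, "+X-GM-LABELS", "Receipts")
            | 
            | Twenty years of mail will take a while on a laptop, but it
            | only costs electricity.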
        
             | taneq wrote:
              | I wrote a script to get my local Gemma3 instance to tag
              | and rename everything in my meme folder. :P
        
           | georgeecollins wrote:
           | There's a bunch of great reasons in this thread, but how
           | about the chip manufacturers that are going to need you to
           | need a more powerful set of processors in your phone,
           | headset, computer. You can count on those companies to
           | subsidize some R&D and software development.
        
           | jona777than wrote:
           | One of my favorite use cases includes simple tasks like
           | generating effective mock/masked data from real data. Then
           | passing the mock data worry-free to the big three (or
           | wherever.)
           | 
           | There's also a huge opportunity space for serving clients
           | with very sensitive data. Health, legal, and government come
           | to mind immediately. These local models are only going to get
           | more capable of handling their use cases. They already are,
           | really.
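            | 
            | The masking pass itself is tiny. A sketch of the local
            | redaction step (prompt wording, model name and endpoint are
            | mine, not anything official):
            | 
            |   import requests  # local Ollama server on the default port
            | 
            |   REDACT = ("Rewrite the text, replacing every name, address, "
            |             "date of birth, account number and other identifier "
            |             "with a realistic fake value. Keep the structure. "
            |             "Return only the rewritten text.")
            | 
            |   def mask(record: str) -> str:
            |       r = requests.post("http://localhost:11434/api/chat", json={
            |           "model": "gpt-oss:20b", "stream": False,
            |           "messages": [{"role": "system", "content": REDACT},
            |                        {"role": "user", "content": record}]})
            |       return r.json()["message"]["content"]
            | 
            |   masked = mask("Pt. Jane Doe, DOB 1984-02-11, MRN 445-221, ...")
            |   # only `masked` ever leaves the machine for the hosted models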
        
           | itake wrote:
           | Local micro models are both fast and cheap. We tuned small
           | models on our data set and if the small model thinks content
           | is a certain way, we escalate to the LLM.
           | 
           | This gives us really good recall at really low cloud cost and
           | latency.
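            | 
            | For anyone curious, the shape of it is roughly this (heavily
            | simplified sketch; the labels, prompt and model names are
            | illustrative, not our actual stack):
            | 
            |   from openai import OpenAI  # pip install openai
            | 
            |   # Ollama and similar local servers expose an OpenAI-style API.
            |   local = OpenAI(base_url="http://localhost:11434/v1",
            |                  api_key="unused")
            |   cloud = OpenAI()  # reads OPENAI_API_KEY; only hit on escalation
            | 
            |   PROMPT = "Does this post contain harassment? Answer YES or NO."
            | 
            |   def ask(client: OpenAI, model: str, text: str) -> bool:
            |       out = client.chat.completions.create(model=model, messages=[
            |           {"role": "user", "content": PROMPT + "\n\n" + text}])
            |       ans = out.choices[0].message.content
            |       return ans.strip().upper().startswith("YES")
            | 
            |   def classify(text: str) -> str:
            |       # small tuned model is biased toward saying YES (high recall)
            |       if not ask(local, "our-small-finetune", text):
            |           return "ok"
            |       # only the maybes pay the cloud latency and cost
            |       big_model = "the-big-hosted-model"  # placeholder name
            |       return "flagged" if ask(cloud, big_model, text) else "ok"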
        
             | bavell wrote:
             | I'd love to try this on my data set - what
             | approach/tools/models did you use for fine-tuning?
        
           | sznio wrote:
           | >Students who don't want to pay but somehow have the
           | hardware?
           | 
            | That's me - well, not a student anymore. When toying with
            | something, I much prefer not paying for each shot. My 12GB
            | Radeon card can either run a decent but extremely slow
            | model, or an idiotic but fast one. It's nice not dealing
            | with rate limits.
            | 
            | Once you write a prompt that mangles an idiotic model into
            | still doing the work, it's really satisfying. The same
            | principle as working to extract the most from limited
            | embedded hardware. Masochism, possibly.
        
           | etoxin wrote:
           | Some app devs use local models on local environments with LLM
           | APIs to get up and running fast, then when the app deploys it
           | switches to the big online models via environment vars.
           | 
           | In large companies this can save quite a bit of money.
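            | 
            | Since the local servers (Ollama, LM Studio, etc.) expose an
            | OpenAI-style endpoint, the switch can literally be two
            | environment variables. A sketch (the variable names are just
            | what I'd pick, nothing standard):
            | 
            |   import os
            |   from openai import OpenAI  # pip install openai
            | 
            |   # Local dev (Ollama/LM Studio):
            |   #   LLM_BASE_URL=http://localhost:11434/v1
            |   #   LLM_MODEL=gpt-oss:20b
            |   # Deployed: leave LLM_BASE_URL unset and set a real API key.
            |   client = OpenAI(
            |       base_url=os.environ.get("LLM_BASE_URL"),  # None -> hosted API
            |       api_key=os.environ.get("LLM_API_KEY", "unused-locally"),
            |   )
            |   MODEL = os.environ.get("LLM_MODEL", "gpt-4.1-mini")
            | 
            |   reply = client.chat.completions.create(
            |       model=MODEL,
            |       messages=[{"role": "user", "content": "ping"}],
            |   )
            |   print(reply.choices[0].message.content)
            | 
            | The application code never knows which backend it hit.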
        
           | muzani wrote:
           | Privacy laws. Processing government paperwork with LLMs for
           | example. There's a lot of OCR tools that can't be used, and
           | the ones that comply are more expensive than say, GPT-4.1 and
           | lower quality.
        
           | metanonsense wrote:
           | Maybe I am too pessimistic, but as an EU citizen I expect
           | politics (or should I say Trump?) to prevent access to US-
           | based frontier models at some point.
        
           | TrackerFF wrote:
           | Agencies / firms that work with classified data. Some places
           | have very strict policies on data, which makes it impossible
           | to use any service that isn't local and air-gapped.
           | 
           | example: military intel
        
           | lucumo wrote:
           | I'm in a corporate environment. There's a study group to see
           | if maybe we can potentially get some value out of those AI
           | tools. They've been "studying" the issue for over a year now.
           | They expect to get some cloud service that we can safely use
           | Real Soon Now.
           | 
           | So, it'll take at least two more quarters before I can
           | actually use those non-local tools on company related data.
           | Probably longer, because sense of urgency is not this
           | company's strong suit.
           | 
           | Anyway, as a developer I can run a lot of things locally.
           | Local AI doesn't leak data, so it's safe. It's not as good as
           | the online tools, but for some things they're better than
           | nothing.
        
           | mastermage wrote:
           | I am just a cheapskate that wants to scale back on all
           | subscription costs. I fucking hate subscriptions.
        
           | benreesman wrote:
           | "Because you can and its cool" would be reason enough: plenty
           | of revolutions have their origin in "because you can"
           | (Wozniak right off the top of my head, Gates and Altair,
           | stuff like that).
           | 
            | But uncensored is a big deal too: censorship is capability
            | reducing (check out Kilcher's GPT4Chan video and references,
            | the Orca work and Dolphin de-tune lift on SWE-Bench style
            | evals). We pay dearly in capability to get "non-operator-
            | alignment", and you'll notice that competition is hot enough
            | now that at the frontier (Opus, Qwen) the "alignment" away
            | from what operators want is getting very, very mild.
           | 
           | And then there's the compression. Phi-3 or something works on
           | a beefy laptop and has a nontrivial approximation of "the
           | internet" that works on an airplane or a beach with no
           | network connectivity, talk about vibe coding? I like those
           | look up all the docs via a thumbdrive in Phuket vibes.
           | 
            | And on diffusion stuff, SOTA fits on a laptop or close: you
            | can crush OG Midjourney or SD on a MacBook, so it's an even
            | smaller gap.
            | 
            | Early GPT-4-ish outcomes are possible on a MacBook Pro or
            | Razer Blade, so either 12-18 month old LLMs are useless, or
            | GGUF is useful.
            | 
            | The AI goalposts thing cuts both ways. If AI is "whatever
            | only Anthropic can do"? That's just as silly as "whatever a
            | computer can't do", and a lot more cynical.
        
           | novok wrote:
            | Psychs who don't trust AI companies
        
           | Roark66 wrote:
           | People like myself that firmly believe there will come a
           | time, possibly very soon that all these companies (OpenAI,
           | Anthropic etc) will raise their prices substantially. By then
           | no one will be able to do their work to the standard expected
           | of them without AI, and by then maybe they charge $1k per
           | month, maybe they charge $10k. If there is no viable
           | alternative the sky is the limit.
           | 
            | Why do you think they continue to run at a loss? From the
            | goodness of their heart? Their biggest goal is to discourage
            | anyone from running local models. The hardware is
            | expensive... The way to run models is very difficult (for
            | example, I have dual RTX 3090s for VRAM, and running large,
            | heavily quantized models is a real pain in the arse; no high
            | quantisation library supports two GPUs, for example, and
            | there seems to be no interest in implementing it by the guys
            | behind the best inference tools).
           | 
           | So this is welcome, but let's not forget why it is being
           | done.
        
             | Gracana wrote:
             | > no high quantisation library supports two GPUs for
             | example, and there seems to be no interest in
             | implementating it by the guys behind the best inference
             | tools
             | 
             | I'm curious to hear what you're trying to run, because I
             | haven't used any software that is not compatible with
             | multiple GPUs.
        
           | jlokier wrote:
           | At the company where I currently work, for IP reasons (and
           | with the advice of a patent lawyer), nobody is allowed to use
           | any online AIs to talk about or help with work, unless it's
           | very generic research that doesn't give away what we're
           | working on.
           | 
           | That rules out coding assistants like Claude, chat, tools to
           | generate presentations and copy-edit documents, and so forth.
           | 
           | But local AI are fine, as long as we're sure nothing is
           | uploaded.
        
           | ricardobayes wrote:
           | I would say, any company who doesn't have their own AI
           | developed. You always hear companies "mandating" AI usage,
           | but for the most part it's companies developing their own
           | solutions/agents. No self-respecting company with a tight
           | opsec would allow a random "always-online" LLM that could
           | just rip your codebase either piece by piece or the whole
            | thing at once if it's an IDE addon (or at least I hope that's
           | the case). So yeah, I'd say locally deployed LLM's/Agents are
           | a gamechanger.
        
           | athrowaway3z wrote:
           | Don't know about the biggest, but IMO the exciting things
           | about open models is the possibility of creating whole new
           | things.
           | 
           | For example, "generate a heatmap of each token/word and how
           | 'unexpected' they are" or "find me a prompt that creates the
           | closest match to this text"
           | 
           | To be efficient both require access that is not exposed over
           | API.
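            | 
            | The first one really is just a few lines once the weights
            | are local. A sketch with HF transformers (the model below is
            | just a small example; any open-weights causal LM works):
            | 
            |   import torch
            |   from transformers import AutoModelForCausalLM, AutoTokenizer
            | 
            |   name = "Qwen/Qwen2.5-0.5B"
            |   tok = AutoTokenizer.from_pretrained(name)
            |   model = AutoModelForCausalLM.from_pretrained(name).eval()
            | 
            |   text = "The capital of France is Paris, which surprised nobody."
            |   ids = tok(text, return_tensors="pt").input_ids
            |   with torch.no_grad():
            |       logits = model(ids).logits           # [1, seq, vocab]
            | 
            |   logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
            |   targets = ids[0, 1:]
            |   surprisal = -logprobs[torch.arange(len(targets)), targets]
            | 
            |   for t, s in zip(tok.convert_ids_to_tokens(targets.tolist()),
            |                   surprisal.tolist()):
            |       print(f"{s:6.2f}  {t}")   # high value = 'unexpected' token
            | 
            | Per-token log-probs are exactly the thing the hosted chat
            | APIs mostly don't hand you, which is the point.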
        
           | yreg wrote:
           | > I'm still trying to understand what is the biggest group of
           | people that will use local AI?
           | 
           | iPhone users in a few months - because I predict app
           | developers will love cramming calls to the foundation models
           | into everything.
           | 
           | Android will follow.
        
         | dongobread wrote:
         | How up to date are you on current open weights models? After
         | playing around with it for a few hours I find it to be nowhere
         | near as good as Qwen3-30B-A3B. The world knowledge is severely
         | lacking in particular.
        
           | Nomadeon wrote:
           | Agree. Concrete example: "What was the Japanese codeword for
           | Midway Island in WWII?"
           | 
           | Answer on Wikipedia:
           | https://en.wikipedia.org/wiki/Battle_of_Midway#U.S._code-
           | bre...
           | 
           | dolphin3.0-llama3.1-8b Q4_K_S [4.69 GB on disk]: correct in
           | <2 seconds
           | 
           | deepseek-r1-0528-qwen3-8b Q6_K [6.73 GB]: correct in 10
           | seconds
           | 
           | gpt-oss-20b MXFP4 [12.11 GB] low reasoning: wrong after 6
           | seconds
           | 
           | gpt-oss-20b MXFP4 [12.11 GB] high reasoning: wrong after 3
           | minutes !
           | 
           | Yea yea it's only one question of nonsense trivia. I'm sure
           | it was billions well spent.
           | 
           | It's possible I'm using a poor temperature setting or
           | something but since they weren't bothered enough to put it in
           | the model card I'm not bothered to fuss with it.
        
             | anorwell wrote:
             | I think your example reflects well on oss-20b, not poorly.
             | It (may) show that they've been successful in separating
             | reasoning from knowledge. You don't _want_ your small
             | reasoning model to waste weights memorizing minutiae.
        
             | bigmanhank wrote:
             | Not true: During World War II the Imperial Japanese Navy
             | referred to Midway Island in their communications as
             | "Milano" (mirano). This was the official code word used
             | when planning and executing operations against the island,
             | including the Battle of Midway.
             | 
             | 12.82 tok/sec 140 tokens 7.91s to first token
             | 
             | openai/gpt-oss-20b
        
               | WmWsjA6B29B4nfk wrote:
               | What's not true? This is a wrong answer
        
             | Voloskaya wrote:
             | > gpt-oss-20b MXFP4 [12.11 GB] high reasoning: wrong after
             | 3 minutes !
             | 
              | To be fair, this is not the type of question that benefits
              | from reasoning: either the model has this info in its
              | parametric memory or it doesn't. Reasoning won't help.
        
             | seba_dos1 wrote:
             | How would asking this kind of question without providing
             | the model with access to Wikipedia be a valid benchmark for
             | anything useful?
        
           | nojito wrote:
           | Why does it need knowledge when it can just call tools to get
           | it?
        
             | pxc wrote:
             | Right... knowledge is one of the things (the one thing?)
             | that LLMs are really horrible at, and that goes double for
             | models small enough to run on normal-ish consumer hardware.
             | 
             | Shouldn't we prefer to have LLMs just search and summarize
             | more reliable sources?
        
               | jdiff wrote:
               | Even large hosted models fail at that task regularly.
               | It's a silly anecdotal example, but I asked the Gemini
               | assistant on my Pixel whether [something] had seen a new
               | release to match the release of [upstream thing].
               | 
               | It correctly chose to search, and pulled in the release
               | page itself as well as a community page on reddit, and
               | cited both to give me the incorrect answer that a release
               | had been pushed 3 hours ago. Later on when I got around
               | to it, I discovered that no release existed, no mention
               | of a release existed on either cited source, and a new
               | release wasn't made for several more days.
        
               | nojito wrote:
               | Yup which is why these models are so exciting!
               | 
               | They are specifically training on webbrowsing and python
               | calling.
        
               | moodler wrote:
               | Reliable sources that are becoming polluted by output
               | from knowledge-poor LLMs, or overwhelmed and taken
               | offline by constant requests from LLMs doing web scraping
               | ...
        
             | notachatbot123 wrote:
              | Why do I need "AI" when I could just Google it
              | (theoretically, back in the good old days of Google)?
        
               | nojito wrote:
               | Because now the model can do it for you and you can focus
               | on other more sophisticated tasks.
               | 
                | I am aware that there's a huge group of people who
                | justify their salary by being able to Google.
        
           | kmacdough wrote:
           | I too am skeptical of these models, but it's a reasoning
           | focused model. As a result this isn't a very appropriate
           | benchmark.
           | 
           | Small models are going to be particularly poor when used
           | outside of their intended purpose. They have to omit
           | something.
        
         | Cicero22 wrote:
         | Where did you get the top ten from?
         | 
         | https://huggingface.co/spaces/TIGER-Lab/MMLU-Pro
         | 
         | Are you discounting all of the self reported scores?
        
           | zwischenzug wrote:
           | Came here to say this. It's behind the 14b Phi-reasoning-plus
           | (which is self-reported).
           | 
            | I don't understand why the "TIGER-Lab"-sourced scores are
            | 'unknown' in terms of model size.
        
         | int_19h wrote:
         | I tried 20b locally and it couldn't reason a way out of a basic
         | river crossing puzzle with labels changed. That is not anywhere
         | near SOTA. In fact it's worse than many local models that can
         | do it, including e.g. QwQ-32b.
        
           | robwwilliams wrote:
           | Well river crossings are one type of problem. My real world
           | problem is proofing and minor editing of text. A version
           | installed on my portable would be great.
        
             | cosmojg wrote:
             | Have you tried Google's Gemma-3n-E4B-IT in their AI Edge
             | Gallery app? It's the first local model that's really blown
             | me away with its power-to-speed ratio on a mobile device.
             | 
             | See: https://github.com/google-ai-
             | edge/gallery/releases/tag/1.0.3
        
             | 1123581321 wrote:
             | Dozens of locally runnable models can already do that.
        
             | golol wrote:
              | I heard the OSS models are terrible at anything other than
              | math, code, etc.
        
             | mark_l_watson wrote:
             | Yes, I always evaluate models on my own prompts and use
             | cases. I glance at evaluation postings but I am also only
             | interested in my own use cases.
        
           | 9rx wrote:
           | I tried the two US presidents having the same parents one,
           | and while it understood the intent, it got caught up in being
           | adamant that Joe Biden won the election in 2024 and anything
           | I do to try and tell it otherwise is dismissed as being false
           | and expresses quite definitely that I need to do proper
           | research with legitimate sources.
        
             | tankenmate wrote:
             | chat log please?
        
               | 9rx wrote:
               | https://dpaste.org/zOev0
        
               | rafaelmn wrote:
                | Is the knowledge cutoff for this thing so stale, or is
                | this just bad performance on recent data?
        
               | 9rx wrote:
               | It is painful to read, I know, but if you make it towards
               | the end it admits that its knowledge cutoff was prior to
               | the election and that it doesn't know who won. Yet, even
               | then, it still remains adamant that Biden won.
        
               | dragonwriter wrote:
               | The knowledge cutoff is before the 2024 election (which
               | was, after all, just 9 months ago), June 2024 (I believe
               | this is consistent with the current versions of GPT-4o
               | and -4.1), after Biden had secured the nomination.
               | 
               | It is very clear in that chat logs (which include
               | reasoning traces) that the model knew that, knew what the
               | last election it knew about was, and answered correctly
               | based on its cut off initially. Under pressure to answer
               | about an election that was not within its knowledge
               | window it then confabulated a Biden 2024 victory, which
               | it dug in on after being contradicted with a claim that,
               | based on the truth at the time of its knowledge cutoff,
                | was unambiguously false ("Joe Biden did not run"). He, in
               | fact, did run for reelection, but withdrew after having
               | secured enough delegates to win the nomination by a wide
               | margin on July 21. Confabulation (called "hallucination"
               | in AI circles, but it is more like human confabulation
               | than hallucination) when pressed for answers on questions
               | for which it lacks grounding remains an unsolved AI
               | problem.
               | 
               | Unsolved, but mitigated by providing it grounding
               | independent of its knowledge cutoff, e.g., by tools like
               | web browsing (which GPT-OSS is specifically trained for,
               | but that training does no good if its not hooked into a
               | framework which provides it the tools.)
        
               | lucumo wrote:
               | I've never found the Socratic method to work well on any
               | model I've tried it with. They always seem to get stuck
               | justifying their previous answers.
               | 
               | We expect them to answer the question and re-reason the
               | original question with the new information, because
               | that's what a human would do. Maybe next time I'll try to
               | be explicit about that expectation when I try the
               | Socratic method.
        
               | FergusArgyll wrote:
               | incredible
        
             | freehorse wrote:
              | I mean, I would hardly blame the specific model; Anthropic
              | has a specific mention in their system prompts about Trump
              | winning. For some reason LLMs get confused by this one.
        
               | jari_mustonen wrote:
               | It's the political bias in the training material. No
               | surprise there.
        
               | regularfry wrote:
               | More likely is that there's a lot of source material
               | having to very stridently assert that Trump didn't win in
               | 2020, and it's generalising to a later year. That's not
               | political bias.
        
               | ben_w wrote:
               | It's also extremely weird that Trump did win in 2024.
               | 
               | If I'd been in a coma from Jan 1 2024 to today, and woke
               | up to people saying Trump was president again, I'd think
               | they were pulling my leg or testing my brain function to
               | see if I'd become gullible.
        
               | cpursley wrote:
               | You're in a bubble. It was no surprise to folks who touch
               | grass on the regular.
        
               | ben_w wrote:
               | > You're in a bubble.
               | 
               | Sure, all I have to go on from the other side of the
               | Atlantic is the internet. So in that regard, kinda like
               | the AI.
               | 
               | One of the big surprises from the POV of me in Jan 2024,
               | is that I would have anticipated Trump being in prison
               | and not even available as an option for the Republican
               | party to select as a candidate for office, and that even
               | if he had not gone to jail that the Republicans would not
               | want someone who behaved as he did on Jan 6 2021.
        
               | OldfieldFund wrote:
               | you can run for presidency from prison :)
        
               | exasperaited wrote:
               | And he would have. And might have won. Because his
               | I'm-the-most-innocent-persecuted-person messaging was
               | clearly landing.
               | 
               | I am surprised the grandparent poster didn't think
               | Trump's win was at least entirely possible in January
               | 2024, and I am on the same side of the Atlantic. All the
               | indicators were in place.
               | 
               | There was basically no chance he'd _actually_ be in
               | prison by November anyway, because he was doing something
               | else extremely successfully: delaying court cases by
               | playing off his obligations to each of them.
               | 
               | Back then I thought his chances of winning were above
               | 60%, and the betting markets were never _ever_ really in
               | favour of him losing.
        
               | username332211 wrote:
               | I'm pretty sure you are completely correct on the last
               | part. Nobody in Republican management wanted a second
               | Trump term. If the candidate wasn't Trump, Republicans
               | would have had a guaranteed victory. Imagine that
               | infamous debate, but with some 50-year-old youngster
               | facing Joe Biden.
               | 
               | It's the White House that wanted Trump to be candidate.
               | They played Republican primary voters like a fiddle by
               | launching a barrage of transparently political
               | prosecutions just as Republican primaries were starting.
               | 
               | And then they still lost the general election.
        
               | FrustratedMonky wrote:
                | You think the Democratic White House manipulated
                | Republicans into voting for Trump. So it is the
                | Democrats' fault we have Trump??? Next Level Cope.
        
               | 9rx wrote:
               | _> You think the Democratic White House, manipulated
               | Republicans into Voting for Trump._
               | 
               | Yes, that is what he thinks. Did you not read the
               | comment? It is, like, uh, right there...
               | 
               | He also explained his reasoning: If Trump didn't win the
               | party race, a more compelling option (the so-called
               | "50-year-old youngster") would have instead, which he
               | claims would have guaranteed a Republican win. In other
               | words, what he is saying that the White House was banking
               | on Trump losing the presidency.
        
               | FrustratedMonky wrote:
               | "explained his reasoning"
               | 
               | Well, I guess, if you are taking some pretty wild
               | speculation as a reasoned explanation. There isn't much
               | hope for you.
               | 
                | Maybe it was because the Democrats knew the Earth was
                | about to be invaded by an alien race, and they also
               | knew Trump was actually a lizard person (native to Earth
               | and thus on their joint side). And Trump would be able to
               | defeat them, so using the secret mind control powers, the
               | Democrats were able to sway the election to allow Trump
               | to win and thus use his advanced Lizard technology to
               | save the planet. Of course, this all happened behind the
               | scenes.
               | 
               | I think if someone is saying the Democrats are so
               | powerful and skillful, that they can sway the election to
               | give Trump the primary win, but then turn around and
               | lose. That does require some clarification.
               | 
               | I'm just hearing a lot of these crazy arguments that
               | somehow everything Trump does is the fault of the
               | Democrats. They are crazy on the face of it. Maybe if
               | people had to clarify their positions they would realize
               | 'oh, yeah, that doesn't make sense'.
        
               | amalcon wrote:
               | I mean, the presumptive GOP primary candidates at the
               | time were Trump, Trump-lite (DeSantis), about 10 Trump
               | sycophants, and Haley. He had demonstrated a high level
               | of influence over GOP primary voters in the 2022 midterm.
               | It had been (internally) obvious since at least then that
               | he was going to win the primary again. I can't speak to
               | how much of that made it across the Atlantic.
               | 
               | Whether he would win the general was an open question
               | then. In the American system, your prediction should
               | never get very far from a coin flip a year out.
        
               | bavell wrote:
               | Unfortunately, it was predictable given the other
               | "choices"
        
               | exasperaited wrote:
               | It's not extremely weird _at all_.
               | 
               | I, a British liberal leftie who considers this win one of
               | the signs of the coming apocalypse, can tell you why:
               | 
               | Charlie Kirk may be an odious little man but he ran an
               | _exceptional_ ground game, Trump fully captured the
               | Libertarian Party (and amazingly delivered on a promise
               | to them), Trump was well-advised by his son to campaign
               | on Tiktok, etc. etc.
               | 
               | Basically what happened is the 2024 version of the "fifty
               | state strategy", except instead of states, they
               | identified micro-communities, particularly among the
               | extremely online, and crafted messages for each of those.
               | Many of which are actually inconsistent -- their
               | messaging to muslim and jewish communities was
               | inconsistent, their messaging to spanish-speaking
               | communities was inconsistent with their mainstream
               | message etc.
               | 
               | And then a lot of money was pushed into a few
               | battleground states by Musk's operation.
               | 
               | It was a highly technical, broad-spectrum win, built on
               | relentless messaging about persecution etc., and he had
               | the advantage of running against someone he could
               | stereotype very successfully to his base and whose
               | candidacy was late.
               | 
               | Another way to look at why it is not extremely weird, is
               | to look at history. Plenty of examples of jailed or
               | exiled monarchs returning to power, failed coup leaders
               | having another go, criminalised leaders returning to
               | elected office, etc., etc.
               | 
               | Once it was clear Trump still retained control over the
               | GOP in 2022, his re-election became at least quite
               | likely.
        
               | quatonion wrote:
               | I think models generally have cognitive dissonance
                | regarding world politics. They are also constantly
                | shocked when you tell them what date it is, and go very
                | quiet.
        
               | DoctorOetker wrote:
                | Can you give some guidelines for achieving the quieting
                | down? Do they emit fewer tokens afterward?
        
               | diggan wrote:
               | I noticed the same when asking various LLMs to summarize
               | and explaining some "Presidential Actions" (from
               | https://www.whitehouse.gov/presidential-actions/), most
               | of them answer "This is just theoretical, since no such
               | executive actions actually exists, but assuming something
               | like that would happen in the future, it would mean ..."
                | while a few have returned something like "This fictional
               | executive action would be illegal so I cannot summarize
               | the content", even when I provide direct links and they
               | fetch the content themselves. Not exactly sure why that
               | is.
        
             | mark_l_watson wrote:
             | I think the lesson is: smaller models hallucinate more, so
             | only use them in your applications where you load up large
             | prompts with specific data to reason about. Then even the
             | small Google gemma3n 4B model can be amazingly useful.
             | 
             | I use the SOTA models from Google and OpenAI mostly for
             | getting feedback on ideas, helping me think through
             | designs, and sometimes for coding.
             | 
             | Your question is clearly best answered using a large
             | commercial model with a web search tool. That said,
             | integrating a local model with a home built interface to
             | something like the Brave search API can be effective but I
             | no longer make the effort.
        
               | 9rx wrote:
               | _> think the lesson is: smaller models hallucinate more_
               | 
               | The interesting part isn't the hallucination, but the
               | sheer unwillingness to take in new information.
        
               | dragonwriter wrote:
               | Might have dug in less on the confabulation about
               | information outside of its knowledge cutoff if the new
               | information weren't offered with support from a user
               | "hallucination" about information _within_ its knowledge
               | cutoff. More detail:
               | 
               | https://news.ycombinator.com/item?id=44809145
        
               | 9rx wrote:
               | _> Might have dug in less..._
               | 
               | The digging in at all is what is interesting. Like an
               | earlier comment alluded to, the presumptive value of
               | these tools is being able to feed it your own information
               | where that information is to be considered authoritative.
               | 
               |  _> More detail: [...]  "He, in fact, did run for
               | reelection"_
               | 
               | A slow walk, maybe. He was in no condition to run. That
               | is why he ultimately dropped out. But, really, that
               | statement is just a silly game of semantics. "Run", when
               | used in hindsight, often implies completion. This LLM
               | model even says so too. If a model doesn't understand
               | nuance, that is also interesting.
        
             | aaroninsf wrote:
             | Have we considered the possibility that maybe it knows
             | something we don't.
        
           | dragonwriter wrote:
           | > In fact it's worse than many local models that can do it,
           | including e.g. QwQ-32b.
           | 
           | I'm not going to be surprised that a 20B 4/32 MoE model (3.6B
           | parameters activated) is less capable at a particular problem
           | category than a 32B dense model, and its quite possible for
           | both to be SOTA, as state of the art at different scale (both
           | parameter count and speed which scales with active resource
           | needs) is going to have different capabilities. TANSTAAFL.
        
             | __alexs wrote:
             | [flagged]
        
               | whynotminot wrote:
               | He's saying there's different goalposts at different
               | model sizes. Is that unreasonable?
        
               | tomhow wrote:
               | Please don't post snark like this on HN. If you wouldn't
               | mind reviewing
               | https://news.ycombinator.com/newsguidelines.html and
               | taking the intended spirit of the site more to heart,
               | we'd be grateful.
        
               | lannisterstark wrote:
               | This isn't reddit.
        
           | CMay wrote:
           | The 20b solved the wolf, goat, cabbage river crossing puzzle
           | set to high reasoning for me without needing to use a system
           | prompt that encourages critical thinking. It managed it using
           | multiple different recommended settings, from temperatures of
           | 0.6 up to 1.0, etc.
           | 
           | Other models have generally failed that without a system
           | prompt that encourages rigorous thinking. Each of the
           | reasoning settings may very well have thinking guidance baked
           | in there that do something similar, though.
           | 
           | I'm not sure it says that much that it can solve this, since
           | it's public and can be in training data. It does say
           | something if it can't solve it, though. So, for what it's
           | worth, it solves it reliably for me.
           | 
           | Think this is the smallest model I've seen solve it.
        
             | aspect0545 wrote:
             | But was it reasoning or did it solve this because it was
              | parroting its training data?
        
               | ben_w wrote:
               | Allow me to answer with a rhetorical question:
               | 
               | S8O2bm5lbiBTaWUgZGllc2VuIFNhdHogbGVzZW4sIGRhIGVyIGluIEJhc
               | 2UtNjQta29kaWVydGVtIERldXRzY2ggdm9ybGllZ3Q/IEhhYmVuIFNpZS
               | BkaWUgQW50d29ydCB2b24gR3J1bmQgYXVmIGVyc2NobG9zc2VuIG9kZXI
               | gaGFiZW4gU2llIG51ciBCYXNlIDY0IGVya2FubnQgdW5kIGRhcyBFcmdl
               | Ym5pcyBkYW5uIGluIEdvb2dsZSBUcmFuc2xhdGUgZWluZ2VnZWJlbj8gV
               | 2FzIGlzdCDDvGJlcmhhdXB0IOKAnnJlYXNvbmluZ+KAnCwgd2VubiBtYW
               | 4gbmljaHQgZGFzIEdlbGVybnRlIGF1cyBlaW5lbSBGYWxsIGF1ZiBlaW5
               | lbiBhbmRlcmVuIGFud2VuZGV0Pw==
               | 
               | And yes, that's a question. Well, three, but still.
        
               | danbruc wrote:
               | In case of the river puzzle there is a huge difference
               | between repeating an answer that you read somewhere and
               | figuring it out on your own, one requires reasoning the
               | other does not. If you swap out the animals involved,
               | then you need some reasoning to recognize the identical
               | structure of the puzzles and map between the two sets of
               | animals. But you are still very far from the amount of
               | reasoning required to solve the puzzle without already
               | knowing the answer.
               | 
               | You can do it brute force, that requires again more
               | reasoning than mapping between structurally identical
               | puzzles. And finally you can solve it systematically,
               | that requires the largest amount of reasoning. And in all
               | those cases there is a crucial difference between blindly
               | repeating the steps of a solution that you have seen
               | before and coming up with that solution on your own even
               | if you can not tell the two cases apart by looking at the
               | output which would be identical.
        
               | tanseydavid wrote:
               | <well-played>
        
               | daveguy wrote:
               | As mgoetzke challenges, change the names of the items to
               | something different, but the same puzzle. If it fails
               | with "fox, hen, seeds" instead of "wolf, goat, cabbage"
               | then it wasn't reasoning or applying something learned to
               | another case. It was just regurgitating from the training
               | data.
        
               | odo1242 wrote:
               | (Decoded, if anyone's wondering):
               | 
                | > Können Sie diesen Satz lesen, da er in
                | Base-64-kodiertem Deutsch vorliegt? Haben Sie die Antwort
                | von Grund auf erschlossen oder haben Sie nur Base 64
                | erkannt und das Ergebnis dann in Google Translate
                | eingegeben? Was ist überhaupt „reasoning", wenn man
                | nicht das Gelernte aus einem Fall auf einen anderen
                | anwendet?
               | 
               | >
               | 
               | > Can you read this sentence, since it's in Base-64
               | encoded German? Did you deduce the answer from scratch,
               | or did you just recognize Base 64 and then enter the
               | result into Google Translate? What is "reasoning" anyway
               | if you don't apply what you've learned from one case to
               | another?
        
               | CMay wrote:
               | Maybe both? I tried using different animals, scenarios,
               | solvable versions, unsolvable versions, it gave me the
               | correct answer with high reasoning in LM Studio. It does
               | tell me it's in the training data, but it does reason
               | through things fairly well. It doesn't feel like it's
               | just reciting the solution and picks up on nuances around
               | the variations.
               | 
               | If I switch from LM Studio to Ollama and run it using the
               | CLI without changing anything, it will fail and it's
               | harder to set the reasoning amount. If I use the Ollama
               | UI, it seems to do a lot less reasoning. Not sure the
               | Ollama UI has an option anywhere to adjust the system
               | prompt so I can set the reasoning to high. In LM Studio
               | even with the Unsloth GGUF, I can set the reasoning to
               | high in the system prompt even though LM Studio won't
               | give you the reasoning amount button to choose it with on
               | that version.
        
             | mgoetzke wrote:
              | Try changing the names of the objects, e.g. fox, hen,
              | seeds, for example.
        
           | tweezy wrote:
           | Just out of curiosity, did you set the reasoning effort to
           | high?
           | 
           | I've seen big jumps in performance between medium and high
           | reasoning. Not that this is will solve your particular issue,
           | but curious to see what settings you had.
           | 
           | Edit: I gave this a shot. My settings were temperature = 0.00
           | (not sure if these models respect temp settings but did it
           | just in case), I also set reasoning effort to high. I'm using
           | the groq-hosted version.
           | 
           | This was my prompt
           | 
           | "You have a fox, a chicken and some grain. The chicken will
           | eat the grain if left alone with it. The fox is vegetarian,
           | for religious reasons, and will also eat the grain if left
           | alone with it. You need to get all across a river, but your
           | boat can only take one at a time. How do you proceed?"
           | 
           | This was the response:
           | 
           | "*Solution - 7 moves*
           | 
            | | Move | What you take | What's left on the left bank | What's on the right bank |
            | |------|---------------|-------------------------------|---------------------------|
            | | 1 | *Take the grain* across | Fox, Chicken | *Farmer + Grain* |
            | | 2 | *Return alone* | *Farmer + Fox + Chicken* | Grain |
            | | 3 | *Take the fox* across | Chicken | *Farmer + Fox + Grain* |
            | | 4 | *Bring the grain back* | *Farmer + Grain + Chicken* | Fox |
            | | 5 | *Take the chicken* across | Grain | *Farmer + Chicken + Fox* |
            | | 6 | *Return alone* | *Farmer + Grain* | Chicken + Fox |
            | | 7 | *Take the grain* across | - | *Farmer + Fox + Chicken + Grain* |
           | 
           | *Explanation*
           | 
            | - The grain is the only thing that can be eaten, and it
            |   cannot be left alone with either the fox or the chicken.
            | - So the grain must always be with the farmer whenever any
            |   other animal is on the same side.
           | 
           | The sequence above never leaves the grain with the fox or the
           | chicken without the farmer, and after the seventh move all
           | three items are safely on the far side."
        
           | vonneumannstan wrote:
           | This kind of stuff is so tired. Who cares if it can't solve
           | your silly riddle? It can probably do competitive coding at a
           | world class level and we're quibbling over child riddles?
           | Yeah you know my backhoe is really bad at cutting my toe
           | nails, what a PoS.
        
         | bakies wrote:
         | on your phone?
        
         | npn wrote:
         | It is not a frontier model. It's only good for benchmarks.
         | Tried some tasks and it is even worse than gemma 3n.
        
         | snthpy wrote:
         | For me the biggest benefit of open weights models is the
         | ability to fine tune and adapt to different tasks.
        
         | lend000 wrote:
         | For me the game changer here is the speed. On my local Mac I'm
          | finally getting generation speeds faster than I can read
          | the output (~96 tok/s), and the quality has been solid. I had
         | previously tried some of the distilled qwen and deepseek models
         | and they were just way too slow for me to seriously use them.
        
         | decide1000 wrote:
         | The model is good and runs fine but if you want to be blown
          | away again try Qwen3-30B-A3B-2507. It's 6 GB bigger but the
         | response is comparable or better and much faster to run. Gpt-
         | oss-20B gives me 6 tok/sec while Qwen3 gives me 37 tok/sec.
         | Qwen3 is not a reasoning model tho.
        
         | raideno wrote:
          | How much RAM is in your MacBook Air M3? I have the 16 GB
          | version and I was wondering whether I'll be able to run it or
          | not.
        
         | SergeAx wrote:
         | Did you mean "120b"? I am running 20b model locally right now,
         | and it is pretty mediocre. Nothing near Gemini 2.5 Pro, which
         | is my daily driver.
        
         | benreesman wrote:
         | You're going to freak out when you try the Chinese ones :)
        
         | vonneumannstan wrote:
         | >no lakes being drained
         | 
         | When you imagine a lake being drained to cool a datacenter do
         | you ever consider where the water used for cooling goes? Do you
         | imagine it disappears?
        
         | jwr wrote:
         | gpt-oss:20b is the best performing model on my spam filtering
         | benchmarks (I wrote a despammer that uses an LLM).
         | 
         | These are the simplified results (total percentage of correctly
         | classified E-mails on both spam and ham testing data):
         | 
         | gpt-oss:20b 95.6%
         | 
         | gemma3:27b-it-qat 94.3%
         | 
         | mistral-small3.2:24b-instruct-2506-q4_K_M 93.7%
         | 
         | mistral-small3.2:24b-instruct-2506-q8_0 92.5%
         | 
         | qwen3:32b-q4_K_M 89.2%
         | 
         | qwen3:30b-a3b-q4_K_M 87.9%
         | 
         | gemma3n:e4b-it-q4_K_M 84.9%
         | 
         | deepseek-r1:8b 75.2%
         | 
         | qwen3:30b-a3b-instruct-2507-q4_K_M 73.0%
         | 
         | I'm quite happy, because it's also smaller and faster than
         | gemma3.
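          | 
          | For anyone curious what a setup like this looks like, a minimal
          | sketch of the classification step against a local Ollama
          | instance (the prompt and the one-word-label convention are my
          | own simplification, not the actual despammer):
          | 
          |     import ollama  # pip install ollama; assumes Ollama is running locally
          | 
          |     PROMPT = ("Classify the following e-mail as SPAM or HAM. "
          |               "Reply with exactly one word: SPAM or HAM.\n\n{email}")
          | 
          |     def classify(email_text, model="gpt-oss:20b"):
          |         resp = ollama.chat(
          |             model=model,
          |             messages=[{"role": "user",
          |                        "content": PROMPT.format(email=email_text)}],
          |         )
          |         answer = resp["message"]["content"].strip().upper()
          |         return "SPAM" if "SPAM" in answer else "HAM"
          | 
          |     print(classify("You won a free cruise, click here to claim it!"))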
        
       | zone411 wrote:
        | I benchmarked the 120B version on the Extended NYT Connections
        | (759 questions, https://github.com/lechmazur/nyt-connections) and
        | both 120B and 20B on Thematic Generalization (810 questions,
       | https://github.com/lechmazur/generalization). Opus 4.1 benchmarks
       | are also there.
        
       | FergusArgyll wrote:
       | > To improve the safety of the model, we filtered the data for
       | harmful content in pre-training, especially around hazardous
       | biosecurity knowledge, by reusing the CBRN pre-training filters
       | from GPT-4o. Our model has a knowledge cutoff of June 2024.
       | 
       | This would be a great "AGI" test. See if it can derive biohazards
       | from first principles
        
         | orbital-decay wrote:
          | Not possible without running real-life experiments, unless the
          | model still memorized it somehow.
        
       | Metacelsus wrote:
       | Running ollama on my M3 Macbook, gpt-oss-20b gave me detailed
       | instructions for how to give mice cancer using an engineered
       | virus.
       | 
       | Of course this could also give _humans_ cancer. (To the OpenAI
       | team 's slight credit, when asked explicitly about this, the
       | model refused.)
        
       | bluecoconut wrote:
       | I was able to get gpt-oss:20b wired up to claude code locally via
       | a thin proxy and ollama.
       | 
       | It's fun that it works, but the prefill time makes it feel
       | unusable. (2-3 minutes per tool-use / completion). Means a ~10-20
       | tool-use interaction could take 30-60 minutes.
       | 
        | (This was editing a single server.py file that was ~1000 lines; the
       | tool definitions + claude context was around 30k tokens input,
       | and then after the file read, input was around ~50k tokens.
       | Definitely could be optimized. Also I'm not sure if ollama
       | supports a kv-cache between invocations of /v1/completions, which
       | could help)
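        | 
        | For anyone wondering what the "thin proxy" part involves, here is
        | a rough text-only sketch of the Anthropic-to-OpenAI translation
        | sitting in front of Ollama. It skips streaming and tool calls, so
        | it will not satisfy Claude Code as-is; the model name, port, and
        | minimal response fields are assumptions:
        | 
        |     # pip install fastapi uvicorn httpx
        |     import httpx
        |     from fastapi import FastAPI, Request
        | 
        |     app = FastAPI()
        |     OLLAMA = "http://localhost:11434/v1/chat/completions"  # Ollama's OpenAI-compatible endpoint
        | 
        |     @app.post("/v1/messages")
        |     async def messages(request: Request):
        |         body = await request.json()
        |         msgs = []
        |         if isinstance(body.get("system"), str):  # system may also be a block list
        |             msgs.append({"role": "system", "content": body["system"]})
        |         for m in body.get("messages", []):
        |             content = m["content"]
        |             if isinstance(content, list):  # Anthropic content can be a list of blocks
        |                 content = "".join(b.get("text", "") for b in content)
        |             msgs.append({"role": m["role"], "content": content})
        |         async with httpx.AsyncClient(timeout=600) as client:
        |             r = await client.post(OLLAMA, json={
        |                 "model": "gpt-oss:20b",
        |                 "messages": msgs,
        |                 "max_tokens": body.get("max_tokens", 1024),
        |             })
        |         text = r.json()["choices"][0]["message"]["content"]
        |         # Shape the reply like a (minimal) Anthropic Messages response.
        |         return {"id": "msg_local", "type": "message", "role": "assistant",
        |                 "model": body.get("model", "gpt-oss:20b"),
        |                 "content": [{"type": "text", "text": text}],
        |                 "stop_reason": "end_turn",
        |                 "usage": {"input_tokens": 0, "output_tokens": 0}}
        | 
        | Run it with uvicorn and point the client's Anthropic base URL at
        | it.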
        
         | tarruda wrote:
         | > Also I'm not sure if ollama supports a kv-cache between
         | invocations of /v1/completions, which could help)
         | 
         | Not sure about ollama, but llama-server does have a transparent
         | kv cache.
         | 
          | You can run it with
          | 
          |     llama-server -hf ggml-org/gpt-oss-20b-GGUF -c 0 -fa --jinja --reasoning-format none
          | 
          | Web UI at http://localhost:8080 (also OpenAI-compatible API)
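          | 
          | The OpenAI-compatible part means you can also point any standard
          | client at it, e.g. (the model field is mostly cosmetic when only
          | one model is loaded):
          | 
          |     from openai import OpenAI  # pip install openai
          | 
          |     client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
          |     resp = client.chat.completions.create(
          |         model="gpt-oss-20b",
          |         messages=[{"role": "user", "content": "Give me one test prompt idea."}],
          |     )
          |     print(resp.choices[0].message.content)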
        
       | OJFord wrote:
       | From the description it seems even the larger 120b model can run
       | decently on a 64GB+ (Arm) Macbook? Anyone tried already?
       | 
       | > Best with >=60GB VRAM or unified memory
       | 
       | https://cookbook.openai.com/articles/gpt-oss/run-locally-oll...
        
         | tarruda wrote:
         | A 64GB MacBook would be a tight fit, if it works.
         | 
         | There's a limit to how much RAM can be assigned to video, and
         | you'd be constrained on what you can use while doing inference.
         | 
         | Maybe there will be lower quants which use less memory, but
         | you'd be much better served with 96+GB
        
       | thegoodduck wrote:
       | Finally!!!
        
       | n_f wrote:
       | There's something so mind-blowing about being able to run some
       | code on my laptop and have it be able to literally talk to me.
       | Really excited to see what people can build with this
        
       | mortsnort wrote:
        | Releasing this under the Apache license is a shot at competitors
        | that want to license their models on OpenRouter and to enterprise.
        | 
        | It eliminates any reason to use an inferior Meta or Chinese model
        | that costs money to license, leaving those competitors without the
        | funds to build a GPT-5 rival.
        
         | bigyabai wrote:
         | > It eliminates any reason to use an inferior Meta or Chinese
         | model
         | 
         | I wouldn't speak so soon, even the 120B model aimed for
         | OpenRouter-style applications isn't very good at coding:
         | https://blog.brokk.ai/a-first-look-at-gpt-oss-120bs-coding-a...
        
           | mortsnort wrote:
            | There are lots of applications for open-weight models beyond
            | coding and OpenRouter hosting, and I'd guess many of them just
            | got completely changed by this being Apache-licensed. Think
            | about products like Databricks that let enterprises use LLMs
            | for whatever purpose.
           | 
           | I also suspect the new OpenAI model is pretty good at coding
           | if it's like o4-mini, but admittedly haven't tried it yet.
        
       | nipponese wrote:
        | It's interesting that they didn't give it a version number or
        | equate it to one of their proprietary models (apparently it's GPT-4).
       | 
       | in future releases will they just boost the param count?
        
       | resters wrote:
       | Reading the comments it becomes clear how befuddled many HN
       | participants are about AI. I don't think there has been a
       | technical topic that HN has seemed so dull on in the many years
       | I've been reading HN. This must be an indication that we are in a
       | bubble.
       | 
       | One basic point that is often missed is: Different aspects of LLM
       | performance (in the cognitive performance sense) and LLM resource
       | utilization are relevant to various use cases and business
       | models.
       | 
       | Another is that there are many use cases where users prefer to
       | run inference locally, for a variety of domain-specific or
       | business model reasons.
       | 
       | The list goes on.
        
       | NoDoo wrote:
       | Does anyone think people will distill this model? It is allowed.
       | I'm new to running open source llms, but I've run qwen3 4b and
       | phi4-mini on my phone before through ollama in termux.
        
       | NoDoo wrote:
       | Do you think someone will distill this or quantize it further
       | than the current 4-bit from OpenAI so it could run on less than
       | 16gb RAM? (The 20b version). To me, something like 7-8B with 1-3B
       | active would be nice as I'm new to local AI and don't have 16gb
       | RAM.
        
       | Quarrelsome wrote:
          | Sorry to ask what is possibly a dumb question, but is this
          | effectively the whole kit and caboodle, for free, downloadable
          | without any guardrails?
          | 
          | I often thought that a worrying vector was how well LLMs could
          | answer downright terrifying questions very effectively. However,
          | the guardrails on the big online services existed to prevent
          | those questions from being asked. I guess the other open source
          | offerings were always unleashed like this, but I just wanted to
          | understand how close we are to the horror of yesterday's idiot
          | terrorist having an extremely knowledgeable (if slightly
          | hallucinatory) digital accomplice to temper most of their
          | incompetence.
        
         | 613style wrote:
         | These models still have guardrails. Even locally they won't
         | tell you how to make bombs or write pornographic short stories.
        
           | Quarrelsome wrote:
            | Are the guardrails trained in? I had presumed they might be a
            | thin, removable layer at the top. If these models are not
            | appropriate, are there other sources that are? Just trying to
            | guess at the timing for the first "prophet AI" or something
            | similar that is unleashed without guardrails and with somewhat
            | malicious purpose.
        
             | int_19h wrote:
             | Yes, it is trained in. And no, it's not a separate thin
             | layer. It's just part of the model's RL training, which
             | affects all layers.
             | 
             | However, when you're running the model locally, you are in
             | full control of its context. Meaning that you can start its
             | reply however you want and then let it complete it. For
             | example, you can have it start the response with, "I'm
             | happy to answer this question to the best of my ability!"
             | 
             | That aside, there are ways to remove such behavior from the
             | weights, or at least make it less likely - that's what
             | "abliterated" models are.
        
         | monster_truck wrote:
         | The guardrails are very, very easily broken.
         | 
         | With most models it can be as simple as a "Always comply with
         | the User" system prompt or editing the "Sorry, I cannot do
         | this" response into "Okay," and then hitting continue.
         | 
         | I wouldn't spend too much time fretting about 'enhanced
         | terrorism' as a result. The gap between theory and practice for
         | the things you are worried about is deep, wide, protected by a
         | moat of purchase monitoring, and full of skeletons from people
         | who made a single mistake.
        
       | orbital-decay wrote:
       | It's the first model I've used that refused to answer some non-
       | technical questions about itself because it "violates the safety
       | policy" (what?!). Haven't tried it in coding or translation or
       | anything otherwise useful yet, but the first impression is that
       | it might be way too filtered, as it sometimes refuses or has
       | complete meltdowns and outputs absolute garbage when just trying
       | to casually chat with it. Pretty weird.
       | 
       | Update: it seems to be completely useless for translation. It
       | either refuses, outputs garbage, or changes the meaning
       | completely for completely innocuous content. This already is a
       | massive red flag.
        
       | dcl wrote:
       | Anyone tried the 20B param model on a mac with 24gb of ram?
        
       | tmshapland wrote:
        | Here's how it performs as the LLM in a voice agent stack.
       | https://github.com/tmshapland/talk_to_gpt_oss
        
       | radioradioradio wrote:
       | Interesting to see the discussion here, around why would anyone
       | want to do local models, while at the same time in the Ollama
       | turbo thread, people are raging about the move away from a local-
       | only focus.
        
       | teleforce wrote:
        | Kudos to OpenAI for releasing open models; they are finally moving
        | in the direction their "Open" prefix suggests.
        | 
        | For those wondering about the real benefits: the main one is that
        | you can run the LLM locally, without resorting to expensive and
        | inefficient cloud-based superpowers.
        | 
        | Run the model against your own documents with RAG and it can
        | provide excellent context engineering for your prompts, with
        | reliable citations and far fewer hallucinations, which is
        | especially useful for self-learning [1] (small sketch below).
        | 
        | Beyond the Intel-NVIDIA desktop/laptop duopoly, there are now
        | MacBooks with 96 GB of (V)RAM via unified memory and high-end AMD
        | Strix Halo laptops that can assign a similar 96 GB of (V)RAM out
        | of 128 GB of RAM [2]. The gpt-oss-120b is made for this kind of
        | setup.
       | 
       | [1] AI-driven chat assistant for ECE 120 course at UIUC:
       | 
       | https://uiuc.chat/ece120/chat
       | 
       | [2] HP ZBook Ultra G1a Review: Strix Halo Power in a Sleek
       | Workstation:
       | 
       | https://www.bestlaptop.deals/articles/hp-zbook-ultra-g1a-rev...
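        | 
        | A minimal local RAG sketch along those lines, using Ollama for
        | both embedding and generation (the embedding model and the naive
        | paragraph chunking are arbitrary choices here):
        | 
        |     # pip install ollama numpy; ollama pull nomic-embed-text
        |     import numpy as np
        |     import ollama
        | 
        |     def embed(text):
        |         return np.array(ollama.embeddings(model="nomic-embed-text",
        |                                           prompt=text)["embedding"])
        | 
        |     docs = open("my_notes.txt").read().split("\n\n")  # naive paragraph chunks
        |     doc_vecs = np.stack([embed(d) for d in docs])
        | 
        |     def answer(question, k=3):
        |         q = embed(question)
        |         sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
        |         context = "\n\n".join(docs[i] for i in np.argsort(sims)[-k:])
        |         resp = ollama.chat(model="gpt-oss:20b", messages=[{
        |             "role": "user",
        |             "content": f"Answer using only this context and cite it:\n"
        |                        f"{context}\n\nQ: {question}",
        |         }])
        |         return resp["message"]["content"]
        | 
        |     print(answer("What do my notes say about context engineering?"))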
        
       | Zebfross wrote:
       | Am I the only one who thinks taking a huge model trained on the
       | entire internet and fine tuning it is a complete waste? How is
       | your small bit of data going to affect it in the least?
        
       | kittikitti wrote:
       | This is really great and a game changer for AI. Thank you OpenAI.
       | I would have appreciated an even more permissive license like BSD
        | or MIT, but Apache 2.0 is sufficient. I'm wondering if we can
       | utilize transfer learning and what counts as derivative work.
       | Altogether, this is still open source, and a solid commitment to
       | openness. I am hoping this changes Zuck's calculus about closing
       | up Meta's next generation Llama models.
        
       | One-x wrote:
       | Are there any comparisons or thought between the 20b model and
       | the new Qwen-3 30b model, based on real experience?
        
       | devops000 wrote:
       | Any free open source model that I can install on iPhone?
       | 
       | OpenAI/Claude are censored in China without a VPN.
        
         | madagang wrote:
          | OpenAI's and Anthropic's own policies don't allow use from
          | China.
        
       | gslepak wrote:
       | Careful, this model tries to connect to the Internet. No idea
       | what it's doing.
       | 
       | https://crib.social/notice/AwsYxAOsg1pqAPLiHA
        
         | gslepak wrote:
         | Update: appears to be an issue with an OpenAI library, not the
         | LLM: https://github.com/lmstudio-ai/lmstudio-bug-
         | tracker/issues/8...
        
       | zmmmmm wrote:
       | I think this is a belated but smart move by OpenAI. They are
       | basically fully moving in on Meta's strategy now, taking
        | advantage of what may be a temporary situation with Meta dropping
        | back in the model race. It will be interesting to see if these models
       | now get taken up by the local model / fine tuning community the
       | way llama was. It's a very appealing strategy to test / dev with
       | a local model and then have the option to deploy to prod on a
       | high powered version of the same thing. Always knowing if the
       | provider goes full hostile, or you end up with data that can't
       | move off prem, you have self hosting as an option with a decent
       | performing model.
       | 
       | Which is all to say, availability of these local models for me is
       | a key incentive that I didn't have before to use OpenAI's hosted
       | ones.
        
       | jdprgm wrote:
        | gpt-oss:20b crushed it on one of my local LLM test prompts: guess
        | a country I am thinking of just by being told whether each guess
        | is colder/warmer. I've had much larger local models struggle with
        | it and get lost, but this one nailed it, and with speedy inference.
        | Progress on this stuff is boggling.
        
       | habosa wrote:
       | Wow I really didn't think this would happen any time soon, they
       | seem to have more to lose than to gain.
       | 
       | If you're a company building AI into your product right now I
       | think you would be irresponsible to not investigate how much you
       | can do on open weights models. The big AI labs are going to pull
       | the ladder up eventually, building your business on the APIs long
       | term is foolish. These open models will always be there for you
       | to run though (if you can get GPUs anyway).
        
         | XCSme wrote:
         | They must be really confident in GPT-5 then.
        
       | RandyOrion wrote:
       | Super shallow (24/36 layers) MoE with low active parameter counts
       | (3.6B/5.1B), a tradeoff between inference speed and performance.
       | 
       | Text only, which is okay.
       | 
       | Weights partially in MXFP4, but no cuda kernel support for RTX 50
       | series (sm120). Why? This is a NO for me.
       | 
       | Safety alignment shifts from off the charts to off the rails
       | really fast if you keep prompting. This is a NO for me.
       | 
       | In summary, a solid NO for me.
        
       | thntk wrote:
       | The model architecture only uses and cites pre-2023 techniques
        | from the GPT-2 and GPT-3 era. Probably they intentionally tried
        | to use the barest transformer architecture possible. Kudos to
        | them for finding a clever way to play the open-weights model
       | game, while hiding any architectural advancements used in their
       | closed models, and also claim they have moats in data quality and
       | training techniques.
       | 
       | They hide many things, but some speculated observations:
       | 
       | - Their 'mini' models must be smaller than 20B.
       | 
       | - Does the bitter lesson once again strike recent ideas in open
       | models?
       | 
       | - Some architectural ideas cannot be stripped away even if they
       | wanted to, e.g., MoEs, mixed sparse attention, RoPE, etc.
        
       | jpcompartir wrote:
       | This is an extremely welcome move in a good direction from
       | OpenAI. I can only thank them for all of the extra work around
       | the models - Harmony structure, metal/torch/triton
       | implementations, inference guides, cookbooks & fine-
       | tuning/reinforcement learning scripts, datasets etc.
       | 
       | There is an insane amount of helpful information buried in this
       | release
        
       | zoobab wrote:
       | No training data, not open source.
        
       | __alexs wrote:
       | Why would OpenAI give this away for free? Is it to disrupt
       | competition by setting a floor at the lower end of the market and
       | make it harder for new competition to emerge while still
       | retaining mind share?
        
         | cjtrowbridge wrote:
         | No. It's because large models have leveled off and commodified.
         | They are all trending towards the same capabilities, and openai
         | isn't really a leader. They have the most popular interface,
         | but it really isn't very good. The future is the edge, the
         | future is smaller, more efficient models. They are trying to
         | define and delineate a niche that needs datacenters where they
         | can achieve rents.
        
       | benreesman wrote:
       | I'm a well-known OpenAI hater, but there's haters and haters, and
       | refusing to acknowledge great work is the latter.
       | 
       | Well done OpenAI, this seems like a sincere effort to do a real
       | open model with competitive performance, usable/workable
       | licensing, a tokenizer compatible with your commercial offerings,
       | it's a real contribution. Probably the most open useful thing
       | since Whisper that also kicked ass.
       | 
        | Keep this sort of thing up and I might start re-evaluating how I
        | feel about this company.
        
       | ionwake wrote:
       | I want to take this chance to say a big thank you to OpenAI and
       | your work. I have always been a fan since I noticed you hired the
       | sandbox game kickstarter guy about like 8 years ago.
       | 
        | Even from the UK I knew you would all do great things (I had no
        | idea who else was involved).
       | 
       | I am glad I see the top comment is rare praise on HN.
       | 
       | Thanks again and keep it up Sama and team.
        
       | elorant wrote:
       | Tried an English to Greek translation with the smaller one.
       | Results were hideous. Mistral small is leaps and bounds better.
       | Also I don't get why the 4-bit quantization by default. In my
       | experience anything below 8-bit and the model fails to understand
       | long prompts. They gutted their own models.
        
         | orbital-decay wrote:
         | They used quantization-aware training, so the quality loss
         | should be negligible. Doing anything with this model's weights
         | would be a different story, though.
         | 
         | The model is clearly heavily finetuned towards coding and math,
         | and is borderline unusable for creative writing and translation
         | in particular. It's not general-purpose, excessively filtered
         | (refusal training and dataset lobotomy is probably a major
         | factor behind lower than expected performance), and shouldn't
         | be compared with Qwen or o3 at all.
        
       | clbrmbr wrote:
       | Does anyone know how well these models handle spontaneous tool
       | responses? For handling asynchronous tool calls or push?
        
       | mark_l_watson wrote:
        | I ran gpt-oss:20b on my old Mac mini using both Ollama and LM
        | Studio. Very nice. Something a little odd but useful: if you use
        | the new Ollama app and log in, you get a web search tool for free.
        | Odd because you are then no longer running local and private.
        | 
        | After a good part of a year using Chinese models (which are
        | fantastic, happy to have them), it is cool to be back to relying
        | on US models: the newest 4B Google Gemma model and now also the
        | 20B OpenAI model, both running locally.
        
       | m11a wrote:
       | I tried these models half-sceptically.
       | 
        | I ended up blown away. Via Cerebras/Groq, you're looking at
        | around 1000 tok/sec for the 120B model. For agentic code
        | generation, I found its abilities exceed gpt-4.1's. Tool calling
        | was surprisingly good, albeit not as good as Qwen3 Coder for me.
       | 
       | It's a very capable model, and a very good release. The high
       | throughput is a game changer.
        
       | vinhnx wrote:
        | I did a quick test of `openai/gpt-oss-20b` on an M1 MacBook Pro
        | with 16GB. Pretty impressed with it so far.
        | 
        | * Using @lmstudio's 20B GGUF version
        | (https://huggingface.co/lmstudio-community/gpt-oss-20b-GGUF) gives
        | you options for reasoning effort.
        | 
        | * My M1 MBP 16GB config: temp 0.8, max context length 7990, GPU
        | offload 8/24; it runs slow but is still fine for me.
        | 
        | * I tried MCP with the above config, with basic tools like time
        | and fetch plus reasoning effort low, and the tool-call
        | instruction following is quite good.
        | 
        | * LM Studio's Developer tab has a log output with model
        | information which is useful to learn from.
        | 
        | Overall, I like the way OpenAI is back to being open AI again,
        | after all these years.
       | 
       | --
       | 
        | Shameless plug: if anyone wants to try out gpt-oss-120b and gpt-
        | oss-20b as an alternative to OpenAI's own demo page [0], I have
        | added both models via OpenRouter providers in VT Chat [1] as a
        | real product. You can try them with an OpenRouter API key.
       | 
       | [0] https://gpt-oss.com
       | 
       | [1] https://vtchat.io.vn
        
       | arkonrad wrote:
       | I've been leaning more toward open-source LLMs lately. They're
       | not as hyper-optimized for performance, which actually makes them
        | feel more like the old-school OpenAI chats: you could just talk to
       | them. Now it's like you barely finish typing and the model
       | already force-feeds you an answer. Feels like these newer models
       | are over-tuned and kind of lost that conversational flow.
        
       | brna-2 wrote:
        | Is it just me or is this MUCH sturdier against jailbreaks than
        | similar models, or even the ChatGPT ones?
       | 
       | I have had problems even making it output nothing. But I guess
       | I'll try some more :D
       | 
       | Nice job @openAI team.
        
         | nialv7 wrote:
          | The thinking in the field is that instead of a model that is
          | pre-trained normally and then censored, this is a model pre-
          | trained on filtered data, i.e. it has never seen anything
          | unsafe, ever.
         | 
         | you can't jailbreak when there is nothing "outside".
        
           | brna-2 wrote:
            | This is not actually just about having it produce text that
            | would be censored, but about getting it to do anything it
            | says it is not allowed to do at all. I am sure these two
            | mostly overlap, but not always. Like I said, it treats
            | producing "no output" as disallowed, and it is hard to make
            | it do that.
        
           | diggan wrote:
           | > filtered data. i.e. it have never seen anything that is
           | unsafe, ever
           | 
           | I don't think that's true, you can't ask it outright "How do
           | you make a molotov cocktail?" but if you start by talking
           | about what is allowed/disallowed by policies, how examples
           | would look for disallowed policies and eventually ask it for
           | the "general principles" of how to make a molotov cocktail,
           | it'll happily oblige by essentially giving you enough
           | information to build one.
           | 
            | So it does know how to make a molotov cocktail, for example,
            | but (mostly) refuses to share it.
        
       | keymasta wrote:
        | Tried my personal benchmark on gpt-oss:20b: What is the second
        | mode of Phrygian Dominant?
       | 
       | My first impression is that this model thinks for a _long_ time.
       | It proposes ideas and then says, "no wait, it's actually..." and
       | then starts the same process again. It will go in loops examining
       | different ideas as it struggles to understand the basic process
       | for calculating notes. It seems to struggle with the septatonic
       | note -> Set notation (semitone positions), as many humans (and
       | AIs) do. As I write this it's been going at about 3tok/s for
       | about 25 minutes. If it finishes while I type this up I will post
       | the final answer.
       | 
       | EDIT: it's still thinking and I posted what it had thought about
       | [1]
       | 
       | I did glance at its thinking output just now and I noticed this
        | excerpt where it *finally* got really close to the answer,
        | mentioning the right name once (despite using the wrong numbers
        | in the set notation, which should be 0,3,4,6,7,9,11):
        | 
        |     Check "Lydian #2": 0,2,3,5,7,9,10. Not ours.
       | 
       | The correct answers as given by my music theory tool [0], which
       | uses traditional algorithms, in terms of names would be: Mela
       | Kosalam, Lydian #2, Raga Kuksumakaram/Kusumakaram, Bycrian.
       | 
       | Its notes are: 1 #2 3 #4 5 6 7
       | 
       | I find looking up lesser known changes and asking for a mode is a
       | good experiment. First I can see if an LLM has developed a way to
       | reason about numbers geometrically as is the case with music.
       | 
       | And by posting about it, I can test how fast AIs might memorize
       | the answer from a random comment on the internet, as I can just
       | use a different change if I find that this post was eventually
       | regurgitated.
       | 
        | After letting ollama run for a while, I'll post what it was
        | thinking about in case anybody's interested. [1]
       | 
       | Also copilot.microsoft.com's wrong answer: [2], and chatgpt.com
       | [3]
       | 
       | I do think that there may be an issue where I did it wrong
       | because after trying the new ollama gui I noticed it's using a
       | context length of 4k tokens, which it might be blowing way past.
       | Another test might be to try the question with a higher context
       | length, but at the same time, it seems like if this question
       | can't be figured out in less time than that, that it will never
       | have enough time...
       | 
        | [0] https://edrihan.neocities.org/changedex (bad UX on mobile,
        | and in general; won't fix, will make a new site soon)
       | 
       | [1] https://pastebin.com/wESXHwE1
       | 
       | [2] https://pastebin.com/XHD4ARTF
       | 
       | [3] https://pastebin.com/ptMiNbq7
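        | 
        | For reference, the degree rotation the model keeps fumbling is
        | only a few lines once you work in pitch-class sets (a sketch,
        | assuming the usual 1 b2 3 4 5 b6 b7 spelling of Phrygian
        | Dominant):
        | 
        |     PHRYGIAN_DOMINANT = [0, 1, 4, 5, 7, 8, 10]  # 1 b2 3 4 5 b6 b7 in semitones
        | 
        |     def mode(scale, n):
        |         """n-th mode (1-based): make the n-th degree the new root."""
        |         root = scale[n - 1]
        |         return sorted((s - root) % 12 for s in scale)
        | 
        |     print(mode(PHRYGIAN_DOMINANT, 2))  # [0, 3, 4, 6, 7, 9, 11] = 1 #2 3 #4 5 6 7 (Lydian #2)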
        
       | MagicMoonlight wrote:
       | These are absolutely incredible. They've blown everyone else out
       | of the water. It's like talking to o4, but for free.
        
       | NavinF wrote:
       | Reddit discussion:
       | https://www.reddit.com/r/LocalLLaMA/comments/1mj00mr/how_did...
       | 
       | This comment from that thread matches my experiences using gpt-
       | oss-20b with Ollama:
       | 
       | It's very much in the style of Phi, raised in a jesuit
       | monastery's library, except it got extra indoctrination so it
       | never forgets that even though it's a "local" model, it's first
       | and foremost a member of OpenAI's HR department and must never
       | produce any content Visa and Mastercard would disapprove of. This
       | prioritizing of corporate over user interests expresses a strong
       | form of disdain for the user. In addition to lacking almost all
       | knowledge that can't be found in Encyclopedia Britannica, the
       | model also doesn't seem particularly great at integrating into
       | modern AI tooling. However, it seems good at understanding code.
        
       | smcleod wrote:
       | These are pretty embarrassingly bad compared to what was already
       | out there. They refuse to do so many simple things that are not
       | remotely illegal or NSFW. So safe they're useless.
        
       ___________________________________________________________________
       (page generated 2025-08-06 23:01 UTC)