[HN Gopher] Workhorse LLMs: Why Open Source Models Dominate Clos...
       ___________________________________________________________________
        
       Workhorse LLMs: Why Open Source Models Dominate Closed Source for
       Batch Tasks
        
       Author : cmogni1
       Score  : 18 points
       Date   : 2025-06-06 18:38 UTC (4 hours ago)
        
 (HTM) web link (sutro.sh)
 (TXT) w3m dump (sutro.sh)
        
       | ramesh31 wrote:
        | Flash is just so obscenely cheap at this point that it's hard
        | to justify the headache of self-hosting, though. It really only
        | makes sense for sensitive data, IMO.
        
         | behnamoh wrote:
          | You're getting downvoted, but what you said is true. The cost
          | of self-hosting (and achieving 70+ tok/sec consistently across
          | the entire context window) has never been low enough for open
          | source to be a viable competitor to the proprietary models
          | from OpenAI, Google, and Anthropic.
        
         | jacob019 wrote:
          | That's true for Flash 2.0 at $0.40/mtok output. GPT-4.1-nano
          | is the same price and also surprisingly capable. I can spend
          | real money with 2.5 Flash and its $3.50/mtok thinking tokens,
          | but it's worth it. OP is an inference provider, so there may
          | be some bias. Open source can't compete on context length
          | either; nothing touches 2.5 Flash for the price with long
          | context--I've experimented with this a lot for my agentic
          | pricing system. Open source models are improving, but they
          | aren't really any cheaper right now. R1, for example, does
          | quite well performance-wise, but it uses a LOT of tokens to
          | get there, which eats further into its already shorter context
          | window. There's still value in the open source models--each
          | has unique strengths and they're advancing quickly--but the
          | frontier labs are moving fast too and have very compelling
          | "workhorse" offerings.
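The per-token prices in the comment above make the batch-cost arithmetic easy to check. A minimal sketch, assuming the $/Mtok output prices cited in the thread and an illustrative 10M-token batch job (the model labels and token count are assumptions for illustration, not benchmarks):

```python
# Rough batch-job cost comparison using the dollars-per-million-output-token
# ($/mtok) prices cited in the thread. Token counts are illustrative.

PRICES_PER_MTOK = {
    "flash-2.0": 0.40,            # output price cited above
    "gpt-4.1-nano": 0.40,         # cited as the same price
    "2.5-flash-thinking": 3.50,   # thinking-token price cited above
}

def batch_cost(model: str, output_tokens: int) -> float:
    """Dollar cost of generating `output_tokens` output tokens."""
    return PRICES_PER_MTOK[model] * output_tokens / 1_000_000

# Example: a 10M-output-token batch job.
for model in PRICES_PER_MTOK:
    print(f"{model}: ${batch_cost(model, 10_000_000):.2f}")
```

At these rates a 10M-token batch costs $4.00 on the cheap tier versus $35.00 with thinking tokens, which is why the thread treats the thinking-token price as "real money".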
        
       | delichon wrote:
        | Pass the choices through, please. It's so context-dependent that
        | I want a <dumber> and a <smarter> button, with units of $/M
        | tokens. And another setting to send a particular prompt to "[x]
        | batch" and email me the answer later. For most things I'll start
        | dumb and fast, but switch to smart and slow when the going gets
        | rough.
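The routing idea in the comment above can be sketched in a few lines. This is a hypothetical illustration of the <dumber>/<smarter> buttons plus the "[x] batch" toggle; the tier names, prices, and `route` function are assumptions, not any real provider's API:

```python
# Hypothetical sketch of the dumber/smarter routing idea: each tier is a
# model choice labeled with its $/Mtok price, and a batch flag queues the
# prompt for offline processing instead of answering synchronously.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    price_per_mtok: float  # $/M output tokens, shown next to the button

TIERS = [
    Tier("dumber", 0.40),   # fast and cheap: the default
    Tier("smarter", 3.50),  # slow and capable: when the going gets rough
]

def route(prompt: str, tier_index: int = 0, batch: bool = False) -> str:
    """Dispatch a prompt to the selected tier (illustrative stub)."""
    tier = TIERS[tier_index]
    if batch:
        # Queue for offline processing; the answer arrives by email later.
        return f"queued {tier.name} (${tier.price_per_mtok}/Mtok) for batch"
    return f"sent to {tier.name} (${tier.price_per_mtok}/Mtok)"
```

The escalation policy the commenter describes ("start dumb and fast, switch to smart and slow") would just be calling `route` again with `tier_index=1` when the cheap tier's answer isn't good enough.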
        
       ___________________________________________________________________
       (page generated 2025-06-06 23:00 UTC)