[HN Gopher] Workhorse LLMs: Why Open Source Models Dominate Closed
Source for Batch Tasks
___________________________________________________________________
Workhorse LLMs: Why Open Source Models Dominate Closed Source for
Batch Tasks
Author : cmogni1
Score : 18 points
Date : 2025-06-06 18:38 UTC (4 hours ago)
(HTM) web link (sutro.sh)
(TXT) w3m dump (sutro.sh)
| ramesh31 wrote:
| Flash is just so obscenely cheap at this point that it's hard
| to justify the headache of self-hosting, though. Really only
| applies to sensitive data, IMO.
| behnamoh wrote:
| You're getting downvoted, but what you said is true. The cost
| of self-hosting (while achieving 70+ tok/sec consistently
| across the entire context window) has never been low enough to
| make open source a viable competitor to the proprietary models
| from OpenAI, Google, and Anthropic.
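| A rough back-of-the-envelope sketch (the GPU rental price is
| an assumption; the 70 tok/sec figure is from above):
|
|     # Self-hosting cost per million output tokens, assuming
|     # one rented GPU and a sustained single-stream decode.
|     gpu_cost_per_hour = 2.00   # USD/hr, assumed rental rate
|     tokens_per_second = 70     # sustained decode throughput
|
|     tokens_per_hour = tokens_per_second * 3600  # 252,000
|     cost_per_mtok = gpu_cost_per_hour / tokens_per_hour * 1e6
|     print(f"~${cost_per_mtok:.2f}/mtok")        # ~$7.94/mtok
|
| Batched serving raises aggregate throughput well past a single
| stream, but utilization has to stay very high before that
| number gets anywhere near Flash-tier pricing.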
| jacob019 wrote:
| That's true for Flash 2.0 at $0.40/mtok output. GPT-4.1-nano
| is the same price and also surprisingly capable. I can spend
| real money with 2.5 Flash and its $3.50/mtok thinking tokens,
| but it's worth it. OP is an inference provider, so there may
| be some bias. Open source can't compete on context length
| either; nothing touches 2.5 Flash for the price with long
| context. I've experimented with this a lot for my agentic
| pricing system. Open source models are improving, but they
| aren't really any cheaper right now. R1, for example, does
| quite well performance-wise, but it uses a LOT of tokens to
| get there, which further eats into its shorter context window.
| There's still value in the open source models: each has unique
| strengths and they're advancing quickly, but the frontier labs
| are moving fast too and have very compelling "workhorse"
| offerings.
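| To make the R1 point concrete: effective cost is tokens used
| times price per token. The token counts below are assumptions
| about verbosity, not benchmarks, and the R1 rate is a typical
| hosted price, not a quote:
|
|     # Cheaper per-token doesn't mean cheaper per task.
|     flash_price = 3.50 / 1e6  # USD/tok, 2.5 Flash thinking
|     r1_price    = 2.19 / 1e6  # USD/tok, assumed hosted R1
|
|     flash_tokens = 800        # assumed: concise answer
|     r1_tokens    = 4_000      # assumed: long reasoning trace
|
|     flash_cost = flash_price * flash_tokens
|     r1_cost    = r1_price * r1_tokens
|     print(f"2.5 Flash: ${flash_cost:.4f}/task")  # ~$0.0028
|     print(f"R1:        ${r1_cost:.4f}/task")     # ~$0.0088
|
| With those assumptions R1 lands at roughly 3x the per-task
| cost despite the lower per-token price.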
| delichon wrote:
| Pass the choices through, please. It's so context-dependent
| that I want a <dumber> and a <smarter> button, with units of
| $/M tokens. And another setting to send a particular prompt
| to "[x] batch" and email me the answer later. For most things
| I'll start dumb and fast, then switch to smart and slow when
| the going gets rough.
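| Something like this hypothetical request shape, say (none of
| these field names come from any real provider API):
|
|     # The feature request above, made concrete.
|     request = {
|         "prompt": "Summarize this contract.",
|         "tier": "dumber",            # <dumber> | <smarter>
|         "budget_usd_per_mtok": 0.40, # ceiling, $/M tokens
|         "batch": True,               # defer to an async queue
|         "notify": "me@example.com",  # email when results land
|     }
|
|     def route(req):
|         """Pick a model class by tier; batch jobs can take
|         the slow, cheap path."""
|         models = {"dumber": "flash-class",
|                   "smarter": "frontier-class"}
|         return models[req["tier"]]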
___________________________________________________________________
(page generated 2025-06-06 23:00 UTC)