[HN Gopher] Does the Bitter Lesson Have Limits?
___________________________________________________________________
Does the Bitter Lesson Have Limits?
Author : dbreunig
Score : 59 points
Date : 2025-08-01 20:21 UTC (2 hours ago)
(HTM) web link (www.dbreunig.com)
(TXT) w3m dump (www.dbreunig.com)
| o11c wrote:
| This is about AI, despite the title being ambiguous.
| andy99 wrote:
| Is there more than one bitter lesson?
| terminalshort wrote:
| I've learned many
| thrawa8387336 wrote:
| If we're going to be pedantic:
|
| This is about AI, the title is ambiguous.
|
| Despite was used unambiguously wrong.
| o11c wrote:
| "Despite" is absolutely correct when you realize that
| cheating in the title is a way to make people look at
| articles when they would rather ignore AI in favor of
| actually-useful/interesting subjects.
| criemen wrote:
| All links render as blue strike-through line in Firefox
| (underline in Chrome), hurting legibility :(
| thwg wrote:
    | Interesting. I see underlines in Firefox, but the width of the
    | line is 2px in Chrome, 1px in Firefox.
| penneyd wrote:
| Fine over here using Firefox
| Cerium wrote:
    | I'm getting the same effect; it seems to be the CSS property
    | "text-underline-position: under;"
| grubbypaw wrote:
| I was not at all a fan of "The Bitter Lesson versus The Garbage
| Can", but this misses the same thing that it missed.
|
| The Bitter Lesson is from the perspective of how to spend your
| entire career. It is correct over the course of a very long time,
| and bakes in Moore's Law.
|
| The Bitter Lesson is true because general methods capture these
| assumed hardware gains that specific methods may not. It was
| never meant for contrasting methods at a specific moment in time.
| At a specific moment in time you're just describing Explore vs
| Exploit.
| schmidtleonard wrote:
| Right, and if you spot a job that needs doing and can be done
| by a specialized model, waving your hands about general purpose
| scale-leveraging models eventually overtaking specialized
| models has not historically been a winning approach.
|
| Except in the last year or two, which is why people are citing
| it a lot :)
| Animats wrote:
| The question is when price/performance hits financial limits.
| That point may be close, if not already passed.
|
| Interestingly, this hasn't happened for wafer fabs. A modern
| wafer fab costs US$1bn to US$3bn, and there is talk of US$20bn
    | wafer fabs. Around the year 2000, those would have been
    | unfinanceable. It was expected that fab cost was going to be a
| constraint on feature size. That didn't happen.
|
| For years, it was thought that the ASML approach to extreme UV
| was going to cost too much. It's a horrible hack, shooting off
| droplets of tin to be vaporized by lasers just to generate soft
| X-rays. Industry people were hoping for small synchrotrons or
| X-ray lasers or E-beam machines or something sane. But none of
| those worked out. Progress went on by making a fundamentally
| awful process work commercially, at insane cost.
| schmidtleonard wrote:
| Fundamentally awful but spiritually delightful.
| thrawa8387336 wrote:
| This brings about a good point:
|
| How much of the recent bitter lesson peddling is done by compute
| salesmen?
|
| How much of it is done by people who can buy a lot of compute?
|
    | DeepSeek was scandalous for a reason.
| fdav wrote:
| A better lesson: https://rodneybrooks.com/a-better-lesson/
| benlivengood wrote:
| I think it's a little early (even in these AI times) to call HRM
| a counterexample of the bitter lesson.
|
    | I think it's quite a bit more likely for HRM to scale
    | embarrassingly far and outstrip the tons of RLHF and distillation
    | that's been invested in transformers; more of a bitter lesson
    | 2.0 than anything else.
| PaulHoule wrote:
| One odd thing is that progress in SAT/SMT solvers has been almost
| as good as progress in neural networks from the 1970s to the
| present. There was a time I was really interested in production
    | rules and expert system shells; systems in the early 1980s
    | often didn't even use RETE and didn't have hash indexes, so of
    | course a rule base of 10,000 looked unmanageable. By 2015 you
    | could have a million rules in Drools and it worked just fine.
| jamesblonde wrote:
| I see elements of the bitter lesson in arguments about context
| window size and RAG. The argument is about retrieval being the
| equivalent of compute/search. Just improve them, to hell with all
| else.
|
| However, retrieval is not just google search. Primary key lookups
| in my db are also retrieval. As are vector index queries or BM25
| free text search queries. It's not a general purpose area like
    | compute/search. In summary, I don't think that RAG is dead.
| Context engineering is just like feature engineering - transform
| the swamp of data into a structured signal that is easy for in-
| context learning to learn.
|
    | The corollary of all this is that it's not just about scaling up
| agents - giving them more LLMs and more data via MCP. The bitter
| lesson doesn't apply to agents yet.
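The point that retrieval takes many forms can be sketched in a few lines (the order record, documents, and embedding vectors below are invented toy data): a primary-key lookup and a vector-similarity query are both "retrieval", and both feed structured context to a model.

```python
import math

# Toy data: retrieval is not just web search. A database primary-key
# lookup and a vector-similarity query both count as retrieval.
DB = {42: "Order 42 shipped on 2025-07-30."}            # PK lookup
DOCS = {"returns policy": [0.9, 0.1], "shipping faq": [0.2, 0.8]}

def cosine(a, b):
    """Cosine similarity between two 2-d toy embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def retrieve(order_id, query_vec, k=1):
    """Assemble structured context from heterogeneous retrievers."""
    ranked = sorted(DOCS, key=lambda d: cosine(DOCS[d], query_vec),
                    reverse=True)
    return {"db_record": DB.get(order_id), "top_docs": ranked[:k]}

ctx = retrieve(42, [0.1, 0.9])
```

Context engineering in this view is deciding which of these retrievers to run and how to lay out their results, much like feature engineering decides which columns reach a model.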
| sorenjan wrote:
| The last time I was reminded of the bitter lesson was when I read
| about Guidance & Control Networks, after seeing them used in an
| autonomous drone that beat the best human FPV pilots [0].
| Basically it's using a small MLP (Multi Layer Perceptron) on the
| order of 200 parameters, and using the drone's state as input and
| controlling the motors directly with the output. We have all
    | kinds of fancy control theory like MPC (Model Predictive
| Control), but it turns out that the best solution might be to
| train a relatively tiny NN using a mix of simulation and
| collected sensor data instead. It's not better because of huge
| computation resources, it's actually more computationally
| efficient than some classic alternatives, but it is more general.
|
| [0] https://www.tudelft.nl/en/2025/lr/autonomous-drone-from-
| tu-d...
|
| https://www.nature.com/articles/s41586-023-06419-4
|
| https://arxiv.org/abs/2305.13078
|
| https://arxiv.org/abs/2305.02705
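A network on the order of 200 parameters is small enough to write out by hand. The sketch below is illustrative only (the 12-dim state layout and layer sizes are assumptions, not the architecture from the cited papers): one hidden layer mapping drone state straight to motor commands, 208 parameters in total.

```python
import numpy as np

# Assumed dimensions: a 12-dim drone state (position error, velocity,
# attitude, body rates) mapped to 4 motor commands.
STATE_DIM, HIDDEN, MOTORS = 12, 12, 4

rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.1, (HIDDEN, STATE_DIM))   # 144 weights
b1 = np.zeros(HIDDEN)                          # 12 biases
W2 = rng.normal(0, 0.1, (MOTORS, HIDDEN))      # 48 weights
b2 = np.zeros(MOTORS)                          # 4 biases -> 208 params

def control(state: np.ndarray) -> np.ndarray:
    """One forward pass: state in, motor throttle commands out."""
    h = np.tanh(W1 @ state + b1)               # single hidden layer
    return 1 / (1 + np.exp(-(W2 @ h + b2)))    # squash to [0, 1]

u = control(np.zeros(STATE_DIM))
```

A forward pass this size is a few hundred multiply-adds, which is why such a controller can be cheaper to evaluate on a flight computer than an MPC optimization loop.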
| logicchains wrote:
| >It's not better because of huge computation resources, it's
| actually more computationally efficient than some classic
| alternatives
|
| It's similar with options pricing. The most sophisticated
| models like multivariate stochastic volatility are
| computationally expensive to approximate with classical
| approaches (and have no closed form solution), so just training
| a small NN on the output of a vast number of simulations of the
| underlying processes ends up producing a more efficient model
| than traditional approaches. Same with stuff like trinomial
| trees.
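The surrogate-model pattern described above can be sketched as follows (a deliberately simplified setup: GBM with zero rates, spot and vol as the only inputs, not a production pricing model): run the expensive Monte Carlo model over a grid, then fit a small network that reproduces its output cheaply.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

def mc_call_price(s0, vol, t=1.0, k=1.0, n=2000):
    """Monte Carlo price of a European call under GBM (r = 0)."""
    z = rng.standard_normal(n)
    st = s0 * np.exp(-0.5 * vol**2 * t + vol * np.sqrt(t) * z)
    return np.maximum(st - k, 0.0).mean()

# Run the 'expensive' simulation over a grid of spots and vols...
X = np.array([[s, v] for s in np.linspace(0.5, 1.5, 20)
                     for v in np.linspace(0.1, 0.5, 10)])
y = np.array([mc_call_price(s, v) for s, v in X])

# ...then fit a small surrogate network that prices in microseconds.
surrogate = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000,
                         random_state=0).fit(X, y)
price = surrogate.predict([[1.0, 0.3]])[0]   # fast approximation
```

The simulation cost is paid once, offline; after that the network amortizes it across every query, which is the efficiency win the comment describes.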
| William_BB wrote:
| Interesting. Are these models the SOTA in the options trading
| industry (e.g. MM) nowadays?
| cactusfrog wrote:
| This is really interesting. I think force fields in molecular
    | dynamics have undergone a similar NN revolution. You train
| your NN on the output of expensive calculations to replace
| the expensive function with a cheap one. Could you train a
| small language model with a big one?
| lossolo wrote:
| > Could you train a small language model with a big one?
|
| Yes, it's called distillation.
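The core of distillation is a single loss term: the student matches the teacher's temperature-softened output distribution rather than a hard label. A minimal sketch with made-up logits for one token:

```python
import numpy as np

def softmax(z, T=1.0):
    """Softmax with temperature T; higher T flattens the distribution."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

# Invented logits: the big teacher and the small student score
# the same 4-way vocabulary for one position.
teacher_logits = np.array([4.0, 1.0, 0.5, -2.0])
student_logits = np.array([2.0, 1.5, 0.0, -1.0])

T = 2.0  # temperature exposes the teacher's relative preferences
p_teacher = softmax(teacher_logits, T)
q_student = softmax(student_logits, T)

# KL(teacher || student): the distillation loss minimized in training
kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(q_student)))
```

Training drives this KL term toward zero across the corpus, so the student absorbs the teacher's full ranking over tokens, not just its top-1 choices.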
| mentalgear wrote:
| The Neuro-Symbolic approach is what the article describes,
| without actually naming it.
| pu_pe wrote:
| I'm not so sure Stockfish is a good example. The fact it can run
    | on an iPhone is due to Moore's Law, which follows the same
    | pattern. And DeepMind briefly taking its throne was a very good
| example of the Bitter Lesson.
| aydyn wrote:
| Does anyone else see the big flaw with the chess engine analogy?
|
| When AlphaZero came along it blew stockfish out of the water.
|
| Stockfish is a top engine now because besides that initial proof
    | of concept _there's no money to be made by throwing compute at
| Chess_.
| itkovian_ wrote:
    | I don't think people understand the point Sutton was making: he's
    | saying that general, simple systems that get better with scale
    | tend to outperform hand-engineered systems that don't. It's a
    | kind of subtle point that's implicitly saying hand engineering
    | inhibits scale because it inhibits generality. He is not saying
    | anything about the rate, and doesn't claim LLMs/gradient descent
    | are the best system; in fact I'd guess he thinks there's likely
    | an even more general approach that would be better. It's
    | comparing two classes of approaches, not commenting on the
    | merits of particular systems.
| eldenring wrote:
    | Yep, this article is self-centered and perfectly represents the
| type of ego Sutton was referencing. Maybe in a year or two
| general methods will improve the author's workflow
| significantly once again (eg. better models) and they would
| still add a bit of human logic on top and claim victory.
| throw1289312 wrote:
| This article focuses on the learning aspect of The Bitter Lesson.
| But The Bitter Lesson is about both search _and_ learning.
|
| This article cites Leela, the chess program, as an example of the
| Bitter Lesson, as it learns chess using a general method. The
| article then goes on to cite Stockfish as a counterexample,
| because it uses human-written heuristics to perform search.
| However, as you add compute to Stockfish's search, or spend time
| optimizing compute-expenditure-per-position, Stockfish gets
| better. Stockfish isn't a counterexample, search is still a part
| of The Bitter Lesson!
| beepbooptheory wrote:
| I should know better than to speak anything too enthusiastically
| about the humanities or feminism on this particular forum, but I
| just want to say the connection here to Donna Haraway was a
    | surprise and delight. Anyone open to that world at all would
    | behoove themselves to check her out. "A Cyborg Manifesto" is
    | the one everyone knows, but I recently finished "Staying with
    | the Trouble" and can't recommend it enough!
| benreesman wrote:
| When The Bitter Lesson essay came out it was a bunch of important
| things: addressing an audience of serious practitioners,
| contrarian and challenging entrenched dogma, written without any
| serious reputational or (especially) financial stake in the
| outcome. It needed saying and it was important.
|
    | But it's become a lazy crutch for a bunch of people who meet
| _none_ of those criteria and perverted into a statement more
| along the lines of "LLMs trained on NVIDIA cards by one of a
| handful of US companies are guaranteed to outperform every other
| approach from here to the Singularity".
|
| Nope. Not at all guaranteed, and at the moment? Not even looking
| likely.
|
| It will have other stuff in it. Maybe that's prediction in
    | representation space like JEPA, maybe it's MCTS like Alpha*, or
    | maybe it's some totally new thing.
|
| And maybe it happens in Hangzhou.
| stego-tech wrote:
| Not familiar with the cited essay (added to reading list for the
| weekend), but the post does make some generally good points on
| generalization (it me) vs specialization, and the benefits of an
| optimized and scalable generalist approach vs a niche,
| specialized approach, specifically with regards to current LLMs
| (and to a lesser degree, ML as a whole).
|
| Where I furrow my brow is the casual mixing of philosophical
| conjecture with technical observations or statements. Mixing the
| two all too often feels like a crutch around defending either
| singular perspective in an argument by stating the other half of
| the argument defends the first half. I know I'm not articulating
| my point well here, but it just comes off as a little...
| _insincere_ , I guess? I'm sure someone here will find the
| appropriate words to communicate my point better, if I'm being
| understood.
|
| One nitpick on the philosophical side of things I'd point out is
| that a lot of the resistance to AI replacing human labor is less
| to do with the self-styled importance of humanity, and more the
| bleak future of a species where a handful of Capitalists will
| destroy civilization for the remainder to benefit themselves.
| _That_ is what sticks in our collective craw, and a large reason
| for the pushback against AI - and nobody in a position of power
| is taking that threat remotely seriously, largely _because_ the
| owners of AI have a vested interest in preventing that from being
    | addressed (since it would inevitably curb the very power they're
| investing in building for themselves).
___________________________________________________________________
(page generated 2025-08-01 23:00 UTC)