[HN Gopher] Does the Bitter Lesson Have Limits?
       ___________________________________________________________________
        
       Does the Bitter Lesson Have Limits?
        
       Author : dbreunig
       Score  : 59 points
       Date   : 2025-08-01 20:21 UTC (2 hours ago)
        
 (HTM) web link (www.dbreunig.com)
 (TXT) w3m dump (www.dbreunig.com)
        
       | o11c wrote:
       | This is about AI, despite the title being ambiguous.
        
         | andy99 wrote:
         | Is there more than one bitter lesson?
        
           | terminalshort wrote:
           | I've learned many
        
         | thrawa8387336 wrote:
         | If we're going to be pedantic:
         | 
         | This is about AI, the title is ambiguous.
         | 
         | Despite was used unambiguously wrong.
        
           | o11c wrote:
           | "Despite" is absolutely correct when you realize that
           | cheating in the title is a way to make people look at
           | articles when they would rather ignore AI in favor of
           | actually-useful/interesting subjects.
        
       | criemen wrote:
       | All links render as blue strike-through line in Firefox
       | (underline in Chrome), hurting legibility :(
        
         | thwg wrote:
         | interesting. i see underlines in firefox. but the width of the
         | line is 2px in chrome, 1px in firefox.
        
         | penneyd wrote:
         | Fine over here using Firefox
        
         | Cerium wrote:
         | I'm getting the same effect, seems to be the css property
         | "text-underline-position: under;"
        
       | grubbypaw wrote:
       | I was not at all a fan of "The Bitter Lesson versus The Garbage
       | Can", but this misses the same thing that it missed.
       | 
       | The Bitter Lesson is from the perspective of how to spend your
       | entire career. It is correct over the course of a very long time,
       | and bakes in Moore's Law.
       | 
       | The Bitter Lesson is true because general methods capture these
       | assumed hardware gains that specific methods may not. It was
       | never meant for contrasting methods at a specific moment in time.
       | At a specific moment in time you're just describing Explore vs
       | Exploit.
        
         | schmidtleonard wrote:
         | Right, and if you spot a job that needs doing and can be done
         | by a specialized model, waving your hands about general purpose
         | scale-leveraging models eventually overtaking specialized
         | models has not historically been a winning approach.
         | 
         | Except in the last year or two, which is why people are citing
         | it a lot :)
        
       | Animats wrote:
       | The question is when price/performance hits financial limits.
       | That point may be close, if not already passed.
       | 
       | Interestingly, this hasn't happened for wafer fabs. A modern
       | wafer fab costs US$1bn to US$3bn, and there is talk of US$20bn
        | wafer fabs. Around the year 2000, those would have been
        | unfinanceable. It was expected that fab cost was going to be a
       | constraint on feature size. That didn't happen.
       | 
       | For years, it was thought that the ASML approach to extreme UV
       | was going to cost too much. It's a horrible hack, shooting off
       | droplets of tin to be vaporized by lasers just to generate soft
       | X-rays. Industry people were hoping for small synchrotrons or
       | X-ray lasers or E-beam machines or something sane. But none of
       | those worked out. Progress went on by making a fundamentally
       | awful process work commercially, at insane cost.
        
         | schmidtleonard wrote:
         | Fundamentally awful but spiritually delightful.
        
       | thrawa8387336 wrote:
       | This brings about a good point:
       | 
       | How much of the recent bitter lesson peddling is done by compute
       | salesmen?
       | 
       | How much of it is done by people who can buy a lot of compute?
       | 
       | Deepseek was scandalous for a reason.
        
       | fdav wrote:
       | A better lesson: https://rodneybrooks.com/a-better-lesson/
        
       | benlivengood wrote:
       | I think it's a little early (even in these AI times) to call HRM
       | a counterexample of the bitter lesson.
       | 
       | I think it's quite a bit more likely for HRM to scale
       | embarrassingly far and outstrip the tons of RLHF and distillation
       | that's been invested in for transformers, more of a bitter lesson
       | 2.0 than anything else.
        
       | PaulHoule wrote:
       | One odd thing is that progress in SAT/SMT solvers has been almost
       | as good as progress in neural networks from the 1970s to the
        | present. There was a time I was really interested in production
        | rules and expert system shells. Systems in the early 1980s
        | often didn't even use RETE and didn't have hash indexes, so of
        | course a rule base of 10,000 rules looked unmanageable; by 2015
        | you could have a million rules in Drools and it worked just
        | fine.
        
       | jamesblonde wrote:
       | I see elements of the bitter lesson in arguments about context
       | window size and RAG. The argument is about retrieval being the
       | equivalent of compute/search. Just improve them, to hell with all
       | else.
       | 
       | However, retrieval is not just google search. Primary key lookups
       | in my db are also retrieval. As are vector index queries or BM25
       | free text search queries. It's not a general purpose area like
        | compute/search. In summary, I don't think that RAG is dead.
       | Context engineering is just like feature engineering - transform
       | the swamp of data into a structured signal that is easy for in-
       | context learning to learn.
       | 
        | The corollary of all this is that it's not just about scaling up
       | agents - giving them more LLMs and more data via MCP. The bitter
       | lesson doesn't apply to agents yet.
        
       | sorenjan wrote:
       | The last time I was reminded of the bitter lesson was when I read
       | about Guidance & Control Networks, after seeing them used in an
       | autonomous drone that beat the best human FPV pilots [0].
       | Basically it's using a small MLP (Multi Layer Perceptron) on the
       | order of 200 parameters, and using the drone's state as input and
       | controlling the motors directly with the output. We have all
        | kinds of fancy control theory like MPC (Model Predictive
       | Control), but it turns out that the best solution might be to
       | train a relatively tiny NN using a mix of simulation and
       | collected sensor data instead. It's not better because of huge
       | computation resources, it's actually more computationally
       | efficient than some classic alternatives, but it is more general.
       | 
       | [0] https://www.tudelft.nl/en/2025/lr/autonomous-drone-from-
       | tu-d...
       | 
       | https://www.nature.com/articles/s41586-023-06419-4
       | 
       | https://arxiv.org/abs/2305.13078
       | 
       | https://arxiv.org/abs/2305.02705
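For a sense of how small such a network is, here is a hypothetical policy in that spirit (all layer sizes are assumptions for illustration, not taken from the linked papers): roughly 200 parameters mapping a state vector straight to motor commands.

```python
import numpy as np

# Hypothetical G&CNet-style policy: 12 state inputs (e.g. position,
# velocity, attitude, rates), one hidden layer, 4 motor outputs.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.1, (12, 12)), np.zeros(12)  # state -> hidden
W2, b2 = rng.normal(0, 0.1, (12, 4)), np.zeros(4)    # hidden -> motors

def policy(state):
    """Map the drone's state vector directly to 4 motor throttles."""
    h = np.tanh(state @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # squash to (0, 1)

n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)               # 208 parameters
print(policy(np.zeros(12)))   # 4 throttles, all 0.5 for a zero state
```

In practice the weights would of course come from training against a simulator plus collected sensor data; the point is only that the entire controller fits in a couple of small matrices.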
        
         | logicchains wrote:
         | >It's not better because of huge computation resources, it's
         | actually more computationally efficient than some classic
         | alternatives
         | 
         | It's similar with options pricing. The most sophisticated
         | models like multivariate stochastic volatility are
         | computationally expensive to approximate with classical
         | approaches (and have no closed form solution), so just training
         | a small NN on the output of a vast number of simulations of the
         | underlying processes ends up producing a more efficient model
         | than traditional approaches. Same with stuff like trinomial
         | trees.
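That pipeline can be sketched in a few lines of Python (plain geometric-Brownian-motion Monte Carlo stands in for the expensive stochastic-volatility simulation, and the network sizes and hyperparameters are illustrative): label a dataset with the slow simulator, then fit a tiny network as a cheap surrogate.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_call_price(S, K, sigma, T, r=0.0, n_paths=20_000):
    """'Expensive' model: Monte Carlo call price under GBM."""
    Z = rng.standard_normal(n_paths)
    ST = S * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
    return float(np.exp(-r * T) * np.maximum(ST - K, 0.0).mean())

# Label a training set by running the simulator many times
# (S, sigma, T vary; strike K fixed at 100 for simplicity).
X = rng.uniform([80.0, 0.1, 0.25], [120.0, 0.5, 2.0], size=(500, 3))
y = np.array([mc_call_price(S, 100.0, sig, T) for S, sig, T in X])

# Cheap surrogate: one hidden layer, full-batch gradient descent.
Xn = (X - X.mean(0)) / X.std(0)
W1, b1 = rng.normal(0, 0.5, (3, 16)), np.zeros(16)
W2, b2 = rng.normal(0, 0.5, (16, 1)), np.zeros(1)

def mse():
    H = np.tanh(Xn @ W1 + b1)
    return float((((H @ W2 + b2).ravel() - y) ** 2).mean())

loss_before = mse()
for _ in range(2000):
    H = np.tanh(Xn @ W1 + b1)
    err = (H @ W2 + b2).ravel() - y
    dH = err[:, None] @ W2.T * (1 - H**2)     # backprop through tanh
    W2 -= 1e-2 * H.T @ err[:, None] / len(y); b2 -= 1e-2 * err.mean()
    W1 -= 1e-2 * Xn.T @ dH / len(y);          b1 -= 1e-2 * dH.mean(0)
loss_after = mse()
print(loss_before, "->", loss_after)  # surrogate fits the simulated prices
```

Once trained, pricing with the surrogate is one small matrix multiply, versus tens of thousands of simulated paths per quote.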
        
           | William_BB wrote:
           | Interesting. Are these models the SOTA in the options trading
           | industry (e.g. MM) nowadays?
        
           | cactusfrog wrote:
           | This is really interesting. I think force fields in molecular
            | dynamics have undergone a similar NN revolution. You train
           | your NN on the output of expensive calculations to replace
           | the expensive function with a cheap one. Could you train a
           | small language model with a big one?
        
             | lossolo wrote:
             | > Could you train a small language model with a big one?
             | 
             | Yes, it's called distillation.
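A minimal sketch of the distillation objective (temperature and logits here are illustrative): the student is trained to match the teacher's temperature-softened output distribution rather than hard labels.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) over temperature-softened distributions."""
    p = softmax(teacher_logits, T)  # soft targets from the big model
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean())

teacher = np.array([[4.0, 1.0, 0.5]])
print(distill_loss(teacher, teacher))               # 0.0: perfect match
print(distill_loss(np.zeros((1, 3)), teacher) > 0)  # True: mismatch penalized
```

Minimizing this loss (often mixed with an ordinary cross-entropy term on the true labels) is what lets a small student inherit the large teacher's behavior.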
        
       | mentalgear wrote:
       | The Neuro-Symbolic approach is what the article describes,
       | without actually naming it.
        
       | pu_pe wrote:
       | I'm not so sure Stockfish is a good example. The fact it can run
        | on an iPhone is due to Moore's law, which follows the same
        | pattern. And DeepMind briefly taking its throne was a very good
       | example of the Bitter Lesson.
        
       | aydyn wrote:
       | Does anyone else see the big flaw with the chess engine analogy?
       | 
       | When AlphaZero came along it blew stockfish out of the water.
       | 
       | Stockfish is a top engine now because besides that initial proof
        | of concept _there's no money to be made by throwing compute at
       | Chess_.
        
       | itkovian_ wrote:
        | I don't think people understand the point Sutton was making; he's
       | saying that general, simple systems that get better with scale
       | tend to outperform hand engineered systems that don't. It's a
       | kind of subtle point that's implicitly saying hand engineering
       | inhibits scale because it inhibits generality. He is not saying
        | anything about the rate, doesn't claim LLMs/GD are the best
       | system, in fact I'd guess he thinks there's likely an even more
       | general approach that would be better. It's comparing two classes
       | of approaches not commenting on the merits of particular systems.
        
         | eldenring wrote:
          | Yep, this article is self-centered and perfectly represents the
         | type of ego Sutton was referencing. Maybe in a year or two
         | general methods will improve the author's workflow
         | significantly once again (eg. better models) and they would
         | still add a bit of human logic on top and claim victory.
        
       | throw1289312 wrote:
       | This article focuses on the learning aspect of The Bitter Lesson.
       | But The Bitter Lesson is about both search _and_ learning.
       | 
       | This article cites Leela, the chess program, as an example of the
       | Bitter Lesson, as it learns chess using a general method. The
       | article then goes on to cite Stockfish as a counterexample,
       | because it uses human-written heuristics to perform search.
       | However, as you add compute to Stockfish's search, or spend time
       | optimizing compute-expenditure-per-position, Stockfish gets
       | better. Stockfish isn't a counterexample, search is still a part
       | of The Bitter Lesson!
        
       | beepbooptheory wrote:
       | I should know better than to speak anything too enthusiastically
       | about the humanities or feminism on this particular forum, but I
       | just want to say the connection here to Donna Haraway was a
        | surprise and delight. Anyone open to that world at all would
        | behoove themselves to check her out. "The Cyborg Manifesto" is
        | the one everyone knows, but I recently finished "Staying with
        | the Trouble" and can't recommend it enough!
        
       | benreesman wrote:
       | When The Bitter Lesson essay came out it was a bunch of important
       | things: addressing an audience of serious practitioners,
       | contrarian and challenging entrenched dogma, written without any
       | serious reputational or (especially) financial stake in the
       | outcome. It needed saying and it was important.
       | 
        | But it's become a lazy crutch for a bunch of people who meet
        | _none_ of those criteria, and been perverted into a statement
        | more along the lines of "LLMs trained on NVIDIA cards by one of a
       | handful of US companies are guaranteed to outperform every other
       | approach from here to the Singularity".
       | 
       | Nope. Not at all guaranteed, and at the moment? Not even looking
       | likely.
       | 
       | It will have other stuff in it. Maybe that's prediction in
        | representation space like JEPA, maybe it's MCTS like Alpha*,
        | maybe it's some totally new thing.
       | 
       | And maybe it happens in Hangzhou.
        
       | stego-tech wrote:
       | Not familiar with the cited essay (added to reading list for the
       | weekend), but the post does make some generally good points on
       | generalization (it me) vs specialization, and the benefits of an
       | optimized and scalable generalist approach vs a niche,
       | specialized approach, specifically with regards to current LLMs
       | (and to a lesser degree, ML as a whole).
       | 
       | Where I furrow my brow is the casual mixing of philosophical
       | conjecture with technical observations or statements. Mixing the
       | two all too often feels like a crutch around defending either
       | singular perspective in an argument by stating the other half of
       | the argument defends the first half. I know I'm not articulating
       | my point well here, but it just comes off as a little...
        | _insincere_, I guess? I'm sure someone here will find the
       | appropriate words to communicate my point better, if I'm being
       | understood.
       | 
       | One nitpick on the philosophical side of things I'd point out is
       | that a lot of the resistance to AI replacing human labor is less
       | to do with the self-styled importance of humanity, and more the
       | bleak future of a species where a handful of Capitalists will
       | destroy civilization for the remainder to benefit themselves.
       | _That_ is what sticks in our collective craw, and a large reason
       | for the pushback against AI - and nobody in a position of power
       | is taking that threat remotely seriously, largely _because_ the
       | owners of AI have a vested interest in preventing that from being
        | addressed (since it would inevitably curb the very power they're
       | investing in building for themselves).
        
       ___________________________________________________________________
       (page generated 2025-08-01 23:00 UTC)