[HN Gopher] 'Attention is all you need' coauthor says he's 'sick...
       ___________________________________________________________________
        
       'Attention is all you need' coauthor says he's 'sick' of
       transformers
        
       Author : achow
       Score  : 290 points
       Date   : 2025-10-24 04:40 UTC (18 hours ago)
        
 (HTM) web link (venturebeat.com)
 (TXT) w3m dump (venturebeat.com)
        
       | Xcelerate wrote:
       | Haha, I like to joke that we were on track for the singularity in
       | 2024, but it stalled because the research time gap between
       | "profitable" and "recursive self-improvement" was just a _bit_
       | too long that we 're now stranded on the transformer model for
       | the next two decades until every last cent has been extracted
       | from it.
        
         | ai-christianson wrote:
          | There's a massive hardware and energy infra build-out going
          | on. None of that is specialized to run only transformers at
          | this point, so wouldn't that create a huge incentive to find
          | newer and better architectures to get the most out of all this
          | hardware and energy infra?
        
           | Mehvix wrote:
           | >None of that is specialized to run only transformers at this
           | point
           | 
           | isn't this what [etched](https://www.etched.com/) is doing?
        
             | imtringued wrote:
              | Only being able to run transformers is a silly concept:
              | attention consists of two matrix multiplications, and
              | matmuls are the standard operation in feed-forward and
              | convolutional layers too. Basically, you get transformers
              | for free.
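              | 
              | For illustration, a minimal numpy sketch (my own, not from
              | the thread) of single-head attention, showing that the
              | core really is two matmuls with a softmax in between:
              | 
              |     import numpy as np
              | 
              |     def attention(Q, K, V):
              |         # Matmul 1: query/key similarity scores
              |         scores = Q @ K.T / np.sqrt(K.shape[-1])
              |         # Row-wise softmax turns scores into mixing weights
              |         w = np.exp(scores - scores.max(-1, keepdims=True))
              |         w = w / w.sum(-1, keepdims=True)
              |         # Matmul 2: weighted sum of values
              |         return w @ V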
        
               | kadushka wrote:
               | devil is in the details
        
         | Davidzheng wrote:
         | how do you know we're not at recursive self-improvement but the
         | rate is just slower than human-mediated improvement?
        
       | teleforce wrote:
       | >The project, he said, was "very organic, bottom up," born from
       | "talking over lunch or scrawling randomly on the whiteboard in
       | the office."
       | 
        | Many breakthrough, game-changing inventions were done this way,
        | through back-of-the-envelope discussions; another popular
        | example is the Ethernet network.
        | 
        | Some good stories of a similar culture at AT&T Bell Labs are
        | well described in Hamming's book [1].
       | 
        | [1] The Art of Doing Science and Engineering (Stripe Press):
       | 
       | https://press.stripe.com/the-art-of-doing-science-and-engine...
        
         | atonse wrote:
         | True in creativity too.
         | 
          | According to various stories pieced together, the ideas for
          | four of Pixar's early hits were conceived at or around a
          | single lunch:
          | 
          | A Bug's Life, WALL-E, Monsters, Inc.
        
           | emi2k01 wrote:
           | The fourth one is Finding Nemo
        
         | CaptainOfCoit wrote:
          | All transformative inventions and innovations seem to come
          | from similar scenarios like "I was playing around with these
          | things" or "I just met X at lunch and we discussed ...".
          | 
          | I wonder how big an impact work from home will really have on
          | humanity in general, when so many of our life-changing
          | discoveries come from the odd chance of two specific people
          | happening to be in the same place at some moment in time.
        
           | DyslexicAtheist wrote:
            | I'd go back to the office in a heartbeat provided it was an
            | actual office, and not an "open office" layout where people
            | are forced to try to concentrate amid constant noise and
            | people passing behind them.
            | 
            | The agile treadmill (with PMs breathing down our necks) and
            | features getting planned and delivered in two-week sprints
            | have also reduced our ability to just do something we feel
            | needs getting done. Today you go to work to feed several
            | layers of incompetent managers - there is no room for play,
            | or for creativity. At least in most orgs I know.
           | 
           | I think innovation (or even joy of being at work) needs more
           | than just the office, or people, or a canteen, but an
           | environment that supports it.
        
             | entropicdrifter wrote:
             | Personally, I try to under-promise on what I think I can do
             | every sprint specifically so I can spend more time
             | mentoring more junior engineers, brainstorming random
             | ideas, and working on stuff that nobody has called out as
             | something that needs working on yet.
             | 
              | Basically, I set aside as much time as I can to squeeze
              | creativity and real engineering work into the job.
              | Otherwise I'd go crazy from the grind of just cranking out
              | deliverables.
        
               | DyslexicAtheist wrote:
               | yeah that sounds like a good strategy to avoid burn-out.
        
             | dekhn wrote:
             | We have an open office surrounded by "breakout offices". I
             | simply squat in one of the offices (I take most meetings
             | over video chat), as do most of the other principals. I
             | don't think I could do my job in an office if I couldn't
             | have a room to work in most of the time.
             | 
              | As for agile: I've made it clear to my PMs that I generally
              | plan on a quarterly/half-year basis, and my work and other
              | people's work adhere to that schedule, not weekly sprints
              | (we stay up to date in a Slack channel, no standups).
        
           | fipar wrote:
            | What you say is true, but let's not forget that Ken Thompson
            | did the first version of Unix in three weeks while his wife
            | was away in California with their child visiting relatives,
            | so deep focus is important too.
           | 
           | It seems, in those days, people at Bell Labs did get the best
           | of both worlds: being able to have chance encounters with
           | very smart people while also being able to just be gone for
           | weeks to work undistracted.
           | 
           | A dream job that probably didn't even feel like a job (at
           | least that's the impression I get from hearing Thompson talk
           | about that time).
        
           | tagami wrote:
            | Perhaps this is why we see AI devotees congregate in places
            | like SF - increased probability of such encounters.
        
         | bitwize wrote:
         | One of the OG Unix guys (was it Kernighan?) literally specced
         | out UTF-8 on a cocktail napkin.
        
           | dekhn wrote:
           | Thompson and Pike: https://en.wikipedia.org/wiki/UTF-8
           | 
           | """Thompson's design was outlined on September 2, 1992, on a
           | placemat in a New Jersey diner with Rob Pike. In the
           | following days, Pike and Thompson implemented it and updated
           | Plan 9 to use it throughout,[11] and then communicated their
           | success back to X/Open, which accepted it as the
           | specification for FSS-UTF.[9]"""
        
         | liuliu wrote:
          | And it has always felt to me that it has lineage from the
          | neural Turing machine line of work as a prior. The
          | transformative part was: 1. find a good task (machine
          | translation) and a reasonable way to stack (the encoder-
          | decoder architecture); 2. run the experiment; 3. ditch the
          | external KV store idea and just use self-projected KV.
         | 
          | Related thread:
          | https://threadreaderapp.com/thread/1864023344435380613.html
        
       | Proofread0592 wrote:
        | I think a transformer wrote this article, given the suspicious
        | number of em dashes in the last section.
        
         | DonHopkins wrote:
         | The next big AI architectural fad will be "disrupters".
        
           | judge2020 wrote:
           | Maybe even 'terminators'
        
       | yieldcrv wrote:
        | These are evolutionary dead ends; sorry that I'm not inspired
        | enough to see it any other way. This transformer-based direction
        | is good enough.
        | 
        | The LLM stack has enough branches of evolution within it for
        | efficiency; agent-based work can power a new industrial
        | revolution specifically around white-collar workers on its own,
        | while expanding self-expression for personal fulfillment for
        | everyone else.
        | 
        | Well, have fun sir.
        
         | password54321 wrote:
         | ^AI psychosis, never underestimate its effects.
         | 
         | https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
        
       | TheRealPomax wrote:
       | tl;dr: AI is built on top of science done by people just "doing
       | research", and transformers took off so hard that those same
       | people now can't do any meaningful, real AI research anymore
       | because everyone only wants to pay for "how to make this one
       | single thing that everyone else is also doing, better" instead of
       | being willing to fund research into literally anything else.
       | 
       | It's like if someone invented the hamburger and every single food
       | outlet decided to only serve hamburgers from that point on, only
       | spending time and money on making the perfect hamburger, rather
       | than spending time and effort on making great meals. Which sounds
       | ludicrously far-fetched, but is exactly what happened here.
        
         | jjtheblunt wrote:
         | Good points, and it made me have a mini epiphany...
         | 
          | I think you analogously just described Sun Microsystems, where
          | Unixes (BSD originally in their case, later generalized to an
          | SVR4 (?) hybrid) worked soooo well that NT was built as a
          | hybridization for the Microsoft user base, Apple reabsorbed
          | NeXT, the BSD-Mach-Display PostScript hybridization spinoff,
          | and Linux simultaneously thrived.
        
         | marcel-c13 wrote:
         | Dude now I want a hamburger :(
        
         | hatthew wrote:
         | This is a decent analogy, but I think it understates how good
         | transformers are. People are all making hamburgers because it's
         | _really hard_ to find anything better than a hamburger. Better
         | foods definitely exist out there but nobody 's been able to
         | prove it yet.
        
       | amelius wrote:
       | Of course he's sick. He could have made billions.
        
         | efskap wrote:
         | But attention is all he needs.
        
         | rzzzt wrote:
         | When you have your (next) lightbulb moment, how would you
         | monetize such an idea? Royalties? 1c after each request?
        
           | BoorishBears wrote:
           | Leave and raise a round right away.
        
         | password54321 wrote:
         | Money has diminishing returns. Not everyone wants to buy
         | Twitter.
        
       | dekhn wrote:
       | The way I look at transformers is: they have been one of the most
       | fertile inventions in recent history. Originally released in
       | 2017, in the subsequent 8 years they completely transformed (heh)
       | multiple fields, and at least partially led to one Nobel prize.
       | 
        | Realistically, I think the valuable idea is probabilistic
        | graphical models, of which transformers are an example:
        | combining probability with sequences, or with trees and graphs,
        | is likely to continue to be a valuable area for research
        | exploration for the foreseeable future.
        
         | jimbo808 wrote:
         | Which fields have they completely transformed? How was it
         | before and how is it now? I won't pretend like it hasn't
         | impacted my field, but I would say the impact is almost
         | entirely negative.
        
           | Profan wrote:
           | hah well, transformative doesn't necessarily mean positive!
        
             | econ wrote:
             | All we get is distraction.
        
           | dekhn wrote:
           | Genomics, protein structure prediction, various forms of
           | small molecule and large molecule drug discovery.
        
             | thesz wrote:
              | None of the neural protein structure prediction papers I
              | have read compare transformers to SAT solvers.
             | 
             | As if this approach [1] does not exist.
             | 
             | [1] https://pmc.ncbi.nlm.nih.gov/articles/PMC7197060/
        
           | jimmyl02 wrote:
           | in the super public consumer space, search engines / answer
           | engines (like chatgpt) are the big ones.
           | 
           | on the other hand it's also led to improvements in many
           | places hidden behind the scenes. for example, vision
           | transformers are much more powerful and scalable than many of
           | the other computer vision models which has probably led to
           | new capabilities.
           | 
            | in general, transformers aren't just "generate text"; they're
            | a new foundational model architecture which enables a step
            | change in many things that require modeling!
        
             | ACCount37 wrote:
             | Transformers also make for a damn good base to graft just
             | about any other architecture onto.
             | 
             | Like, vision transformers? They seem to work best when they
             | still have a CNN backbone, but the "transformer" component
             | is very good at focusing on relevant information, and doing
             | different things depending on what you want to be done with
             | those images.
             | 
             | And if you bolt that hybrid vision transformer to an even
             | larger language-oriented transformer? That also imbues it
             | with basic problem-solving, world knowledge and commonsense
             | reasoning capabilities - which, in things like advanced OCR
             | systems, are very welcome.
        
           | CamperBob2 wrote:
           | _Which fields have they completely transformed?_
           | 
           | Simultaneously discovering and leveraging the functional
           | nature of language seems like kind of a big deal.
        
             | jimbo808 wrote:
             | Can you explain what this means?
        
               | CamperBob2 wrote:
               | Given that we can train a transformer model by shoveling
               | large amounts of inert text at it, and then use it to
               | compose original works and solve original problems with
               | the addition of nothing more than generic computing
               | power, we can conclude that there's nothing special about
               | what the human brain does.
               | 
               | All that remains is to come up with a way to integrate
               | short-term experience into long-term memory, and we can
               | call the job of emulating our brains done, at least in
               | principle. Everything after that will amount to detail
               | work.
        
               | jimbo808 wrote:
               | > we can conclude that there's nothing special about what
               | the human brain does
               | 
               | ...lol. Yikes.
               | 
               | I do not accept your premise. At all.
               | 
               | > use it to compose original works and solve original
               | problems
               | 
               | Which original works and original problems have LLMs
               | solved, exactly? You might find a random article or
               | stealth marketing paper that claims to have solved some
               | novel problem, but if what you're saying were actually
               | true, we'd be flooded with original works and new
               | problems being solved. So where are all these original
               | works?
               | 
               | > All that remains is to come up with a way to integrate
               | short-term experience into long-term memory, and we can
               | call the job of emulating our brains done, at least in
               | principle
               | 
               | What experience do you have that caused you to believe
               | these things?
        
               | CamperBob2 wrote:
               | Which is fine, but it's now clear where the burden of
               | proof lies, and IMHO we have transformer-based language
               | models to thank for that.
               | 
               | If anyone still insists on hidden magical components
               | ranging from immortal souls to Penrose's quantum woo,
               | well... let's see what you've got.
        
               | jimbo808 wrote:
               | I had edited my comment, I think you replied before I
               | saved it.
        
               | CamperBob2 wrote:
               | I was just saying that it's fine if you don't accept my
               | premise, but that doesn't change the reality of the
               | premise.
               | 
               | The International Math Olympiad qualifies as solving
               | original problems, for example. If you disagree, that's a
               | case _you_ have to make. Transformer models are
               | unquestionably better at math than I am. They are also
               | better at composition, and will soon be better at
                | programming if they aren't already.
               | 
               | Every time a magazine editor is fooled by AI slop, every
               | time an entire subreddit loses the Turing test to
               | somebody's ethically-questionable 'experiment', every
               | time an AI-rendered image wins a contest meant for human
               | artists -- those are original works.
               | 
               | Heck, looking at my Spotify playlist, I'd be amazed if I
               | haven't already been fooled by AI-composed music. If it
               | hasn't happened yet, it will probably happen next week,
               | or maybe next year. Certainly within the next five years.
        
               | rhetocj23 wrote:
                | Someone's drunk too much of the AI hype juice. You'll
                | sober up in time.
        
               | leptons wrote:
                | Humans hallucinate too, but usually that's a sign of
                | dysfunction, and it's not expected as normal operational
                | output.
               | 
               | >If anyone still insists on hidden magical components
               | ranging from immortal souls to Penrose's quantum woo,
               | well... let's see what you've got.
               | 
               | This isn't too far off from the marketing and hypesteria
               | surrounding "AI" companies.
        
               | emptysongglass wrote:
               | No, the burden of proof is _on you_ to deliver. You are
               | the claimant, _you_ provide the proof. You made a drive-
               | by assertion with no evidence or even arguments.
               | 
               | I also do not accept your assertion, at all. Humans
               | largely function on the basis of desire-fulfilment, be
               | that eating, fucking, seeking safety, gaining power, or
               | any of the other myriad human activities. Our brains, and
               | the brains of all the animals before us, have evolved for
               | that purpose. For evidence, start with Skinner or the
               | millions of behavioral analysis studies done in that
               | field.
               | 
               | Our thoughts lend themselves to those activities. They
               | arise from desire. Transformers have nothing to do with
               | human cognition because they do not contain the basic
               | chemical building blocks that precede and give rise to
               | human cognition. They are, in fact, stochastic parrots,
               | that can fool others, like yourself, into believing they
               | are somehow thinking.
               | 
               | [1] Libet, B., Gleason, C. A., Wright, E. W., & Pearl, D.
               | K. (1983). Time of conscious intention to act in relation
               | to onset of cerebral activity (readiness-potential).
               | Brain, 106(3), 623-642.
               | 
               | [2] Soon, C. S., Brass, M., Heinze, H. J., & Haynes, J.
               | D. (2008). Unconscious determinants of free decisions in
               | the human brain. Nature Neuroscience, 11(5), 543-545.
               | 
               | [3] Berridge, K. C., & Robinson, T. E. (2003). Parsing
               | reward. Trends in Neurosciences, 26(9), 507-513. (This
               | paper reviews the "wanting" vs. "liking" distinction,
               | where unconscious "wanting" or desire is driven by
               | dopamine).
               | 
               | [4] Kavanagh, D. J., Andrade, J., & May, J. (2005).
               | Elaborated Intrusion theory of desire: a multi-component
               | cognitive model of craving. British Journal of Health
               | Psychology, 10(4), 515-532. (This model proposes that
               | desires begin as unconscious "intrusions" that precede
               | conscious thought and elaboration).
        
               | CamperBob2 wrote:
               | If anything, your citation 1, along with subsequent fMRI
               | studies, backs up my point. We literally don't know what
               | we're going to do next. Is that a hallmark of cognition
               | in your book? The rest are simply irrelevant.
               | 
               |  _They are, in fact, stochastic parrots, that can fool
               | others, like yourself, into believing they are somehow
               | thinking._
               | 
               | What makes you think you're not arguing with one now?
        
               | emptysongglass wrote:
               | How does that back up your point?
               | 
               | You are not making an argument, you are just making
               | assertions without evidence and then telling us the
               | burden of proof is on us to tell you why not.
               | 
               | If you went walking down the streets yelling the world is
               | run by a secret cabal of reptile-people without evidence,
               | you would rightfully be declared insane.
               | 
               | Our feelings and desires largely determine the content of
               | our thoughts and actions. LLMs do not function as such.
               | 
               | Whether I am arguing with a parrot or not has nothing to
               | do with cognition. A parrot being able to usefully fool a
               | human has nothing to do with cognition.
        
               | Marshferm wrote:
               | If the brain only uses language like a sportscaster
               | explaining post-hoc what the self and others are doing
               | (experimental evidence 2003, empirical proof 2016), then
               | what's special about brains is entirely separate from
               | what language is or appears to be. It's not even like a
               | ticker tape that records trades, it's like a disengaged,
               | arbitrary set of sequences that have nothing to do with
               | what we're doing (and thinking!).
               | 
               | Language is like a disembodied science-fiction narration.
               | 
                | Wegner's Illusion of Conscious Will
               | 
               | https://www.its.caltech.edu/~squartz/wegner2.pdf
               | 
               | Fedorenko's Language and Thought are Not The Same Thing
               | 
               | https://pmc.ncbi.nlm.nih.gov/articles/PMC4874898/
        
           | isoprophlex wrote:
           | Everyone who did NLP research or product discovery in the
           | past 5 years had to pivot real hard to salvage their shit
           | post-transformers. They're very disruptively good at most NLP
            | tasks.
           | 
           | edit: _post-transformers_ meaning  "in the era after
           | transformers were widely adopted" not some mystical new wave
           | of hypothetical tech to disrupt transformers themselves.
        
             | rootnod3 wrote:
             | So, unless this went r/woosh over my head....how is current
             | AI better than shit post-transformers? If all....old shit
             | post-transformers are at least deterministic or open and
             | not a randomized shitbox.
             | 
             | Unless I misinterpreted the post, render me confused.
        
               | dgacmu wrote:
               | I think you're misinterpreting: "with the advent of
               | transformers, (many) people doing NLP with pre-
               | transformers techniques had to salvage their shit"
        
               | isoprophlex wrote:
               | I wasn't too clear, I think. Apologies if the wording was
               | confusing.
               | 
               | People who started their NLP work (PhDs etc; industry
                | research projects) _before_ the LLM / transformer craze
               | had to adapt to the new world. (Hence 'post-mass-uptake-
               | of-transformers')
        
               | numpad0 wrote:
               | There's no post-transformer tech. There are lots of NLP
               | tasks that you can now, just, _prompt_ an LLM to do.
        
               | isoprophlex wrote:
               | Yeah unclear wording; see the sibling comment also. I
               | meant "the tech we have now", in the era after "attention
               | is all you need"
        
             | dingnuts wrote:
             | Sorry but you didn't really answer the question. The
             | original claim was that transformers changed a whole bunch
             | of fields, and you listed literally the one thing language
             | models are directly useful for.. modeling language.
             | 
             | I think this might be the ONLY example that doesn't back up
             | the original claim, because of course an advancement in
             | language processing is an advancement in language
             | processing -- that's tautological! every new technology is
             | an advancement in its domain; what's claimed to be special
             | about transformers is that they are allegedly disruptive
             | OUTSIDE of NLP. "Which fields have been transformed?" means
             | ASIDE FROM language processing.
             | 
             | other than disrupting users by forcing "AI" features they
             | don't want on them... what examples of transformers being
             | revolutionary exist outside of NLP?
             | 
             | Claude Code? lol
        
               | iknowstuff wrote:
               | https://x.com/aelluswamy/status/1981760576591393203
               | 
               | saving lives
        
               | dingnuts wrote:
               | I'm not watching a video on Twitter about self driving
               | from the company who told us twelve years ago that
               | completely autonomous vehicles were a year away as a
               | rebuttal to the point I made.
               | 
               | If you have something relevant to say, you can summarize
               | for the class & include links to your receipts.
        
               | iknowstuff wrote:
               | your choice, I don't really care about your opinion
        
               | ComplexSystems wrote:
               | Transformers aren't only used in language processing.
               | They're very useful in image processing, video, audio,
                | etc. They're kind of like a general-purpose replacement
                | for RNNs that is better in many ways.
        
               | dotnet00 wrote:
               | I think they meant fields of research. If you do anything
               | in NLP, CV, inverse-problem solving or simulations,
               | things have changed drastically.
               | 
               | Some directly, because LLMs and highly capable general
               | purpose classifiers that might be enough for your use
               | case are just out there, and some because of downstream
               | effects, like GPU-compute being far more common, hardware
               | optimized for tasks like matrix multiplication and mature
               | well-maintained libraries with automatic differentiation
               | capabilities. Plus the emergence of things that mix both
                | classical ML and transformers, like training networks to
                | approximate intermolecular potentials faster than ab-
                | initio calculation, accelerating molecular dynamics
                | simulations.
        
               | conartist6 wrote:
               | The goal was never to answer the question. So what if
               | it's worse. It's not worse for the researchers. It's not
               | worse for the CEOs and the people who work for the AI
               | companies. They're bathing in the limelight so their
               | actual goal, as they would state it to themselves, is:
               | "To get my bit of the limelight"
        
               | conartist6 wrote:
                | > The final conversation on Sewell's screen was with a
                | chatbot in the persona of Daenerys Targaryen, the
                | beautiful princess and Mother of Dragons from "Game of
                | Thrones."
                | 
                | > "I promise I will come home to you," Sewell wrote. "I
                | love you so much, Dany."
                | 
                | > "I love you, too," the chatbot replied. "Please come
                | home to me as soon as possible, my love."
                | 
                | > "What if I told you I could come home right now?" he
                | asked.
                | 
                | > "Please do, my sweet king."
                | 
                | > Then he pulled the trigger.
               | 
               | Reading the newspaper is such a lovely experience these
               | days. But hey, the AI researchers are really excited so
               | who really cares if stuff like this happens if we can
               | declare that "therapy is transformed!"
               | 
               | It sure is. Could it have been that attention was all
               | that kid needed?
        
               | rcbdev wrote:
               | As a professor and lecturer, I can safely assure you that
               | the transformer model has disrupted the way students
                | learn - in the literal sense of the word.
        
           | warkdarrior wrote:
            | Spam detection and phishing detection are completely
            | different than they were 5 years ago, as one can no longer
            | rely on typos and grammar mistakes to identify bad content.
        
             | onlyrealcuzzo wrote:
             | The signals might be different, but the underlying
             | mechanism is still incredibly efficient, no?
        
             | walkabout wrote:
             | Spam, scams, propaganda, and astroturfing are easily the
             | largest beneficiaries of LLM automation, so far. LLMs are
             | exactly the 100x rocket-boots their boosters are promising
             | for other areas (without such results outside a few tiny,
             | but sometimes important, niches, so far) when what you're
              | doing is producing throw-away content at enormous scale
              | and you have a high tolerance for mistakes, as long as the
              | volume is high.
        
               | visarga wrote:
               | It seems unfair to call out LLMs for "spam, scams,
               | propaganda, and astroturfing." These problems are largely
               | the result of platform optimization for engagement and
               | SEO competition for attention. This isn't unique to
               | models; even we, humans, when operating without feedback,
               | generate mostly slop. Curation is performed by the
               | environment and the passage of time, which reveals
               | consequences. LLMs taken in isolation from their
               | environment are just as sloppy as brains in a similar
               | situation.
               | 
               | Therefore, the correct attitude to take regarding LLMs is
               | to create ways for them to receive useful feedback on
               | their outputs. When using a coding agent, have the agent
               | work against tests. Scaffold constraints and feedback
               | around it. AlphaZero, for example, had abundant
               | environmental feedback and achieved amazing (superhuman)
               | results. Other Alpha models (for math, coding, etc.) that
               | operated within validation loops reached olympic levels
               | in specific types of problem-solving. The limitation of
               | LLMs is actually a limitation of their incomplete
               | coupling with the external world.
               | 
                | In fact you don't even need a super intelligent agent to
                | make progress; it is sufficient to have copying and
                | competition. Evolution shows it can create all life,
                | including us and our culture and technology, without a
                | very smart learning algorithm. Instead, what it has is
               | plenty of feedback. Intelligence is not in the brain or
               | the LLM, it is in the ecosystem, the society of agents,
               | and the world. Intelligence is the result of having to
               | pay the cost of our execution to continue to exist, a
               | strategy to balance the cost of life.
               | 
               | What I mean by feedback is exploration, when you execute
               | novel actions or actions in novel environment
               | configurations, and observe the outcomes. And adjust, and
               | iterate. So the feedback becomes part of the model, and
               | the model part of the action-feedback process. They co-
               | create each other.
        
               | walkabout wrote:
               | > It seems unfair to call out LLMs for "spam, scams,
               | propaganda, and astroturfing." These problems are largely
               | the result of platform optimization for engagement and
               | SEO competition for attention.
               | 
               | They didn't create those markets, but they're the markets
               | for which LLMs enhance productivity and capability the
               | best right now, because they're the ones that need the
               | least supervision of input to and output from the LLMs,
               | and they happen to be otherwise well-suited to the kind
               | of work it is, besides.
               | 
               | > This isn't unique to models; even we, humans, when
               | operating without feedback, generate mostly slop.
               | 
               | I don't understand the relevance of this.
               | 
               | > Curation is performed by the environment and the
               | passage of time, which reveals consequences.
               | 
                | I'd say it's revealed by human judgement and eroded by
               | chance, but either way, I still don't get the relevance.
               | 
               | > LLMs taken in isolation from their environment are just
               | as sloppy as brains in a similar situation.
               | 
               | Sure? And clouds are often fluffy. Water is often wet.
               | Relevance?
               | 
               | The rest of this is a description of how we can make LLMs
               | work better, which amounts to more work than required to
               | make LLMs pay off enormously for the purposes I called
               | out, so... are we even in disagreement? I don't disagree
               | that perhaps this will change, and explicitly bound my
               | original claim ("so far") for that reason.
               | 
               | ... are you actually demonstrating my point, on purpose,
               | by responding with LLM slop?
        
               | visarga wrote:
               | LLMs can generate slop if used without good feedback or
               | trying to minimize human contribution. But the same LLMs
               | can filter out the dark patterns. They can use search and
               | compare against dozens or hundreds of web pages, which is
               | like the deep research mode outputs. These reports can
               | still contain mistakes, but we can iterate - generate
               | multiple deep reports from different models with
               | different web search tools, and then do comparative
                | analysis once more. There is no reason we should consume
                | the raw web, full of "spam, scams, propaganda, and
                | astroturfing", today.
        
               | throwaway290 wrote:
               | So they can sort of maybe solve the problems they create
               | except some people profit from it and can mass manipulate
               | minds in new exciting ways
        
               | pixelpoet wrote:
               | > It seems unfair to call out LLMs for "spam, scams,
               | propaganda, and astroturfing."
               | 
               | You should hear HN talk about crypto. If the knife were
               | invented today they'd have a field day calling it the
               | most evil plaything of bandits, etc. Nothing about human
               | nature, of course.
               | 
               | Edit: There it is! Like clockwork.
        
               | econ wrote:
               | For a good while I joked that I could easily write a bot
               | that makes more interesting conversation than you. The
                | human slop will drown in AI slop. Looks like we will need
                | to make more of an effort when publishing, if not develop
                | our own personalities.
        
           | jonas21 wrote:
           | Out of curiosity, what field are you in?
        
           | EGreg wrote:
           | AI fan (type 1 -- AI made a big breakthrough) meets AI
           | defender (type 2 -- AI has not fundamentally changed anything
           | that was already a problem).
           | 
           | Defenders are supposed to defend against attacks on AI, but
           | here it misfired, so the conversation should be interesting.
           | 
           | That's because the defender is actually a skeptic of AI. But
           | the first sentence sounded like a typical "nothing to see
           | here" defense of AI.
        
           | mountainriver wrote:
           | Software, and it's wildly positive.
           | 
           | Takes like this are utterly insane to me
        
             | Silamoth wrote:
             | It's had an impact on software for sure. Now I have to fix
             | my coworker's AI slop code all the time. I guess it could
             | be a positive for my job security. But acting like "AI" has
             | had a wildly positive impact on software seems, at best, a
             | simplification and, at worst, the opposite of reality.
        
             | sponnath wrote:
             | Wouldn't say it's transformative.
        
               | mrieck wrote:
                | My workflow is transformed. If yours isn't, you're
                | missing
               | out.
               | 
               | Days that I'd normally feel overwhelmed from requests by
               | management are just Claude Code and chill days now.
        
           | blibble wrote:
           | > but I would say the impact is almost entirely negative.
           | 
           | quite
           | 
           | the transformer innovation was to bring down the cost of
           | producing incorrect, but plausible looking content (slop) in
           | any modality to near zero
           | 
           | not a positive thing for anyone other than spammers
        
           | CHY872 wrote:
           | In computer vision transformers have basically taken over
           | most perception fields. If you look at paperswithcode
           | benchmarks it's common to find like 10/10 recent winners
           | being transformer based against common CV problems. Note, I'm
           | not talking about VLMs here, just small ViTs with a few
           | million parameters. YOLOs and other CNNs are still hanging
           | around for detection but it's only a matter of time.
        
             | thesz wrote:
              | Can it be that transformer-based solutions come from
              | well-funded organizations that can spend vast amounts of
              | money on training expensive (O(n^3)) models?
             | 
             | Are there any papers that compare predictive power against
             | compute needed?
        
         | AaronAPU wrote:
         | I have my own probabilistic hyper-graph model which I have
         | never written down in an article to share. You see people
         | converging on this idea all over if you're looking for it.
         | 
         | Wish there were more hours in the day.
        
           | rbartelme wrote:
           | Yeah I think this is definitely the future. Recently, I too
           | have spent considerable time on probabilistic hyper-graph
           | models in certain domains of science. Maybe it _is_ the next
           | big thing.
        
         | epistasis wrote:
          | > think the valuable idea is probabilistic graphical models,
          | of which transformers are an example: combining probability
          | with sequences, or with trees and graphs, is likely to
          | continue to be a valuable area for research exploration for
          | the foreseeable future.
         | 
         | As somebody who was a biiiiig user of probabilistic graphical
         | models, and felt kind of left behind in this brave new world of
         | stacked nets, I would love for my prior knowledge and
         | experience to become valuable for a broader set of problem
         | domains. However, I don't see it yet. Hope you are right!
        
           | cauliflower2718 wrote:
            | +1, I am also a big user of PGMs, and also a big user of
            | transformers, and I don't know what the parent comment is
            | talking about, beyond that, for e.g. LLMs, sampling the next
           | token can be thought of as sampling from a conditional
           | distribution (of the next token, given previous tokens).
           | However, this connection of using transformers to sample from
           | conditional distributions is about autoregressive generation
           | and training using next-token prediction loss, not about the
           | transformer architecture itself, which mostly seems to be
           | good because it is expressive and scalable (i.e. can be
           | hardware-optimized).
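            | 
            | A toy sketch of that view (my illustration; `model` is a
            | hypothetical function returning P(next token | context) as a
            | probability vector):
            | 
            |     import numpy as np
            | 
            |     def sample(model, tokens, n_new):
            |         for _ in range(n_new):
            |             # conditional distribution over the vocabulary
            |             p = model(tokens)
            |             # draw the next token and extend the context
            |             tokens.append(np.random.choice(len(p), p=p))
            |         return tokens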
           | 
           | Source: I am a PhD student, this is kinda my wheelhouse
        
         | hammock wrote:
          | > I think the valuable idea is probabilistic graphical models,
          | of which transformers are an example: combining probability
          | with sequences, or with trees and graphs, is likely to
          | continue to be a valuable area
         | 
          | I agree. Causal inference and symbolic reasoning would be
          | SUPER juicy nuts to crack, more so than what we got from
          | transformers.
        
         | samsartor wrote:
         | I'm skeptical that we'll see a big breakthrough in the
         | architecture itself. As sick as we all are of transformers,
         | they are really good universal approximators. You can get some
        | marginal gains, but how much more _universal_ are you
        | realistically going to get? I could be wrong, and I'm glad there
        | are researchers out there looking at alternatives like graphical
        | models, but for my money we need to look further afield:
        | reconsider the auto-regressive task, cross-entropy loss, even
        | gradient descent optimization itself.
        
           | kingstnap wrote:
           | There are many many problems with attention.
           | 
           | The softmax has issues regarding attention sinks [1]. The
           | softmax also causes sharpness problems [2]. In general this
           | decision boundary being Euclidean dot products isn't actually
            | optimal for everything; there are many classes of problem
            | where you want polyhedral cones [3]. Positional embeddings
            | are also janky af and so is RoPE tbh; I think Cannon layers
            | are a more promising alternative for horizontal alignment
            | [4].
           | 
           | I still think there is plenty of room to improve these
           | things. But a lot of focus right now is unfortunately being
           | spent on benchmaxxing using flawed benchmarks that can be
           | hacked with memorization. I think a really promising and
           | underappreciated direction is synthetically coming up with
           | ideas and tests that mathematically do not work well and
            | proving that current architectures struggle with it. A great
            | example of this is the "ViTs need glasses" paper [5], or
            | belief state transformers with their star task [6]. The
            | Google one about the limits of embedding dimensions is also
            | great and shows how the dimension of the QK part is actually
            | important to getting good retrieval [7].
           | 
           | [1] https://arxiv.org/abs/2309.17453
           | 
           | [2] https://arxiv.org/abs/2410.01104
           | 
           | [3] https://arxiv.org/abs/2505.17190
           | 
           | [4]
           | https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5240330
           | 
           | [5] https://arxiv.org/abs/2406.04267
           | 
           | [6] https://arxiv.org/abs/2410.23506
           | 
            | [7] https://arxiv.org/abs/2508.21038
        
             | ACCount37 wrote:
             | If all your problems with attention are actually just
             | problems with softmax, then that's an easy fix. Delete
             | softmax lmao.
             | 
             | No but seriously, just fix the fucking softmax. Add a
             | dedicated "parking spot" like GPT-OSS does and eat the
             | gradient flow tax on that, or replace softmax with any of
             | the almost-softmax-but-not-really candidates. Plenty of
             | options there.
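              | 
              | One of the simpler candidates, as a tiny sketch of my own
              | (assuming a fixed zero-logit sink rather than a learned
              | one):
              | 
              |     import numpy as np
              | 
              |     def softmax_with_sink(logits):
              |         # Append a constant zero logit: a "parking spot"
              |         # that can soak up attention mass, so the real
              |         # tokens no longer have to sum to 1.
              |         z = np.append(logits, 0.0)
              |         e = np.exp(z - z.max())
              |         w = e / e.sum()
              |         # drop the sink's share before mixing values
              |         return w[:-1]
              | 
              | Heads with nothing useful to attend to can dump their
              | weight on the sink instead of smearing it over the real
              | tokens.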
             | 
             | The reason why we're "benchmaxxing" is that benchmarks are
             | the metrics we have, and the only way by which we can sift
             | through this gajillion of "revolutionary new architecture
             | ideas" and get at the ones that show any promise at all. Of
             | which there are very few, and fewer still that are worth
             | their gains when you account for: there not being an
             | unlimited amount of compute. Especially not when it comes
             | to frontier training runs.
             | 
              | Memorization vs generalization is a well-known idiot trap,
             | and we are all stupid dumb fucks in the face of applied ML.
             | Still, some benchmarks are harder to game than others
             | (guess how we found that out), and there's power in that.
        
           | eldenring wrote:
           | I think something with more uniform training and inference
           | setups, and otherwise equally hardware friendly, just as
           | easily trainable, and equally expressive could replace
           | transformers.
        
           | krychu wrote:
           | BDH
        
             | tim333 wrote:
             | Yeah that thing is quite interesting - baby dragon
             | hatchling https://news.ycombinator.com/item?id=45668408
             | https://youtu.be/mfV44-mtg7c
        
         | eli_gottlieb wrote:
         | > probabilistic graphical models- of which transformers is an
         | example
         | 
         | Having done my PhD in probabilistic programming... _what?_
        
           | pishpash wrote:
            | It's got nothing to do with PGMs. However, there is the
           | flavor of describing graph structure by soft edge weights vs.
           | hard/pruned edge connections. It's not that surprising that
           | one does better than the other, and it's a very obvious and
           | classical idea. For a time there were people working on NN
           | structure learning and this is a natural step. I don't think
           | there is any breakthrough here, other than that computation
           | power caught up to make it feasible.
        
         | pigeons wrote:
         | Not doubting in any way, but what are some fields it
         | transformed
        
       | bangaladore wrote:
       | > Now, as CTO and co-founder of Tokyo-based Sakana AI, Jones is
       | explicitly abandoning his own creation. "I personally made a
       | decision in the beginning of this year that I'm going to
       | drastically reduce the amount of time that I spend on
       | transformers," he said. "I'm explicitly now exploring and looking
       | for the next big thing."
       | 
        | So this is really just BS hype talk, trying to get more funding
        | and VCs.
        
         | htrp wrote:
         | anyone know what they're trying to sell here?
        
           | aydyn wrote:
           | probably AI
        
           | gwbas1c wrote:
           | The ability to do original, academic research without the
           | pressure to build something marketable.
        
         | YC3498723984327 wrote:
         | His AI company is called "Fish AI"?? Does it mean their AI will
         | have the intelligence of a fish?
        
           | bangaladore wrote:
           | Without transformers, maybe.
           | 
           | /s
        
             | prmph wrote:
              | Hope we're not talking about eels.
        
           | v3ss0n wrote:
           | Or Fishy?
        
           | astrange wrote:
           | It's about collective intelligence, as seen in swarms of ants
           | or fish.
        
         | ivape wrote:
          | He sounds a lot like how some people behave when they reach a
          | "top": suddenly that thing seems unworthy. It's one of the
          | reasons you'll see your favorite music artist go in a totally
          | different direction on their next album. It's almost an
          | artistic process. There's a core arrogance involved: that you
          | were responsible for the outcome and can easily create another
          | great outcome.
        
           | bigyabai wrote:
           | When you're overpressured to succeed, it makes a lot of sense
           | to switch up your creative process in hopes of getting
           | something new or better.
           | 
            | It _doesn't_ mean that you'll get good results by abandoning
           | prior art, either with LLMs or musicians. But it does signal
           | a sort of personal stress and insecurity, for sure.
        
             | ivape wrote:
             | It's a good process (although, many take it to its common
             | conclusion which is self-destruction). It's why the most
             | creative people are able to re-invent themselves. But one
             | must go into everything with both eyes open, and truly
             | humble themselves with the possibility that that may have
             | been the greatest achievement of their life, never to be
             | matched again.
             | 
             | I wonder if he can simply sit back and bask in the glory of
             | being one of the most important people during the infancy
             | of AI. Someone needs to interview this guy, would love to
             | see how he thinks.
        
           | dekhn wrote:
            | Many researchers who invent something new and powerful pivot
            | quickly to something new. That's because they're researchers,
            | and the incentive is to develop new things that subsume the
            | old things. Other researchers will continue to work on
            | improving existing things and finding new applications to
            | existing problems, but they rarely get as much attention as
            | the folks who "discover" something new.
        
             | ASalazarMX wrote:
             | Also, not all researchers have the fortune of doing the
             | research they would want to. If he can do it, it would be
             | foolish not to take the opportunity.
        
           | moritzwarhier wrote:
           | Why "arrogance"? There are music artists that truly enjoy
           | making music and don't just see their purpose in maximizing
           | financial success and fan service?
           | 
           | There are other considerations that don't revolve around
           | money, but I feel it's arrogant to assume success is the only
           | motivation for musicians.
        
             | ivape wrote:
             | Sans money, it's arrogant because we know talent is god-
              | given. You are basically betting again that your naturally
              | given trajectory has more legroom for more incredible
             | output. It's not a bad bet at all, but it is a bet. Some
             | talent is so incredible that it takes a while for the ego
             | to accept its limits. Jordan tried to come back at 40 and
             | Einstein fought quantum mechanics unto death. Accepting the
             | limits has nothing to do with mediocrity, and everything to
             | do with humility. You can still have an incredible
             | trajectory beyond belief (which I believe this person has
             | and will have).
        
               | tim333 wrote:
                | Einstein also got his Nobel Prize for basically
                | discovering quanta. I'm not sure he fought it so much as
                | tried to figure out what's going on with it, which is
                | still kind of unknown.
        
               | jrflowers wrote:
               | You know people get bored right? A person doesn't have to
               | have delusions of grandeur to get bored of something.
               | 
               | Alternatively, if anything it could be the exact opposite
               | of what you're describing. Maybe he sees an ecosystem
               | based on hype that provides little value compared to the
               | cost and wants to distance himself from it, like the
               | Keurig inventor.
        
           | Mistletoe wrote:
           | Sometimes it just turns out like Michael Jordan playing
           | baseball.
        
           | ambicapter wrote:
           | Or a core fear, that you'll never do something as good in the
           | same vein as the smash hit you already made, so you strike
           | off in a completely different direction.
        
           | dmix wrote:
           | That's just normal human behaviour to have evolving interests
           | 
            | Arrogance would be if he explicitly chose to abandon it
            | because he thought he was better.
        
           | toxic72 wrote:
            | It's also plausible that the research field attracts people
           | who want to explore the cutting edge and now that
           | transformers are no longer "that"... he wants to find
           | something novel.
        
         | cheschire wrote:
          | Well he got your _attention_, didn't he?
        
         | brandall10 wrote:
         | Attention is all he needs.
        
           | osener wrote:
           | Reminds me of the headline I saw a long time ago: "50 years
           | later, inventor of the pixel says he's sorry that he made it
           | square."
        
           | LogicFailsMe wrote:
           | Sadly, he probably needs a lot more or he's gonna go all
           | Maslow...
        
             | Ey7NFZ3P0nzAe wrote:
             | link:
             | https://en.wikipedia.org/wiki/Maslow%27s_hierarchy_of_needs
        
         | elicash wrote:
          | Why wouldn't this be both an attempt to get funding and also
          | him wanting to do something new? Certainly if he wanted to do
          | something new he'd want it funded, too?
        
         | IncreasePosts wrote:
          | It would be hype talk if he said "and my next big thing is X."
        
           | bangaladore wrote:
            | Well, that's why he needs funding. He hasn't figured out
            | what the next big thing is yet.
        
         | energy123 wrote:
         | It's also how curious scientists operate, they're always
         | itching for something creative and different.
        
         | password54321 wrote:
         | If it was about money it would probably be easier to double
         | down on something proven to make revenue rather than something
         | that doesn't even exist.
         | 
         | Edit: there is a cult around transformers.
        
       | mmaunder wrote:
        | If anyone has a video of it I think we'd all very much
        | appreciate you posting a link. I've tried and I can't find one.
        
       | InkCanon wrote:
        | The other big missing part here is the enormous incentive to
        | publish in the big three AI conferences (and punishment if you
        | don't). And because quantity is being rewarded far more than
        | quality, the meta is to do really shoddy and uninspired work
        | really quickly. The people I talk to have a 3 month time
        | horizon on their projects.
        
       | nabla9 wrote:
       | What "AI" means for most people is the software product they see,
       | but only a part of it is the underlying machine learning model.
       | Each foundation model receives additional training from thousands
       | of humans, often very lowly paid, and then many prompts are used
       | to fine-tune it all. It's 90% product development, not ML
       | research.
       | 
       | If you look at AI research papers, most of them are by people
       | trying to earn a PhD so they can get a high-paying job. They
        | demonstrate an ability to understand the current generation of
        | AI and tweak it; they create content for their CVs.
       | 
        | There is actual research going on, but it's a tiny share of
        | everything, and it does not look impressive because it's not a
        | product or a demo, but an experiment.
        
       | janalsncm wrote:
       | I have a feeling there is more research being done on non-
       | transformer based architectures now, not less. The tsunami of
        | money pouring in to make the next chatbot-powered CRM doesn't
       | care about that though, so it might seem to be less.
       | 
       | I would also just fundamentally disagree with the assertion that
       | a new architecture will be the solution. We need better methods
       | to extract more value from the data that already exists. Ilya
       | Sutskever talked about this recently. You shouldn't need the
       | whole internet to get to a decent baseline. And that new method
       | may or may not use a transformer, I don't think that is the
       | problem.
        
         | fritzo wrote:
         | It looks like almost every AI researcher and lab who existed
         | pre-2017 is now focused on transformers somehow. I agree the
         | total number of researchers has increased, but I suspect the
         | ratio has moved faster, so there are now fewer total non-
         | transformer researchers.
        
           | janalsncm wrote:
           | Well, we also still use wheels despite them being invented
           | thousands of years ago. We have added tons of improvements on
           | top though, just as transformers have. The fact that wheels
           | perform poorly in mud doesn't mean you throw out the concept
           | of wheels. You add treads to grip the ground better.
           | 
            | If you check the DeepSeek OCR paper, it shows text-based
            | tokenization may be suboptimal. Also all of the MoE stuff,
           | reasoning, and RLHF. The 2017 paper is pretty primitive
           | compared to what we have now.
        
         | marcel-c13 wrote:
         | I think you misunderstood the article a bit by saying that the
         | assertion is "that a new architecture will be the solution".
         | That's not the assertion. It's simply a statement about the
         | lack of balance between exploration and exploitation. And the
         | desire to rebalance it. What's wrong with that?
        
         | tim333 wrote:
         | The assertion, or maybe idea, that a new architecture may be
         | the thing is kind of about building AGI rather than chatbots.
         | 
          | Humans, for example, think about things and learn, which may
          | require something different from feeding the internet in to
          | pre-train your transformer.
        
       | mcfry wrote:
       | Something which I haven't been able to fully parse that perhaps
       | someone has better insight into: aren't transformers inherently
       | only capable of inductive reasoning? In order to actually
       | progress to AGI, which is being promised at least as an
       | eventuality, don't models have to be capable of deduction?
       | Wouldn't that mean fundamentally changing the pipeline in some
       | way? And no, tools are not deduction. They are useful patches for
       | the lack of deduction.
       | 
       | Models need to move beyond the domain of parsing existing
       | information into existing ideas.
        
         | hammock wrote:
          | They can induct, they just can't generate new ideas. It's not
          | going to discover a new quark without a human in the loop
          | somewhere.
        
           | nightshift1 wrote:
           | maybe that's a good thing after all.
        
         | eli_gottlieb wrote:
         | That sounds like a category mistake to me. A proof assistant or
         | logic-programming system performs deduction, and just strapping
         | one of those to an LLM hasn't gotten us to "AGI".
        
           | mcfry wrote:
            | A proof assistant is a verifier, and a tool, and therefore
            | a patch, so I really fail to see how that could be
            | understood as the LLM having deduction.
        
         | energy123 wrote:
         | I don't see any reason to think that transformers are not
         | capable of deductive reasoning. Stochasticity doesn't rule out
         | that ability. It just means the model might be wrong in its
         | deduction, just like humans are sometimes wrong.
        
       | wohoef wrote:
       | I'm tired of feeling like the articles I read are AI generated.
        
       | stevetron wrote:
       | And here I thought this would be about Transformers: Robots in
       | Disguise. The form of transformers I'm tired of hearing about.
       | 
       | And the decepticons.
        
       | einrealist wrote:
        | I ask myself how much this industry's focus on transformer
        | models is informed by the ease of computation on GPUs/NPUs, and
        | whether better AI technology is possible but would require much
        | greater computing power on traditional hardware architectures.
        | We depend so much on traditional computation architectures that
        | it might be a real blind spot. My brain doesn't need 500 watts,
        | at least I hope so.
        
       | alyxya wrote:
       | I think people care too much about trying to innovate a new model
       | architecture. Models are meant to create a compressed
        | representation of their training data. Even if you came up with a
       | more efficient compression, the capabilities of the model
       | wouldn't be any better. What is more relevant is finding more
       | efficient ways of training, like the shift to reinforcement
       | learning these days.
        
         | marcel-c13 wrote:
          | But isn't the maximum training efficiency naturally tied to
          | the architecture? Meaning other architectures have a
          | different training-efficiency landscape? I've said it
          | somewhere else: it is not about "caring too much about new
          | model architecture" but about having a balance between
          | exploitation and exploration.
        
       | nextworddev wrote:
        | Isn't Sakana the one that got flak for falsely advertising its
       | CUDA codegen abilities?
        
       | Mithriil wrote:
       | My opinion on the "Attention is all you need" paper is that its
       | most important idea is the Positional Encoding. The transformer
       | head itself... is just another NN block among many.
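        | 
        | For concreteness, a minimal NumPy sketch of the sinusoidal
        | encoding described in the paper (the sizes are illustrative):
        | 
        |     import numpy as np
        | 
        |     def positional_encoding(max_len, d_model):
        |         # PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
        |         # PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
        |         pos = np.arange(max_len)[:, None]      # (max_len, 1)
        |         i = np.arange(d_model // 2)[None, :]   # (1, d_model/2)
        |         angles = pos / np.power(10000.0, 2 * i / d_model)
        |         pe = np.zeros((max_len, d_model))
        |         pe[:, 0::2] = np.sin(angles)           # even dimensions
        |         pe[:, 1::2] = np.cos(angles)           # odd dimensions
        |         return pe  # added elementwise to the token embeddings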
        
       | nashashmi wrote:
       | Transformers have sucked up all the attention and money. And AI
       | scientists have been sucked in to the transformer-is-prime
       | industry.
       | 
       | We will spend more time in the space until we see bigger
       | roadblocks.
       | 
        | I really wish energy consumption were a big enough roadblock to
        | force them to keep researching.
        
         | tim333 wrote:
         | I think it may be a future roadblock quite soon. If you look at
          | all the data centers planned and the speed of it, it's going to be
         | a job getting the energy. xAI hacked it by putting about 20 gas
         | turbines around their data center which is giving locals health
         | problems from the pollution. I imagine that sort of thing will
         | be cracked down on.
        
           | dmix wrote:
           | If there's a legit long term demand for energy the market
           | will figure it out. I doubt that will be a long term issue.
           | It's just a short term one because of the gold rush. But
           | innovation doesn't have to happen overnight. The world
           | doesn't live or die on a subset of VC funds not 100xing
            | within a certain timeframe.
           | 
           | Or it's possible China just builds the power capabilities
            | faster because they actually build new things.
        
       | tippytippytango wrote:
       | It's difficult to do because of how well matched they are to the
       | hardware we have. They were partially designed to solve the
       | mismatch between RNNs and GPUs, and they are way too good at it.
       | If you come up with something truly new, it's quite likely you
       | have to influence hardware makers to help scale your idea. That
       | makes any new idea fundamentally coupled to hardware, and that's
       | the lesson we should be taking from this. Work on the idea as a
       | simultaneous synthesis of hardware and software. But, it also
       | means that fundamental change is measured in decade scales.
       | 
       | I get the impulse to do something new, to be radically different
       | and stand out, especially when everyone is obsessing over it, but
       | we are going to be stuck with transformers for a while.
        
         | danielmarkbruce wrote:
         | This is backwards. Algorithms that can be parallelized are
         | inherently superior, independent of the hardware. GPUs were
          | built to take advantage of that superiority and handle all kinds
         | of parallel algorithms well - graphics, scientific simulation,
         | signal processing, some financial calculations, and on and on.
         | 
         | There's a reason so much engineering effort has gone into
         | speculative execution, pipelining, multicore design etc -
         | parallelism is universally good. Even when "computers" were
         | human calculators, work was divided into independent chunks
         | that could be done simultaneously. The efficiency comes from
         | the math itself, not from the hardware it happens to run on.
         | 
         | RNNs are not parallelizable by nature. Each step depends on the
         | output of the previous one. Transformers removed that
         | sequential bottleneck.
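          | 
          | A minimal sketch of that difference (simplified: no learned
          | Q/K/V projections, toy shapes):
          | 
          |     import numpy as np
          | 
          |     T, d = 128, 64              # sequence length, width
          |     x = np.random.randn(T, d)
          |     W = np.random.randn(d, d) / np.sqrt(d)
          | 
          |     # RNN: step t needs the hidden state from step t-1,
          |     # so the T steps must run one after another.
          |     h = np.zeros(d)
          |     for t in range(T):
          |         h = np.tanh(x[t] + h @ W)
          | 
          |     # Attention: all pairwise scores come from one matmul,
          |     # with no step-to-step dependency to serialize on.
          |     s = x @ x.T / np.sqrt(d)                   # (T, T)
          |     a = np.exp(s - s.max(-1, keepdims=True))   # softmax over
          |     a /= a.sum(-1, keepdims=True)              # each row
          |     out = a @ x                                # (T, d)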
        
       ___________________________________________________________________
       (page generated 2025-10-24 23:00 UTC)