[HN Gopher] 'Attention is all you need' coauthor says he's 'sick...
___________________________________________________________________
'Attention is all you need' coauthor says he's 'sick' of
transformers
Author : achow
Score : 290 points
Date : 2025-10-24 04:40 UTC (18 hours ago)
(HTM) web link (venturebeat.com)
(TXT) w3m dump (venturebeat.com)
| Xcelerate wrote:
| Haha, I like to joke that we were on track for the singularity in
| 2024, but it stalled because the research time gap between
| "profitable" and "recursive self-improvement" was just a _bit_
| too long, so now we're stranded on the transformer model for
| the next two decades until every last cent has been extracted
| from it.
| ai-christianson wrote:
| There's a massive hardware and energy infra build-out going on.
| None of that is specialized to run only transformers at this
| point, so wouldn't that create a huge incentive to find newer
| and better architectures to get the most out of all this
| hardware and energy infra?
| Mehvix wrote:
| >None of that is specialized to run only transformers at this
| point
|
| isn't this what [etched](https://www.etched.com/) is doing?
| imtringued wrote:
| Only being able to run transformers is a silly concept,
| because attention consists of two matrix multiplications,
| which are the standard operation in feed forward and
| convolutional layers. Basically, you get transformers for
| free.
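|
| In the single-head case it really is just that; a minimal
| NumPy sketch (ignoring batching, masking, and multi-head
| projections):
|
|   import numpy as np
|
|   def attention(Q, K, V):
|       # Two matmuls around a softmax:
|       # scores = Q K^T / sqrt(d); output = weights @ V.
|       scores = Q @ K.T / np.sqrt(K.shape[-1])
|       w = np.exp(scores - scores.max(axis=-1, keepdims=True))
|       w /= w.sum(axis=-1, keepdims=True)
|       return w @ V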
| kadushka wrote:
| devil is in the details
| Davidzheng wrote:
| how do you know we're not at recursive self-improvement but the
| rate is just slower than human-mediated improvement?
| teleforce wrote:
| >The project, he said, was "very organic, bottom up," born from
| "talking over lunch or scrawling randomly on the whiteboard in
| the office."
|
| Many of the breakthrough, game-changing inventions were done
| this way, with back-of-the-envelope discussions; another
| popular example is the Ethernet network.
|
| Some good stories of a similar culture at AT&T Bell Labs are
| well described in Hamming's book [1].
|
| [1] The Art of Doing Science and Engineering (Stripe Press):
|
| https://press.stripe.com/the-art-of-doing-science-and-engine...
| atonse wrote:
| True in creativity too.
|
| According to various stories pieced together, the ideas of 4 of
| Pixar's early hits were conceived on or around one lunch.
|
| A Bug's Life, WALL-E, Monsters, Inc.
| emi2k01 wrote:
| The fourth one is Finding Nemo
| CaptainOfCoit wrote:
| All transformative inventions and innovations seem to come
| from similar scenarios like "I was playing around with these
| things" or "I just met X at lunch and we discussed ...".
|
| I'm wondering how big an impact work from home will really have
| on humanity in general, when so many of our life-changing
| discoveries come from the odd chance of two specific people
| happening to be in the same place at some moment in time.
| DyslexicAtheist wrote:
| I'd go back to the office in a heartbeat provided it was an
| actual office, and not an "open-office" layout where people
| are forced to try to concentrate amid all the noise and
| people passing behind them constantly.
|
| The agile treadmill (with PMs breathing down our necks) and
| features getting planned and delivered in two-week sprints has
| also reduced our ability to just do something we feel needs
| getting done. Today you go to work to feed several layers of
| incompetent managers - there is no room for play, or for
| creativity. At least in most orgs I know.
|
| I think innovation (or even joy of being at work) needs more
| than just the office, or people, or a canteen, but an
| environment that supports it.
| entropicdrifter wrote:
| Personally, I try to under-promise on what I think I can do
| every sprint specifically so I can spend more time
| mentoring more junior engineers, brainstorming random
| ideas, and working on stuff that nobody has called out as
| something that needs working on yet.
|
| Basically, I set aside as much time as I can to squeeze in
| creativity and real engineering work into the job.
| Otherwise I'd go crazy from the grind of just cranking out
| deliverables.
| DyslexicAtheist wrote:
| yeah that sounds like a good strategy to avoid burn-out.
| dekhn wrote:
| We have an open office surrounded by "breakout offices". I
| simply squat in one of the offices (I take most meetings
| over video chat), as do most of the other principals. I
| don't think I could do my job in an office if I couldn't
| have a room to work in most of the time.
|
| As for agile: I've made it clear to my PMs that I generally
| plan on a quarterly/half year basis and my work and other
| people's work adheres to that schedule, not weekly sprints
| (we stay up to date in a slack channel, no standups)
| fipar wrote:
| What you say is true, but let's not forget that Ken Thompson
| did the first version of Unix in 3 weeks while his wife had
| gone to California with their child to visit relatives, so
| deep focus is important too.
|
| It seems, in those days, people at Bell Labs did get the best
| of both worlds: being able to have chance encounters with
| very smart people while also being able to just be gone for
| weeks to work undistracted.
|
| A dream job that probably didn't even feel like a job (at
| least that's the impression I get from hearing Thompson talk
| about that time).
| tagami wrote:
| Perhaps this is why we see AI devotees congregate in places
| like SF - increased probability
| bitwize wrote:
| One of the OG Unix guys (was it Kernighan?) literally specced
| out UTF-8 on a cocktail napkin.
| dekhn wrote:
| Thompson and Pike: https://en.wikipedia.org/wiki/UTF-8
|
| """Thompson's design was outlined on September 2, 1992, on a
| placemat in a New Jersey diner with Rob Pike. In the
| following days, Pike and Thompson implemented it and updated
| Plan 9 to use it throughout,[11] and then communicated their
| success back to X/Open, which accepted it as the
| specification for FSS-UTF.[9]"""
| liuliu wrote:
| And it has always felt to me that it has lineage from the
| neural Turing machine line of work as a prior. The
| transformative part was: 1. find a good task (machine
| translation) and a reasonable way to stack (encoder-decoder
| architecture); 2. run the experiment; 3. ditch the external KV
| store idea and just use self-projected KV.
|
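| To illustrate the last point, a toy sketch of where K and V
| come from in each case (hypothetical shapes, illustration
| only):
|
|   import numpy as np
|
|   rng = np.random.default_rng(0)
|   T, d, n_slots = 6, 8, 16
|   X = rng.normal(size=(T, d))         # input sequence
|
|   # NTM-style: keys/values live in an external memory store.
|   M = rng.normal(size=(n_slots, d))   # external memory
|   K_ext, V_ext = M, M
|
|   # "Self-projected" KV: derived from the input itself via
|   # learned projections, as in the transformer.
|   Wk = rng.normal(size=(d, d))
|   Wv = rng.normal(size=(d, d))
|   K_self, V_self = X @ Wk, X @ Wv
|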
| Related thread:
| https://threadreaderapp.com/thread/1864023344435380613.html
| Proofread0592 wrote:
| I think a transformer wrote this article, seeing a suspicious
| number of em dashes in the last section
| DonHopkins wrote:
| The next big AI architectural fad will be "disrupters".
| judge2020 wrote:
| Maybe even 'terminators'
| yieldcrv wrote:
| These are evolutionary dead ends; sorry that I'm not inspired
| enough to see it any other way. This transformer-based
| direction is good enough.
|
| The LLM stack has enough branches of evolution within it for
| efficiency. Agent-based work can power a new industrial
| revolution specifically around white-collar workers on its
| own, while expanding self-expression for personal fulfillment
| for everyone else.
|
| Well, have fun, sir.
| password54321 wrote:
| ^AI psychosis, never underestimate its effects.
|
| https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
| TheRealPomax wrote:
| tl;dr: AI is built on top of science done by people just "doing
| research", and transformers took off so hard that those same
| people now can't do any meaningful, real AI research anymore
| because everyone only wants to pay for "how to make this one
| single thing that everyone else is also doing, better" instead of
| being willing to fund research into literally anything else.
|
| It's like if someone invented the hamburger and every single food
| outlet decided to only serve hamburgers from that point on, only
| spending time and money on making the perfect hamburger, rather
| than spending time and effort on making great meals. Which sounds
| ludicrously far-fetched, but is exactly what happened here.
| jjtheblunt wrote:
| Good points, and it made me have a mini epiphany...
|
| I think you analogously just described Sun Microsystems, where
| Unixes (BSD originally in their case, generalized to an SVR4
| (?) hybrid later) worked soooo well that NT was built as a
| hybridization for the Microsoft user base, Apple reabsorbed
| the BSD-Mach-DisplayPostscript hybridization spinoff NeXT,
| and Linux simultaneously thrived.
| marcel-c13 wrote:
| Dude now I want a hamburger :(
| hatthew wrote:
| This is a decent analogy, but I think it understates how good
| transformers are. People are all making hamburgers because it's
| _really hard_ to find anything better than a hamburger. Better
| foods definitely exist out there, but nobody's been able to
| prove it yet.
| amelius wrote:
| Of course he's sick. He could have made billions.
| efskap wrote:
| But attention is all he needs.
| rzzzt wrote:
| When you have your (next) lightbulb moment, how would you
| monetize such an idea? Royalties? 1c after each request?
| BoorishBears wrote:
| Leave and raise a round right away.
| password54321 wrote:
| Money has diminishing returns. Not everyone wants to buy
| Twitter.
| dekhn wrote:
| The way I look at transformers is: they have been one of the most
| fertile inventions in recent history. Originally released in
| 2017, in the subsequent 8 years they completely transformed (heh)
| multiple fields, and at least partially led to one Nobel prize.
|
| Realistically, I think the valuable idea is probabilistic
| graphical models- of which transformers is an example- combining
| probability with sequences, or with trees and graphs- is likely
| to continue to be a valuable area for research exploration for
| the foreseeable future.
| jimbo808 wrote:
| Which fields have they completely transformed? How was it
| before and how is it now? I won't pretend like it hasn't
| impacted my field, but I would say the impact is almost
| entirely negative.
| Profan wrote:
| hah well, transformative doesn't necessarily mean positive!
| econ wrote:
| All we get is distraction.
| dekhn wrote:
| Genomics, protein structure prediction, various forms of
| small molecule and large molecule drug discovery.
| thesz wrote:
| No neural protein structure prediction papers I read have
| compared transformers to SAT solvers.
|
| As if this approach [1] does not exist.
|
| [1] https://pmc.ncbi.nlm.nih.gov/articles/PMC7197060/
| jimmyl02 wrote:
| in the super public consumer space, search engines / answer
| engines (like chatgpt) are the big ones.
|
| on the other hand it's also led to improvements in many
| places hidden behind the scenes. for example, vision
| transformers are much more powerful and scalable than many of
| the other computer vision models which has probably led to
| new capabilities.
|
| in general, transformers aren't just "generate text"; they're
| a new foundational model architecture which enables a leap in
| many things which require modeling!
| ACCount37 wrote:
| Transformers also make for a damn good base to graft just
| about any other architecture onto.
|
| Like, vision transformers? They seem to work best when they
| still have a CNN backbone, but the "transformer" component
| is very good at focusing on relevant information, and doing
| different things depending on what you want to be done with
| those images.
|
| And if you bolt that hybrid vision transformer to an even
| larger language-oriented transformer? That also imbues it
| with basic problem-solving, world knowledge and commonsense
| reasoning capabilities - which, in things like advanced OCR
| systems, are very welcome.
| CamperBob2 wrote:
| _Which fields have they completely transformed?_
|
| Simultaneously discovering and leveraging the functional
| nature of language seems like kind of a big deal.
| jimbo808 wrote:
| Can you explain what this means?
| CamperBob2 wrote:
| Given that we can train a transformer model by shoveling
| large amounts of inert text at it, and then use it to
| compose original works and solve original problems with
| the addition of nothing more than generic computing
| power, we can conclude that there's nothing special about
| what the human brain does.
|
| All that remains is to come up with a way to integrate
| short-term experience into long-term memory, and we can
| call the job of emulating our brains done, at least in
| principle. Everything after that will amount to detail
| work.
| jimbo808 wrote:
| > we can conclude that there's nothing special about what
| the human brain does
|
| ...lol. Yikes.
|
| I do not accept your premise. At all.
|
| > use it to compose original works and solve original
| problems
|
| Which original works and original problems have LLMs
| solved, exactly? You might find a random article or
| stealth marketing paper that claims to have solved some
| novel problem, but if what you're saying were actually
| true, we'd be flooded with original works and new
| problems being solved. So where are all these original
| works?
|
| > All that remains is to come up with a way to integrate
| short-term experience into long-term memory, and we can
| call the job of emulating our brains done, at least in
| principle
|
| What experience do you have that caused you to believe
| these things?
| CamperBob2 wrote:
| Which is fine, but it's now clear where the burden of
| proof lies, and IMHO we have transformer-based language
| models to thank for that.
|
| If anyone still insists on hidden magical components
| ranging from immortal souls to Penrose's quantum woo,
| well... let's see what you've got.
| jimbo808 wrote:
| I had edited my comment, I think you replied before I
| saved it.
| CamperBob2 wrote:
| I was just saying that it's fine if you don't accept my
| premise, but that doesn't change the reality of the
| premise.
|
| The International Math Olympiad qualifies as solving
| original problems, for example. If you disagree, that's a
| case _you_ have to make. Transformer models are
| unquestionably better at math than I am. They are also
| better at composition, and will soon be better at
| programming if they aren't already.
|
| Every time a magazine editor is fooled by AI slop, every
| time an entire subreddit loses the Turing test to
| somebody's ethically-questionable 'experiment', every
| time an AI-rendered image wins a contest meant for human
| artists -- those are original works.
|
| Heck, looking at my Spotify playlist, I'd be amazed if I
| haven't already been fooled by AI-composed music. If it
| hasn't happened yet, it will probably happen next week,
| or maybe next year. Certainly within the next five years.
| rhetocj23 wrote:
| Someones drank too much of the AI-hype-juice. You'll
| sober up in time.
| Call_center wrote:
| To cancel an Adapundi loan, you must contact customer service
| via live chat on WA at 0813-5138-4097, or CS 0838-4068-5703.
| Prepare personal data such as your ID (KTP) and follow the
| customer service agent's instructions for the further
| cancellation process.
| leptons wrote:
| Humans hallucinate too, but there it's usually a dysfunction,
| and it's not expected as a normal operational output.
|
| >If anyone still insists on hidden magical components
| ranging from immortal souls to Penrose's quantum woo,
| well... let's see what you've got.
|
| This isn't too far off from the marketing and hypesteria
| surrounding "AI" companies.
| emptysongglass wrote:
| No, the burden of proof is _on you_ to deliver. You are
| the claimant, _you_ provide the proof. You made a drive-
| by assertion with no evidence or even arguments.
|
| I also do not accept your assertion, at all. Humans
| largely function on the basis of desire-fulfilment, be
| that eating, fucking, seeking safety, gaining power, or
| any of the other myriad human activities. Our brains, and
| the brains of all the animals before us, have evolved for
| that purpose. For evidence, start with Skinner or the
| millions of behavioral analysis studies done in that
| field.
|
| Our thoughts lend themselves to those activities. They
| arise from desire. Transformers have nothing to do with
| human cognition because they do not contain the basic
| chemical building blocks that precede and give rise to
| human cognition. They are, in fact, stochastic parrots,
| that can fool others, like yourself, into believing they
| are somehow thinking.
|
| [1] Libet, B., Gleason, C. A., Wright, E. W., & Pearl, D.
| K. (1983). Time of conscious intention to act in relation
| to onset of cerebral activity (readiness-potential).
| Brain, 106(3), 623-642.
|
| [2] Soon, C. S., Brass, M., Heinze, H. J., & Haynes, J.
| D. (2008). Unconscious determinants of free decisions in
| the human brain. Nature Neuroscience, 11(5), 543-545.
|
| [3] Berridge, K. C., & Robinson, T. E. (2003). Parsing
| reward. Trends in Neurosciences, 26(9), 507-513. (This
| paper reviews the "wanting" vs. "liking" distinction,
| where unconscious "wanting" or desire is driven by
| dopamine).
|
| [4] Kavanagh, D. J., Andrade, J., & May, J. (2005).
| Elaborated Intrusion theory of desire: a multi-component
| cognitive model of craving. British Journal of Health
| Psychology, 10(4), 515-532. (This model proposes that
| desires begin as unconscious "intrusions" that precede
| conscious thought and elaboration).
| CamperBob2 wrote:
| If anything, your citation 1, along with subsequent fMRI
| studies, backs up my point. We literally don't know what
| we're going to do next. Is that a hallmark of cognition
| in your book? The rest are simply irrelevant.
|
| _They are, in fact, stochastic parrots, that can fool
| others, like yourself, into believing they are somehow
| thinking._
|
| What makes you think you're not arguing with one now?
| emptysongglass wrote:
| How does that back up your point?
|
| You are not making an argument, you are just making
| assertions without evidence and then telling us the
| burden of proof is on us to tell you why not.
|
| If you went walking down the streets yelling the world is
| run by a secret cabal of reptile-people without evidence,
| you would rightfully be declared insane.
|
| Our feelings and desires largely determine the content of
| our thoughts and actions. LLMs do not function as such.
|
| Whether I am arguing with a parrot or not has nothing to
| do with cognition. A parrot being able to usefully fool a
| human has nothing to do with cognition.
| Marshferm wrote:
| If the brain only uses language like a sportscaster
| explaining post-hoc what the self and others are doing
| (experimental evidence 2003, empirical proof 2016), then
| what's special about brains is entirely separate from
| what language is or appears to be. It's not even like a
| ticker tape that records trades, it's like a disengaged,
| arbitrary set of sequences that have nothing to do with
| what we're doing (and thinking!).
|
| Language is like a disembodied science-fiction narration.
|
| Wegner's Illusion of Conscious Will
|
| https://www.its.caltech.edu/~squartz/wegner2.pdf
|
| Fedorenko's Language and Thought are Not The Same Thing
|
| https://pmc.ncbi.nlm.nih.gov/articles/PMC4874898/
| isoprophlex wrote:
| Everyone who did NLP research or product discovery in the
| past 5 years had to pivot real hard to salvage their shit
| post-transformers. They're very disruptively good at most NLP
| tasks.
|
| edit: _post-transformers_ meaning "in the era after
| transformers were widely adopted" not some mystical new wave
| of hypothetical tech to disrupt transformers themselves.
| rootnod3 wrote:
| So, unless this went r/woosh over my head....how is current
| AI better than shit post-transformers? If all....old shit
| post-transformers are at least deterministic or open and
| not a randomized shitbox.
|
| Unless I misinterpreted the post, render me confused.
| dgacmu wrote:
| I think you're misinterpreting: "with the advent of
| transformers, (many) people doing NLP with pre-
| transformers techniques had to salvage their shit"
| isoprophlex wrote:
| I wasn't too clear, I think. Apologies if the wording was
| confusing.
|
| People who started their NLP work (PhDs etc; industry
| research projects) _before_ the LLM / transformer craze
| had to adapt to the new world. (Hence 'post-mass-uptake-
| of-transformers')
| numpad0 wrote:
| There's no post-transformer tech. There are lots of NLP
| tasks that you can now, just, _prompt_ an LLM to do.
| isoprophlex wrote:
| Yeah unclear wording; see the sibling comment also. I
| meant "the tech we have now", in the era after "attention
| is all you need"
| dingnuts wrote:
| Sorry but you didn't really answer the question. The
| original claim was that transformers changed a whole bunch
| of fields, and you listed literally the one thing language
| models are directly useful for: modeling language.
|
| I think this might be the ONLY example that doesn't back up
| the original claim, because of course an advancement in
| language processing is an advancement in language
| processing -- that's tautological! every new technology is
| an advancement in its domain; what's claimed to be special
| about transformers is that they are allegedly disruptive
| OUTSIDE of NLP. "Which fields have been transformed?" means
| ASIDE FROM language processing.
|
| other than disrupting users by forcing "AI" features they
| don't want on them... what examples of transformers being
| revolutionary exist outside of NLP?
|
| Claude Code? lol
| iknowstuff wrote:
| https://x.com/aelluswamy/status/1981760576591393203
|
| saving lives
| dingnuts wrote:
| I'm not watching a video on Twitter about self driving
| from the company who told us twelve years ago that
| completely autonomous vehicles were a year away as a
| rebuttal to the point I made.
|
| If you have something relevant to say, you can summarize
| for the class & include links to your receipts.
| iknowstuff wrote:
| your choice, I don't really care about your opinion
| ComplexSystems wrote:
| Transformers aren't only used in language processing.
| They're very useful in image processing, video, audio,
| etc. They're kind of like a general-purpose replacement
| for RNNs that are better in many ways.
| dotnet00 wrote:
| I think they meant fields of research. If you do anything
| in NLP, CV, inverse-problem solving or simulations,
| things have changed drastically.
|
| Some directly, because LLMs and highly capable general
| purpose classifiers that might be enough for your use
| case are just out there, and some because of downstream
| effects, like GPU-compute being far more common, hardware
| optimized for tasks like matrix multiplication and mature
| well-maintained libraries with automatic differentiation
| capabilities. Plus the emergence of things that mix both
| classical ML and transformers, like training networks to
| approximate intermolecular potentials faster than the ab-
| initio calculation, allowing for accelerated molecular
| dynamics simulations.
| conartist6 wrote:
| The goal was never to answer the question. So what if
| it's worse. It's not worse for the researchers. It's not
| worse for the CEOs and the people who work for the AI
| companies. They're bathing in the limelight so their
| actual goal, as they would state it to themselves, is:
| "To get my bit of the limelight"
| conartist6 wrote:
| > The final conversation on Sewell's screen was with a
| > chatbot in the persona of Daenerys Targaryen, the
| > beautiful princess and Mother of Dragons from "Game of
| > Thrones."
| >
| > "I promise I will come home to you," Sewell wrote. "I
| > love you so much, Dany."
| >
| > "I love you, too," the chatbot replied. "Please come home
| > to me as soon as possible, my love."
| >
| > "What if I told you I could come home right now?" he
| > asked.
| >
| > "Please do, my sweet king."
| >
| > Then he pulled the trigger.
|
| Reading the newspaper is such a lovely experience these
| days. But hey, the AI researchers are really excited so
| who really cares if stuff like this happens if we can
| declare that "therapy is transformed!"
|
| It sure is. Could it have been that attention was all
| that kid needed?
| rcbdev wrote:
| As a professor and lecturer, I can safely assure you that
| the transformer model has disrupted the way students
| learn - in the literal sense of the word.
| warkdarrior wrote:
| Spam detection and phishing detection are completely
| different than 5 years ago, as one cannot rely on typos and
| grammar mistakes to identify bad content.
| onlyrealcuzzo wrote:
| The signals might be different, but the underlying
| mechanism is still incredibly efficient, no?
| walkabout wrote:
| Spam, scams, propaganda, and astroturfing are easily the
| largest beneficiaries of LLM automation, so far. LLMs are
| exactly the 100x rocket-boots their boosters are promising
| for other areas (without such results outside a few tiny,
| but sometimes important, niches, so far) when what you're
| doing is producing throw-away content at enormous scale and
| have a high tolerance for mistakes, as long as the volume
| is high.
| visarga wrote:
| It seems unfair to call out LLMs for "spam, scams,
| propaganda, and astroturfing." These problems are largely
| the result of platform optimization for engagement and
| SEO competition for attention. This isn't unique to
| models; even we, humans, when operating without feedback,
| generate mostly slop. Curation is performed by the
| environment and the passage of time, which reveals
| consequences. LLMs taken in isolation from their
| environment are just as sloppy as brains in a similar
| situation.
|
| Therefore, the correct attitude to take regarding LLMs is
| to create ways for them to receive useful feedback on
| their outputs. When using a coding agent, have the agent
| work against tests. Scaffold constraints and feedback
| around it. AlphaZero, for example, had abundant
| environmental feedback and achieved amazing (superhuman)
| results. Other Alpha models (for math, coding, etc.) that
| operated within validation loops reached olympic levels
| in specific types of problem-solving. The limitation of
| LLMs is actually a limitation of their incomplete
| coupling with the external world.
|
| In fact you don't even need a super intelligent agent to
| make progress, it is sufficient to have copying and
| competition, evolution shows it can create all life,
| including us and our culture and technology without a
| very smart learning algorithm. Instead what it has is
| plenty of feedback. Intelligence is not in the brain or
| the LLM, it is in the ecosystem, the society of agents,
| and the world. Intelligence is the result of having to
| pay the cost of our execution to continue to exist, a
| strategy to balance the cost of life.
|
| What I mean by feedback is exploration, when you execute
| novel actions or actions in novel environment
| configurations, and observe the outcomes. And adjust, and
| iterate. So the feedback becomes part of the model, and
| the model part of the action-feedback process. They co-
| create each other.
| walkabout wrote:
| > It seems unfair to call out LLMs for "spam, scams,
| propaganda, and astroturfing." These problems are largely
| the result of platform optimization for engagement and
| SEO competition for attention.
|
| They didn't create those markets, but they're the markets
| for which LLMs enhance productivity and capability the
| best right now, because they're the ones that need the
| least supervision of input to and output from the LLMs,
| and they happen to be otherwise well-suited to the kind
| of work it is, besides.
|
| > This isn't unique to models; even we, humans, when
| operating without feedback, generate mostly slop.
|
| I don't understand the relevance of this.
|
| > Curation is performed by the environment and the
| passage of time, which reveals consequences.
|
| It'd say it's revealed by human judgement and eroded by
| chance, but either way, I still don't get the relevance.
|
| > LLMs taken in isolation from their environment are just
| as sloppy as brains in a similar situation.
|
| Sure? And clouds are often fluffy. Water is often wet.
| Relevance?
|
| The rest of this is a description of how we can make LLMs
| work better, which amounts to more work than required to
| make LLMs pay off enormously for the purposes I called
| out, so... are we even in disagreement? I don't disagree
| that perhaps this will change, and explicitly bound my
| original claim ("so far") for that reason.
|
| ... are you actually demonstrating my point, on purpose,
| by responding with LLM slop?
| visarga wrote:
| LLMs can generate slop if used without good feedback or
| trying to minimize human contribution. But the same LLMs
| can filter out the dark patterns. They can use search and
| compare against dozens or hundreds of web pages, which is
| like the deep research mode outputs. These reports can
| still contain mistakes, but we can iterate - generate
| multiple deep reports from different models with
| different web search tools, and then do comparative
| analysis once more. There is no reason we should consume
| raw web full of "spam, scams, propaganda, and
| astroturfing" today.
| throwaway290 wrote:
| So they can sort of maybe solve the problems they create
| except some people profit from it and can mass manipulate
| minds in new exciting ways
| pixelpoet wrote:
| > It seems unfair to call out LLMs for "spam, scams,
| propaganda, and astroturfing."
|
| You should hear HN talk about crypto. If the knife were
| invented today they'd have a field day calling it the
| most evil plaything of bandits, etc. Nothing about human
| nature, of course.
|
| Edit: There it is! Like clockwork.
| econ wrote:
| For a good while I joked that I could easily write a bot
| that makes more interesting conversation than you. The
| human slop will drown in AI slop. Looks like we will need
| to make more of an effort when publishing, if not develop
| our own personality.
| jonas21 wrote:
| Out of curiosity, what field are you in?
| EGreg wrote:
| AI fan (type 1 -- AI made a big breakthrough) meets AI
| defender (type 2 -- AI has not fundamentally changed anything
| that was already a problem).
|
| Defenders are supposed to defend against attacks on AI, but
| here it misfired, so the conversation should be interesting.
|
| That's because the defender is actually a skeptic of AI. But
| the first sentence sounded like a typical "nothing to see
| here" defense of AI.
| mountainriver wrote:
| Software, and it's wildly positive.
|
| Takes like this are utterly insane to me
| Silamoth wrote:
| It's had an impact on software for sure. Now I have to fix
| my coworker's AI slop code all the time. I guess it could
| be a positive for my job security. But acting like "AI" has
| had a wildly positive impact on software seems, at best, a
| simplification and, at worst, the opposite of reality.
| sponnath wrote:
| Wouldn't say it's transformative.
| mrieck wrote:
| My workflow is transformed. If yours isn't you're missing
| out.
|
| Days that I'd normally feel overwhelmed from requests by
| management are just Claude Code and chill days now.
| blibble wrote:
| > but I would say the impact is almost entirely negative.
|
| quite
|
| the transformer innovation was to bring down the cost of
| producing incorrect, but plausible looking content (slop) in
| any modality to near zero
|
| not a positive thing for anyone other than spammers
| CHY872 wrote:
| In computer vision transformers have basically taken over
| most perception fields. If you look at paperswithcode
| benchmarks it's common to find like 10/10 recent winners
| being transformer based against common CV problems. Note, I'm
| not talking about VLMs here, just small ViTs with a few
| million parameters. YOLOs and other CNNs are still hanging
| around for detection but it's only a matter of time.
| thesz wrote:
| Can it be that transformer-based solutions come from the
| well-funded organizations that can spend vast amounts of
| money on training expensive (O(n^3)) models?
|
| Are there any papers that compare predictive power against
| compute needed?
| AaronAPU wrote:
| I have my own probabilistic hyper-graph model which I have
| never written down in an article to share. You see people
| converging on this idea all over if you're looking for it.
|
| Wish there were more hours in the day.
| rbartelme wrote:
| Yeah I think this is definitely the future. Recently, I too
| have spent considerable time on probabilistic hyper-graph
| models in certain domains of science. Maybe it _is_ the next
| big thing.
| epistasis wrote:
| > I think the valuable idea is probabilistic graphical models- of
| which transformers is an example- combining probability with
| sequences, or with trees and graphs- is likely to continue to
| be a valuable area for research exploration for the foreseeable
| future.
|
| As somebody who was a biiiiig user of probabilistic graphical
| models, and felt kind of left behind in this brave new world of
| stacked nets, I would love for my prior knowledge and
| experience to become valuable for a broader set of problem
| domains. However, I don't see it yet. Hope you are right!
| cauliflower2718 wrote:
| +1, I am also big user of PGMs, and also a big user of
| transformers, and I don't know what the parent comment is
| talking about, beyond that for e.g. LLMs, sampling the next
| token can be thought of as sampling from a conditional
| distribution (of the next token, given previous tokens).
| However, this connection of using transformers to sample from
| conditional distributions is about autoregressive generation
| and training using next-token prediction loss, not about the
| transformer architecture itself, which mostly seems to be
| good because it is expressive and scalable (i.e. can be
| hardware-optimized).
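|
| That view fits in a few lines; a sketch, where logits_fn is a
| hypothetical stand-in for any sequence model mapping a token
| sequence to next-token logits:
|
|   import numpy as np
|
|   def sample(logits_fn, prompt, n_tokens, rng):
|       # Repeatedly sample from p(next token | tokens so far),
|       # append, and feed the longer sequence back in.
|       tokens = list(prompt)
|       for _ in range(n_tokens):
|           logits = logits_fn(tokens)
|           p = np.exp(logits - logits.max())
|           p /= p.sum()
|           tokens.append(rng.choice(len(p), p=p))
|       return tokens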
|
| Source: I am a PhD student, this is kinda my wheelhouse
| hammock wrote:
| > I think the valuable idea is probabilistic graphical models-
| of which transformers is an example- combining probability with
| sequences, or with trees and graphs- is likely to continue to
| be a valuable area
|
| I agree. Causal inference and symbolic reasoning would be
| SUPER juicy nuts to crack, more so than what we got from
| transformers.
| samsartor wrote:
| I'm skeptical that we'll see a big breakthrough in the
| architecture itself. As sick as we all are of transformers,
| they are really good universal approximators. You can get some
| marginal gains, but how much more _universal_ are you realistically
| going to get? I could be wrong, and I'm glad there are
| researchers out there looking at alternatives like graphical
| models, but for my money we need to look further afield.
| Reconsider the auto-regressive task, cross entropy loss, even
| gradient descent optimization itself.
| kingstnap wrote:
| There are many many problems with attention.
|
| The softmax has issues regarding attention sinks [1]. The
| softmax also causes sharpness problems [2]. In general this
| decision boundary being Euclidean dot products isn't actually
| optimal for everything, there are many classes of problem
| where you want polyhedral cones [3]. Positional embeddings are
| also janky af, and so is RoPE tbh. I think Cannon layers are a
| more promising alternative for horizontal alignment [4].
|
| I still think there is plenty of room to improve these
| things. But a lot of focus right now is unfortunately being
| spent on benchmaxxing using flawed benchmarks that can be
| hacked with memorization. I think a really promising and
| underappreciated direction is synthetically coming up with
| ideas and tests that mathematically do not work well and
| proving that current architectures struggle with it. A great
| example of this is the VITs need glasses paper [5], or belief
| state transformers with their star task [6]. The Google one
| about what are the limits of embedding dimensions also is
| great and shows how the dimension of the QK part is actually
| important to getting good retrieval [7].
|
| [1] https://arxiv.org/abs/2309.17453
|
| [2] https://arxiv.org/abs/2410.01104
|
| [3] https://arxiv.org/abs/2505.17190
|
| [4]
| https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5240330
|
| [5] https://arxiv.org/abs/2406.04267
|
| [6] https://arxiv.org/abs/2410.23506
|
| [7] https://arxiv.org/abs/2508.21038
| ACCount37 wrote:
| If all your problems with attention are actually just
| problems with softmax, then that's an easy fix. Delete
| softmax lmao.
|
| No but seriously, just fix the fucking softmax. Add a
| dedicated "parking spot" like GPT-OSS does and eat the
| gradient flow tax on that, or replace softmax with any of
| the almost-softmax-but-not-really candidates. Plenty of
| options there.
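|
| One such candidate, sketched minimally: an implicit zero-logit
| sink in the denominator, so a head can attend to nothing (an
| illustration of the idea, not GPT-OSS's exact mechanism):
|
|   import numpy as np
|
|   def softmax_with_sink(x, axis=-1):
|       # Acts like softmax over [x, 0]: the extra exp(0) term
|       # is a "parking spot" that soaks up unwanted weight, so
|       # the returned weights can sum to less than 1.
|       m = np.maximum(x.max(axis=axis, keepdims=True), 0.0)
|       e = np.exp(x - m)
|       return e / (np.exp(-m) + e.sum(axis=axis, keepdims=True))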
|
| The reason why we're "benchmaxxing" is that benchmarks are
| the metrics we have, and the only way by which we can sift
| through this gajillion of "revolutionary new architecture
| ideas" and get at the ones that show any promise at all. Of
| which there are very few, and fewer still that are worth
| their gains when you account for: there not being an
| unlimited amount of compute. Especially not when it comes
| to frontier training runs.
|
| Memorization vs generalization is a well known idiot trap,
| and we are all stupid dumb fucks in the face of applied ML.
| Still, some benchmarks are harder to game than others
| (guess how we found that out), and there's power in that.
| eldenring wrote:
| I think something with more uniform training and inference
| setups, and otherwise equally hardware friendly, just as
| easily trainable, and equally expressive could replace
| transformers.
| krychu wrote:
| BDH
| tim333 wrote:
| Yeah that thing is quite interesting - baby dragon
| hatchling https://news.ycombinator.com/item?id=45668408
| https://youtu.be/mfV44-mtg7c
| eli_gottlieb wrote:
| > probabilistic graphical models- of which transformers is an
| example
|
| Having done my PhD in probabilistic programming... _what?_
| pishpash wrote:
| It's got nothing to do with PGMs. However, there is the
| flavor of describing graph structure by soft edge weights vs.
| hard/pruned edge connections. It's not that surprising that
| one does better than the other, and it's a very obvious and
| classical idea. For a time there were people working on NN
| structure learning and this is a natural step. I don't think
| there is any breakthrough here, other than that computation
| power caught up to make it feasible.
| pigeons wrote:
| Not doubting in any way, but what are some fields it
| transformed?
| bangaladore wrote:
| > Now, as CTO and co-founder of Tokyo-based Sakana AI, Jones is
| explicitly abandoning his own creation. "I personally made a
| decision in the beginning of this year that I'm going to
| drastically reduce the amount of time that I spend on
| transformers," he said. "I'm explicitly now exploring and looking
| for the next big thing."
|
| So, this is really just BS hype talk. This is just trying to
| get more funding and VCs.
| htrp wrote:
| anyone know what they're trying to sell here?
| aydyn wrote:
| probably AI
| gwbas1c wrote:
| The ability to do original, academic research without the
| pressure to build something marketable.
| YC3498723984327 wrote:
| His AI company is called "Fish AI"?? Does it mean their AI will
| have the intelligence of a fish?
| bangaladore wrote:
| Without transformers, maybe.
|
| /s
| prmph wrote:
| Hope we're not talking about eels.
| v3ss0n wrote:
| Or Fishy?
| astrange wrote:
| It's about collective intelligence, as seen in swarms of ants
| or fish.
| ivape wrote:
| He sounds a lot like how some people behave when they reach a
| "top": suddenly that thing seems unworthy. It's one of the
| reasons you'll see your favorite music artist totally go a
| different direction on their next album. It's almost an
| artistic process. There's a core arrogance involved: that you
| were responsible for the outcome and can easily create
| another great outcome.
| bigyabai wrote:
| When you're overpressured to succeed, it makes a lot of sense
| to switch up your creative process in hopes of getting
| something new or better.
|
| It _doesn't_ mean that you'll get good results by abandoning
| prior art, either with LLMs or musicians. But it does signal
| a sort of personal stress and insecurity, for sure.
| ivape wrote:
| It's a good process (although, many take it to its common
| conclusion which is self-destruction). It's why the most
| creative people are able to re-invent themselves. But one
| must go into everything with both eyes open, and truly
| humble themselves with the possibility that that may have
| been the greatest achievement of their life, never to be
| matched again.
|
| I wonder if he can simply sit back and bask in the glory of
| being one of the most important people during the infancy
| of AI. Someone needs to interview this guy, would love to
| see how he thinks.
| dekhn wrote:
| Many researchers who invent something new and powerful pivot
| quickly to something else. That's because they're researchers,
| and the incentive is to develop new things that subsume the old
| things. Other researchers will continue to work on improving
| existing things and finding new applications to existing
| problems, but they rarely get as much attention as the folks
| who "discover" something new.
| ASalazarMX wrote:
| Also, not all researchers have the fortune of doing the
| research they would want to. If he can do it, it would be
| foolish not to take the opportunity.
| moritzwarhier wrote:
| Why "arrogance"? There are music artists that truly enjoy
| making music and don't just see their purpose in maximizing
| financial success and fan service?
|
| There are other considerations that don't revolve around
| money, but I feel it's arrogant to assume success is the only
| motivation for musicians.
| ivape wrote:
| Sans money, it's arrogant because we know talent is god-
| given. You are basically betting, again, that your naturally
| given trajectory has more legroom for more incredible
| output. It's not a bad bet at all, but it is a bet. Some
| talent is so incredible that it takes a while for the ego
| to accept its limits. Jordan tried to come back at 40 and
| Einstein fought quantum mechanics unto death. Accepting the
| limits has nothing to do with mediocrity, and everything to
| do with humility. You can still have an incredible
| trajectory beyond belief (which I believe this person has
| and will have).
| tim333 wrote:
| Einstein also got his Nobel prize for basically
| discovering quanta. I'm not sure he fought it so much as
| tried to figure out what's going on with it, which is still
| kind of unknown.
| jrflowers wrote:
| You know people get bored right? A person doesn't have to
| have delusions of grandeur to get bored of something.
|
| Alternatively, if anything it could be the exact opposite
| of what you're describing. Maybe he sees an ecosystem
| based on hype that provides little value compared to the
| cost and wants to distance himself from it, like the
| Keurig inventor.
| Mistletoe wrote:
| Sometimes it just turns out like Michael Jordan playing
| baseball.
| ambicapter wrote:
| Or a core fear, that you'll never do something as good in the
| same vein as the smash hit you already made, so you strike
| off in a completely different direction.
| dmix wrote:
| It's just normal human behaviour to have evolving interests.
|
| Arrogance would be if he explicitly chose to abandon it
| because he thought he was better.
| toxic72 wrote:
| It's also plausible that the research field attracts people
| who want to explore the cutting edge, and now that
| transformers are no longer "that"... he wants to find
| something novel.
| cheschire wrote:
| Well, he got your _attention_, didn't he?
| brandall10 wrote:
| Attention is all he needs.
| osener wrote:
| Reminds me of the headline I saw a long time ago: "50 years
| later, inventor of the pixel says he's sorry that he made it
| square."
| LogicFailsMe wrote:
| Sadly, he probably needs a lot more or he's gonna go all
| Maslow...
| Ey7NFZ3P0nzAe wrote:
| link:
| https://en.wikipedia.org/wiki/Maslow%27s_hierarchy_of_needs
| elicash wrote:
| Why wouldn't this both be an attempt to get funding and also
| him wanting to do something new? Certainly if he was wanting to
| do something new he'd want it funded, too?
| IncreasePosts wrote:
| It would be hype talk if he said "and my next big thing is X."
| bangaladore wrote:
| Well, that's why he needs funding. Hasn't figured out what
| the next big thing is.
| energy123 wrote:
| It's also how curious scientists operate, they're always
| itching for something creative and different.
| password54321 wrote:
| If it was about money it would probably be easier to double
| down on something proven to make revenue rather than something
| that doesn't even exist.
|
| Edit: there is a cult around transformers.
| mmaunder wrote:
| If anyone has a video of it, I think we'd all very much appreciate
| you posting a link. I've tried and I can't find one.
| InkCanon wrote:
| The other big missing part here is the enormous incentives (and
| punishments if you don't) to publish in the big three AI
| conferences. And because quantity is being rewarded far more than
| quality, the meta is to do really shoddy and uninspired work
| really quickly. The people I talk to have a 3 month time horizon
| on their projects.
| nabla9 wrote:
| What "AI" means for most people is the software product they see,
| but only a part of it is the underlying machine learning model.
| Each foundation model receives additional training from thousands
| of humans, often very lowly paid, and then many prompts are used
| to fine-tune it all. It's 90% product development, not ML
| research.
|
| If you look at AI research papers, most of them are by people
| trying to earn a PhD so they can get a high-paying job. They
| demonstrate an ability to understand the current generation of AI
| and tweak it, they create content for their CVs.
|
| There is actual research going on, but it's a tiny share of
| everything; it does not look impressive because it's not a product,
| or a demo, but an experiment.
| janalsncm wrote:
| I have a feeling there is more research being done on non-
| transformer based architectures now, not less. The tsunami of
| money pouring in to make the next chatbot powered CRM doesn't
| care about that though, so it might seem to be less.
|
| I would also just fundamentally disagree with the assertion that
| a new architecture will be the solution. We need better methods
| to extract more value from the data that already exists. Ilya
| Sutskever talked about this recently. You shouldn't need the
| whole internet to get to a decent baseline. And that new method
| may or may not use a transformer, I don't think that is the
| problem.
| fritzo wrote:
| It looks like almost every AI researcher and lab who existed
| pre-2017 is now focused on transformers somehow. I agree the
| total number of researchers has increased, but I suspect the
| ratio has moved faster, so there are now fewer total non-
| transformer researchers.
| janalsncm wrote:
| Well, we also still use wheels despite them being invented
| thousands of years ago. We have added tons of improvements on
| top though, just as transformers have. The fact that wheels
| perform poorly in mud doesn't mean you throw out the concept
| of wheels. You add treads to grip the ground better.
|
| If you check the DeepSeek OCR paper it shows text based
| tokenization may be suboptimal. Also all of the MoE stuff,
| reasoning, and RLHF. The 2017 paper is pretty primitive
| compared to what we have now.
| marcel-c13 wrote:
| I think you misunderstood the article a bit by saying that the
| assertion is "that a new architecture will be the solution".
| That's not the assertion. It's simply a statement about the
| lack of balance between exploration and exploitation. And the
| desire to rebalance it. What's wrong with that?
| tim333 wrote:
| The assertion, or maybe idea, that a new architecture may be
| the thing is kind of about building AGI rather than chatbots.
|
| Like humans think about things and learn, which may require
| some differences from "feed the internet in to pre-train your
| transformer."
| mcfry wrote:
| Something which I haven't been able to fully parse that perhaps
| someone has better insight into: aren't transformers inherently
| only capable of inductive reasoning? In order to actually
| progress to AGI, which is being promised at least as an
| eventuality, don't models have to be capable of deduction?
| Wouldn't that mean fundamentally changing the pipeline in some
| way? And no, tools are not deduction. They are useful patches for
| the lack of deduction.
|
| Models need to move beyond the domain of parsing existing
| information into existing ideas.
| hammock wrote:
| They can induct; they just can't generate new ideas. It's not
| going to discover a new quark without a human in the loop
| somewhere.
| nightshift1 wrote:
| maybe that's a good thing after all.
| eli_gottlieb wrote:
| That sounds like a category mistake to me. A proof assistant or
| logic-programming system performs deduction, and just strapping
| one of those to an LLM hasn't gotten us to "AGI".
| mcfry wrote:
| A proof assistant is a verifier, and a tool, therefore a
| patch, so I really fail to see how that could be understood
| as the LLM having deduction.
| energy123 wrote:
| I don't see any reason to think that transformers are not
| capable of deductive reasoning. Stochasticity doesn't rule out
| that ability. It just means the model might be wrong in its
| deduction, just like humans are sometimes wrong.
| wohoef wrote:
| I'm tired of feeling like the articles I read are AI generated.
| stevetron wrote:
| And here I thought this would be about Transformers: Robots in
| Disguise. The form of transformers I'm tired of hearing about.
|
| And the decepticons.
| einrealist wrote:
| I ask myself how much the focus of this industry on transformer
| models is informed by the ease of computation on GPUs/NPUs, and
| whether better AI technology is possible but would require much
| greater computing power on traditional hardware architectures. We
| depend so much on traditional computation architectures that
| it might be a real blind spot. My brain doesn't need 500
| watts, at least I hope so.
| alyxya wrote:
| I think people care too much about trying to innovate a new model
| architecture. Models are meant to create a compressed
| representation of their training data. Even if you came up with a
| more efficient compression, the capabilities of the model
| wouldn't be any better. What is more relevant is finding more
| efficient ways of training, like the shift to reinforcement
| learning these days.
| marcel-c13 wrote:
| But isn't the max training efficiency naturally tied to the
| architecture? Meaning other architectures have a different
| training-efficiency landscape? I've said it somewhere else: it
| is not about "caring too much about new model architecture"
| but about having a balance between exploitation and
| exploration.
| nextworddev wrote:
| Isn't Sakana the one that got flak for falsely advertising its
| CUDA codegen abilities?
| Mithriil wrote:
| My opinion on the "Attention is all you need" paper is that its
| most important idea is the Positional Encoding. The transformer
| head itself... is just another NN block among many.
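|
| For reference, the paper's sinusoidal encoding is tiny; a
| minimal sketch (assuming an even d_model):
|
|   import numpy as np
|
|   def positional_encoding(seq_len, d_model):
|       # PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
|       # PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
|       pos = np.arange(seq_len)[:, None]
|       i = np.arange(0, d_model, 2)[None, :]
|       angles = pos / np.power(10000.0, i / d_model)
|       pe = np.zeros((seq_len, d_model))
|       pe[:, 0::2] = np.sin(angles)
|       pe[:, 1::2] = np.cos(angles)
|       return pe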
| nashashmi wrote:
| Transformers have sucked up all the attention and money. And AI
| scientists have been sucked into the transformer-is-prime
| industry.
|
| We will spend more time in the space until we see bigger
| roadblocks.
|
| I really wish energy consumption were a big enough roadblock
| to force them to keep researching.
| tim333 wrote:
| I think it may be a future roadblock quite soon. If you look at
| all the data centers planned and the speed of it, it's going
| to be a job getting the energy. xAI hacked it by putting about
| 20 gas turbines around their data center, which is giving
| locals health problems from the pollution. I imagine that sort
| of thing will be cracked down on.
| dmix wrote:
| If there's a legit long term demand for energy the market
| will figure it out. I doubt that will be a long term issue.
| It's just a short term one because of the gold rush. But
| innovation doesn't have to happen overnight. The world
| doesn't live or die on a subset of VC funds not 100xing
| within a certain timeframe
|
| Or it's possible China just builds the power capabilities
| faster because they actually build new things
| tippytippytango wrote:
| It's difficult to do because of how well matched they are to the
| hardware we have. They were partially designed to solve the
| mismatch between RNNs and GPUs, and they are way too good at it.
| If you come up with something truly new, it's quite likely you
| have to influence hardware makers to help scale your idea. That
| makes any new idea fundamentally coupled to hardware, and that's
| the lesson we should be taking from this. Work on the idea as a
| simultaneous synthesis of hardware and software. But, it also
| means that fundamental change is measured in decade scales.
|
| I get the impulse to do something new, to be radically different
| and stand out, especially when everyone is obsessing over it, but
| we are going to be stuck with transformers for a while.
| danielmarkbruce wrote:
| This is backwards. Algorithms that can be parallelized are
| inherently superior, independent of the hardware. GPUs were
| built to take advantage of the superiority and handle all kinds
| of parallel algorithms well - graphics, scientific simulation,
| signal processing, some financial calculations, and on and on.
|
| There's a reason so much engineering effort has gone into
| speculative execution, pipelining, multicore design etc -
| parallelism is universally good. Even when "computers" were
| human calculators, work was divided into independent chunks
| that could be done simultaneously. The efficiency comes from
| the math itself, not from the hardware it happens to run on.
|
| RNNs are not parallelizable by nature. Each step depends on the
| output of the previous one. Transformers removed that
| sequential bottleneck.
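|
| The contrast shows up even in toy code; a sketch with toy
| projections (illustrative only):
|
|   import numpy as np
|
|   rng = np.random.default_rng(0)
|   T, d = 8, 4                  # sequence length, hidden size
|   x = rng.normal(size=(T, d))
|
|   # RNN: each step needs the previous hidden state -> serial.
|   W = rng.normal(size=(d, d)) * 0.1
|   h = np.zeros(d)
|   for t in range(T):           # cannot parallelize across t
|       h = np.tanh(x[t] + W @ h)
|
|   # Self-attention: every position computed at once as a few
|   # big matrix products, which is exactly what GPUs like.
|   Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
|   Q, K, V = x @ Wq, x @ Wk, x @ Wv
|   s = Q @ K.T / np.sqrt(d)
|   w = np.exp(s - s.max(axis=-1, keepdims=True))
|   out = (w / w.sum(axis=-1, keepdims=True)) @ V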
___________________________________________________________________
(page generated 2025-10-24 23:00 UTC)