[HN Gopher] Meta Superintelligence Labs' first paper is about RAG
       ___________________________________________________________________
        
       Meta Superintelligence Labs' first paper is about RAG
        
       https://arxiv.org/abs/2509.01092
        
       Author : skadamat
       Score  : 392 points
       Date   : 2025-10-11 23:16 UTC (23 hours ago)
        
 (HTM) web link (paddedinputs.substack.com)
 (TXT) w3m dump (paddedinputs.substack.com)
        
       | bigyabai wrote:
       | > Long awaited first paper from Meta Superintelligence Labs is
       | not a model layer innovation. What does this mean?
       | 
       | It means you're reading into it too much and need to be let down,
       | gently, from the hype train.
        
       | nine_k wrote:
        | A great post; it starts with this:
       | 
       |  _TL;DR
       | 
       | * MSI's first paper, REFRAG, is about a new way to do RAG.
       | 
       | * This slightly modified LLM converts most retrieved document
       | chunks into compact, LLM-aligned chunk embeddings that the LLM
       | can consume directly.
       | 
       | * A lightweight policy (trained with RL) decides which chunk
       | embeddings should be expanded back into full tokens under a
       | budget; the LLM runs normally on this mixed input.
       | 
       | * The net effect is far less KV cache and attention cost, much
       | faster first-byte latency and higher throughput, while preserving
       | perplexity and task accuracy in benchmarks._
       | 
       | I wish more long posts followed this model of a scientific paper.
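        | 
        | A minimal sketch of that mixed-input idea (not the paper's code;
        | the shapes, the projection, and the scoring heuristic are
        | illustrative assumptions):
        | 
        |     # Toy REFRAG-style input assembly with PyTorch.
        |     import torch
        |     import torch.nn as nn
        | 
        |     d_model, d_chunk, budget = 64, 32, 2
        | 
        |     # Six retrieved chunks, each pre-encoded once into a
        |     # compact embedding.
        |     chunk_embs = torch.randn(6, d_chunk)
        | 
        |     # Projection aligning chunk embeddings with the LLM's
        |     # token-embedding space.
        |     proj = nn.Linear(d_chunk, d_model)
        | 
        |     # Stand-in for the policy: score chunks, expand only the
        |     # top-`budget` ones back into full token embeddings.
        |     scores = torch.randn(6)  # a trained policy would emit these
        |     expand = set(scores.topk(budget).indices.tolist())
        | 
        |     def token_embs(chunk_id: int) -> torch.Tensor:
        |         # Placeholder for "re-tokenize the chunk and embed its
        |         # tokens" (say, 20 tokens).
        |         return torch.randn(20, d_model)
        | 
        |     pieces = []
        |     for i in range(6):
        |         if i in expand:
        |             pieces.append(token_embs(i))  # full tokens, costly
        |         else:
        |             # one cheap vector per unexpanded chunk
        |             pieces.append(proj(chunk_embs[i]).unsqueeze(0))
        | 
        |     # Mixed input fed to the LLM in place of a long sequence.
        |     mixed = torch.cat(pieces, dim=0)
        |     print(mixed.shape)  # far shorter than expanding everything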
        
       | jongjong wrote:
        | Interesting. All the developers I know who have tinkered with
        | embeddings and vector similarity scoring were instantly hooked.
        | The efficiency of computing the embeddings once and then reusing
        | them as many times as needed, comparing the vectors with a cheap
        | <30-line function, is extremely appealing. Not to mention the
        | indexing capabilities to make it work at scale.
       | 
       | IMO vector embedding is the most important innovation in
       | computing of the last decade. There's something magical about it.
       | These people deserve some kind of prize. The idea that you can
       | reduce almost any intricate concept including whole paragraphs to
       | a fixed-size vector which encapsulates its meaning and proximity
       | to other concepts across a large number of dimensions is pure
       | genius.
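        | 
        | The "cheap <30-line function" is essentially cosine similarity
        | over precomputed embeddings; a toy sketch (numpy only, the
        | vectors are random stand-ins):
        | 
        |     import numpy as np
        | 
        |     def cosine(a: np.ndarray, b: np.ndarray) -> float:
        |         # Compare vectors by angle, ignoring magnitude.
        |         return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
        | 
        |     # Embed documents once, then reuse for every query.
        |     docs = {f"doc{i}": np.random.rand(768) for i in range(1000)}
        |     query = np.random.rand(768)
        | 
        |     best = max(docs, key=lambda k: cosine(query, docs[k]))
        |     print(best)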
        
         | _jayhack_ wrote:
         | Vector embedding is not an invention of the last decade.
         | Featurization in ML goes back to the 60s - even deep learning-
         | based featurization is decades old at a minimum. Like
         | everything else in ML this became much more useful with data
         | and compute scale
        
           | senderista wrote:
           | Yup, when I was at MSFT 20 years ago they were already
           | productizing vector embedding of documents and queries (LSI).
        
             | jongjong wrote:
             | Interesting. Makes one think.
        
               | senderista wrote:
               | To be clear, LSA[1] is simply applied linear algebra, not
               | ML. I'm sure learned embeddings outperform the simple
               | SVD[2] used in LSA.
               | 
               | [1]
               | https://en.wikipedia.org/wiki/Latent_semantic_analysis
               | 
               | [2] https://en.wikipedia.org/wiki/Singular_value_decompos
               | ition
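                | 
                | A toy version of that pipeline (plain numpy, made-up
                | term-document counts), assuming the usual LSA recipe of
                | a truncated SVD:
                | 
                |     import numpy as np
                | 
                |     rng = np.random.default_rng(0)
                |     # terms x documents count matrix
                |     A = rng.integers(0, 3, size=(500, 40)).astype(float)
                | 
                |     U, S, Vt = np.linalg.svd(A, full_matrices=False)
                |     k = 10  # latent dimensions
                |     # one k-dim vector per document
                |     docs = (np.diag(S[:k]) @ Vt[:k]).T
                | 
                |     # Compare two documents in the latent space.
                |     d0, d1 = docs[0], docs[1]
                |     sim = d0 @ d1 / (np.linalg.norm(d0) * np.linalg.norm(d1))
                |     print(sim)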
        
         | ekidd wrote:
         | Vector embeddings are slightly interesting because they come
         | pre-trained with large amounts of data.
         | 
         | But similar ways to reduce huge numbers of dimensions to a much
         | smaller set of "interesting" dimensions have been known for a
         | long time.
         | 
          | Examples include principal component analysis/singular value
          | decomposition, which was the first big breakthrough in face
          | recognition (in the early 90s) and was also used in latent
          | semantic indexing, the Netflix prize, and a large pile of other
          | things. And the underlying technique was invented in 1901.
         | 
         | Dimensionality reduction is cool, and vector embedding is
         | definitely an interesting way to do it (at significant
         | computational cost).
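          | 
          | A rough sketch of that kind of dimensionality reduction
          | (eigenfaces-style PCA via SVD; the data here is random, a real
          | use would flatten face images into rows):
          | 
          |     import numpy as np
          | 
          |     X = np.random.rand(200, 4096)   # 200 samples, 4096 dims
          |     Xc = X - X.mean(axis=0)         # center each dimension
          |     U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
          |     components = Vt[:50]            # top 50 directions
          |     X_small = Xc @ components.T     # now 50-dimensional
          |     print(X_small.shape)            # (200, 50)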
        
         | liampulles wrote:
         | If you take the embedding for king, subtract the embedding for
         | male, add the embedding for female, and lookup the closest
         | embedding you get queen.
         | 
          | The fact that simple vector addition can encode the concept of
          | royalty and gender (among all sorts of others) is kind of
          | magic to me.
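          | 
          | That analogy, checked with off-the-shelf embeddings (this uses
          | gensim's downloader, which fetches a small pretrained GloVe
          | model on first run; results vary by model):
          | 
          |     import gensim.downloader as api
          | 
          |     kv = api.load("glove-wiki-gigaword-50")
          |     print(kv.most_similar(positive=["king", "woman"],
          |                           negative=["man"], topn=3))
          |     # The top neighbor (inputs excluded) is typically "queen".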
        
           | puttycat wrote:
           | This was actually shown to not really work in practice.
        
             | intelkishan wrote:
              | I have seen this particular example work. You don't get an
              | exact match, but the closest one is indeed "queen".
        
               | mirekrusin wrote:
                | Shouldn't this itself be a part of training?
                | 
                | Having a set of "king - male + female = queen"-like
                | relations, including more complex phrases, to align
                | embeddings.
                | 
                | It seems like a terse, lightweight, information-dense way
                | to capture the essence of knowledge.
        
               | godelski wrote:
               | Yes but it doesn't generalize very well. Even on simple
               | features like gender. If you go look at embeddings you'll
               | find that man and woman are neighbors, just as king and
               | queen are[0]. This is a better explanation for the result
               | as you're just taking very small steps in the latent
               | space.
               | 
                | Here, play around[1]:
                | 
                |     mother - parent + man = woman
                |     father - parent + woman = man
                |     father - parent + man = woman
                |     mother - parent + woman = man
                |     woman - human + man = girl
                | 
                | Or some that should be trivial:
                | 
                |     woman - man + man = girl
                |     man - man + man = woman
                |     woman - woman + woman = man
               | 
               | Working in very high dimensions is funky stuff. Embedding
               | high dimensions into low dimensions results in even
               | funkier stuff
               | 
               | [0] https://projector.tensorflow.org/
               | 
               | [1] https://www.cs.cmu.edu/~dst/WordEmbeddingDemo/
        
               | yellowcake0 wrote:
               | so addition is not associative?
        
               | godelski wrote:
               | I think you're missing the point
        
               | yellowcake0 wrote:
               | It's a pretty exotic type of addition that would lead to
               | the second set of examples, just trying to get an idea of
               | its nature.
        
         | CuriouslyC wrote:
          | Vector embeddings are so overhyped. They're decent as a
          | secondary signal, but they're expensive to compute and fragile.
          | BM25-based solutions are more robust and have WAY lower
          | latency, at the cost of some accuracy loss vs hybrid solutions.
          | You can get the majority of the lift of hybrid solutions with
          | ingest-time semantic expansion (reverse-HyDE-style input
          | annotation) plus sparse-embedding BM25, at a fraction of the
          | computational cost.
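          | 
          | For reference, a bare-bones BM25 pass of the kind described
          | above, using the rank_bm25 package (the corpus and query are
          | toy stand-ins):
          | 
          |     from rank_bm25 import BM25Okapi
          | 
          |     corpus = [
          |         "meta superintelligence labs paper on rag",
          |         "vector embeddings and cosine similarity",
          |         "bm25 is a sparse lexical ranking function",
          |     ]
          |     bm25 = BM25Okapi([doc.split() for doc in corpus])
          |     query = "sparse bm25 ranking".split()
          |     print(bm25.get_scores(query))   # higher = better match
          |     print(bm25.get_top_n(query, corpus, n=1))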
        
           | jongjong wrote:
            | But embeddings are much cheaper to compute than inference,
            | and you only have to compute them once for any piece of
            | content and can then reuse them multiple times.
        
         | calf wrote:
         | The idea of reducing language to mere bits, in general, sounds
         | like it would violate the Godel/Turing theorems about
         | computability.
        
       | mountainriver wrote:
        | This was a very obvious next step; I played around with
        | implementing something similar at one point.
        | 
        | In general, we need to make it simpler for LLMs to take in
        | different forms of embeddings, or at least build frameworks that
        | simplify it.
        
       | cm2012 wrote:
       | At first I thought the super intelligence wrote a novel
       | scientific paper
        
       | Imnimo wrote:
       | I'm curious whether this is work that was specifically begun
       | under the "superintelligence" umbrella, or if it's just that the
       | people who were working on it had been shifted to the
       | Superintelligence team by the time they wrote the paper. I would
       | guess the former?
        
         | lblume wrote:
         | Another commenter claims the latter:
         | https://news.ycombinator.com/item?id=45554169
        
       | naasking wrote:
       | > the core insight here is actually: if embeddings are generated
       | by layers within the LLM, it makes no sense to convert them back
       | to natural language, just for another LLM to compress those
       | tokens back to embeddings.
       | 
       | Doesn't this tie the two layers together in a way that they can't
       | evolve separately?
        
       | xvector wrote:
       | Working in big tech it's pretty wild to see how integral AI has
       | become to our work internally, vs the public perception of it.
       | People are NOT prepared.
        
         | fishmicrowaver wrote:
         | Not prepared for what? Seems like the rest of the world is
         | desperate to be shown the way to unlock something of value?
        
           | Workaccount2 wrote:
           | I think at this point it's software devs looking for the
           | value unlock.
           | 
           | Non-software devs are actually making functional programs for
           | themselves for the first time ever. The value is crazy.
        
             | ceejayoz wrote:
             | It's not the first time ever. People did the same with
             | Access and HyperCard in the 90s.
        
             | fishmicrowaver wrote:
              | Sure, but in the real world do you think businesses are
              | going to deploy piles of code generated this way into
              | production? No, non-technical people will continue to whip
              | up MS PowerApps. AI-generated code has no value to many
              | businesses.
        
               | xvector wrote:
               | The value of AI is not in generating code. That's just a
               | "nice-to-have."
               | 
               | The value of AI is in having a scalable, human-like
               | decision maker that you can plug into anything, anywhere.
               | This has unlocked countless use cases for my team, that
               | we could scarcely imagine a few years ago.
        
               | cbg0 wrote:
               | "Human-like decision maker" except it's just as if not
               | more unpredictable than a human, has no understanding of
               | what it's actually outputting or the impact of it, and it
               | isn't concerned with losing their job or facing legal
               | repercussions for their actions.
        
               | xvector wrote:
               | There are plenty of ways to manage those drawbacks, and a
               | mind-boggling number of use cases where it's "good
               | enough" already.
               | 
               | But it's not my job to convince you, my lived experience
               | working with the tech is enough to convince me, and
               | that's all I care about, to be honest. Everyone else will
               | get there sooner or later.
        
               | Workaccount2 wrote:
               | You don't need production level code to make your life
               | easier.
               | 
               | You're missing the forest for the trees. Most people
               | can't even make a block diagram, but they can explain
               | what they have and what they want to do with it.
        
         | terminalshort wrote:
         | 1. Hyperbolic statement about LLM capabilities with no concrete
         | examples
         | 
         | 2. Wild claim that the companies that sell LLMs are actually
         | downplaying their capabilities instead of hyping them
        
           | danielmarkbruce wrote:
           | Yup, he's totally lying. Not happening. Just carry on.
        
             | BoorishBears wrote:
             | Agreed, but why are they lying?
        
               | danielmarkbruce wrote:
               | That was sarcasm. He's not lying.
        
               | BoorishBears wrote:
               | Didn't read any sarcasm in what he said?
        
           | crorella wrote:
            | Personal experience here at a FAANG; there has been a
            | considerable increase in:
            | 
            | 1. Teams exploring how to leverage LLMs for coding.
            | 
            | 2. Teams/orgs that have already standardized some of the
            | processes for working with LLMs (MCP servers, standardized
            | creation of agents.md files, etc.).
            | 
            | 3. Teams actively using it for coding new features,
            | documenting code, increasing test coverage, code reviews,
            | etc.
            | 
            | Again, personal experience, but in my team ~40-50% of the
            | PRs are generated by Codex.
        
             | rhetocj23 wrote:
              | I'm sure the MBA folks love stats like that - there are
              | plenty that have infested big tech. I mean, Pichai is an
              | MBA + McKinsey alumnus.
              | 
              | Ready for the impending layoff, fella?
        
               | alex-nt wrote:
                | There are places that offer Copilot to any team that
                | wants it, and then behind the scenes have informed their
                | managers that if a team (1+ persons) adopts it, it will
                | have to shed 10%+ of its human capacity (lose a person,
                | move a person, fire a person) in the upcoming quarters
                | next year.
        
             | ruszki wrote:
             | "Teams exploring how to leverage [AI]s for [anything]" is
             | true for about a decade now in every large multinational
             | companies at every level. It's not new at all. AI is the
             | driving buzzword for a while now, even well before ChatGPT.
             | I've encountered many people who just wanted the stamp that
             | they use AI, no matter how, because my team was one of the
             | main entry point to achieve this at that specific company.
             | But before ChatGPT and co, you had to work for it a lot, so
             | most of them failed miserably, or immediately backtracked
             | when they realized this.
        
         | incompatible wrote:
         | I've heard of one study that said AI slows developers down,
         | even when they think it's helping.
         | 
         | https://www.infoworld.com/article/4061078/the-productivity-p...
        
           | xvector wrote:
           | AI may slow coding a bit but dramatically reduces cognitive
           | load.
           | 
           | The real value of AI isn't in helping coding. It's in having
           | a human-like intelligence to automate processes. I can't get
           | into details but my team is doing things that I couldn't
           | dream of three years ago.
        
             | qingcharles wrote:
             | It does dramatically reduce cognitive load. I think that
             | part is understated and lost to the headline of how it
             | writes two thousand lines of code in 30 seconds.
        
           | naasking wrote:
           | It is true sometimes, but other times it saves hours. We're
           | all still in the learning stage of how best to use these new
           | tools, and their capabilities are growing constantly.
        
         | gdulli wrote:
         | Not everyone has given in to the crutch.
        
           | xvector wrote:
           | That's why I still use an abacus.
        
             | gdulli wrote:
              | The abacus skills are safely obsolete; the skills of
              | general thinking and creativity must not become that. This
              | couldn't be more specious.
             | 
             | Meme thinking like this, repeating something you've heard
             | as reflex without regard to whether it fits a situation, is
             | the exact kind of unoriginality we can't allow to become
             | the default mode of thinking.
        
               | xvector wrote:
               | I am not the one being unoriginal here. You are thinking
               | that AI will obsolete critical thinking, so there's no
               | point developing with it.
               | 
               | However, in your moral crusade against using AI you are
               | missing the big picture. No one is making you code with
               | AI. But there are many things that you can only build if
               | you use AI as a component.
               | 
                | The ability to plug a human-like decision maker into
                | anything, anywhere, massively expands what we can build.
                | There are applications and use cases that you cannot even
                | conceptualize without having the ability to plug AI in.
                | This does not impact critical thinking whatsoever.
                | 
                | Be original. Put your engineer hat on and think about
                | what this new tool lets you build that you couldn't
                | before.
        
               | throw_this_one wrote:
                | I find that AI can make me more creative. I don't have to
                | waste mental energy on boilerplate or straightforward
                | stuff that would have me typing out some event-processing
                | loop, etc. I can extract and reuse components more easily
                | and focus on big-picture design. Or build more bespoke
                | admin tools that I wouldn't have wanted to waste time
                | building in JS before.
        
       | godelski wrote:
        | It's kinda funny: Meta has long had some of the best in the
        | field, but left them untapped. I really think if they just took a
        | step back, stopped being so metric-focused, and let their people
        | explore freely, then they'd be winning the AI race. But with this
        | new team, I feel like Meta mostly hired the people who are really
        | good at gaming the system. The people that care more about the
        | money than the research.
       | 
        | A bit of this is true at every major lab. There's tons of
        | untapped potential. But these organizations are very risk
        | averse. I mean, why not continue with the strategy that got us to
        | the point we're at in the first place? Labs used to hire
        | researchers and give them a lot of free rein. But those times
        | ended, and AI progress also slowed down. Maybe if you want to get
        | ahead you gotta stop thinking like everyone else.
       | 
        | Well, Meta... you can "hold me hostage" for a lot cheaper than
        | those guys. I'm sure this is true for hundreds of passionate ML
        | researchers. I'd take a huge pay cut to have autonomy and
        | resources. I know for a fact there are many working at Meta right
        | now who would do the same. So maybe if you're going to throw
        | money at the problem, diversify a bit and look back at what made
        | SV what it is today and what made AI take leaps forward.
        
         | bobxmax wrote:
         | I thought Alex Wang was a very curious choice. There are so
         | many foundational AI labs with interesting CEOs... I get that
         | Wang is remarkable in his own right, but he basically just
         | built MTurk and timed the bubble.
         | 
         | Doesn't really scream CEO of AGI to me.
        
           | thereitgoes456 wrote:
            | The reporting at the time said that he was Mark's 5th choice
            | or similar. It is fairly clear he would have preferred Ilya,
            | Murati, Mark Chen, and perhaps others, but they said no, and
            | Alex Wang was the first one to say yes.
        
             | tsunamifury wrote:
             | Why in the world would he want Murati? She has absolutely
             | no technical chops and was not functionally CTO of OpenAI.
        
               | shuckles wrote:
               | Because she was CTO of OpenAI.
        
               | CuriouslyC wrote:
               | Pretty ironic when access to trade secrets and people
               | skills is seen as more important in a technical field
               | than technical competence.
        
               | hn_throwaway_99 wrote:
               | > was not functionally CTO of OpenAI.
               | 
               | Why do you say that?
        
               | tsunamifury wrote:
                | Her history was entirely non-technical up until OpenAI.
        
               | hn_throwaway_99 wrote:
               | I think that's total BS, based on this article about her,
               | https://fortune.com/2025/10/03/mira-murati-career-ai-
               | thinkin...
               | 
               | 1. She has 2 BAs, one in math and one in mechanical
               | engineering.
               | 
               | 2. She was an "Advanced Concepts Engineer at Zodiac
               | Aerospace from 2012 to 2013".
               | 
               | 3. She was a product manager at Tesla on the Model X
               | 
               | 4. She was VP of product and engineering at Leap Motion.
               | 
                | Going from the fact that she wasn't a deep learning
                | researcher to "her history was entirely non-technical up
                | until OpenAI" is plain false. Plus, the job of CTO is
                | 90%+ people management, and she appears more than smart
                | enough and experienced enough to evaluate the technical
                | decisions of her team.
        
               | bobxmax wrote:
               | What technical chops does Sam Altman have?
        
               | tsunamifury wrote:
                | Altman is CEO, not CTO. Is Hacker News so ignorant now
                | that they don't understand the difference?
        
               | seanmcau wrote:
               | He started coding at age 8
        
             | arthurcolle wrote:
             | The self-supervised mesa-optimizer strikes again
        
           | godelski wrote:
            | A lot of people also don't know that many of the well-known
            | papers are just variations on small-time papers with a fuck
            | ton more compute thrown at the problem. Probably the
            | strongest feature that correlates with being a successful
            | researcher is compute. Many have taken this to claim that the
            | GPU poor can't contribute, but that ignores so many other
            | valid explanations... and we wonder why innovation has
            | slowed... It's also weird because if compute were all you
            | need, then there's a much cheaper option than what Zuck paid.
            | But he's paying for fame.
        
             | rhetocj23 wrote:
              | Frankly, this is the reason why I'm not convinced the
              | current movement of LLMs will yield anything close to the
              | dream.
              | 
              | The right people to deliver immense progress don't exist
              | right now.
        
               | godelski wrote:
                | > The right people to deliver immense progress don't
                | exist right now.
                | 
                | I wouldn't go this far. But I would say that we're not
                | giving them a good shot.
                | 
                | The people are always there, you just need to find them
                | and enable them.
                | 
                |     How do you manage genius? You don't.
                |     -- Mervin Kelly
        
             | BobbyTables2 wrote:
             | It's funny.
             | 
             | I learnt the hard way that communications/image/signal
             | processing research basically doesn't care about Computer
             | Architecture at the nuts and bolts level of compiler
             | optimization and implementation.
             | 
             | When they encounter a problem whose normal solution
             | requires excessive amounts of computation, they reduce
             | complexity algorithmically using mathematical techniques,
             | and quantify the effects.
             | 
              | They don't quibble about a 10x speedup; they reduce the
              | "big O()" complexity. They couldn't care less whether it
              | was implemented in interpreted Python or hand-optimized
              | assembly code.
             | 
             | On one hand, I know there's a lot of talent in AI today.
             | But throwing hardware at the problem is the dumbest way
             | forward.
             | 
              | WiFi adapters would be the size of wheeled luggage if we'd
              | had the same mentality during their development.
        
               | godelski wrote:
                | > They don't quibble about a 10x speedup; they reduce the
                | "big O()" complexity. They couldn't care less whether it
                | was implemented in interpreted Python or hand-optimized
                | assembly code.
               | 
               | I can at least say that's not all of us. But you're
               | probably right that this is dominating. I find it so
               | weird since everyone stresses empirics yet also seems to
               | not care about them. It took me my entire PhD to figure
               | out what was really going on. I've written too many long
               | winded rants on this site though
        
               | shwaj wrote:
                | At some point it becomes difficult to improve the O()
                | complexity. How do you do better than the O(n^2) of the
                | Transformer, with acceptable tradeoffs? Many big brains
                | in all the big labs are very aware of the importance of
                | algorithmic advances. There is no low-hanging fruit, but
                | they're doing their best.
                | 
                | Then, _in parallel to that_, they look at compiler
                | optimizations and other higher-level algorithmic
                | innovations such as Flash Attention (a classic at this
                | point), which had a drastic impact on performance due to
                | cache awareness, without changing the O() complexity.
        
               | tomrod wrote:
               | Sometimes it's the theory, sometimes it's the
               | engineering, and often it's both.
        
               | helix278 wrote:
               | You make it sound like reducing the big O complexity is a
               | dumb thing to do in research, but this is really the only
               | way to make lasting progress in computer science.
               | Computer architectures become obsolete as hardware
               | changes, but any theoretical advances in the problem
               | space will remain true forever.
        
             | crystal_revenge wrote:
              | > A lot of people also don't know that many of the
              | well-known papers are just variations on small-time papers
              | with a fuck ton more compute thrown at the problem.
              | 
              | I worked for a small, research-heavy AI startup for a bit,
              | and it was heartbreaking how many people I would interact
              | with in that general space who had research they worked
              | hard and passionately on, only to have been beaten to the
              | punch by a famous lab that could rush the paper out quicker
              | and at a larger scale.
              | 
              | There were also more than a few instances of high-
              | probability plagiarism. My team had a paper that had
              | existed for years basically rewritten without citation by
              | a major lab. After some complaining they added a footnote.
              | But it doesn't really matter, because no big lab is going
              | to have to defend itself publicly against some small
              | startup, and their job at the big labs is to churn out
              | papers.
        
               | godelski wrote:
                | > only to have been beaten to the punch by a famous lab
                | that could rush the paper out quicker and at a larger
                | scale.
                | 
                | This added at least a year to my PhD... Reviewers kept
                | rejecting my work with comments like "add more datasets."
                | That's nice and all, but on the few datasets I did use, I
                | beat out top labs while using a tenth of the compute. I'd
                | love to add more datasets, but even though I only used a
                | tenth of the compute, I blew my entire compute budget.
                | Guess state-of-the-art results, a smaller model, higher
                | throughput, and 3rd-party validation were not enough (we
                | used an unpopular model architecture).
                | 
                | I always felt like my work was being evaluated as an
                | engineering product, not as research.
                | 
                | > a few instances of high-probability plagiarism
               | 
                | I was reviewing a paper once and I actually couldn't tell
                | whether the researchers knew they had ripped me off or
                | not. They compared against my method, citing it and
                | showing figures using it, but then dropped the
                | performance metrics from the table. So I asked. I got
                | them in return and saw that there was no difference... So
                | I dove in and worked out that they were just doing 99% my
                | method with additional complexity (computational
                | overhead). I was pretty upset.
               | 
                | I was also upset because otherwise the paper was good.
                | The results were nice and they even tested our work in a
                | domain we hadn't. Had they just been upfront, I would
                | have gladly accepted the work. Though I'm pretty
                | confident the other reviewers wouldn't have, due to "lack
                | of novelty."
               | 
                | It's a really weird system that we've constructed. We're
                | our own worst enemies.
                | 
                | > their job at the big labs is to churn out papers.
               | 
               | I'd modify this slightly. Their job is to get citations.
               | Churning out papers really helps with that, but so does
               | all the tweeting and evangelizing of their works. It's an
               | unfortunate truth that as researchers we have to sell our
               | works, and not just by the scientific merit that they
               | hold. People have to read them after all. But we should
               | also note that it is easier for some groups to get
               | noticed more than others. Prestige doesn't make a paper
               | good, but it sure acts as a multiplying factor for all
               | the metrics we use for determining if it is good.
        
           | tsunamifury wrote:
            | Alexandr Wang is not interesting, and a few steps short of a
            | fraud that Mark had to bail out because he was so
            | co-invested.
            | 
            | Shareholders would be livid if they knew a single thing about
            | what was going on.
        
             | typpilol wrote:
             | Tell me more
        
               | tsunamifury wrote:
               | Scale promised cutting-edge data pipelines and model-
               | training infra but mostly sold outsourced labeling with a
               | tech veneer. Great margins, weak moat -- classic Valley
               | overclaim, not outright fraud.
        
         | didip wrote:
          | I always wonder about that. Those $100M mathematicians... how
          | can they have room to think under Meta's crushing IMPACT
          | pressure?
        
           | trhway wrote:
            | For just 10% of that money, a $100M mathematician can hire
            | ten $1M mathematicians, or a whole math department at some
            | European university, to do the work and the thinking for
            | them, and thus beat any pressure while resting and vesting on
            | the remaining 90%.
        
             | lblume wrote:
             | Sure, but they weren't hired as managers, right?
        
               | vasco wrote:
               | Ok ok, another $1m/year to hire a manager.
        
         | rhetocj23 wrote:
         | "Maybe if you want to get ahead you gotta stop thinking like
         | everyone else"
         | 
         | Well for starters you need a leader who can rally the troops
         | who "think(s) different" - something like a S Jobs.
         | 
         | That person doesnt seem to exist in the industry right now.
        
         | ProofHouse wrote:
         | winning the AI race? Meta? Oh that was a good one. Zuck is a
         | follower not a leader. It is in his DNA
        
         | hamasho wrote:
         | My theory is that as more people compete, the top candidates
         | become those who are best at gaming the system rather than
         | actually being the best. Someone has probably studied this. My
         | only evidence is job applications for GAFAM and Tinder tho.
        
           | godelski wrote:
           | > Someone has probably studied this
           | 
           | There's even a name for it
           | 
           | https://en.wikipedia.org/wiki/Goodhart%27s_law
        
             | julienreszka wrote:
             | It's a false law tho. Collapses under scrutiny
        
               | godelski wrote:
               | Sorry, remind me; how many cobras are there in India?
        
               | bandrami wrote:
               | The Zoological Survey of India would like to know but
               | hasn't figured out a good way to do a full census. If you
               | have any ideas they would love to hear them.
               | 
               | Naja naja has Least Concern conservation status, so there
               | isn't much funding in doing a full count, but there are
               | concerns as encroachment both reduces their livable
               | habitat and puts them into more frequent contact with
               | humans and livestock.
        
               | oblio wrote:
               | The comment was a joke.
               | 
               | https://en.wikipedia.org/wiki/Perverse_incentive
        
               | epwr wrote:
               | Could you elaborate or link something here? I think about
               | this pretty frequently, so would love to read something!
        
               | vasco wrote:
               | Metric: time to run 100m
               | 
               | Context: track athlete
               | 
               | Does it cease to be a good metric? No. After this you can
               | likely come up with many examples of target metrics which
               | never turn bad.
        
               | godelski wrote:
               | So what is your argument, that it doesn't apply
               | everywhere therefore it applies nowhere?
               | 
                | You're misunderstanding the root cause. Your example
                | works as the metric is well aligned. I'm sure you can
               | also think of many examples where the metric is not well
               | aligned and maximizing it becomes harmful. How do you
               | think we ended up with clickbait titles? Why was everyone
               | so focused on clicks? Let's think about engagement
               | metrics. Is that what we really want to measure? Do we
               | have no preference over users being happy vs users being
               | angry or sad? Or are those things much harder to measure,
               | if not impossible to, and thus we focus on our proxies
               | instead? So what happens when someone doesn't realize it
               | is a proxy and becomes hyper fixated on it? What happens
               | if someone does realize it is a proxy but is rewarded via
               | the metric so they don't really care?
               | 
               | Your example works in the simple case, but a lot of
               | things look trivial when you only approach them from a
               | first order approximation. You left out all the hard
               | stuff. It's kinda like...
               | 
               | Edit: Looks like some people are bringing up metric
               | limits that I couldn't come up with. Thanks!
        
               | vasco wrote:
               | > So what is your argument, that it doesn't apply
               | everywhere therefore it applies nowhere?
               | 
               | I never said that. Someone said the law collapses,
               | someone asked for a link, I gave an example to prove it
               | does break down in some cases at least, but many cases
               | once you think more about it. I never said all cases.
               | 
               | If it works sometimes and not others, it's not a law.
               | It's just an observation of something that can happen or
               | not.
        
               | godelski wrote:
               | > I never said all cases.
               | 
                | You're right. My bad. I inferred that through the context
                | of the conversation.
                | 
                | > If it works sometimes and not others, it's not a law.
                | 
                | I think you are misreading, and that is likely what led
                | to the aforementioned misunderstanding. You're right that
                | it isn't a _scientific_ law, but the term "law" gets
                | thrown around a lot in a more colloquial manner.
               | Unfortunately words are overloaded and have multiple
               | meanings. We do the same thing to "hypothesis",
               | "paradox", and lots of other things. I hope this
               | clarifies the context. (even many of the physics laws
               | aren't as strong as you might think)
               | 
               | But there are many "laws" used in the same form. They're
               | _eponymous_ laws[0], not _scientific_ ones. Read
               | "adage". You'll also find that word used in the opening
               | sentence on the Wiki article I linked as well as most (if
               | not all) of them in [0]
               | 
               | [0] https://en.wikipedia.org/wiki/List_of_eponymous_laws
        
               | exe34 wrote:
               | it doesn't break down - see comments about rules above.
               | it was the perfect example to prove yourself wrong.
        
               | vasco wrote:
               | I disagree with all of those examples, they are
               | misunderstanding what it means for the metric to break
               | down in the context of the law, but alas. "If you run a
               | different race" lol.
        
               | exe34 wrote:
               | could you explain what you think the difference is?
               | 
               | a metric is chosen, people start to game the system by
               | doing things that make the metric improve but the
               | original intent is lost. increasingly specific rules/laws
               | have to be made up to make the metric appear to work, but
               | it becomes a lost cause as more and more creative ways
               | are found to work around the rules.
        
               | vasco wrote:
               | Exactly, that's the definition. It doesn't apply to
               | timing a 100m race. There's many such situations that are
               | simple enough and with perfect information available
               | where this doesn't break down and a metric is just a
               | metric and it works great.
               | 
               | Which is not to the detriment of the observation being
               | true in other contexts, all I did was provide a counter
               | example. But the example requires the metric AND the
               | context.
        
               | exe34 wrote:
               | it wasn't a very good counter example.
        
               | godelski wrote:
               | Do you know certain shoes are banned in running
               | competitions?
               | 
               | There's a really fine line here. We make shoes to help us
               | run faster and keep our feet safe, right? Those two are
               | directly related, as we can't run very fast if our feet
               | are injured. But how far can this be taken? You can make
               | shoes that dramatically reduce the impact when the foot
               | strikes the ground, which reduces stress on the foot and
               | legs. But that might take away running energy, which adds
               | stresses and strains to the muscles and ligaments. So you
               | modify your material to put energy back into the person's
               | motion. This all makes running safer. But it also makes
               | the runner faster.
               | 
               | Does that example hack the metric? You might say yes but
               | I'm certain someone will disagree with you. There's
               | always things like this where they get hairy when you get
               | down to the details. Context isn't perfectly defined and
               | things aren't trivial to understand. Hell, that's why we
               | use pedantic programming languages in the first place,
               | because we're dealing with machines that have to operate
               | void of context[0]. Even dealing with humans is hard
               | because there's multiple ways to interpret anything.
               | Natural language isn't pedantic enough for perfect
               | interpretation.
               | 
               | [0] https://www.youtube.com/watch?v=FN2RM-CHkuI
        
               | godelski wrote:
               | > in the context of the law
               | 
               | That's the key part. The metric has context, right?
               | 
               | And that's where Goodhart's "Law" comes in. A metric has
               | no meaning without context. This is why metrics need to
               | be interpreted. They need to be evaluated in context.
               | Sometimes this context is explicit but other times it is
               | implicit. Often people will hack the metric as the
               | implicit rule is not explicit and well that's usually a
               | quick way to make those rules explicit.
               | 
               | Here's another way to think about it: no rule can be so
               | perfectly written that it has no exceptions.
        
               | MR_Bulldops wrote:
               | Do you have an example that doesn't involve an objective
               | metric? Of course objective metrics won't turn bad.
               | They're more measurements than metrics, really.
        
               | godelski wrote:
               | > an objective metric
               | 
               | I'd like to push back on this a little, because I think
               | it's important to understanding why Goodhart's Law shows
               | up so frequently.
               | 
                | _There are no objective metrics_, only proxies.
                | 
                | You can't measure a meter directly; you have to use a
                | proxy like a tape measure. Similarly, you can't measure
                | time directly; you have to use a stopwatch. In a normal
                | conversation I wouldn't be nitpicking like this, because
                | those proxies are so well aligned with our intended
                | measures and the lack of precision is generally
                | inconsequential. But once you start measuring anything
                | with precision, you cannot ignore the fact that you're
                | limited to proxies.
               | 
                | The situation when we get more abstract in our goals is
                | not too dissimilar. Our measuring tools are just really
                | imprecise. So we have to take great care to understand
                | the meaning of our metrics and their limits, just like we
                | would if we were doing high-precision measurements of
                | something more "mundane" like distance.
               | 
               | I think this is something most people don't have to
               | contend with because frankly, very few people do high
               | precision work. And unfortunately we often use algorithms
               | as black boxes. But the more complex a subject is the
               | more important an expert is. It looks like they are just
               | throwing data into a black box and reading the answer,
               | but that's just a naive interpretation.
        
               | AnthonyMouse wrote:
               | This isn't what Goodhart's law is about.
               | 
               | Sure, if you get a ruler from the store it might be off
               | by a fraction of a percent in a way that usually doesn't
               | matter and occasionally does, but even if you could
                | measure distance _exactly_ that doesn't get you out of
               | it.
               | 
               | Because what Goodhart's law is really about is
               | bureaucratic cleavage. People care about lots of
               | diverging and overlapping things, but bureaucratic rules
               | don't. As soon as you make something a target, you've
               | created the incentive to make that number go up at the
               | expense of all the other things you're not targeting but
               | still care about.
               | 
               | You can take something which is clearly what you actually
               | want. Suppose you're commissioning a spaceship to take
               | you to Alpha Centauri and then it's important that it go
               | fast because otherwise it'll take too long. We don't even
               | need to get into exactly how fast it needs to go or how
               | to measure a meter or anything like that, we can just say
               | that going fast is a target. And it's a _valid_ target;
               | it actually needs to do that.
               | 
               | Which leaves you already in trouble. If your organization
               | solicits bids for the spaceship and that's the only
               | target, you better not accept one before you notice that
               | you also need things like "has the ability to carry
               | occupants" and "doesn't kill the occupants" and "doesn't
               | cost 999 trillion dollars" or else those are all on the
               | chopping block in the interest of going fast.
               | 
               | So you add those things as targets too and then people
               | come up with new and fascinating ways to meet them by
               | sacrificing other things you wanted but didn't require.
               | 
               | What's really happening here is that if you set targets
               | and then require someone else to meet them, they will
               | meet the targets in ways that you will not like. It's the
               | principal-agent problem. The only real way out of it is
               | for principals to be their own agents, which is exactly
               | the thing a bureaucracy isn't.
        
               | godelski wrote:
               | I agree with you, in a way.
               | 
               | I've just taken another step to understand the philosophy
               | of those bureaucrats. Clearly they have some logic,
               | right? So we have to understand why they think they can
               | organize and regulate from the spreadsheet. Ultimately it
               | comes down to a belief that the measurements (or numbers)
               | are "good enough" and that they have a good understanding
               | of how to interpret them. Which with many bureaucracies
               | that is the belief that no interpretation is needed. But
               | we also see that behavior with armchair experts who try
               | to use data to evidence their conclusion rather than
               | interpret data and conclude from that interpretation.
               | 
                | Goodhart had focused on the incentive structure of the
                | rule, but that does not tell us _how_ this all happens
                | and _why_ the rule is so persistent. I think you're
                | absolutely right that there is a problem with agents, and
                | it's no surprise that when many introduce the concept of
                | "reward hacking," they reference Goodhart's Law. Yes,
                | humans _can_ typically see beyond the metric and infer
                | the intended outcome, but they ignore this because they
                | don't care and so fixate on the measurement, because that
                | gives them the reward. Bureaucracies no doubt amplify
                | this behavior, as they are well known to be
                | soul-crushing.
               | 
                | But we should also be asking ourselves if the same effect
                | can apply in settings where we have the best of
                | intentions and all the agents are acting in good faith
                | and trying to interpret the measure instead of just game
                | it. The answer is yes. Idk, call it Godelski's Corollary
                | if you want (I wouldn't), but this relates to Goodhart's
                | Law at a fundamental level. You can still have metric
                | hacking even when agents aren't aware of it or even
                | intending to do so. Bureaucracy is not required.
        
               | AnthonyMouse wrote:
               | In a sense you can do the same thing to yourself. If you
               | self-impose a target and try to meet it while ignoring a
               | lot of things that you're not measuring even though
               | they're still important, you can unintentionally
               | sacrifice those things. But there's a difference.
               | 
               | In that case you have to not notice it, which sets a much
               | lower cap on how messed up things can get. If things are
               | really on fire then you notice right away and you have
               | the agency to do something different.
               | 
               | Whereas if the target is imposed by a far-off hierarchy
               | or regulatory bureaucracy, the people on the ground who
               | notice that things are going wrong have no authority to
               | change it, which means they carry on going wrong.
               | 
               | Or put it this way: The degree to which it's a problem is
               | proportional to the size of the bureaucracy. You can
               | cause some trouble for yourself if you're not paying
               | attention but you're still directly exposed to "hear
               | reason or she'll make you feel her". If it's just you and
               | your boss who you talk to every day, that's not as good
               | but it's still not that bad. But if the people imposing
               | the target aren't even in the same state, you can be
               | filling the morgue with bodies and still not have them
               | notice.
        
               | ccortes wrote:
               | > Does it cease to be a good metric?
               | 
               | Yes if you run anything other than the 100m
        
               | AnthonyMouse wrote:
               | > Metric: time to run 100m
               | 
               | > Context: track athlete
               | 
               | > Does it cease to be a good metric? No.
               | 
               | What do you mean? People start doping or showing up with
               | creatively designed shoes and you need to layer on a
               | complicated system to decide if that's cheating, but some
               | of the methods are harder to detect and then some people
               | cheat anyway, or you ban steroids or stimulants but allow
               | them if they're by prescription to treat an unrelated
               | medical condition and then people start getting
               | prescriptions under false pretexts in order to get better
               | times. Or worse, someone notices that the competition
               | can't set a good time with a broken leg.
        
               | noosphr wrote:
               | If it were a good metric there wouldn't be a few phone
               | books worth of regulations on what you can do before and
               | during running 100 meters. From banning rocket shoes, to
               | steroids, to robot legs the 100 meter run is a perfect
               | example of a terrible metric both intrinsically as a
               | measure of running speed and extrinsically as a measure
               | of fitness.
        
               | NBJack wrote:
                | If I hadn't seen it in action countless times, I would
                | believe you. Changelists, line counts, documents made,
                | collaborator counts, teams led, reference counts in
                | peer-reviewed journals... the list goes on.
               | 
               | You are welcome to prove me wrong though. You might even
               | restore some faith in humanity, too!
        
             | ivanbelenky wrote:
              | Thanks for sharing. I did not know this law existed and had
              | a name. I know nothing about nothing, but it appears that
              | the interpretation of metrics for policies implicitly
              | assumes the "shape" of the domain. E.g. in RL for games we
              | see a bunch of outlier behavior from policies just gaming
              | the signal.
              | 
              | There seem to be two types:
              | 
              | - Specification failure: the signal is bad-ish and yields
              | completely broken behavior --> locally optimal points are
              | reached by policies that phenomenologically do not
              | represent what was expected/desired --> signaling an
              | improvable reward signal definition.
              | 
              | - Domain constraint failure: the signal is still good and
              | the optimization is "legitimate", but you are prompted with
              | the question "do I need to constrain my domain of
              | solutions?"
              | 
              |   - Finding a bug that reduces time to completion of a game
              |   in a speedrun setting would become a new acceptable
              |   baseline, because there are no rules against finishing
              |   the game earlier.
              | 
              |   - Shooting amphetamines before a 100m run would probably
              |   minimize time, but other factors will make people
              |   consider disallowing such practices.
        
               | Eisenstein wrote:
               | I view Goodhart's law more as a lesson for why we can
               | never achieve a goal by offering specific incentives if
               | we are measuring success by the outcome of the incentives
               | and not by the achievement of the goal.
               | 
               | This is of course inevitable if the goal cannot be
               | directly measured but is composed of many constantly
               | moving variables such as education or public health.
               | 
               | This doesn't mean we shouldn't bother having such goals,
               | it just means we have to be diligent at pivoting the
               | incentives when it becomes evident that secondary effects
               | are being produced at the expense of the desired effect.
        
               | godelski wrote:
               | > This is of course inevitable if the goal cannot be
               | directly measured
               | 
               | It's worth noting that _no goal can be directly measured_
               | [0].
               | 
               | I agree with you, this doesn't mean we shouldn't bother
               | with goals. They are fantastic tools. But they are
                | guides. The better aligned our proxy measurement is with
                | the intended measurement, the less we have to interpret
                | our results. We have to think less, spending
               | less energy. But even poorly defined goals can be
               | helpful, as they get refined as we progress in them.
               | We've all done this since we were kids and we do this to
               | this day. All long term goals are updated as we progress
               | in them. It's not like we just state a goal and then hop
               | on the railroad to success.
               | 
               | It's like writing tests for code. Tests don't prove that
               | your code is bug free (can't write a test for a bug you
               | don't know about: unknown unknown). But tests are still
               | helpful because they help evidence the code is bug free
               | and constrain the domain in which bugs can live. It's
               | also why TDD is naive, because tests aren't proof and you
               | have to continue to think beyond the tests.
               | 
               | [0] https://news.ycombinator.com/item?id=45555551
        
           | bjornsing wrote:
           | Yeah I think this is a general principle. Just look at the
           | quality of US presidents over time, or generations of top
           | physicists. I guess it's just a numbers game: the number of
           | genuinely interested people is relatively constant while the
           | number of gamers grows with the compensation and perceived
           | status of the activity. So when compensation and perceived
           | status skyrockets the ratio between those numbers changes
           | drastically.
        
             | godelski wrote:
              | I think the number of genuinely interested people goes up.
              | Maybe the percentage stays the same? But honestly, I think
              | we kill passion for a lot of people. To be cliche, how many
              | people _lose_ the curiosity of a child? I think the cliche
              | exists for a reason. It seems the capacity is in all of us
              | and once even existed.
        
           | crystal_revenge wrote:
           | I've spent most of my career working, chatting and hanging
           | out with what might be best described as "passionate weirdos"
           | in various quantitative areas of research. I say "weirdos"
           | because they're people driven by an obsession with a topic,
           | but don't always fit the mold by having the ideal combination
           | of background, credentials and personality to land them on a
           | big tech company research team.
           | 
            | The other day I was spending some time with a researcher from
            | DeepMind and I was surprised to find that while they were
            | sharp and curious to an extent, nearly every ounce of energy
            | they expended on research was _strategic_. They didn't write
            | about research they were fascinated by; they wrote and
            | researched on topics they strategically felt had the highest
            | probability of getting into a major conference in a short
            | period of time to earn them a promotion. While I was a bit
            | disappointed, I certainly didn't judge them because they are
            | _just playing the game_. This person probably earns more than
            | many rooms of smart, passionate people I've been in, and
            | that money isn't for smarts alone; it's for appealing to the
            | interests of people with the money.
           | 
           | You can see this very clearly by comparing the work being
           | done in the LLM space to that being done in the Image/Video
           | diffusion model space. There's much more money in LLMs right
           | now, and the field is flooded with papers on any random
           | topic. If you dive in, most of them are not reproducible or
           | make very questionable conclusions based on the data they
           | present, but that's not of very much concern so long as the
           | paper can be added to a CV.
           | 
            | In the stable diffusion world it's mostly people driven by
            | personal interest (usually very _non-commercial_ personal
            | interests) and you see _tons_ of innovation in that field but
            | almost no papers. In fact, if you really want to understand a
            | lot of the most novel work coming out of the image generation
            | world you often need to dig into PRs made by anonymous
            | users with anime-themed profile pics.
           | 
           | The bummer of course is that there are very hard limits on
           | what any researcher can do with a home GPU training setup. It
           | does lead to creative solutions to problems, but I can't help
           | but wonder what the world would look like if more of these
           | people had even a fraction of the resources available
           | exclusively to people playing the game.
        
             | smokel wrote:
              | _> I certainly didn't judge them because they are just
              | playing the game._
             | 
             | Please _do_ judge them for being parasitical. They might
             | seem successful by certain measures, like the amount of
             | money they make, but I for one simply dislike it when
             | people only think about themselves.
             | 
             | As a society, we should be more cautious about narcissism
             | and similar behaviors. Also, in the long run, this kind of
             | behaviour makes them an annoying person at parties.
        
               | what-the-grump wrote:
               | But this is in itself selfish right?
               | 
               | You dislike them because they don't benefit you
               | indirectly by benefiting society at large.
               | 
                | The incentive structure is wrong; incentivizing things
                | that benefit society would be the solution, not judging
                | those who exist in the current system while pretending
                | altruism is somehow not part of the same game.
        
               | smokel wrote:
               | I agree that the system itself is dysfunctional, and I
               | understand the argument that individuals are shaped or
               | even constrained by it. However, in this case, we are
               | talking about people who are both exceptionally
               | intelligent and materially secure. I think it's
               | reasonable to expect such individuals to feel some moral
               | responsibility to use their abilities for broader good.
               | 
               | As for whether that expectation is "selfish" on my part,
               | I think that question has been debated for centuries in
               | ethics, and I'm quite comfortable landing on the side
               | that says not all disapproval is self-interest. In my own
               | case, I'm not benefiting much either :)
        
               | Eisenstein wrote:
               | There is a difference between being selfish in the sense
               | that you want others to contribute back to the society
               | that we are all part of, and being selfish in the sense
               | that you want to compete for exclusive rewards.
               | 
               | You can call this difference whatever you want, don't
               | pretend that they are morally or effectively equivalent.
        
               | kakacik wrote:
               | Selfish for the long term future and prosperity of
               | mankind? Thats some good selfishness all right.
        
               | bradleyjg wrote:
               | _but I for one simply dislike it when people only think
               | about themselves_
               | 
                | The key word there is only. Nothing in the post you
                | replied to suggested only. You have one vignette about
                | one facet of this guy's life.
               | 
               | I really dislike the resurgence in Puritanism.
        
               | smokel wrote:
               | Please don't read too much into this single word. The
                | comment above mentioned _"nearly every ounce of energy
               | they expended on research was strategic"_, and I was
               | keeping that in mind while writing my remark.
               | 
               | Please read my sibling comment where I expand a bit on
               | what I meant to say.
        
               | idiotsecant wrote:
               | This take is simply wrong in a way that I would normally
               | just sigh and move on, but it's such a privileged HN
               | typical pov that I feel like I need to address it. If a
               | plumber did plumbing specifically because someone needed
               | it and he would be paid, would you call them a
                | narcissist? If a gardener built a garden the way their
                | customer wanted, would you call them a narcissist? Most of
               | the world doesn't get to float around in a sea of VC
               | money doing whatever feels good. They find a need,
               | address it, and get to live another day. Productively
               | addressing what other people need and making money from
               | it isn't narcissism, it's productivity.
        
               | lkey wrote:
               | You are comparing a skilled trade that commands ~100k
               | annual compensation to positions that have recently
               | commanded _100 million_ dollars in compensation _upon
                | signing_, no immediate productivity required, as this
               | talent denial is considered strategic.
               | 
               | You consider the person who expects eventual ethical
               | behavior from people that have 'won' capitalism (never
               | have to labour again) to be privileged.
        
             | kcexn wrote:
             | This is such a nuanced problem. Like any creative
             | endeavour, the most powerful and significant research is
             | driven by an innate joy of learning, creating, and sharing
             | ideas with others. How far the research can be taken is
             | then shaped by resource constraints. The more money you
             | throw at the researchers, the more results they can get.
             | But there seems to be a diminishing returns kind of effect
             | as individual contributors become less able to produce
             | results independently. The research narrative also gets
             | distorted by who has the most money and influence, and not
              | always for the better (as recent events in Alzheimer's
              | research have shown).
             | 
             | The problem is once people's livelihoods depend on their
             | research output rather than the research process, the whole
             | research process becomes steadily distorted to optimise for
             | being able to reliably produce outputs.
             | 
             | Anyone who has invested a great deal of time and effort
             | into solving a hard problem knows that the 'eureka' moment
             | is not really something that you can force. So people end
             | up spending less time working on problems that would
             | contribute to 'breakthroughs' and more time working on
             | problems that will publish.
        
             | RataNova wrote:
             | The tragedy is exactly what you said: all that energy,
             | creativity, and deep domain obsession locked out of impact
             | because it's not institutionally "strategic."
        
           | xvector wrote:
           | I have seen absolutely incredible, best in the world type
           | engineers, much smarter than myself, get fired from my FAANG
           | because of the performance games.
           | 
           | I persist because I'm fantastic at politics while being good
           | enough to do my job. Feels weird man.
        
           | t_serpico wrote:
           | But there is no way to know who is truly the 'best'. The
           | people who position and market themselves to be viewed as the
           | best are the only ones who even have a chance to be viewed as
           | such. So if you're a great researcher but don't project
           | yourself that way, no one will ever know you're a great
           | researcher (except for the other great researchers who aren't
           | really invested in communicating how great you are). The
           | system seems to incentivize people to not only optimize for
           | their output but also their image. This isn't a bad thing per
            | se, but is sort of antithetical to the whole shoulders of
            | giants ethos of science.
        
             | kcexn wrote:
             | The problem is that the best research is not a competitive
             | process but a collaborative one. Positioning research
             | output as a race or a competition is already problematic.
        
               | bwfan123 wrote:
               | right. Also, the idea that there is a "best" researcher
               | is already problematic. You could have 10 great people in
               | a team, and it would be hard to rank them. Rating people
               | in order of performance in a team is contradictory to the
               | idea of building a great team. ie, you could have 10
               | people all rated 10 which is really the goal when
               | building a team.
        
           | rightbyte wrote:
           | This is an interesting theory. I think there is something to
           | it. It is really hard to do good in a competitive
           | environment. Very constrained.
        
           | meindnoch wrote:
           | Goodhart's law
        
           | RataNova wrote:
           | Anytime a system gets hyper-competitive and the stakes are
           | high, it starts selecting for people who are good at playing
           | the system rather than just excelling at the underlying skill
        
           | bwfan123 wrote:
           | I would categorize people into 2 broad extremes. 1) those
           | that care two hoots about what others or the system expects
           | of them and in that sense are authentic and 2) those that
           | only care about what others or the system expects of them,
           | and in that sense are not authentic. There is a spectrum in
           | there.
        
           | b00ty4breakfast wrote:
           | that's what happens at the top of most competitive domains.
           | Just take a look at pro sports; guys are looking for
           | millimeters to shave off and they turn to "playing the game"
            | rather than merely improving athletic performance. Watch a
            | football game (either kind) and a not-small portion of the
           | action is guys trying to draw penalties or exploit the rules
           | to get an edge.
        
         | contrarian1234 wrote:
         | > Labs used to hire researchers and give them a lot of free
         | reign.
         | 
          | I can't think of it ever really paying off. Bell Labs is the
          | best example: amazing research that was unrelated to the core
          | business of the parent company. Microsoft Research is another
          | great one. Lots of interesting research that... got MS some
          | nerd points? But it has materialized into very, very few
          | actual products and revenue streams. Moving AI research
          | forward doesn't help Meta build any moats or revenue streams.
          | It just progresses our collective knowledge.
          | 
          | On the "human progress" scale it's fantastic to put lots of
          | smart people in a room and let them do their thing. But from a
          | business perspective it seems to almost never pay off. Waiting
          | on the irrational charity of business executives is probably
          | not the best way to structure things.
         | 
         | I'd tell them to go become academics.. but all the academics I
         | know are just busy herding their students and attending
         | meetings
        
           | whiplash451 wrote:
           | Indeed. And it feels like there is this untold in-between
           | where if you belong to an unknown applied AI team, you don't
           | have to deal with academia's yak shaving, you don't have to
            | deal with Meta's politics, and you end up single-handedly
            | inventing TRMs.
        
           | Gigachad wrote:
           | Perhaps these companies just end up with so much money that
           | they can't possibly find ways to spend all of it rationally
           | for purely product driven work and just end up funding
           | projects with no clear business case.
        
             | trenchpilgrim wrote:
             | Or they hire researchers specifically so a competitor or
             | upstart can't hire them and put them to work on something
             | that disrupts their cash cow.
        
           | godelski wrote:
           | > I can't think of it ever really paying off
           | 
           | Sure worked for Bell Labs
           | 
           | Also it is what big tech was doing until LLMs hit the scene
           | 
           | So I'm not sure what you mean by it never paying off. We were
           | doing it right up till one of those things seemed to pay off
           | and then hyper focused on it. I actually think this is a
           | terrible thing we frequently do in tech. We find promise in a
            | piece of tech, hyper focus on it. Specifically, hyper focus
            | on how to monetize it, which ends up stunting the technology
            | because it hasn't had time to mature and we're trying to
            | monetize the alpha product instead of trying to get that
            | thing to beta.
            | 
            | > But from a business perspective it seems to almost never
            | pay off.
            | 
           | So this is actually what I'm trying to argue. It actually
           | does pay off. It has paid off. Seriously, look again at
           | Silicon Valley and how we got to where we are today. And look
           | at how things changed in the last decade...
           | 
           | Why is it that we like off the wall thinkers? That
           | programmers used to be known as a bunch of nerds and weirdos.
           | How many companies were started out of garages (Apple)? How
           | many started as open source projects (Android)? Why did
           | Google start giving work lifestyle perks and 20% time?
           | 
           | So I don't know what you're talking about. It has frequently
           | paid off. Does it always pay off? Of course not! It
           | frequently fails! But that is pretty true for everything.
           | Maybe the company stocks are doing great[0], but let's be
           | honest, the products are not. Look at the last 20 years and
           | compare it to the 20 years before that. The last 20 years has
           | been much slower. Now maybe it is a coincidence, but the
           | biggest innovation in the last 20 years has been in AI and
            | from 2012 to 2021 there were a lot of nice free rein AI
           | research jobs at these big tech companies where researchers
           | got paid well, had a lot of autonomy in research, and had a
           | lot of resources at their disposal. It really might be a
           | coincidence, but a number of times things like this have
           | happened in history and they tend to be fairly productive. So
            | idk, you be the judge. Hard to conclude that this is
            | definitely what creates success, but I find it hard to rule
            | this out.
            | 
            | > I'd tell them to go become academics.. but all the
            | academics I know are just busy herding their students and
            | attending meetings
           | 
           | Same problem, different step of the ladder
           | 
           | [0] https://news.ycombinator.com/item?id=45555175
        
           | heavyset_go wrote:
           | How many patents did that research result in that paid off in
           | terms of use, licensing and royalties?
        
           | gopher_space wrote:
           | The problem here is management expecting researchers to dump
           | out actionable insights like a chicken laying eggs.
           | Researchers exist so that you can rifle through their notes
           | and steal ideas.
        
           | iisan7 wrote:
           | It paid off for PARC, iirc the laser printer justified lots
           | of other things that Xerox didn't profit from but turned out
           | to be incredibly important.
        
         | zer0zzz wrote:
         | > I really think if they just took a step back and stop being
         | so metric focused and let their people freely explore then
         | they'd be win..
         | 
         | This is very true, and more than just in ai.
         | 
         | I think if they weren't so metric focused they probably
         | wouldn't have hit so much bad publicity and scandal too.
        
         | bboygravity wrote:
         | AI progress has slowed down?! By what metric?
         | 
         | Quite the statement for anybody who follows developments
         | (without excluding xAI).
        
         | RataNova wrote:
         | The money chase is real. You can kind of tell who's in it for
         | the comp package vs. who'd be doing the same work on a laptop
         | in their garage if that's what it took
        
       | ipsum2 wrote:
       | This has nothing to do with superintelligence, it's just the
       | people that were working on the paper prior to the re-org
       | happened to publish after the name change.
       | 
        | Though it is notable that, contrary to many predictions (on HN
        | and Twitter) that Meta would stop publishing papers and be like
        | other AI labs (e.g. OpenAI), they've continued their rapid pace
        | of releasing papers AND open source models.
        
         | Zacharias030 wrote:
         | Should be the top comment.
         | 
         | MSL is not only those few high profile hires.
        
         | ekianjo wrote:
         | Open weights models, not open source. And even their weights
          | are under a specific license not as permissive as Apache 2.
        
           | drexlspivey wrote:
           | Does an "open source" model the way you describe it exist or
           | is it a mythical creature?
        
             | qcoret wrote:
             | Unicorns also don't exist, but we don't change the
             | definition to include horses.
        
               | jakupovic wrote:
               | Prove to me that unicorns don't exist, first level
               | arguments only!
        
               | aerhardt wrote:
               | The first level argument is that old horse, burden of
               | proof.
        
             | omneity wrote:
             | There aren't many but they do exist. OLMo for example.
        
             | Rattled wrote:
             | Olmo by AllenAI and Pythia by EleutherAI.
        
             | ayewo wrote:
             | An open source model does exist _now_ [1] and is
             | multilingual. Previous discussion [2].
             | 
             | [1] https://ethz.ch/en/news-and-events/eth-
             | news/news/2025/07/a-l...
             | 
             | [2] https://news.ycombinator.com/item?id=44535637
        
             | CaptainOfCoit wrote:
              | It does, but does it matter? Even if every piece of
              | software released in 2025 were proprietary, that wouldn't
              | make their published binaries "open source" just because
              | no other software could be classified as "open source".
             | 
             | We name things based on what they are, not based on the
             | lack of other things.
        
           | HPsquared wrote:
           | This is the right terminology. Model weights are literally
           | compiled binary data; they are the output of an algorithm run
           | on a bunch of source data. That training dataset is the
           | "source" of the model. Training data (or the scripts used to
           | generate it) is human-readable and modifiable, like source
           | code. Binary weights are not.
        
             | carom wrote:
             | Just to note though, source copyright extends to its
             | compiled form. There is probably an analogue there for
             | model weights.
        
               | jeremyjh wrote:
               | Tell me about the companies that own the copyrights to
               | their training data.
        
             | phkahler wrote:
             | Binary weights can still be "edited" with additional
             | training.
        
           | sdeframond wrote:
           | I propose that from now on we call freewares "open binaries".
        
           | hippo22 wrote:
           | I'm not a lawyer, but I believe that the weights aren't
           | subject to copyright. So, you can use them outside of Meta's
           | license agreement provided you get them from somewhere else.
        
         | pityJuke wrote:
         | What model(s) have Meta released since the Lab re-org?
         | 
          | Also, that wasn't based purely on hearsay; Zuck explicitly
         | said:
         | 
         | > We believe the benefits of superintelligence should be shared
         | with the world as broadly as possible. That said,
         | superintelligence will raise novel safety concerns. We'll need
         | to be rigorous about mitigating these risks and careful about
         | what we choose to open source. Still, we believe that building
         | a free society requires that we aim to empower people as much
         | as possible. [0]
         | 
         | [0]: https://www.meta.com/superintelligence/
        
           | ipsum2 wrote:
           | That has always been the policy. To answer your question,
           | Meta has released ~100 models since the Superintelligence Lab
           | reorg.
           | 
           | https://huggingface.co/facebook/models
           | 
           | The most interesting ones to me are:
           | 
           | - CWM (Code world model), an LLM for coding
           | https://github.com/facebookresearch/cwm
           | 
           | - DINOv3, A vision encoder https://ai.meta.com/dinov3/
           | 
           | - MAPAnything, a 3d reconstruction model
           | https://huggingface.co/facebook/map-anything
           | 
           | - VJEPA v2, Self-supervised video pre-training model
           | https://github.com/facebookresearch/vjepa2
        
           | gessha wrote:
           | You still believe anything that comes out of his mouth?
        
           | PatronBernard wrote:
           | When did Zuck start caring about society?
        
             | cwmoore wrote:
             | Is this a trick question? Probably before he was even born.
        
           | parpfish wrote:
           | > We believe the benefits of superintelligence should be
           | shared with the world as broadly as possible.
           | 
           | i'd interpret that as meaning "everybody is welcome to be our
            | customer, but we still control all of it"
        
         | RataNova wrote:
         | Still, I think the optics matter... the fact that Meta's still
         | putting out technical work (and open sourcing it) after the
         | restructure says a lot about where they want to position
         | themselves
        
       | bigcat12345678 wrote:
       | https://docs.lamini.ai/memory_rag/ Similar approaches have been
       | tried before already
        
       | pppoe wrote:
       | I find it absurd that, compared to the past, large companies now
        | have higher stock prices and more cash than ever before, yet
        | nearly every AI lab in these companies is facing greater pressure
       | than ever, being asked to generate short-term profits. In the
       | midst of AI's unprecedented boom, the research environment and
       | atmosphere in the industry seem to have worsened compared to the
       | past.
        
         | signatoremo wrote:
          | Is Meta's lab pressured to generate short-term profits?
         | 
         | Which other under pressure labs are you talking about?
        
         | sefrost wrote:
         | Is it because of the "winner takes all" and "lock-in effects"
         | of being the first to market?
        
       | foldl2022 wrote:
       | So, show me the model weights, please.
        
       | yalogin wrote:
        | I am not surprised, because the culture at Meta is not at all,
        | even in the slightest, to focus on science for the sake of it.
        | That's actively purged out of you. The focus is on metrics and
        | how the bottom line is impacted. So this is in line with that.
        
         | rhetocj23 wrote:
          | Yeah, and this problem is near impossible to fix once it has
          | infested the culture of the firm.
        
           | DangitBobby wrote:
           | It's not always a bad thing though, like in this case they
           | looked for a practical win and found one because impractical
           | wins can't make them money.
        
         | georgeburdell wrote:
         | It's not that simple. I worked at a supplier of Meta and they
         | paid us large NREs to fund our exploratory work
        
         | alex1138 wrote:
         | "People are using our service more!" turns out to be a horrible
         | metric when they outright lie to you (x has sent you a message!
         | - when no message exists)
        
       | CShorten wrote:
       | Here is a video I made diving into the paper, hopefully helpful!
       | 
       | https://www.youtube.com/watch?v=Ek0tZootK00
        
         | htk wrote:
         | I like your style, subscribed!
        
           | CShorten wrote:
           | Thank you so much!
        
       | nmca wrote:
       | This is not work by any of the high profile new hires, in case
       | folks are confused.
        
       | elyobo wrote:
       | Can we have a more informative, less clickbaity, title?
        
         | dang wrote:
         | What would a more informative, less clickbaity title be?
         | 
         | (preferably using representative language from the article)
        
           | airstrike wrote:
           | Meta Superintelligence Labs' first paper is about RAG
        
             | dang wrote:
             | Ok thanks! Belatedly updated.
        
         | smeeger wrote:
          | there should be a guideline to get rid of clickbait titles;
          | it's an epidemic here
        
           | dang wrote:
           | There is of course such a guideline:
           | https://news.ycombinator.com/newsguidelines.html
           | 
           | We don't catch every case, but if you're talking about the
           | frontpage, I'm surprised to hear you say "epidemic". What are
           | some recent examples?
        
       | puttycat wrote:
       | Seems very incremental and very far from the pompous
       | 'superintelligence' goal.
        
         | antonvs wrote:
         | It's unlikely that the existing LLM architecture will evolve
         | into anything that resembles superintelligence any more than it
         | does already.
         | 
         | Which means that modifications to the architecture, and
         | combining it with other components and approaches, are the next
         | likely step. This paper fits that.
        
         | naasking wrote:
         | A 30 fold improvement seems a tad more than incremental.
        
           | vasco wrote:
           | I can start brushing my teeth 30 times faster but it won't
           | change my life. This is nice for RAG but it's a very
           | localized improvement. And 30x sounds big but is just an
           | order of magnitude improvement also.
        
             | naasking wrote:
             | Brushing your teeth is not central to your life, recalling
             | facts correctly is, and a 30 fold improvement in the latter
             | very well could change your life. I'll leave it to you to
             | figure out which is a better analogy to RAG.
        
               | vasco wrote:
               | Just remember that in this example you don't remember 30x
               | more things, you just remember the same things 30x
               | faster. That is a significant difference.
        
         | btilly wrote:
         | If you can collapse "retrieve this complex chunk when it is
         | needed" into a single token, what else can you put into a
         | token?
         | 
         | "Send this through the math coprocessor." "Validate against the
         | checklist." "Call out to an agent for X." "Recheck against
         | input stream Y." And so on.
         | 
         | Retrieval augmentation is only one of many uses for this. If
         | this winds up with better integration with agents, it is very
         | possible that the whole is more than the sum of its parts.
        
         | lukev wrote:
         | Think about it this way; they are encoding whole "thoughts" or
         | "ideas" as single tokens.
         | 
         | It's effectively a multimodal model, which handles "concept"
         | tokens alongside "language" tokens and "image" tokens.
         | 
         | A really big conceptual step, actually, IMO.
        
       | koolala wrote:
       | Did a "superintelligence" lab publish a superintelligence related
       | paper with no results for intelligence? What measured
       | improvements did this proposal make in their LLM's intelligence?
        
       | pbd wrote:
       | https://github.com/simulanics/REFRAG
        
       | singularity2001 wrote:
       | somewhere in my hacker news comment history I presented this very
       | idea
        
       | asim wrote:
       | This was inevitable. You can't keep training LLMs and expect
       | that's the answer to the evolution of AI. Yes it'll happen and
       | we'll keep creating new more refined and bigger models but it's
       | like DNA or something like the cortex of the brain. After that
       | you need these systems that essentially "live" for years
       | digesting information and develop a more refined way to process,
       | store and retrieve the information. Compression of RAG was also
       | inevitable. It's like the btree index of a database. The thing
       | is, we're probably one or two iterations away from being good
       | enough on the RAG pipeline and then we'll need to focus more on
       | the other pieces of sensory input that need to be connected and
       | processed at higher throughput. Right now it's not fast or
       | efficient enough. This is where the likes of Google will shine.
       | They are probably two decades ahead of everyone on internal
       | technology and there is some team with the breakthrough but it
       | hasn't seen the light of day yet. What's coming out of DeepMind
       | is really a forced effort in productization and publication of
       | work in a consumable format but internally they are likely way
       | ahead. I don't have as much faith in Meta's efforts despite
       | seeing things like this. Quite frankly those people, the ones
       | doing the work should move to more honourable companies. Not feed
       | crack addiction in the form of Meta's universe.
        
         | smeeger wrote:
         | exactly. the real focus internally is working on new
         | architectures. there is no other possibility.
        
       | zem wrote:
       | this was really weird to read:
       | 
       | > But RAG is a very real world, practical topic for something as
       | significant as a new lab's first paper.
       | 
       | I would expect exactly the opposite - that a new lab would put
       | out a few random papers that happen to be in areas their
       | researchers were interested in and already working on, and once
       | people had been working together a while and developed some
       | synergy they would maybe come out with something really
       | groundbreaking.
       | 
       | do people really view a "first paper" as something deeply
       | significant and weighty? because that just seems like a good way
       | to get bogged down in trying to second guess whether any given
       | paper was good enough to be your all-important debut!
        
         | Al-Khwarizmi wrote:
         | As an academic I would expect the same as you, and no, to my
         | knowledge "first paper" is meaningless, at least in academia.
         | Most people's first paper is some small contribution to what
         | their PhD supervisor is doing at the time, where the student
         | tries their best at writing but it ends up so heavily edited
         | that probably 90% of the final text comes from the supervisor
         | :) So typically first papers don't define or represent a
         | researcher. When you start you just don't have the experience
         | to have a great idea and carry it through to a good paper.
         | 
         | Of course here we are talking about a lab, not an individual
         | person, but still I haven't heard of first papers being
         | considered special in any way, even for labs.
        
       | schmorptron wrote:
        | One thing I don't get about the ever-recurring RAG discussions
        | and hype men proclaiming "RAG is dead" is that people seem to be
        | talking about wholly different things. My mental model is that
        | what is called RAG can be either:
        | 
        | - a predefined document store / document chunk store where every
        | chunk gets a vector embedding, and a lookup decides what gets
        | pulled into context so as not to have to pull whole classes of
        | documents and fill it up
        | 
        | - the web-search-like features in LLM chat interfaces, where
        | they do keyword search and pull relevant documents into context,
        | but somehow only ephemerally, with the full documents not taking
        | up context in the future of the thread (unsure about this, did I
        | understand it right?).
        | 
        | With the new models with million-plus-token context windows,
        | some were arguing that we can just throw whole books into the
        | context non-ephemerally, but doesn't that significantly reduce
        | the diversity of possible sources we can include at once if we
        | hard commit to everything staying in context forever? I guess it
        | might help with consistency? But is the mechanism with which we
        | decide what to keep in context not still some kind of RAG, just
        | with larger chunks of whole documents instead of only parts?
        | 
        | I'd be ecstatic if someone who really knows their stuff could
        | clear this up for me.
        
         | kgeist wrote:
         | Technically, RAG is anything that augments generation with
         | external search. However, it often has a narrower meaning:
         | "uses a vector DB."
         | 
         | Throwing everything into one large context window is often
         | impractical - it takes much more time to process, and many
         | models struggle to find information accurately if too much is
         | going on in the context window ("lost in the middle").
         | 
         | The "classic" RAG still has its place when you want low latency
         | (or you're limited by VRAM) and the results are already good
         | enough.
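          | 
          | To make the "narrow" sense concrete, here is a minimal sketch
          | of that vector-lookup flavor of RAG. The embed() below is just
          | a toy hashed bag-of-words standing in for a real embedding
          | model, and the chunks are invented; a real system would hand
          | the assembled prompt to an LLM instead of printing it.
          | 
          |     import math, hashlib
          | 
          |     def embed(text, dim=64):
          |         # Toy embedding: hashed bag of words, L2-normalized.
          |         vec = [0.0] * dim
          |         for word in text.lower().split():
          |             h = int(hashlib.md5(word.encode()).hexdigest(), 16)
          |             vec[h % dim] += 1.0
          |         norm = math.sqrt(sum(v * v for v in vec)) or 1.0
          |         return [v / norm for v in vec]
          | 
          |     def cosine(a, b):
          |         return sum(x * y for x, y in zip(a, b))
          | 
          |     CHUNKS = [
          |         "REFRAG compresses retrieved chunks into dense embeddings.",
          |         "BM25 is a lexical ranking function from 1994.",
          |         "The KV cache grows linearly with context length.",
          |     ]
          |     INDEX = [(c, embed(c)) for c in CHUNKS]   # built once, reused
          | 
          |     def retrieve(query, k=2):
          |         qv = embed(query)
          |         ranked = sorted(INDEX, key=lambda p: -cosine(qv, p[1]))
          |         return [c for c, _ in ranked[:k]]
          | 
          |     def build_prompt(query):
          |         context = "\n".join(retrieve(query))
          |         return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
          | 
          |     print(build_prompt("What does REFRAG do to retrieved chunks?"))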
        
         | make3 wrote:
          | No one is saying RAG is dead; you're never going to put the
          | whole Internet in the context of the model, and the more you
          | put in, the more expensive it is.
        
           | viraptor wrote:
           | Lots of people say rag is dead: https://kagi.com/search?q=rag
           | +is+dead&r=au&sh=g52XEb93vx691I...
        
         | GistNoesis wrote:
         | The answer is adaptability.
         | 
         | In both cases for "Question Answering" it's about similarity
         | search but there are two main orthogonal differences between
         | RAG and Non-RAG :
         | 
         | -Knowing the question at the time of index building
         | 
         | -Higher order features : the ability to compare fetched
         | documents with one another and refine the question
         | 
         | Non-RAG, aka multi-layer (non-causal) transformer with infinite
         | context, is the more generic version, fully differentiable
         | meaning you can use machine learning to learn how to Non-RAG
         | better. Each layer of the transformer can use the previous
         | layer to reason and refine the similarity search. (A causal
          | transformer knows the question at the time when it is fed the
          | question, and can choose to focus its attention on different
          | parts of the previously computed features of the provided
          | documents, but may benefit from having some reflection tokens,
          | or better: be given the question before being presented the
          | documents (provided you've trained it to answer it like that).)
         | 
         | RAG is an approximation of the generic case to make it faster
         | and cheaper. Usually it breaks end-to-end differentiability by
         | using external tools, so this mean that if you want to use
         | machine learning to learn how to RAG better you will need to
         | use some variant of Reinforcement Learning which is slower to
         | learn things. RAG usually don't know the question at the time
         | of index building, and documents are treated independently of
         | each other, so no (automatic) higher order features (embeddings
         | are fixed).
         | 
          | A third usual approximation is to feed the output of RAG into
          | Non-RAG, to hopefully get the best of both worlds. You can learn
         | the Non-RAG given RAG with machine learning (if you train it
         | with some conversations where it used RAG), but the RAG part
         | won't improve by itself.
         | 
         | Non-RAG need to learn so it needs a big training dataset, but
          | fortunately it can pick up question-answer pairs in an
         | unsupervised fashion when you feed it the whole web, and you
         | only need a small instruction training and preference
         | optimization dataset to shape it to your need. If performance
         | isn't what you expect in a specific case, you can provide more
         | specific examples and retrain the model until it gets it and
         | you get better performance for the case you were interested in.
         | You can improve the best case but it's hard to improve the
         | worst case.
         | 
          | RAG gives you more control over what you feed it, but the
          | content has to be structured. You can prevent the worst case
          | more easily, but it's hard to improve the good case.
        
         | impossiblefork wrote:
          | We can't throw infinitely many things into the context, though.
          | 
          | My impression is that GPT-5 gets confused, not quite right
          | away, but after a couple of pages it has no idea. It doesn't
          | take pages upon pages before it forgets things.
        
           | aerhardt wrote:
           | I'm currently experimenting with prompts of ~300k tokens for
           | a certain classification task and I think I _might_ be able
           | to make it work. GPT5 chokes but Gemini 2.5 Pro is showing
           | promise. Jury's still out and I might change my tune in a
           | couple of weeks.
        
             | impossiblefork wrote:
             | It should also be said, that what I say here is focused on
             | things where these models have problems.
             | 
             | For example, I consider the model confused when it starts
             | outputting stereotyped or cliche responses, and I
             | intentionally go at problems that I know that the models
             | have problems with (I already know they can program and do
             | some maths, but I want to see what they can't do). But if
             | you're using them for things they're made for, and which
             | aren't confusing, such as people arguing with each other,
             | then you are probably likely to succeed.
             | 
             | Prompts with lots of examples are reasonable and I know
             | they can get very long.
        
       | armcat wrote:
       | I couldn't immediately see in their graphs/tables any comparison
        | against simple lexical/statistical context compression, such as
        | candidate selection of chunks using TF-IDF, word overlap, etc.
        | For most of us in the industry, we need to find these quick wins
        | that give us equivalent performance to sending huge amounts of
        | information to the LLM, while compressing by 10x.
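        | 
        | For reference, the kind of lexical baseline I mean is roughly the
        | sketch below: score chunks against the query with TF-IDF and keep
        | only the top few. The corpus and the keep budget here are made up
        | for illustration.
        | 
        |     import math
        |     from collections import Counter
        | 
        |     def tokenize(text):
        |         return text.lower().split()
        | 
        |     def tfidf_scores(query, chunks):
        |         docs = [Counter(tokenize(c)) for c in chunks]
        |         n = len(docs)
        |         def idf(term):
        |             df = sum(1 for d in docs if term in d)
        |             return math.log((n + 1) / (df + 1)) + 1.0
        |         return [sum(d[t] * idf(t) for t in tokenize(query))
        |                 for d in docs]
        | 
        |     def compress_context(query, chunks, keep=2):
        |         ranked = sorted(zip(tfidf_scores(query, chunks), chunks),
        |                         reverse=True)
        |         return [c for _, c in ranked[:keep]]
        | 
        |     chunks = [
        |         "Quarterly revenue grew 12% year over year.",
        |         "Attention cost grows quadratically with sequence length.",
        |         "REFRAG expands only a few chunks back into full tokens.",
        |         "Office plants need weekly watering.",
        |     ]
        |     print(compress_context("how does REFRAG handle attention cost",
        |                            chunks))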
        
       | macleginn wrote:
       | So this looks essentially like continuous prompting (see prefix
       | tuning) with RL-driven selection of what to present as tokens and
       | what as continuous inputs (embeddings).
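        | 
        | A rough numpy sketch of that mixed-input reading (the shapes, the
        | projection, and the expand mask below are all invented; in the
        | paper the selection policy is learned with RL):
        | 
        |     import numpy as np
        | 
        |     rng = np.random.default_rng(0)
        |     d_model, d_chunk, vocab = 16, 8, 100
        | 
        |     tok_emb = rng.normal(size=(vocab, d_model))   # token embedding table
        |     proj = rng.normal(size=(d_chunk, d_model))    # chunk -> model space
        | 
        |     question_ids = [5, 17, 42]                    # question kept as tokens
        |     chunk_embs = rng.normal(size=(4, d_chunk))    # 4 retrieved chunks
        |     expand = [False, True, False, False]          # "policy": expand chunk 1
        |     chunk_tokens = {1: [7, 8, 9, 10]}             # its original token ids
        | 
        |     inputs = []
        |     for i, c in enumerate(chunk_embs):
        |         if expand[i]:
        |             inputs.extend(tok_emb[t] for t in chunk_tokens[i])  # full tokens
        |         else:
        |             inputs.append(c @ proj)        # one continuous slot per chunk
        |     inputs.extend(tok_emb[t] for t in question_ids)
        | 
        |     x = np.stack(inputs)    # decoder input: (10, 16) instead of many more
        |     print(x.shape)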
        
       | i5heu wrote:
       | Can we please get rid of the clickbait titles?
        
       | RataNova wrote:
       | Refreshing (and slightly unexpected) to see Meta
       | Superintelligence start with something this practical instead of
       | a headline-grabbing new model
        
       | mark_l_watson wrote:
       | A great idea, bypassing as much conversion as possible between
       | vector space and natural language tokens. Reminds me of a
        | discussion of having AIs "talk" to each other using vector
       | space.
       | 
       | There was an interesting quote "plain old BM25 from 1994
       | outperforms vector search on recall" and super relevant to what I
       | did yesterday. I am trying to use small local models more often
       | and yesterday I wrote Common Lisp code that uses a large corpus
       | of text and a user query or prompt to construct a fairly concise
       | one-shot prompt with select context from the text corpus. This is
       | RAG, and I used both BM25 and vector embeddings matching. I added
       | the code and an example as a new chapter in my CL book (link
       | directly to new material:
       | https://leanpub.com/lovinglisp/read#leanpub-auto-autocontext...)
       | yesterday afternoon. BM25 is fast. This is new code, and I will
       | certainly be experimenting more with it, but as-is it is useful
       | when working with small local LLMs.
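        | 
        | Not the Common Lisp from the chapter, but for anyone curious why
        | BM25 is so cheap: here is a small Python sketch of standard Okapi
        | BM25 scoring (k1 and b set to common default values, corpus made
        | up). It is pure counting, with no embedding model in the loop.
        | 
        |     import math
        |     from collections import Counter
        | 
        |     K1, B = 1.5, 0.75
        | 
        |     def bm25_scores(query, docs):
        |         tokenized = [d.lower().split() for d in docs]
        |         counts = [Counter(toks) for toks in tokenized]
        |         avgdl = sum(len(t) for t in tokenized) / len(tokenized)
        |         n = len(docs)
        | 
        |         def idf(term):
        |             df = sum(1 for c in counts if term in c)
        |             return math.log((n - df + 0.5) / (df + 0.5) + 1.0)
        | 
        |         scores = []
        |         for toks, c in zip(tokenized, counts):
        |             s = 0.0
        |             for term in query.lower().split():
        |                 tf = c[term]
        |                 denom = tf + K1 * (1 - B + B * len(toks) / avgdl)
        |                 s += idf(term) * tf * (K1 + 1) / denom
        |             scores.append(s)
        |         return scores
        | 
        |     corpus = [
        |         "BM25 is a ranking function used by search engines.",
        |         "Vector search embeds documents into a dense space.",
        |         "Common Lisp macros operate on code as data.",
        |     ]
        |     print(bm25_scores("bm25 ranking search", corpus))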
        
       | Palmik wrote:
       | The observation about the "block-diagonal patterns" in RAG isn't
       | new and has been exploited / explored before:
       | 
       | - https://arxiv.org/abs/2410.07590 (literally titled "Block-
       | Attention for Efficient RAG")
       | 
       | - https://arxiv.org/abs/2409.15355v3
       | 
       | - https://arxiv.org/abs/2212.10947
       | 
       | The REFRAG paper does not cite any of these.
        
       | SknCode wrote:
       | I am not sure if I understand things correctly.
       | 
        | I came to believe that LLMs work with token embeddings. Is
        | REFRAG then only "something" in front of the LLM, with the
        | decoder being the RL policy that expands only some chunk
        | embeddings into token embeddings the LLM can consume? Or does
        | REFRAG need you to 'tune' the LLM to be able to work with both
        | token embeddings and chunk embeddings?
        
       ___________________________________________________________________
       (page generated 2025-10-12 23:00 UTC)