[HN Gopher] Meta Superintelligence Labs' first paper is about RAG
___________________________________________________________________
Meta Superintelligence Labs' first paper is about RAG
https://arxiv.org/abs/2509.01092
Author : skadamat
Score : 392 points
Date : 2025-10-11 23:16 UTC (23 hours ago)
(HTM) web link (paddedinputs.substack.com)
(TXT) w3m dump (paddedinputs.substack.com)
| bigyabai wrote:
| > Long awaited first paper from Meta Superintelligence Labs is
| not a model layer innovation. What does this mean?
|
| It means you're reading into it too much and need to be let down,
| gently, from the hype train.
| nine_k wrote:
| A great post, it starts with this:
|
| _TL;DR
|
| * MSI's first paper, REFRAG, is about a new way to do RAG.
|
| * This slightly modified LLM converts most retrieved document
| chunks into compact, LLM-aligned chunk embeddings that the LLM
| can consume directly.
|
| * A lightweight policy (trained with RL) decides which chunk
| embeddings should be expanded back into full tokens under a
| budget; the LLM runs normally on this mixed input.
|
| * The net effect is far less KV cache and attention cost, much
| faster first-byte latency and higher throughput, while preserving
| perplexity and task accuracy in benchmarks._
|
| I wish more long posts followed this model of a scientific paper.
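|
| For a concrete picture of that mixed input, here is a rough
| sketch of my reading of the TL;DR (not the paper's code;
| chunk_encoder, expand_policy, and all the shapes are made up
| for illustration):
|
|     import numpy as np
|
|     D_MODEL = 64          # hypothetical LLM embedding width
|     CHUNK_TOKENS = 16     # tokens per retrieved chunk
|
|     def chunk_encoder(chunk_token_embs):
|         # Compress a (CHUNK_TOKENS, D_MODEL) chunk into one
|         # LLM-aligned vector; a mean-pool stands in for the
|         # learned encoder.
|         return chunk_token_embs.mean(axis=0)
|
|     def expand_policy(chunk_embs, budget):
|         # Stand-in for the RL-trained policy: pick `budget` chunks
|         # to expand back into full tokens (here: largest norm).
|         scores = np.linalg.norm(chunk_embs, axis=1)
|         return set(np.argsort(-scores)[:budget])
|
|     def build_llm_input(question_embs, retrieved_chunks, budget=2):
|         chunk_embs = np.stack([chunk_encoder(c) for c in retrieved_chunks])
|         expand = expand_policy(chunk_embs, budget)
|         rows = [question_embs]
|         for i, chunk in enumerate(retrieved_chunks):
|             # Expanded chunks contribute all their token embeddings;
|             # the rest contribute one compact embedding each.
|             rows.append(chunk if i in expand else chunk_embs[i][None, :])
|         return np.concatenate(rows, axis=0)
|
|     question = np.random.randn(8, D_MODEL)
|     chunks = [np.random.randn(CHUNK_TOKENS, D_MODEL) for _ in range(10)]
|     x = build_llm_input(question, chunks)
|     print(x.shape)  # 8 + 2*16 + 8 = 48 rows instead of 8 + 10*16 = 168
|
| The KV-cache and attention savings in the TL;DR follow directly
| from the shorter sequence the LLM actually sees.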
| jongjong wrote:
| Interesting. All developers I know who tinkered around with
| embeddings and vector similarity scoring were instantly hooked.
| The efficiency of computing the embeddings once and then reusing
| them as many times as needed, comparing the vectors with a cheap
| <30-line function, is extremely appealing. Not to mention the
| indexing capabilities to make it work at scale.
|
| IMO vector embedding is the most important innovation in
| computing of the last decade. There's something magical about it.
| These people deserve some kind of prize. The idea that you can
| reduce almost any intricate concept including whole paragraphs to
| a fixed-size vector which encapsulates its meaning and proximity
| to other concepts across a large number of dimensions is pure
| genius.
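|
| The cheap comparison function is usually just cosine similarity;
| a minimal numpy version, names mine:
|
|     import numpy as np
|
|     def cosine_similarity(a, b):
|         # a, b: 1-D embedding vectors of the same length
|         return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
|
|     def top_k(query_emb, doc_embs, k=3):
|         # doc_embs: (num_docs, dim) matrix of precomputed embeddings
|         scores = doc_embs @ query_emb / (
|             np.linalg.norm(doc_embs, axis=1) * np.linalg.norm(query_emb))
|         return np.argsort(-scores)[:k]
|
| Real systems put an approximate nearest-neighbour index in front
| of this, but the scoring itself really is that small.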
| _jayhack_ wrote:
| Vector embedding is not an invention of the last decade.
| Featurization in ML goes back to the 60s - even deep learning-
| based featurization is decades old at a minimum. Like
| everything else in ML this became much more useful with data
| and compute scale
| senderista wrote:
| Yup, when I was at MSFT 20 years ago they were already
| productizing vector embedding of documents and queries (LSI).
| jongjong wrote:
| Interesting. Makes one think.
| senderista wrote:
| To be clear, LSA[1] is simply applied linear algebra, not
| ML. I'm sure learned embeddings outperform the simple
| SVD[2] used in LSA.
|
| [1]
| https://en.wikipedia.org/wiki/Latent_semantic_analysis
|
| [2] https://en.wikipedia.org/wiki/Singular_value_decomposition
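|
| For the curious, a minimal LSA-style pipeline looks roughly like
| this (a scikit-learn sketch for illustration, obviously not what
| MSFT shipped):
|
|     from sklearn.feature_extraction.text import TfidfVectorizer
|     from sklearn.decomposition import TruncatedSVD
|     from sklearn.metrics.pairwise import cosine_similarity
|
|     docs = [
|         "the cat sat on the mat",
|         "a kitten rested on a rug",
|         "stock prices fell sharply today",
|     ]
|     tfidf = TfidfVectorizer().fit_transform(docs)   # term-document matrix
|     lsa = TruncatedSVD(n_components=2).fit_transform(tfidf)  # low-rank "concepts"
|     print(cosine_similarity(lsa))  # doc-doc similarity in the reduced space
|
| The SVD does all the work; no learning beyond the linear algebra,
| which is the point above.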
| ekidd wrote:
| Vector embeddings are slightly interesting because they come
| pre-trained with large amounts of data.
|
| But similar ways to reduce huge numbers of dimensions to a much
| smaller set of "interesting" dimensions have been known for a
| long time.
|
| Examples include principal component analysis/singular value
| decomposition, which was the first big breakthrough in face
| recognition (in the early 90s), and also used in latent
| semantic indexing, the Netflix prize, and a large pile of other
| things. And the underlying technique was invented in 1901.
|
| Dimensionality reduction is cool, and vector embedding is
| definitely an interesting way to do it (at significant
| computational cost).
| liampulles wrote:
| If you take the embedding for king, subtract the embedding for
| male, add the embedding for female, and lookup the closest
| embedding you get queen.
|
| The fact that dot product addition can encode the concept of
| royalty and gender (among all other sorts) is kind of magic to
| me.
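|
| For anyone who wants to try it, this is how the arithmetic is
| usually done in practice (a gensim sketch; the model id is the
| gensim-data name and downloads the vectors on first run):
|
|     import gensim.downloader as api
|
|     kv = api.load("glove-wiki-gigaword-50")  # pretrained GloVe vectors
|
|     # king - man + woman, then nearest-neighbour lookup over the vocab
|     print(kv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
|
| One caveat: most_similar excludes the query words themselves from
| the results, which flatters the trick a bit (see the replies
| below).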
| puttycat wrote:
| This was actually shown to not really work in practice.
| intelkishan wrote:
| I have seen this particular example work. You don't get the
| exact match, but the closest one is indeed Queen.
| mirekrusin wrote:
| Shouldn't this itself be a part of training?
|
| Having a set of "king - male + female = queen" style
| relations, including more complex phrases, to align
| embeddings.
|
| It seems like a terse, lightweight, information-dense way
| to capture the essence of knowledge.
| godelski wrote:
| Yes but it doesn't generalize very well. Even on simple
| features like gender. If you go look at embeddings you'll
| find that man and woman are neighbors, just as king and
| queen are[0]. This is a better explanation for the result
| as you're just taking very small steps in the latent
| space.
|
| Here, play around[1]
|
|     mother - parent + man = woman
|     father - parent + woman = man
|     father - parent + man = woman
|     mother - parent + woman = man
|     woman - human + man = girl
|
| Or some that should be trivial
|
|     woman - man + man = girl
|     man - man + man = woman
|     woman - woman + woman = man
|
| Working in very high dimensions is funky stuff. Embedding
| high dimensions into low dimensions results in even
| funkier stuff
|
| [0] https://projector.tensorflow.org/
|
| [1] https://www.cs.cmu.edu/~dst/WordEmbeddingDemo/
| yellowcake0 wrote:
| so addition is not associative?
| godelski wrote:
| I think you're missing the point
| yellowcake0 wrote:
| It's a pretty exotic type of addition that would lead to
| the second set of examples, just trying to get an idea of
| its nature.
| CuriouslyC wrote:
| Vector embeddings are so overhyped. They're decent as a
| secondary signal, but they're expensive to compute and fragile.
| BM25 based solutions are more robust and WAY lower latency, at
| the cost of some accuracy loss vs hybrid solutions. You can get
| the majority of the lift from hybrid solutions with ingest time
| semantic expansion/reverse HyDE-style input annotation with a
| sparse embedding BM25 at a fraction of the computational cost.
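|
| For reference, BM25 is just a term-frequency formula, which is
| why the latency is so low. A toy scorer (no index, no stemming,
| not production code):
|
|     import math
|     from collections import Counter
|
|     def bm25_scores(query, docs, k1=1.5, b=0.75):
|         tokenized = [d.lower().split() for d in docs]
|         N = len(tokenized)
|         avgdl = sum(len(d) for d in tokenized) / N
|         df = Counter(t for d in tokenized for t in set(d))  # document frequency
|         scores = []
|         for d in tokenized:
|             tf = Counter(d)
|             s = 0.0
|             for term in query.lower().split():
|                 if term not in tf:
|                     continue
|                 idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
|                 s += idf * tf[term] * (k1 + 1) / (
|                     tf[term] + k1 * (1 - b + b * len(d) / avgdl))
|             scores.append(s)
|         return scores
|
|     docs = ["the cat sat on the mat", "dogs chase cats", "stock prices fell"]
|     print(bm25_scores("cat mat", docs))  # first doc scores highest
|
| Real engines (Lucene, Tantivy, etc.) add inverted indexes and
| tokenization on top, but the scoring core is this small.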
| jongjong wrote:
| But it's much cheaper to compute than inference, and you only
| have to compute it once for any piece of content and can reuse
| it multiple times.
| calf wrote:
| The idea of reducing language to mere bits, in general, sounds
| like it would violate the Godel/Turing theorems about
| computability.
| mountainriver wrote:
| This was a very obvious next step, I played around with
| implementing something similar at one point.
|
| In general we need to make it simpler for LLMs to take in
| different forms of embeddings, or at least build frameworks that
| simplify it.
| cm2012 wrote:
| At first I thought the super intelligence wrote a novel
| scientific paper
| Imnimo wrote:
| I'm curious whether this is work that was specifically begun
| under the "superintelligence" umbrella, or if it's just that the
| people who were working on it had been shifted to the
| Superintelligence team by the time they wrote the paper. I would
| guess the former?
| lblume wrote:
| Another commenter claims the latter:
| https://news.ycombinator.com/item?id=45554169
| naasking wrote:
| > the core insight here is actually: if embeddings are generated
| by layers within the LLM, it makes no sense to convert them back
| to natural language, just for another LLM to compress those
| tokens back to embeddings.
|
| Doesn't this tie the two layers together in a way that they can't
| evolve separately?
| xvector wrote:
| Working in big tech it's pretty wild to see how integral AI has
| become to our work internally, vs the public perception of it.
| People are NOT prepared.
| fishmicrowaver wrote:
| Not prepared for what? Seems like the rest of the world is
| desperate to be shown the way to unlock something of value?
| Workaccount2 wrote:
| I think at this point it's software devs looking for the
| value unlock.
|
| Non-software devs are actually making functional programs for
| themselves for the first time ever. The value is crazy.
| ceejayoz wrote:
| It's not the first time ever. People did the same with
| Access and HyperCard in the 90s.
| fishmicrowaver wrote:
| Sure, but in the real world do you think businesses are
| going to deploy piles of code into production generated
| this way? No, non technical people will continue to whip up
| MS PowerApps. AI generated code has no value to many
| businesses.
| xvector wrote:
| The value of AI is not in generating code. That's just a
| "nice-to-have."
|
| The value of AI is in having a scalable, human-like
| decision maker that you can plug into anything, anywhere.
| This has unlocked countless use cases for my team, that
| we could scarcely imagine a few years ago.
| cbg0 wrote:
| "Human-like decision maker" except it's just as if not
| more unpredictable than a human, has no understanding of
| what it's actually outputting or the impact of it, and it
| isn't concerned with losing its job or facing legal
| repercussions for its actions.
| xvector wrote:
| There are plenty of ways to manage those drawbacks, and a
| mind-boggling number of use cases where it's "good
| enough" already.
|
| But it's not my job to convince you, my lived experience
| working with the tech is enough to convince me, and
| that's all I care about, to be honest. Everyone else will
| get there sooner or later.
| Workaccount2 wrote:
| You don't need production level code to make your life
| easier.
|
| You're missing the forest for the trees. Most people
| can't even make a block diagram, but they can explain
| what they have and what they want to do with it.
| terminalshort wrote:
| 1. Hyperbolic statement about LLM capabilities with no concrete
| examples
|
| 2. Wild claim that the companies that sell LLMs are actually
| downplaying their capabilities instead of hyping them
| danielmarkbruce wrote:
| Yup, he's totally lying. Not happening. Just carry on.
| BoorishBears wrote:
| Agreed, but why are they lying?
| danielmarkbruce wrote:
| That was sarcasm. He's not lying.
| BoorishBears wrote:
| Didn't read any sarcasm in what he said?
| crorella wrote:
| Personal experience here in a FAANG, there has been a
| considerable increase in:
|
|   1. Teams exploring how to leverage LLMs for coding.
|
|   2. Teams/orgs that have already standardized some of the
|      processes for working with LLMs (MCP servers, standardized
|      creation of the agents.md files, etc).
|
|   3. Teams actively using LLMs for coding new features,
|      documenting code, increasing test coverage, doing code
|      reviews, etc.
|
| Again, personal experience, but in my team ~40-50% of the
| PRs are generated by Codex.
| rhetocj23 wrote:
| I'm sure the MBA folks love stats like that - there's plenty
| of them that have infested big tech. I mean, Pichai is an
| MBA + McKinsey alumnus.
|
| Ready for the impending layoff, fella?
| alex-nt wrote:
| There are places that offer Copilot to any team that
| wants it, and then behind the scenes they inform their
| managers that if the team (1+ persons) adopts it, they
| will have to shed 10%+ human capacity (lose a person,
| move a person, fire a person) in the upcoming quarters
| of next year.
| ruszki wrote:
| "Teams exploring how to leverage [AI]s for [anything]" is
| true for about a decade now in every large multinational
| company, at every level. It's not new at all. AI has been the
| driving buzzword for a while now, even well before ChatGPT.
| I've encountered many people who just wanted the stamp that
| they use AI, no matter how, because my team was one of the
| main entry points to achieve this at that specific company.
| But before ChatGPT and co, you had to work for it a lot, so
| most of them failed miserably, or immediately backtracked
| when they realized this.
| incompatible wrote:
| I've heard of one study that said AI slows developers down,
| even when they think it's helping.
|
| https://www.infoworld.com/article/4061078/the-productivity-p...
| xvector wrote:
| AI may slow coding a bit but dramatically reduces cognitive
| load.
|
| The real value of AI isn't in helping coding. It's in having
| a human-like intelligence to automate processes. I can't get
| into details but my team is doing things that I couldn't
| dream of three years ago.
| qingcharles wrote:
| It does dramatically reduce cognitive load. I think that
| part is understated and lost to the headline of how it
| writes two thousand lines of code in 30 seconds.
| naasking wrote:
| It is true sometimes, but other times it saves hours. We're
| all still in the learning stage of how best to use these new
| tools, and their capabilities are growing constantly.
| gdulli wrote:
| Not everyone has given in to the crutch.
| xvector wrote:
| That's why I still use an abacus.
| gdulli wrote:
| The abacus skills are safely obsolete, the skills of
| general thinking and creativity must not become that. This
| couldn't be more specious.
|
| Meme thinking like this, repeating something you've heard
| as reflex without regard to whether it fits a situation, is
| the exact kind of unoriginality we can't allow to become
| the default mode of thinking.
| xvector wrote:
| I am not the one being unoriginal here. You are thinking
| that AI will obsolete critical thinking, so there's no
| point developing with it.
|
| However, in your moral crusade against using AI you are
| missing the big picture. No one is making you code with
| AI. But there are many things that you can only build if
| you use AI as a component.
|
| The ability to plug a human-like decisionmaker into
| anything, anywhere massively expands what we can build.
| There are applications and use cases that you cannot even
| conceptualize without having the ability to plug AI in.
| This does not impact critical thinking whatsoever.
|
| Be original. Put your engineer hat on and think on what
| this new tool lets you build, that you couldn't
| beforehand.
| throw_this_one wrote:
| I find the AI can make me more creative. I don't have to
| waste mental energy on boilerplate or straightforward
| stuff that would take me typing through some event
| processing loop etc. I can extract out and reuse
| components more easily and focus on big-picture design. Or
| build more bespoke admin tools that I wouldn't have wanted
| to waste time building with some JS stuff before.
| godelski wrote:
| It's kinda funny, Meta has long had some of the best in the
| field, but left them untapped. I really think if they just took a
| step back and stop being so metric focused and let their people
| freely explore then they'd be winning the AI race. But with this
| new team, I feel like Meta mostly hired the people who are really
| good at gaming the system. The people that care more about the
| money than the research.
|
| A bit of this is true at every major lab. There's tons of
| untapped potential. But these organizations are very risk
| averse. I mean, why not continue with the strategy that got us to
| the point we're at in the first place? Labs used to hire
| researchers and give them a lot of free rein. But those times
| ended and AI progress also slowed down. Maybe if you want to get
| ahead you gotta stop thinking like everyone else
|
| Well Meta... you can "hold me hostage" for a lot cheaper than
| those guys. I'm sure this is true for hundreds of passionate ML
| researchers. I'd take a huge pay cut to have autonomy and
| resources. I know for a fact there are many working at Meta right
| now that would do the same. So maybe if you're going to throw
| money at the problem, diversify a bit and look back at what made
| SV what it is today and what made AI take leaps forward.
| bobxmax wrote:
| I thought Alex Wang was a very curious choice. There are so
| many foundational AI labs with interesting CEOs... I get that
| Wang is remarkable in his own right, but he basically just
| built MTurk and timed the bubble.
|
| Doesn't really scream CEO of AGI to me.
| thereitgoes456 wrote:
| The reporting at the time said that he was Mark's 5th choice
| or similar. It is fairly clear he would have preferred Ilya,
| Murati, Mark Chen, and perhaps others, but they said no, and
| Alex Wang was the first one to say yes.
| tsunamifury wrote:
| Why in the world would he want Murati? She has absolutely
| no technical chops and was not functionally CTO of OpenAI.
| shuckles wrote:
| Because she was CTO of OpenAI.
| CuriouslyC wrote:
| Pretty ironic when access to trade secrets and people
| skills is seen as more important in a technical field
| than technical competence.
| hn_throwaway_99 wrote:
| > was not functionally CTO of OpenAI.
|
| Why do you say that?
| tsunamifury wrote:
| Her history was entirely non technical up until openAI.
| hn_throwaway_99 wrote:
| I think that's total BS, based on this article about her,
| https://fortune.com/2025/10/03/mira-murati-career-ai-thinkin...
|
| 1. She has 2 BAs, one in math and one in mechanical
| engineering.
|
| 2. She was an "Advanced Concepts Engineer at Zodiac
| Aerospace from 2012 to 2013".
|
| 3. She was a product manager at Tesla on the Model X
|
| 4. She was VP of product and engineering at Leap Motion.
|
| Going from the fact that she wasn't a deep learning
| researcher to "her history was entirely non technical up
| until Open AI" is plain false. And plus, the job of CTO
| is 90%+ people management, and she appears more than
| smart enough and experienced enough to evaluate technical
| decisions of her team.
| bobxmax wrote:
| What technical chops does Sam Altman have?
| tsunamifury wrote:
| Altman is ceo not cto. Is Hackernews so ignorant now they
| don't understand these differences?
| seanmcau wrote:
| He started coding at age 8
| arthurcolle wrote:
| The self-supervised mesa-optimizer strikes again
| godelski wrote:
| A lot of people also don't know that many of the well known
| papers are just variations on small time papers with a fuck
| ton more compute thrown at the problem. Probably the
| strongest feature that correlates with being a successful
| researcher is compute. Many have taken this to claim that the
| GPU poor can't contribute, but that ignores so many other valid
| explanations... and we wonder why innovation has slowed...
| It's also weird because if compute were all you needed, then
| there's a much cheaper option than what Zuck paid. But he's
| paying for fame.
| rhetocj23 wrote:
| Frankly this is the reason why I'm not convinced the current
| movement of LLMs will yield anything close to the dream.
|
| The right people to deliver immense progress don't exist
| right now.
| godelski wrote:
| > The right people to deliver immense progress don't exist
| right now.
|
| I wouldn't go this far. But I would say that we're not
| giving them a good shot.
|
| The people are always there, you just need to find them
| and enable them.
|
|     How do you manage genius? You don't. -- Mervin Kelly
| BobbyTables2 wrote:
| It's funny.
|
| I learnt the hard way that communications/image/signal
| processing research basically doesn't care about Computer
| Architecture at the nuts and bolts level of compiler
| optimization and implementation.
|
| When they encounter a problem whose normal solution
| requires excessive amounts of computation, they reduce
| complexity algorithmically using mathematical techniques,
| and quantify the effects.
|
| They don't quibble about a 10x speed up, they reduce the
| "big O()" complexity. They could care less whether it was
| implemented in interpreted Python or hand-optimized
| assembly code.
|
| On one hand, I know there's a lot of talent in AI today.
| But throwing hardware at the problem is the dumbest way
| forward.
|
| WiFi adapters would be wheeled luggage if we had the same
| mentality during their development.
| godelski wrote:
| > They don't quibble about a 10x speed up, they reduce
| the "big O()" complexity. They could care less whether it
| was implemented in interpreted Python or hand-optimized
| assembly code.
|
| I can at least say that's not all of us. But you're
| probably right that this is dominating. I find it so
| weird since everyone stresses empirics yet also seems to
| not care about them. It took me my entire PhD to figure
| out what was really going on. I've written too many long
| winded rants on this site though
| shwaj wrote:
| At some point it becomes difficult to improve the O()
| complexity. How do you do better than the O(n^2) of
| the Transformer, with acceptable tradeoffs? Many big
| brains in all the big labs are very aware of the
| importance of algorithmic advances. There is no low-
| hanging fruit, but they're doing their best.
|
| Then, _in parallel to that_, they're looking at compiler
| optimizations and other higher-level algorithmic
| innovations such as Flash Attention (a classic at this
| point), which had a drastic impact on performance due to
| cache awareness, without changing the O() complexity.
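|
| For anyone who hasn't seen it spelled out: the quadratic part is
| the (n, n) score matrix, which is also what Flash Attention tiles
| through on-chip memory without materializing in full (it changes
| the memory traffic, not the O(n^2) FLOPs). A bare numpy sketch,
| no batching, heads, or masking:
|
|     import numpy as np
|
|     def naive_attention(Q, K, V):
|         # Q, K, V: (n, d). The (n, n) intermediate is the quadratic cost.
|         scores = Q @ K.T / np.sqrt(Q.shape[1])           # (n, n)
|         weights = np.exp(scores - scores.max(axis=1, keepdims=True))
|         weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
|         return weights @ V                               # (n, d)
|
|     n, d = 1024, 64
|     Q, K, V = (np.random.randn(n, d) for _ in range(3))
|     out = naive_attention(Q, K, V)  # the (n, n) matrix alone is ~8 MB at fp64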
| tomrod wrote:
| Sometimes it's the theory, sometimes it's the
| engineering, and often it's both.
| helix278 wrote:
| You make it sound like reducing the big O complexity is a
| dumb thing to do in research, but this is really the only
| way to make lasting progress in computer science.
| Computer architectures become obsolete as hardware
| changes, but any theoretical advances in the problem
| space will remain true forever.
| crystal_revenge wrote:
| > A lot of people also don't know that many of the well
| known papers are just variations on small time papers with
| a fuck ton more compute thrown at the problem.
|
| I worked for a small research-heavy AI startup for a bit,
| and it was heartbreaking how many people I interacted with
| in that general space had worked hard and passionately on
| research, only to be beaten to the punch by a famous lab
| that could rush the paper out quicker and at a larger
| scale.
|
| There were also more than a few instances of high-
| probability plagiarism. My team had a paper that had
| existed for years basically re-written without citation by
| a major lab. After some complaining they added a footnote.
| But it doesn't really matter because no big lab is going to
| have to defend themselves publicly against some small
| startup, and their job at the big labs is to churn out
| papers.
| godelski wrote:
| > only to have been beaten to the punch by a famous lab
| that could rush the paper out quicker and at a larger
| scale.
|
| This added at least a year to my PhD... Reviewers kept
| rejecting my works saying "add more datasets" and such
| comments. That's nice and all, but on the few datasets I
| did use I beat out top labs and used a tenth of the
| compute. I'd love to add more datasets but even though I
| only used a tenth of the compute I blew my entire compute
| budget. Guess state-of-the-art results, a smaller model,
| higher throughput, and 3rd-party validation were not
| enough (if you use an unpopular model architecture).
|
| I always felt like my works were being evaluated as
| engineering products, not as research.
|
| > a few instances of high-probability plagiarism
|
| I was reviewing a work once and I actually couldn't tell
| if the researchers knew that they ripped me off or not.
| They compared to my method, citing, and showing figures
| using it. But then dropped the performance metrics from
| the table. So I asked. I got them in return and saw that
| there was no difference... So I dove in and worked out
| that they were just doing 99% my method with additional
| complexity (computational overhead). I was pretty upset.
|
| I was also upset because otherwise the paper was good.
| The results were nice and they even tested our work in a
| domain we hadn't. Had they just been upfront, I would have
| gladly accepted the work. Though I'm pretty confident the
| other reviewers wouldn't have, due to "lack of novelty."
|
| It's a really weird system that we've constructed. We're
| our own worst enemies.
|
| > their job at the big labs is to churn out papers.
|
| I'd modify this slightly. Their job is to get citations.
| Churning out papers really helps with that, but so does
| all the tweeting and evangelizing of their works. It's an
| unfortunate truth that as researchers we have to sell our
| works, and not just by the scientific merit that they
| hold. People have to read them after all. But we should
| also note that it is easier for some groups to get
| noticed more than others. Prestige doesn't make a paper
| good, but it sure acts as a multiplying factor for all
| the metrics we use for determining if it is good.
| tsunamifury wrote:
| Alexandr Wang is not interesting, and is a few steps short of
| a fraud that Mark had to bail out because he was so
| co-invested.
|
| Shareholders should be livid if they knew a single thing
| about what was going on.
| typpilol wrote:
| Tell me more
| tsunamifury wrote:
| Scale promised cutting-edge data pipelines and model-
| training infra but mostly sold outsourced labeling with a
| tech veneer. Great margins, weak moat -- classic Valley
| overclaim, not outright fraud.
| didip wrote:
| I always wonder about that. Those $100m mathematicians... how
| can they have room to think under Meta's crushing IMPACT
| pressure?
| trhway wrote:
| For just 10% of that money a $100M mathematician can hire 10
| $1M mathematicians or a whole math dept in some European
| university to do the work and the thinking for them and thus
| beat any pressure while resting and vesting on the remaining
| 90%.
| lblume wrote:
| Sure, but they weren't hired as managers, right?
| vasco wrote:
| Ok ok, another $1m/year to hire a manager.
| rhetocj23 wrote:
| "Maybe if you want to get ahead you gotta stop thinking like
| everyone else"
|
| Well, for starters you need a leader who can rally the troops,
| who "think(s) different" - something like a Steve Jobs.
|
| That person doesn't seem to exist in the industry right now.
| ProofHouse wrote:
| winning the AI race? Meta? Oh that was a good one. Zuck is a
| follower not a leader. It is in his DNA
| hamasho wrote:
| My theory is that as more people compete, the top candidates
| become those who are best at gaming the system rather than
| actually being the best. Someone has probably studied this. My
| only evidence is job applications for GAFAM and Tinder tho.
| godelski wrote:
| > Someone has probably studied this
|
| There's even a name for it
|
| https://en.wikipedia.org/wiki/Goodhart%27s_law
| julienreszka wrote:
| It's a false law tho. Collapses under scrutiny
| godelski wrote:
| Sorry, remind me; how many cobras are there in India?
| bandrami wrote:
| The Zoological Survey of India would like to know but
| hasn't figured out a good way to do a full census. If you
| have any ideas they would love to hear them.
|
| Naja naja has Least Concern conservation status, so there
| isn't much funding in doing a full count, but there are
| concerns as encroachment both reduces their livable
| habitat and puts them into more frequent contact with
| humans and livestock.
| oblio wrote:
| The comment was a joke.
|
| https://en.wikipedia.org/wiki/Perverse_incentive
| epwr wrote:
| Could you elaborate or link something here? I think about
| this pretty frequently, so would love to read something!
| vasco wrote:
| Metric: time to run 100m
|
| Context: track athlete
|
| Does it cease to be a good metric? No. After this you can
| likely come up with many examples of target metrics which
| never turn bad.
| godelski wrote:
| So what is your argument, that it doesn't apply
| everywhere therefore it applies nowhere?
|
| You're misunderstanding the root cause. Your example
| works because the metric is well aligned. I'm sure you can
| also think of many examples where the metric is not well
| aligned and maximizing it becomes harmful. How do you
| think we ended up with clickbait titles? Why was everyone
| so focused on clicks? Let's think about engagement
| metrics. Is that what we really want to measure? Do we
| have no preference over users being happy vs users being
| angry or sad? Or are those things much harder to measure,
| if not impossible to, and thus we focus on our proxies
| instead? So what happens when someone doesn't realize it
| is a proxy and becomes hyper fixated on it? What happens
| if someone does realize it is a proxy but is rewarded via
| the metric so they don't really care?
|
| Your example works in the simple case, but a lot of
| things look trivial when you only approach them from a
| first order approximation. You left out all the hard
| stuff. It's kinda like...
|
| Edit: Looks like some people are bringing up metric
| limits that I couldn't come up with. Thanks!
| vasco wrote:
| > So what is your argument, that it doesn't apply
| everywhere therefore it applies nowhere?
|
| I never said that. Someone said the law collapses,
| someone asked for a link, I gave an example to prove it
| does break down in some cases at least, but many cases
| once you think more about it. I never said all cases.
|
| If it works sometimes and not others, it's not a law.
| It's just an observation of something that can happen or
| not.
| godelski wrote:
| > I never said all cases.
|
| You're right. My bad. I inferred that through the context
| of the conversation.
|
| > If it works sometimes and not others, it's not a law.
|
| I think you are misreading, and that is likely what led
| to the aforementioned misunderstanding. You're right that
| it isn't a _scientific_ law, but the term "law" gets
| thrown around a lot in a more colloquial manner.
| Unfortunately words are overloaded and have multiple
| meanings. We do the same thing to "hypothesis",
| "paradox", and lots of other things. I hope this
| clarifies the context. (even many of the physics laws
| aren't as strong as you might think)
|
| But there are many "laws" used in the same form. They're
| _eponymous_ laws[0], not _scientific_ ones. Read
| "adage". You'll also find that word used in the opening
| sentence on the Wiki article I linked as well as most (if
| not all) of them in [0]
|
| [0] https://en.wikipedia.org/wiki/List_of_eponymous_laws
| exe34 wrote:
| It doesn't break down - see the comments about rules above.
| It was the perfect example to prove yourself wrong.
| vasco wrote:
| I disagree with all of those examples, they are
| misunderstanding what it means for the metric to break
| down in the context of the law, but alas. "If you run a
| different race" lol.
| exe34 wrote:
| could you explain what you think the difference is?
|
| a metric is chosen, people start to game the system by
| doing things that make the metric improve but the
| original intent is lost. increasingly specific rules/laws
| have to be made up to make the metric appear to work, but
| it becomes a lost cause as more and more creative ways
| are found to work around the rules.
| vasco wrote:
| Exactly, that's the definition. It doesn't apply to
| timing a 100m race. There's many such situations that are
| simple enough and with perfect information available
| where this doesn't break down and a metric is just a
| metric and it works great.
|
| Which is not to the detriment of the observation being
| true in other contexts, all I did was provide a counter
| example. But the example requires the metric AND the
| context.
| exe34 wrote:
| it wasn't a very good counter example.
| godelski wrote:
| Do you know certain shoes are banned in running
| competitions?
|
| There's a really fine line here. We make shoes to help us
| run faster and keep our feet safe, right? Those two are
| directly related, as we can't run very fast if our feet
| are injured. But how far can this be taken? You can make
| shoes that dramatically reduce the impact when the foot
| strikes the ground, which reduces stress on the foot and
| legs. But that might take away running energy, which adds
| stresses and strains to the muscles and ligaments. So you
| modify your material to put energy back into the person's
| motion. This all makes running safer. But it also makes
| the runner faster.
|
| Does that example hack the metric? You might say yes but
| I'm certain someone will disagree with you. There's
| always things like this where they get hairy when you get
| down to the details. Context isn't perfectly defined and
| things aren't trivial to understand. Hell, that's why we
| use pedantic programming languages in the first place,
| because we're dealing with machines that have to operate
| void of context[0]. Even dealing with humans is hard
| because there's multiple ways to interpret anything.
| Natural language isn't pedantic enough for perfect
| interpretation.
|
| [0] https://www.youtube.com/watch?v=FN2RM-CHkuI
| godelski wrote:
| > in the context of the law
|
| That's the key part. The metric has context, right?
|
| And that's where Goodhart's "Law" comes in. A metric has
| no meaning without context. This is why metrics need to
| be interpreted. They need to be evaluated in context.
| Sometimes this context is explicit but other times it is
| implicit. Often people will hack the metric as the
| implicit rule is not explicit and well that's usually a
| quick way to make those rules explicit.
|
| Here's another way to think about it: no rule can be so
| perfectly written that it has no exceptions.
| MR_Bulldops wrote:
| Do you have an example that doesn't involve an objective
| metric? Of course objective metrics won't turn bad.
| They're more measurements than metrics, really.
| godelski wrote:
| > an objective metric
|
| I'd like to push back on this a little, because I think
| it's important to understanding why Goodhart's Law shows
| up so frequently.
|
| _There are no objective metrics_, only proxies.
|
| You can't measure a meter directly, you have to use a
| proxy like a tape measure. Similarly you can't measure
| time directly, you have to use a stop watch. In a normal
| conversation I wouldn't be nitpicking like this because
| those proxies are so well aligned with our intended
| measures and the lack of precision is generally
| inconsequential. But once you start measuring anything
| with precision you cannot ignore the fact that you're
| limited to proxies.
|
| The difference of when we get more abstract in our goals
| is not too dissimilar. Our measuring tools are just
| really imprecise. So we have to take great care to
| understand the meaning of our metrics and their limits,
| just like we would if we were doing high precision
| measurements with something more "mundane" like distance.
|
| I think this is something most people don't have to
| contend with because frankly, very few people do high
| precision work. And unfortunately we often use algorithms
| as black boxes. But the more complex a subject is the
| more important an expert is. It looks like they are just
| throwing data into a black box and reading the answer,
| but that's just a naive interpretation.
| AnthonyMouse wrote:
| This isn't what Goodhart's law is about.
|
| Sure, if you get a ruler from the store it might be off
| by a fraction of a percent in a way that usually doesn't
| matter and occasionally does, but even if you could
| measure distance _exactly_ that doesn't get you out of
| it.
|
| Because what Goodhart's law is really about is
| bureaucratic cleavage. People care about lots of
| diverging and overlapping things, but bureaucratic rules
| don't. As soon as you make something a target, you've
| created the incentive to make that number go up at the
| expense of all the other things you're not targeting but
| still care about.
|
| You can take something which is clearly what you actually
| want. Suppose you're commissioning a spaceship to take
| you to Alpha Centauri and then it's important that it go
| fast because otherwise it'll take too long. We don't even
| need to get into exactly how fast it needs to go or how
| to measure a meter or anything like that, we can just say
| that going fast is a target. And it's a _valid_ target;
| it actually needs to do that.
|
| Which leaves you already in trouble. If your organization
| solicits bids for the spaceship and that's the only
| target, you better not accept one before you notice that
| you also need things like "has the ability to carry
| occupants" and "doesn't kill the occupants" and "doesn't
| cost 999 trillion dollars" or else those are all on the
| chopping block in the interest of going fast.
|
| So you add those things as targets too and then people
| come up with new and fascinating ways to meet them by
| sacrificing other things you wanted but didn't require.
|
| What's really happening here is that if you set targets
| and then require someone else to meet them, they will
| meet the targets in ways that you will not like. It's the
| principal-agent problem. The only real way out of it is
| for principals to be their own agents, which is exactly
| the thing a bureaucracy isn't.
| godelski wrote:
| I agree with you, in a way.
|
| I've just taken another step to understand the philosophy
| of those bureaucrats. Clearly they have some logic,
| right? So we have to understand why they think they can
| organize and regulate from the spreadsheet. Ultimately it
| comes down to a belief that the measurements (or numbers)
| are "good enough" and that they have a good understanding
| of how to interpret them. With many bureaucracies, that is
| the belief that no interpretation is needed. But
| we also see that behavior with armchair experts who try
| to use data to evidence their conclusion rather than
| interpret data and conclude from that interpretation.
|
| Goodhart had focused on the incentive structure of the
| rule, but that does not tell us _how_ this all happens
| and _why_ the rule is so persistent. I think you're
| absolutely right that there is a problem with agents, and
| it's no surprise that when many introduce the concept of
| "reward hacking" that they reference Goodhart's Law. Yes,
| humans _can_ typically see beyond the metric and infer
| the intended outcome, but ignore this because they don't
| care and so fixate on the measurement because that gives
| them the reward. Bureaucracies no doubt amplify this
| behavior as they are well known to be soul crushing.
|
| But we should also be asking ourselves if the same effect
| can apply in settings where we have the best of
| intentions and all the agents are acting in good faith
| and trying to interpret the measure instead of just game
| it. The answer is yes. Idk, call it Godelski's Corollary
| if you want (I wouldn't), but it this relates to
| Goodhart's Law at a fundamental level. You can still have
| metric hacking even when agents aren't aware or even
| intending to do so. Bureaucracy is not required.
| AnthonyMouse wrote:
| In a sense you can do the same thing to yourself. If you
| self-impose a target and try to meet it while ignoring a
| lot of things that you're not measuring even though
| they're still important, you can unintentionally
| sacrifice those things. But there's a difference.
|
| In that case you have to not notice it, which sets a much
| lower cap on how messed up things can get. If things are
| really on fire then you notice right away and you have
| the agency to do something different.
|
| Whereas if the target is imposed by a far-off hierarchy
| or regulatory bureaucracy, the people on the ground who
| notice that things are going wrong have no authority to
| change it, which means they carry on going wrong.
|
| Or put it this way: The degree to which it's a problem is
| proportional to the size of the bureaucracy. You can
| cause some trouble for yourself if you're not paying
| attention but you're still directly exposed to "hear
| reason or she'll make you feel her". If it's just you and
| your boss who you talk to every day, that's not as good
| but it's still not that bad. But if the people imposing
| the target aren't even in the same state, you can be
| filling the morgue with bodies and still not have them
| notice.
| ccortes wrote:
| > Does it cease to be a good metric?
|
| Yes if you run anything other than the 100m
| AnthonyMouse wrote:
| > Metric: time to run 100m
|
| > Context: track athlete
|
| > Does it cease to be a good metric? No.
|
| What do you mean? People start doping or showing up with
| creatively designed shoes and you need to layer on a
| complicated system to decide if that's cheating, but some
| of the methods are harder to detect and then some people
| cheat anyway, or you ban steroids or stimulants but allow
| them if they're by prescription to treat an unrelated
| medical condition and then people start getting
| prescriptions under false pretexts in order to get better
| times. Or worse, someone notices that the competition
| can't set a good time with a broken leg.
| noosphr wrote:
| If it were a good metric there wouldn't be a few phone
| books worth of regulations on what you can do before and
| during running 100 meters. From banning rocket shoes, to
| steroids, to robot legs the 100 meter run is a perfect
| example of a terrible metric both intrinsically as a
| measure of running speed and extrinsically as a measure
| of fitness.
| NBJack wrote:
| If I hadn't seen it in action countless times, I would
| believe you. Changelists, line counts, documents made,
| collaborator counts, teams led, reference counts in peer-
| reviewed journals... the list goes on.
|
| You are welcome to prove me wrong though. You might even
| restore some faith in humanity, too!
| ivanbelenky wrote:
| Thanks for sharing. I did not know this law existed and had
| a name. I know nothing about nothing, but it appears to be
| the case that the interpretation of metrics for policies
| implicitly assumes the "shape" of the domain. E.g. in RL for
| games we see a bunch of outlier behavior from policies just
| gaming the signal.
|
| There seem to be 2 types:
|
| - Specification failure: the signal is bad-ish, a completely
|   broken behavior --> locally optimal points achieved by
|   policies that phenomenologically do not represent what was
|   expected/desired to be covered --> signaling an improvable
|   reward signal definition.
|
| - Domain constraint failure: the signal is still good and
|   optimization is "legitimate", but you are prompted with the
|   question "do I need to constrain my domain of solutions?"
|     - Finding a bug that reduces time to completion of a game
|       in a speedrun setting would be a new acceptable baseline,
|       because there are no rules about finishing the game
|       earlier.
|     - Shooting amphetamines for a 100m run would probably
|       minimize time, but other factors will make people
|       consider disallowing such practices.
| Eisenstein wrote:
| I view Goodhart's law more as a lesson for why we can
| never achieve a goal by offering specific incentives if
| we are measuring success by the outcome of the incentives
| and not by the achievement of the goal.
|
| This is of course inevitable if the goal cannot be
| directly measured but is composed of many constantly
| moving variables such as education or public health.
|
| This doesn't mean we shouldn't bother having such goals,
| it just means we have to be diligent at pivoting the
| incentives when it becomes evident that secondary effects
| are being produced at the expense of the desired effect.
| godelski wrote:
| > This is of course inevitable if the goal cannot be
| directly measured
|
| It's worth noting that _no goal can be directly measured_
| [0].
|
| I agree with you, this doesn't mean we shouldn't bother
| with goals. They are fantastic tools. But they are
| guides. The better aligned our proxy measurement is with
| the intended measurement then the less we have to
| interpret our results. We have to think less, spending
| less energy. But even poorly defined goals can be
| helpful, as they get refined as we progress in them.
| We've all done this since we were kids and we do this to
| this day. All long term goals are updated as we progress
| in them. It's not like we just state a goal and then hop
| on the railroad to success.
|
| It's like writing tests for code. Tests don't prove that
| your code is bug free (can't write a test for a bug you
| don't know about: unknown unknown). But tests are still
| helpful because they help evidence the code is bug free
| and constrain the domain in which bugs can live. It's
| also why TDD is naive, because tests aren't proof and you
| have to continue to think beyond the tests.
|
| [0] https://news.ycombinator.com/item?id=45555551
| bjornsing wrote:
| Yeah I think this is a general principle. Just look at the
| quality of US presidents over time, or generations of top
| physicists. I guess it's just a numbers game: the number of
| genuinely interested people is relatively constant while the
| number of gamers grows with the compensation and perceived
| status of the activity. So when compensation and perceived
| status skyrockets the ratio between those numbers changes
| drastically.
| godelski wrote:
| I think the number of genuinely interested people goes up.
| Maybe the percent stays the same? But honestly, I think we
| kill passion for a lot of people. To be cliche, how many
| people _lose_ the curiosity of a child? I think the cliche
| exists for a reason. It seems the capacity is in all of us
| and even once existed.
| crystal_revenge wrote:
| I've spent most of my career working, chatting and hanging
| out with what might be best described as "passionate weirdos"
| in various quantitative areas of research. I say "weirdos"
| because they're people driven by an obsession with a topic,
| but don't always fit the mold by having the ideal combination
| of background, credentials and personality to land them on a
| big tech company research team.
|
| The other day I was spending some time with a researcher from
| Deep Mind and I was surprised to find that while they were
| sharp and curious to an extent, nearly every ounce of energy
| they expended on research was _strategic_. They didn't write
| about research they were fascinated by, they wrote and
| researched on topics they strategically felt had the highest
| probability getting into a major conference in a short period
| of time to earn them a promotion. While I was a bit
| disappointed, I certainly didn't judge them because they are
| _just playing the game_. This person probably earns more than
| many rooms of smart, passionate people I've been in, and
| that money isn't for smarts alone; it's for appealing to the
| interests of people with the money.
|
| You can see this very clearly by comparing the work being
| done in the LLM space to that being done in the Image/Video
| diffusion model space. There's much more money in LLMs right
| now, and the field is flooded with papers on any random
| topic. If you dive in, most of them are not reproducible or
| make very questionable conclusions based on the data they
| present, but that's not of very much concern so long as the
| paper can be added to a CV.
|
| In the stable diffusion world it's mostly people driven by
| personal interest (usually very _non-commercial_ personal
| interests) and you see _tons_ of innovation in that field but
| almost no papers. In fact, if you really want to understand a
| lot of the most novel work coming out of the image generation
| world, you often need to dig into PRs made by anonymous
| users with anime-themed profile pics.
|
| The bummer of course is that there are very hard limits on
| what any researcher can do with a home GPU training setup. It
| does lead to creative solutions to problems, but I can't help
| but wonder what the world would look like if more of these
| people had even a fraction of the resources available
| exclusively to people playing the game.
| smokel wrote:
| _> I certainly didn 't judge them because they are just
| playing the game._
|
| Please _do_ judge them for being parasitical. They might
| seem successful by certain measures, like the amount of
| money they make, but I for one simply dislike it when
| people only think about themselves.
|
| As a society, we should be more cautious about narcissism
| and similar behaviors. Also, in the long run, this kind of
| behaviour makes them an annoying person at parties.
| what-the-grump wrote:
| But this is in itself selfish right?
|
| You dislike them because they don't benefit you
| indirectly by benefiting society at large.
|
| The incentive structure is wrong; incentivizing things
| that benefit society would be the solution, not judging
| those who exist in the current system while pretending
| altruism is somehow not part of the same game.
| smokel wrote:
| I agree that the system itself is dysfunctional, and I
| understand the argument that individuals are shaped or
| even constrained by it. However, in this case, we are
| talking about people who are both exceptionally
| intelligent and materially secure. I think it's
| reasonable to expect such individuals to feel some moral
| responsibility to use their abilities for broader good.
|
| As for whether that expectation is "selfish" on my part,
| I think that question has been debated for centuries in
| ethics, and I'm quite comfortable landing on the side
| that says not all disapproval is self-interest. In my own
| case, I'm not benefiting much either :)
| Eisenstein wrote:
| There is a difference between being selfish in the sense
| that you want others to contribute back to the society
| that we are all part of, and being selfish in the sense
| that you want to compete for exclusive rewards.
|
| You can call this difference whatever you want, don't
| pretend that they are morally or effectively equivalent.
| kakacik wrote:
| Selfish for the long term future and prosperity of
| mankind? That's some good selfishness all right.
| bradleyjg wrote:
| _but I for one simply dislike it when people only think
| about themselves_
|
| The key word there is only. Nothing in the post you are
| replying to suggested only. You have one vignette about one
| facet of this guy's life.
|
| I really dislike the resurgence in Puritanism.
| smokel wrote:
| Please don't read too much into this single word. The
| comment above mentioned _" nearly every ounce of energy
| they expended on research was strategic"_, and I was
| keeping that in mind while writing my remark.
|
| Please read my sibling comment where I expand a bit on
| what I meant to say.
| idiotsecant wrote:
| This take is simply wrong in a way that I would normally
| just sigh and move on, but it's such a privileged HN
| typical pov that I feel like I need to address it. If a
| plumber did plumbing specifically because someone needed
| it and he would be paid, would you call them a
| narcissist? If a gardener built a garden how their
| customer wanted would you call them a narcissist? Most of
| the world doesn't get to float around in a sea of VC
| money doing whatever feels good. They find a need,
| address it, and get to live another day. Productively
| addressing what other people need and making money from
| it isn't narcissism, it's productivity.
| lkey wrote:
| You are comparing a skilled trade that commands ~100k
| annual compensation to positions that have recently
| commanded _100 million_ dollars in compensation _upon
| signing_, with no immediate productivity required, as this
| talent denial is considered strategic.
|
| You consider the person who expects eventual ethical
| behavior from people that have 'won' capitalism (never
| have to labour again) to be privileged.
| kcexn wrote:
| This is such a nuanced problem. Like any creative
| endeavour, the most powerful and significant research is
| driven by an innate joy of learning, creating, and sharing
| ideas with others. How far the research can be taken is
| then shaped by resource constraints. The more money you
| throw at the researchers, the more results they can get.
| But there seems to be a diminishing returns kind of effect
| as individual contributors become less able to produce
| results independently. The research narrative also gets
| distorted by who has the most money and influence, and not
| always for the better (as recent events in Alzheimer's
| research have shown).
|
| The problem is once people's livelihoods depend on their
| research output rather than the research process, the whole
| research process becomes steadily distorted to optimise for
| being able to reliably produce outputs.
|
| Anyone who has invested a great deal of time and effort
| into solving a hard problem knows that the 'eureka' moment
| is not really something that you can force. So people end
| up spending less time working on problems that would
| contribute to 'breakthroughs' and more time working on
| problems that will publish.
| RataNova wrote:
| The tragedy is exactly what you said: all that energy,
| creativity, and deep domain obsession locked out of impact
| because it's not institutionally "strategic."
| xvector wrote:
| I have seen absolutely incredible, best in the world type
| engineers, much smarter than myself, get fired from my FAANG
| because of the performance games.
|
| I persist because I'm fantastic at politics while being good
| enough to do my job. Feels weird man.
| t_serpico wrote:
| But there is no way to know who is truly the 'best'. The
| people who position and market themselves to be viewed as the
| best are the only ones who even have a chance to be viewed as
| such. So if you're a great researcher but don't project
| yourself that way, no one will ever know you're a great
| researcher (except for the other great researchers who aren't
| really invested in communicating how great you are). The
| system seems to incentivize people to not only optimize for
| their output but also their image. This isn't a bad thing per
| se, but it is sort of antithetical to the whole shoulders-of-
| giants ethos of science.
| kcexn wrote:
| The problem is that the best research is not a competitive
| process but a collaborative one. Positioning research
| output as a race or a competition is already problematic.
| bwfan123 wrote:
| right. Also, the idea that there is a "best" researcher
| is already problematic. You could have 10 great people in
| a team, and it would be hard to rank them. Rating people
| in order of performance in a team is contradictory to the
| idea of building a great team. ie, you could have 10
| people all rated 10 which is really the goal when
| building a team.
| rightbyte wrote:
| This is an interesting theory. I think there is something to
| it. It is really hard to do good in a competitive
| environment. Very constrained.
| meindnoch wrote:
| Goodhart's law
| RataNova wrote:
| Anytime a system gets hyper-competitive and the stakes are
| high, it starts selecting for people who are good at playing
| the system rather than just excelling at the underlying skill
| bwfan123 wrote:
| I would categorize people into 2 broad extremes: 1) those
| that don't care two hoots about what others or the system
| expect of them, and in that sense are authentic, and 2) those
| that only care about what others or the system expect of
| them, and in that sense are not authentic. There is a
| spectrum in there.
| b00ty4breakfast wrote:
| that's what happens at the top of most competitive domains.
| Just take a look at pro sports; guys are looking for
| millimeters to shave off and they turn to "playing the game"
| rather than merely improving athletic performance. Watch a
| football game (either kind) and a not-small portion of the
| action is guys trying to draw penalties or exploit the rules
| to get an edge.
| contrarian1234 wrote:
| > Labs used to hire researchers and give them a lot of free
| rein.
|
| I can't think of it ever really paying off. Bell Labs is the
| best example: amazing research that was unrelated to the core
| business of the parent company. Microsoft Research is another
| great one. Lots of interesting research that... got MS some
| nerd points? But it has materialized into very, very few actual
| products and revenue streams. Advancing AI research doesn't
| help Meta build any moats or revenue streams. It just
| progresses our collective knowledge.
|
| On the "human progress" scale it's fantastic to put lots of
| smart people in a room and let them do their thing. But from a
| business perspective it seems to almost never pay off. Waiting
| on the irrational charity of business executives is probably
| not the best way to structure things.
|
| I'd tell them to go become academics.. but all the academics I
| know are just busy herding their students and attending
| meetings
| whiplash451 wrote:
| Indeed. And it feels like there is this untold in-between
| where if you belong to an unknown applied AI team, you don't
| have to deal with academia's yak shaving, you don't have to
| deal with Meta's politics and you end up single handedly
| inventing TRMs.
| Gigachad wrote:
| Perhaps these companies just end up with so much money that
| they can't possibly find ways to spend all of it rationally
| for purely product driven work and just end up funding
| projects with no clear business case.
| trenchpilgrim wrote:
| Or they hire researchers specifically so a competitor or
| upstart can't hire them and put them to work on something
| that disrupts their cash cow.
| godelski wrote:
| > I can't think of it ever really paying off
|
| Sure worked for Bell Labs
|
| Also it is what big tech was doing until LLMs hit the scene
|
| So I'm not sure what you mean by it never paying off. We were
| doing it right up until one of those things seemed to pay off,
| and then we hyper-focused on it. I actually think this is a
| terrible thing we frequently do in tech. We find promise in a
| piece of tech and hyper-focus on it. Specifically, we hyper-
| focus on how to monetize it, which ends up stunting the
| technology because it hasn't had time to mature and we're
| trying to monetize the alpha product instead of trying to get
| that thing to beta.
|
| > But from a business perspective it seems to almost never pay
| off.
|
| So this is actually what I'm trying to argue. It actually
| does pay off. It has paid off. Seriously, look again at
| Silicon Valley and how we got to where we are today. And look
| at how things changed in the last decade...
|
| Why is it that we like off-the-wall thinkers? That
| programmers used to be known as a bunch of nerds and weirdos?
| How many companies were started out of garages (Apple)? How
| many started as open source projects (Android)? Why did
| Google start giving work lifestyle perks and 20% time?
|
| So I don't know what you're talking about. It has frequently
| paid off. Does it always pay off? Of course not! It
| frequently fails! But that is pretty true for everything.
| Maybe the company stocks are doing great[0], but let's be
| honest, the products are not. Look at the last 20 years and
| compare them to the 20 years before that. The last 20 years
| have been much slower. Now maybe it is a coincidence, but the
| biggest innovation in the last 20 years has been in AI, and
| from 2012 to 2021 there were a lot of nice free-rein AI
| research jobs at these big tech companies where researchers
| got paid well, had a lot of autonomy in research, and had a
| lot of resources at their disposal. It really might be a
| coincidence, but a number of times things like this have
| happened in history and they tend to be fairly productive. So
| idk, you be the judge. Hard to conclude that this is
| definitely what creates success, but I find it hard to rule
| this out.
|
| > I'd tell them to go become academics.. but all the
| academics I know are just busy herding their students and
| attending meetings
|
| Same problem, different step of the ladder
|
| [0] https://news.ycombinator.com/item?id=45555175
| heavyset_go wrote:
| How many patents did that research result in that paid off in
| terms of use, licensing and royalties?
| gopher_space wrote:
| The problem here is management expecting researchers to dump
| out actionable insights like a chicken laying eggs.
| Researchers exist so that you can rifle through their notes
| and steal ideas.
| iisan7 wrote:
| It paid off for PARC, iirc the laser printer justified lots
| of other things that Xerox didn't profit from but turned out
| to be incredibly important.
| zer0zzz wrote:
| > I really think if they just took a step back and stop being
| so metric focused and let their people freely explore then
| they'd be win..
|
| This is very true, and more than just in AI.
|
| I think if they weren't so metric-focused they probably
| wouldn't have hit so much bad publicity and scandal either.
| bboygravity wrote:
| AI progress has slowed down?! By what metric?
|
| Quite the statement for anybody who follows developments
| (without excluding xAI).
| RataNova wrote:
| The money chase is real. You can kind of tell who's in it for
| the comp package vs. who'd be doing the same work on a laptop
| in their garage if that's what it took
| ipsum2 wrote:
| This has nothing to do with superintelligence, it's just the
| people that were working on the paper prior to the re-org
| happened to publish after the name change.
|
| Though it is notable that, contrary to many predictions (on HN
| and Twitter) that Meta would stop publishing papers and become
| like other AI labs (e.g. OpenAI), they've continued their rapid
| pace of releasing papers AND open source models.
| Zacharias030 wrote:
| Should be the top comment.
|
| MSL is not only those few high profile hires.
| ekianjo wrote:
| Open-weights models, not open source. And even their weights
| are under a specific license that is not as permissive as
| Apache 2.0.
| drexlspivey wrote:
| Does an "open source" model the way you describe it exist or
| is it a mythical creature?
| qcoret wrote:
| Unicorns also don't exist, but we don't change the
| definition to include horses.
| jakupovic wrote:
| Prove to me that unicorns don't exist, first level
| arguments only!
| aerhardt wrote:
| The first level argument is that old horse, burden of
| proof.
| omneity wrote:
| There aren't many but they do exist. OLMo for example.
| Rattled wrote:
| Olmo by AllenAI and Pythia by EleutherAI.
| ayewo wrote:
| An open source model does exist _now_ [1] and is
| multilingual. Previous discussion [2].
|
| [1] https://ethz.ch/en/news-and-events/eth-
| news/news/2025/07/a-l...
|
| [2] https://news.ycombinator.com/item?id=44535637
| CaptainOfCoit wrote:
| It does, but does it matter? Even if every piece of software
| released in 2025 were proprietary, that wouldn't make their
| published binaries "open source" just because no other
| software could be classified as "open source".
|
| We name things based on what they are, not based on the
| lack of other things.
| HPsquared wrote:
| This is the right terminology. Model weights are literally
| compiled binary data; they are the output of an algorithm run
| on a bunch of source data. That training dataset is the
| "source" of the model. Training data (or the scripts used to
| generate it) is human-readable and modifiable, like source
| code. Binary weights are not.
| carom wrote:
| Just to note though, source copyright extends to its
| compiled form. There is probably an analogue there for
| model weights.
| jeremyjh wrote:
| Tell me about the companies that own the copyrights to
| their training data.
| phkahler wrote:
| Binary weights can still be "edited" with additional
| training.
| sdeframond wrote:
| I propose that from now on we call freeware "open binaries".
| hippo22 wrote:
| I'm not a lawyer, but I believe that the weights aren't
| subject to copyright. So, you can use them outside of Meta's
| license agreement provided you get them from somewhere else.
| pityJuke wrote:
| What model(s) have Meta released since the Lab re-org?
|
| Also, that wasn't based purely on hearsay; Zuck explicitly
| said:
|
| > We believe the benefits of superintelligence should be shared
| with the world as broadly as possible. That said,
| superintelligence will raise novel safety concerns. We'll need
| to be rigorous about mitigating these risks and careful about
| what we choose to open source. Still, we believe that building
| a free society requires that we aim to empower people as much
| as possible. [0]
|
| [0]: https://www.meta.com/superintelligence/
| ipsum2 wrote:
| That has always been the policy. To answer your question,
| Meta has released ~100 models since the Superintelligence Lab
| reorg.
|
| https://huggingface.co/facebook/models
|
| The most interesting ones to me are:
|
| - CWM (Code world model), an LLM for coding
| https://github.com/facebookresearch/cwm
|
| - DINOv3, A vision encoder https://ai.meta.com/dinov3/
|
| - MAPAnything, a 3d reconstruction model
| https://huggingface.co/facebook/map-anything
|
| - VJEPA v2, Self-supervised video pre-training model
| https://github.com/facebookresearch/vjepa2
| gessha wrote:
| You still believe anything that comes out of his mouth?
| PatronBernard wrote:
| When did Zuck start caring about society?
| cwmoore wrote:
| Is this a trick question? Probably before he was even born.
| parpfish wrote:
| > We believe the benefits of superintelligence should be
| shared with the world as broadly as possible.
|
| i'd interpret that as meaning "everybody is welcome to be our
| customer, but we still control all of it"
| RataNova wrote:
| Still, I think the optics matter... the fact that Meta's still
| putting out technical work (and open sourcing it) after the
| restructure says a lot about where they want to position
| themselves
| bigcat12345678 wrote:
| https://docs.lamini.ai/memory_rag/ Similar approaches have been
| tried before already
| pppoe wrote:
| I find it absurd that large companies now have higher stock
| prices and more cash than ever before, yet nearly every AI lab
| in these companies is facing greater pressure than ever and
| being asked to generate short-term profits. In the midst of
| AI's unprecedented boom, the research environment and
| atmosphere in the industry seem to have worsened compared to
| the past.
| signatoremo wrote:
| Is Meta's lab being pressured to generate short-term profits?
|
| Which other under-pressure labs are you talking about?
| sefrost wrote:
| Is it because of the "winner takes all" and "lock-in effects"
| of being the first to market?
| foldl2022 wrote:
| So, show me the model weights, please.
| yalogin wrote:
| I am not surprised, because the culture at Meta is not at all,
| even in the slightest, to focus on science for the sake of it.
| It's actively purged out of you. The focus is on metrics and
| how the bottom line is impacted. So this is in line with that.
| rhetocj23 wrote:
| Yeah, and this problem is near impossible to fix once it has
| infested the culture of the firm.
| DangitBobby wrote:
| It's not always a bad thing, though; in this case they looked
| for a practical win and found one, because impractical wins
| can't make them money.
| georgeburdell wrote:
| It's not that simple. I worked at a supplier of Meta and they
| paid us large NREs to fund our exploratory work
| alex1138 wrote:
| "People are using our service more!" turns out to be a horrible
| metric when they outright lie to you (x has sent you a message!
| - when no message exists)
| CShorten wrote:
| Here is a video I made diving into the paper, hopefully helpful!
|
| https://www.youtube.com/watch?v=Ek0tZootK00
| htk wrote:
| I like your style, subscribed!
| CShorten wrote:
| Thank you so much!
| nmca wrote:
| This is not work by any of the high profile new hires, in case
| folks are confused.
| elyobo wrote:
| Can we have a more informative, less clickbaity, title?
| dang wrote:
| What would a more informative, less clickbaity title be?
|
| (preferably using representative language from the article)
| airstrike wrote:
| Meta Superintelligence Labs' first paper is about RAG
| dang wrote:
| Ok thanks! Belatedly updated.
| smeeger wrote:
| there should be a guideline to get rid of clickbait titles. it's
| an epidemic here
| dang wrote:
| There is of course such a guideline:
| https://news.ycombinator.com/newsguidelines.html
|
| We don't catch every case, but if you're talking about the
| frontpage, I'm surprised to hear you say "epidemic". What are
| some recent examples?
| puttycat wrote:
| Seems very incremental and very far from the pompous
| 'superintelligence' goal.
| antonvs wrote:
| It's unlikely that the existing LLM architecture will evolve
| into anything that resembles superintelligence any more than it
| does already.
|
| Which means that modifications to the architecture, and
| combining it with other components and approaches, are the next
| likely step. This paper fits that.
| naasking wrote:
| A 30 fold improvement seems a tad more than incremental.
| vasco wrote:
| I can start brushing my teeth 30 times faster but it won't
| change my life. This is nice for RAG but it's a very
| localized improvement. And 30x sounds big but is just an
| order of magnitude improvement also.
| naasking wrote:
| Brushing your teeth is not central to your life, recalling
| facts correctly is, and a 30 fold improvement in the latter
| very well could change your life. I'll leave it to you to
| figure out which is a better analogy to RAG.
| vasco wrote:
| Just remember that in this example you don't remember 30x
| more things, you just remember the same things 30x
| faster. That is a significant difference.
| btilly wrote:
| If you can collapse "retrieve this complex chunk when it is
| needed" into a single token, what else can you put into a
| token?
|
| "Send this through the math coprocessor." "Validate against the
| checklist." "Call out to an agent for X." "Recheck against
| input stream Y." And so on.
|
| Retrieval augmentation is only one of many uses for this. If
| this winds up with better integration with agents, it is very
| possible that the whole is more than the sum of its parts.
| lukev wrote:
| Think about it this way; they are encoding whole "thoughts" or
| "ideas" as single tokens.
|
| It's effectively a multimodal model, which handles "concept"
| tokens alongside "language" tokens and "image" tokens.
|
| A really big conceptual step, actually, IMO.
| koolala wrote:
| Did a "superintelligence" lab publish a superintelligence related
| paper with no results for intelligence? What measured
| improvements did this proposal make in their LLM's intelligence?
| pbd wrote:
| https://github.com/simulanics/REFRAG
| singularity2001 wrote:
| somewhere in my hacker news comment history I presented this very
| idea
| asim wrote:
| This was inevitable. You can't keep training LLMs and expect
| that's the answer to the evolution of AI. Yes it'll happen and
| we'll keep creating new more refined and bigger models but it's
| like DNA or something like the cortex of the brain. After that
| you need these systems that essentially "live" for years
| digesting information and develop a more refined way to process,
| store and retrieve the information. Compression of RAG was also
| inevitable. It's like the btree index of a database. The thing
| is, we're probably one or two iterations away from being good
| enough on the RAG pipeline and then we'll need to focus more on
| the other pieces of sensory input that need to be connected and
| processed at higher throughput. Right now it's not fast or
| efficient enough. This is where the likes of Google will shine.
| They are probably two decades ahead of everyone on internal
| technology and there is some team with the breakthrough but it
| hasn't seen the light of day yet. What's coming out of DeepMind
| is really a forced effort in productization and publication of
| work in a consumable format but internally they are likely way
| ahead. I don't have as much faith in Meta's efforts despite
| seeing things like this. Quite frankly, those people, the ones
| doing the work, should move to more honourable companies. Not
| feed crack addiction in the form of Meta's universe.
| smeeger wrote:
| exactly. the real focus internally is working on new
| architectures. there is no other possibility.
| zem wrote:
| this was really weird to read:
|
| > But RAG is a very real world, practical topic for something as
| significant as a new lab's first paper.
|
| I would expect exactly the opposite - that a new lab would put
| out a few random papers that happen to be in areas their
| researchers were interested in and already working on, and once
| people had been working together a while and developed some
| synergy they would maybe come out with something really
| groundbreaking.
|
| do people really view a "first paper" as something deeply
| significant and weighty? because that just seems like a good way
| to get bogged down in trying to second guess whether any given
| paper was good enough to be your all-important debut!
| Al-Khwarizmi wrote:
| As an academic I would expect the same as you, and no, to my
| knowledge "first paper" is meaningless, at least in academia.
| Most people's first paper is some small contribution to what
| their PhD supervisor is doing at the time, where the student
| tries their best at writing but it ends up so heavily edited
| that probably 90% of the final text comes from the supervisor
| :) So typically first papers don't define or represent a
| researcher. When you start you just don't have the experience
| to have a great idea and carry it through to a good paper.
|
| Of course here we are talking about a lab, not an individual
| person, but still I haven't heard of first papers being
| considered special in any way, even for labs.
| schmorptron wrote:
| One thing I don't get about the ever-recurring RAG discussions
| and hype men proclaiming "RAG is dead" is that people seem to be
| talking about wholly different things. My mental model is that
| what is called RAG can be either:
|
| - a predefined document store / document-chunk store where every
| chunk gets a vector embedding, and a lookup decides what gets
| pulled into context, so as not to have to pull whole classes of
| documents and fill it up
|
| - the web-search-like features in LLM chat interfaces, where they
| do keyword search and pull relevant documents into context, but
| somehow only ephemerally, with the full documents not taking up
| context in the future of the thread (unsure about this, did I
| understand it right?)
|
| With the new models with million-plus-token context windows,
| some were arguing that we can just throw whole books into the
| context non-ephemerally, but doesn't that significantly reduce
| the diversity of possible sources we can include at once if we
| hard-commit to everything staying in context forever? I guess it
| might help with consistency? But is the mechanism with which we
| decide what to keep in context not still some kind of RAG, just
| with larger chunks of whole documents instead of only parts?
|
| I'd be ecstatic if someone who really knows their stuff could
| clear this up for me.
| kgeist wrote:
| Technically, RAG is anything that augments generation with
| external search. However, it often has a narrower meaning:
| "uses a vector DB."
|
| Throwing everything into one large context window is often
| impractical - it takes much more time to process, and many
| models struggle to find information accurately if too much is
| going on in the context window ("lost in the middle").
|
| The "classic" RAG still has its place when you want low latency
| (or you're limited by VRAM) and the results are already good
| enough.
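|
| Roughly, that "classic" vector-store setup can be sketched in a
| few lines (a minimal sketch only; embed() is a stand-in for
| whatever embedding model you use, and none of this is from the
| REFRAG paper):
|
|     import numpy as np
|
|     def embed(texts):  # placeholder for a real embedding model
|         rng = np.random.default_rng(0)
|         return rng.normal(size=(len(texts), 384))
|
|     def top_k_chunks(query, chunks, chunk_vecs, k=3):
|         q = embed([query])[0]
|         # cosine similarity between query and each chunk vector
|         norms = np.linalg.norm(chunk_vecs, axis=1)
|         sims = chunk_vecs @ q / (norms * np.linalg.norm(q) + 1e-9)
|         return [chunks[i] for i in np.argsort(-sims)[:k]]
|
|     chunks = ["chunk one ...", "chunk two ...", "chunk three ..."]
|     chunk_vecs = embed(chunks)  # computed once, reused per query
|     question = "what is REFRAG?"
|     context = "\n\n".join(
|         top_k_chunks(question, chunks, chunk_vecs))
|     prompt = (f"Context:\n{context}\n\n"
|               f"Question: {question}\nAnswer:")
|
| Everything past retrieval is just prompt assembly; a vector DB
| only changes how top_k_chunks is implemented at scale.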
| make3 wrote:
| No one is saying RAG is dead; you're never going to put the
| whole Internet in the context of the model, and the more you
| put in, the more expensive it is.
| viraptor wrote:
| Lots of people say rag is dead: https://kagi.com/search?q=rag
| +is+dead&r=au&sh=g52XEb93vx691I...
| GistNoesis wrote:
| The answer is adaptability.
|
| In both cases for "Question Answering" it's about similarity
| search but there are two main orthogonal differences between
| RAG and Non-RAG :
|
| -Knowing the question at the time of index building
|
| -Higher order features : the ability to compare fetched
| documents with one another and refine the question
|
| Non-RAG, aka multi-layer (non-causal) transformer with infinite
| context, is the more generic version, fully differentiable
| meaning you can use machine learning to learn how to Non-RAG
| better. Each layer of the transformer can use the previous
| layer to reason and refine the similarity search. (A causal
| transformer knows the question at the time it is fed the
| question, and can choose to focus its attention on different
| parts of the previously computed features of the provided
| documents, but may benefit from having some reflection tokens,
| or better: being given the question before being presented the
| documents (provided you've trained it to answer like that).)
|
| RAG is an approximation of the generic case to make it faster
| and cheaper. Usually it breaks end-to-end differentiability by
| using external tools, so this means that if you want to use
| machine learning to learn how to RAG better, you will need to
| use some variant of Reinforcement Learning, which is slower to
| learn things. RAG usually doesn't know the question at the time
| of index building, and documents are treated independently of
| each other, so no (automatic) higher-order features (embeddings
| are fixed).
|
| A third usual approximation is to feed the output of RAG into
| Non-RAG, to hopefully get the best of both worlds. You can learn
| the Non-RAG given RAG with machine learning (if you train it
| with some conversations where it used RAG), but the RAG part
| won't improve by itself.
|
| Non-RAG needs to learn, so it needs a big training dataset, but
| fortunately it can pick up question-answer pairs in an
| unsupervised fashion when you feed it the whole web, and you
| only need a small instruction-tuning and preference-
| optimization dataset to shape it to your needs. If performance
| isn't what you expect in a specific case, you can provide more
| specific examples and retrain the model until it gets it and
| you get better performance for the case you were interested in.
| You can improve the best case but it's hard to improve the
| worst case.
|
| RAG gives you more control over what you feed it, but the
| content needs to be structured. You can prevent the worst cases
| more easily, but it's hard to improve the good case.
| impossiblefork wrote:
| We can't throw infinite things into the context, though.
|
| My impression is that GPT-5 gets confused, not quite right
| away, but after a couple of pages it has no idea. It doesn't
| take pages upon pages before it forgets things.
| aerhardt wrote:
| I'm currently experimenting with prompts of ~300k tokens for
| a certain classification task and I think I _might_ be able
| to make it work. GPT5 chokes but Gemini 2.5 Pro is showing
| promise. Jury's still out and I might change my tune in a
| couple of weeks.
| impossiblefork wrote:
| It should also be said, that what I say here is focused on
| things where these models have problems.
|
| For example, I consider the model confused when it starts
| outputting stereotyped or cliche responses, and I
| intentionally go at problems that I know that the models
| have problems with (I already know they can program and do
| some maths, but I want to see what they can't do). But if
| you're using them for things they're made for, and which
| aren't confusing, such as people arguing with each other,
| then you are likely to succeed.
|
| Prompts with lots of examples are reasonable and I know
| they can get very long.
| armcat wrote:
| I couldn't immediately see in their graphs/tables any comparison
| against simple lexical/statistical context compression, such as
| candidate selection of chunks using TF-IDF, word overlap, etc.
| Most of us in the industry need to find these quick wins that
| give us performance equivalent to sending huge amounts of
| information to the LLM, while compressing by 10x.
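|
| For reference, that kind of lexical baseline is only a few lines
| with scikit-learn (a sketch of the idea, not something the paper
| benchmarks):
|
|     from sklearn.feature_extraction.text import TfidfVectorizer
|     from sklearn.metrics.pairwise import cosine_similarity
|
|     def compress_context(query, chunks, keep=5):
|         # score chunks by TF-IDF cosine similarity to the query
|         # and keep only the best few, preserving document order
|         vec = TfidfVectorizer()
|         X = vec.fit_transform(chunks + [query])
|         sims = cosine_similarity(X[-1], X[:-1]).ravel()
|         top = sorted(sims.argsort()[::-1][:keep])
|         return [chunks[i] for i in top]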
| macleginn wrote:
| So this looks essentially like continuous prompting (see prefix
| tuning) with RL-driven selection of what to present as tokens and
| what as continuous inputs (embeddings).
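|
| In PyTorch terms the mixed input looks roughly like this (a
| sketch only: shapes and module names are illustrative, and the
| expand/don't-expand decision is stubbed as a boolean mask where
| the paper trains a policy with RL):
|
|     import torch
|     import torch.nn as nn
|
|     d_model, d_chunk = 512, 768
|     project = nn.Linear(d_chunk, d_model)     # chunk -> LLM space
|     token_emb = nn.Embedding(32000, d_model)  # the LLM's own table
|
|     def build_inputs(question_ids, chunk_vecs, chunk_token_ids,
|                      expand_mask):
|         # expand_mask[i] == True: feed chunk i as its full tokens;
|         # otherwise feed one projected chunk embedding instead.
|         parts = []
|         for i, expand in enumerate(expand_mask):
|             if expand:
|                 parts.append(token_emb(chunk_token_ids[i]))
|             else:
|                 parts.append(project(chunk_vecs[i]).unsqueeze(0))
|         parts.append(token_emb(question_ids))
|         return torch.cat(parts, dim=0)  # one soft-input sequence
|
| The LLM then runs on this sequence of input embeddings instead
| of on token ids, which is what makes it "continuous prompting".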
| i5heu wrote:
| Can we please get rid of the clickbait titles?
| RataNova wrote:
| Refreshing (and slightly unexpected) to see Meta
| Superintelligence start with something this practical instead of
| a headline-grabbing new model
| mark_l_watson wrote:
| A great idea, bypassing as much conversion as possible between
| vector space and natural language tokens. Reminds me of a
| discussion of having AI's "talk" to each other using vector
| space.
|
| There was an interesting quote "plain old BM25 from 1994
| outperforms vector search on recall" and super relevant to what I
| did yesterday. I am trying to use small local models more often
| and yesterday I wrote Common Lisp code that uses a large corpus
| of text and a user query or prompt to construct a fairly concise
| one-shot prompt with select context from the text corpus. This is
| RAG, and I used both BM25 and vector embeddings matching. I added
| the code and an example as a new chapter in my CL book (link
| directly to new material:
| https://leanpub.com/lovinglisp/read#leanpub-auto-autocontext...)
| yesterday afternoon. BM25 is fast. This is new code, and I will
| certainly be experimenting more with it, but as-is it is useful
| when working with small local LLMs.
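|
| For anyone curious, the BM25 scoring itself is tiny. A rough
| Python sketch of plain Okapi BM25 (not the CL code, just the
| textbook formula, with whitespace splitting standing in for a
| real tokenizer):
|
|     import math
|     from collections import Counter
|
|     def bm25_scores(query, docs, k1=1.5, b=0.75):
|         tok = lambda s: s.lower().split()
|         toks = [tok(d) for d in docs]
|         avgdl = sum(len(t) for t in toks) / len(toks)
|         N = len(docs)
|         # document frequency of each term
|         df = Counter(w for t in toks for w in set(t))
|         out = []
|         for t in toks:
|             tf = Counter(t)
|             score = 0.0
|             for w in tok(query):
|                 if w not in tf:
|                     continue
|                 idf = math.log(
|                     (N - df[w] + 0.5) / (df[w] + 0.5) + 1)
|                 denom = tf[w] + k1 * (1 - b + b * len(t) / avgdl)
|                 score += idf * tf[w] * (k1 + 1) / denom
|             out.append(score)
|         return out  # rank docs by descending score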
| Palmik wrote:
| The observation about the "block-diagonal patterns" in RAG isn't
| new and has been exploited / explored before:
|
| - https://arxiv.org/abs/2410.07590 (literally titled "Block-
| Attention for Efficient RAG")
|
| - https://arxiv.org/abs/2409.15355v3
|
| - https://arxiv.org/abs/2212.10947
|
| The REFRAG paper does not cite any of these.
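|
| For anyone unfamiliar with the term: in RAG the retrieved chunks
| usually don't need to attend to one another, so the useful part
| of the attention mask is block-diagonal, one block per chunk. A
| toy sketch of such a mask (illustration only, not code from any
| of these papers):
|
|     import torch
|
|     def block_diagonal_mask(chunk_lens, query_len):
|         # True = position may attend; chunks only see themselves,
|         # while the query tokens at the end see everything.
|         n = sum(chunk_lens) + query_len
|         mask = torch.zeros(n, n, dtype=torch.bool)
|         start = 0
|         for length in chunk_lens:
|             mask[start:start + length, start:start + length] = True
|             start += length
|         mask[start:, :] = True
|         return mask
|
|     print(block_diagonal_mask([3, 2], 2).int())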
| SknCode wrote:
| I am not sure if I understand things correctly.
|
| I came to believe that LLMs work with token embeddings. Is
| REFRAG then only "something" in front of the LLM, with the
| decoder being the RL policy that expands only some chunk
| embeddings into token embeddings feedable to the LLM? Or does
| REFRAG need you to 'tune' the LLM to be able to work with both
| token embeddings and chunk embeddings?
___________________________________________________________________
(page generated 2025-10-12 23:00 UTC)