[HN Gopher] Unsupervised Elicitation of Language Models
___________________________________________________________________
Unsupervised Elicitation of Language Models
Author : kordlessagain
Score : 106 points
Date : 2025-06-14 12:32 UTC (10 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| unchocked wrote:
| Philosophically, this looks like breaking the training data limit
| in the same way that humans do: by using an internally consistent
| view of the world to imagine new scenarios and integrate them
| into an updated worldview.
| robinduckett wrote:
| Exciting news, who watches the watchmen?
| Herring wrote:
| > _our goal is to fine-tune a pretrained model on its own
| generated labels_
|
| Haven't all the big labs been doing this for a couple years now?
| It's a good idea, with great execution, but it's far from novel.
|
| https://en.wikipedia.org/wiki/Weak_supervision
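|
| For reference, the classic self-training / pseudo-labeling loop
| that link describes looks roughly like this (a toy scikit-learn
| sketch on made-up data, not the paper's algorithm -- the paper
| searches for internally consistent label sets rather than just
| thresholding confidence):
|
|   from sklearn.datasets import make_classification
|   from sklearn.linear_model import LogisticRegression
|   from sklearn.semi_supervised import SelfTrainingClassifier
|
|   # 1000 points, but pretend only the first 50 labels exist
|   X, y = make_classification(n_samples=1000, random_state=0)
|   y_train = y.copy()
|   y_train[50:] = -1  # -1 marks a sample as unlabeled
|
|   # iteratively: fit, pseudo-label confident points, refit
|   clf = SelfTrainingClassifier(LogisticRegression(), threshold=0.9)
|   clf.fit(X, y_train)
|   print("accuracy vs. true labels:", clf.score(X, y))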
| platelminto wrote:
| I think this removes the need for any human-labeled data: no
| RLHF or anything like that. You can use their technique to
| create an unsupervised reward model, and use that model to RL
| your way to having a useful assistant LLM.
|
| The paper is very accessible (it's mostly written by Anthropic
| researchers), and Section 4 summarises their findings really
| well. They were themselves really surprised by the results:
|
| > We were initially very skeptical of these findings, because
| they seemed clearly too good to be true, and suspiciously close
| to training with actual labels. To ensure we didn't
| accidentally train on the labels, (1) we re-ran the experiment
| several times on different datasets, (2) we copied the dataset
| into a new file, excluding any labels before re-running our
| algorithm with that file, and (3) _one coauthor independently
| replicated the findings on the Claude 3.5 Haiku base model
| using a different codebase_.
|
| (emphasis mine)
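|
| To make the pipeline shape concrete, here's a toy sketch of the
| last two stages (every number and feature here is made up; in
| reality the features would come from an LM, the reward model
| would be a fine-tuned LM, and you'd run PPO-style RL instead of
| best-of-n):
|
|   import numpy as np
|   from sklearn.linear_model import LogisticRegression
|
|   rng = np.random.default_rng(0)
|
|   # Stand-in "features" of candidate responses, plus quality
|   # labels that are self-generated (in the paper these come from
|   # the unsupervised elicitation step, not from humans).
|   features = rng.normal(size=(200, 2))
|   self_labels = (features @ [1.0, 0.5] > 0).astype(int)
|
|   # "Reward model": any classifier whose P(good) we treat as a
|   # scalar reward.
|   rm = LogisticRegression().fit(features, self_labels)
|
|   # Cheapest possible stand-in for RL: best-of-n sampling
|   # against the learned reward model.
|   candidates = rng.normal(size=(8, 2))          # 8 sampled responses
|   rewards = rm.predict_proba(candidates)[:, 1]  # P(good) per response
|   print("picked candidate", int(np.argmax(rewards)))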
| abeppu wrote:
| > However, as tasks and model behaviors grow more complex, human
| supervision becomes increasingly unreliable: LMs can learn to
| mimic mistakes in demonstrations or exploit flaws in feedback.
| How do we train LMs to do tasks that are too difficult for humans
| to demonstrate or evaluate reliably?
|
| I didn't read the whole paper but it seems important that you
| still need real ground truth to measure improvement, so you still
| need to get real labels at some point. The task they focus on
| where LLMs have "superhuman" performance is guessing the gender
| of blog authors. While humans are bad at this, humans are decent
| at remembering their own gender, and a bunch of them are willing to
| write a blog post, so there's obviously a better way to get
| supervised examples than asking humans to guess labels: you
| collect posts from authors whose gender is known. I.e., "human
| generated labels are low quality" should not be taken to mean
| "good labels are not available so we should go fully
| unsupervised".
|
| So since you already need some real ground truth to know whether
| your algorithm accomplished anything, I think it's fair to ask:
| when would you commit to using _all_ your labeled data for
| evaluation and none for fine-tuning, as described in this work?
| Logical consistency seems valuable, sure, but it seems like
| really you'd want to use both consistency and some (small?)
| number of labeled examples, and a perhaps larger number of self-
| labeled examples. In their loop where they revise labels to be
| more coherent, it seems natural to imagine that pre-provided
| labels should be stickier than self-generated ones, but not
| immutable, because there's always some chance of noise in your
| upstream data generation process.
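|
| A tiny sketch of what "sticky but not immutable" could look like
| in such a revision loop (the scoring here is hypothetical toy
| arithmetic; the paper's actual search scores label assignments
| with the LM's own likelihoods):
|
|   # per item: (model's P(True), provided gold label or None)
|   items = [(0.9, None), (0.3, 1), (0.6, None), (0.8, 0)]
|   contradicts = {(0, 3)}  # index pairs that can't both be True
|   STICKY = 2.0            # extra cost to flip a provided label
|
|   def score(a):
|       s = sum(p if x else 1 - p for (p, _), x in zip(items, a))
|       for i, j in contradicts:  # consistency penalty
|           if a[i] and a[j]:
|               s -= 10
|       for (_, g), x in zip(items, a):  # stickiness penalty
|           if g is not None and x != g:
|               s -= STICKY
|       return s
|
|   # greedy label revision: flip any single label that helps
|   a = [g if g is not None else int(p >= 0.5) for p, g in items]
|   improved = True
|   while improved:
|       improved = False
|       for i in range(len(a)):
|           b = a.copy()
|           b[i] ^= 1
|           if score(b) > score(a):
|               a, improved = b, True
|   print(a)  # provided labels survive unless evidence overwhelms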
| md224 wrote:
| I was intrigued that one of the researchers was listed as
| "independent", so I checked her out:
|
| https://lindapetrini.com
|
| It looks like she's a science communicator rather than a
| scientist herself. That's interesting... I'm not used to seeing
| academic papers that include an author devoted entirely to the
| writing aspect. (Then again, maybe I just haven't noticed?)
| joaogui1 wrote:
| The fact that she's a science communicator doesn't imply that
| she only did the communication part, I think
| majormajor wrote:
| I skimmed mostly, but was trying to understand how they came up
| with "superhuman" as a description, and it seems like a stretch?
|
| This might seem like a nit but the term "superhuman" is a VERY
| strong one to my mind. It doesn't suggest "better than the
| average human off the street at a particular random task" but
| instead suggests "better than humans are capable of getting with
| training, at a high percentile level".
|
| One of the biggest advantages of LLMs as a tool is that they are
| generally quite good across a broad variety of tasks without
| needing a ton of further domain-specific training. Humans tend to
| be the opposite.
|
| It doesn't seem like they gave much training to the human
| annotators they recruited. Whereas an LLM trained on the internet
| has been trained on a LOT of blog posts + associated metadata.
| And nobody has ever really bothered figuring out "how would we
| best train humans to identify gender of blog post authors" -
| there's very little economic incentive for it. It's not like we
| generally train people to write in gender-specific ways in school
| either, so we haven't been formally instructed on potential
| differences. We'd have to rely on broad-brush generalizations if
| not given an opportunity to deep dive to try to find more
| specific tendencies.
|
| But if you paid people to study a large chunk of the corpus
| they're using here for a couple of years, consciously focusing
| on post style, content, and gender alike, and then tested them
| on posts from held-out authors... how well could they do?
| jaggirs wrote:
| "Superhuman" refers to abilities, qualities, or powers that
| exceed those naturally found in humans. It implies being
| greater than normal human capabilities.
|
| The term is often used in fiction, particularly in superhero
| comics and fantasy, but it can also be used metaphorically to
| describe extraordinary effort or achievement in real life
| (e.g., "It took a superhuman effort to finish the marathon").
|
| (Definition from Gemini)
|
| It seems reasonable to me to use the term simply to say that
| the model's abilities on a benchmark exceeded those of the
| human annotators. Computers have always been superhuman at many
| tasks, even before LLMs.
| majormajor wrote:
| > "Superhuman" refers to abilities, qualities, or powers that
| exceed those naturally found in humans. It implies being
| greater than normal human capabilities.
|
| How do you know what normal human capabilities are for an
| unusual task that humans have not trained for? Is identifying
| the gender of the author of a blog post 80% of the time
| "extraordinary"? How do I know what a human is capable of
| doing for that with training?
|
| If a person with no programming experience asked Claude or
| ChatGPT to produce some code, they'd get better code than
| their "normal" human capability could produce. So: superhuman
| coders?
|
| But also today, I have asked Claude and ChatGPT to do coding
| tasks for me that both models got stuck on. Then I fixed them
| myself because I've had a lot of training and practice. So:
| not superhuman? But wait, the model output the broken code
| faster than I would've. So: superhuman again?
|
| Extraordinary shouldn't be so easily disputable.
|
| LLMs have superhuman _breadth_ and superhuman _speed_. I
| haven't seen superhuman _depth_ in any capability yet. I've
| seen them show "better than untrained median person" and often
| "better than hobbyist" depth. But here the authors claim
| "superhuman capabilities", which pretty specifically means
| something beyond breadth or speed.
| majormajor wrote:
| On a separate note, using an LLM for a definition is a bit
| funny, when there are expert-curated sources easily
| available. The LLM didn't get it _wrong_ here, but...
|
| https://en.wikipedia.org/wiki/Superhuman
|
| First line: "The term superhuman refers to humans, humanoids
| or other beings with abilities and other qualities that
| exceed those naturally found in humans."
|
| Golly, I wonder what that model based its first sentence on.
| brumar wrote:
| So LLMs are having their AlphaGo Zero moment, where training on
| human data is passé? Sounds exciting? Terrifying?
| clbrmbr wrote:
| Marks' paper with Max Tegmark, "The Geometry of Truth", is a
| great read, and I can see its ideas repeated here. I've been
| meaning to repro some of the geotruth paper...
___________________________________________________________________
(page generated 2025-06-14 23:00 UTC)