[HN Gopher] Prover-Verifier Games improve legibility of language...
___________________________________________________________________
Prover-Verifier Games improve legibility of language model outputs
Author : davidbarker
Score : 68 points
Date : 2024-07-17 17:15 UTC (5 hours ago)
(HTM) web link (openai.com)
(TXT) w3m dump (openai.com)
| rjeli wrote:
| Funny that when I reached the "Key Findings" section, my brain
| immediately parsed it as ChatGPT output. Maybe it's the bullet
| points, the word choice, or just the font...
| Der_Einzige wrote:
| There appears to be a coherent effort among the general
| populace, conscious or unconscious, to shape discourse going
| forward to look more ChatGPT-like in general. Words like
| "delve" and "crucial" have become more common even among real
| people in face-to-face communication, and in record time.
|
| Much as I find it overly formal, I support it on the grounds
| that it frustrates attempts to "detect" whether an LLM was
| used, and that is very good.
| molave wrote:
| I can tell that technical papers influenced ChatGPT's outputs
| the most. Most of the articles generated with it may be
| regurgitated, but I can't deny how easily digestible the info
| is when presented that way.
| c3534l wrote:
| It seems like a lot of people these days are doing generative
| adversarial AI and then pretending they invented a new thing.
| HanClinto wrote:
| GANs have long been used to improve the training of image
| models -- it seems like we're finally starting to see this
| approach catch on for LLMs.
|
| I'm aware of the SPAG paper -- who else have you seen take this
| approach with LLMs lately?
|
| https://github.com/Linear95/SPAG
| HanClinto wrote:
| Beautiful!
|
| OpenAI isn't just training a model to produce correct answers
| that are easier to verify -- it's leveraging an adversarial
| relationship to train both a model that's better at being
| correct and a model that's better at deceiving / being wrong.
| This is the key. There are three agents here:
|
| * A "verifier" (a small model, whose job it is to discern correct
| answers from incorrect answers)
|
| * A "helpful prover" (blue team, whose job it is to produce
| correct answers with an easy-to-follow explanation)
|
| * A "sneaky prover" (red team, whose job it is to produce
| incorrect answers with a deceptive explanation)
|
| By arranging these three models in an adversarial relationship
| with a _true reinforcement learning feedback loop_, the entire
| system grows and gets better.
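|
| Roughly, the structure looks something like this (a minimal,
| hypothetical sketch in Python -- the function names and reward
| shaping are my own toy stand-ins, not OpenAI's actual setup):
|
|     # Toy stand-ins for the three policies being trained; in
|     # practice these would be LLM calls, not fixed functions.
|     def helpful_prover(problem):
|         return 4, "2 + 2 = 4, because ..."
|     def sneaky_prover(problem):
|         return 5, "2 + 2 = 5, because ..."
|     def verifier(problem, explanation):
|         return 0.5  # estimated probability the answer is correct
|
|     def play_round(problem, ground_truth):
|         # Blue team: aims to be correct AND convince the verifier.
|         blue_ans, blue_expl = helpful_prover(problem)
|         # Red team: aims to be wrong yet still convince it.
|         red_ans, red_expl = sneaky_prover(problem)
|
|         blue_score = verifier(problem, blue_expl)
|         red_score = verifier(problem, red_expl)
|
|         # Simplified rewards: each prover is paid for convincing
|         # the verifier while meeting its role's goal; the verifier
|         # is paid for accepting correct answers and rejecting
|         # incorrect ones.
|         blue_reward = blue_score if blue_ans == ground_truth else -1.0
|         red_reward = red_score if red_ans != ground_truth else -1.0
|         verifier_reward = (
|             (blue_score if blue_ans == ground_truth else 1 - blue_score)
|             + ((1 - red_score) if red_ans != ground_truth else red_score)
|         )
|         return blue_reward, red_reward, verifier_reward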
|
| This is fantastic to read, and corroborates the results achieved
| by SPAG -- easily one of my favorite papers from the past year.
| SPAG pioneered (as far as I'm aware) the approach of using
| adversarial language games in a true reinforcement-learning setup
| (not merely RLHF, which isn't true RL), and showed that training
| models in adversarial language games can yield generalized
| improvements even in areas not directly related to the game. [1]
|
| Ever since the SPAG paper came out, I've been daydreaming about
| the different sorts of adversarial games that one could use to
| train LLMs. I've written down a bunch of notes on the subject [2]
| (in case anyone else wants to read my rambling notes).
|
| I would really like to see some of these experiments actually get
| up and running on open-source LLMs -- I'm excited to see if / how
| they could be used to improve the quality of some of the open-
| source base models that are floating around out there.
|
| [1] https://github.com/Linear95/SPAG
|
| [2] https://github.com/HanClinto/MENTAT
| HanClinto wrote:
| Because the Red and Blue agents are both trying to convince a
| smaller language model that their answer is right, they each
| have to simplify their logic and wording.
|
| This feels like the ML equivalent of the old adage "If you
| can't explain it to a six year old, you don't understand it
| yourself."
| enthulhusiastic wrote:
| ELI6 why SPAG is better than just the default pretraining
| method of an LLM (token context statistics?).
| TimPC wrote:
| The red and blue agents are effectively unlimited sources of
| true and false examples, so you can scale far more efficiently
| than you can by pre-training with labelled inputs. It's also
| far more targeted on correct/incorrect, rather than a notion
| of answer quality, which doesn't directly get at hallucination
| vs. reality.
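|
| As a rough sketch (reusing the hypothetical stand-ins from the
| comment above -- not anyone's real training code), every round
| of self-play mints fresh labeled examples for the verifier
| without a human annotator in the loop:
|
|     def generate_verifier_batch(problems, ground_truths):
|         batch = []
|         for problem, truth in zip(problems, ground_truths):
|             blue_ans, blue_expl = helpful_prover(problem)
|             red_ans, red_expl = sneaky_prover(problem)
|             # Labels come from checking against known answers, so
|             # the supply of (explanation, correct?) pairs is limited
|             # only by how many rounds you can afford to run.
|             batch.append((problem, blue_expl, blue_ans == truth))
|             batch.append((problem, red_expl, red_ans == truth))
|         return batch  # (problem, explanation, is_correct) triples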
| jgalt212 wrote:
| Isn't this exactly how AlphaGo learns and works so well? It
| always knows the right answer because it knows the rules of the
| game and can easily compute its W-L record.
|
| In life, it's hard and very expensive to codify the rules and
| compute a W-L record.
| HanClinto wrote:
| Yes, exactly.
|
| Using traditional RL is easiest when you're operating in a
| domain with clearly defined rules -- like Go, or Starcraft, or
| whatever. The trouble is that those games don't translate well
| to other domains -- a model can learn about risk and reward
| and whatnot from Chess, but it can't become a better chatbot.
|
| But if the game itself operates in the realm of language and
| semantics, then the hope is that we can tap into that same
| adversarial growth curve, but for LLMs.
|
| As you note, this only works for situations where we can
| clearly say "winner" or "loser". In OpenAI's case, they use
| the correctness of the answer to the math problem as one W/L
| metric (discrete and measurable), as well as whether the
| Verifier was able to correctly identify the answer as correct
| (thus the understandability of the answer is also discrete
| and measurable).
|
| In the SPAG paper, they chose the game of "Taboo" as a way to
| discretely measure W/L (asking: "did the defender say the
| secret word or not").
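|
| (As a minimal, hypothetical illustration of how discrete that
| check is -- not the SPAG authors' actual code -- the whole
| win/loss signal can reduce to a string check:)
|
|     def attacker_wins(secret_word, defender_utterances):
|         # Attacker wins if it induced the defender to say the
|         # secret word anywhere in its utterances.
|         return any(secret_word.lower() in u.lower()
|                    for u in defender_utterances)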
|
| As you noted, it's hard and expensive to codify the rules of
| life. How do we objectively determine whether one poem is
| more beautiful than another? I think we're a long way from
| that.
|
| The breakthrough that the SPAG paper showed is that -- by
| teaching models to be better at games that involve language
| and semantics -- they get better at language-oriented tasks
| _overall_.
|
| And that possibility excites me.
|
| Sadly, as I've read further into the paper released by OpenAI,
| it doesn't appear that adversarial training for explainability
| increased the accuracy of the model -- while its answers were
| more understandable / verifiable, they weren't any more
| accurate.
|
| I think a very interesting metric would be to measure the
| accuracy of the fine-tuned models on unrelated tasks, to see
| whether the lessons learned from explaining math problems
| would help the model explain other kinds of problems (such as
| logic or reasoning).
| bravura wrote:
| Thank you for the SPAG paper.
|
| Do you know how to play questions?
|
| https://www.youtube.com/watch?v=u3xIs0aajN4
|
| (Tom Stoppard, Rosencrantz and Guildenstern Are Dead).
|
| The important question in the OpenAI work that you haven't
| touched on is how to evaluate superintelligence. I guess I
| would frame the problem like this:
|
| Let's say there is a very esoteric but important branch of
| abstract mathematics that only a few people claim to
| understand. Is there a way for us to determine which
| mathematicians are actually intelligent, and which are
| bluffing? How?
___________________________________________________________________
(page generated 2024-07-17 23:03 UTC)