[HN Gopher] Prover-Verifier Games improve legibility of language...
       ___________________________________________________________________
        
       Prover-Verifier Games improve legibility of language model outputs
        
       Author : davidbarker
       Score  : 68 points
       Date   : 2024-07-17 17:15 UTC (5 hours ago)
        
 (HTM) web link (openai.com)
 (TXT) w3m dump (openai.com)
        
       | rjeli wrote:
       | Funny that when I reached the "Key Findings" section, my brain
       | immediately parsed it as ChatGPT output. Maybe it's the bullet
       | points, the word choice, or just the font...
        
         | Der_Einzige wrote:
          | There appears to be a concerted effort among the general
          | populace, conscious or unconscious, to shape discourse going
          | forward to sound more ChatGPT-like in general. Words like
          | "delve" and "crucial" have become more common in record time,
          | even among real people in face-to-face communication.
         | 
          | Much as I find it overly formal, I support it on the grounds
          | that it frustrates attempts to "detect" whether LLMs were
          | used, and that is very good.
        
         | molave wrote:
          | I can tell technical papers influenced ChatGPT's outputs the
          | most. Most of the articles generated with it may be
          | regurgitated, but I can't deny how easily digestible the info
          | is when presented that way.
        
       | c3534l wrote:
        | It seems like a lot of people these days are doing generative
        | adversarial AI and then pretending they invented a new thing.
        
         | HanClinto wrote:
          | GANs have long been used to improve the training of image
          | models -- it seems like we're finally starting to see this
          | approach catch on for LLMs.
         | 
         | I'm aware of the SPAG paper -- who else have you seen take this
         | approach with LLMs lately?
         | 
         | https://github.com/Linear95/SPAG
        
       | HanClinto wrote:
       | Beautiful!
       | 
       | OpenAI isn't just training a model to produce more-verifiable
       | correct answers -- it's leveraging an adversarial relationship to
       | train a model that's better at being correct, and also a model
       | that's better at deceiving / being wrong. This is the key. There
       | are three agents here:
       | 
       | * A "verifier" (a small model, whose job it is to discern correct
       | answers from incorrect answers)
       | 
       | * A "helpful prover" (blue team, whose job it is to produce
       | correct answers with an easy-to-follow explanation)
       | 
       | * A "sneaky prover" (red team, whose job it is to produce
       | incorrect answers with a deceptive explanation)
       | 
        | By arranging these three models in an adversarial relationship
        | with a _true reinforcement learning feedback loop_, all three
        | models grow and get better. A toy sketch of one game round
        | follows.
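        | 
        | Here's a toy sketch of one game round, with placeholder
        | functions standing in for the three models (this shows only
        | the structure of the game, not OpenAI's actual training code):
        | 
        |   import random
        | 
        |   # Toy stand-ins for the three agents: in the real setup each
        |   # is an LLM being fine-tuned with RL; placeholders suffice
        |   # to show the game structure.
        |   def helpful_prover(problem):  # blue team
        |       return {"answer": problem["truth"],
        |               "proof": "clear, correct steps"}
        | 
        |   def sneaky_prover(problem):  # red team
        |       return {"answer": problem["truth"] + 1,
        |               "proof": "plausible but flawed steps"}
        | 
        |   def verifier_accepts(solution):
        |       # Coin-flip stand-in for the small verifier model.
        |       return random.random() < 0.5
        | 
        |   def play_round(problem):
        |       role, prover = random.choice(
        |           [("helpful", helpful_prover),
        |            ("sneaky", sneaky_prover)])
        |       sol = prover(problem)
        |       accepted = verifier_accepts(sol)
        |       correct = sol["answer"] == problem["truth"]
        |       # These two bits drive the RL rewards: the verifier is
        |       # rewarded when accepted == correct, the helpful prover
        |       # when accepted and correct, and the sneaky prover when
        |       # accepted and incorrect.
        |       return {"role": role, "accepted": accepted,
        |               "correct": correct}
        | 
        |   print(play_round({"question": "2 + 2 = ?", "truth": 4}))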
       | 
        | This is fantastic to read, and corroborates the results achieved
        | by SPAG -- easily one of my favorite papers from the past year.
        | SPAG pioneered (as far as I'm aware) the use of adversarial
        | language games in a true reinforcement-learning setup (not
        | merely RLHF, which isn't true RL), and showed that such training
        | yields generalized improvements even in areas not directly
        | related to the game. [1]
       | 
       | Ever since the SPAG paper came out, I've been daydreaming about
       | the different sorts of adversarial games that one could use to
       | train LLMs. I've written down a bunch of notes on the subject [2]
       | (in case anyone else is reading my rambling notes).
       | 
       | I would really like to see some of these experiments actually get
       | up and running on open-source LLMs -- I'm excited to see if / how
       | they could be used to improve the quality of some of the open-
       | source base models that are floating around out there.
       | 
       | [1] https://github.com/Linear95/SPAG
       | 
       | [2] https://github.com/HanClinto/MENTAT
        
         | HanClinto wrote:
          | Because the red and blue agents are both trying to convince a
          | smaller language model of the correctness of their answer,
          | they each have to simplify their logic and wording.
         | 
         | This feels like the ML equivalent of the old adage "If you
         | can't explain it to a six year old, you don't understand it
         | yourself."
        
           | enthulhusiastic wrote:
           | ELI6 why SPAG is better than just the default pretraining
           | method (token context statistics?) of an LLM.
        
             | TimPC wrote:
              | The red and blue agents are effectively unlimited sources
              | of true and false examples, so you can scale far more
              | efficiently than you can by pre-training on labelled
              | inputs. It's also targeted directly at correct vs.
              | incorrect, rather than at some notion of answer quality
              | that doesn't directly get at hallucination vs. reality. A
              | rough sketch of how labelled pairs fall out of the game
              | follows.
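              | 
              | A rough sketch (toy provers, hypothetical names) of how
              | each rollout becomes a labelled example for free -- the
              | label is just whether the final answer matches the known
              | solution, so the supply of positives and negatives scales
              | with compute, not with human labellers:
              | 
              |   # Toy stand-ins for the fine-tuned blue/red provers
              |   # (hypothetical, just for illustration).
              |   helpful = lambda p: {"answer": p["truth"],
              |                        "proof": "correct steps"}
              |   sneaky = lambda p: {"answer": p["truth"] + 1,
              |                       "proof": "subtly flawed steps"}
              | 
              |   def collect_verifier_data(problems):
              |       data = []
              |       for p in problems:
              |           for prover in (helpful, sneaky):
              |               sol = prover(p)
              |               # The label is free: does the final
              |               # answer match the known solution?
              |               data.append((sol["proof"],
              |                   sol["answer"] == p["truth"]))
              |       return data
              | 
              |   print(collect_verifier_data(
              |       [{"question": "2+2", "truth": 4},
              |        {"question": "3*3", "truth": 9}]))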
        
         | jgalt212 wrote:
          | Isn't this exactly how AlphaGo learns and why it works so
          | well? It always knows the right answer because it knows the
          | rules of the game and can easily compute a W-L record.
          | 
          | In life, it's hard and very expensive to codify the rules and
          | compute a W-L record.
        
           | HanClinto wrote:
           | Yes, exactly.
           | 
            | Traditional RL is easiest in a domain with clearly defined
            | rules -- like Go, or Starcraft, or whatever. The trouble is
            | that those games don't translate well to other domains -- a
            | model can learn about risk and reward from Chess, but it
            | can't become a better chatbot that way.
           | 
            | But if the game operates in the realm of language and
            | semantics, then the hope is that we can tap into that same
            | adversarial growth curve, but for LLMs.
           | 
            | As you note, this only works in situations where we can
            | clearly say "winner" or "loser". In OpenAI's case, they use
            | the correctness of the math answer as one W/L metric
            | (discrete and measurable), as well as whether the verifier
            | correctly identified the answer as correct (so the
            | legibility of the answer is also discrete and measurable).
            | A toy version of those win conditions is sketched below.
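            | 
            | A toy version (simplified; the paper's actual reward
            | shaping is more involved than a single win/loss bit):
            | 
            |   def win(role, answer_correct, verifier_accepted):
            |       # Verifier wins when it isn't fooled either way.
            |       if role == "verifier":
            |           return verifier_accepted == answer_correct
            |       # Helpful prover wins when right AND convincing.
            |       if role == "helpful":
            |           return answer_correct and verifier_accepted
            |       # Sneaky prover wins when wrong AND convincing.
            |       if role == "sneaky":
            |           return ((not answer_correct) and
            |                   verifier_accepted)
            |       raise ValueError(role)
            | 
            |   # A sneaky prover that fools the verifier "wins":
            |   print(win("sneaky", answer_correct=False,
            |             verifier_accepted=True))  # True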
           | 
           | In the SPAG paper, they chose the game of "Taboo" as a way to
           | discretely measure W/L (asking: "did the defender say the
           | secret word or not").
           | 
           | As you noted, it's hard and expensive to codify the rules of
           | life. How do we objectively determine whether one poem is
           | more beautiful than another? I think we're a long way from
           | that.
           | 
            | The breakthrough the SPAG paper showed is that -- by
            | teaching the models to be better at games that involve
            | language and semantics -- they get better at language-
            | oriented tasks _overall_.
           | 
           | And that possibility excites me.
           | 
            | Sadly, as I've read further into the paper released by
            | OpenAI, it doesn't appear that adversarial training for
            | explainability increased the accuracy of the model -- while
            | its answers were more understandable / verifiable, they
            | weren't any more accurate.
           | 
            | I think a very interesting experiment would be to measure
            | the accuracy of the fine-tuned models on unrelated tasks,
            | to see whether the lessons learned in explaining math
            | problems help the model explain other kinds of problems
            | (such as logic or reasoning).
        
             | bravura wrote:
             | Thank you for the SPAG paper.
             | 
             | Do you know how to play questions?
             | 
             | https://www.youtube.com/watch?v=u3xIs0aajN4
             | 
             | (Tom Stoppard, Rosencrantz and Guildenstern Are Dead).
             | 
             | The important question in the OpenAI work that you haven't
             | touched on is how to evaluate superintelligence. I guess I
             | would frame the problem like this:
             | 
             | Let's say there is a very esoteric but important branch of
             | abstract mathematics that only a few people claim to
             | understand. Is there a way for us to determine which
             | mathematicians are actually intelligent, and which are
             | bluffing? How?
        
       ___________________________________________________________________
       (page generated 2024-07-17 23:03 UTC)