[HN Gopher] Grandmaster-Level Chess Without Search
       ___________________________________________________________________
        
       Grandmaster-Level Chess Without Search
        
       Author : jonbaer
       Score  : 51 points
       Date   : 2024-02-08 13:57 UTC (9 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | fiforpg wrote:
        | Given that they used position evaluation from Stockfish (a
        | search-based chess engine [1]), how is this "without search"?
       | 
        | Edit: looking further than the abstract, this is rather an
        | exploration of the scale necessary for a strong engine. The
        | title could do without "without search", I guess.
       | 
       | [1]: IIRC, it also uses a Leela-inspired NN for evaluation.
        
         | throwaway81523 wrote:
          | Leela without search supposedly plays around expert level, but
          | I thought the no-search Leela approach ran out of gas around
          | there. "Without search" there means evaluating 1 board position
          | per move. The engine in the paper (per the abstract) uses a big
          | transformer instead of a Leela-style DCNN.
        
         | paulddraper wrote:
         | Training uses search, but it plays without search.
         | 
         | ChatGPT isn't human, but it was trained with humans.
        
           | karolist wrote:
            | So it's a space-time trade-off then? Store enough searched
            | and weighted positions in the model and infer from them. In
            | this way, inference replaces Stockfish's search: less
            | accurate, but much faster, at the cost of the memory
            | required for the model.
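            | 
            | Roughly, as a toy sketch (PyTorch; the position encoding and
            | the tiny net here are made-up placeholders, not the paper's
            | 270M transformer):
            | 
            |     import torch
            |     import torch.nn as nn
            | 
            |     # Toy value net standing in for the paper's model.
            |     net = nn.Sequential(
            |         nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 1))
            |     opt = torch.optim.Adam(net.parameters(), lr=1e-3)
            |     loss_fn = nn.BCEWithLogitsLoss()
            | 
            |     # "Store searched positions in the model": fit the net
            |     # to (encoded position, Stockfish win%) pairs.
            |     positions = torch.randn(1024, 768)  # placeholder boards
            |     labels = torch.rand(1024)           # placeholder win%
            |     for _ in range(100):
            |         opt.zero_grad()
            |         loss = loss_fn(net(positions).squeeze(-1), labels)
            |         loss.backward()
            |         opt.step()
            | 
            |     # Play time: one forward pass per legal successor, no
            |     # tree search; pick the best-scoring move.
            |     successors = torch.randn(30, 768)   # placeholder
            |     best = torch.sigmoid(net(successors)).argmax().item()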
        
       | zniturah wrote:
        | They do use Stockfish for playing though ...
       | 
       | "To prevent some of these situations, we check whether the
       | predicted scores for all top five moves lie above a win
       | percentage of 99% and double-check this condition with Stockfish,
       | and if so, use Stockfish's top move (out of these) to have
       | consistency in strategy across time-steps."
        
         | paulddraper wrote:
         | But only to complete a winning position.
        
           | phoe-krk wrote:
           | From the abstract:
           | 
           |  _> We annotate each board in the dataset with action-values
           | provided by the powerful Stockfish 16 engine, leading to
           | roughly 15 billion data points._
           | 
           | So some of the learning data comes from Stockfish.
        
             | paulddraper wrote:
             | The original comment was "for playing."
             | 
             | In training, traditional search is absolutely used to score
             | positions.
             | 
             | In playing, search is not used. (*Except to finish out an
             | already-won position.)
        
           | zniturah wrote:
            | That 'only' usage in the winning position could be decisive
            | for gaining a GM rating.
        
             | og_kalu wrote:
             | Search isn't used to play/win here. Just for training.
        
               | cool_dude85 wrote:
               | It looks like it does use search here in the sense that
               | Stockfish's top move is generated using search.
        
             | paulddraper wrote:
             | Positions with 99% win percentage are not decisive for GM
             | vs non-GM rating.
        
               | Someone wrote:
               | They are once your opponents know you're very bad at
               | converting them.
        
               | zniturah wrote:
               | Proof?
               | 
                | In any won game, at some point (at the end of the game)
                | there will be a position with >99% winning chances. The
                | moves that follow are decisive.
        
               | littlestymaar wrote:
                | That's not how chess works. The moves that follow aren't
                | usually decisive unless you don't know how to play the
                | game and make enormous mistakes.
                | 
                | Anyone who knows how to play can beat a GM with a big
                | enough advantage at the end of the game (which is what's
                | reflected in the win probability).
        
           | billforsternz wrote:
            | The process of converting a completely winning position
            | (typically one with a large material advantage) is a phase
            | change relative to normal play, which is the struggle to
            | achieve such a position. In other words, you are doing
            | something different at that point. For example, I, as a weak
            | FIDE CM (Candidate Master), could not compete with a top
            | grandmaster in a game of chess, but I could finish off a
            | trivial win.
            | 
            | Edit: Recently I brought some ancient (1978) chess software
            | back to life: https://github.com/billforsternz/retro-sargon.
            | These two phases of chess, basically two different games,
            | were quite noticeable with that program, which is chess
            | software stripped back to the bone. Sargon 1978 could play
            | decently well, but it absolutely did not have the technique
            | to convert winning positions (because this is a different
            | challenge from regular chess). For example, it could not in
            | general mate with rook (or even queen) and king against a
            | bare king. The technique of squeezing the enemy king into a
            | progressively smaller box was unknown to it.
        
         | n2d4 wrote:
         | The context of that sentence:
         | 
         | > _Indecisiveness in the face of overwhelming victory_
         | 
         | > _If Stockfish detects a mate-in-k (e.g., 3 or 5) it outputs k
         | and not a centipawn score. We map all such outputs to the
         | maximal value bin (i.e., a win percentage of 100%). Similarly,
         | in a very strong position, several actions may end up in the
         | maximum value bin. Thus, across time-steps this can lead to our
         | agent playing somewhat randomly, rather than committing to one
         | plan that finishes the game quickly (the agent has no knowledge
         | of its past moves). This creates the paradoxical situation that
         | our bot, despite being in a position of overwhelming win
         | percentage, fails to take the (virtually) guaranteed win and
         | might draw or even end up losing since small chances of a
         | mistake accumulate with longer games (see Figure 4). To prevent
         | some of these situations, we check whether the predicted scores
         | for all top five moves lie above a win percentage of 99% and
         | double-check this condition with Stockfish, and if so, use
         | Stockfish's top move (out of these) to have consistency in
         | strategy across time-steps._
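          | 
          | Mechanically, that rule is something like this sketch (the
          | names `model`, `stockfish`, and `win_percentage` are my own
          | stand-ins, not the paper's code):
          | 
          |     def choose_move(board, model, stockfish):
          |         # Score every legal move with one forward pass each.
          |         scored = [(model.win_percentage(board, m), m)
          |                   for m in board.legal_moves]
          |         top5 = sorted(scored, key=lambda t: t[0])[-5:]
          |         # Only in "already won" positions: break ties with
          |         # Stockfish so successive moves follow one plan.
          |         if all(wp > 99.0 for wp, _ in top5):
          |             moves = [m for _, m in top5]
          |             if all(stockfish.win_percentage(board, m) > 99.0
          |                    for m in moves):
          |                 return stockfish.best_move(board, moves)
          |         return max(scored, key=lambda t: t[0])[1]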
        
           | Vecr wrote:
           | They should try to implement some kind of resolute agent in
           | that case. Might be hard to do if it needs to be "not
           | technically search" though.
        
       | xianshou wrote:
       | The path to AGI:
       | 
       | 0. Have model A.
       | 
       | 1. Use Monte Carlo with A to get supervised data.
       | 
       | 2. Train model B with data from A.
       | 
       | 3. Use Monte Carlo with B to get supervised data.
       | 
       | 4. Train model C with data from B...
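        | 
        | A runnable toy version of that loop (the "positions", "search",
        | and "models" are all stand-ins, just to show the
        | amplify-then-distill shape):
        | 
        |     import random
        | 
        |     def sample_positions(n):
        |         return [random.uniform(0, 1) for _ in range(n)]
        | 
        |     def search_value(pos, model):
        |         # Stand-in for Monte Carlo search guided by the current
        |         # model: slow, but nudges its estimate toward the true
        |         # value (here, pretend the truth is 1 - pos).
        |         return 0.9 * model(pos) + 0.1 * (1 - pos)
        | 
        |     def fit(data):
        |         # Stand-in for training: least-squares line.
        |         xs, ys = zip(*data)
        |         mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        |         k = (sum((x - mx) * (y - my) for x, y in data)
        |              / sum((x - mx) ** 2 for x in data))
        |         return lambda p, k=k, mx=mx, my=my: my + k * (p - mx)
        | 
        |     model = lambda pos: 0.5            # model A: knows nothing
        |     for generation in range(5):        # A -> B -> C -> ...
        |         data = [(p, search_value(p, model))
        |                 for p in sample_positions(1000)]
        |         model = fit(data)              # distill into next model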
        
         | spencerchubb wrote:
         | That is an awesome idea. I wish the authors would open source
         | the code and weights so this can be tried.
        
         | rprenger wrote:
         | This is pretty close to how AlphaZero works.
         | 
         | https://medium.com/applied-data-science/alphago-zero-explain...
        
         | jedberg wrote:
          | That's basically how OpenAI works. They use generated
         | training sets from one model to train the next model (plus
         | other stuff with it).
         | 
         | But the "other stuff" is pretty important. That is what pulls
         | it away from just constantly re-amplifying the bias in the
         | initial training data.
        
           | londons_explore wrote:
           | I still want to see some examples of a "mistake" in the
           | training data getting detected or reduced.
           | 
           | For example, somewhere in the training data the string "All
           | cats are red" should get detected when lots of other data in
           | the training set contradicts the statement.
           | 
            | And obviously it doesn't have to be simple logical
            | statements; it could also be bigger questions like "how come
            | the 2nd world war happened despite X person and Y person
            | being on good speaking terms, as evidenced by all these
            | letters in the archives?"
           | 
           | When AI can do that, it should be able to turn our body of
           | knowledge into a much bigger/more useful one by raising
           | questions that arise from data we already have, but never
           | noticed.
        
       | og_kalu wrote:
        | Well, "without _explicit_ search" would probably be more accurate.
       | 
       | They note that though in the paper:
       | 
       | >Since transformers may learn to roll out iterative computation
       | (which arises in search) across layers, deeper networks may hold
       | the potential for deeper unrolls.
        
         | janalsncm wrote:
         | We don't know if it's using implicit search either. While it
         | would be interesting if the network was doing some internal
         | search, it's also possible it has just memorized the
         | evaluations from 10M games and is performing some function of
         | the similarity of the input to those previously seen.
        
           | og_kalu wrote:
           | >We don't know if it's using implicit search either.
           | 
           | Sure
           | 
           | >it's also possible it has just memorized the evaluations
           | from 10M games and is performing some function of the
           | similarity of the input to those previously seen.
           | 
            | That's not possible. The set of possible games in chess is
            | incredibly large, and it is incredibly easy to play a game
            | that has diverged from training. A model that had just
            | memorized all evaluations would break within ten or so
            | moves, let alone withstand robust evaluation.
           | 
            | However exactly this model works, and how much or how little
            | it relies on search, is unknown, but it is no doubt a model
            | of the world of chess:
            | https://adamkarvonen.github.io/machine_learning/2024/01/03/c...
        
             | janalsncm wrote:
             | If it could reliably win a mate in N position without
             | inexplicably blundering, I would be more inclined to buy
             | your search hypothesis. But it doesn't, which is one of the
             | reasons the authors gave for finishing with stockfish. So
             | whatever it's doing is clearly lossy which an actual search
             | would not be.
             | 
              | Neural nets memorize all sorts of things. They memorize ad
              | clicks in high-dimensional state spaces. Transformers
              | trained on the whole internet can often reproduce entire
              | texts. It's lossy, but it's still memorizing.
             | 
             | That seems like the simplest explanation for what's
             | happening here. There's some sort of lossy memorization,
             | not a search. The fact that the thing it has memorized is
             | the result of a search doesn't matter.
        
               | og_kalu wrote:
               | >If it could reliably win a mate in N position without
               | inexplicably blundering, I would be more inclined to buy
               | your search hypothesis.
               | 
                | I don't have a "search hypothesis". I don't know what
                | strategy the model employs to play. I was simply pointing
                | out that limited search learned by the transformer is not
                | out of the question. Stockfish finishing is not necessary
                | to play chess well beyond any level at which a
                | memorization hypothesis makes sense. This is not the
                | first LLM chess machine.
               | 
               | >Neural nets memorize all sorts of things. They memorize
               | ad clicks in high dimensional state spaces. Transformers
               | trained on the whole internet can often reproduce entire
               | texts. It's lossy, but it's still memorizing.
               | 
                | Intelligent things memorize. Humans memorize a lot. I
                | never said the model hasn't memorized a fair few things.
                | Many human chess grandmasters memorize openings. What I'm
                | saying is that it's not playing games via memorization
                | any more than a human is doing the same.
               | 
               | >That seems like the simplest explanation for what's
               | happening here. There's some sort of lossy memorization,
               | not a search.
               | 
               | The options aren't only lossy memorization or lossless
               | search.
        
           | dawnofdusk wrote:
            | Even if it's "implicit", I'm not sure that matters much. The
            | point is that the model doesn't explicitly search anything;
            | it just applies the learned transformation. If the weights
            | of the learned transformation encode a sort of precomputed
            | search and interpolation over the dataset, from an
            | algorithmic perspective this still isn't search (it doesn't
            | enumerate board states or state-action transitions).
           | 
           | >performing some function of the similarity of the input to
           | those previously seen.
           | 
           | This is indeed what transformers do. But obviously it learns
           | some sort of interpolation/extrapolation which lets it do
           | well on board states/games outside the training set.
        
       | salamo wrote:
       | I think this is an interesting finding from a practical
       | perspective. A function which can reliably approximate stockfish
       | at a certain depth could replace it, basically "compressing"
       | search to a set depth. And unlike NNUE which is optimized for
       | CPU, a neural network is highly parallelizable on GPU meaning you
       | could send all possible future positions (at depth N) through the
       | network and use the results for a primitive tree search.
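        | 
        | E.g. a 1-ply version of that with python-chess (here `net` and
        | `encode` are placeholders for the learned evaluator, which
        | predicts the side-to-move's winning chances):
        | 
        |     import chess
        |     import torch
        | 
        |     def best_move_batched(board, net, encode):
        |         # Batch-evaluate every successor in one GPU pass.
        |         moves = list(board.legal_moves)
        |         children = []
        |         for m in moves:
        |             board.push(m)
        |             children.append(encode(board))  # fixed-size tensor
        |             board.pop()
        |         with torch.no_grad():
        |             values = net(torch.stack(children))
        |         # Successors are scored from the opponent's point of
        |         # view, so pick the move minimizing their chances.
        |         return moves[values.argmin().item()]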
        
       | paulddraper wrote:
       | There is rampant misunderstanding of some parts of this article;
       | allow me to help :)
       | 
       | The "no-search" chess engine uses search (Stockfish) in in two
       | ways:
       | 
        | 1. To score positions in the training data. _This is only
        | training data; no search is performed when actually playing._
       | 
       | 2. To play moves when the position has many options with a 99%
       | win rate. _This is to prevent pathological behavior in already
       | won positions, and is not "meaningful" in the objective of
       | grandmaster-level play._
       | 
       | Thus, "without search" is a valid description.
        
         | zniturah wrote:
         | "Aready won position" or "99% win rate" is statistics given by
         | Stockfish (or professional chess player). It is weird to assume
         | that the same statement is true for the trained LLM since we
         | are assessing the LLM itself. If it is using during the game
         | then it is searching, thus the title doesn't reflect the actual
         | work.
        
         | janalsncm wrote:
         | In one sense, I can understand why they would choose to use
         | Stockfish in mate-in-N positions. The fact that the model can't
         | distinguish between mate in 5 and mate in 3 is an
         | implementation detail. Since the vast majority of positions are
         | not known to be wins or draws, it's still an interesting
         | finding.
         | 
          | However, in reality all positions _are_ actually wins (for
          | black or white) or draws. One reason they gave for why
          | Stockfish is needed to finish the game is that their
          | evaluation function is imperfect, which is also a notable
          | result.
        
         | joe_the_user wrote:
         | Sort of but it seems a bit of a cheat.
         | 
          | Neural networks are universal approximators. Choose a function
          | and get enough data from it, and you can approximate it very
          | closely and maybe exactly. If initially creating function F
          | required algorithm Y ("search" or whatever), you can fit your
          | approximation to F and then say "look, F without Y", and for
          | all we know the approximation might be doing things internally
          | that are actually nearly identical to the initial F.
        
           | jackblemming wrote:
            | No, arbitrarily wide neural networks are approximators of
            | Borel-measurable functions. Big difference between that and
            | "any function". RNNs are Turing-complete though.
        
             | eldenring wrote:
              | You can say the same thing about RNNs. Technically,
              | nothing is Turing-complete without infinite scratch space.
        
           | janalsncm wrote:
            | A sufficiently large NN can learn an arbitrary function,
            | yes. But Stockfish is also theoretically perfect given
            | infinite computational resources.
           | 
           | What is interesting is performing well under reasonable
           | computational constraints i.e. doing it faster/with fewer
           | flops than stockfish.
        
             | joe_the_user wrote:
              | Is the model more efficient than Stockfish? I think
              | Stockfish runs on a regular CPU, and I'd guess this "270M
              | parameter transformer model" requires a GPU, but I can't
              | find any reference to efficiency in the paper.
              | 
              | Also found in the paper: "While our largest model achieves
              | very good performance, it does not completely close the gap
              | to Stockfish 16". It's actually inferior, but they still
              | think it's an interesting exercise. And that's the thing:
              | it's primarily an exercise, like calculating pi to a
              | billion decimal places or overclocking a gaming laptop.
        
           | dawnofdusk wrote:
            | It doesn't get the actual optimal Q-values computed from
            | Stockfish (presumably those would take practically infinite
            | compute to calculate); in fact, it gets estimates from
            | polling Stockfish for only 50ms.
           | 
           | So you're estimating from data a function which is itself not
           | necessarily optimal. Moreover, the point is more like how far
           | can we get using a really generic transformer architecture
           | that is not tuned to domain-specific details of our problem,
           | which Stockfish is.
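            | 
            | Concretely, that labeling step looks roughly like this with
            | python-chess (the binary path and the centipawn-to-win%
            | squash are illustrative, not the paper's exact recipe):
            | 
            |     import chess
            |     import chess.engine
            | 
            |     engine = chess.engine.SimpleEngine.popen_uci("stockfish")
            | 
            |     def action_values(board, seconds=0.05):
            |         # Approximate Q(s, a): one 50ms query per move.
            |         values = {}
            |         for move in board.legal_moves:
            |             info = engine.analyse(
            |                 board, chess.engine.Limit(time=seconds),
            |                 root_moves=[move])
            |             cp = info["score"].pov(board.turn).score(
            |                 mate_score=10000)
            |             # Logistic centipawn -> win% (illustrative).
            |             values[move] = 100 / (1 + 10 ** (-cp / 400))
            |         return values
            | 
            |     print(action_values(chess.Board()))
            |     engine.quit()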
        
         | YeGoblynQueenne wrote:
         | >> 1. To score positions in the training data. This is only
         | training data, no search is performed when actually playing.
         | 
         | That's like saying you can have eggs without chickens, because
         | when you make an omelette you don't add chickens. It's
         | completely meaningless and a big fat lie to boot.
         | 
         | The truth is that the system created by DeepMind consists of
         | two components: a search-based system used to annotate a
         | dataset of moves and a neural-net based system that generates
         | moves similar to the ones in the dataset. DeepMind arbitrarily
         | draw the boundary of the system around the neural net component
         | and pretend that because the search is external to the neural
         | net, the neural net doesn't need the search.
         | 
         | And yet, without the search there is no dataset, and without
         | the dataset there is no model. They didn't train their system
         | by self-play and they certainly didn't hire an army of low-paid
         | workers to annotate moves for them. They generated training
         | moves with a search-based system and learned to reproduce them.
         | They used chickens to make eggs.
         | 
         | Their approach depends entirely on there being a powerful chess
         | search engine and they wouldn't be able to create their system
         | without it as a main component. Their "without search" claim is
         | just a marketing term.
        
       | jedberg wrote:
       | Now do Go. :)
        
         | Kon5ole wrote:
         | This used to be a comforting thought whenever computers beat
         | humans in chess, but I think that time has passed. The paper
         | mentions AlphaZero [1], which has beaten AlphaGo, which beat
         | Lee Sedol back in 2016 [2].
         | 
         | [1] https://en.wikipedia.org/wiki/AlphaZero
         | 
         | [2] https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
        
           | jedberg wrote:
           | My pithy comment probably wasn't enough to express what I
           | meant. :)
           | 
           | I know that computers have already beaten humans at Go. But
           | what's interesting is that in both the chess and Go cases, a
           | lot of real-time compute was necessary to win the games. Now
           | we have a potential way to build the model ahead of time such
            | that the compute needed during interactive play is much smaller.
           | 
           | This means that we can be much more portable with the
           | solution, and it also means that for online game companies,
           | they can spend a lot less money on gameplay, especially if
           | gameplay is most of their compute.
        
       | IIAOPSW wrote:
        | Slightly off topic, but am I the only one that approaches
        | strategy games by making a "zeroth order approximation"? E.g.
        | find the shortest path to victory under the (obviously faulty)
        | assumption that my opponent does nothing and the board is
        | unchanging except for my moves. Now find my opponent's shortest
        | path to victory under the same assumption. Then evaluate: if we
        | both just ignore each other and try to bum rush the victory
        | condition, who gets there first?
       | 
       | For most games, if you can see a way to an end state within 3-5
       | steps under these idealized conditions, there's only so much that
       | an actual opponent can do to make the board deviate from the
       | initial static board state that you used in your assumption. The
        | optimal strategy will always be within a small edit distance of
        | this dumb no-theory-of-mind strategy. You can always be sure
        | that whoever has the longer path to victory has to do something
        | to interfere with the shorter path of their opponent, and there
        | are only ever so many pieces which can interact with that
        | shorter path. Meaning whatever path to victory is currently
        | shortest short-circuits the search for potential moves.
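        | 
        | In code, the zeroth-order check is just two independent BFS
        | passes (the `game` interface here is invented for illustration):
        | 
        |     from collections import deque
        | 
        |     def solo_distance(game, state, player):
        |         # Fewest moves to a win, pretending the opponent passes
        |         # and the board otherwise never changes.
        |         queue, seen = deque([(state, 0)]), {state}
        |         while queue:
        |             s, d = queue.popleft()
        |             if game.is_win(s, player):
        |                 return d
        |             for s2 in game.moves(s, player):  # my moves only
        |                 if s2 not in seen:
        |                     seen.add(s2)
        |                     queue.append((s2, d + 1))
        |         return float("inf")
        | 
        |     def zeroth_order_race(game, state):
        |         # Who wins if we both ignore each other?
        |         mine = solo_distance(game, state, player=0)
        |         theirs = solo_distance(game, state, player=1)
        |         return "me" if mine <= theirs else "them"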
        
         | xbpx wrote:
         | That's why white has a higher statistical win rate ya?
        
         | dayjaby wrote:
          | At beginner level this might work, but as people get more
          | competitive they begin to realize the benefit of not only
          | playing their own game, but also reading the enemy's plan
          | (e.g. scouting in Starcraft/AoE2) to counteract it as much as
          | possible.
          | 
          | Chess against humans is different. Usually there is no path
          | to victory, only to a draw. People just follow strategic plans
          | that they were told would be slightly beneficial later on.
          | While following that strategic plan, people mess up, and the
          | first one to realize that the opponent messed up usually wins.
          | A material advantage of 2-3 points is usually considered a
          | win already.
        
           | jrockway wrote:
           | I agree with this. My default approach to board games is
           | basically to maximize victory points early. This usually
           | works; when 4 people are playing a new game for the first
           | time, I usually win. This doesn't really work when people
           | know how to play the game specifically, though.
           | 
           | I think this algorithm is better than many other algorithms
           | that people come up with, however.
           | 
           | (As an aside, when I play a card game I sort my cards with a
           | merge sort instead of an insertion sort. People said you
           | would never use these algorithms in real life, but you can if
           | you want to!)
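            | 
            | (The hand-of-cards merge sort, for anyone who wants to play
            | along at home:)
            | 
            |     def merge_sort(hand):
            |         # Split the hand, sort each half, then repeatedly
            |         # take the smaller top card of the two piles.
            |         if len(hand) <= 1:
            |             return hand
            |         mid = len(hand) // 2
            |         left = merge_sort(hand[:mid])
            |         right = merge_sort(hand[mid:])
            |         merged = []
            |         while left and right:
            |             merged.append(left.pop(0) if left[0] <= right[0]
            |                           else right.pop(0))
            |         return merged + left + right
            | 
            |     print(merge_sort([7, 2, 11, 4, 13, 9]))
            |     # -> [2, 4, 7, 9, 11, 13]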
        
         | bionsystem wrote:
          | A lot of decisions in chess are like "this square would be
          | nice for that piece; how can I get there?", followed by
          | analyzing what your opponent can do to prevent you from doing
          | that, or what counterplay it gives them. So what you are doing
          | makes a lot of sense.
        
         | jncfhnb wrote:
         | Ehh idk. Sounds like it's prone to the beginner strategy of
         | assuming your opponent will occasionally do something really
         | dumb.
        
       | ivanjermakov wrote:
       | I thought clickbaity titles were discouraged in whitepaper world.
        
       | bjourne wrote:
        | That must mean they found some similarity metric for high-level
        | chess, which is very impressive. In chess, one pawn moving one
        | square can be the difference between a won and a lost position.
        | But knowing that usually requires lots of calculation.
        
       | iamcreasy wrote:
        | I guess this current method is not applicable to having the
        | model explain why a given move was played, as it is not planning
        | more than one move ahead. Very cool, nonetheless.
        
       | penjelly wrote:
        | I don't follow... even if it's trained and doesn't use search,
        | isn't the act of deciding the next move a sort of search anyway,
        | based off its training? I've heard people describe LLMs as
        | extremely broad search: basically attempting to build a world
        | model and then predicting the next world state based on that. Is
        | this fundamentally different from search? Am I wrong in my
        | assumptions here?
        
       ___________________________________________________________________
       (page generated 2024-02-08 23:01 UTC)