[HN Gopher] Grandmaster-Level Chess Without Search
___________________________________________________________________
Grandmaster-Level Chess Without Search
Author : jonbaer
Score : 51 points
Date : 2024-02-08 13:57 UTC (9 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| fiforpg wrote:
| Given that they used position evaluation from (a search chess
| engine[1]) Stockfish, how is this "without search"?
|
| Edit: looking further than the abstract, this is rather an
| exploration of the scale necessary for a strong engine. The
| title could do without "without search", I guess.
|
| [1]: IIRC, it also uses a Leela-inspired NN for evaluation.
| throwaway81523 wrote:
| Leela without search supposedly plays around expert level, but
| I thought the no-search Leela approach ran out of gas around
| there. Without search there means evaluating 1 board position
| per move. The engine in the paper (per the abstract) uses a big
| LLM instead of a Leela-style DCNN.
| paulddraper wrote:
| Training uses search, but it plays without search.
|
| ChatGPT isn't human, but it was trained with humans.
| karolist wrote:
| So it's a space-time trade-off then? Store enough searched
| and weighted positions in the model and infer from them. In
| this way, inference replaces Stockfish's search: less
| accurate, but much faster, at the cost of the memory the
| model requires.
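|
| Something like this at play time, I'd guess (a minimal sketch;
| `predict_win_percent` is a hypothetical stand-in for one forward
| pass of the trained net, and python-chess handles the board):
|
|     import chess
|
|     def predict_win_percent(fen: str, move: str) -> float:
|         ...  # stub: the net's action-value head, one forward pass
|
|     def pick_move(board: chess.Board) -> chess.Move:
|         # One model call per legal move -- no tree, no lookahead.
|         scored = [(predict_win_percent(board.fen(), m.uci()), m)
|                   for m in board.legal_moves]
|         return max(scored, key=lambda t: t[0])[1]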
| zniturah wrote:
| They do use Stockfish for playing, though ...
|
| "To prevent some of these situations, we check whether the
| predicted scores for all top five moves lie above a win
| percentage of 99% and double-check this condition with Stockfish,
| and if so, use Stockfish's top move (out of these) to have
| consistency in strategy across time-steps."
| paulddraper wrote:
| But only to complete a winning position.
| phoe-krk wrote:
| From the abstract:
|
| _> We annotate each board in the dataset with action-values
| provided by the powerful Stockfish 16 engine, leading to
| roughly 15 billion data points._
|
| So some of the learning data comes from Stockfish.
| paulddraper wrote:
| The original comment was "for playing."
|
| In training, traditional search is absolutely used to score
| positions.
|
| In playing, search is not used. (*Except to finish out an
| already-won position.)
| zniturah wrote:
| That 'only' usage in winning positions could be decisive
| for gaining a GM rating.
| og_kalu wrote:
| Search isn't used to play/win here. Just for training.
| cool_dude85 wrote:
| It looks like it does use search here in the sense that
| Stockfish's top move is generated using search.
| paulddraper wrote:
| Positions with 99% win percentage are not decisive for GM
| vs non-GM rating.
| Someone wrote:
| They are once your opponents know you're very bad at
| converting them.
| zniturah wrote:
| Proof?
|
| To win any game, at some point (near the end of the game)
| there will be a position with >99% winning chances. The
| moves that follow are decisive.
| littlestymaar wrote:
| That's not how chess works. The moves that follow aren't
| usually decisive unless you don't know how to play the
| game and make enormous mistakes.
|
| Anyone who knows how to play can beat a GM given a big
| enough advantage at the end of the game (which is what's
| reflected in the win probability).
| billforsternz wrote:
| The process of converting a completely winning position
| (typically one with a large material advantage) is a phase
| change relative to normal play, which is the struggle to
| achieve such a position. In other words, you are doing
| something different at that point. For example, I, as a weak
| FIDE CM (Candidate Master), could not compete with a top
| grandmaster in a game of chess, but I could finish off a
| trivial win.
|
| Edit: Recently I brought some ancient (1978) chess software
| back to life https://github.com/billforsternz/retro-sargon.
| These two phases of chess, basically two different games,
| were quite noticeable with that program, which is chess
| software stripped back to the bone. Sargon 1978 could play
| decently well, but it absolutely did not have the technique
| to convert winning positions (because it is a different
| challenge from regular chess). For example, it could not in
| general mate with rook (or even queen) and king against bare
| king. The technique of squeezing the enemy king into a
| progressively smaller box was unknown to it.
| n2d4 wrote:
| The context of that sentence:
|
| > _Indecisiveness in the face of overwhelming victory_
|
| > _If Stockfish detects a mate-in-k (e.g., 3 or 5) it outputs k
| and not a centipawn score. We map all such outputs to the
| maximal value bin (i.e., a win percentage of 100%). Similarly,
| in a very strong position, several actions may end up in the
| maximum value bin. Thus, across time-steps this can lead to our
| agent playing somewhat randomly, rather than committing to one
| plan that finishes the game quickly (the agent has no knowledge
| of its past moves). This creates the paradoxical situation that
| our bot, despite being in a position of overwhelming win
| percentage, fails to take the (virtually) guaranteed win and
| might draw or even end up losing since small chances of a
| mistake accumulate with longer games (see Figure 4). To prevent
| some of these situations, we check whether the predicted scores
| for all top five moves lie above a win percentage of 99% and
| double-check this condition with Stockfish, and if so, use
| Stockfish's top move (out of these) to have consistency in
| strategy across time-steps._
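|
| A rough sketch of that gating rule (the model scores are a
| placeholder, the 50ms budget here is arbitrary, and this skips
| the paper's double-check of the 99% condition with Stockfish;
| only the python-chess engine API is real):
|
|     import chess
|     import chess.engine
|
|     def choose_move(board, model_scores, engine):
|         # model_scores: {chess.Move: predicted win probability}
|         top5 = sorted(model_scores, key=model_scores.get,
|                       reverse=True)[:5]
|         if all(model_scores[m] > 0.99 for m in top5):
|             # Overwhelmingly won: let Stockfish pick among the
|             # net's top five, for consistency across time-steps.
|             result = engine.play(board,
|                                  chess.engine.Limit(time=0.05),
|                                  root_moves=top5)
|             return result.move
|         return top5[0]  # otherwise trust the net's best move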
| Vecr wrote:
| They should try to implement some kind of resolute agent in
| that case. Might be hard to do if it needs to be "not
| technically search" though.
| xianshou wrote:
| The path to AGI:
|
| 0. Have model A.
|
| 1. Use Monte Carlo with A to get supervised data.
|
| 2. Train model B with data from A.
|
| 3. Use Monte Carlo with B to get supervised data.
|
| 4. Train model C with data from B...
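|
| As a loop, roughly (every function here is a stub standing in
| for a whole subsystem):
|
|     def monte_carlo_self_play(model):
|         ...  # stub: search (e.g. MCTS) guided by `model`,
|              # emitting search-improved training targets
|
|     def train_new_model(data):
|         ...  # stub: supervised training on `data`
|
|     def expert_iteration(model, rounds=3):
|         # Amplify with search, distill into the next model,
|         # repeat.
|         for _ in range(rounds):
|             data = monte_carlo_self_play(model)
|             model = train_new_model(data)
|         return model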
| spencerchubb wrote:
| That is an awesome idea. I wish the authors would open source
| the code and weights so this can be tried.
| rprenger wrote:
| This is pretty close to how AlphaZero works.
|
| https://medium.com/applied-data-science/alphago-zero-explain...
| jedberg wrote:
| That's basically how OpenAI works. They use generated
| training sets from one model to train the next model (plus
| other stuff alongside it).
|
| But the "other stuff" is pretty important. That is what pulls
| it away from just constantly re-amplifying the bias in the
| initial training data.
| londons_explore wrote:
| I still want to see some examples of a "mistake" in the
| training data getting detected or reduced.
|
| For example, somewhere in the training data the string "All
| cats are red" should get detected when lots of other data in
| the training set contradicts the statement.
|
| And obviously it doesn't have to be simple logical
| statements, but also bigger questions like "how come the 2nd
| world war happened despite X person and Y person being on
| good speaking terms as evidenced by all these letters in the
| archives?"
|
| When AI can do that, it should be able to turn our body of
| knowledge into a much bigger/more useful one by raising
| questions that arise from data we already have, but never
| noticed.
| og_kalu wrote:
| Well, "without _explicit_ search" would probably be more
| accurate.
|
| They do note that in the paper, though:
|
| >Since transformers may learn to roll out iterative computation
| (which arises in search) across layers, deeper networks may hold
| the potential for deeper unrolls.
| janalsncm wrote:
| We don't know if it's using implicit search either. While it
| would be interesting if the network was doing some internal
| search, it's also possible it has just memorized the
| evaluations from 10M games and is performing some function of
| the similarity of the input to those previously seen.
| og_kalu wrote:
| >We don't know if it's using implicit search either.
|
| Sure
|
| >it's also possible it has just memorized the evaluations
| from 10M games and is performing some function of the
| similarity of the input to those previously seen.
|
| That's not possible. The space of possible chess games is
| incredibly large and it is incredibly easy to play a game
| that has diverged from training. A model that had just
| memorized all evaluations would break within ten or so moves
| at most, much less withstand robust evaluation.
|
| However exactly this model works, and how much or how little
| it relies on search, is unknown, but it is no doubt a model
| of the world of chess.
| https://adamkarvonen.github.io/machine_learning/2024/01/03/c...
| janalsncm wrote:
| If it could reliably win a mate-in-N position without
| inexplicably blundering, I would be more inclined to buy
| your search hypothesis. But it doesn't, which is one of the
| reasons the authors gave for finishing with Stockfish. So
| whatever it's doing is clearly lossy, which an actual search
| would not be.
|
| Neural nets memorize all sorts of things. They memorize ad
| clicks in high dimensional state spaces. Transformers
| trained on the whole internet can often reproduce entire
| texts. It's lossy, but it's still memorizing.
|
| That seems like the simplest explanation for what's
| happening here. There's some sort of lossy memorization,
| not a search. The fact that the thing it has memorized is
| the result of a search doesn't matter.
| og_kalu wrote:
| >If it could reliably win a mate in N position without
| inexplicably blundering, I would be more inclined to buy
| your search hypothesis.
|
| I don't have a "search hypothesis". I don't know what
| strategy the model employs to play. I was simply pointing
| out that limited search learned by the transformer is not
| out of the question. Stockfish finishing is not necessary
| to play chess well above any level at which a memorization
| hypothesis makes sense. This is not the first LLM chess
| machine.
|
| >Neural nets memorize all sorts of things. They memorize
| ad clicks in high dimensional state spaces. Transformers
| trained on the whole internet can often reproduce entire
| texts. It's lossy, but it's still memorizing.
|
| Intelligent things memorize. Humans memorize a lot. I
| never said the model hasn't memorized a fair few things.
| Many human chess grandmaster memorize openings. What i'm
| saying is that it's not playing games via memorization
| any more than a human is doing the same.
|
| >That seems like the simplest explanation for what's
| happening here. There's some sort of lossy memorization,
| not a search.
|
| The options aren't only lossy memorization or lossless
| search.
| dawnofdusk wrote:
| Even if it's "implicit" I'm not sure if that matters that
| much. The point is that the model doesn't explicitly search
| anything, it just applies the learned transformation. If the
| weights of the learned transformation encode a sort of
| precomputed search and interpolation over the dataset, from
| an algorithmic perspective this still isn't search (it
| doesn't enumerate board states or state-action transitions).
|
| >performing some function of the similarity of the input to
| those previously seen.
|
| This is indeed what transformers do. But obviously it learns
| some sort of interpolation/extrapolation which lets it do
| well on board states/games outside the training set.
| salamo wrote:
| I think this is an interesting finding from a practical
| perspective. A function which can reliably approximate Stockfish
| at a certain depth could replace it, basically "compressing"
| search to a set depth. And unlike NNUE, which is optimized for
| CPU, a neural network is highly parallelizable on GPU, meaning you
| could send all possible future positions (at depth N) through the
| network and use the results for a primitive tree search.
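|
| A one-ply version of that idea might look like the following
| (`batch_eval` is a hypothetical batched GPU forward pass; only
| the python-chess parts are real):
|
|     import chess
|
|     def batch_eval(fens):
|         # stub: one batched forward pass on GPU, returning a
|         # win probability for the side to move in each FEN
|         ...
|
|     def best_move_one_ply(board: chess.Board) -> chess.Move:
|         moves = list(board.legal_moves)
|         children = []
|         for move in moves:
|             board.push(move)
|             children.append(board.fen())
|             board.pop()
|         scores = batch_eval(children)  # one call, not a loop
|         # Children are scored for the opponent (side to move),
|         # so pick the move that leaves them worst off.
|         return min(zip(scores, moves), key=lambda t: t[0])[1]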
| paulddraper wrote:
| There is rampant misunderstanding of some parts of this article;
| allow me to help :)
|
| The "no-search" chess engine uses search (Stockfish) in in two
| ways:
|
| 1. To score positions in the training data. _This is only
| training data, no search is performed when actually playing._
|
| 2. To play moves when the position has many options with a 99%
| win rate. _This is to prevent pathological behavior in already
| won positions, and is not "meaningful" in the objective of
| grandmaster-level play._
|
| Thus, "without search" is a valid description.
| zniturah wrote:
| "Aready won position" or "99% win rate" is statistics given by
| Stockfish (or professional chess player). It is weird to assume
| that the same statement is true for the trained LLM since we
| are assessing the LLM itself. If it is using during the game
| then it is searching, thus the title doesn't reflect the actual
| work.
| janalsncm wrote:
| In one sense, I can understand why they would choose to use
| Stockfish in mate-in-N positions. The fact that the model can't
| distinguish between mate in 5 and mate in 3 is an
| implementation detail. Since the vast majority of positions are
| not known to be wins or draws, it's still an interesting
| finding.
|
| However, in reality all positions _are_ actually wins (for
| black or white) or draws. One reason they gave for why
| Stockfish is needed to finish the game is that their
| evaluation function is imperfect, which is also a notable
| result.
| joe_the_user wrote:
| Sort of, but it seems a bit of a cheat.
|
| Neural networks are universal approximators. Choose a function
| and get enough data from it and you can approximate it very
| closely and maybe exactly. If initially creating function F
| required algorithm Y ("search" or whatever), you can do your
| approximation to F and then say "Look F without Y" and for all
| we know, the approximation might be doing things internally
| that are actually nearly identical to the initial F.
| jackblemming wrote:
| No, arbitrarily wide neural networks are approximators of
| Borel-measurable functions. Big difference between that and
| "any function". RNNs are Turing complete, though.
| eldenring wrote:
| You can say the same thing about RNNs. Technically nothing
| is Turing complete without infinite scratch space.
| janalsncm wrote:
| A sufficiently large nn can learn an arbitrary function, yes.
| But Stockfish is also theoretically perfect given infinite
| computational resources.
|
| What is interesting is performing well under reasonable
| computational constraints i.e. doing it faster/with fewer
| flops than stockfish.
| joe_the_user wrote:
| Is the model more efficient than Stockfish? I think
| Stockfish runs on a regular CPU and I'd guess this
| "270M parameter transformer model" requires a GPU, but I
| can't find any reference to efficiency in the paper.
|
| Also found in the paper: "While our largest model achieves
| very good performance, it does not completely close the gap
| to Stockfish 16". It's actually inferior but they still
| think it's an interesting exercise. But that's the thing,
| it's primarily an exercise, like calculating pi to a billion
| decimal places or overclocking a gaming laptop.
| dawnofdusk wrote:
| It doesn't get the actual optimal Q values computed from
| Stockfish (presumably that would take infinite compute to
| calculate); in fact it gets computed estimates from polling
| Stockfish for only 50ms.
|
| So you're estimating from data a function which is itself not
| necessarily optimal. Moreover, the point is more how far we
| can get using a really generic transformer architecture that,
| unlike Stockfish, is not tuned to the domain-specific details
| of the problem.
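|
| For reference, that kind of time-boxed annotation is easy to
| reproduce with python-chess and a local Stockfish binary (a
| re-creation under the paper's stated 50ms budget, not the
| authors' actual pipeline):
|
|     import chess
|     import chess.engine
|
|     engine = chess.engine.SimpleEngine.popen_uci("stockfish")
|     board = chess.Board()
|     # 50ms of search yields an estimate, not the optimal value;
|     # the net is trained on these noisy targets.
|     info = engine.analyse(board, chess.engine.Limit(time=0.05))
|     print(info["score"])  # e.g. PovScore(Cp(+30), WHITE)
|     engine.quit()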
| YeGoblynQueenne wrote:
| >> 1. To score positions in the training data. This is only
| training data, no search is performed when actually playing.
|
| That's like saying you can have eggs without chickens, because
| when you make an omelette you don't add chickens. It's
| completely meaningless and a big fat lie to boot.
|
| The truth is that the system created by DeepMind consists of
| two components: a search-based system used to annotate a
| dataset of moves and a neural-net based system that generates
| moves similar to the ones in the dataset. DeepMind arbitrarily
| draw the boundary of the system around the neural net component
| and pretend that because the search is external to the neural
| net, the neural net doesn't need the search.
|
| And yet, without the search there is no dataset, and without
| the dataset there is no model. They didn't train their system
| by self-play and they certainly didn't hire an army of low-paid
| workers to annotate moves for them. They generated training
| moves with a search-based system and learned to reproduce them.
| They used chickens to make eggs.
|
| Their approach depends entirely on there being a powerful chess
| search engine and they wouldn't be able to create their system
| without it as a main component. Their "without search" claim is
| just a marketing term.
| jedberg wrote:
| Now do Go. :)
| Kon5ole wrote:
| This used to be a comforting thought whenever computers beat
| humans in chess, but I think that time has passed. The paper
| mentions AlphaZero [1], which has beaten AlphaGo, which beat
| Lee Sedol back in 2016 [2].
|
| [1] https://en.wikipedia.org/wiki/AlphaZero
|
| [2] https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
| jedberg wrote:
| My pithy comment probably wasn't enough to express what I
| meant. :)
|
| I know that computers have already beaten humans at Go. But
| what's interesting is that in both the chess and Go cases, a
| lot of real-time compute was necessary to win the games. Now
| we have a potential way to build the model ahead of time such
| that the compute needed during interactive play is much smaller.
|
| This means that we can be much more portable with the
| solution, and it also means that for online game companies,
| they can spend a lot less money on gameplay, especially if
| gameplay is most of their compute.
| IIAOPSW wrote:
| Slightly off topic, but am I the only one who approaches strategy
| games by making a "zeroth-order approximation"? E.g. find the
| shortest path to victory under the (obviously faulty) assumption
| that my opponent does nothing and the board is unchanging except
| for my moves. Now find my opponent's shortest path to victory
| under the same assumption. Then evaluate: if we both just ignore
| each other and try to bum-rush the victory condition, who gets
| there first?
|
| For most games, if you can see a way to an end state within 3-5
| steps under these idealized conditions, there's only so much that
| an actual opponent can do to make the board deviate from the
| initial static board state that you used in your assumption. The
| optimal strategy will always be just a few minor corrections of
| edit distance from this dumb no-theory-of-mind strategy. You can
| always be sure that whoever has the longer path to victory has to
| do something to interfere with the shorter path of their
| opponent, and there's only ever so many pieces which can interact
| with that shorter path. Meaning whatever path to victory is
| currently shortest short-circuits the search for potential moves.
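|
| In the abstract, the heuristic reads something like this (both
| path functions are stubs for game-specific logic):
|
|     def my_shortest_win(state):
|         ...  # stub: plies I need to win if the opponent
|              # never moves and the board stays frozen
|
|     def opp_shortest_win(state):
|         ...  # stub: the same count, for the opponent
|
|     def winning_the_race(state) -> bool:
|         # Both sides bum-rush their own win condition on a
|         # frozen board; the shorter path "wins" the race.
|         return my_shortest_win(state) <= opp_shortest_win(state)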
| xbpx wrote:
| That's why white has a higher statistical win rate ya?
| dayjaby wrote:
| At beginner level this might work, but as people get more
| competitive they begin to realize the benefit of not only
| playing their own game, but also reading the enemy's plan (e.g.
| scouting in Starcraft/AoE2) to counteract it as much as
| possible.
|
| Chess against humans is different. Usually there is no path to
| victory, only to a draw. People just follow strategic plans
| that they were told would be slightly beneficial later on.
| While following that strategic plan people mess up, and the
| first one to realize that the opponent messed up usually wins.
| A material advantage of 2-3 points is usually already
| considered a win.
| jrockway wrote:
| I agree with this. My default approach to board games is
| basically to maximize victory points early. This usually
| works; when 4 people are playing a new game for the first
| time, I usually win. This doesn't really work when people
| know how to play the game specifically, though.
|
| I think this algorithm is better than many other algorithms
| that people come up with, however.
|
| (As an aside, when I play a card game I sort my cards with a
| merge sort instead of an insertion sort. People said you
| would never use these algorithms in real life, but you can if
| you want to!)
| bionsystem wrote:
| A lot of decisions in chess are like "this square would be
| nice for that piece, how can I get it there?", and then you
| analyze what your opponent can do to prevent that, or what
| counterplay it gives him. So what you are doing makes a lot
| of sense.
| jncfhnb wrote:
| Ehh idk. Sounds like it's prone to the beginner strategy of
| assuming your opponent will occasionally do something really
| dumb.
| ivanjermakov wrote:
| I thought clickbaity titles were discouraged in whitepaper world.
| bjourne wrote:
| That must mean they found some similarity metric for high-level
| chess, which is very impressive. In chess one pawn moving one
| square can be the difference between a won and a lost position.
| But knowing that usually requires lots of calculation.
| iamcreasy wrote:
| I guess this method can't be used to have the model explain
| why a given move was played, as it is not planning more than
| one move ahead. Very cool, nonetheless.
| penjelly wrote:
| I don't follow... even if it's trained with search but doesn't
| use it, isn't the act of deciding the next move a sort of
| search anyway, based on its training? I've heard people
| describe LLMs as extremely broad search, basically attempting
| to build a world model and then predicting the next world
| state based on that. Is this fundamentally different from
| search? Am I wrong in my assumptions here?
___________________________________________________________________
(page generated 2024-02-08 23:01 UTC)