[HN Gopher] The neural network of the Stockfish chess engine
___________________________________________________________________
The neural network of the Stockfish chess engine
Author : c1ccccc1
Score : 328 points
Date : 2021-01-13 08:01 UTC (14 hours ago)
(HTM) web link (cp4space.hatsya.com)
(TXT) w3m dump (cp4space.hatsya.com)
| billiam wrote:
| Analyses like this are a great indication that the main effect
| of refining neural networks around chess will be to make neural
| networks more exciting and ultimately make chess more boring.
| gelert wrote:
| Why do you believe that neural networks being better at chess
| would make it more boring? Genuinely I just don't follow the
| logic.
| zetazzed wrote:
| The gap between white and black performance is massive in these
| top engines, if I'm reading it right. LCZero won 0 / lost 4 as
| black and won 24 / lost 0 as white (with lots of draws)? I had
| no idea the split was so big now. Do human tournaments look
| like this too these days? (From https://tcec-chess.com/)
| bonzini wrote:
| Not really---similar for sure, but not as bad for black.
| Consider that computer chess tournaments do not start with the
| pieces in their initial positions. The first 5-15 moves are
| predetermined by human master players to achieve positions that
| are imbalanced while not having a side with a clearer
| advantage. Otherwise it would be even more of a draw fest.
|
| This makes things a bit worse for black, typically, because
| playing for the win as black means making some positional
| concession (that white can exploit after successfully
| defending) in exchange for the initiative and a chance for an
| attack.
|
| In human play, black has a little more ability than white to
| steer the game towards their opening preparation[1]. If they
| want to win, they will try to get a position that they know
| inside and out from (computer-assisted) home preparation. But
| there are still a lot of draws in classical (many hours per
| game) chess.
|
| [1] Chess openings that white chooses are typically called
| "attack" (King's Indian Attack) or "game" (Scotch Game), while
| those that black chooses are called "defense" (Sicilian
| Defense). You'll find that there are a lot more "defenses" that
| black can choose from than "attacks".
| tayo42 wrote:
| > if they want to win, they will try
|
| Why would someone not want to win?
| Germanika wrote:
| Chess matches usually involve multiple games, so you don't
| need to win every game to win the match. It's not uncommon
| to play trying to avoid a loss (going for a draw), instead
| of playing riskier moves in an attempt to force a win.
| bonzini wrote:
| Because they may be content with drawing as black and seek
| an (ever so slightly) easier win as white. It depends on
| many factors: relative player strength, duration of the
| match, how players are paired in tournament rounds, current
| standings in the tournament, and so on. A computer won't do
| this kind of reasoning.
| dsjoerg wrote:
| Overwhelmingly the result is a draw. White can win sometimes,
| and wins for black are very rare. It's similar for top
| grandmasters, but not quite as stark.
| knuthsat wrote:
| This is very nice. It reminds me a lot of tricks used for
| simple linear models. And it seems to work, given that Leela
| Chess Zero is losing by quite a gap.
|
| Most of the time one would train a model by changing just a
| single feature at a time, so redoing the whole sum made no
| sense.
|
| A good example is learning a sequence of decisions, where each
| decision might have a cost associated with it. You can then say
| that the current decision depends on the previous one, and vary
| the previous one to learn to recover from errors: if the
| previous decision was bad, you'd still like to make the best
| decision for the current state.
|
| So even if your training data does not contain such an error-
| recovery example, you can just iterate through all previous
| decisions and make the model learn to recover.
|
| An optimization in that case would be to not redo the whole
| sum (for computing the decision function of a linear model).
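|
| As a minimal sketch of that optimization (plain numpy, made-up
| sizes; not any engine's actual code):
|
|   import numpy as np
|
|   w = np.arange(1000) * 0.01          # weights of a linear model
|   x = np.zeros(1000)
|   x[[3, 7, 42]] = 1.0                 # sparse binary features
|
|   score = w @ x                       # full sum, computed once
|
|   # Feature 7 turns off and feature 99 turns on: adjust the
|   # score with two terms instead of redoing the whole sum.
|   x[7], x[99] = 0.0, 1.0
|   score += -w[7] + w[99]
|
|   assert np.isclose(score, w @ x)     # matches full recomputation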
| kohlerm wrote:
| "Leela Chess Zero is losing with quite a gap" Check
| https://tcec-chess.com/ ATM Leela is in front again. That being
| said it is not clear which approach is better e.g. a very smart
| but relatively slow evaluation function (Leela) or SFs approach
| to use a relatively dumb eval function with a more
| sophisticated search. It is pretty clear that Leelas search can
| be improved (Check https://github.com/dje-dev/Ceres for
| example)
| knuthsat wrote:
| Looks like the tournament is still ongoing. I was referring
| to the last few.
|
| https://en.wikipedia.org/wiki/TCEC_Season_18
|
| https://en.wikipedia.org/wiki/TCEC_Season_19
| kohlerm wrote:
| That being said, assuming you can fully saturate multiple
| GPUs then the SF approach has the disadvantage that it cannot
| use the performance improvements for GPUs/TPUs which still
| are growing fast, whereas CPU performance only grows slowly
| these days.
| stabbles wrote:
| The Stockfish mentality is: not everybody owns a GPU and
| Stockfish should be available for everybody and perform
| well [1]. So they went for a CPU micro-arch optimized
| neural net, which is great. Maybe this is ultimately not
| the best for tournaments.
|
| What I would find interesting is if they could give engines
| an energy budget instead of a time limit. Maybe that makes
| CPU vs GPU games more interesting & fair.
|
| [1] https://github.com/official-stockfish/Stockfish/issues/2823
| yters wrote:
| They should also set an energy budget for human vs
| computer matches.
| kohlerm wrote:
| IMHO they did this because it was the only incremental
| way to improve strength. They always relied on a fast
| search with a relatively simplistic eval function, and
| alpha-beta search works well for this approach. What is
| not clear is whether it works well for the Leela
| approach. All attempts so far at least have not been
| successful (could be for other reasons, such as the
| training approach). SF's strength seems to be its well
| optimized search function. Rewriting that would IMHO be
| equivalent to creating a new chess engine. That being
| said, I still think that the success of SF NNUE is very
| remarkable.
| nl wrote:
| Generally the GPU vs CPU gap on the evaluation side isn't
| nearly as big as in training. In theory the gap should
| remain large, but in practice things like the inability to
| have data ready for batch evaluation means it is harder to
| saturate the GPU (which as you note is important).
|
| But I'm not sure how much of this applies to chess engines.
| I see some comments noting that the search part makes it
| hard to generate batches, but I'm not an expert in this.
| confuseshrink wrote:
| Interesting point. Nvidia have been improving the int
| performance for quantized inference on their GPUs a lot. It
| might be a lot of work but could it be possible to scale up
| this NNUE approach to the point where it would be
| worthwhile to run on a GPU?
|
| For single-input "batches" (seems like this is what's being
| used now?) it might never be worthwhile but perhaps if
| multiple positions could be searched in parallel and the NN
| evaluation batched this might start to look tempting?
|
| Not sure what the effect of running PVS with multiple
| parallel search threads is. Presumably the payoff of
| searching with less information means you reach the
| performance ceiling quite a lot quicker than MCTS-like
| searches as the pruning is a lot more sensitive to having
| up-to-date information about the principal variation.
|
| Disclaimer: My understanding of PVS is very limited.
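|
| A rough sketch of the batching intuition (made-up sizes, plain
| numpy; the point is that one matrix-matrix product gives the
| hardware far more parallel work than many matrix-vector
| products):
|
|   import numpy as np
|
|   rng = np.random.default_rng(0)
|   W = rng.standard_normal((256, 1024)).astype(np.float32)
|
|   positions = rng.standard_normal((1024, 64)).astype(np.float32)
|
|   # Single-position "batches": 64 separate matrix-vector products.
|   singles = np.stack([W @ positions[:, i] for i in range(64)],
|                      axis=1)
|
|   # Batched: one matrix-matrix product over all 64 positions.
|   batched = W @ positions
|
|   assert np.allclose(singles, batched)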
| kohlerm wrote:
| Sure, if someone can come up with an approach to run an
| NNUE (efficiently updatable) network on GPUs, that might
| really be another breakthrough. But at first glance it
| looks to me like this could be very difficult, because
| AFAIK the SF search is relatively complicated. Even for
| Leela, implementing an efficient batched search on
| multiple GPUs seems to be difficult (some improvements
| are coming with Ceres), and Leela is using a much simpler
| MCTS search. That doesn't mean that Leela's search could
| not be improved. It does not necessarily give higher
| priority to forced sequences of moves (at least not
| explicitly) or high-risk moves, which is IMHO why she
| sometimes does not see relatively simple tactics.
| spiantino wrote:
| Great writeup!
|
| One thing I don't understand is why it wouldn't be smarter to
| augment the inputs with some of the missing board information,
| particularly the availability of castling. Even though this
| network is a quick-and-dirty evaluation, it seems like there's
| room for a few additional bits in the input, and being able to
| castle very much changes the evaluation of some common
| positions.
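|
| For concreteness, a minimal sketch of what those extra bits
| could look like (hypothetical layout; N_PIECE_FEATURES is a
| placeholder for whatever piece-square encoding the network
| already uses):
|
|   import numpy as np
|
|   N_PIECE_FEATURES = 4096   # placeholder for the existing input
|
|   def encode(piece_features, castling):
|       """castling: dict with keys 'WK', 'WQ', 'BK', 'BQ' -> bool."""
|       extra = np.array([castling['WK'], castling['WQ'],
|                         castling['BK'], castling['BQ']],
|                        dtype=np.float32)
|       return np.concatenate([piece_features, extra])
|
|   x = encode(np.zeros(N_PIECE_FEATURES, dtype=np.float32),
|              {'WK': True, 'WQ': True, 'BK': False, 'BQ': True})
|   assert x.shape == (N_PIECE_FEATURES + 4,)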
| mattnewton wrote:
| I am not an expert (and don't have any idea what I am talking
| about), but wouldn't this be captured the same way other future
| positional advantages are when evaluating the next "level" of
| possible board positions? Ie, the fact that a piece can later
| give check, or castle, or anything else if you first move it
| here is a function of that next move's resulting position
| score.
|
| This network is just to signal to the final search algorithm
| how good the board looks in a given spot on tree of possible
| moves.
| Fragoel2 wrote:
| I don't know a lot about chess but I have one question: isn't
| chess a solved game (in the sense that given a board state we
| can always compute the right move)? Why use a neural network
| that can introduce mistakes a very small percentage of the
| time? I guess it is for performance reasons?
| aliceryhl wrote:
| No, it is not solved. To solve it, you would need to
| investigate all possible continuations of the game, all the way
| until the game ends, but there are so many continuations that
| this is impossible.
| nullc wrote:
| A game could be solved even if its state space was too large
| to enumerate, for example if you could show that there was
| some latent structure that let you prove things about large
| parts of the state space without completely exploring it.
|
| This is, for example, why we have a concrete number for the
| number of legal go positions even though there are far too
| many to enumerate.
| https://en.wikipedia.org/wiki/Go_and_mathematics#Legal_posit...
|
| The rules of chess are so irregular that it seems likely that
| any latent structure would be tremendously complex, but you
| never know what some clever new analysis technique will do.
| hiq wrote:
| https://en.wikipedia.org/wiki/Solving_chess
| elcomet wrote:
| Chess is definitely not a solved game. The end game is solved
| though, when you have only 7 pieces left on the board.
|
| See
| https://en.wikipedia.org/wiki/Endgame_tablebase#Computer_che...
| mvanaltvorst wrote:
| You can imagine playing chess as a tree. You start with a game
| state, and every possible move is an edge to a new game state.
| Unfortunately, the number of game states grows approximately
| exponentially (e.g. for every game state there are
| approximately 40 moves to play, thus every extra move
| multiplies #(game states) by roughly 40). This neural
| engine is a trick so that Stockfish does not have to simulate
| all 40 moves at every game state. The neural net outputs the
| moves that are most likely to be strong moves, and Stockfish
| will only consider those moves. Of course, this is a very basic
| explanation and there are many more optimisations Stockfish
| uses, though the ultimate goal of almost every optimisation is
| to reduce the amount of simulation Stockfish has to do.
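|
| To put numbers on that exponential growth, using the ~40-move
| branching factor above:
|
|   # Rough node counts for a game tree with branching factor ~40.
|   BRANCHING = 40
|   for depth in range(1, 7):
|       print(depth, BRANCHING ** depth)
|   # Depth 6 already gives 4,096,000,000 positions, which is why
|   # engines prune instead of simulating every move.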
| V-2 wrote:
| _" This neural engine is a trick so that Stockfish does not
| have to simulate all 40 moves every game states."_
|
| This optimization _always_ done by chess engines, it 's
| called pruning (they'd be quite crippled without it).
|
| Maybe the neural network component is now in charge of it,
| but it's not a new thing.
| mvanaltvorst wrote:
| Of course, I had lumped all other pruning strategies
| under the "other optimisations".
| blt wrote:
| From the article:
|
| > _it's a simple feedforward network with... a single scalar
| output to give a numerical score for the position, indicating
| how favourable it is for the player about to move._
|
| The search algorithm proposes moves, computes the board state
| resulting from them, and evaluates each state.
|
| This may seem like a trivial distinction, but it's important.
| The Stockfish method requires knowing the rules of the game;
| networks that output a move directly do not.
| bbojan wrote:
| >> The neural net outputs the moves that are most likely to
| be strong moves, and Stockfish will only consider those
| moves.
|
| I don't think this is correct. According to the article, they
| are only using the neural network for evaluating the value
| (strength) of the positions. The search is still the same.
|
| I believe Leela works in the manner you described.
| dan-robertson wrote:
| Chess is not a solved game (though I think the stalemate rules
| make it finite, there are far too many positions). All computer
| chess programs have to try to determine if positions will be
| good without playing out all possible ways the game could go.
| Usually this is some combination of search strategies and
| scoring positions (the article describes how stockfish scores
| positions).
| V-2 wrote:
| I agree it's finite, but why do you think it's specifically
| owed to the stalemate rule, of all things?
| elcomet wrote:
| I believe it's the repetition rule that makes the game
| finite, not the stalemate rule, since the number of
| positions is finite.
|
| There is also the 50-move rule, of course, which limits
| the length of a game more drastically.
| dan-robertson wrote:
| I think this is what I was referring to.
| V-2 wrote:
| OK, so that's not a stalemate. A stalemate is a situation
| in which one of the players has no legal moves (but isn't
| in check, which would be checkmate).
|
| It's a typical mistake to equate chess draws with a
| stalemate, possibly due to the broader, idiomatic meaning
| of "stalemate" in common speech.
| V-2 wrote:
| Right, that would make more sense to me. Although it
| should be stressed that neither threefold repetition nor
| the 50-move rule work automatically. They need to be
| claimed by one of the players. If neither player does,
| the game can go on.
| jstanley wrote:
| The 75-move rule does work automatically though, so the
| game is still finite.
| V-2 wrote:
| Frankly, I wasn't aware of the 75-move rule! This indeed
| caps the length of possible games.
|
| The threefold repetition rule doesn't have an equivalent
| (there's no, say, "fivefold repetition" draw that could
| be enforced despite the players' will), but it's really
| irrelevant from the perspective of solving chess, because
| there's no point in analyzing what happens if the game
| goes through the same loop 1, 10 or 100 times.
|
| And the 75-move rule sets a hard limit anyway, so you
| couldn't loop infinitely, even if it mattered.
| anandoza wrote:
| Lichess.org actually does have a 5fold repetition rule
| that applies without players claiming it --
| https://lichess.org/faq#threefold
|
| Kind of funny that your hypothetical example was exactly
| reality in this case.
| harryposner wrote:
| > there's no, say, "fivefold repetition" draw that could
| be enforced despite the players' will
|
| There is exactly such a rule. See the FIDE Laws of Chess
| [0], rule 9.6.1.
|
| [0] https://handbook.fide.com/chapter/E012018
| danielbarla wrote:
| Chess is estimated to have 10^120 possible games. The
| observable universe is estimated to have between 10^78 and
| 10^82 atoms. So, unless you had some way of dramatically
| compressing or simplifying this search space, you'd need to
| store 10^38+ games per atom, even if you had every atom in the
| observable universe as your memory. Suffice to say, this has
| not yet been achieved.
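|
| The arithmetic, spelled out with the estimates above:
|
|   import math
|
|   games = 10 ** 120   # estimated number of possible chess games
|   atoms = 10 ** 82    # upper estimate for the observable universe
|
|   print(games // atoms)    # 10^38 games per atom of storage
|   print(math.log2(games))  # ~398.6 bits just to index one game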
| altvali wrote:
| You don't need to care about possible games, only game states:
| for each one, what's the best achievable result and the move
| that achieves it. Another user computed an upper bound of
| 8.7E+45 positions: https://github.com/lechmazur/ChessCounter .
| I pointed out that our planet has about 1E+50 atoms. And we can
| further compress the database by perhaps two orders of
| magnitude if we don't store symmetric positions or positions
| close to mate. A Kardashev 2 civilization could play perfect
| chess.
| goatlover wrote:
| > A Kardashev 2 civilization could play perfect chess.
|
| Assuming they would want to waste part of a planet's worth
| of computing resources to play perfect chess. In any case,
| it's far beyond anything we can compute.
| kevincox wrote:
| You don't need to consider every state individually to solve
| a game. For example if you have a game on a 100x100 board and
| the only goal is to move your piece to the other side you can
| know you have found the optimal solution if you can do it in
| 99 moves (assuming that you move 1 per turn).
|
| In the chess scenario if you can prove that there is a set of
| moves where you win no matter what the other colour does then
| you have solved the game. You don't need to consider every
| possible game, just the games reachable within this set of
| moves.
|
| Solving chess would be proving that "from the starting position
| black can win every time" (or the same for white, or that both
| sides can force a draw). You don't need to prove how to win
| from every possible board state.
| pretendscholar wrote:
| My money is on perfect play drawing every time.
| kevincox wrote:
| It would be interesting to take bets on the solution :)
| wizzwizz4 wrote:
| But you won't find a set of moves. You have to find a
| _tree_ of moves, which is substantially larger.
| kevincox wrote:
| This is true. It is still a massive problem space but my
| point is that you don't need to consider every possible
| game or state.
| wizzwizz4 wrote:
| Not to _prove_ it always-winning; you only have to go
| down every possible branch for the other player for that
| (i.e. roughly the square root of the number of states
| you'd otherwise need). To _find_ it, though? Well, unless
| you get lucky, you're going to be considering pretty much
| every possible game or state, except where you can find
| shortcuts.
| kevincox wrote:
| It is a bit more than that, as it is unlikely that the
| always-win sequence would be a fixed list of moves. You
| would need to show that for each move the other player
| makes, you have a reaction that also wins.
| danielbarla wrote:
| I agree, and kind of included this in the "so, unless you
| had some way of dramatically compressing or simplifying
| this search space..." disclaimer.
|
| Having said this, the nature of many chess endgames
| suggests that such a proof is not really possible, or at
| least would not be "simple". As an example, tempo /
| opposition flips games from draws to wins, etc.
| zone411 wrote:
| No, it's not close to being solved. I made a new estimate
| recently that there are about 8.7E+45 unique positions:
| https://github.com/lechmazur/ChessCounter. You don't need to
| analyze all of them to generate the proof, but it's still far
| beyond our computational and storage capacities to solve chess.
| altvali wrote:
| For comparison, there are about 1E+50 atoms making up our
| planet.
| T-hawk wrote:
| You're correct, in that given a board state the right move can
| always be exhaustively computed, because the game has no
| randomness or hidden information. But enough computing power
| doesn't exist on the planet to iterate through all the possible
| states.
|
| The word you're looking for is "deterministic", rather than
| solved. Solved is usually used to mean the exhaustive
| computation _has_ been done. Deterministic would mean it
| _could_ be if enough computing power existed, but for chess it
| doesn't.
| tspduck wrote:
| I have to disagree with the others. Since AlphaZero, I think
| chess is close to being solved. Technically it is not, but
| practically it is.
|
| Chess is a hard game to solve completely, and I think one
| reason is that there are many states in chess where there is
| not one superior move, but only a probably optimal move, in the
| sense that the game-tree for winning against the move is
| smaller than for other moves. Then the agent/player has to
| guess what the opponent's strategy is given that move, and that
| depends on the opponent.
|
| Its play is truly creative, and its results speak for
| themselves.
|
| From wiki: "In AlphaZero's chess match against Stockfish 8
| (2016 TCEC world champion), each program was given one minute
| per move. Stockfish was allocated 64 threads and a hash size of
| 1 GB,[1] a setting that Stockfish's Tord Romstad later
| criticized as suboptimal. AlphaZero was trained on chess for a
| total of nine hours before the match. During the match,
| AlphaZero ran on a single machine with four application-
| specific TPUs. In 100 games from the normal starting position,
| AlphaZero won 25 games as White, won 3 as Black, and drew the
| remaining 72. In a series of twelve, 100-game matches (of
| unspecified time or resource constraints) against Stockfish
| starting from the 12 most popular human openings, AlphaZero won
| 290, drew 886 and lost 24."
|
| source: https://en.wikipedia.org/wiki/AlphaZero#Chess
| zone411 wrote:
| You're making a jump from "AlphaZero plays better than the
| best programs of its time" to "chess is close to being
| solved" but that's not justified without much more evidence.
| There are still a lot of improvements and new ideas added to
| the top programs and even if they reach a plateau, you'd have
| to show that increasing time per move 100x doesn't result in
| a significant improvement in playing strength. We don't know
| what would be the Elo difference between a 32-piece tablebase
| and the best existing program but I don't think anybody would
| bet that it's less than 200 Elo.
|
| Here is a graph of Stockfish's progress since the end of 2015:
| https://camo.githubusercontent.com/f169f774996346ad146f96f74...
| Self-play exaggerates strength differences, but it's been a
| very good rate of progress, even without hardware improvements
| in the meantime.
| tialaramex wrote:
| Right!
|
| A _solved_ game is categorically different from merely
| teaching a machine to be better at it than humans are
| presently.
|
| Cepheus [0] is approximately a solution to Heads-Up (two
| player) Limit (ie the amount you can bet is specified in
| the game, you don't get to just bet arbitrary money) Texas
| Hold 'em poker.
|
| In contrast Libratus is an AI that plays Heads-Up _No
| Limit_ Hold 'em very well against humans.
|
| You can examine the Cepheus strategy for yourself. If you
| could memorize it (it's too complicated) and were capable
| of making truly random decisions (poker is a game of chance,
| so your strategy needs random action), you could reproduce
| it and be exactly as good at the game as Cepheus is. You can
| examine individual strategy elements and reason about them.
| For example, if Cepheus gets an 8 and a 3 and you open with
| a bet, it will call your bet if they're the same suit;
| otherwise it will fold. On the other hand, if those suited
| cards were an 8 and a Queen, it would raise a bit more than
| nine times out of ten.
|
| You can't do anything about it. You might be thinking that
| surely knowing that e.g. Cepheus folds 8-3 offsuit here is
| valuable, so you could benefit from that. Er, no: "hiding"
| that by sometimes playing the hand loses more money than
| Cepheus gives up by folding it, which is why this is a
| perfect strategy. If you play Cepheus, even with an
| incrementally "more" approximate solution, you're not going
| to win a significant amount of money reliably in reasonable
| time. That's why it's approximately solved.
|
| But Libratus isn't like that, it's playing some strategy
| that we know beat world class human players, but a further
| incrementally better AI might crush Libratus just as badly.
|
| [0] http://poker.srv.ualberta.ca/preflop
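|
| A tiny sketch of what such a mixed strategy looks like
| mechanically (made-up numbers, not Cepheus's actual tables):
|
|   import random
|
|   # Hypothetical strategy table: hand -> action probabilities.
|   strategy = {
|       ("8", "3", "offsuit"): {"fold": 1.0, "call": 0.0,
|                               "raise": 0.0},
|       ("Q", "8", "suited"):  {"fold": 0.0, "call": 0.08,
|                               "raise": 0.92},
|   }
|
|   def act(hand):
|       """Sample an action according to the mixed strategy."""
|       dist = strategy[hand]
|       actions, probs = zip(*dist.items())
|       return random.choices(actions, weights=probs)[0]
|
|   print(act(("Q", "8", "suited")))  # "raise" over 9 times in 10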
| kohlerm wrote:
| No, it is not solved. E.g. SF's recent inclusion of an NN
| has shown that it can beat Leela (which should be superior
| to AlphaZero).
| hiq wrote:
| > there are many states in chess where there is not one
| superior move, but only a probable optimal move
|
| I don't think this statement is true given [0], but maybe I
| understood you wrong.
|
| [0]: https://en.wikipedia.org/wiki/Zermelo%27s_theorem_(game_theo...
| magicalhippo wrote:
| I think you misunderstood, and that the point was that in
| chess, especially during the mid-game phase, there's very
| often not a single clearly superior move but rather several
| moves that apparently improve your position by roughly the
| same amount.
|
| The uncertainty is due to the inability to scan the entire
| sub-tree for each move.
|
| The theorem you linked to would only be applicable in
| certain endgame situations.
| V-2 wrote:
| The same could be argued for tic-tac-toe (which is
| obviously and trivially solved). There are many positions
| in which there's more than one move that is perfectly
| playable.
| hiq wrote:
| > there's very often not a single clearly superior move
|
| Maybe it's not clear to human players or current engines,
| but that does not mean it doesn't exist, and indeed the
| theorem mentioned above states that at least one player
| has an optimal strategy (either forcing a draw or
| winning). Obviously, that does not necessarily mean that
| we'll ever be able to design an engine that can compute
| this strategy in a reasonable time.
|
| I just think that OP meant something other than the usual
| definition of "solved", maybe they meant that engines are
| plateauing, but as others have pointed out, there is also
| little evidence for that for now.
| magicalhippo wrote:
| I think I got a bit hung up on the following from
| Wikipedia:
|
| _It says that if the game cannot end in a draw, then one
| of the two players must have a winning strategy (i.e.
| force a win)._
|
| It made me think it didn't apply in situations where the
| other player could force a draw.
|
| Reading the linked article[1] though made it more clear
| what it's about, and I agree with what you said.
|
| [1]:
| http://www.math.harvard.edu/~elkies/FS23j.03/zermelo.pdf
| benlivengood wrote:
| > The uncertainty is due to the inability to scan the
| entire sub-tree for each move.
|
| That uncertainty is specifically why chess is unsolved.
| magicalhippo wrote:
| Indeed, I was just being a bit pedantic by pointing it
| out.
|
| edit: so to make it crystal clear, I didn't try to argue
| chess was solved, just to point out why I think that
| theorem doesn't really apply to most of chess.
| LittlePeter wrote:
| Leela played 200 games against Stockfish and won 106 to 94 [1].
| Not sure which version of Stockfish was used.
|
| Some of the Leela-Stockfish games are analyzed by agadmator on
| YouTube [2].
|
| [1] https://www.chess.com/news/view/13th-computer-chess-champion...
|
| [2] https://www.youtube.com/watch?v=YtXZjKItuC8
| zone411 wrote:
| This match was played before the NNUE version of Stockfish was
| introduced. Stockfish NNUE beat LC0 in TCEC season 19:
| https://www.chessprogramming.org/TCEC_Season_19
| thomasahle wrote:
| The chess.com version is the old Stockfish from mid 2020. The
| NNUE architecture was only put in place around August. If you
| see https://github.com/glinscott/fishtest/wiki/Regression-Tests
| you'll notice that it gave a very significant (100+ Elo) boost.
|
| There is a tournament currently going on at tcec-chess.com,
| which Stockfish has been leading so far, but I see Leela has
| just caught up in the head-to-head.
|
| Of course Leela also keeps evolving.
| kohlerm wrote:
| FYI Leela is in the lead ATM
| dmurray wrote:
| It's difficult to organise Leela-Stockfish as a fair fight,
| because they run on different hardware (CPU vs GPU) and both
| get substantial improvements by playing on better hardware.
|
| Traditionally this wasn't a big problem as every engine was
| more or less optimised for a fast Intel CPU with a moderate to
| large amount of RAM. The organisers would decree the specs of
| the championship hardware some time in advance. Now, (at least
| for TCEC, the other major engine tournament) they pick two
| hardware configurations, one CPU-heavy and one GPU-heavy, and
| give each team the choice.
|
| How do you balance those? It's been suggested you should make
| them equal in terms of watts of power, or dollar cost to buy,
| but neither of those are obviously best. In practice the TCEC
| organisers pick something close to what they picked last time
| but shade it against the winning engines, making the contest
| more even. Chess.com likely do something similar though they're
| less rigorous about the details.
| criddell wrote:
| > It's been suggested you should make them equal in terms of
| watts of power, or dollar cost to buy, but neither of those
| are obviously best.
|
| Do they give these computers a rating like they do human
| players?
| dmurray wrote:
| Yes, there are several computer chess rating lists, but you
| run into the same problems once you want to compare a CPU
| engine with a GPU one: how do you account for the hardware?
| Really a rating, as a measure of playing strength, should
| apply to a hardware/software combination, but we almost
| always want to compare the software. The most respected
| lists used to be the SSDF [0] and CCRL [1] lists but they
| both ignore the GPU issue. There's now the CEGT list [2]
| which I don't know anything about, but which does seem to
| use heterogeneous hardware.
|
| Note that the numbers aren't calibrated to human players,
| so there's no claim that an engine with a rating of 3100 on
| some particular hardware should score 85% against a human
| rated 2800, or whatever.
|
| [0] http://ssdf.bosjo.net/list.htm
| [1] https://ccrl.chessdom.com/ccrl/404FRC/
| [2] http://www.cegt.net/40_40%20Rating%20List/40_40%20SingleVers...
| dan-robertson wrote:
| It seems that the strategy is to use the neural network to score
| various moves and then a search strategy to try to find moves
| that result in a favourable score. And this post focuses on some
| of the technical engineering details to design such a scoring
| network. In particular the scoring is split into two parts: 1. A
| matrix multiplication by a sparse input vector to get a dense
| representation of a position, and 2. Some nonlinear and further
| layers after this first step. And it seems that step 1 is
| considered more expensive.
|
| The way this is made cheap is by making it incremental: given
| some board s and the output of the first layer b + Ws, it is
| cheap to compute b + Wt where t is a board that is similar to s
| (the difference is W(t-s), and the vector t-s is 0 in almost
| every element).
|
| This motivates some of the engineering choices, like using
| integers instead of floats. If you used floats then this
| incremental update wouldn't work.
|
| It seems to me that a lot of the smarts of Stockfish will be in
| the search algorithm getting good results, but I don't know if
| that just requires a bit of parallelism (surely some kind of
| work-stealing scheduler) and brute force or if it mostly relies
| on some more clever strategies. And maybe I'm wrong and the key
| is really in the scoring of positions.
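|
| A minimal numpy sketch of that incremental first layer (made-up
| sizes; the real network uses ~41k input features and int16/int8
| vectorized arithmetic):
|
|   import numpy as np
|
|   rng = np.random.default_rng(0)
|   IN, OUT = 4096, 256
|   W = rng.integers(-64, 64, size=(OUT, IN))  # integer weights
|   b = rng.integers(-64, 64, size=OUT)
|
|   s = np.zeros(IN, dtype=np.int64)           # sparse board features
|   s[[100, 200, 300]] = 1                     # a few active features
|
|   acc = b + W @ s                  # full first layer, computed once
|
|   # A move flips only a couple of features: the moved piece's
|   # old square turns off, its new square turns on.
|   s[200], s[4000] = 0, 1
|   acc = acc - W[:, 200] + W[:, 4000]  # O(OUT), not O(OUT * IN)
|
|   assert np.array_equal(acc, b + W @ s)  # exact: it's integer math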
| eutectic wrote:
| I don't think the integer weights are necessary for sparsity;
| they are just faster because they allow for low precision.
|
| Of course float addition isn't strictly associative, so you
| wouldn't get bitwise equivalence between the incremental and
| non-incremental updates, but I don't see how that would matter
| in this context.
| dan-robertson wrote:
| The point is that integer weights allow incremental
| updates to be correct. If you used floats then they would
| drift away from correct as you applied more incremental
| updates, and neural networks can be quite sensitive to
| small errors.
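|
| A quick illustration of the drift (contrived, float32): turn
| ten thousand "features" on and then off again, and the
| accumulator does not come back exactly to zero; with integer
| weights the same round trip is exact.
|
|   import numpy as np
|
|   rng = np.random.default_rng(0)
|   vals = rng.standard_normal(10000).astype(np.float32)
|
|   acc = np.float32(0.0)
|   for v in vals:            # incremental "add feature" updates
|       acc = acc + v
|   for v in vals[::-1]:      # the same features removed again
|       acc = acc - v
|
|   print(acc)                # typically not exactly 0.0 any more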
| eutectic wrote:
| With 32 bits of precision I don't see it being a big
| problem over a maximum of say 50 moves. Neural networks
| which are not overfit should be tolerant of a small amount
| of random (not crafted) noise.
| [deleted]
| recursive wrote:
| 64-bit floats have 53 bits of precision. 32-bit floats
| have 24.
|
| The rest goes into sign and exponent.
| dan-robertson wrote:
| Maybe the network uses integers for other reasons.
| Perhaps due to the history of the program. I don't know
| how the search works but I would imagine it would
| consider more than 50 moves with lots of backtracking and
| trying different possible moves.
| thomasahle wrote:
| > It seems that the strategy is to use the neural network to
| score various moves and then a search strategy to try to find
| moves that result in a favourable score
|
| The neural net is only for scoring positions, not moves. Check
| the article on Stockfish NNUE at the chess programming wiki:
| https://www.chessprogramming.org/Stockfish_NNUE
|
| There is a LOT of cleverness built into the stockfish search as
| well.
|
| In fact, while neural networks have now won the position
| evaluation game, it is still an open discussion in the chess
| programming community whether the AlphaZero / Leela search (NN
| + Monte Carlo) is as good as Stockfish's PVS.
| ogogmad wrote:
| PVS?
| zone411 wrote:
| https://www.chessprogramming.org/Principal_Variation_Search
| zirkonit wrote:
| I presume, Principal Variation Search
| (https://www.chessprogramming.org/Principal_Variation_Search)
| brilee wrote:
| Ironically, a lot of the tricks Stockfish is using here are
| reminiscent of tricks that were used in the original AlphaGo and
| later discarded in AlphaGoZero.
|
| In particular, the AlphaGo paper mentioned four neural networks
| of significance:
|
| - a policy network trained on human pro games.
|
| - an RL-enhanced policy network improving on the original
| SL-trained policy network.
|
| - a value network trained on games generated by the RL-enhanced
| policy network.
|
| - a cheap policy network trained on human pro games, used only
| for rapid rollout simulations.
|
| The cheap rollout policy network was discarded because DeepMind
| found that a "slow evaluations of the right positions" was better
| than "rapid evaluations of questionable positions". The
| independently trained value network was discarded because co-
| training a value and policy head on a shared trunk saved a
| significant amount of compute, and helped regularize both
| objectives against each other. The RL-enhanced policy network was
| discarded in favor of training the policy network to directly
| replicate MCTS search statistics.
|
| The depth and branching factor in chess and Go are different, so
| I won't say the solutions ought to be the same, but it's
| interesting nonetheless to see the original AlphaGo ideas be
| resurrected in this form.
|
| The incremental updates are also related to Zobrist Hashing,
| which the Stockfish authors are certainly aware of.
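|
| For reference, the Zobrist trick has the same incremental
| flavor (a minimal sketch with a made-up piece/square encoding):
|
|   import numpy as np
|
|   rng = np.random.default_rng(0)
|   # One random 64-bit key per (piece type, square).
|   KEYS = rng.integers(0, 2**63, size=(12, 64), dtype=np.int64)
|
|   def full_hash(pieces):
|       """pieces: list of (piece_type, square) pairs."""
|       h = np.int64(0)
|       for p, sq in pieces:
|           h ^= KEYS[p, sq]
|       return h
|
|   h = full_hash([(0, 12), (5, 60), (11, 3)])
|
|   # Moving piece 0 from square 12 to 28 updates the hash with
|   # two XORs instead of rehashing the whole board.
|   h ^= KEYS[0, 12] ^ KEYS[0, 28]
|
|   assert h == full_hash([(0, 28), (5, 60), (11, 3)])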
| mattalex wrote:
| It's also different because Stockfish uses alpha-beta
| tree search instead of MCTS. MCTS tries to deal with an
| explosion in search space by only sampling very small parts of
| the search space and relying on a very good heuristic to guide
| that search process. In this case it is crucial to find the
| most relevant subset of the tree to explore, so spending
| more time on your policy makes sense.
|
| Alpha-beta pruning, however, explores the entire search
| tree systematically (up to a certain depth, using iterative
| deepening) and prunes it by discarding bad moves. In this
| case you don't need as good an evaluation function, because
| you search the entire tree anyway; rather, you need the
| function to be fast, as it is evaluated on many more states.
|
| In general alpha-beta pruning only needs the heuristic for
| estimating the tail end of the tree and for sorting the
| states based on usefulness, while MCTS uses all of the above
| plus guiding the whole search process. Spending tons of time
| on the heuristic is not useful, as even the optimal search
| order can only double your search depth. Don't get me wrong,
| that's still a lot (especially considering the exponential
| blowup), but MCTS can surpass this depth easily. The
| disadvantage is that MCTS loses a lot of the guarantees of
| alpha-beta pruning and tends to "play down to its opponent"
| when trained using self-play, because the exploration order
| is entirely determined by the policy.
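|
| A bare-bones sketch of alpha-beta in negamax form over a toy
| game, just to show where the leaf heuristic and the pruning
| come in (the toy "game" is an assumption, nothing like
| Stockfish's actual search):
|
|   import math
|
|   def moves(pos):                   # toy game: int positions,
|       return [pos * 3 + k for k in (1, 2, 3)]   # 3 moves each
|
|   def evaluate(pos):                # fake heuristic in [-100, 100]
|       return (pos * 2654435761) % 201 - 100
|
|   def alphabeta(pos, depth, alpha, beta):
|       """Negamax with alpha-beta pruning, scored for the side
|       to move."""
|       if depth == 0:
|           return evaluate(pos)      # cheap heuristic at the leaves
|       for m in moves(pos):
|           score = -alphabeta(m, depth - 1, -beta, -alpha)
|           if score >= beta:
|               return beta           # cutoff: refuted, stop searching
|           alpha = max(alpha, score)
|       return alpha
|
|   print(alphabeta(0, 6, -math.inf, math.inf))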
| pcwelder wrote:
| Using a previous Stockfish scorer, they trained an NN without
| any labelling effort. This is similar to how unsupervised
| translation is done in some methods: they start from word-to-
| word dictionary results and iteratively train lang1->lang2 and
| lang2->lang1 models feeding on each other's output.
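|
| A schematic sketch of that bootstrapping loop (everything here
| is a stand-in: classical_eval plays the role of the old scorer,
| and a linear least-squares fit stands in for training the net):
|
|   import numpy as np
|
|   rng = np.random.default_rng(0)
|
|   def classical_eval(x):
|       # Stand-in for the old hand-written evaluation.
|       return x @ np.linspace(-1.0, 1.0, x.shape[-1])
|
|   # 1. Generate positions (here: random feature vectors).
|   X = rng.standard_normal((5000, 64)).astype(np.float32)
|
|   # 2. Label them with the existing scorer: no human labels.
|   y = classical_eval(X)
|
|   # 3. Fit the student model to imitate the teacher.
|   w, *_ = np.linalg.lstsq(X, y, rcond=None)
|
|   print(np.abs(X @ w - y).max())  # fit error vs. teacher labels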
| glinscott wrote:
| If anyone wants to experiment with training these nets, it's a
| great way to get exposed to a nice mix of chess and machine
| learning.
|
| There are two trainers currently, the original one, which runs on
| CPU: https://github.com/nodchip/Stockfish, and a pytorch one
| which runs on GPU: https://github.com/glinscott/nnue-pytorch.
|
| The SF Discord is where all of the discussion/development is
| happening: https://discord.gg/KGfhSJd.
|
| Right now there is a lot of experimentation with adjusting the
| network architecture. The current leading approach is a much
| larger net which takes in attack information per square (e.g.
| is this piece attacked by more pieces than it is defended by?).
| That network is a little slower, but the additional information
| seems to be enough to make it stronger than the current
| architecture.
|
| Btw, the original Shogi developers really did something amazing.
| The nodchip trainer is all custom code, and trains extremely
| strong nets. There are all sorts of subtle tricks embedded in
| there as well that led to stronger nets. Not to mention, getting
| the quantization (float32 -> int16/int8) working gracefully is a
| huge challenge.
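|
| A minimal sketch of the float-to-int8 step being described
| (symmetric per-tensor scaling; the real pipeline has many more
| tricks):
|
|   import numpy as np
|
|   rng = np.random.default_rng(0)
|   w_f32 = rng.standard_normal((256, 32)).astype(np.float32)
|
|   # Map the largest weight magnitude to 127.
|   scale = np.abs(w_f32).max() / 127.0
|   w_i8 = np.clip(np.round(w_f32 / scale), -127, 127).astype(np.int8)
|
|   # Int inference, with the scales folded back in at the end.
|   x = rng.standard_normal(32).astype(np.float32)
|   x_int = np.round(x / scale).astype(np.int32)
|   approx = (w_i8.astype(np.int32) @ x_int) * scale * scale
|
|   print(np.abs(approx - w_f32 @ x).max())  # quantization error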
| pk2200 wrote:
| Just wanted to say thanks for many years of fantastic work on
| both Stockfish and Leela. The computer chess community owes you
| a huge debt of gratitude!
___________________________________________________________________
(page generated 2021-01-13 23:01 UTC)