[HN Gopher] The neural network of the Stockfish chess engine
       ___________________________________________________________________
        
       The neural network of the Stockfish chess engine
        
       Author : c1ccccc1
       Score  : 328 points
       Date   : 2021-01-13 08:01 UTC (14 hours ago)
        
 (HTM) web link (cp4space.hatsya.com)
 (TXT) w3m dump (cp4space.hatsya.com)
        
       | billiam wrote:
        | Analyses like this are a great indication that the main effect
        | of refining neural networks around chess will be to make neural
        | networks more exciting and ultimately make chess more boring.
        
         | gelert wrote:
         | Why do you believe that neural networks being better at chess
         | would make it more boring? Genuinely I just don't follow the
         | logic.
        
       | zetazzed wrote:
       | The gap between white and black piece performance is massive in
        | these top engines if I'm reading it right. LCZero won 0 / lost
        | 4 as black and won 24 / lost 0 as white (with lots of draws)? I
       | had no idea the split was so big now. Do human tournaments look
       | like this too these days? (From https://tcec-chess.com/)
        
         | bonzini wrote:
         | Not really---similar for sure, but not as bad for black.
         | Consider that computer chess tournaments do not start with the
         | pieces in their initial positions. The first 5-15 moves are
         | predetermined by human master players to achieve positions that
         | are imbalanced while not having a side with a clearer
         | advantage. Otherwise it would be even more of a draw fest.
         | 
         | This makes things a bit worse for black, typically, because
         | playing for the win as black means making some positional
         | concession (that white can exploit after successfully
         | defending) in exchange for the initiative and a chance for an
         | attack.
         | 
          | In human play, black has a little more ability than white to
          | steer the game towards their opening preparation[1]. If they
          | want to win, they will try to get a position that they know
          | inside and out from (computer-assisted) home preparation. But
          | there are still a lot of draws in classical (many hours per
          | game) chess.
         | 
         | [1] Chess openings that white chooses are typically called
         | "attack" (King's Indian Attack) or "game" (Scotch Game), while
         | those that black chooses are called "defense" (Sicilian
         | defense). You'll find that there are a lot more "defenses" that
         | black can choose from than "attacks".
        
           | tayo42 wrote:
           | > if they want to win, they will try
           | 
           | Why would someone not want to win?
        
             | Germanika wrote:
             | Chess matches usually involve multiple games, so you don't
             | need to win every game to win the match. It's not uncommon
             | to play trying to avoid a loss (going for a draw), instead
             | of playing riskier moves in an attempt to force a win.
        
             | bonzini wrote:
             | Because they may be content with drawing as black and seek
             | an (ever so slightly) easier win as white. It depends on
             | many factors: relative player strength, duration of the
             | match, how players are paired in tournament rounds, current
             | standings in the tournament, and so on. A computer won't do
             | this kind of reasoning.
        
         | dsjoerg wrote:
         | Overwhelmingly the result is a draw. White can win sometimes,
         | and wins for black are very rare. It's similar for top
         | grandmasters, but not quite as stark.
        
       | knuthsat wrote:
       | This is very nice. Reminds me a lot of tricks used for simple
       | linear models. And it seems to work, given that Leela Chess Zero
       | is losing with quite a gap.
       | 
        | Most of the time, training would change just a single feature,
        | so recomputing the whole sum made no sense.
        | 
        | A good example is learning a sequence of decisions, where each
        | decision might have a cost associated with it. You can then say
        | that the current decision depends on the previous one, and vary
        | the previous one so the model learns to recover from errors: if
        | the previous decision was bad, you'd still like to make the
        | best decision for the current state.
        | 
        | So even if your training data does not have this error-recovery
        | example, you can just iterate through all previous decisions
        | and make the model learn to recover.
        | 
        | An optimization in that case is to avoid redoing the whole sum
        | when computing the decision function of a linear model.
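The incremental-update trick described above (the core idea behind NNUE's "efficiently updatable" accumulator) can be sketched for a plain linear model. This is toy code with invented names, not Stockfish's implementation:

```python
import numpy as np

def full_score(w, x):
    """Recompute the linear score from scratch: O(n) work."""
    return float(np.dot(w, x))

def incremental_score(prev_score, w, i, old_xi, new_xi):
    """Update the score after only feature i changed: O(1) work."""
    return prev_score + w[i] * (new_xi - old_xi)

rng = np.random.default_rng(0)
w = rng.normal(size=1000)
x = np.zeros(1000)
x[42] = 1.0
score = full_score(w, x)

# A "move" turns feature 42 off and feature 99 on; instead of
# redoing the whole dot product, patch the score twice.
score = incremental_score(score, w, 42, 1.0, 0.0)
score = incremental_score(score, w, 99, 0.0, 1.0)
x[42], x[99] = 0.0, 1.0
print(abs(score - full_score(w, x)) < 1e-9)  # the two agree
```

The same idea applies per accumulator entry in NNUE: a chess move flips only a handful of piece-square features, so the first-layer sums can be patched rather than recomputed.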
        
         | kohlerm wrote:
         | "Leela Chess Zero is losing with quite a gap" Check
         | https://tcec-chess.com/ ATM Leela is in front again. That being
         | said it is not clear which approach is better e.g. a very smart
         | but relatively slow evaluation function (Leela) or SFs approach
         | to use a relatively dumb eval function with a more
         | sophisticated search. It is pretty clear that Leelas search can
         | be improved (Check https://github.com/dje-dev/Ceres for
         | example)
        
           | knuthsat wrote:
           | Looks like the tournament is still ongoing. I was referring
           | to the last few.
           | 
           | https://en.wikipedia.org/wiki/TCEC_Season_18
           | 
           | https://en.wikipedia.org/wiki/TCEC_Season_19
        
           | kohlerm wrote:
            | That being said, assuming you can fully saturate multiple
            | GPUs, the SF approach has the disadvantage that it cannot
            | use the performance improvements of GPUs/TPUs, which are
            | still growing fast, whereas CPU performance only grows
            | slowly these days.
        
             | stabbles wrote:
             | The Stockfish mentality is: not everybody owns a GPU and
             | Stockfish should be available for everybody and perform
             | well [1]. So they went for a CPU micro-arch optimized
             | neural net, which is great. Maybe this is ultimately not
             | the best for tournaments.
             | 
             | What I would find interesting is if they could give engines
             | an energy budget instead of a time limit. Maybe that makes
             | CPU vs GPU games more interesting & fair.
             | 
              | [1] https://github.com/official-stockfish/Stockfish/issues/2823
        
               | yters wrote:
                | They should also give an energy budget for human vs
                | computer matches.
        
               | kohlerm wrote:
                | IMHO they did this because it was the only incremental
                | way to improve strength. They have always relied on a
                | fast search with a relatively simplistic eval function,
                | and for this approach alpha-beta search works well.
                | What is not clear is whether it works well for the
                | Leela approach; all attempts so far, at least, have not
                | been successful (that could be for other reasons, such
                | as the training approach). SF's strength seems to be
                | its well optimized search function. Rewriting that
                | would IMHO be equivalent to creating a new chess
                | engine. That being said, I still think the success of
                | SF NNUE is very remarkable.
        
             | nl wrote:
             | Generally the GPU vs CPU gap on the evaluation side isn't
             | nearly as big as in training. In theory the gap should
             | remain large, but in practice things like the inability to
             | have data ready for batch evaluation means it is harder to
             | saturate the GPU (which as you note is important).
             | 
             | But I'm not sure how much of this applies to chess engines.
             | I see some comments noting that the search part makes it
             | hard to generate batches, but I'm not an expert in this.
        
             | confuseshrink wrote:
             | Interesting point. Nvidia have been improving the int
             | performance for quantized inference on their GPUs a lot. It
             | might be a lot of work but could it be possible to scale up
             | this NNUE approach to the point where it would be
             | worthwhile to run on a GPU?
             | 
             | For single-input "batches" (seems like this is what's being
             | used now?) it might never be worthwhile but perhaps if
             | multiple positions could be searched in parallel and the NN
             | evaluation batched this might start to look tempting?
             | 
             | Not sure what the effect of running PVS with multiple
             | parallel search threads is. Presumably the payoff of
             | searching with less information means you reach the
             | performance ceiling quite a lot quicker than MCTS-like
             | searches as the pruning is a lot more sensitive to having
             | up-to-date information about the principal variation.
             | 
             | Disclaimer: My understanding of PVS is very limited.
        
               | kohlerm wrote:
                | Sure, if someone can come up with an approach to run an
                | NNUE (efficiently updatable) network on GPUs, that
                | might really be another breakthrough. But at first
                | glance it looks to me like this could be very
                | difficult, because AFAIK the SF search is relatively
                | complicated. Even for Leela, implementing an efficient
                | batched search on multiple GPUs seems to be difficult
                | (some improvements are coming with Ceres), and Leela is
                | using a much simpler MCTS search. That doesn't mean
                | that Leela's search could not be improved: it does not
                | necessarily give higher priority to forced sequences of
                | moves (at least not explicitly) or to high-risk moves,
                | which is IMHO why she sometimes does not see relatively
                | simple tactics.
        
       | spiantino wrote:
       | Great writeup!
       | 
        | One thing I don't understand is why it wouldn't be smarter to
        | augment the inputs with some of the missing board information,
        | particularly the availability of castling. Even though this
        | network is a quick-and-dirty evaluation, it seems like there's
        | room for a few additional bits in the input, and being able to
        | castle very much changes the evaluation of some common
        | positions.
        
         | mattnewton wrote:
         | I am not an expert (and don't have any idea what I am talking
         | about), but wouldn't this be captured the same way other future
         | positional advantages are when evaluating the next "level" of
         | possible board positions? Ie, the fact that a piece can later
         | give check, or castle, or anything else if you first move it
         | here is a function of that next move's resulting position
         | score.
         | 
          | This network just signals to the final search algorithm how
          | good the board looks at a given spot on the tree of possible
          | moves.
        
       | Fragoel2 wrote:
       | I don't know a lot about chess but I have one question: isn't
       | chess a solved game (in the sense that given a board state we
       | always can compute the right move)? why use a neural network that
       | can introduce, a very small percentage of the time, mistakes? I
       | guess it is for performance reasons?
        
         | aliceryhl wrote:
         | No, it is not solved. To solve it, you would need to
         | investigate all possible continuations of the game, all the way
         | until the game ends, but there are so many continuations that
         | this is impossible.
        
           | nullc wrote:
           | A game could be solved even if its state space was too large
           | to enumerate, for example if you could show that there was
           | some latent structure that let you prove things about large
           | parts of the state space without completely exploring it.
           | 
           | This is, for example, why we have a concrete number for the
            | number of legal go positions even though there are far too
            | many to enumerate.
            | https://en.wikipedia.org/wiki/Go_and_mathematics#Legal_posit...
           | 
           | The rules of chess are so irregular that it seems likely that
           | any latent structure would be tremendously complex, but you
           | never know what some clever new analysis technique will do.
        
         | hiq wrote:
         | https://en.wikipedia.org/wiki/Solving_chess
        
         | elcomet wrote:
         | Chess is definitely not a solved game. The end game is solved
         | though, when you have only 7 pieces left on the board.
         | 
         | See
         | https://en.wikipedia.org/wiki/Endgame_tablebase#Computer_che...
        
         | mvanaltvorst wrote:
         | You can imagine playing chess as a tree. You start with a game
         | state, and every possible move is an edge to a new game state.
         | Unfortunately, the amount of game states grows approximately
         | exponentially (e.g. for every game state there are
         | approximately 40 moves to play, thus every extra move
         | multiplies #(game states) by 40, approximately). This neural
         | engine is a trick so that Stockfish does not have to simulate
         | all 40 moves every game states. The neural net outputs the
         | moves that are most likely to be strong moves, and Stockfish
         | will only consider those moves. Of course, this is a very basic
         | explanation and there are many more optimisations Stockfish
         | uses, though the ultimate goal of almost every optimisation is
         | to reduce the amount of simulation Stockfish has to do.
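The growth rate described above can be checked in a few lines (40 is only a rough average branching factor):

```python
# Rough branching factor of ~40 legal moves per position.
BRANCHING = 40

for plies in (2, 4, 6, 8):
    print(plies, BRANCHING ** plies)
# Eight plies (four full moves) already give 40**8 =
# 6,553,600,000,000 positions, which is why exhaustive
# search is hopeless and engines must prune.
```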
        
           | V-2 wrote:
           | _" This neural engine is a trick so that Stockfish does not
           | have to simulate all 40 moves every game states."_
           | 
            | This optimization is _always_ done by chess engines; it's
            | called pruning (they'd be quite crippled without it).
           | 
           | Maybe the neural network component is now in charge of it,
           | but it's not a new thing.
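The pruning idea can be sketched as plain alpha-beta on a toy tree of nested lists. Real engines like Stockfish layer many refinements on top of this; the following is only an illustration:

```python
def alphabeta(node, alpha, beta, maximizing):
    """Toy alpha-beta: a leaf is an int score, an inner node a list."""
    if isinstance(node, int):
        return node  # static evaluation of the position
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:  # opponent would never allow this line,
                break          # so prune the remaining siblings
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value

# Maximizer to move; each sublist is a minimizing reply.
tree = [[3, 5], [1, 2]]
print(alphabeta(tree, float("-inf"), float("inf"), True))  # 3
# The leaf 2 is never evaluated: once the second branch yields 1,
# it cannot beat the 3 already guaranteed by the first branch.
```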
        
             | mvanaltvorst wrote:
              | Of course, I had lumped all the other pruning strategies
              | under "other optimisations".
        
           | blt wrote:
           | From the article:
           | 
           | > _it's a simple feedforward network with... a single scalar
           | output to give a numerical score for the position, indicating
           | how favourable it is for the player about to move._
           | 
           | The search algorithm proposes moves, computes the board state
           | resulting from them, and evaluates each state.
           | 
           | This may seem like a trivial distinction, but it's important.
           | The Stockfish method requires knowing the rules of the game;
           | networks that output a move directly do not.
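The distinction can be sketched with hypothetical stubs (all names here are invented for illustration): an evaluation-based engine needs the rules to generate and apply moves, while a policy-style network maps a position straight to a move.

```python
def best_move_by_evaluation(state, legal_moves, apply_move, evaluate):
    """Stockfish-style: the network only scores resulting positions,
    so the engine must know the rules (legal_moves, apply_move)."""
    return max(legal_moves(state),
               key=lambda m: evaluate(apply_move(state, m)))

def best_move_by_policy(state, policy_net):
    """Policy-style: the network outputs a move directly."""
    return policy_net(state)

# Toy "game": the state is a number, a move adds to it, and the
# evaluation prefers positions close to 2.
state = 0
move = best_move_by_evaluation(
    state,
    legal_moves=lambda s: [1, 2, 3],
    apply_move=lambda s, m: s + m,
    evaluate=lambda p: -abs(p - 2),
)
print(move)  # 2
```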
        
           | bbojan wrote:
           | >> The neural net outputs the moves that are most likely to
           | be strong moves, and Stockfish will only consider those
           | moves.
           | 
           | I don't think this is correct. According to the article, they
           | are only using the neural network for evaluating the value
           | (strength) of the positions. The search is still the same.
           | 
           | I believe Leela works in the manner you described.
        
         | dan-robertson wrote:
         | Chess is not a solved game (though I think the stalemate rules
         | make it finite, there are far too many positions). All computer
         | chess programs have to try to determine if positions will be
         | good without playing out all possible ways the game could go.
         | Usually this is some combination of search strategies and
         | scoring positions (the article describes how stockfish scores
         | positions).
        
           | V-2 wrote:
           | I agree it's finite, but why do you think it's specifically
           | owed to the stalemate rule, of all things?
        
             | elcomet wrote:
              | I believe it's the repetition rule that makes the game
              | finite, not the stalemate rule, since the number of
              | positions is finite.
              | 
              | There is also the 50-move rule, of course, which limits
              | the length of a game more drastically.
        
               | dan-robertson wrote:
               | I think this is what I was referring to.
        
               | V-2 wrote:
               | OK, so that's not a stalemate. A stalemate is a situation
               | in which one of the players has no legal moves (but isn't
               | in check, which would be checkmate).
               | 
                | It's a common mistake to equate chess draws with
                | stalemate, possibly due to the broader, idiomatic
                | meaning of "stalemate" in common speech.
        
               | V-2 wrote:
               | Right, that would make more sense to me. Although it
               | should be stressed that neither threefold repetition nor
               | the 50-move rule work automatically. They need to be
               | claimed by one of the players. If neither player does,
               | the game can go on.
        
               | jstanley wrote:
               | The 75-move rule does work automatically though, so the
               | game is still finite.
        
               | V-2 wrote:
               | Frankly, I wasn't aware of the 75-move rule! This indeed
               | caps the length of possible games.
               | 
                | The threefold repetition rule doesn't have an
                | equivalent (there's no, say, "fivefold repetition" draw
                | that could be enforced despite the players' will), but
                | it's really irrelevant from the perspective of solving
                | chess, because there's no point in analyzing what
                | happens if the game goes through the same loop 1, 10 or
                | 100 times.
               | 
               | And the 75-move rule sets a hard limit anyway, so you
               | couldn't loop infinitely, even if it mattered.
        
               | anandoza wrote:
                | Lichess.org actually does have a fivefold repetition
                | rule that applies without players claiming it:
                | https://lichess.org/faq#threefold
               | 
               | Kind of funny that your hypothetical example was exactly
               | reality in this case.
        
               | harryposner wrote:
               | > there's no, say, "fivefold repetition" draw that could
               | be enforced despite the players' will
               | 
               | There is exactly such a rule. See the FIDE Laws of Chess
               | [0], rule 9.6.1.
               | 
               | [0] https://handbook.fide.com/chapter/E012018
        
         | danielbarla wrote:
          | Chess is estimated to have 10^120 possible games. The
          | observable universe is estimated to have between 10^78 and
          | 10^82 atoms. So, unless you had some way of dramatically
          | compressing or simplifying this search space, you'd need to
          | store on the order of 10^38 games' worth of information per
          | atom, even if you had every atom in the observable universe
          | as your memory. Suffice to say, this has not yet been
          | achieved.
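The arithmetic behind this comparison, as a quick sanity check (taking 10^82 atoms, the upper end of the estimate):

```python
games = 10 ** 120  # rough estimate of possible chess games
atoms = 10 ** 82   # observable-universe atoms (upper estimate)

games_per_atom = games // atoms
print(games_per_atom == 10 ** 38)  # True: ~10^38 games per atom
```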
        
           | altvali wrote:
            | You don't need to care about possible games, only game
            | states: what's the best result, and the move to achieve
            | it, in each. Another user computed an upper bound of
            | 8.7E+45 positions:
            | https://github.com/lechmazur/ChessCounter . For comparison,
            | our planet has about 1E+50 atoms. And we could further
            | compress the database, perhaps by two orders of magnitude,
            | by not storing symmetric positions or positions close to
            | mate. A Kardashev 2 civilization could play perfect chess.
        
             | goatlover wrote:
             | > A Kardashev 2 civilization could play perfect chess.
             | 
             | Assuming they would want to waste part of a planet's worth
             | of computing resources to play perfect chess. In any case,
             | it's far beyond anything we can compute.
        
           | kevincox wrote:
           | You don't need to consider every state individually to solve
           | a game. For example if you have a game on a 100x100 board and
           | the only goal is to move your piece to the other side you can
           | know you have found the optimal solution if you can do it in
           | 99 moves (assuming that you move 1 per turn).
           | 
           | In the chess scenario if you can prove that there is a set of
           | moves where you win no matter what the other colour does then
           | you have solved the game. You don't need to consider every
           | possible game, just the games reachable within this set of
           | moves.
           | 
           | Solving chess would be proving that "From starting position
           | black can win every time" (or the same for white). You don't
           | need to prove how to win from every possible board state.
        
             | pretendscholar wrote:
             | My money is on perfect play drawing every time.
        
               | kevincox wrote:
               | It would be interesting to take bets on the solution :)
        
             | wizzwizz4 wrote:
             | But you won't find a set of moves. You have to find a
             | _tree_ of moves, which is substantially larger.
        
               | kevincox wrote:
               | This is true. It is still a massive problem space but my
               | point is that you don't need to consider every possible
               | game or state.
        
               | wizzwizz4 wrote:
                | Not to _prove_ it always-winning; for that you only
                | have to go down every possible branch for the other
                | player (i.e. roughly the square root of the number of
                | states you'd otherwise need). To _find_ it, though?
                | Well, unless you get lucky, you're going to be
                | considering pretty much every possible game or state,
                | except where you can find shortcuts.
        
               | kevincox wrote:
                | It is a bit more than that, as it is unlikely that the
                | always-win sequence would be a fixed list of moves. You
                | would need to show that for each move the other player
                | makes, you have a reaction that also wins.
        
             | danielbarla wrote:
             | I agree, and kind of included this in the "so, unless you
             | had some way of dramatically compressing or simplifying
             | this search space..." disclaimer.
             | 
             | Having said this, the nature of many chess endgames
             | suggests that such a proof is not really possible, or at
             | least would not be "simple". As an example, tempo /
             | opposition flips games from draws to wins, etc.
        
         | zone411 wrote:
          | No, it's not close to being solved. I made a new estimate
          | recently that there are about 8.7E+45 unique positions:
          | https://github.com/lechmazur/ChessCounter. You don't need to
          | analyze all of them to generate the proof, but it's still far
          | beyond our computational and storage capacities to solve
          | chess.
        
           | altvali wrote:
            | For comparison, there are about 1E+50 atoms making up our
            | planet.
        
         | T-hawk wrote:
         | You're correct, in that given a board state the right move can
         | always be exhaustively computed, because the game has no
         | randomness or hidden information. But enough computing power
         | doesn't exist on the planet to iterate through all the possible
         | states.
         | 
          | The word you're looking for is "deterministic", rather than
          | solved. Solved is usually used to mean the exhaustive
          | computation _has_ been done. Deterministic means it _could_
          | be done if enough computing power existed, but for chess it
          | doesn't.
        
         | tspduck wrote:
          | I have to disagree with the others. Since AlphaZero, I think
          | chess is close to being solved. Technically it is not, but
          | practically it is.
          | 
          | Chess is a hard game to solve completely, and I think one
          | reason is that there are many states in chess where there is
          | not one superior move, but only a probably optimal move, in
          | the sense that the game-tree for winning against the move is
          | smaller than for other moves. Then the agent/player has to
          | guess what the opponent's strategy is given that move, and
          | that depends on the opponent.
          | 
          | Its plays are truly creative, and its results speak for
          | themselves.
         | 
         | From wiki: "In AlphaZero's chess match against Stockfish 8
         | (2016 TCEC world champion), each program was given one minute
         | per move. Stockfish was allocated 64 threads and a hash size of
         | 1 GB,[1] a setting that Stockfish's Tord Romstad later
         | criticized as suboptimal. AlphaZero was trained on chess for a
         | total of nine hours before the match. During the match,
         | AlphaZero ran on a single machine with four application-
         | specific TPUs. In 100 games from the normal starting position,
         | AlphaZero won 25 games as White, won 3 as Black, and drew the
         | remaining 72. In a series of twelve, 100-game matches (of
         | unspecified time or resource constraints) against Stockfish
         | starting from the 12 most popular human openings, AlphaZero won
         | 290, drew 886 and lost 24."
         | 
         | source: https://en.wikipedia.org/wiki/AlphaZero#Chess
        
           | zone411 wrote:
           | You're making a jump from "AlphaZero plays better than the
           | best programs of its time" to "chess is close to being
           | solved" but that's not justified without much more evidence.
           | There are still a lot of improvements and new ideas added to
           | the top programs and even if they reach a plateau, you'd have
           | to show that increasing time per move 100x doesn't result in
           | a significant improvement in playing strength. We don't know
           | what would be the Elo difference between a 32-piece tablebase
           | and the best existing program but I don't think anybody would
           | bet that it's less than 200 Elo.
           | 
            | Here is a graph of Stockfish's progress since the end of
            | 2015: https://camo.githubusercontent.com/f169f774996346ad146f96f74....
            | Self-play exaggerates strength differences but it's been a
            | very good rate of progress, even without hardware
            | improvements in the meantime.
        
             | tialaramex wrote:
             | Right!
             | 
             | A _solved_ game is categorically different from merely
             | teaching a machine to be better at it than humans are
             | presently.
             | 
              | Cepheus [0] is approximately a solution to Heads-Up (two
              | player) Limit (i.e. the amount you can bet is specified
              | by the game; you don't get to just bet arbitrary money)
              | Texas Hold 'em poker.
             | 
              | In contrast, Libratus is an AI that plays Heads-Up _No
              | Limit_ Hold 'em very well against humans.
             | 
              | You can examine the Cepheus strategy for yourself. If
              | you could memorize it (it's too complicated) and were
              | capable of making truly random decisions (poker is a
              | game of chance, so your strategy needs random action),
              | you could reproduce it and be exactly as good at the
              | game as Cepheus is. You can examine individual strategy
              | elements and reason about them. For example, if Cepheus
              | gets an 8 and a 3 and you open with a bet, it will call
              | your bet if they're the same suit, otherwise it will
              | fold. On the other hand, if those suited cards were an 8
              | and a Queen, it would raise a bit more than nine times
              | out of ten.
             | 
              | You can't do anything about it. You might be thinking
              | "surely knowing that e.g. Cepheus folds 8-3 off-suit
              | here is valuable, so I can benefit from that". Er, no:
              | "hiding" that by sometimes playing it loses more money
              | than Cepheus gives up by folding it, which is why this
              | is a perfect strategy. If you play Cepheus, even with an
              | incrementally "more" approximate solution, you're not
              | going to win a significant amount of money reliably in
              | reasonable time; that's why it's approximately solved.
             | 
             | But Libratus isn't like that, it's playing some strategy
             | that we know beat world class human players, but a further
             | incrementally better AI might crush Libratus just as badly.
             | 
             | [0] http://poker.srv.ualberta.ca/preflop
        
           | kohlerm wrote:
            | No, it is not solved. E.g. SF's recent inclusion of a NN
            | has shown that it can beat Leela (which should be superior
            | to AlphaZero).
        
           | hiq wrote:
           | > there are many states in chess where there is not one
           | superior move, but only a probable optimal move
           | 
           | I don't think this statement is true given [0], but maybe I
           | understood you wrong.
           | 
           | [0]: https://en.wikipedia.org/wiki/Zermelo%27s_theorem_(game_
           | theo...
        
             | magicalhippo wrote:
              | I think you misunderstood, and that the point was that in
              | chess, especially during the mid-game phase, there's very
              | often not a single clearly superior move but rather
              | several moves that apparently improve your position by
              | roughly the same amount.
             | 
             | The uncertainty is due to the inability to scan the entire
             | sub-tree for each move.
             | 
             | The theorem you linked to would only be applicable in
             | certain endgame situations.
        
               | V-2 wrote:
               | The same could be argued for tic-tac-toe (which is
               | obviously and trivially solved). There are many positions
               | in which there's more than one move that is perfectly
               | playable.
        
               | hiq wrote:
               | > there's very often not a single clearly superior move
               | 
               | Maybe it's not clear to human players or current engines,
               | but that does not mean it doesn't exist, and indeed the
               | theorem mentioned above states that at least one player
               | has an optimal strategy (either forcing a draw or
               | winning). Obviously, that does not necessarily mean that
               | we'll ever be able to design an engine that can compute
               | this strategy in a reasonable time.
               | 
                | I just think that OP meant something other than the
                | usual definition of "solved". Maybe they meant that
                | engines are plateauing, but as others have pointed
                | out, there is little evidence for that so far.
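
The backward-induction argument behind Zermelo's theorem can be made concrete on a toy game. A minimal sketch (the game and function name are illustrative, not from any engine): a subtraction game where each player removes 1 or 2 stones and taking the last stone wins, so minimax over the full tree assigns every position a definite win/loss value for the side to move.

```python
from functools import lru_cache

# Subtraction game: remove 1 or 2 stones per turn; whoever takes the
# last stone wins. Backward induction (as in Zermelo's proof) gives
# every position a definite value for the player to move.
@lru_cache(maxsize=None)
def player_to_move_wins(stones):
    if stones == 0:
        return False   # the previous player just took the last stone
    # Winning iff some move leaves the opponent in a losing position.
    return any(not player_to_move_wins(stones - take)
               for take in (1, 2) if take <= stones)

# Multiples of 3 are losses for the side to move; all else are wins.
print([n for n in range(10) if not player_to_move_wins(n)])  # [0, 3, 6, 9]
```

Chess admits the same induction in principle; the tree is just astronomically larger, which is why the optimal strategy is known to exist but cannot be computed.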
        
                | magicalhippo wrote:
                | I think I got a bit hung up on the following from
                | Wikipedia:
               | 
               |  _It says that if the game cannot end in a draw, then one
               | of the two players must have a winning strategy (i.e.
               | force a win)._
               | 
               | It made me think it didn't apply in situations where the
               | other player could force a draw.
               | 
               | Reading the linked article[1] though made it more clear
               | what it's about, and I agree with what you said.
               | 
               | [1]:
               | http://www.math.harvard.edu/~elkies/FS23j.03/zermelo.pdf
        
               | benlivengood wrote:
               | > The uncertainty is due to the inability to scan the
               | entire sub-tree for each move.
               | 
               | That uncertainty is specifically why chess is unsolved.
        
               | magicalhippo wrote:
               | Indeed, I was just being a bit pedantic by pointing it
               | out.
               | 
               | edit: so to make it crystal clear, I didn't try to argue
               | chess was solved, just to point out why I think that
               | theorem doesn't really apply to most of chess.
        
        | LittlePeter wrote:
        | Leela played Stockfish 200 games and won 106-94 [1]. Not sure
        | which version of Stockfish was used.
       | 
       | Some of the Leela-Stockfish games are analyzed by agadmator on
       | YouTube [2].
       | 
       | [1] https://www.chess.com/news/view/13th-computer-chess-
       | champion...
       | 
       | [2] https://www.youtube.com/watch?v=YtXZjKItuC8
        
         | zone411 wrote:
         | This match was played before the NNUE version of Stockfish was
         | introduced. Stockfish NNUE beat LC0 in TCEC season 19:
         | https://www.chessprogramming.org/TCEC_Season_19
        
          | thomasahle wrote:
          | The chess.com version is the old Stockfish from mid-2020.
          | The NNUE architecture was only put in place around August.
          | If you see
          | https://github.com/glinscott/fishtest/wiki/Regression-Tests
          | you'll notice that it gave a very significant (100+ Elo)
          | boost.
         | 
          | There is a tournament currently running at tcec-chess.com,
          | which Stockfish has been leading so far, but I see Leela has
          | just caught up in the head-to-head.
         | 
         | Of course Leela also keeps evolving.
        
           | kohlerm wrote:
           | FYI Leela is in the lead ATM
        
         | dmurray wrote:
         | It's difficult to organise Leela-Stockfish as a fair fight,
         | because they run on different hardware (CPU vs GPU) and both
         | get substantial improvements by playing on better hardware.
         | 
         | Traditionally this wasn't a big problem as every engine was
         | more or less optimised for a fast Intel CPU with a moderate to
         | large amount of RAM. The organisers would decree the specs of
         | the championship hardware some time in advance. Now, (at least
         | for TCEC, the other major engine tournament) they pick two
         | hardware configurations, one CPU-heavy and one GPU-heavy, and
         | give each team the choice.
         | 
         | How do you balance those? It's been suggested you should make
         | them equal in terms of watts of power, or dollar cost to buy,
         | but neither of those are obviously best. In practice the TCEC
         | organisers pick something close to what they picked last time
         | but shade it against the winning engines, making the contest
         | more even. Chess.com likely do something similar though they're
         | less rigorous about the details.
        
           | criddell wrote:
           | > It's been suggested you should make them equal in terms of
           | watts of power, or dollar cost to buy, but neither of those
           | are obviously best.
           | 
           | Do they give these computers a rating like they do human
           | players?
        
             | dmurray wrote:
             | Yes, there are several computer chess rating lists, but you
             | run into the same problems once you want to compare a CPU
             | engine with a GPU one: how do you account for the hardware?
             | Really a rating, as a measure of playing strength, should
             | apply to a hardware/software combination, but we almost
             | always want to compare the software. The most respected
             | lists used to be the SSDF [0] and CCRL [1] lists but they
             | both ignore the GPU issue. There's now the CEGT list [2]
              | which I don't know anything about, but it does seem to
              | use heterogeneous hardware.
             | 
             | Note that the numbers aren't calibrated to human players,
             | so there's no claim that an engine with a rating of 3100 on
             | some particular hardware should score 85% against a human
             | rated 2800, or whatever.
             | 
             | [0] http://ssdf.bosjo.net/list.htm [1]
             | https://ccrl.chessdom.com/ccrl/404FRC/ [2] http://www.cegt.
             | net/40_40%20Rating%20List/40_40%20SingleVers...
        
       | dan-robertson wrote:
       | It seems that the strategy is to use the neural network to score
       | various moves and then a search strategy to try to find moves
       | that result in a favourable score. And this post focuses on some
       | of the technical engineering details to design such a scoring
       | network. In particular the scoring is split into two parts: 1. A
       | matrix multiplication by a sparse input vector to get a dense
       | representation of a position, and 2. Some nonlinear and further
       | layers after this first step. And it seems that step 1 is
       | considered more expensive.
       | 
        | The way this is made cheap is by making it incremental: given
        | some board s and the output of the first layer b + Ws, it is
        | cheap to compute b + Wt, where t is a board similar to s (the
        | difference is W(t-s), and the vector t-s is zero in almost
        | every element).
       | 
       | This motivates some of the engineering choices like using
       | integers instead of floats. If you used floats then this
       | incremental update wouldn't work.
       | 
        | It seems to me that a lot of the smarts of Stockfish will be
        | in the search algorithm getting good results, but I don't
        | know if
       | that just requires a bit of parallelism (surely some kind of
       | work-stealing scheduler) and brute force or if it mostly relies
       | on some more clever strategies. And maybe I'm wrong and the key
       | is really in the scoring of positions.
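
The incremental first-layer update described above can be sketched in NumPy (the sizes and feature indices here are illustrative, not Stockfish's actual dimensions): rather than recomputing b + Wt from scratch, you adjust the cached accumulator by just the columns of W whose features flipped between s and t.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden = 1000, 256          # illustrative sizes, not NNUE's
W = rng.integers(-64, 64, size=(n_hidden, n_features), dtype=np.int32)
b = rng.integers(-64, 64, size=n_hidden, dtype=np.int32)

s = np.zeros(n_features, dtype=np.int32)
s[[3, 7, 42, 99, 640]] = 1                # a handful of active features

acc = b + W @ s                           # full first-layer pass for s

# A "move" flips a few features: 3 and 7 turn off, 500 and 501 turn on.
removed, added = [3, 7], [500, 501]
t = s.copy()
t[removed] = 0
t[added] = 1

# Incremental update: touch only the columns of W that changed.
acc_inc = acc - W[:, removed].sum(axis=1) + W[:, added].sum(axis=1)

assert np.array_equal(acc_inc, b + W @ t)  # exact, because of integers
```

The cost of the update is proportional to the number of flipped features, not to the size of the board encoding, which is what makes it cheap inside a search that makes and unmakes millions of moves.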
        
          | eutectic wrote:
          | I don't think the integer weights are necessary for
          | sparsity; they are just faster because they allow for low
          | precision.
          | 
          | Of course floating-point addition isn't associative, so you
          | wouldn't get bitwise equivalence between the incremental and
          | non-incremental updates, but I don't see how that would
          | matter in this context.
        
            | dan-robertson wrote:
            | The point is that integer weights allow the incremental
            | updates to be exact. If you used floats, the values would
            | drift away from correct as you applied more incremental
            | updates. And neural networks can be quite sensitive to
            | small errors.
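
A toy demonstration of the drift concern (not Stockfish code): a float accumulator built from many small increments ends up close to, but not exactly at, the true total, while the same quantity tracked in fixed-point integers is exact, so incremental and from-scratch computations agree bit for bit.

```python
# A float accumulator drifts: 0.1 has no exact binary representation,
# so a million increments accumulate rounding error. The same quantity
# in fixed point (integer tenths) stays exact.
float_acc = 0.0
int_acc = 0                    # tenths, i.e. 0.1 is stored as 1

for _ in range(1_000_000):
    float_acc += 0.1
    int_acc += 1

print(int_acc / 10)            # exactly 100000.0
print(float_acc == 100000.0)   # False: rounding error has accumulated
```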
        
             | eutectic wrote:
             | With 32 bits of precision I don't see it being a big
             | problem over a maximum of say 50 moves. Neural networks
             | which are not overfit should be tolerant of a small amount
             | of random (not crafted) noise.
        
               | [deleted]
        
               | recursive wrote:
               | 64-bit floats have 53 bits of precision. 32-bit floats
               | have 24.
               | 
               | The rest goes into sign and exponent.
        
               | dan-robertson wrote:
               | Maybe the network uses integers for other reasons.
               | Perhaps due to the history of the program. I don't know
               | how the search works but I would imagine it would
               | consider more than 50 moves with lots of backtracking and
               | trying different possible moves.
        
         | thomasahle wrote:
         | > It seems that the strategy is to use the neural network to
         | score various moves and then a search strategy to try to find
         | moves that result in a favourable score
         | 
         | The neural net is only for scoring positions, not moves. Check
         | the article on Stockfish NNUE at the chess programming wiki:
         | https://www.chessprogramming.org/Stockfish_NNUE
         | 
         | There is a LOT of cleverness built into the stockfish search as
         | well.
         | 
            | In fact, while neural networks have now won the position
            | evaluation game, it is still an open discussion in the
            | chess programming community whether the AlphaZero / Leela
            | search (NN + Monte Carlo) is as good as Stockfish's PVS.
        
           | ogogmad wrote:
           | PVS?
        
             | zone411 wrote:
             | https://www.chessprogramming.org/Principal_Variation_Search
        
             | zirkonit wrote:
             | I presume, Principal Variation Search (https://www.chesspro
             | gramming.org/Principal_Variation_Search)
        
       | brilee wrote:
       | Ironically, a lot of the tricks Stockfish is using here are
       | reminiscent of tricks that were used in the original AlphaGo and
       | later discarded in AlphaGoZero.
       | 
       | In particular, the AlphaGo paper mentioned four neural networks
       | of significance:
       | 
        | - a policy network trained on human pro games
        | 
        | - an RL-enhanced policy network improving on the original SL-
        | trained policy network
        | 
        | - a value network trained on games generated by the RL-
        | enhanced policy network
        | 
        | - a cheap policy network trained on human pro games, used only
        | for rapid rollout simulations
       | 
        | The cheap rollout policy network was discarded because
        | DeepMind found that "slow evaluations of the right positions"
        | were better than "rapid evaluations of questionable
        | positions". The
       | independently trained value network was discarded because co-
       | training a value and policy head on a shared trunk saved a
       | significant amount of compute, and helped regularize both
       | objectives against each other. The RL-enhanced policy network was
       | discarded in favor of training the policy network to directly
       | replicate MCTS search statistics.
       | 
       | The depth and branching factor in chess and Go are different, so
       | I won't say the solutions ought to be the same, but it's
       | interesting nonetheless to see the original AlphaGo ideas be
       | resurrected in this form.
       | 
       | The incremental updates are also related to Zobrist Hashing,
       | which the Stockfish authors are certainly aware of.
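
Zobrist hashing relies on the same incremental idea as the NNUE accumulator: XOR out the key of the piece leaving a square and XOR in the key for its destination, so the position hash updates in O(1) per move instead of rescanning the board. A minimal sketch (the piece encoding and example squares are illustrative):

```python
import random

random.seed(0)
PIECES = "PNBRQKpnbrqk"              # white/black piece letters
# One random 64-bit key per (piece, square) pair.
ZOBRIST = {(p, sq): random.getrandbits(64)
           for p in PIECES for sq in range(64)}

def full_hash(board):
    """board maps square (0-63) -> piece letter; XOR the occupied keys."""
    h = 0
    for sq, p in board.items():
        h ^= ZOBRIST[(p, sq)]
    return h

board = {6: "N", 4: "K", 60: "k"}    # white Ng1 and Ke1, black Ke8
h = full_hash(board)

# Moving the knight g1 (square 6) -> f3 (square 21): XOR out the old
# key and XOR in the new one, rather than rehashing the whole board.
h ^= ZOBRIST[("N", 6)] ^ ZOBRIST[("N", 21)]
board.pop(6)
board[21] = "N"

assert h == full_hash(board)         # incremental == recomputed
```

Because XOR is its own inverse, unmaking a move applies the identical update, which suits a search that constantly backtracks.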
        
          | mattalex wrote:
          | It's also different because Stockfish uses alpha-beta tree
          | search instead of MCTS: MCTS deals with the explosion of the
          | search space by sampling only a very small part of it,
          | relying on a very good heuristic to guide that sampling. In
          | that case it is crucial to find the most relevant subset of
          | the tree to explore, so spending more time on your policy
          | makes sense.
         | 
          | Alpha-beta pruning, however, explores the entire search
          | tree systematically (up to a certain depth, using iterative
          | deepening) and prunes it by discarding bad moves. In this
          | case you don't need as good an evaluation function, because
          | you search the entire tree anyway. Rather, you need the
          | function to be fast, as it is evaluated on many more states.
         | 
          | In general, alpha-beta pruning only needs the heuristic for
          | estimating the tail end of the tree and for ordering moves
          | by usefulness, while MCTS uses all of the above plus guiding
          | the whole search process. Spending tons of time on the
          | heuristic is not that useful, since even the optimal search
          | order can only double your search depth. Don't get me wrong,
          | that's still a lot (especially considering the exponential
          | blowup), but MCTS can surpass this depth easily. The
          | disadvantage is that MCTS loses a lot of the guarantees of
          | alpha-beta pruning and tends to "play down to its opponent"
          | when trained using self-play, because the exploration order
          | is entirely determined by the policy.
        
       | pcwelder wrote:
       | Using a previous Stockfish scorer they trained an NN without any
       | labelling effort. This is also similar to how unsupervised
       | translation is done in some methods. They start from word->word
       | dictionary results and iteratively train lang1->lang2 and
       | lang2->lang1 models feeding on each other's output.
        
       | glinscott wrote:
       | If anyone wants to experiment with training these nets, it's a
       | great way to get exposed to a nice mix of chess and machine
       | learning.
       | 
       | There are two trainers currently, the original one, which runs on
       | CPU: https://github.com/nodchip/Stockfish, and a pytorch one
       | which runs on GPU: https://github.com/glinscott/nnue-pytorch.
       | 
       | The SF Discord is where all of the discussion/development is
       | happening: https://discord.gg/KGfhSJd.
       | 
       | Right now there is a lot of experimentation to try adjusting the
       | network architecture. The current leading approach is a much
        | larger net which takes in attack information per square (e.g.
       | this piece attacked by more pieces than it's defended by?). That
       | network is a little slower, but the additional information seems
       | to be enough to be stronger than the current architecture.
       | 
       | Btw, the original Shogi developers really did something amazing.
       | The nodchip trainer is all custom code, and trains extremely
       | strong nets. There are all sorts of subtle tricks embedded in
       | there as well that led to stronger nets. Not to mention, getting
       | the quantization (float32 -> int16/int8) working gracefully is a
       | huge challenge.
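
The float32 -> int8 step mentioned here can be illustrated with a simple symmetric per-tensor scheme (a generic sketch of weight quantization, not the actual NNUE recipe, which quantizes per layer with carefully chosen scales):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: map float weights to int8
    with a single scale factor chosen from the largest magnitude."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.5, size=1000).astype(np.float32)
q, scale = quantize_int8(w)

# Dequantizing approximately recovers the weights; the per-weight
# error is bounded by half a quantization step.
err = float(np.abs(q.astype(np.float32) * scale - w).max())
assert err <= scale / 2 + 1e-6
```

The "gracefully" part is keeping that rounding error from compounding through the layers while the inference code runs entirely in integer arithmetic.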
        
         | pk2200 wrote:
         | Just wanted to say thanks for many years of fantastic work on
         | both Stockfish and Leela. The computer chess community owes you
         | a huge debt of gratitude!
        
       ___________________________________________________________________
       (page generated 2021-01-13 23:01 UTC)