[HN Gopher] Compressing Chess Moves for Fun and Profit
       ___________________________________________________________________
        
       Compressing Chess Moves for Fun and Profit
        
       Author : thunderbong
       Score  : 123 points
       Date   : 2024-03-15 16:35 UTC (6 hours ago)
        
 (HTM) web link (mbuffett.com)
 (TXT) w3m dump (mbuffett.com)
        
       | sfeng wrote:
       | I would love to see this benchmarked against just running LZW
       | compression on a list of the moves. I think the repeated nature
       | of the moves means the entropy is actually rather low. It should
       | be possible to come up with a compression dictionary which can
       | save a lot more space than just a tight packing like this.
        
         | kondro wrote:
         | Or a trained zstd dictionary.
        
         | sedatk wrote:
         | > the repeated nature of the moves
         | 
         | On a single game? I don't think the goal is to pack all plays
         | together and unpack them every time. Every game must be
          | individually accessible, as I understand it.
        
         | vardump wrote:
         | LZW would do very badly on a list of binary chess moves.
         | 
         | Huffman (and even more so arithmetic encoding or ANS
          | (asymmetric numeral systems [0])) would be significantly
          | better, if you're careful how you encode data.
         | 
         | [0]: https://en.wikipedia.org/wiki/Asymmetric_numeral_systems
        
       | sshine wrote:
       | Related, compressing bingo cards:
       | 
       | https://github.com/diku-dk/openbanko
       | 
       | Among other things, contains:
       | 
       | Theoretically optimal compression of excessive amounts of bingo
       | cards
       | 
       | GPU-accelerated massively parallel bingo simulation using Futhark
       | 
       | I suspect the research was triggered by a late professor of
       | Computer Science who unironically calculated how many (Danish)
       | bingo cards there are: https://sprutskalle.dk/blog/wp-
       | content/uploads/bankoplader.p... -- to see the mathematical
        | rigour played out on such a casual question is nothing short
        | of inspirational.
        
       | srameshc wrote:
        | > If you're into chess, I've made a repertoire builder at
        | chessbook.com. It's a great way to learn and practice chess.
        
       | velcrovan wrote:
       | Was discussed recent-ish-ly (Sep 2023) -- this was positions, not
       | moves:
       | 
       | https://news.ycombinator.com/item?id=37525348
       | 
       | Site linked there seems to be offline, so here's the archive
       | copy:
       | https://web.archive.org/web/20230928072950/https://www.ezzer...
        
       | tromp wrote:
       | The optimum compression seems to be an index into the list of all
       | legal moves.
       | 
       | But one can do better on average, by assuming that moves are not
       | uniformly distributed. Let an engine assign each legal move m
       | some probability p of being played. The worse the move, the lower
       | the probability we assign to it. Then a more optimal code for
       | move m is -log p bits, corresponding to an entropy (expected
       | surprise) of sum -p log p bits for the distribution over legal
       | moves.
       | 
       | Related: accurately estimating the number of legal positions of
       | chess [1].
       | 
       | [1] https://github.com/tromp/ChessPositionRanking
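        | 
        | A rough sketch of the arithmetic in Python (the probabilities
        | are made up for illustration, not real engine output):
        | 
        |     import math
        | 
        |     # Hypothetical engine-assigned probabilities for the
        |     # legal moves in some position.
        |     p = {"e4": 0.30, "d4": 0.25, "Nf3": 0.20,
        |          "c4": 0.15, "Nc3": 0.10}
        | 
        |     for move, prob in p.items():
        |         # Ideal code length for this move: -log2(p) bits.
        |         print(f"{move}: {-math.log2(prob):.2f} bits")
        | 
        |     # Entropy: the expected bits per move.
        |     h = -sum(q * math.log2(q) for q in p.values())
        |     print(f"{h:.2f} bits/move expected")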
        
         | porphyra wrote:
         | Indeed, compression is just the dual problem of prediction, so
         | the better you are at predicting the next move, the better you
         | can compress.
        
         | bicijay wrote:
          | Sometimes I wonder how I got so far in my software developer
          | career...
        
           | Operyl wrote:
           | Because there are so many fields of software development, and
           | you don't need to be good in all of them. Don't put yourself
           | down for not understanding, you can always put some time into
           | algorithms again if you need to :).
        
           | vecter wrote:
           | "Software", like "science", is a broad term. Just like there
           | are many kinds of scientists and you could be, say, a
           | biologist or a physicist, there are many entirely disparate
           | fields of software engineering. At the end of the day,
            | software itself is just a means to an end: solving a
            | problem. Solving some of these problems, like the one GP
            | discussed, has more to do with math than programming. The
           | programming is mostly the implementation of the math. So if
           | you're not versed in that math, of course you wouldn't
           | understand the code that's being written to implement it.
           | 
           | Building machine learning systems is vastly different from
           | building operating systems which is vastly different from
           | embedded systems which is vastly different from networking
            | which is vastly different from what most of us do (myself
           | included), which is building CRUD apps. We're just solving
           | different problems. Of course there are common themes to
           | writing good code, just like there are common themes to
           | performing science, but the code and solutions will look
           | almost nothing alike.
        
           | acomar wrote:
            | Most of the optimizations discussed are actually kind of
            | obvious if you know how chess notation works in the first
            | place. The reason the starting move "Nf3" makes any sense
            | is that only one of the four knights on the board can
            | possibly move to f3 on the first move. What the OP is
            | doing is taking that kind of information density and
            | trying to represent it, first by assigning bits to each
            | part of a possible move and then by paring it down where
            | the space of legal moves excludes entire categories of
            | possibilities. There's another article that goes much
            | further into how much
           | optimization is possible here:
           | https://lichess.org/@/lichess/blog/developer-
           | update-275-impr...
        
         | sedatk wrote:
         | So, Huffman encoding but for chess moves?
        
           | vardump wrote:
           | Kinda, but with a different dynamically determined dictionary
           | for each successive position.
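              | 
              | A sketch of building one such table in Python (weights
              | made up; a real version would re-derive them for every
              | position):
              | 
              |     import heapq, itertools
              | 
              |     def huffman(weights):
              |         # {symbol: weight} -> {symbol: bitstring}
              |         tie = itertools.count()
              |         heap = [(w, next(tie), {s: ""})
              |                 for s, w in weights.items()]
              |         heapq.heapify(heap)
              |         while len(heap) > 1:
              |             w1, _, a = heapq.heappop(heap)
              |             w2, _, b = heapq.heappop(heap)
              |             c = {s: "0" + v for s, v in a.items()}
              |             c.update({s: "1" + v
              |                       for s, v in b.items()})
              |             heapq.heappush(heap,
              |                            (w1 + w2, next(tie), c))
              |         return heap[0][2]
              | 
              |     # One position's legal moves; rebuild per position.
              |     print(huffman({"e4": 30, "d4": 25, "Nf3": 20,
              |                    "c4": 15, "Nc3": 10}))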
        
             | transitionnel wrote:
             | This is the kind of discussion that will get us micro-
             | cybernetics in that ~1 year window.
        
         | dmurray wrote:
         | With a different engine for weaker players, whose moves would
         | generally need a longer representation to encode. For most use
         | cases where you're encoding a ton of PGNs, you also encode some
         | information about the playing strength of the players, so you
         | get that for free.
        
         | perihelions wrote:
         | Though, that engine evaluation is going to be absurdly
         | expensive, when you're compressing/decompressing billions of
         | games.
         | 
         | I think everyone's optimizing the wrong thing! The size of a
         | useful, real-world chess database would be dominated by its
         | search indexes--indexes that allow fast, random-access lookup
         | of every position that occurs in every game. (Or even better:
         | fuzzy searching for nearby, "similar" positions). This is the
         | difficult, open-ended algorithmic problem. Disk space is cheap!
         | 
         | (Out of curiosity, does anyone here actually _know_ how to
         | solve the  "fuzzy search of chess positions" problem?)
        
           | ant6n wrote:
           | You just need to enumerate for all possible states, the
           | possible transitions and evaluate the probabilities for them.
           | How many could there possibly be? Store in a small database
           | and look em up on the fly. Easy peasy.
        
             | perihelions wrote:
             | - _" and evaluate the probabilities for them"_
             | 
             | I don't know any good, _fast_ heuristics for predicting
             | this (likelihood of a human choosing a specific chess
             | move). Do you? Chess engine evaluation is computationally
             | quite heavy.
        
               | svnt wrote:
               | Parent was making a joke. The best upper bound found for
               | the number of chess positions is 4.8x10^44.
               | 
               | https://tromp.github.io/chess/chess.html
        
               | ant6n wrote:
               | You don't need to make perfect predictions. There could
               | actually be a lookup for a few common board states (say a
               | couple million, to cover the first couple moves). Beyond
               | that just use a simple scoring function, for example the
                | value of your vs your opponent's pieces, whether any of
                | your pieces (notably the king) are in danger, whether
                | you won, etc. This means you get better scores for the more likely
               | winning moves, for capturing valuable pieces and for
               | avoiding the loss of your own.
               | 
                | You could also play two or so turns of minimax, or perhaps
               | use a neural network to evaluate the various reachable
               | board states.
               | 
               | So for a given state, enumerate possible transitions,
               | score the resulting states, and map to some sort of
               | probability distribution, then use some prefix code
               | (think Huffman tree[+]) based on the distribution of
               | probabilities to encode the transition.
               | 
               | It's perhaps not super fast, and not super accurate but
               | if you can weed out the 50% of dumb moves, that already
               | saves a bit.
               | 
                | [+] An easier and better approach is to map the
                | probabilities into an interval between 0 and 1, and keep
                | using fractional bits to "zoom in" until one of the
                | subintervals is uniquely defined, then recurse on that.
                | Some of the common compression algorithms use that (but
               | I don't remember the specifics, those intro courses were
               | a long time ago).
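                | 
                | A sketch of that interval idea in Python (exact
                | fractions instead of a real bit stream; the
                | distributions are made up):
                | 
                |     from fractions import Fraction as F
                | 
                |     def narrow(lo, hi, dist, move):
                |         # Zoom into the subinterval for `move`.
                |         width, cum = hi - lo, F(0)
                |         for m, p in dist:
                |             if m == move:
                |                 return (lo + cum * width,
                |                         lo + (cum + p) * width)
                |             cum += p
                | 
                |     # Made-up per-position distributions.
                |     d1 = [("e4", F(1, 2)), ("d4", F(1, 4)),
                |           ("Nf3", F(1, 4))]
                |     d2 = [("e5", F(2, 3)), ("c5", F(1, 3))]
                | 
                |     lo, hi = F(0), F(1)
                |     lo, hi = narrow(lo, hi, d1, "e4")  # [0, 1/2)
                |     lo, hi = narrow(lo, hi, d2, "c5")  # [1/3, 1/2)
                |     print(lo, hi)  # any value here encodes "1. e4 c5"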
        
             | anonymous-panda wrote:
             | I know you're joking, but it wouldn't really be that hard
             | in terms of a lookup for played positions which is what OP
             | asked for - you could have Bitboards for the relevant info
             | so finding a state would be a refined search starting with
             | the busy/occupied Bitboard. Since you're only storing games
             | that have been played rather than all positions, the size
             | of the index shouldn't be outrageously large compared with
             | other things already being stored.
        
             | crystaln wrote:
              | I realize this is a joke, but you only move one piece at
              | a time and there are a limited number of valid moves.
              | Standard chess notation depends on this.
        
           | alfalfasprout wrote:
           | Agreed. Ultimately, the most common reason for needing to
           | store a bunch of games is you want a chess database so you
           | can look up a given position.
           | 
           | So probably some kind of tree structure so you can look up
           | games by sequence of moves since opening. And something like
           | zobrist hashing for looking up a given position (regardless
           | of sequence).
           | 
           | Some of these ultra-small encodings would be
           | counterproductive for this purpose.
        
             | perihelions wrote:
             | - _" And something like zobrist hashing for looking up a
             | given position"_
             | 
             | Which gets particularly good if you combine it with some
             | hack like a Bloom filter. There's a trick, I don't remember
             | if it has a name, where you can effect a time/space
             | tradeoff by partitioning a search set and building a
             | separate Bloom filter for each partition. A lookup is a
             | constant-time query of each Bloom filter, plus a linear
             | scan of each partition that signaled a match.
             | 
             | (This is a reasonable thing to do, because (0) the size of
             | chess databases can get extremely large and (1) the queries
             | are human UX interactions that need to be responsive).
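              | 
              | A sketch of the partitioned variant in Python (toy
              | parameters, and sha256 standing in for real hash
              | functions):
              | 
              |     import hashlib
              | 
              |     M, BITS, K = 16, 1 << 20, 5
              | 
              |     def bit_positions(key):
              |         h = hashlib.sha256(key.encode()).digest()
              |         return [int.from_bytes(h[4*i:4*i+4], "big")
              |                 % BITS for i in range(K)]
              | 
              |     parts = [[] for _ in range(M)]  # data, split M ways
              |     filts = [bytearray(BITS // 8) for _ in range(M)]
              | 
              |     def insert(p, key):
              |         parts[p].append(key)
              |         for b in bit_positions(key):
              |             filts[p][b // 8] |= 1 << (b % 8)
              | 
              |     def lookup(key):
              |         # O(M) filter probes, then a scan of only the
              |         # partitions whose filter signals a match.
              |         bits = bit_positions(key)
              |         for p in range(M):
              |             if all(filts[p][b // 8] & (1 << (b % 8))
              |                    for b in bits):
              |                 if key in parts[p]:  # false positives
              |                     return p
              |         return None
              | 
              |     insert(3, "some-position-key")
              |     print(lookup("some-position-key"))  # -> 3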
        
               | alfalfasprout wrote:
               | I mean, chess engines use transposition tables to store
               | encountered positions. In this case you can use the
               | zobrist hash as the index to a hash table and it's an
               | O(1) lookup for a given position.
               | 
               | Typically that's going to be the best way to index into
               | full games. Since you can point to an entry via a game
                | tree search (e.g. each node of the tree is a position that
               | you can use as an index to a big transposition table).
               | 
               | You can probably use a bloom filter to check if a given
                | position has ever been encountered (e.g. as a layer on top
               | of the transposition table) though. For something like
               | storing 10 billion positions w/ a 10^-6 false positive
               | probability (w/ reasonable number of hashes) you're
               | looking at 30+GiB bloom filters.
        
               | perihelions wrote:
               | Right! The hash table you're describing is O(1) lookup,
               | but its memory size is a constant multiple larger than
               | the Bloom filter. A Bloom filter doesn't tell you
                | anything about where the needle is, only whether the
                | needle exists; so, if used for _lookup_, it reduces to
               | O(n) brute-force search. In between the two, you can
               | create an array of _M_ partitions of the haystack, each
               | with their own Bloom filter: this structure has something
                | like O(M + n/M) lookup, but with the smaller memory size
               | of the Bloom filter.
               | 
               | When I implemented this, I deliberately avoided
               | constructing the hash table you're describing, because
               | (to my recollection) it was larger than my remaining disk
               | space! :)
        
           | koolba wrote:
           | > Out of curiosity, does anyone here actually know how to
           | solve the "fuzzy search of chess positions" problem?
           | 
            | A classification-based index would help here. The data
            | points of the index would be not just the positions of the
            | pieces but their relative positions as well. Things like
            | the squares
           | under attack, pinned weaker pieces, and maybe some evaluation
           | of control of the board (again maybe via some attacking based
           | metric).
           | 
           | This seems like the kind of thing that could be done by
           | analyzing a larger dataset of games to create a set of
           | functional categorizations.
           | 
           | Once the data points themselves have been identified,
           | actually indexing them is the easy part.
           | 
           | Solving this more generally for the " _find me games like
           | X..._ " is much trickier without locking down that X to
           | something concrete.
        
             | perihelions wrote:
             | For example: "this position, but a few of the pieces can be
             | in different places". (Maybe penalizing pawn moves with a
             | higher weight--a pawn move is in practice a "larger change"
             | to a position than a piece move).
             | 
             | This version is easy to formalize: it's just querying a
             | bitstring x \in {0,1}^N, a bitboard vector representation
             | of a chess position, and asking if there's any other
             | needles within a short Hamming distance. I haven't a clue
             | how to solve that (or if it's solvable at all).
        
         | andoando wrote:
          | I'm confused: how does probability help here when you need to
         | encode the possibility of all legal moves?
        
           | perihelions wrote:
           | You use variable-length encodings--shorter encodings for more
           | likely moves, longer encodings for ones that happen
           | infrequently.
           | 
           | This is the thing the parent comment was referring to,
           | 
           | https://en.wikipedia.org/wiki/Huffman_coding
        
           | jprete wrote:
           | Using the probability distribution of moves at any given
           | position, so that more likely moves take less space, will
           | result in saving space on average.
           | 
           | This is the same as Huffman coding, just applied to a
           | different domain.
           | 
           | For that matter, it might also resemble LLM compression of
           | input text, although LLM compression is intentionally lossy
           | and the above schemes are not.
        
           | neon5077 wrote:
           | Look up Huffman codes on Wikipedia.
           | 
           | In short, you put the most likely or most common values at
           | the front of the dictionary. Values 0-255 take up one byte,
           | 256-65535 take two bytes, etc. The lower your index, the
           | fewer bits are needed to represent the corresponding state.
        
             | angry_albatross wrote:
             | I think you mean Huffman codes, not Hamming codes.
        
               | neon5077 wrote:
               | Yup.
        
             | vitus wrote:
             | You're missing an important detail of Huffman codes: they
             | need to be prefix-free so you can actually tell when one
             | value ends and another begins.
             | 
             | A simplified example: suppose you have four letters in your
             | alphabet that you want to represent in binary. You can't
             | just represent these as A=0, B=1, C=10, and D=01, since
             | it's impossible to tell whether "01011" is meant to
             | represent ABABB or DDA or ACBB.
             | 
             | (Hamming codes on the other hand are intended for error
             | correction, where you want maximally distant code words to
             | minimize the chance that bitflips cause decoding errors.)
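              | 
              | For contrast, a prefix-free assignment over the same
              | four letters decodes unambiguously (a quick Python
              | sketch):
              | 
              |     code = {"A": "0", "B": "10", "C": "110", "D": "111"}
              |     decode = {v: k for k, v in code.items()}
              | 
              |     out, cur = [], ""
              |     for bit in "010110111":
              |         cur += bit
              |         if cur in decode:  # no codeword prefixes another
              |             out.append(decode[cur])
              |             cur = ""
              |     print("".join(out))  # -> ABCD, only one parse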
        
               | neon5077 wrote:
               | I don't really think that's too relevant to the core
               | concept of using fewer bits for the most common codes.
               | 
               | That's really a problem inherent to binary streams in
               | general, not just Huffman encoding.
        
               | vitus wrote:
               | It was a minor point re: this claim:
               | 
               | > Values 0-255 take up one byte, 256-65535 take two bytes
               | 
               | If you wanted to encode more than 256 values, then at
               | best you'd be able to specify values 0-254 taking one
               | byte, e.g. if you used 0xff as a special prefix to
               | represent "use another byte for all other values", but
               | that particular encoding means that you'd only be able to
               | encode another 256 values with a second byte.
        
         | groby_b wrote:
         | Lichess covered that :)
         | 
         | https://lichess.org/@/lichess/blog/developer-update-275-impr...
        
           | joshka wrote:
            | That's neat, so it looks like log2(216) ≈ 7.75 bits to
            | encode any valid move, and then it compresses even better!
        
         | jstanley wrote:
         | If you turn an arbitrary byte string into a list of indexes
         | into legal chess moves, then you can encode arbitrary data as
         | chess games!
         | 
         | (My implementation: https://incoherency.co.uk/chess-steg/ and
         | explanation at https://incoherency.co.uk/blog/stories/chess-
         | steg.html )
        
         | adrianmonk wrote:
         | > _Let an engine assign each legal move m some probability p of
         | being played._
         | 
         | This assumes you're storing moves of players who are good at
         | chess!
        
         | programjames wrote:
         | Yep, make an LLM but for chess moves.
        
       | Someone wrote:
       | FTA: _"But they don't account for the majority of what I'm
       | storing, which are PGNs."_
       | 
       | If they're storing chess games, why do they try to compress
       | individual moves?
       | 
       | If you compress games, you can get much better compression. For
       | example, in any board position, deterministically order all legal
       | moves (they're already ignoring some illegal moves in their
       | setup, so I think ignoring all is within the rules of the game),
       | and write down the number of the move made.
       | 
       | At game start, that will be an integer between 1 and 20 (16
       | different pawn moves, 4 different knight moves). Black's reply
       | similarly will be an integer between 1 and 20. For white's second
       | move, the maximum will depend on the first move.
       | 
       | To compress an entire game, use arithmetic encoding
       | (https://en.wikipedia.org/wiki/Arithmetic_coding). If there are,
        | on average, _n_ legal moves per position, that will use _log2(n)_
        | bits per move.
       | 
       | https://www.chessprogramming.org/Chess_Position: _"The maximum
       | number of moves per chess position seems 218"_ , so that will
       | give you less than 8 bits per move. In real life, it probably
       | will be less than 6.
       | 
       | The only reason I see why this wouldn't be an option is
       | performance. Decoding would require move generation and thus be
       | more complex.
       | 
       | To improve on that, make a deterministic predictor of how likely
       | moves are and use that to improve on the arithmetic encoder. That
        | probably isn't worth it, though, because of the increased
        | decoding complexity.
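        | 
        | A sketch of measuring that bound with the python-chess
        | library (the arithmetic coder itself is omitted, and the
        | sample game is made up):
        | 
        |     import math
        |     import chess  # pip install python-chess
        | 
        |     game = ["e2e4", "e7e5", "g1f3", "b8c6", "f1b5"]
        |     board, bits = chess.Board(), 0.0
        | 
        |     for uci in game:
        |         # Deterministic ordering of the legal moves:
        |         legal = sorted(m.uci() for m in board.legal_moves)
        |         index = legal.index(uci)  # what we'd encode
        |         bits += math.log2(len(legal))  # ideal uniform cost
        |         board.push_uci(uci)
        | 
        |     print(f"{bits / len(game):.2f} bits/move on average")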
        
         | marcusbuffett wrote:
         | I did end up doing something like this:
         | https://mbuffett.com/posts/compressing-chess-moves-even-furt...
         | 
         | You're spot on about the performance reason I didn't want to do
          | this originally, but I did some testing and it turns out move
          | generation in the library I use is Blazing Fast and wouldn't be
          | a bottleneck.
        
           | crdrost wrote:
           | It's also that PGN allows you to encode illegal moves, which
           | is important if your dataset contains over-the-board games.
           | 
           | That you only got to about 10-12 bits per move is actually
           | kind of sad in a way, because it means you're not doing
           | substantially better than the approach where you record a
           | move as just (delete-piece-at ____, recreate-that-piece-at
           | ____) 12-bit pairs, where castles are implicitly only
           | recorded by moving the king further than usual and
           | underpromotion has to be explicitly recorded with an extra
           | 12-bit move that is otherwise semantically impossible.
        
           | joshka wrote:
           | I just added your blog to my RSS reader and noticed that your
           | blog title in the RSS is a bit weird (in case you were
            | unaware):
            | 
            |     <title>Posts on A blog</title>
            |     <link>https://mbuffett.com/posts/</link>
            |     <description>Recent content in Posts on A blog</description>
        
       | davguerrero wrote:
       | For optimal compression, look into Huffman coding:
       | https://en.wikipedia.org/wiki/Huffman_coding
        
         | andruby wrote:
         | I think Huffman would be good, but I don't think it would be
         | optimal (as in, the smallest possible amount of bits)
        
           | canucker2016 wrote:
            | Huffman requires a minimum of one bit per symbol IIRC,
            | whereas arithmetic coding or ANS (
            | https://en.wikipedia.org/wiki/Asymmetric_numeral_systems )
            | can use fewer if there's a skewed distribution.
           | 
           | Arithmetic coding has been avoided in practical situations
           | due to patents, but many of those have expired ( see the
           | patent section of
           | https://en.wikipedia.org/wiki/Arithmetic_coding or maybe not,
           | lawyers tend to advise not even reading about patent
           | specifics )
        
             | ska wrote:
             | The core tech patents have expired. They did have some
             | interesting ones around hardware implementations, iirc.
             | 
             | Overall the chilling effect was interesting - I found that
             | many people doing academic work around compression didn't
             | really know much about it, didn't teach it, just because it
             | was a mess.
             | 
             | The core idea is elegant, and easily implemented.
        
         | marcusbuffett wrote:
         | I mention Huffman in my follow-up post:
         | https://mbuffett.com/posts/compressing-chess-moves-even-furt...
         | , but chose arithmetic encoding instead.
        
       | nmilo wrote:
        | Not sure about the author's use case, but the naive solution
        | seems like just using 6 bits for the piece's original position,
        | 6 bits for the destination, and 2 bits to designate promotion
        | type, for 14 total bits. Everything else can be found from the
        | board state, and I can't imagine a scenario where you have a
        | move, want to convert to algebraic notation, and also don't
        | know what the board is.
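        | 
        | A sketch of that packing in Python (squares numbered 0-63 as
        | rank * 8 + file; the exact layout is just one choice):
        | 
        |     def pack(src, dst, promo=0):
        |         # 6 bits source | 6 bits dest | 2 bits promotion
        |         return (src << 8) | (dst << 2) | promo
        | 
        |     def unpack(m):
        |         return (m >> 8) & 0x3F, (m >> 2) & 0x3F, m & 0x3
        | 
        |     e2, e4 = 12, 28  # a1 = 0
        |     m = pack(e2, e4)
        |     print(f"{m:014b}")  # the move fits in 14 bits
        |     print(unpack(m))    # -> (12, 28, 0)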
        
         | andruby wrote:
         | I was thinking the same thing. Often both squares will be close
         | to each other, so I'm sure that a generic compression (eg zstd)
          | can reduce it even further by taking advantage of that
         | proximity.
        
         | jprete wrote:
         | I had very similar thoughts but I think one can do much better,
         | even.
         | 
         | Each move only needs four bits to identify the piece because
         | each side only has 16 pieces at maximum. The game replayer
         | would need to keep track of each pawn's starting file though
         | (including after promotions).
         | 
         | Variable-length encodings of the move options help a lot. Pawns
         | only need two bits because there are never more than four legal
         | moves for a pawn - except for promotions, but just handle those
         | specially for those exact cases (if a pawn moves to the last
         | rank then encode the promotion with two bits, otherwise encode
         | the next move). Knights and kings each need three bits for the
         | move - encode each king-castling option as a move to the
         | appropriate rear diagonal (normally off the board so not
         | otherwise legal). Bishops and rooks need four bits (rank and
         | direction). Queens only need five (three bits for rank, two for
         | direction).
         | 
         | This way you can get down to between six and nine bits per
         | move.
        
           | amluto wrote:
           | > The game replayer would need to keep track of each pawn's
           | starting file though (including after promotions).
           | 
           | That seems unnecessary. Don't index the pieces based on their
           | type and where they started -- index them based on their
              | current location. So the piece at the lexicographically first
           | (rank, file) is piece 0, and 4 bits trivially identifies a
           | piece once you know the color.
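              | 
              | A sketch of that numbering in Python (squares as
              | (rank, file) tuples; the details are illustrative):
              | 
              |     def piece_id(own_squares, square):
              |         # Sort this side's occupied squares; the index
              |         # of `square` is a stable 4-bit identifier.
              |         return sorted(own_squares).index(square)
              | 
              |     white = {(0, 0), (0, 4), (1, 4), (6, 2)}
              |     print(piece_id(white, (1, 4)))  # -> 2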
        
             | jprete wrote:
             | True, I hadn't thought of that.
        
         | groby_b wrote:
         | The problem is that the board state requirement makes searching
         | for specific moves hard.
         | 
          | But if you allow board state, you can do it in less than 14
          | bits: 3 for the type of piece, 6 for the destination, and 2
          | extra bits.
         | (Promotion if destination is last/first line, disambiguation
         | for pawns in en passant)
         | 
         | You're still at about twice the theoretical minimum, I think. I
         | vaguely recall an upper bound of 6 bits?
        
       | Karellen wrote:
       | It's kinda complex.
       | 
        | I think my initial stab would be to encode the source position
       | (6 bits) and end position (6 bits) for a constant 12 bits per
       | move.
       | 
       | You don't need to store whether it's a capture, you can figure
       | that out from the game state. You don't need to store
       | disambiguations, there are none in that format. You don't need to
       | store check/mate, you can figure that out from the game state.
       | 
       | The only wrinkles (that I can tell) are castles and promotions.
       | But you can get around this by the fact that kings and pawns have
       | limited movement, so their legal destination squares are highly
       | constrained.
       | 
       | So you could signal a promotion by encoding the destination with
       | opposite rank, and using the file to encode which piece.
       | Promoting to a rook on c8 gets its destination encoded as a1 -
       | "a" for a rook, and "1" to indicate a promotion.
       | 
       | Similarly, you could encode a castle by encoding the king's move
       | with opposite destination rank. Castling the king to f1 gets
       | encoded as f8, and to g1 as g8.
        
         | porkbrain wrote:
          | As for promotion: in a valid chess position a single pawn
          | could promote on any of three files: one by moving straight
          | forward, and two by capturing left or right. Additionally, you
         | could have two pawns promoting on the same square, again with
         | capture. And you can have a combination of both.
         | 
          | Encode White's move exd8=N with white pawns on e7 and c7 and
         | three black queens on f8, d8 and b8.
        
           | Karellen wrote:
           | > you could promote a single pawn at three files
           | simultaneously. One by moving forward and then left and right
           | capture.
           | 
           | Ooh, didn't think of that, thanks. Still there's enough
           | constraint in the legal destination squares to work around
           | that. There's at least half the board that's inaccessible to
           | a pawn about to promote, which should be enough to encode any
           | of the 3 legal destination squares and 5 possible promotion
           | targets.
           | 
           | Edit: maybe keep the destination file as-is, and use the
           | destination rank to encode the promotion piece?
           | 
           | > Additionally, you could have two pawns promoting on the
           | same square
           | 
           | The source square should disambiguate between those though,
           | right?
        
           | nmilo wrote:
           | You can encode a promotion as the destination file plus rank
           | 1-4 for the 4 different kinds, since pawns can't move
           | backwards
        
         | tzs wrote:
         | > Similarly, you could encode a castle by encoding the king's
         | move with opposite destination rank. Castling the king to f1
         | gets encoded as f8, and to g1 as g8.
         | 
         | You are overcomplicating this. For castling just record it as
         | the King's source and destination. E.g., White kingside
         | castling is e1g1, White castling queenside is e1c1, Black
          | castling kingside is e8g8, and Black castling queenside is
         | e8c8.
         | 
         | All king moves other than castling move the king at most one
          | file, so when you see a king on e1 and the move is e1g1 or
          | e1c1, which is a move of two files, you can infer that it
          | must be castling.
         | 
         | For promotion, I suggest splitting it into two cases: promotion
         | to a queen and promotion to something else. I saw a post once
         | on Reddit from someone who analyzed promotions from all games
         | in the Lichess database, and 98.7% were to queens, so we'll
         | make that case the simplest.
         | 
         | I suggest that pawns that promote to a queen are simply
         | recorded as moving to the promotion square. It is implicit that
         | they promote to a queen. For example a pawn at b7 that moves
         | straight forward and promotes to a queen would be recorded as
         | b7b8. A pawn at b7 that captures on a8 and promotes to a queen
         | would be recorded as b7a8.
         | 
         | For pawns that promote to a rook, record the move as a move to
         | a square one rank back from the promotion square. For example
         | b7b8 promoting to rook would be recorded as b7b7, and b7a8
         | promoting to a rook would be recorded as b7a7.
         | 
         | Similarly for promotions to bishops. Record the destination two
         | ranks back. So b7b8 promoting to bishop would be b7b6. Similar
         | for knights but three ranks back, so b7a5 for a pawn at b7 that
         | captures on a8 and promotes to a knight.
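          | 
          | A decoding sketch for White under that convention (ranks
          | indexed 0-7, so index 7 is the eighth rank; the names are
          | mine, not part of the scheme itself):
          | 
          |     PROMO = {0: "Q", 1: "R", 2: "B", 3: "N"}
          | 
          |     def decode_pawn(src_rank, dst_rank, dst_file):
          |         if src_rank != 6:  # not on the 7th rank
          |             return ("move", dst_rank, dst_file)
          |         back = 7 - dst_rank  # 0..3 picks the piece
          |         return ("promote", PROMO[back], 7, dst_file)
          | 
          |     # b7b6 (dest rank index 5) -> bishop promotion on b8:
          |     print(decode_pawn(6, 5, 1))  # ('promote', 'B', 7, 1)
          |     # b7a5 (dest rank index 4) -> knight promotion on a8:
          |     print(decode_pawn(6, 4, 0))  # ('promote', 'N', 7, 0)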
        
           | Karellen wrote:
           | > All king moves other than castling move the king at most
           | one file, so when you see a king on e1 and the move is e1g1
            | or e1c1, which is a move of two files, you can infer that it
           | must be castling.
           | 
           | Yeah. For some reason I had a brain fart and thought that the
           | two castles moved the king 1 and 2 files, instead of 2 and 3
           | files, and that made me think you needed to disambiguate a 1
           | file castle with a 1 file move.
           | 
           | Which is clearly dumb. I blame insufficient coffee.
        
       | evertedsphere wrote:
       | Some thoughts I had while reading this, probably not very
       | original: the fact that chess moves are not random and usually
       | adhere to opening theory to some degree means you could get some
       | value out of using something like a trie of some small finite
       | depth, trained on a database of games, to compress common opening
       | lines much better, at the cost of making things like 1. a4 a5 2.
       | h4 h5 that aren't in the trie much costlier to represent. A
       | practical way to actually do this would likely look like applying
       | arithmetic coding to all of the possible length-N prefixes of the
       | game.
       | 
        | The mid- or lategame is also far from random and could probably
       | be compressed in small "chunks" via a coder that would
       | (effectively) learn to predict patterns of e.g. capture being
       | followed by a recapture and cause such chunks to require fewer
       | bits to represent.
       | 
       | I'm not very knowledgeable about compression algorithms, though;
       | I'm sure others will be able to provide corrections or
       | references.
        
         | Spark_Ed wrote:
         | You can reference specific move numbers from specific master
         | games to compress. But you still need to compress how you store
         | those master games. What you're describing for mid/late game
         | might make more sense like this: you use a fixed version of
         | stockfish to suggest moves (depth and version should be locked
         | to static values so it reproduces the same engine move each
         | time). If it's within the first 8 suggestions, you can flag
         | that it's an engine move with 4 bits. Decompression time is
          | significantly larger, but it maximizes the space-for-CPU
          | trade-off.
        
         | andruby wrote:
         | That's exactly what the author seems to have done and describes
         | in their subsequent post:
         | https://mbuffett.com/posts/compressing-chess-moves-even-furt...
         | (published yesterday)
        
       | ChrisArchitect wrote:
       | Related follow-up from same author:
       | 
       |  _Compressing Chess Moves Even Further, To 3.7 Bits Per Move_
       | 
       | https://mbuffett.com/posts/compressing-chess-moves-even-furt...
        
       | modeless wrote:
       | I wonder how this compares to using zstd with a dictionary.
        
       | zitterbewegung wrote:
        | Right now I think you could compress chess games for fun and
        | profit.
        | 
        | Since chess databases came along, higher-level chess players
        | can look pretty far ahead. We are reaching the point where, if
        | you study your opponent, you can pre-game them by studying
        | what openings they play and prepping for those openings.
        | 
        | But now we have come to the point where, if you can memorize
        | large portions of chess databases, you know optimal play given
        | the database, because if anyone has played a line and gotten
        | any advantage out of it, people will play that line. It would
        | be interesting to take the chess databases, see their
        | compression ratio, and then see how long it would take to lose
        | against Stockfish.
        
       | sdenton4 wrote:
       | Use StockFish to predict the next move, and only store the diff
       | between the actual move and the prediction.
       | 
       | Or, better, get a ranked list of all possible moves from
       | stockfish, and use a variable-length integer to encode the
       | position in the list. Then the best move takes ~1 bit, and worse
       | moves take more bits. (And we can do fun things like compute how
       | bad a player is by how big their game is after compression.)
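        | 
        | A sketch of such a variable-length integer (Elias gamma over
        | the 1-based rank; the engine's ranked list is assumed to
        | exist already):
        | 
        |     def elias_gamma(rank):
        |         # rank >= 1; the engine's top choice costs 1 bit.
        |         b = bin(rank)[2:]
        |         return "0" * (len(b) - 1) + b
        | 
        |     for rank in (1, 2, 3, 8, 100):
        |         c = elias_gamma(rank)
        |         print(f"rank {rank:3d}: {c} ({len(c)} bits)")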
        
         | gus_massa wrote:
          | I almost agree, but as another comment says, it's probably
          | better to use the Stockfish points to generate a probability
          | and then use Huffman encoding.
        
           | Retr0id wrote:
           | You'd probably want to use something like CABAC rather than
           | Huffman, since it deals better with dynamic probabilities.
           | https://en.wikipedia.org/wiki/Context-
           | adaptive_binary_arithm...
        
             | gus_massa wrote:
              | It looks like a very interesting comparison. I'm still not
              | sure how to transform Stockfish points into probabilities.
             | (Perhaps proportional to exp(-difference/temperature),
             | where temperature is a magical number or perhaps it's like
             | something/elo??? There are too many parameters to tweak.)
        
               | Retr0id wrote:
               | Given OP's dataset, you could probably figure out how to
               | map them, experimentally (e.g. by making a histogram)
        
         | Retr0id wrote:
         | The more advanced your predictor is, the slower your compressor
          | gets. OP has 600 million moves to encode; how long does it take
         | to ask Stockfish for its opinion on 600M board states? (and
         | then again, every time you want to decompress) (not a
         | rhetorical question btw, I know little about chess engines)
         | 
         | I suspect the sweet spot here would be to use a much worse
         | chess engine for the predictions, giving faster
         | compression/decompression at the expense of the compression
         | ratio.
        
           | marcusbuffett wrote:
           | Your intuition here is right on the money. I wrote a
           | subsequent post with pretty much that exact approach:
           | https://mbuffett.com/posts/compressing-chess-moves-even-
           | furt...
        
         | V-2 wrote:
         | > Use StockFish to predict the next move, and only store the
         | diff between the actual move and the prediction.
         | 
         | This ties the algorithm down to one specific version of
         | Stockfish, and configured identically (stuff like the hashtable
         | size etc.), because all such factors will have an impact on
         | Stockfish's evaluations. One factor changes, and you can't
         | decompress the backup.
        
       | bluedino wrote:
        | Anyone know of a discussion of, say, 8-bit chess computers and
        | their techniques and tricks?
        
         | omoikane wrote:
         | Is this what you are looking for?
         | 
         | https://news.ycombinator.com/item?id=36431917 - Video Chess
         | disassembled and commented (2023-06-22)
         | 
         | https://nanochess.org/video_chess.html
         | 
         | Video Chess for Atari 2600 worked with just 128 bytes of
         | memory.
        
       | andjd wrote:
       | I would be curious how this compares to a more-or-less off-the-
       | shelf text compression algorithm like gzip. My guess is that over
       | the entire database, this would be more efficient than the OP's
       | ad-hoc implementation or any alternative mentioned here.
        
         | pimlottc wrote:
         | That's my first thought as well. Plain old gzip should do
         | pretty well and provides a baseline to beat.
        
         | ska wrote:
         | Unlikely. Gzip and the like will do well with getting rid of
          | the inherent redundancy of ASCII coding, but it's a general
         | algorithm and can't take advantage of known structure.
        
       | infogulch wrote:
       | Has anyone tried columnar compression of full game states instead
       | of bit packing moves?
       | 
       | 64 squares * (Empty | (Black | White) * (King | Queen | Bishop |
       | Knight | Rook | Pawn)) + (Black | White) * (Can castle queenside
       | | Can castle kingside)
       | 
       | It could be encoded as 64 4-bit int columns + 4 bool columns =
       | 260 bits, most of which don't change between moves. Normal
       | columnar encoding and compression strategies could probably
       | reduce that to a few bits per board state.
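        | 
        | A sketch of that layout plus the between-state redundancy in
        | Python (nibble values made up; a real columnar codec would
        | RLE each square's stream):
        | 
        |     def pack_state(squares):
        |         # 64 nibbles -> 32 bytes, two squares per byte.
        |         return bytes((squares[i] << 4) | squares[i + 1]
        |                      for i in range(0, 64, 2))
        | 
        |     def delta(prev, cur):
        |         # Most square-columns don't change between moves.
        |         return [(i, cur[i]) for i in range(64)
        |                 if prev[i] != cur[i]]
        | 
        |     EMPTY, WP = 0, 6  # made-up nibble codes
        |     before = [EMPTY] * 64
        |     before[12] = WP  # a white pawn on e2
        |     after = before.copy()
        |     after[12], after[28] = EMPTY, WP  # the pawn plays e2-e4
        | 
        |     print(len(pack_state(after)))  # 32 bytes per raw state
        |     print(delta(before, after))    # -> [(12, 0), (28, 6)]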
        
         | marcusbuffett wrote:
         | I have actually, alongside my work of compressing moves. My
         | compression scheme averages about 150 bits per position, or
         | about 35% of the standard text notation in EPD format.
         | 
         | The thing I optimized for is that there's very often repeated
         | blank spaces or repeated pawns on the same rank.
         | 
         | Also instead of storing castling status separately, it's stored
         | as a separate piece on the appropriate rook.
         | 
          | These take advantage of the fact that there are 6 piece
          | types, so 3 bits to encode them leaves two values remaining.
          | One is PawnRepeating and the other is RookCastleAvailable, in my
         | scheme.
         | 
         | There's probably improvements to be made. I'll write a post on
         | it when it's finalized.
        
           | infogulch wrote:
           | Using the extra available piece indexes for "rook that can
           | castle" and "pawn that moved two squares" is a great idea. So
           | e.g. moving the king would not only change the squares where
           | the king moved from and to, but would also convert both of
           | the rooks from "rook that can castle" to "rook that can't
           | castle". Same for a pawn that moved two squares and can be
           | captured by en passant.
           | 
           | I guess you also need some way to encode which player's turn
           | it is. Though maybe you could eliminate that by flipping the
           | board on black's move and always encoding from the
           | perspective of the last player's move?
           | 
           | I'm curious about whether a naive columnar encoding scheme
           | could beat a more complex encoding scheme, after compression.
           | Not columnar in the sense of ranks and files, but columnar in
           | the sense of data science storage formats (e.g. parquet),
           | where each 'column' is the state of a specific square across
           | different board states. Given 64 such columns, a game state
           | is a single row. The hypothesis being that if you're encoding
           | all the game states of a large number of games (say 1000),
           | after RLE and compression etc you would see a net compression
           | better than more complex encoding schemes. Given a big block
           | of game states like this, then a single 'game' would be a
           | list of offsets into the game states. This would probably
           | also compress very nicely in columnar format.
           | 
           | Now I want to try it...
        
       | bee_rider wrote:
       | Thinking about this has made me realize I really don't know
       | anything about compression other than which arguments to pass to
       | tar to get it to compress, haha.
       | 
       | How would a typical compression algorithm do, losslessly
       | compressing the whole game?
       | 
       | Can you lossily compress the game? If so, you would end up, after
       | decompressing, with a game that had some ambiguous moves, right?
       | But, only one is valid all the way through. Why not lossily
       | compress and then step through each game, checking if each move
       | is valid? Is that even still considered lossy?
       | 
       | Hypothetically you could end up with multiple valid games I
       | guess... but they are just games, haha, who cares if a few are
       | wrong?
        
         | ska wrote:
         | It's not lossy if you can always recover the correct data.
         | 
         | Rather than use a general algorithm, your compressor would have
         | to reason: I'd have to spend a bit to disambiguate x & y, but
         | only x is a valid board state, so I won't bother and the
         | decompressor also knows I won't bother so it will all work out.
         | 
         | This sort of implicit coding can work, but it is fragile.
        
       | robrenaud wrote:
       | Only tangentially related, but I think the mid/late game is a
       | much more interesting/useful/hard avenue for chess tools.
       | 
        | Chess.com's opening explorer is pretty good, and it seems like
        | that's what this is kind of trying to compete against.
       | 
       | What I struggle with is the post game review, where I make a
       | dubious move but I don't actually understand the reason why it's
       | bad. Sure, the engine shows better moves and the refutation to my
       | move, but I often don't know why. The automatically generated
       | explanation is usually pretty poor.
       | 
       | I wish I had even a poor copy of Danya giving me commentary.
       | 
       | Perhaps I should just lean in and try to implement something like
       | this, but focus it more on coaching than commentary.
       | 
       | https://arxiv.org/pdf/2212.08195.pdf
        
         | jaryd wrote:
         | You could check to see if one of his games has that position
         | using https://fenfinder.com (site I built). Of course it's a
         | long shot given the huge possibility-space of unique positions,
         | but hey, one can hope :).
        
         | V-2 wrote:
         | I think that AI, overhyped as it may be, will truly get to
         | shine when it comes to this aspect of chess - basically,
         | tutoring.
        
       | dhsysusbsjsi wrote:
       | What I haven't seen discussed here much is optimising for
       | developer maintenance. I think the author's solution is great for
       | this; it's easy to understand and each move has a bit pattern.
       | Easy to debug. Somebody taking over in the future will understand
       | this compression.
       | 
       | If on the other hand you can squeeze another 10% storage from
       | Huffman encoded inverse tree lookup tables that only neckbeards
       | understand, you're limiting your pool of people able to do
       | maintenance on this system in the future when the author is long
       | gone.
        
       | adrianmonk wrote:
       | > _I run a site that stores a ton of chess lines, something like
       | 100 million in total._
       | 
       | Because there are so many, one approach would be to sort the list
       | of lines lexicographically to group similar ones together, then
       | compress the result with a sliding window compression algorithm.
       | 
       | The sliding window compression will avoid storing repeated parts
       | a second time, and the first part of the lines will be repeated a
       | lot because of the sorting. There may also be some other repeated
       | sequences.
       | 
       | This assumes that it's OK to sort the lines, though, which might
       | or might not be true.
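        | 
        | A sketch with zlib standing in for the sliding-window
        | compressor (toy lines, not real chess data):
        | 
        |     import random, zlib
        | 
        |     random.seed(0)
        |     # Toy lines: many share a long common prefix.
        |     pre = [f"e4 e5 Nf3 Nc6 Bb5 a6 var{i}"
        |            for i in range(5000)]
        |     lines = [random.choice(pre)
        |              + f" cont{random.randrange(100)}"
        |              for _ in range(100_000)]
        | 
        |     plain = zlib.compress("\n".join(lines).encode())
        |     grouped = zlib.compress("\n".join(sorted(lines)).encode())
        |     print(len(plain), len(grouped))
        |     # Sorting pulls repeats inside zlib's 32 KiB window,
        |     # so the second number should come out smaller.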
        
       ___________________________________________________________________
       (page generated 2024-03-15 23:00 UTC)