[HN Gopher] Compressing Chess Moves for Fun and Profit
___________________________________________________________________
Compressing Chess Moves for Fun and Profit
Author : thunderbong
Score : 123 points
Date : 2024-03-15 16:35 UTC (6 hours ago)
(HTM) web link (mbuffett.com)
(TXT) w3m dump (mbuffett.com)
| sfeng wrote:
| I would love to see this benchmarked against just running LZW
| compression on a list of the moves. I think the repeated nature
| of the moves means the entropy is actually rather low. It should
| be possible to come up with a compression dictionary which can
| save a lot more space than just a tight packing like this.
| kondro wrote:
| Or a trained zstd dictionary.
| sedatk wrote:
| > the repeated nature of the moves
|
| On a single game? I don't think the goal is to pack all plays
| together and unpack them every time. Every game must be
| individually accessible as I understand.
| vardump wrote:
| LZW would do very badly on a list of binary chess moves.
|
| Huffman (and even more so arithmetic encoding or ANS
| (asymmetric numeral systems [0])) would be significantly
| better, if you're careful how you encode data.
|
| [0]: https://en.wikipedia.org/wiki/Asymmetric_numeral_systems
| sshine wrote:
| Related, compressing bingo cards:
|
| https://github.com/diku-dk/openbanko
|
| Among other things, contains:
|
| Theoretically optimal compression of excessive amounts of bingo
| cards
|
| GPU-accelerated massively parallel bingo simulation using Futhark
|
| I suspect the research was triggered by a late professor of
| Computer Science who unironically calculated how many (Danish)
| bingo cards there are: https://sprutskalle.dk/blog/wp-
| content/uploads/bankoplader.p... -- to see the mathematical
| rigour played out on such a casual question is nothing but
| inspirational.
| srameshc wrote:
| > If you're into chess, I've made a repertoire builder,
| chessbook.com. It's a great way to learn and practice chess.
| velcrovan wrote:
| Was discussed recent-ish-ly (Sep 2023) -- this was positions, not
| moves:
|
| https://news.ycombinator.com/item?id=37525348
|
| Site linked there seems to be offline, so here's the archive
| copy:
| https://web.archive.org/web/20230928072950/https://www.ezzer...
| tromp wrote:
| The optimum compression seems to be an index into the list of all
| legal moves.
|
| But one can do better on average, by assuming that moves are not
| uniformly distributed. Let an engine assign each legal move m
| some probability p of being played. The worse the move, the lower
| the probability we assign to it. Then a more optimal code for
| move m is -log p bits, corresponding to an entropy (expected
| surprise) of sum -p log p bits for the distribution over legal
| moves.
|
| Related: accurately estimating the number of legal positions of
| chess [1].
|
| [1] https://github.com/tromp/ChessPositionRanking
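A minimal sketch of the idea above in Python, with made-up probabilities for four hypothetical legal moves (the move names and numbers are purely illustrative, not engine output):

```python
import math

# Hypothetical probabilities an engine might assign to the legal
# moves in some position (illustrative numbers only).
probs = {"Nf3": 0.55, "e4": 0.30, "a4": 0.10, "Na3": 0.05}

# Ideal code length for move m is -log2(p) bits: likely moves get
# short codes, unlikely moves get long ones.
code_bits = {m: -math.log2(p) for m, p in probs.items()}

# Expected cost per move is the entropy of the distribution, which
# here is below the 2 bits a uniform code over 4 moves would need.
entropy = -sum(p * math.log2(p) for p in probs.values())
```

An arithmetic coder or ANS can get arbitrarily close to this bound in practice.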
| porphyra wrote:
| Indeed, compression is just the dual problem of prediction, so
| the better you are at predicting the next move, the better you
| can compress.
| bicijay wrote:
| Sometimes I wonder how I got so far in my software developer
| career...
| Operyl wrote:
| Because there are so many fields of software development, and
| you don't need to be good in all of them. Don't put yourself
| down for not understanding, you can always put some time into
| algorithms again if you need to :).
| vecter wrote:
| "Software", like "science", is a broad term. Just like there
| are many kinds of scientists and you could be, say, a
| biologist or a physicist, there are many entirely disparate
| fields of software engineering. At the end of the day,
| software itself is just a means to an end: solving a
| problem. Solving some of these problems, like the one that GP
| discussed, has more to do with math than programming. The
| programming is mostly the implementation of the math. So if
| you're not versed in that math, of course you wouldn't
| understand the code that's being written to implement it.
|
| Building machine learning systems is vastly different from
| building operating systems which is vastly different from
| embedded systems which is vastly different from networking
| which is vastly different from what most of us do (myself
| included), which is building CRUD apps. We're just solving
| different problems. Of course there are common themes to
| writing good code, just like there are common themes to
| performing science, but the code and solutions will look
| almost nothing alike.
| acomar wrote:
| most of the optimizations discussed are actually kind of
| obvious if you know how chess notation works in the first
| place. like the reason the starting move "Nf3" makes any
| sense is because only 1 knight of all 4 on the board can
| possibly move to f3 on the first move. what the OP is doing
| is taking that kind of information density and trying to
| represent it, first by assigning bits to each part of a
| possible move and then by paring it down where the space of
| legal moves excludes entire categories of possibilities.
| there's another article that goes much further into how much
| optimization is possible here:
| https://lichess.org/@/lichess/blog/developer-
| update-275-impr...
| sedatk wrote:
| So, Huffman encoding but for chess moves?
| vardump wrote:
| Kinda, but with a different dynamically determined dictionary
| for each successive position.
| transitionnel wrote:
| This is the kind of discussion that will get us micro-
| cybernetics in that ~1 year window.
| dmurray wrote:
| With a different engine for weaker players, whose moves would
| generally need a longer representation to encode. For most use
| cases where you're encoding a ton of PGNs, you also encode some
| information about the playing strength of the players, so you
| get that for free.
| perihelions wrote:
| Though, that engine evaluation is going to be absurdly
| expensive, when you're compressing/decompressing billions of
| games.
|
| I think everyone's optimizing the wrong thing! The size of a
| useful, real-world chess database would be dominated by its
| search indexes--indexes that allow fast, random-access lookup
| of every position that occurs in every game. (Or even better:
| fuzzy searching for nearby, "similar" positions). This is the
| difficult, open-ended algorithmic problem. Disk space is cheap!
|
| (Out of curiosity, does anyone here actually _know_ how to
| solve the "fuzzy search of chess positions" problem?)
| ant6n wrote:
| You just need to enumerate for all possible states, the
| possible transitions and evaluate the probabilities for them.
| How many could there possibly be? Store in a small database
| and look em up on the fly. Easy peasy.
| perihelions wrote:
| - _" and evaluate the probabilities for them"_
|
| I don't know any good, _fast_ heuristics for predicting
| this (likelihood of a human choosing a specific chess
| move). Do you? Chess engine evaluation is computationally
| quite heavy.
| svnt wrote:
| Parent was making a joke. The best upper bound found for
| the number of chess positions is 4.8x10^44.
|
| https://tromp.github.io/chess/chess.html
| ant6n wrote:
| You don't need to make perfect predictions. There could
| actually be a lookup for a few common board states (say a
| couple million, to cover the first couple moves). Beyond
| that just use a simple scoring function, for example the
| value of your pieces vs your opponent's, whether any of your
| pieces (notably the king) are in danger, whether you won, etc.
| This means you get better scores for the more likely
| winning moves, for capturing valuable pieces and for
| avoiding the loss of your own.
|
| You could also play two or so turns of minimax, or perhaps
| use a neural network to evaluate the various reachable
| board states.
|
| So for a given state, enumerate possible transitions,
| score the resulting states, and map to some sort of
| probability distribution, then use some prefix code
| (think Huffman tree[+]) based on the distribution of
| probabilities to encode the transition.
|
| It's perhaps not super fast, and not super accurate but
| if you can weed out the 50% of dumb moves, that already
| saves a bit.
|
| [+] an easier and better approach is to map the
| probabilities into an interval between 0 and 1, and keep
| using fractional bits to "zoom in" until one of the
| subintervals is uniquely defined, then recurse on that.
| Some of the common compression algorithms use that (but
| I don't remember the specifics, those intro courses were
| a long time ago).
| anonymous-panda wrote:
| I know you're joking, but it wouldn't really be that hard
| in terms of a lookup for played positions which is what OP
| asked for - you could have Bitboards for the relevant info
| so finding a state would be a refined search starting with
| the busy/occupied Bitboard. Since you're only storing games
| that have been played rather than all positions, the size
| of the index shouldn't be outrageously large compared with
| other things already being stored.
| crystaln wrote:
| Realizing this is a joke, however you only move one piece
| at a time and there are limited valid moves. Standard chess
| notation depends on this.
| alfalfasprout wrote:
| Agreed. Ultimately, the most common reason for needing to
| store a bunch of games is you want a chess database so you
| can look up a given position.
|
| So probably some kind of tree structure so you can look up
| games by sequence of moves since opening. And something like
| zobrist hashing for looking up a given position (regardless
| of sequence).
|
| Some of these ultra-small encodings would be
| counterproductive for this purpose.
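A minimal sketch of Zobrist hashing, assuming a toy dict-based board representation (square index 0-63 mapped to a piece index 0-11; the representation and indices are invented for illustration):

```python
import random

random.seed(0)  # fixed seed so the key table is reproducible

# One random 64-bit key per (piece type, square); 12 piece types
# cover 6 piece kinds x 2 colours. Side-to-move and castling keys
# are omitted for brevity.
ZOBRIST = [[random.getrandbits(64) for _ in range(64)] for _ in range(12)]

def zobrist_hash(board):
    """board: dict mapping square index (0-63) -> piece index (0-11)."""
    h = 0
    for square, piece in board.items():
        h ^= ZOBRIST[piece][square]
    return h
```

Because XOR is self-inverse, moving a piece updates the hash incrementally (`h ^= key(piece, src) ^ key(piece, dst)`) without rescanning the board, which is what makes it attractive for position lookup.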
| perihelions wrote:
| - _" And something like zobrist hashing for looking up a
| given position"_
|
| Which gets particularly good if you combine it with some
| hack like a Bloom filter. There's a trick, I don't remember
| if it has a name, where you can effect a time/space
| tradeoff by partitioning a search set and building a
| separate Bloom filter for each partition. A lookup is a
| constant-time query of each Bloom filter, plus a linear
| scan of each partition that signaled a match.
|
| (This is a reasonable thing to do, because (0) the size of
| chess databases can get extremely large and (1) the queries
| are human UX interactions that need to be responsive).
| alfalfasprout wrote:
| I mean, chess engines use transposition tables to store
| encountered positions. In this case you can use the
| zobrist hash as the index to a hash table and it's an
| O(1) lookup for a given position.
|
| Typically that's going to be the best way to index into
| full games. Since you can point to an entry via a game
| tree search (eg; each node of the tree is a position that
| you can use as an index to a big transposition table).
|
| You can probably use a bloom filter to check if a given
| position has ever been encountered (eg; as a layer on top
| of the transposition table) though. For something like
| storing 10 billion positions w/ a 10^-6 false positive
| probability (w/ reasonable number of hashes) you're
| looking at 30+GiB bloom filters.
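The 30+ GiB figure can be sanity-checked with the standard Bloom filter sizing formula; a quick sketch using the same numbers as above:

```python
import math

def bloom_bits(n_items, fp_rate):
    """Optimal Bloom filter size in bits: m = -n * ln(p) / (ln 2)^2."""
    return -n_items * math.log(fp_rate) / (math.log(2) ** 2)

# 10 billion positions at a 1e-6 false-positive rate, as above;
# comes out to roughly 33 GiB.
gib = bloom_bits(10_000_000_000, 1e-6) / 8 / 2**30
```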
| perihelions wrote:
| Right! The hash table you're describing is O(1) lookup,
| but its memory size is a constant multiple larger than
| the Bloom filter. A Bloom filter doesn't tell you
| anything about where the needle is, only whether the
| needle exists; so, if used for _lookup_ , it reduces to
| O(n) brute-force search. In between the two, you can
| create an array of _M_ partitions of the haystack, each
| with their own Bloom filter: this structure has something
| like O(M + n/M) lookup, but with the smaller memory size
| of the Bloom filter.
|
| When I implemented this, I deliberately avoided
| constructing the hash table you're describing, because
| (to my recollection) it was larger than my remaining disk
| space! :)
| koolba wrote:
| > Out of curiosity, does anyone here actually know how to
| solve the "fuzzy search of chess positions" problem?
|
| A classification based indexing would help here. The data
| points of the index would be not just the positions of the
| pieces, but their relationships as well. Things like the squares
| under attack, pinned weaker pieces, and maybe some evaluation
| of control of the board (again maybe via some attacking based
| metric).
|
| This seems like the kind of thing that could be done by
| analyzing a larger dataset of games to create a set of
| functional categorizations.
|
| Once the data points themselves have been identified,
| actually indexing them is the easy part.
|
| Solving this more generally for the " _find me games like
| X..._ " is much trickier without locking down that X to
| something concrete.
| perihelions wrote:
| For example: "this position, but a few of the pieces can be
| in different places". (Maybe penalizing pawn moves with a
| higher weight--a pawn move is in practice a "larger change"
| to a position than a piece move).
|
| This version is easy to formalize: it's just querying a
| bitstring x \in {0,1}^N, a bitboard vector representation
| of a chess position, and asking if there are any other
| needles within a short Hamming distance. I haven't a clue
| how to solve that (or if it's solvable at all).
| andoando wrote:
| Im confused, how does probability help here when you need to
| encode the possibility of all legal moves?
| perihelions wrote:
| You use variable-length encodings--shorter encodings for more
| likely moves, longer encodings for ones that happen
| infrequently.
|
| This is the thing the parent comment was referring to,
|
| https://en.wikipedia.org/wiki/Huffman_coding
| jprete wrote:
| Using the probability distribution of moves at any given
| position, so that more likely moves take less space, will
| result in saving space on average.
|
| This is the same as Huffman coding, just applied to a
| different domain.
|
| For that matter, it might also resemble LLM compression of
| input text, although LLM compression is intentionally lossy
| and the above schemes are not.
| neon5077 wrote:
| Look up Huffman codes on Wikipedia.
|
| In short, you put the most likely or most common values at
| the front of the dictionary. Values 0-255 take up one byte,
| 256-65535 take two bytes, etc. The lower your index, the
| fewer bits are needed to represent the corresponding state.
| angry_albatross wrote:
| I think you mean Huffman codes, not Hamming codes.
| neon5077 wrote:
| Yup.
| vitus wrote:
| You're missing an important detail of Huffman codes: they
| need to be prefix-free so you can actually tell when one
| value ends and another begins.
|
| A simplified example: suppose you have four letters in your
| alphabet that you want to represent in binary. You can't
| just represent these as A=0, B=1, C=10, and D=01, since
| it's impossible to tell whether "01011" is meant to
| represent ABABB or DDA or ACBB.
|
| (Hamming codes on the other hand are intended for error
| correction, where you want maximally distant code words to
| minimize the chance that bitflips cause decoding errors.)
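A small sketch of building such a prefix-free code with the textbook Huffman construction (the symbols and weights here are made up):

```python
import heapq

def huffman_code(freqs):
    """Build a prefix-free code from a dict of symbol -> weight."""
    # Heap entries are (weight, tiebreak, {symbol: code-so-far});
    # the tiebreak keeps tuple comparison away from the dicts.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        # Prepend a bit distinguishing the two merged subtrees.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

code = huffman_code({"A": 5, "B": 2, "C": 1, "D": 1})
# No codeword is a prefix of another, so bit streams decode uniquely.
for s in code:
    for t in code:
        assert s == t or not code[t].startswith(code[s])
```

The most frequent symbol ("A") ends up with the shortest codeword, which is the whole point when "symbols" are chess moves weighted by likelihood.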
| neon5077 wrote:
| I don't really think that's too relevant to the core
| concept of using fewer bits for the most common codes.
|
| That's really a problem inherent to binary streams in
| general, not just Huffman encoding.
| vitus wrote:
| It was a minor point re: this claim:
|
| > Values 0-255 take up one byte, 256-65535 take two bytes
|
| If you wanted to encode more than 256 values, then at
| best you'd be able to specify values 0-254 taking one
| byte, e.g. if you used 0xff as a special prefix to
| represent "use another byte for all other values", but
| that particular encoding means that you'd only be able to
| encode another 256 values with a second byte.
| groby_b wrote:
| Lichess covered that :)
|
| https://lichess.org/@/lichess/blog/developer-update-275-impr...
| joshka wrote:
| That's neat, so it looks like log2(216) =~ 7.8 bits to encode
| any valid move, and then it compresses even better than that!
| jstanley wrote:
| If you turn an arbitrary byte string into a list of indexes
| into legal chess moves, then you can encode arbitrary data as
| chess games!
|
| (My implementation: https://incoherency.co.uk/chess-steg/ and
| explanation at https://incoherency.co.uk/blog/stories/chess-
| steg.html )
| adrianmonk wrote:
| > _Let an engine assign each legal move m some probability p of
| being played._
|
| This assumes you're storing moves of players who are good at
| chess!
| programjames wrote:
| Yep, make an LLM but for chess moves.
| Someone wrote:
| FTA: _"But they don't account for the majority of what I'm
| storing, which are PGNs."_
|
| If they're storing chess games, why do they try to compress
| individual moves?
|
| If you compress games, you can get much better compression. For
| example, in any board position, deterministically order all legal
| moves (they're already ignoring some illegal moves in their
| setup, so I think ignoring all is within the rules of the game),
| and write down the number of the move made.
|
| At game start, that will be an integer between 1 and 20 (16
| different pawn moves, 4 different knight moves). Black's reply
| similarly will be an integer between 1 and 20. For white's second
| move, the maximum will depend on the first move.
|
| To compress an entire game, use arithmetic encoding
| (https://en.wikipedia.org/wiki/Arithmetic_coding). If there are,
| on average, _n_ legal moves per position, that will use _log2 n_
| bits per move.
|
| https://www.chessprogramming.org/Chess_Position: _"The maximum
| number of moves per chess position seems 218"_ , so that will
| give you less than 8 bits per move. In real life, it probably
| will be less than 6.
|
| The only reason I see why this wouldn't be an option is
| performance. Decoding would require move generation and thus be
| more complex.
|
| To improve on that, make a deterministic predictor of how likely
| moves are and use that to improve on the arithmetic encoder. That
| probably isn't worth it, though because of the increased decoding
| complexity.
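One way to see the bound: if the number of legal moves at each ply is known to both encoder and decoder (both regenerate the move list), the whole game packs into a single integer by mixed-radix arithmetic, which matches the arithmetic-coding cost for a uniform distribution over legal moves. A sketch with invented move indices and legal-move counts:

```python
def pack(moves):
    """moves: list of (move_index, num_legal_moves) per ply."""
    n = 0
    for index, radix in reversed(moves):
        n = n * radix + index
    return n

def unpack(n, radices):
    """Recover move indices given the per-ply legal-move counts."""
    out = []
    for radix in radices:
        n, index = divmod(n, radix)
        out.append(index)
    return out

# Four plies with hypothetical legal-move counts (20 is correct for
# the starting position; the later counts are made up).
game = [(12, 20), (7, 20), (3, 29), (15, 31)]
n = pack(game)
assert unpack(n, [r for _, r in game]) == [i for i, _ in game]
```

The decoder must regenerate the legal-move list at each ply, which is exactly the performance cost mentioned above.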
| marcusbuffett wrote:
| I did end up doing something like this:
| https://mbuffett.com/posts/compressing-chess-moves-even-furt...
|
| You're spot on about the performance reason I didn't want to do
| this originally, but I did some testing and turns out move
| generation in the library I use is Blazing Fast and wouldn't be
| a bottleneck
| crdrost wrote:
| It's also that PGN allows you to encode illegal moves, which
| is important if your dataset contains over-the-board games.
|
| That you only got to about 10-12 bits per move is actually
| kind of sad in a way, because it means you're not doing
| substantially better than the approach where you record a
| move as just (delete-piece-at ____, recreate-that-piece-at
| ____) 12-bit pairs, where castles are implicitly only
| recorded by moving the king further than usual and
| underpromotion has to be explicitly recorded with an extra
| 12-bit move that is otherwise semantically impossible.
| joshka wrote:
| I just added your blog to my RSS reader and noticed that your
| blog title in the RSS is a bit weird (in case you were
| unaware): <title>Posts on A blog</title>
| <link>https://mbuffett.com/posts/</link>
| <description>Recent content in Posts on A blog</description>
| davguerrero wrote:
| For optimal compression, look into Huffman coding:
| https://en.wikipedia.org/wiki/Huffman_coding
| andruby wrote:
| I think Huffman would be good, but I don't think it would be
| optimal (as in, the smallest possible amount of bits)
| canucker2016 wrote:
| Huffman requires a minimum of one bit per symbol IIRC,
| whereas Arithmetic coding or ANS (
| https://en.wikipedia.org/wiki/Asymmetric_numeral_systems )
| can use fewer if there's a skewed distribution.
|
| Arithmetic coding has been avoided in practical situations
| due to patents, but many of those have expired ( see the
| patent section of
| https://en.wikipedia.org/wiki/Arithmetic_coding or maybe not,
| lawyers tend to advise not even reading about patent
| specifics )
| ska wrote:
| The core tech patents have expired. They did have some
| interesting ones around hardware implementations, iirc.
|
| Overall the chilling effect was interesting - I found that
| many people doing academic work around compression didn't
| really know much about it, didn't teach it, just because it
| was a mess.
|
| The core idea is elegant, and easily implemented.
| marcusbuffett wrote:
| I mention Huffman in my follow-up post:
| https://mbuffett.com/posts/compressing-chess-moves-even-furt...
| , but chose arithmetic encoding instead.
| nmilo wrote:
| Not sure about the author's use case, but the naive solution
| seems to be just using 6 bits for the piece's original position,
| 6 bits for the destination, and 2 bits to designate promotion
| type, for 14 total bits. Everything else can be found from the
| board state, and I
| can't imagine a scenario where you have a move, want to convert
| to algebraic notation, and also don't know what the board is.
| andruby wrote:
| I was thinking the same thing. Often both squares will be close
| to each other, so I'm sure that a generic compression (eg zstd)
| can reduce it even further by taking advantage of that
| proximity.
| jprete wrote:
| I had very similar thoughts but I think one can do much better,
| even.
|
| Each move only needs four bits to identify the piece because
| each side only has 16 pieces at maximum. The game replayer
| would need to keep track of each pawn's starting file though
| (including after promotions).
|
| Variable-length encodings of the move options help a lot. Pawns
| only need two bits because there are never more than four legal
| moves for a pawn - except for promotions, but just handle those
| specially for those exact cases (if a pawn moves to the last
| rank then encode the promotion with two bits, otherwise encode
| the next move). Knights and kings each need three bits for the
| move - encode each king-castling option as a move to the
| appropriate rear diagonal (normally off the board so not
| otherwise legal). Bishops and rooks need four bits (rank and
| direction). Queens only need five (three bits for rank, two for
| direction).
|
| This way you can get down to between six and nine bits per
| move.
| amluto wrote:
| > The game replayer would need to keep track of each pawn's
| starting file though (including after promotions).
|
| That seems unnecessary. Don't index the pieces based on their
| type and where they started -- index them based on their
| current location. So the piece the lexicographically first
| (rank, file) is piece 0, and 4 bits trivially identifies a
| piece once you know the color.
| jprete wrote:
| True, I hadn't thought of that.
| groby_b wrote:
| The problem is that the board state requirement makes searching
| for specific moves hard.
|
| But if you allow board state, you can do less than 14. 3 bits
| for the type of piece, 6 for the destination, 2 extra bits.
| (Promotion if destination is last/first line, disambiguation
| for pawns in en passant)
|
| You're still at about twice the theoretical minimum, I think. I
| vaguely recall an upper bound of 6 bits?
| Karellen wrote:
| It's kinda complex.
|
| I think my initial stab would be to encode the source position
| (6 bits) and end position (6 bits) for a constant 12 bits per
| move.
|
| You don't need to store whether it's a capture, you can figure
| that out from the game state. You don't need to store
| disambiguations, there are none in that format. You don't need to
| store check/mate, you can figure that out from the game state.
|
| The only wrinkles (that I can tell) are castles and promotions.
| But you can get around this by the fact that kings and pawns have
| limited movement, so their legal destination squares are highly
| constrained.
|
| So you could signal a promotion by encoding the destination with
| opposite rank, and using the file to encode which piece.
| Promoting to a rook on c8 gets its destination encoded as a1 -
| "a" for a rook, and "1" to indicate a promotion.
|
| Similarly, you could encode a castle by encoding the king's move
| with opposite destination rank. Castling the king to f1 gets
| encoded as f8, and to g1 as g8.
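The constant 12-bit from/to scheme is easy to sketch; the a1=0..h8=63 square numbering is an assumption, and the promotion/castling tricks described above are omitted:

```python
def encode_move(src, dst):
    """Pack source and destination squares (0-63) into 12 bits."""
    return (src << 6) | dst

def decode_move(m):
    """Unpack a 12-bit move back into (src, dst) square indices."""
    return m >> 6, m & 0x3F

m = encode_move(12, 28)  # e2 -> e4 with a1=0, b1=1, ..., h8=63
assert decode_move(m) == (12, 28)
assert m < 2**12  # always fits in 12 bits
```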
| porkbrain wrote:
| As for promotion: in a valid chess position you could promote a
| single pawn on any of three files: one by moving forward, and
| one each by capturing left or right. Additionally, you
| could have two pawns promoting on the same square, again with
| capture. And you can have a combination of both.
|
| Encode White's move exd8=N with white pawns on e7 and c7 and
| three black queens on f8, d8 and b8.
| Karellen wrote:
| > you could promote a single pawn at three files
| simultaneously. One by moving forward and then left and right
| capture.
|
| Ooh, didn't think of that, thanks. Still there's enough
| constraint in the legal destination squares to work around
| that. There's at least half the board that's inaccessible to
| a pawn about to promote, which should be enough to encode any
| of the 3 legal destination squares and 5 possible promotion
| targets.
|
| Edit: maybe keep the destination file as-is, and use the
| destination rank to encode the promotion piece?
|
| > Additionally, you could have two pawns promoting on the
| same square
|
| The source square should disambiguate between those though,
| right?
| nmilo wrote:
| You can encode a promotion as the destination file plus rank
| 1-4 for the 4 different kinds, since pawns can't move
| backwards
| tzs wrote:
| > Similarly, you could encode a castle by encoding the king's
| move with opposite destination rank. Castling the king to f1
| gets encoded as f8, and to g1 as g8.
|
| You are overcomplicating this. For castling just record it as
| the King's source and destination. E.g., White kingside
| castling is e1g1, White castling queenside is e1c1, Black
| castling kingside is e8g8, and Black castling queenside is
| e8c8.
|
| All king moves other than castling move the king at most one
| file, so when you see a king on e1 and the move is e1g1 or e1c1
| which is a move of two files you can infer that it must be
| castling.
|
| For promotion, I suggest splitting it into two cases: promotion
| to a queen and promotion to something else. I saw a post once
| on Reddit from someone who analyzed promotions from all games
| in the Lichess database, and 98.7% were to queens, so we'll
| make that case the simplest.
|
| I suggest that pawns that promote to a queen are simply
| recorded as moving to the promotion square. It is implicit that
| they promote to a queen. For example a pawn at b7 that moves
| straight forward and promotes to a queen would be recorded as
| b7b8. A pawn at b7 that captures on a8 and promotes to a queen
| would be recorded as b7a8.
|
| For pawns that promote to a rook, record the move as a move to
| a square one rank back from the promotion square. For example
| b7b8 promoting to rook would be recorded as b7b7, and b7a8
| promoting to a rook would be recorded as b7a7.
|
| Similarly for promotions to bishops. Record the destination two
| ranks back. So b7b8 promoting to bishop would be b7b6. Similar
| for knights but three ranks back, so b7a5 for a pawn at b7 that
| captures on a8 and promotes to a knight.
| Karellen wrote:
| > All king moves other than castling move the king at most
| one file, so when you see a king on e1 and the move is e1g1
| or e1c1 which is a move of two files you can infer that it
| must be castling.
|
| Yeah. For some reason I had a brain fart and thought that the
| two castles moved the king 1 and 2 files, instead of 2 and 3
| files, and that made me think you needed to disambiguate a 1
| file castle with a 1 file move.
|
| Which is clearly dumb. I blame insufficient coffee.
| evertedsphere wrote:
| Some thoughts I had while reading this, probably not very
| original: the fact that chess moves are not random and usually
| adhere to opening theory to some degree means you could get some
| value out of using something like a trie of some small finite
| depth, trained on a database of games, to compress common opening
| lines much better, at the cost of making things like 1. a4 a5 2.
| h4 h5 that aren't in the trie much costlier to represent. A
| practical way to actually do this would likely look like applying
| arithmetic coding to all of the possible length-N prefixes of the
| game.
|
| The mid- or lategame are also far from random and could probably
| be compressed in small "chunks" via a coder that would
| (effectively) learn to predict patterns of e.g. capture being
| followed by a recapture and cause such chunks to require fewer
| bits to represent.
|
| I'm not very knowledgeable about compression algorithms, though;
| I'm sure others will be able to provide corrections or
| references.
| Spark_Ed wrote:
| You can reference specific move numbers from specific master
| games to compress. But you still need to compress how you store
| those master games. What you're describing for mid/late game
| might make more sense like this: you use a fixed version of
| stockfish to suggest moves (depth and version should be locked
| to static values so it reproduces the same engine move each
| time). If it's within the first 8 suggestions, you can flag
| that it's an engine move with 4 bits. Decompression time is
| significantly larger, but it maximizes the space-for-CPU
| trade-off.
| andruby wrote:
| That's exactly what the author seems to have done and describes
| in their subsequent post:
| https://mbuffett.com/posts/compressing-chess-moves-even-furt...
| (published yesterday)
| ChrisArchitect wrote:
| Related follow-up from same author:
|
| _Compressing Chess Moves Even Further, To 3.7 Bits Per Move_
|
| https://mbuffett.com/posts/compressing-chess-moves-even-furt...
| modeless wrote:
| I wonder how this compares to using zstd with a dictionary.
| zitterbewegung wrote:
| Right now I think you could compress chess games for fun and
| profit.
|
| Since chess databases have come along, higher level chess players
| can look pretty far ahead. We are reaching the point where if you
| study your opponent you can pregame it by studying what openings
| they play and prepping for them.
|
| But now we have come to the point where, if you can memorize
| large portions of chess databases, you know optimal play given
| the database, because if anyone has played the game and gotten
| any advantage out of it, people will play the line. It would be
| interesting to take the chess databases, see their compression
| ratio, and then see how long it would take to lose against
| Stockfish.
| sdenton4 wrote:
| Use StockFish to predict the next move, and only store the diff
| between the actual move and the prediction.
|
| Or, better, get a ranked list of all possible moves from
| stockfish, and use a variable-length integer to encode the
| position in the list. Then the best move takes ~1 bit, and worse
| moves take more bits. (And we can do fun things like compute how
| bad a player is by how big their game is after compression.)
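One concrete variable-length integer for the rank index is the Elias gamma code, sketched below (the rank-to-bit-length mapping is the point here; hooking it up to an actual engine's ranked move list is left out):

```python
def elias_gamma(n):
    """Elias gamma code for a positive integer n (move rank >= 1)."""
    bits = bin(n)[2:]                    # binary, no '0b' prefix
    return "0" * (len(bits) - 1) + bits  # unary length, then value

# The engine's top choice costs 1 bit; worse-ranked moves cost more.
assert elias_gamma(1) == "1"
assert elias_gamma(2) == "010"
assert elias_gamma(5) == "00101"
```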
| gus_massa wrote:
| I almost agree, but as another comment says, it's probably better
| to use the Stockfish points to generate a probability and then
| use Huffman encoding.
| Retr0id wrote:
| You'd probably want to use something like CABAC rather than
| Huffman, since it deals better with dynamic probabilities.
| https://en.wikipedia.org/wiki/Context-
| adaptive_binary_arithm...
| gus_massa wrote:
| It looks like a very interesting comparison. I'm still not
| sure how to transform Stockfish points to probabilities.
| (Perhaps proportional to exp(-difference/temperature),
| where temperature is a magical number or perhaps it's like
| something/elo??? There are too many parameters to tweak.)
| Retr0id wrote:
| Given OP's dataset, you could probably figure out how to
| map them, experimentally (e.g. by making a histogram)
| Retr0id wrote:
| The more advanced your predictor is, the slower your compressor
| gets. OP has 600 Million moves to encode, how long does it take
| to ask Stockfish for its opinion on 600M board states? (and
| then again, every time you want to decompress) (not a
| rhetorical question btw, I know little about chess engines)
|
| I suspect the sweet spot here would be to use a much worse
| chess engine for the predictions, giving faster
| compression/decompression at the expense of the compression
| ratio.
| marcusbuffett wrote:
| Your intuition here is right on the money. I wrote a
| subsequent post with pretty much that exact approach:
| https://mbuffett.com/posts/compressing-chess-moves-even-
| furt...
| V-2 wrote:
| > Use StockFish to predict the next move, and only store the
| diff between the actual move and the prediction.
|
| This ties the algorithm down to one specific version of
| Stockfish, and configured identically (stuff like the hashtable
| size etc.), because all such factors will have an impact on
| Stockfish's evaluations. One factor changes, and you can't
| decompress the backup.
| bluedino wrote:
| Anyone know of a discussion of say, 8 bit chess computers and
| their techniques and tricks?
| omoikane wrote:
| Is this what you are looking for?
|
| https://news.ycombinator.com/item?id=36431917 - Video Chess
| disassembled and commented (2023-06-22)
|
| https://nanochess.org/video_chess.html
|
| Video Chess for Atari 2600 worked with just 128 bytes of
| memory.
| andjd wrote:
| I would be curious how this compares to a more-or-less off-the-
| shelf text compression algorithm like gzip. My guess is that over
| the entire database, this would be more efficient than the OP's
| ad-hoc implementation or any alternative mentioned here.
| pimlottc wrote:
| That's my first thought as well. Plain old gzip should do
| pretty well and provides a baseline to beat.
| ska wrote:
| Unlikely. Gzip and the like will do well with getting rid of
| the inherent redundancy of ascii coding but it's a general
| algorithm and can't take advantage of known structure.
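For reference, the gzip baseline really is a two-liner. On a deliberately repetitive toy input it does well, but as noted above it can't exploit chess-specific structure; the ratio below is only illustrative:

```python
import gzip

# A toy "database": one opening line in SAN, repeated. Real input
# would be the site's 100M lines; ratios here are only illustrative.
games = b"1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Ba4 Nf6 5. O-O Be7\n" * 200
packed = gzip.compress(games)
```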
| infogulch wrote:
| Has anyone tried columnar compression of full game states instead
| of bit packing moves?
|
| 64 squares * (Empty | (Black | White) * (King | Queen | Bishop |
| Knight | Rook | Pawn)) + (Black | White) * (Can castle queenside
| | Can castle kingside)
|
| It could be encoded as 64 4-bit int columns + 4 bool columns =
| 260 bits, most of which don't change between moves. Normal
| columnar encoding and compression strategies could probably
| reduce that to a few bits per board state.
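A minimal sketch of the 260-bit layout described above; the 4-bit piece codes are an arbitrary assignment (codes 13–15 are left spare):

```python
# 4-bit codes per square: 0 = empty, 1-6 white pieces, 7-12 black
# pieces, codes 13-15 unused. Total: 64*4 + 4 castling bits = 260 bits.
PIECE = {".": 0, "P": 1, "N": 2, "B": 3, "R": 4, "Q": 5, "K": 6,
         "p": 7, "n": 8, "b": 9, "r": 10, "q": 11, "k": 12}

def pack_board(squares, castling):
    """squares: 64 one-char piece codes; castling: 4 booleans (KQkq)."""
    bits = 0
    for ch in squares:
        bits = (bits << 4) | PIECE[ch]
    for flag in castling:
        bits = (bits << 1) | int(flag)
    return bits.to_bytes(33, "big")   # 260 bits fit in 33 bytes

start = "rnbqkbnr" + "pppppppp" + "." * 32 + "PPPPPPPP" + "RNBQKBNR"
packed = pack_board(start, [True, True, True, True])
```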
| marcusbuffett wrote:
| I have actually, alongside my work of compressing moves. My
| compression scheme averages about 150 bits per position, or
| about 35% of the standard text notation in EPD format.
|
| The thing I optimized for is that there's very often repeated
| blank spaces or repeated pawns on the same rank.
|
| Also instead of storing castling status separately, it's stored
| as a separate piece on the appropriate rook.
|
| These take advantage of the fact that there are 6 piece types,
| so 3 bits to encode them leaves two values remaining. One is
| PawnRepeating and the other is RookCastleAvailable, in my
| scheme.
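The 3-bit piece alphabet can be sketched as an enum; the two spare names follow the comment above, but the numeric assignments here are an assumption:

```python
from enum import IntEnum

# Six piece types fit in 3 bits with two codes to spare; the spares
# are repurposed as PawnRepeating and RookCastleAvailable (names from
# the comment above; the numeric values here are arbitrary).
class Piece3(IntEnum):
    PAWN = 0
    KNIGHT = 1
    BISHOP = 2
    ROOK = 3
    QUEEN = 4
    KING = 5
    PAWN_REPEATING = 6
    ROOK_CASTLE_AVAILABLE = 7
```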
|
| There are probably improvements to be made. I'll write a post on
| it when it's finalized.
| infogulch wrote:
| Using the extra available piece indexes for "rook that can
| castle" and "pawn that moved two squares" is a great idea. So
| e.g. moving the king would not only change the squares where
| the king moved from and to, but would also convert both of
| the rooks from "rook that can castle" to "rook that can't
| castle". Same for a pawn that moved two squares and can be
| captured by en passant.
|
| I guess you also need some way to encode which player's turn
| it is. Though maybe you could eliminate that by flipping the
| board on black's move and always encoding from the
| perspective of the last player's move?
|
| I'm curious about whether a naive columnar encoding scheme
| could beat a more complex encoding scheme, after compression.
| Not columnar in the sense of ranks and files, but columnar in
| the sense of data science storage formats (e.g. parquet),
| where each 'column' is the state of a specific square across
| different board states. Given 64 such columns, a game state
| is a single row. The hypothesis being that if you're encoding
| all the game states of a large number of games (say 1000),
| after RLE and compression etc you would see a net compression
| better than more complex encoding schemes. Given a big block
| of game states like this, then a single 'game' would be a
| list of offsets into the game states. This would probably
| also compress very nicely in columnar format.
|
| Now I want to try it...
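The per-square column idea can be sketched with a plain run-length encoder over toy data; parquet-style formats layer dictionary encoding and bit-packing on top of exactly this:

```python
from itertools import groupby

def rle(column):
    """Run-length encode one square's value across many board states."""
    return [(v, sum(1 for _ in run)) for v, run in groupby(column)]

# Square e1 across six consecutive positions: the king sits still
# for four plies, then moves away, so the column collapses to two runs.
e1_column = ["K", "K", "K", "K", ".", "."]
```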
| bee_rider wrote:
| Thinking about this has made me realize I really don't know
| anything about compression other than which arguments to pass to
| tar to get it to compress, haha.
|
| How would a typical compression algorithm do, losslessly
| compressing the whole game?
|
| Can you lossily compress the game? If so, you would end up, after
| decompressing, with a game that had some ambiguous moves, right?
| But, only one is valid all the way through. Why not lossily
| compress and then step through each game, checking if each move
| is valid? Is that even still considered lossy?
|
| Hypothetically you could end up with multiple valid games I
| guess... but they are just games, haha, who cares if a few are
| wrong?
| ska wrote:
| It's not lossy if you can always recover the correct data.
|
| Rather than use a general algorithm, your compressor would have
| to reason: I'd have to spend a bit to disambiguate x & y, but
| only x is a valid board state, so I won't bother and the
| decompressor also knows I won't bother so it will all work out.
|
| This sort of implicit coding can work, but it is fragile.
| robrenaud wrote:
| Only tangentially related, but I think the mid/late game is a
| much more interesting/useful/hard avenue for chess tools.
|
| Chess.com's opening explorer is pretty good, which it seems like
| this is kind of trying to compete against.
|
| What I struggle with is the post game review, where I make a
| dubious move but I don't actually understand the reason why it's
| bad. Sure, the engine shows better moves and the refutation to my
| move, but I often don't know why. The automatically generated
| explanation is usually pretty poor.
|
| I wish I had even a poor copy of Danya giving me commentary.
|
| Perhaps I should just lean in and try to implement something like
| this, but focus it more on coaching than commentary.
|
| https://arxiv.org/pdf/2212.08195.pdf
| jaryd wrote:
| You could check to see if one of his games has that position
| using https://fenfinder.com (site I built). Of course it's a
| long shot given the huge possibility-space of unique positions,
| but hey, one can hope :).
| V-2 wrote:
| I think that AI, overhyped as it may be, will truly get to
| shine when it comes to this aspect of chess - basically,
| tutoring.
| dhsysusbsjsi wrote:
| What I haven't seen discussed here much is optimising for
| developer maintenance. I think the author's solution is great for
| this; it's easy to understand and each move has a bit pattern.
| Easy to debug. Somebody taking over in the future will understand
| this compression.
|
| If on the other hand you can squeeze another 10% storage from
| Huffman encoded inverse tree lookup tables that only neckbeards
| understand, you're limiting your pool of people able to do
| maintenance on this system in the future when the author is long
| gone.
| adrianmonk wrote:
| > _I run a site that stores a ton of chess lines, something like
| 100 million in total._
|
| Because there are so many, one approach would be to sort the list
| of lines lexicographically to group similar ones together, then
| compress the result with a sliding window compression algorithm.
|
| The sliding window compression will avoid storing repeated parts
| a second time, and the first part of the lines will be repeated a
| lot because of the sorting. There may also be some other repeated
| sequences.
|
| This assumes that it's OK to sort the lines, though, which might
| or might not be true.
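A toy sketch of the sort-then-compress idea with zlib (DEFLATE's sliding window is 32 KB); the example lines are hypothetical stand-ins for the site's database:

```python
import zlib

# Hypothetical repertoire lines sharing common opening prefixes.
lines = [
    "1. e4 e5 2. Nf3 Nc6 3. Bb5 a6",
    "1. e4 e5 2. Nf3 Nc6 3. Bc4 Bc5",
    "1. e4 c5 2. Nf3 d6 3. d4 cxd4",
    "1. d4 d5 2. c4 e6 3. Nc3 Nf6",
] * 50

# Sorting groups shared prefixes next to each other, so they stay
# inside the compressor's sliding window and are stored only once.
sorted_blob = "\n".join(sorted(lines)).encode()
packed = zlib.compress(sorted_blob)
```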
___________________________________________________________________
(page generated 2024-03-15 23:00 UTC)