[HN Gopher] How to Build a Chess Engine and Fail
___________________________________________________________________
How to Build a Chess Engine and Fail
Author : xlinux
Score : 100 points
Date : 2024-11-19 06:38 UTC (16 hours ago)
(HTM) web link (obrhubr.org)
(TXT) w3m dump (obrhubr.org)
| zelphirkalt wrote:
| All this, I guess, comes after laying the groundwork, like
| implementing a bitboard representation or something else, and
| implementing the logic for telling whether a move is valid or
| invalid. That in itself is already a lot of work. If I
| understand correctly, the idea here is "merely" writing the
| engine, taking everything else as a given.
| obrhubr wrote:
| The game's implementation itself was provided as part of the
| competition by Sebastian Lague. I completely agree that writing
| the move logic, validation, etc. is a difficult undertaking,
| especially when it comes to optimisation, which is what allows
| the bots built on top to perform well.
| zelphirkalt wrote:
| That makes sense for such a competition. Thanks for making
| this even clearer!
|
| Of course another interesting competition could be to develop
| _all_ of a chess game and engine and see how low in code
| people can go. But that would be an entirely different
| competition.
| james_marks wrote:
| Is there not an open standard for this? I glanced and didn't
| see anything, but seems crazy every chess project would need
| to define the game.
| qsort wrote:
| There are open components for generic purposes, e.g. draw
| the board given a PGN, show the moves, check if moves are
| legal and so on, but they aren't suitable for engine
| internals. There are several tricks with bits to store
| positions, hashing schemes for transposition tables and the
| likes that are key to unlock the maximum performance
| possible, and they are usually tied to the engine
| implementation.
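| One of the "tricks with bits" mentioned here is Zobrist
| hashing, the standard scheme for transposition-table keys. A
| minimal sketch (the board encoding below is illustrative, not
| from any particular engine):

```python
import random

random.seed(42)  # fixed seed so the keys are reproducible

SQUARES = 64
PIECES = 12  # 6 piece types x 2 colours

# One random 64-bit key per (piece, square) pair.
ZOBRIST = [[random.getrandbits(64) for _ in range(SQUARES)]
           for _ in range(PIECES)]

def zobrist_hash(board):
    """board: dict mapping square index -> piece index."""
    h = 0
    for square, piece in board.items():
        h ^= ZOBRIST[piece][square]
    return h

def update_hash(h, piece, from_sq, to_sq):
    # Moving a piece updates the hash with two XORs instead of a
    # full recompute, which is why the scheme is so cheap inside
    # a search loop.
    return h ^ ZOBRIST[piece][from_sq] ^ ZOBRIST[piece][to_sq]
```

| The incremental update is the whole point: the engine keeps the
| hash in sync while making and unmaking moves, and uses it to
| index the transposition table.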
| vunderba wrote:
| As part of my undergrad work, I used principles similar to the
| article's (steady-state genetic algorithms) to create a bot capable
| of playing Reversi where the fitness function was loosely defined
| as a set of "weights" on each square across the 8x8 board. These
| were used as part of the evaluation function in the classic
| minimax algorithm.
|
| We trained over the course of 5-6 days, and the end generation
| was capable of beating an intermediate player but got completely
| destroyed by experts. It was a fun project!
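| The "weights on each square" idea can be sketched as a simple
| evaluation function (the weight values below are illustrative
| placeholders, not the trained ones):

```python
# Hypothetical square weights for one quadrant of an 8x8 Reversi
# board: corners are valuable, squares adjacent to corners are
# risky. The board is symmetric, so the quadrant is mirrored.
QUADRANT = [
    [100, -20, 10,  5],
    [-20, -50, -2, -2],
    [ 10,  -2,  1,  1],
    [  5,  -2,  1,  0],
]

def weight(row, col):
    # Mirror the quadrant across both axes.
    r = row if row < 4 else 7 - row
    c = col if col < 4 else 7 - col
    return QUADRANT[r][c]

def evaluate(board, player):
    """board: 8x8 grid of 0 (empty), 1, or 2.
    Positive scores favour `player`."""
    score = 0
    for row in range(8):
        for col in range(8):
            if board[row][col] == player:
                score += weight(row, col)
            elif board[row][col] != 0:
                score -= weight(row, col)
    return score
```

| In the genetic-algorithm setting, the square weights are the
| genome being evolved; the minimax search on top stays fixed.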
| janalsncm wrote:
| I am curious why the author chose a genetic algorithm rather than
| standard backprop to distill the eval. Logistic regression seems
| like a pretty reasonable choice and it'll be a lot faster than a
| genetic algorithm. Add an L1 penalty for sparsity.
|
| In the past I've tried distilling not just the eval function but
| the outcome of the search (something like depth 20) in a neural
| net. It kind of works, but not very well until the net is pretty
| big. Deepmind did something similar.
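| A sketch of that alternative: fit a linear eval to target
| scores by gradient descent with an L1 penalty (toy features and
| data, purely for illustration):

```python
import random

random.seed(0)

def fit_linear_l1(xs, ys, l1=0.01, lr=0.05, epochs=2000):
    """Fit a linear model to target evals `ys` with per-sample
    gradient descent plus an L1 penalty, which pushes unhelpful
    weights toward exactly zero (sparsity)."""
    n = len(xs[0])
    w = [0.0] * n
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i in range(n):
                sign = (w[i] > 0) - (w[i] < 0)  # subgradient of |w|
                w[i] -= lr * (err * x[i] + l1 * sign)
    return w

# Toy data: the target eval depends only on the first feature,
# so the L1 term should drive the second weight to (near) zero.
xs = [[random.uniform(-1, 1), random.uniform(-1, 1)]
      for _ in range(50)]
ys = [3.0 * x[0] for x in xs]
w = fit_linear_l1(xs, ys)
```

| With real data the features would be things like material
| counts and piece-square indicators extracted from each
| position.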
| obrhubr wrote:
| I didn't think of that, but I guess the conclusion about the
| model's performance would have remained the same.
| janalsncm wrote:
| Reading over your article again, it seems like you made the
| same "mistake" as I did. In other words, the evals you're
| seeing in the Lichess PGN aren't the raw outputs of the
| Stockfish evaluation function, they're the output of the
| search, which is a highly nonlinear function of millions of
| evals. If I recall from their docs, it's something like depth
| 18, so your chances of distilling it with 20k parameters are
| essentially zero.
|
| (I put mistake in quotes here because Deepmind showed that
| with a server farm of TPUs it is possible to distill the
| search as well. So it's not _impossible_ per se.)
|
| But that's ok! You've created a super fast evaluation
| function. Instead of trying to distill the depth 18 output,
| it will be much more realistic to distill the depth zero
| output, the NNUE. If you rerun those FENs in Stockfish you
| can pretty quickly create an easier dataset.
| bob1029 wrote:
| > To be exact every participant has a maximum of 1024 tokens at
| his disposal to craft the best chess bot they can.
|
| I'd be quite tempted to try linear genetic programming with a
| variant of a Brainfuck-style UTM for this kind of constraint.
| Assuming 1024 tokens = 1024 bytes = 1024 instructions, I think
| there is _some_ degree of performance that is feasible.
|
| This is really interesting to me because hand-coding BF to do
| something like chess is absolutely beyond me. I wouldn't even
| know where to begin. A LGP-coded BF program would almost
| certainly outperform anything I could design with my own hands.
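| For context, a Brainfuck-style machine is tiny to implement,
| which is part of its appeal as a genetic-programming substrate;
| a minimal interpreter sketch (the step cap guards against the
| non-halting programs that evolved code routinely produces):

```python
def run_bf(code, tape_len=256, max_steps=10000):
    """Run a Brainfuck program (no input support) and return the
    output bytes, stopping after at most `max_steps` steps."""
    tape = [0] * tape_len
    ptr = pc = steps = 0
    out = []
    # Precompute matching bracket positions.
    jumps, stack = {}, []
    for i, c in enumerate(code):
        if c == '[':
            stack.append(i)
        elif c == ']':
            j = stack.pop()
            jumps[i], jumps[j] = j, i
    while pc < len(code) and steps < max_steps:
        c = code[pc]
        if c == '>':
            ptr = (ptr + 1) % tape_len
        elif c == '<':
            ptr = (ptr - 1) % tape_len
        elif c == '+':
            tape[ptr] = (tape[ptr] + 1) % 256
        elif c == '-':
            tape[ptr] = (tape[ptr] - 1) % 256
        elif c == '.':
            out.append(tape[ptr])
        elif c == '[' and tape[ptr] == 0:
            pc = jumps[pc]  # skip the loop body
        elif c == ']' and tape[ptr] != 0:
            pc = jumps[pc]  # jump back to the loop start
        pc += 1
        steps += 1
    return bytes(out)
```

| An LGP setup would mutate and crossover the instruction strings
| and score each program by how its emitted move fares.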
| obrhubr wrote:
| This is highly specific to C#, which was the language imposed
| on all participants. But I agree that some languages might be
| especially well suited to these kinds of tasks, and it would be
| interesting to see which.
| senthilnayagam wrote:
| I built a list of simple games in pygame with code and solver
| generated by AI.
|
| I have implemented 2048, Minesweeper, Sudoku and chess. The
| first three are single-player games, and I've made good
| progress on them.
|
| I still don't understand the UCI interface and haven't thought
| through chess solving yet. I hope to take another try this
| weekend.
|
| https://github.com/muonium-ai/simplegames
| snickerbockers wrote:
| This is something that's been on my bucket list for a while now,
| although I'd do it using "normal" programming instead of DL.
|
| I feel like if you can bruteforce every possible board
| configuration for the next 3 turns and then make the move that
| leads to more desirable outcomes, that ought to be enough to
| thrash most amateurs. Probably not good enough to beat somebody
| who actually understands chess tactics on a deeper level, but I
| expect most non-masters are just "winging it" and making things
| up as they go along, so a machine that can think farther ahead
| than you would win more often than not.
|
| But like I said, this is all just me fantasizing about a project
| I haven't even started yet so my hypothesis could easily be
| wrong.
| immibis wrote:
| I did exactly this when the Laser chess variant was posted on
| Hacker News about a year ago. It does indeed work pretty well.
| Players eventually learn patterns that look good in 3 moves but
| not with a greater depth, though.
|
| It's how real chess engines like Stockfish work, but they are
| highly optimized so they can search extreme depths.
| vunderba wrote:
| > bruteforce every possible board configuration for the next 3
| turns and then make the move that leads to more desirable
| outcomes
|
| I'd recommend familiarizing yourself with the minimax
| algorithm, since that's exactly what it describes: the
| "desirable outcome" is essentially the _tunable_ evaluation
| part of the equation.
|
| https://en.wikipedia.org/wiki/Minimax
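| For reference, depth-limited minimax is only a few lines. A
| generic sketch over an abstract game interface (the `moves`,
| `apply_move` and `evaluate` callables are assumptions, not a
| real chess API):

```python
def minimax(state, depth, maximizing, moves, apply_move, evaluate):
    """Depth-limited minimax. `moves(state)` lists legal moves,
    `apply_move(state, move)` returns the resulting state, and
    `evaluate(state)` scores leaves from the maximizer's view.
    Returns (best score, best move)."""
    legal = moves(state)
    if depth == 0 or not legal:
        return evaluate(state), None
    best_move = None
    if maximizing:
        best = float('-inf')
        for m in legal:
            score, _ = minimax(apply_move(state, m), depth - 1,
                               False, moves, apply_move, evaluate)
            if score > best:
                best, best_move = score, m
    else:
        best = float('inf')
        for m in legal:
            score, _ = minimax(apply_move(state, m), depth - 1,
                               True, moves, apply_move, evaluate)
            if score < best:
                best, best_move = score, m
    return best, best_move
```

| Real engines add alpha-beta pruning on top, which cuts branches
| that provably cannot affect the result and makes the deep
| searches affordable.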
| janalsncm wrote:
| > make the move that leads to more desirable outcomes
|
| This is kind of begging the question, right? How do you know
| what a "desirable" outcome is, other than mate-in-N positions?
| (In your case, N<=3.) The point of an evaluation function is to
| take a position and return a value which describes how
| desirable it is.
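| The classic baseline for such an evaluation function is a plain
| material count; a sketch (the centipawn values are the
| conventional textbook ones, and the piece-string encoding is
| just for illustration):

```python
# Conventional piece values in centipawns; the king has no
# material value since it can never be captured.
PIECE_VALUES = {'p': 100, 'n': 320, 'b': 330, 'r': 500, 'q': 900}

def evaluate_material(pieces):
    """pieces: string of piece letters, uppercase = white,
    lowercase = black. Positive scores favour white."""
    score = 0
    for ch in pieces:
        value = PIECE_VALUES.get(ch.lower(), 0)
        score += value if ch.isupper() else -value
    return score
```

| Stronger hand-written evals then layer on mobility, king
| safety, pawn structure and so on.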
| vikhack wrote:
| I don't really know how to do that yet, but I'm putting in
| the effort.
| skeltoac wrote:
| I especially enjoyed the link to The Bitter Lesson by Rich
| Sutton, which I hadn't read before. Now I wonder what
| "discoveries" have been built into today's AI models and how they
| might come to be detrimental.
|
| http://www.incompleteideas.net/IncIdeas/BitterLesson.html
| PaulHoule wrote:
| Has there ever been a serious effort to play chess by "rule-
| based" methods as opposed to search?
|
| (... other than the evaluation function being based on
| handwritten or learned rules)
| janalsncm wrote:
| In the past, yes. That was essentially the approach up until
| the 1980s. Computers were too slow to run millions of
| evaluations per move, so you got a lot of efforts in a more
| heuristic-driven direction.
|
| Examples of heuristic engines are Hans Berliner's CAPS-II and
| PARADISE by David Wilkins. PARADISE used a database of 200
| rules.
|
| There are also engines that try to retroactively apply rules
| to Stockfish evals but they're fairly confusing in my
| experience. It's like being told the answer and having to
| come up with a reason after the fact.
| janalsncm wrote:
| Probably LLM maximalist ideas that suggest infinite "scaling
| laws" for LLMs (they are not laws), leading to ridiculous
| conclusions like building a $1 trillion cluster is the fastest
| way to AGI. People like Leopold Aschenbrenner are in this camp.
|
| Imagine if LLMs were the only way we had to play chess. You'd
| need a centralized server and peak performance wouldn't even
| best a grandmaster. You spend $1 trillion building a super
| cluster because that's all you know.
|
| < - - This is where AI is today.
|
| And then some startup creates Stockfish, a chess engine better
| than any LLM or grandmaster, and it can run on a smartphone.
| deadbabe wrote:
| Why build a chess engine these days, just use an LLM?
| emptybits wrote:
| Why consume when you can create?
|
| Building is often more fun and almost always more educational
| than just using. Even in "failure". :-)
| ziddoap wrote:
| Fun, a learning experience, practicing a skill set, curiosity,
| etc.
|
| Take your pick.
| achierius wrote:
| LLMs are quite bad at chess actually, even compared to human
| players -- and certainly compared to proper modern chess
| engines
| PaulHoule wrote:
| They usually struggle to always generate valid moves.
|
| And that's pivotal.
|
| If you have a program which always makes valid moves and
| gives up when it has lost you wrote a proper chess playing
| program. It may play badly, but it plays.
| janalsncm wrote:
| Your question sounds flippant but it's actually quite deep. Why
| aren't LLMs good at chess? The answer is likely that to be good
| at chess at some level requires search (and probably some
| ordering constraint on your evaluation function too that I'm
| not smart enough to figure out).
|
| LLMs aren't searching. They are memorizing. This explains their
| extremely poor performance on out of domain positions, whereas
| Stockfish can easily understand them.
| binarymax wrote:
| Doesn't look like failure to me - looks like a fun project that
| disproved your distillation hypothesis!
|
| Great write up and thanks for sharing.
___________________________________________________________________
(page generated 2024-11-19 23:00 UTC)