[HN Gopher] Show HN: Easily train AlphaZero-like agents on any e...
___________________________________________________________________
Show HN: Easily train AlphaZero-like agents on any environment you
want
Author : s-casci
Score : 68 points
Date : 2023-12-20 11:30 UTC (11 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| Y_Y wrote:
| Maybe I'll finally be able to train a worthy opponent for
| Carcassonne!
| s-casci wrote:
| If you do that, please submit a PR!
| vermaat wrote:
 | Noob here. How is this different from reinforcement learning
 | libraries like OpenAI's Gym, TensorFlow's TF-Agents, Meta's
 | ReAgent, DeepMind's OpenSpiel, or Amazon SageMaker RL?
| s-casci wrote:
 | There certainly are other projects around AlphaZero; I'd say
 | this one is simpler and much more basic.
| JoeDaDude wrote:
 | Whoa, so cool!! You know what would be even cooler? If you could
 | have it play any game described by the Game Description Language
 | [1]. It looks like the project is most of the way there, since
 | the environment methods look like calls to data that would be
 | included in a GDL description.
|
| [1]. https://en.wikipedia.org/wiki/Game_Description_Language
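 |
 | As a rough sketch of the idea (the method names below are only a
 | guess at a generic environment interface, and "rules" stands in
 | for a hypothetical parsed GDL description, since no such Python
 | parser is assumed to exist):
 |
 |   class GDLEnvironment:
 |       # Hypothetical adapter: answers the usual environment
 |       # queries by evaluating the relations of a GDL game.
 |       def __init__(self, rules):
 |           self.rules = rules
 |           self.state = rules.initial_state()
 |
 |       def get_legal_actions(self):
 |           # GDL's "legal" relation lists the moves allowed now.
 |           return self.rules.legal_moves(self.state)
 |
 |       def step(self, action):
 |           # GDL's "next" relation defines the state update.
 |           self.state = self.rules.next_state(self.state, action)
 |
 |       def is_terminal(self):
 |           # GDL's "terminal" relation marks end states.
 |           return self.rules.is_terminal(self.state)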
| s-casci wrote:
| Interesting, I didn't know about it... Modifying the existing
| environments' interfaces shouldn't be too difficult. Feel free
| to submit a PR!
| ZiggerZZ wrote:
| Didn't know about this formalism! Are there any Python
| libraries that support GDL?
| JoeDaDude wrote:
 | I learned about GDL several years ago from folks working on
 | General Game Player AIs, that is, AIs that can play any well-
 | described game. They were working primarily in Java at the
 | time. A casual search shows that, shockingly, there is no
 | Python library available for GDL (yet); they are still using
 | Java.
|
| http://www.ggp.org/
|
| https://github.com/ggp-org/
| Entze wrote:
 | It is research code (read: very unpolished), but you can get
 | inspiration from pyggp [1], more specifically the
 | game_description_language module. I implemented pyggp for my
 | master's thesis. It's a proof of concept and will be iterated
 | upon.
|
| [1] github.com/Entze/pyggp
| vldmrs wrote:
 | Are there any sample game descriptions for some games? I have
| checked all the links but couldn't find a single example.
| JoeDaDude wrote:
| At one time there was a Stanford course on the subject of
| General Game Players. The final project was to submit a
| player and see how it performed on a small set of games
| described by GDL. These were variations on Backgammon,
| Othello, and similar, with some rule changes.
|
| While much of the course material survives [1], those
 | rulesets do not. The only GDL example I could find was the
 | somewhat trivial Tic Tac Toe ruleset; see Section 2.6, Tic Tac
 | Toe Game Rules, here [2].
|
| [1]. http://ggp.stanford.edu/public/lessons.php
|
| [2]. http://ggp.stanford.edu/chapters/chapter_02.html
|
 | An email address for Michael Genesereth, the teacher of the
 | course, is on the course website. I might shoot him an e-mail
 | and see if he has GDL files to share.
| ilc wrote:
 | How would this handle games with random or incomplete
 | information, such as UNO, craps, etc.? (I'd love to see what
 | this thing does with a known losing game, just as a validation.)
| s-casci wrote:
 | AlphaZero was designed for perfect-information games. That
 | said, the Monte Carlo Tree Search in the library can be run
 | with any agent that implements a value and a policy function. So,
| while the AlphaZeroAgent in agents.py wouldn't fit the problem
| you are describing, implementing something like Meta's ReBeL
| (https://ai.meta.com/blog/rebel-a-general-game-playing-ai-
| bot...) shouldn't be an impossible task. The Monte Carlo Tree
 | Search algorithm in mcts.py was written to be modular from
 | the start precisely to enable something like this!
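 |
 | For instance, a minimal sketch of the kind of agent interface a
 | modular tree search can consume might look like this (the class
 | and method names here are illustrative, not the library's actual
 | API):
 |
 |   import numpy as np
 |
 |   class UniformRolloutAgent:
 |       # Toy agent exposing the two functions a generic MCTS
 |       # needs: a policy over actions and a scalar value estimate.
 |       def __init__(self, n_actions):
 |           self.n_actions = n_actions
 |
 |       def policy(self, observation):
 |           # Uniform prior over every action; a trained network
 |           # would output learned probabilities instead.
 |           return np.full(self.n_actions, 1.0 / self.n_actions)
 |
 |       def value(self, observation):
 |           # Neutral value estimate; a trained network would
 |           # predict the expected outcome from this state.
 |           return 0.0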
| gwern wrote:
| The standard AlphaZero doesn't handle that. For that you'd need
| to graduate to more complex variants like the aforementioned
| ReBeL, AlphaZe*
| https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10213697/ or
| BetaZero https://arxiv.org/abs/2306.00249 or ExIt-OOS
| https://arxiv.org/abs/1808.10120 or Player of Games
| https://arxiv.org/abs/2112.03178#deepmind .
|
| (You could also move straight to MuZero variations:
| https://arxiv.org/abs/2106.04615#deepmind
| https://openreview.net/forum?id=X6D9bAHhBQ1#deepmind
| https://openreview.net/forum?id=QnzSSoqmAvB )
| mdaniel wrote:
| This repo and the code files appear to be missing any licensing
| details
|
 | You'll also likely want to mention the "needs python >= 3.8"
 | requirement in the readme:
| https://github.com/s-casci/tinyzero/blob/244a263976cd9a09f5f...
| OT1H, I would hope folks are keeping their pythons current, but
| OTOH dev environments are gonna dev environment
| s-casci wrote:
| Good catches, I've added the missing information. Thanks
| tomatovole wrote:
| Do you have evaluations for how well the trained agents do (e.g.
| for chess, go, etc)?
| viraptor wrote:
 | I think this glances over some details here:
|
| > get_legal_actions(): returns a list of legal actions
|
| What's the expectation around your actions? It's not just 0..n
| for current actions with any arbitrary ordering, right? There
| needs to be some consistency between steps for training.
| s-casci wrote:
 | The policy function outputs the probability of taking every
 | possible (legal or illegal) action. Once you have a way of
 | indexing those actions, both the policy and the game just need
 | to refer to the same action when they use the same index.
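 |
 | As a rough sketch (not the library's code), with a fixed global
 | ordering of actions you can mask the policy's output down to the
 | legal ones and renormalize:
 |
 |   import numpy as np
 |
 |   # Fixed global ordering: index i always means the same move,
 |   # e.g. board cell i in tic-tac-toe.
 |   N_ACTIONS = 9
 |
 |   def masked_policy(policy_logits, legal_actions):
 |       # Zero out illegal actions and renormalize, so the agent
 |       # and the environment agree on what each index means.
 |       probs = np.exp(policy_logits - policy_logits.max())
 |       mask = np.zeros(N_ACTIONS)
 |       mask[legal_actions] = 1.0
 |       probs *= mask
 |       return probs / probs.sum()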
___________________________________________________________________
(page generated 2023-12-20 23:01 UTC)