[HN Gopher] Show HN: Easily train AlphaZero-like agents on any e...
       ___________________________________________________________________
        
       Show HN: Easily train AlphaZero-like agents on any environment you
       want
        
       Author : s-casci
       Score  : 68 points
       Date   : 2023-12-20 11:30 UTC (11 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | Y_Y wrote:
       | Maybe I'll finally be able to train a worthy opponent for
       | Carcassonne!
        
         | s-casci wrote:
         | If you do that, please submit a PR!
        
       | vermaat wrote:
        | Noob here. How is this different from reinforcement learning
        | libraries like OpenAI's Gym, TensorFlow's TF-Agents, Meta's
        | ReAgent, DeepMind's OpenSpiel, or Amazon SageMaker RL?
        
         | s-casci wrote:
          | There certainly are other projects around AlphaZero; I'd say
          | this one is simpler and much more basic.
        
       | JoeDaDude wrote:
        | Whoa, so cool!! You know what would be even cooler? If you could
        | have it play any game described by the Game Description Language
        | [1]. It looks like the project is most of the way there, since
        | the environment methods look like calls to data that would be
        | included in a GDL description (a rough sketch of the idea is
        | below).
       | 
       | [1]. https://en.wikipedia.org/wiki/Game_Description_Language
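        | 
        | Something along these lines, purely as a sketch (the method
        | names here are hypothetical, not the repo's actual interface or
        | any real GDL library's API):
        | 
        |     class GDLEnvironment:
        |         # Hypothetical adapter exposing a GDL-described game
        |         # through environment-style methods.
        |         def __init__(self, gdl_game):
        |             self.game = gdl_game
        |             # GDL `init` facts give the starting state
        |             self.state = gdl_game.initial_state()
        | 
        |         def get_legal_actions(self):
        |             # GDL's `legal` relation enumerates the moves
        |             return self.game.legal_moves(self.state)
        | 
        |         def step(self, action):
        |             # GDL's `next` relation gives the successor state
        |             self.state = self.game.next_state(self.state, action)
        | 
        |         def is_terminal(self):
        |             # GDL's `terminal` relation
        |             return self.game.terminal(self.state)
        | 
        |         def reward(self):
        |             # GDL's `goal` relation scores the finished game
        |             return self.game.goal(self.state)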
        
         | s-casci wrote:
         | Interesting, I didn't know about it... Modifying the existing
         | environments' interfaces shouldn't be too difficult. Feel free
         | to submit a PR!
        
         | ZiggerZZ wrote:
         | Didn't know about this formalism! Are there any Python
         | libraries that support GDL?
        
           | JoeDaDude wrote:
            | I learned about GDL several years ago from folks working on
            | General Game Player AIs, that is, AIs that can play any
            | well-described game. They were working primarily in Java at
            | the time. A casual search shows that, shockingly, there is
            | no Python library available for GDL (yet); they are still
            | using Java.
           | 
           | http://www.ggp.org/
           | 
           | https://github.com/ggp-org/
        
           | Entze wrote:
            | It is research code (read: very unpolished), but you can get
            | inspiration from pyggp [1], more specifically from its
            | game_description_language module. I implemented pyggp for my
            | master's thesis. It's a proof of concept and will be
            | iterated upon.
           | 
           | [1] github.com/Entze/pyggp
        
         | vldmrs wrote:
          | Are there any sample game descriptions for actual games? I
          | have checked all the links but couldn't find a single example.
        
           | JoeDaDude wrote:
           | At one time there was a Stanford course on the subject of
           | General Game Players. The final project was to submit a
           | player and see how it performed on a small set of games
           | described by GDL. These were variations on Backgammon,
           | Othello, and similar, with some rule changes.
           | 
            | While much of the course material survives [1], those
            | rulesets do not. The only GDL example I could find was the
            | somewhat trivial one for Tic Tac Toe; see section 2.6, Tic
            | Tac Toe Game Rules, here [2].
           | 
           | [1]. http://ggp.stanford.edu/public/lessons.php
           | 
           | [2]. http://ggp.stanford.edu/chapters/chapter_02.html
           | 
            | An email address for Michael Genesereth, teacher of the
            | course, is on the course website. I might shoot him an email
            | and see if he has GDL files to share.
        
       | ilc wrote:
        | How would this handle games with random or incomplete
        | information, such as UNO, craps, etc.? (I'd love to see what
        | this thing does with a known losing game, just as validation.)
        
         | s-casci wrote:
          | AlphaZero was designed for perfect-information games. That
          | said, the Monte Carlo Tree Search in the library can be run
          | with any agent that implements a value and a policy function.
          | So, while the AlphaZeroAgent in agents.py wouldn't fit the
          | problem you are describing, implementing something like Meta's
          | ReBeL (https://ai.meta.com/blog/rebel-a-general-game-playing-ai-
          | bot...) shouldn't be an impossible task. The Monte Carlo Tree
          | Search algorithm in mcts.py was written to be modular from the
          | start exactly to allow something like this!
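          | 
          | Roughly, an agent only needs to expose those two functions.
          | Something like this (a simplified sketch, not the actual code
          | or signatures in the repo):
          | 
          |     import numpy as np
          | 
          |     class UniformRandomAgent:
          |         # Toy agent: no neural network, just uniform priors
          |         # and a zero value estimate. Illustrative only.
          |         def __init__(self, n_actions):
          |             self.n_actions = n_actions
          | 
          |         def value(self, observation):
          |             # estimated value of the current state, in [-1, 1]
          |             return 0.0
          | 
          |         def policy(self, observation):
          |             # prior probability for every action in the fixed
          |             # action space
          |             return np.full(self.n_actions, 1.0 / self.n_actions)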
        
         | gwern wrote:
         | The standard AlphaZero doesn't handle that. For that you'd need
         | to graduate to more complex variants like the aforementioned
         | ReBeL, AlphaZe*
         | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10213697/ or
         | BetaZero https://arxiv.org/abs/2306.00249 or ExIt-OOS
         | https://arxiv.org/abs/1808.10120 or Player of Games
         | https://arxiv.org/abs/2112.03178#deepmind .
         | 
         | (You could also move straight to MuZero variations:
         | https://arxiv.org/abs/2106.04615#deepmind
         | https://openreview.net/forum?id=X6D9bAHhBQ1#deepmind
         | https://openreview.net/forum?id=QnzSSoqmAvB )
        
       | mdaniel wrote:
        | This repo and the code files appear to be missing any licensing
        | details.
        | 
        | You'll also likely want to mention the "needs python >= 3.8"
        | requirement in the readme:
        | https://github.com/s-casci/tinyzero/blob/244a263976cd9a09f5f...
        | OT1H, I would hope folks are keeping their pythons current, but
        | OTOH dev environments are gonna dev environment.
        
         | s-casci wrote:
          | Good catches; I've added the missing information. Thanks.
        
       | tomatovole wrote:
        | Do you have evaluations for how well the trained agents do
        | (e.g., for chess, Go, etc.)?
        
       | viraptor wrote:
        | I think this glances over some details here:
       | 
       | > get_legal_actions(): returns a list of legal actions
       | 
        | What's the expectation around the actions? It's not just 0..n
        | over the currently legal actions in arbitrary order, right?
        | There needs to be some consistency between steps for training.
        
         | s-casci wrote:
          | The policy function outputs a probability for every possible
          | action, legal or illegal. Once you fix a way of indexing those
          | actions, the policy and the game need to refer to the same
          | action when they use the same index.
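          | 
          | As a rough illustration (not the repo's actual code; the names
          | are made up), with a fixed global action space the game
          | returns indices into that space and the policy output gets
          | masked down to the legal ones:
          | 
          |     import numpy as np
          | 
          |     # Fixed, global indexing: e.g. in tic-tac-toe, action i
          |     # always means "play square i", regardless of whose turn
          |     # it is or what is currently legal.
          |     N_ACTIONS = 9
          | 
          |     def masked_policy(policy_probs, legal_actions):
          |         # policy_probs covers all N_ACTIONS; zero out the
          |         # illegal ones and renormalize
          |         mask = np.zeros(N_ACTIONS)
          |         mask[legal_actions] = 1.0
          |         masked = policy_probs * mask
          |         return masked / masked.sum()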
        
       ___________________________________________________________________
       (page generated 2023-12-20 23:01 UTC)