[HN Gopher] Solving Probabilistic Tic-Tac-Toe
       ___________________________________________________________________
        
       Solving Probabilistic Tic-Tac-Toe
        
       Author : Labo333
       Score  : 63 points
       Date   : 2024-06-11 14:25 UTC (2 days ago)
        
 (HTM) web link (louisabraham.github.io)
 (TXT) w3m dump (louisabraham.github.io)
        
       | H8crilA wrote:
       | Are you sure there is only one solution to the set of max/min
       | equations? I've been trying to solve the probabilistic tic-tac-
       | toe since it was published, and I believe there are actually
       | multiple solutions here.
       | 
       | In fact I'm not sure that given a board there exists a best
       | solution. I.e. I suspect that for most boards and for most fixed
       | strategies you can create a counter-strategy that outperforms the
       | given strategy (but perhaps then loses to some other strategy).
       | 
       | My code looks completely different, but I suspect you can uncover
       | alternate solutions if you mess with the binary search. For
       | example replace `x = (a + b) / 2` with `x = (a + b) / 3`, or even
       | better pick a random point between a and b.
        
         | GistNoesis wrote:
         | The article give the bellman equation that you need to solve.
         | 
         | V(s)=max c scxV'(s+c) + fcxV'(s-c) + ncxV'(s)
         | 
         | V'(s)=min c scxV(s-c) + fcxV(s+c) + ncxV(s)
         | 
         | You have to solve for all states simultaneously.
         | 
         | One way to solve them is by using fixed point iteration. aka
         | successive approximation.
         | 
         | Pick some small alpha and update the value function in a
         | gradient ascent kind of way.
         | 
         | Vi+1(s) = (1-alpha) _Vi(s)+alpha_ ( max c scxV'i(s+c)+
         | fcxV'i(s-c) + ncxV'i(s) )
         | 
         | V'i+1(s) = (1-alpha) _V 'i(s)+ alpha _ ( min c scxVi(s-c) +
         | fcxVi(s+c) + ncxVi(s) )
         | 
         | https://en.wikipedia.org/wiki/Fixed-point_iteration
        
         | salamo wrote:
         | I think there is. In this case we have to clarify that an
         | optimal strategy can still lose due to random chance, so the
         | optimal strategy will only be clear after some large number of
         | games.
        
         | Ruphin wrote:
         | There is no hidden information and turns are sequentially. This
         | means both players have full access to all information, and
         | that implies there must be a single "best" move. If the board
         | is at a certain state and one player is to play next, the
         | strategy of the opponent is irrelevant for determining the
         | optimal move.
         | 
         | Proof:
         | 
         | Suppose that it is possible that different strategies exist
         | that outperform each other. Lets consider the case that there
         | are three "optimal" strategies that counter each other, call
         | them Rock Paper and Scissors. Since these are the three
         | "optimal" strategies, both players know the exact move produced
         | by these strategies. The first important part to note is that
         | there is no cost to changing strategy during the game. If an
         | opponent has one fixed strategy, say Rock, the "optimal"
         | strategy is to change your strategy to Paper. Since you have
         | full information on the state of the board, you do not have to
         | guess the strategy of the opponent, because there is no hidden
         | information. Now you could consider the opponent strategy to be
         | hidden information, but this doesn't hold for the following
         | reason: If knowing the opponent strategy would alter your
         | choice of move (by for example changing to Rock that
         | specifically counters his hidden Scissors), the opponent can
         | freely change to Paper after you made your move, because he can
         | see that you changed to Rock before making his next move. If it
         | was impossible for the opponent to tell that you changed your
         | strategy, that implies that each strategy produces the same
         | board state, and that implies the strategies are identical.
        
           | blharr wrote:
           | Aren't the same conditions true for chess, though there is no
           | proof that a best strategy exists?
           | 
           | What if it's not just rock/paper/scissors, and there are more
           | than 3 strategies?
        
             | sobellian wrote:
             | At least one of the players has a strategy that cannot be
             | countered though. That is, either white can force a win,
             | black can force a win, or both players can force a draw.
             | There is no situation where every strategy can be
             | countered.
        
       | phoronixrly wrote:
       | I love how the other day when the Show HN[1] post was submitted
       | there were a bunch of 'Oh what a great problem to solve with AI',
       | when it was painfully obvious that it is not necessary... Is
       | there astroturfing here or is AI the current hammer that everyone
       | defaults to?
       | 
       | [1] https://news.ycombinator.com/item?id=40635397
        
         | Ukv wrote:
         | Unless there's a comment I'm missing, I don't see anyone
         | recommending deep learning and neural networks. Comments such
         | as "the AI is making really bad plays" mean in the videogame
         | sense (the CPU player), as far as I can tell - which was just a
         | simple heuristic in the original.
        
         | renewiltord wrote:
         | Interesting. When you have a hammer ("people are talking about
         | AI but you don't need AI") everything (including "I'm trying to
         | make a game AI") looks like a nail. You thought they were
         | talking about LLMs, ML, DL? It's a video game AI, dude. If you
         | have trouble understanding English you can use an LLM to
         | translate for you. Just ask it "Are they talking about LLMs
         | here?" after quoting the thing.
        
       | primitivesuave wrote:
       | I implemented an expectiminimax player without the neutral state
       | optimization that you described 2 days ago:
       | https://keshav.is/coding/pt3
       | 
       | It seems like even a naive search of the game tree in the browser
       | produces a fairly strong computer opponent to a human player. It
       | would be interesting to see if this optimization produces a
       | better computer player to standard expectiminimax as I
       | implemented here:
       | https://github.com/keshavsaharia/pt3/blob/master/src/minimax...
        
       | igpay wrote:
       | Hey all, I'm the author of Probabilistic Tic-Tac-Toe. I'm
       | currently working with Louis to integrate this solver into the
       | game itself so that people can play against it. I'm hoping to
       | have it released by EOD and will update this comment when it's
       | ready.
        
         | 317070 wrote:
         | I'm actually interested in being coached by the solver, so it
         | can tell me when I made a suboptimal move and what the optimal
         | move would have been.
         | 
         | Like in chess, playing against stockfish isn't very fun, but
         | having it as a way to get better is fun.
        
       ___________________________________________________________________
       (page generated 2024-06-13 23:00 UTC)