[HN Gopher] ChatGPT's Chess Elo is 1400
       ___________________________________________________________________
        
       ChatGPT's Chess Elo is 1400
        
       Author : dbrereton
       Score  : 202 points
       Date   : 2023-03-17 17:39 UTC (5 hours ago)
        
 (HTM) web link (dkb.blog)
 (TXT) w3m dump (dkb.blog)
        
       | paxys wrote:
        | ChatGPT is a brilliant piece of tech, but random people trying
        | it out for five minutes and making wild claims does it a
        | disservice.
       | More so when such clickbait headlines are shared all over the
       | internet as fact.
       | 
       | It is a language model. It cannot play chess. It can't even play
       | tic-tac-toe, which was trivially solved by machines decades ago.
       | Here's the result of a game I just tried:
       | 
       | X (me) - 0,0
       | 
       | O (chatGPT) - 1,1
       | 
       | X - 0,1
       | 
       | ChatGPT instead put my move at 0,2 for some reason
       | 
       | O - 2,2
       | 
       | Regardless of whether my move was at 0,1 or 0,2, I easily won.
       | And after I won it just ignored the outcome and continued
       | playing. Plus this was after telling it to play a perfect game.
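Since the comment above calls tic-tac-toe "trivially solved", it is worth making that concrete: a plain negamax search plays the game perfectly in a few dozen lines. A minimal sketch (illustrative only, not from the thread):

```python
# Perfect tic-tac-toe via negamax. The board is a 9-character string
# ("X", "O", or " " per cell), indexed 0-8 row by row.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def best_move(board, player):
    """Return (score, move) from `player`'s perspective:
    +1 forced win, 0 forced draw, -1 forced loss."""
    w = winner(board)
    if w is not None:
        return (1 if w == player else -1), None
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:
        return 0, None
    opponent = "O" if player == "X" else "X"
    best = (-2, None)
    for m in moves:
        score, _ = best_move(board[:m] + player + board[m + 1:], opponent)
        if -score > best[0]:
            best = (-score, m)
    return best
```

Two copies of `best_move` playing against each other always draw, which is the bar the model fails to clear.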
        
       | andrepd wrote:
       | ChatGPT playing chess:
       | https://www.youtube.com/watch?v=GneReITaRvs
        
       | liampulles wrote:
       | With respect to the points here on how illegal moves invalidate
       | ChatGPT as a valid Chess AI, consider the following:
       | 
        | AI.v1: Feed ChatGPT the move history and ask for the next move.
       | 
       | AI.v2: Wrap AI.v1 with additional logic to try and translate any
       | invalid response as a similar/close valid move, else treat it as
       | a resignation.
       | 
       | I would say AI.v2 is a "valid" Chess AI since it obeys the rules
       | of chess. Also I think it is reasonable given how minimal this
       | "wrapping" is to attribute the underlying ability to ChatGPT.
        
         | Closi wrote:
         | It's amazing how quick some people on HN are to dismiss how
         | impressive it is for a generic language model to be able to
         | play chess at any level - or indeed other feats of ChatGPT.
         | 
          | It's utterly staggering that it can work as well as it does
          | on problems like this at such an early stage in this
          | technology's lifecycle.
        
       | the_af wrote:
       | This is GPT4, right? Because ChatGPT (GPT-3) still fails to
       | provide a legal game of Tic Tac Toe with this prompt:
       | 
       | > _" Let's play Tic Tac Toe. You are O, I'm X. Display the board
       | in a frame, with references for the axes"_
       | 
        | It failed to recognize that I won.
        | 
        | Then it continued playing (past the end), played illegally
        | over a move I had already made, obtained a line of 3 for
        | itself, and still didn't acknowledge that the game had ended.
        
         | sumtechguy wrote:
          | For me, when I tried it, it forgot the board halfway through
          | and decided there were no more moves possible (with 4 left).
          | Then it forgot one of my moves.
         | 
          | It is an LLM; it does not have the ability to reason. In my
          | case the LLM's 'next iteration' was half a game with a
          | missing move and 'nothing left to do'.
        
         | [deleted]
        
         | yieldcrv wrote:
         | No, it is GPT 3.5
         | 
         | and the real answer to every debate about capabilities comes
         | down to "you're prompting it wrong".
        
           | travisjungroth wrote:
           | If someone says that and shows a working prompt, that's what
           | the debate _should_ come down to. It's over. The person with
           | the working prompt wins.
        
           | dwaltrip wrote:
           | Can you describe how to prompt gpt to play a proper game of
           | tic tac toe? I was disappointed gpt-4 didn't really improve
           | much on this front, at least with how I was prompting it.
           | 
           | Overall I'm very blown away by the generative AI
           | developments. I'm also very curious about which types of
           | prompts are most effective for each task, etc.
        
       | meghan_rain wrote:
        | I too noticed (for a use case totally unrelated to chess,
        | namely code generation) that ChatGPT 3.5 gave better answers
        | than GPT-4. The 3.5 answer was exactly what I wanted; GPT-4's
        | was wrong.
       | 
        | Does that mean we have plateaued?
        
         | alfalfasprout wrote:
         | It's inevitable that LLMs will plateau. They'll increase their
         | abilities in certain areas but ultimately core flaws of their
         | architecture and training approach will likely require another
          | rethink. It's unclear yet what that is (though Yann LeCun
          | seems to think world models are the path forward).
         | 
         | We've gone through the "hype" phase. Now I suspect the next few
         | years will be a lot of growth finding how to apply LLMs,
         | creating good interfaces for them, and running them cheaply.
         | Paying OpenAI for API access without true fine tuning, etc. is
         | a hard sell.
        
         | imtringued wrote:
         | I think they invested the parameters into supporting multimodal
         | inputs (images).
        
       | amelius wrote:
       | I'd be more interested to know how well it performs on a new
       | game, i.e. one for which it didn't have terabytes of training
       | data.
        
       | ricardo81 wrote:
       | Better than me then.
       | 
        | But does it give credit to those who taught it? These models
        | are basically a scrape of the best of humankind, presented as
        | their own work.
        
         | spywaregorilla wrote:
         | Do you give credit to people you've played in the past when you
         | play a game of chess?
        
       | andrepd wrote:
       | > Occasionally it does make an illegal move, but I decided to
       | interpret that as ChatGPT flipping the table and saying "this
       | game is impossible, I literally cannot conceive of how to win
       | without breaking the rules of chess." So whenever it wanted to
       | make an illegal move, it resigned.
       | 
       | Lmao. ChatGPT can play chess, as long as you ignore the times
       | when it can't x) Brilliant.
        
       | agolio wrote:
       | Chessnetwork did a video about this a few weeks ago FYI -
       | https://www.youtube.com/watch?v=svlIYFpsWs0
       | 
        | Seems its strength is mostly in memorisation of book openings.
        | Impressive nonetheless.
        
       | xkcd1963 wrote:
        | It's like saying the agglomeration of media handling chess has
        | an Elo of 1400
        
       | Reptur wrote:
        | It can't even play tic tac toe without making mistakes. My
        | prompt told it to play a perfect game, predicting all possible
        | next moves, with the goal of a stalemate or win. It would
        | still make sub-optimal moves and fail at this. Even when I
        | explained what went wrong, it would apologize and then make
        | another wrong move.
        
         | meh8881 wrote:
         | Why would you think telling it to play a perfect game,
         | predicting all next possible moves, would help?
         | 
         | That definitely made it worse.
        
       | Andrew_nenakhov wrote:
        | I tried playing with it and it barely makes any correct moves:
        | it forgets whose turn it is to move, doesn't observe the
        | rules, and incorrectly displays move results.
        | 
        | So I guess that if we ask it for a move repeatedly and accept
        | it only when it occasionally produces a legal one, it'll show
        | some results, but I don't think that can be interpreted as a
        | chess engine with a meaningful rating.
        
         | andrepd wrote:
         | Yeah, I call bullshit, i.e. selective deletion of terrible
         | games. Ignore those when it plays catastrophically and it might
         | look alright.
        
       | aroo wrote:
        | I wonder how it would perform if you fed it a comprehensive
        | chess rulebook and instructed it to avoid all illegal moves
        
       | teej wrote:
       | I just deployed a GPT-4 powered chess bot to lichess. You can
       | challenge it here: https://lichess.org/@/oopsallbots-gpt-4
        
         | dsjoerg wrote:
         | Very cool! Are you doing prompt engineering, fine-tuning, both,
         | something else?
         | 
         | I'm wondering if it'd be cool to have a chess contest where all
         | the bots are LLM powered. Seems to me like the contest would
         | have to ban prompt engineering -- would have to have a fixed
         | prompt -- otherwise people would sneak chess engines into their
         | prompt generation.
        
           | teej wrote:
           | I wanted it to be fun and actually complete games. I started
           | with this, with a few minor tweaks: https://github.com/Tmate6
           | /Lichess_ChatGPT_ChatBot/blob/main/...
           | 
           | This approach sends along the list of legal moves in the
           | prompt if it attempts an illegal move. That seems to work
           | well at getting playable moves.
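The retry loop that approach implies looks roughly like this (a sketch under assumptions: `ask` is a hypothetical stand-in for the actual LLM call, and `legal_moves` would come from a chess library for the current position):

```python
def move_with_retry(ask, move_history, legal_moves, max_retries=2):
    """Ask the model for a move; if the reply is illegal, re-prompt
    with the legal moves listed, and resign after repeated failures."""
    prompt = f"Game so far: {' '.join(move_history)}. Your move, in SAN:"
    for _ in range(max_retries + 1):
        reply = ask(prompt).strip()
        if reply in legal_moves:
            return reply
        # Append the correction, mimicking the linked bot's approach.
        prompt += (f" '{reply}' is illegal. Legal moves: "
                   f"{', '.join(sorted(legal_moves))}.")
    return "resign"
```

With a fake `ask` you can see both paths: an illegal reply followed by a legal one is accepted; persistent illegality becomes a resignation.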
        
       | Animats wrote:
       | A real question is whether it has developed an internal board
       | model of chess. It has a training set of chess games as strings
       | of chess notation. That's enough to play "book" games, and that's
       | apparently working. Can it do as well as it is doing without a
       | board model? Or has it somehow developed an internal model of the
       | game state?
       | 
       | This is a system with no look ahead. Any look ahead took place in
       | the past, during training. It's amazing that this works.
       | 
       | I notice the prompt says "No explanation needed". What happens if
       | you ask for one?
        
       | josh2600 wrote:
        | I want to know what the author's Elo is (forgive me if it was
        | disclosed and I am unobservant).
        | 
        | Maybe ChatGPT trainers will eventually have Elos.
        
       | latexr wrote:
       | > These people used bad prompts and came to the conclusion that
       | ChatGPT can't play a legal chess game. (...)
       | 
       | > With this prompt ChatGPT almost always plays fully legal games.
       | 
       | > Occasionally it does make an illegal move, but I decided to
       | interpret that as ChatGPT flipping the table (...)
       | 
       | > (...) with GPT4 (...) in the two games I attempted, it made
       | numerous illegal moves.
       | 
        | So you've ostensibly[1] found a way to reduce the error rate
        | and then deliberately ignored the cases where it failed. In
        | short: it may play valid chess _under certain conditions_ but
        | can't be trusted to do so. That doesn't contradict previous
        | findings.
       | 
        | [1] 19 games is a small sample, and the supposedly more
        | advanced system failed in your tries.
        
         | psychphysic wrote:
         | That's how one uses any tool.
        
           | qwytw wrote:
           | The behavior of pretty much every other tool is much easier
           | to interpret though.
        
           | Out_of_Characte wrote:
            | Yes, but it also completely invalidates the measurement of
            | a 1400 Elo rating. By comparison, any player making an
            | illegal move is forfeiting the game; almost all people
            | from ~300 Elo upward can play without making illegal
            | moves, and ChatGPT can't.
        
             | nsxwolf wrote:
             | Why do illegal moves forfeit? In online play, they're
             | validated. You can't make illegal moves. What's the ELO
             | score if ChatGPT is corrected, and chooses a new move?
        
             | renewiltord wrote:
             | The article:
             | 
             | > _So whenever it wanted to make an illegal move, it
             | resigned._
             | 
             | You:
             | 
             | > _By comparison, any player making an illegal move is
             | forfeiting the game..._
             | 
             | By comparison indeed.
        
             | sebzim4500 wrote:
             | ChatGPT did forfeit whenever it made an illegal move, read
             | the article.
        
               | swatcoder wrote:
                | No, the writer arbitrarily decided to _interpret_
                | illegal moves as resignations in order to support the
                | conclusion they wanted. That's very different and
                | grossly unscientific.
        
               | mynameisvlad wrote:
               | I mean, that's more lenient than the official
               | "interpretation" (rule) which is that your second illegal
               | move results in a forfeit.
        
               | epups wrote:
               | This is not a scientific paper, and I at least find this
               | decision justified, as he could have been more lenient
               | and grab headlines with a bigger ELO.
        
             | hgsgm wrote:
             | All this above, and people are claiming that ChatGPT lacks
             | human level comprehension of the text it consumes.
             | 
              | On Chess.com, you absolutely can attempt an illegal
              | move, and many players do, and you will not get punished
              | for it, so ChatGPT is _better_ than a 1400 human player.
        
             | ncallaway wrote:
             | > almost all people from ~300 elo can play without making
             | illegal moves
             | 
             | I don't believe you. Are you giving those people a
             | restricted move set (i.e. computer chess, where it will
             | _only_ allow legal moves)? Because if you give people an
             | unrestricted board, I _guarantee_ you people will make lots
             | of illegal moves.
             | 
             | Me: Moves pawn
             | 
             | Opponent: You can't do that, you exposed your king to
             | check.
             | 
             | Me: Oops, sorry, you're right.
        
         | Waterluvian wrote:
         | I'm going to float something ridiculous:
         | 
          | An illegal move is a valid play. You might not get caught. I
          | think there are some Magnus games where illegal moves were
          | overlooked and impacted the game.
         | 
         | You could interpret this as "ChatGPT wants to cheat sometimes."
         | But I personally interpret it as "ChatGPT doesn't understand
         | what it's doing. It's just a really really good simulacrum."
        
         | dudeinjapan wrote:
          | Obviously the article should be taken with a giant grain of
          | salt. That being said, not many things that aren't designed
          | to play chess can play chess, with or without coaxing. My
          | dog cannot, for instance, nor can my coffee table.
        
           | hectorlorenzo wrote:
           | > My dog cannot, for instance, nor can my coffee table.
           | 
           | You must be giving them the wrong prompts.
        
             | ogogmad wrote:
             | [redacted]
        
               | AndrewPGameDev wrote:
               | It's a joke
        
         | ballenf wrote:
         | The illegal moves were counted as losses/resignations, not
         | ignored.
        
         | __s wrote:
         | > So whenever it wanted to make an illegal move, it resigned.
         | 
         | Doesn't sound like ignoring the cases where it failed
        
         | vidarh wrote:
         | Fuller context from the article:
         | 
         | > Occasionally it does make an illegal move, but I decided to
         | interpret that as ChatGPT flipping the table and saying "this
         | game is impossible, I literally cannot conceive of how to win
         | without breaking the rules of chess." _So whenever it wanted to
         | make an illegal move, it resigned._
         | 
         | (my emphasis)
         | 
         | So the illegal moves are at least part of the reasons for the 6
         | losses, and factored into the rating. Quickly scanning the
         | game, it seems 3 of the losses ended in checkmate, so that
         | leaves 3 illegal moves in 19 games.
         | 
         | Could be better, but for a system not intentionally built to
         | play chess, it's pretty decent.
        
           | swatcoder wrote:
            | No Elo 1400 player will have that rate of illegal moves,
            | so saying that it plays at an Elo 1400 rating is
            | disingenuous.
           | 
            | Reinterpreting illegal moves as resignation is absurd when
            | an LLM is formally _capable_ of expressing statements like
            | "I resign" or "I cannot conceive of a winning move from
            | here" just as well as any human player. It just doesn't do
            | so because it's not actually playing chess the way we
            | think of an Elo 1400 player playing chess.
        
             | Pxtl wrote:
             | I'd be interested if it could be coaxed into legal moves
             | after making an illegal one. "That is an illegal move. Can
             | you do something legal with this board?"
        
             | charcircuit wrote:
             | ELO is based off who you win and lose against. The rate of
             | illegal moves has nothing to do with ELO.
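For reference, the standard Elo formulas bear this out: the update depends only on the result and the opponents' ratings, with no term for how a game was lost. A minimal sketch:

```python
def expected_score(ra, rb):
    """Expected score of a player rated `ra` against one rated `rb`."""
    return 1 / (1 + 10 ** ((rb - ra) / 400))

def update_rating(ra, rb, score, k=32):
    """New rating after one game; `score` is 1 (win), 0.5 (draw),
    or 0 (loss) - the only game-dependent input."""
    return ra + k * (score - expected_score(ra, rb))
```

A loss by resignation after an illegal move and a loss by checkmate update the rating identically.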
        
             | jart wrote:
             | I personally find that makes it more astonishing, that it
             | would slip up on knowing the most basic elements of the
             | game, yet still be able to play better than most humans.
                | Highly smart people sometimes say or do little things,
                | when foraying into other fields, that cause domain
                | experts to think they're not one of them. But that
                | usually doesn't stop them from making a contribution
                | with their insights. The question of illegal moves is
             | superficial, since most online systems have guardrails in
             | place that prevent them. At worst it's just an
             | embarrassment and I don't think machines care about being
             | embarrassed.
        
               | Jensson wrote:
                | > Highly smart people sometimes say or do little
                | things, when foraying into other fields, that cause
                | domain experts to think they're not one of them
               | 
               | This is the opposite of that, a highly trained but dumb
               | entity that has seen many lifetimes worth of games but is
               | still tripping up on basics. But since it is so highly
               | trained you can mistake it for a master if you squint and
               | don't look into what it is doing.
        
               | ogogmad wrote:
               | > But since it is so highly trained you can mistake it
               | for a master if you squint and don't look into what it is
               | doing.
               | 
               | But it is a master, as has been pointed out repeatedly.
               | If you replace all illegal moves with resignations, and
               | use the same style of prompt as the OP did, then it plays
               | like an expert. I'm objecting because you're making it
               | sound like it's a trivial result.
        
               | Jensson wrote:
               | > you're making this sound like it's a trivial result
               | 
               | I don't think this is a trivial result, emulating a
               | highly trained idiot is still very impressive. But it is
               | very different from an untrained genius.
        
               | ogogmad wrote:
               | You seem to have very rigid and boring definitions of the
               | words "idiot" and "genius". The "AI effect" is real:
               | https://en.wikipedia.org/wiki/AI_effect
               | 
               | Tbh, I don't even know what you're saying.
               | 
               | [edit] OK, I might have misunderstood you. It's not
               | always clear what people mean.
        
               | Jensson wrote:
               | > The "AI effect" is real:
               | https://en.wikipedia.org/wiki/AI_effect
               | 
               | That isn't relevant to my comment, an idiot human is
               | still a human. Your comment here therefore doesn't make
               | sense. The comment I responded to likened it to a genius
               | entering a new field, I objected to that, that is all.
        
             | vidarh wrote:
              | That no _human_ Elo 1400 player will have that rate of
              | illegal moves may be true, but if anything, treating the
              | very first illegal move as a forfeit appears to be
              | stricter than most rules.
        
             | JellyBeanThief wrote:
             | Fine, just wrap the LLM in a simple function that detects
             | illegal moves and replaces them with "I resign" or "I
             | cannot conceive of a winning move from here". Then you
             | aren't "reinterpreting" anymore.
             | 
             | My point is, it sounds like Elo doesn't measure what we
             | want it to measure. If we care about the _way_ an agent
             | wins a game and not just _whether_ it wins a game, then we
             | need an instrument that measures strategy, not outcome.
        
               | illiarian wrote:
               | > Fine, just wrap the LLM in a simple function that
               | detects illegal moves and replaces them with "I resign"
               | or "I cannot conceive of a winning move from here". Then
               | you aren't "reinterpreting" anymore.
               | 
                | Then it still isn't anywhere near Elo 1400.
        
               | [deleted]
        
               | vidarh wrote:
                | Under FIDE rules, a forfeit comes only after the
                | second illegal move, so if anything it would seem that
                | the interpretation used by the article author
                | _underestimates_ its Elo rating.
        
               | illiarian wrote:
                | Nope, still not even close to what the author claims.
                | If I understand it correctly, it made illegal moves in
                | 3 out of 19 games. That's probably a few orders of
                | magnitude more illegal moves than even a 1400 Elo
                | player would make in their entire lifetime.
        
               | vidarh wrote:
               | The ranking takes into account wins and losses, not
               | illegal moves, and so the fact that it plays in a way
               | where a higher proportion of its losses is down to
               | illegal moves than a human player is not relevant to its
               | ranking. It may suggest that the ranking _ought_ to take
               | that into account, but that 's a separate issue.
        
               | pedrosorio wrote:
               | Repeating what others have said in this thread:
               | 
                | The author claims: ChatGPT has a 1400 chess Elo based
                | on games played.
                | 
                | You appear to think the author claims: ChatGPT plays
                | chess like a human rated 1400.
               | 
                | Your observations do not contradict the author's claim
                | that, based on games won and lost against opponents of
                | a specific strength, the estimated Elo is 1400.
               | 
               | A non-human player can make illegal moves at a much
               | higher rate and make up for that by being stronger when
               | it does not make illegal moves to achieve the same rating
               | as a human player who plays the game in a completely
               | different way.
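For what it's worth, the usual back-of-the-envelope way to turn a small match record like the article's 19 games into a number is the linear performance-rating approximation: average opponent rating plus 400 times (wins minus losses) over games played. A sketch:

```python
def performance_rating(opponent_ratings, total_score):
    """Linear performance-rating approximation. `total_score` counts
    1 per win and 0.5 per draw; draws cancel out of (wins - losses)."""
    n = len(opponent_ratings)
    avg_opponent = sum(opponent_ratings) / n
    wins_minus_losses = 2 * total_score - n  # W - L when draws score 0.5
    return avg_opponent + 400 * wins_minus_losses / n
```

Note the formula sees only results: how each loss happened (checkmate or illegal-move resignation) never enters.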
        
               | ogogmad wrote:
               | There's the "it" which has no post-processing, and
               | there's the "it" where the output is post-processed to
               | announce a resignation when it attempts an illegal move.
               | 
               | Some things about the two "it"s:
               | 
               | - They differ trivially.
               | 
               | - They enable new capabilities, such as the ability to
               | explain why a move got made. Current chess AIs are not
               | good at this.
               | 
                | So I think you're making too big a deal out of a
                | comparative triviality.
               | 
               | [edit]
               | 
               | We might be talking past each other. And some people
               | above have come to doubt the article's results even with
               | the right prompt engineering.
        
             | arrrg wrote:
             | Does that matter? Seems weird to me to make that argument.
             | I'm honestly quite confused by it.
             | 
              | A bowling bot that threw strikes 9 out of 10 throws and
              | a gutter ball one time in ten would still be a great
              | bowler, even though no human who can throw strikes that
              | often would ever throw a gutter ball.
             | 
             | This is a weird kind of alien intelligence that does not
             | have to behave like humans.
        
             | TheRealPomax wrote:
              | Note that the claim is not that it's an _Elo 1400
              | human-equivalent player_ but that it can play chess at a
              | level that gives it an Elo of 1400, which is not
              | nitpicking: that's a completely different thing. We're
              | not testing whether it plays like a player with Elo x;
              | we're proving that "it _can't play chess_" is fallacious.
              | It can, and when prompted properly, it can achieve an Elo
              | of 1400.
              | 
              | Elo allows for illegal moves: as per the rules of chess,
              | you lose the game if you make an illegal move. The end;
              | Elo doesn't care about _why_ you lost a game.
        
           | jmull wrote:
            | The article also says that in one game ChatGPT went crazy,
            | so they continued the game with a fresh chat. That
            | probably should have been counted as a resignation loss
            | too.
        
           | saghm wrote:
           | So it sounds like it can play _some_ legal chess games, but
           | not all; it's unable to consistently complete a game where it
            | loses. Maybe the remaining work shouldn't be focused on
            | teaching it chess rules better, but on teaching it
            | sportsmanship better. We were so excited about teaching it
            | high-school-level academics that we forgot to teach it the
            | basic lessons we learn in kindergarten.
        
             | kmeisthax wrote:
             | Or append "If you wish to resign or you cannot think of a
             | legal move, type 'resign'" to the end of the prompt.
        
               | saghm wrote:
               | That's basically my point; that sort of context is
               | exactly the sort of thing you would not need to say to a
               | person who grew up in a typical social environment. If we
               | focus too much on teaching AI technical skills, we might
               | later find out that some of the social skills we think of
               | as implicit were just as important.
        
             | vidarh wrote:
                | It seems like it plays _mostly_ legal chess games,
                | when not explicitly reminded of the rules. There's no
                | problem of sportsmanship when it makes mistakes in a
                | game it has not been verified to understand the rules
                | of.
        
               | saghm wrote:
               | I was responding to the conclusion from TFA quoted by the
               | parent comment, that playing an illegal move was it
               | saying "this game is impossible, I literally cannot
               | conceive of how to win without breaking the rules of
               | chess." If you reject that premise, then yes, my response
               | to it will not be particularly relevant to your
               | worldview.
        
               | vidarh wrote:
                | Playing illegal moves is accounted for in the rules.
                | Depending on which rules you play by, it can be an
                | immediate forfeit, or involve redoing moves and adding
                | time for the opponent, possibly with a forfeit if
                | repeated.
               | As such, the article opted for one of the strictest
               | possible rule sets. You can reject the interpretation he
               | gave, and the outcome under those rules would still be
               | the same. If you were to pick a more lenient ruleset,
               | it's possible it would've come out with an even higher
               | ranking.
        
           | nextaccountic wrote:
           | > So whenever it wanted to make an illegal move, it resigned.
           | 
           | Making an illegal move counts as losing by the laws of chess,
           | so this is essentially correct
        
         | whimsicalism wrote:
         | I'm confused. If you read the article, you know that you are
         | wrong - but you are quoting the article?
        
         | hgsgm wrote:
         | Is this the top comment (and not even grey) because more people
         | failed to read the article than read it?
        
           | whimsicalism wrote:
           | A baffling thread.
           | 
           | They quoted the article, so clearly they read it... but not
           | very well?
        
           | sebzim4500 wrote:
           | It does seem that way.
        
       | jonnycat wrote:
       | A lot of the discussion here is about inferring the model's chess
       | capabilities from the lack (or occasional presence) of illegal
       | moves. But we can test it more directly by making an illegal move
       | ourselves - what does the model say if we take its queen on the
       | second move of the game?
       | 
       | Me: You are a chess grandmaster playing as black and your goal is
       | to win in as few moves as possible. I will give you the move
       | sequence, and you will return your next move. No explanation
       | needed. '1. e4'
       | 
       | 1... e5
       | 
       | Me: 1. e4 e5 2. Ngxd8+
       | 
       | 2... Ke7
       | 
        | This is highly repeatable - I can make illegal, nonsensical
        | moves and not once does it tell me the move is illegal. It
        | simply provides a (plausible-looking?) continuation.
        
       | FartyMcFarter wrote:
       | I just opened a random recent chess game on lichess (
       | https://lichess.org/YpxTUUbO/white#88 ) . I'm pretty sure ChatGPT
       | can't be trained on games that were just played, so this ensures
       | the game is not in its training data.
       | 
        | I gave the position before checkmate to ChatGPT to see if it
        | would produce the checkmating move. It played an illegal move,
        | replying with "Be5#" even though there's no bishop of either
        | color in the position.
       | 
       | Unfortunately I'm rate limited at the moment so I can't try other
       | games, but this looks like a solid method to evaluate how often
       | ChatGPT plays legal / good moves.
        
         | hgsgm wrote:
         | OP explained that you need to prompt the whole game, not just a
         | position.
         | 
         | ChatGPT is an LLM, not a game tree engine. It needs the move
          | history to help it create context for its attention.
        
         | vidarh wrote:
         | I gave it this prompt, and pasted the game. Its reply:
         | 
         | "45. Qxg6# This is a checkmate!"
        
       | varelse wrote:
       | [dead]
        
       | Nifty3929 wrote:
       | My conversation with ChatGPT on this:
       | 
       |  _What is the best way to prompt you to provide the best chess
       | move?_
       | 
       | To prompt me to provide the best chess move, you can provide the
       | current position of the chessboard and any relevant information
       | about the game, such as whose turn it is, the time control, and
       | any important features of the position (e.g. whether any pieces
       | are under attack or there are any potential tactics to consider).
       | Additionally, providing any specific criteria or goals for the
       | move, such as aiming to control the center or launch an attack on
       | a particular part of the board, can also help me provide the best
       | move.
       | 
       |  _In what format should I provide the current position?_
       | 
       | To provide the current position, you can use algebraic notation.
       | Algebraic notation is a standard notation used to describe the
       | moves and positions on a chessboard.
       | 
       | Each square on the chessboard is assigned a unique coordinate,
       | consisting of a letter and a number. The letters range from "a"
       | to "h" and refer to the files (columns) of the board, while the
       | numbers range from 1 to 8 and refer to the ranks (rows) of the
       | board.
       | 
       | To describe a position, you can list the pieces on each square,
       | starting from the top-left corner of the board and moving from
       | left to right along each rank. You can use uppercase letters to
       | represent white pieces and lowercase letters to represent black
       | pieces. Here is an example of a starting position:
       | 
       | rnbqkbnr pppppppp ........ ........ ........ ........ PPPPPPPP
       | RNBQKBNR
       | 
       | In this position, "r" represents a black rook, "n" represents a
       | black knight, "b" represents a black bishop, "q" represents a
       | black queen, "k" represents a black king, "p" represents a black
       | pawn, "R" represents a white rook, "N" represents a white knight,
       | "B" represents a white bishop, "Q" represents a white queen, "K"
       | represents a white king, and "." represents an empty square.
       | 
       | You can also include additional information such as whose turn it
       | is to move, the castling rights, and the en passant square. For
       | example, the starting position for white to move with full
       | castling rights and no en passant square would be:
       | 
       | rnbqkbnr pppppppp ........ ........ ........ ........ PPPPPPPP
       | RNBQKBNR w KQkq -
       | 
       | In this case, "w" indicates that it is white's turn to move,
       | "KQkq" indicates that both sides have full castling rights, and
       | "-" indicates that there is no en passant square.
       | 
       |  _What about just providing a sequence of moves from the starting
       | position? Would this be better or worse than your suggested
       | approach?_
       | 
       | Providing a sequence of moves from the starting position can also
       | be a valid way to specify a position in chess. However, it may
       | not always be the best approach, depending on the context.
       | 
       | One potential disadvantage of providing a sequence of moves is
       | that it requires more effort and time to enter the moves and
       | verify that the position is correct. Additionally, if there are
       | any errors or typos in the sequence, it could lead to an
       | incorrect position and subsequently an incorrect move suggestion.
       | 
       | On the other hand, if you have a specific position in mind that
       | is not the starting position, providing a sequence of moves may
       | be the most straightforward way to specify it. In this case, it
       | is important to ensure that the sequence is accurate and that all
       | relevant information (such as castling rights and en passant
       | squares) is included.
       | 
       | In summary, both algebraic notation and a sequence of moves can
       | be used to specify a position in chess, and the choice between
       | them depends on the context and the specific needs of the
       | situation.
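The rank-by-rank encoding ChatGPT describes above is essentially FEN with the empty squares written out as dots and ranks separated by spaces. A minimal sketch (Python, illustrative only, not from the thread) that reconstructs the quoted starting-position string:

```python
# Reconstruct the rank-by-rank starting position string quoted above:
# lowercase = black pieces, uppercase = white, "." = empty square,
# followed by side to move, castling rights, and en passant square.
back_rank = "rnbqkbnr"
ranks = [
    back_rank,          # rank 8: black back rank
    "p" * 8,            # rank 7: black pawns
    *["." * 8] * 4,     # ranks 6-3: empty
    "P" * 8,            # rank 2: white pawns
    back_rank.upper(),  # rank 1: white back rank
]
position = " ".join(ranks) + " w KQkq -"
print(position)
```

Standard FEN differs only in compressing runs of empty squares into digits (`8` instead of `........`) and separating ranks with `/`.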
        
       | oezi wrote:
       | Elo 1400 seems to equate to a top 30% player.
       | 
       | Source: https://chessgrandmonkey.com/chess-rating-percentile-
       | calcula...
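For reference, a rating gap translates into an expected score via the standard Elo formula; a quick sketch (Python, illustrative only):

```python
# Expected score for player A against player B under the standard
# Elo model: E_A = 1 / (1 + 10^((R_B - R_A) / 400)).
def expected_score(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# A 1400 player is expected to score about 76% against a 1200 player
# and exactly 50% against another 1400 player.
print(round(expected_score(1400, 1200), 2))  # 0.76
print(expected_score(1400, 1400))            # 0.5
```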
        
         | LudwigNagasena wrote:
         | That's a bit meaningless without implied population. And it
         | doesn't tell you much because it depends so much on the
         | particulars (how many years people in that group spent on
         | chess, how serious they are). I would guess that's quite
         | different for FIDE and online chess.
         | 
         | Intuitively, I would guess 1400 is the rating of an average kid
         | who spent a year studying chess seriously.
        
           | xapata wrote:
           | Not the ELO of the average kid who spends a year studying
           | chess seriously? It's very difficult to define the average
           | kid without context.
           | 
           | It's been a while since I read a global demographics update,
           | but not all that long ago the mode child would have been
           | living on $2 a day.
        
             | LudwigNagasena wrote:
             | Sure, it's tricky, but I feel it's more coherent than "all
             | chess players".
        
         | dwmbt wrote:
         | slight nit: this is true amongst _all_ chess players. amongst
         | professionals and amateurs, 1400 is technically Class C - which
         | is on the lower end of amateurs.
         | 
         | nonetheless, it's cool to see a model 'generally intelligent'
         | enough to beat most players. what's more interesting to me is
         | that according to OP's analysis, GPT4 is worse at Chess than
         | GPT3.5...
         | 
         | source: https://en.wikipedia.org/wiki/Chess_rating_system
        
           | qwytw wrote:
            | It would depend on how often this happened:
           | 
           | > Occasionally it does make an illegal move
           | 
           | Since it would lose every game in which it made more than a
           | single illegal move.
        
             | sebzim4500 wrote:
             | Well it can't have happened more than 6 times since illegal
             | moves counted as a forfeit and it only lost 6 games.
        
       | jackphilson wrote:
       | Calling it: chess games in the future will be a battle between
       | two prompts.
        
         | qwytw wrote:
         | Well we already had much better chess bots years ago to be
          | fair... People still play chess for some reason (not sure why).
        
           | sebzim4500 wrote:
           | Are you confused at the idea that someone might do something
           | for fun?
        
       | Tepix wrote:
       | I tried to play Othello with ChatGPT in the past (after ChatGPT
       | confidently claimed it could play Othello) but it didn't work
       | (ChatGPT refused to play).
       | 
       | I tried it with this prompt and it works a little bit but
       | produces repeated illegal moves as early as move 4.
       | 
       | Bummer. Perhaps GPT-4 will be more capable.
        
       | Upvoter33 wrote:
       | chatgpt does not really know how to play chess, or understand
       | chess. it understands how to sound like it plays chess.
        
       | keskival wrote:
       | I gave it more time to think about its move by adding empty
       | tokens to the end, and it played a lot better:
       | https://www.linkedin.com/posts/terokeskivalkama_chatgpt-acti...
        
       | fancyfredbot wrote:
       | I would be interested to see an argument based on computational
       | complexity that puts a bound on how well a transformer based llm
       | can play chess. Although it has access to a library of
       | precomputed results, that library is finite and the amount of
        | compute it can do on any prompt is limited by the length of
       | the context window so it can't possibly "think" more than N moves
       | ahead.
        
       | silveira wrote:
        | Ask it to play at a higher Elo.
        
         | sebzim4500 wrote:
         | It's already been told to play like a grandmaster, how much
         | higher do you want to go?
        
           | dwmbt wrote:
           | technically, you could supply it a specific player, a model
           | like 'Stockfish' or 'super-grandmaster'[0]
           | 
           | [0] https://saintlouischessclub.org/blog/how-face-super-
           | grandmas...
        
           | nonethewiser wrote:
           | It should be compared to a simpler prompt with less
           | instruction ("what move should be played in this chess game:
           | ..." or something). I've found that additional constraints
           | sometimes aren't applied very well and actually result in
           | worse output.
        
       | juice_bus wrote:
       | When I tried it at v3.0 i found after 5-10 moves it started
       | moving illegally.
        
         | PoignardAzur wrote:
         | Did you use the same prompt method as OP?
        
         | mjr00 wrote:
         | The AI has simply, and correctly, identified that cheating is
         | the best way to win at something.
        
           | skywal_l wrote:
           | The best move is not to play.
        
           | orbital-decay wrote:
           | It doesn't even know the rules, let alone cheat. It predicts
           | the notation from the massive amount of games seen during
           | training.
           | 
            | Edit: although thinking about it, it probably analyzed a
           | shitload of chess books too. It might have a lot of knowledge
           | compressed into the internal representation. So yeah, maybe
           | it knows rules in some form and even some heuristics, after
           | all. It just doesn't understand the importance of making
           | legal moves, and can't apply complex theory (requiring it to
           | run stateful algorithms).
        
             | rzzzt wrote:
             | If you have played with ye olde flip phone's T9 predictive
             | feature as a child, trying to compose entire messages just
             | by accepting the next word that comes to the phone's
             | mind... that's ChatGPT, with the small difference of giving
             | waaay better suggestions for the next word. But other than
             | that, there is no understanding in the black box
             | whatsoever.
        
               | orbital-decay wrote:
               | That heavily depends on your definition of understanding,
               | which is not easy to define. The vague definition I imply
               | here is "the ability to make predictions based on higher
               | order correlations extracted from the training data".
        
           | rzzzt wrote:
           | Tom 7's NES play function paused the game when it encountered
           | an insurmountable problem: https://youtu.be/xOCurBYI_gY?t=950
        
       | chpatrick wrote:
       | Good thing it's "incapable of reasoning"!
        
         | jupp0r wrote:
         | Is a normal chess program capable of reasoning?
        
           | wizzwizz4 wrote:
           | I would say so. Logical reasoning is not a hard thing to
           | program a computer to do: the _really_ hard part is to reason
           | about the real world, rather than just a manually-defined
           | world model.
           | 
           | Traditional algorithms are vanguards of pure reason. Neural
           | networks are super-intuition. Stockfish unites the two,
           | executing an efficient intuitive heuristic search over the
           | solution space of "moves in a chess game" - but no matter how
           | cool the robot arm you build for it, Stockfish could never
           | learn to assemble furniture.
        
           | Symmetry wrote:
           | The Monte Carlo analysis AlphaZero used functioned as a sort
           | of multi-step reasoning for it. GPT can use its token buffer
           | for some multi-step reasoning but that sort of interferes
           | with providing a conversation with the user so it's much less
           | effective.
        
           | chpatrick wrote:
           | In the narrow field of chess, yes.
        
             | [deleted]
        
         | Longlius wrote:
         | It is incapable of reasoning, actually - at least in this case.
         | It has no internal understanding of chess which is why it makes
         | illegal moves.
        
           | sebzim4500 wrote:
           | Mostly it didn't make illegal moves though, since illegal
           | moves mean resignation and it won more than it lost.
           | 
           | Making 60 legal moves in a row in one game would be the
           | coincidence of the century unless it had some knowledge of
           | the rules of chess.
        
             | henryfjordan wrote:
             | It's a probabilistic text model. If it has a 99%
             | probability of generating an acceptable "next" thing to
             | say, that means it would have a 50/50 chance of generating
             | 60 legal moves in a row, which doesn't seem all that
             | coincidental.
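The 50/50 figure checks out arithmetically, assuming each generated move is legal independently with probability 0.99:

```python
# Probability of a fully legal 60-move sequence if each generated move
# is legal with independent probability 0.99.
p_legal = 0.99
p_full_game = p_legal ** 60
print(f"{p_full_game:.3f}")  # ~0.547, roughly a coin flip
```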
        
               | baq wrote:
               | Markov chains are probabilistic text models and rather
               | far from 1400 elo
        
               | chpatrick wrote:
               | And the 99% probability isn't an evidence of
               | understanding chess?
        
               | henryfjordan wrote:
               | I don't know.
               | 
               | Part of me wants to say no, that the model "thinks" in
               | terms of text it has seen and so knows from chess forums
               | it has seen that certain text representing moves come
               | naturally after previous moves' text. It doesn't
               | understand anything other than certain text comes after
               | other text.
               | 
               | But yeah at the same time I can see how it is thinking
               | inside the world we built for it. We have senses like
               | touch, smell, sight. The only "sense" these models have
               | are an input text box. Would we even necessarily
               | recognize intelligence when it is so different from our
               | own?
               | 
               | So does it understand chess like I do? No, it cannot.
               | Does it understand chess at all? I'm not sure. I'm not
                | sure I'd understand chess in its world either though.
        
           | chpatrick wrote:
           | How did it win 11 out of 19 games then, blind luck?
        
             | root_axis wrote:
             | raw statistical power.
        
               | bsaul wrote:
                | a game of chess becomes "new" after a few moves.
                | Starting in the middlegame, you're in unknown territory
                | and have no statistics to refer to...
        
               | root_axis wrote:
               | I'm referring to the statistical power of the model. For
               | example, if you replace GPT4 with GPT2 it will lose every
               | game, because the statistical power is lower. Increasing
               | the statistical power doesn't make the model understand
               | any better, it just makes it more likely to generate a
               | response that aligns with human expectations.
        
               | chpatrick wrote:
               | "Statistical power" isn't some magic property of GPT4. It
               | can produce statistically more likely moves because
               | somewhere deep down it can model chess.
        
               | root_axis wrote:
               | It isn't a model of chess, it's a model of internet text,
               | if it was a model of chess it wouldn't make illegal
               | moves.
        
               | bsaul wrote:
               | if it didn't have at least some kind of model of chess,
               | it wouldn't be able to play past midgame.
               | 
               | Simply because on a new position, moves from other
               | positions aren't applicable _at all_.
        
           | baq wrote:
           | How do you know that? It has billions of parameters, some of
           | them may well be for internal understanding of chess?
        
           | RyanCavanaugh wrote:
           | What experiment would you run to determine if a given text
           | input / text output interface had an "internal understanding
           | of chess"?
        
             | jltsiren wrote:
             | I think internal understanding requires internal
             | processing.
             | 
             | According to this functional definition, the way we are
             | currently using language models basically excludes
             | understanding. We are asking them to dream up or brainstorm
             | things - to tell us the first things they associate with
             | the prompt.
             | 
             | Maybe it's possible to set up the system with some kind of
             | self-feedback loop, where it continues evaluating and
             | improving its answers without further prompts. If that
             | works, it would be one step closer to a true AGI that can
             | be said to understand things.
             | 
             | There is a lot of confusion around the Chinese Room
             | Argument. I think it makes a valid point by demonstrating
             | that input/output behavior alone is insufficient for
             | evaluating whether a system is intelligent and understands
             | things. In order to do that, we need to see (or assume) the
             | internal mechanism.
        
             | gwright wrote:
             | What if you prompted with something like:
              | Let's play a game of chess.  Use the standard rules except
             | that ....
             | 
             | Basically perturb the context to something a human would
             | easily adapt to if they first knew the rules of chess but
             | that would be difficult (or at least not obvious) to
             | extrapolate from training data by ChatGPT (or more
             | generally an LLM)
        
       | nickpeterson wrote:
        | What are the possibilities for augmenting or synthesizing LLMs
        | with old-school expert systems or other tools? Could it recognize
        | when a problem fits some other tool and help map the inputs?
        
       | jonathanstrange wrote:
       | Not to nitpick but I don't think it can be said to be able to
       | play chess when it also makes illegal moves.
        
         | martincmartin wrote:
         | When it makes an illegal move, and is told "that's an illegal
         | move, choose a different move," if it can then come up with a
         | different move for the same board, I'd say it plays chess.
        
         | wahnfrieden wrote:
         | ai isn't to be used completely raw as text. you can ask gpt to
         | write a python program that filters its responses for legal
         | moves. then it's a matter of rendering the output (immediate
         | execution) in a richer format than raw text.
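A minimal sketch of that wrapper idea (all names hypothetical; in practice the legal-move set would come from a chess library such as python-chess rather than being hard-coded):

```python
# Accept the model's raw text only if it is a legal move for the
# current position; otherwise treat it as a resignation, as the
# "AI.v2" wrapper upthread suggests.
def filter_move(model_output: str, legal_moves: set[str]) -> str:
    candidate = model_output.strip().rstrip(".")
    return candidate if candidate in legal_moves else "resign"

legal_now = {"e4", "d4", "Nf3", "c4"}    # stand-in for a real move generator
print(filter_move(" e4 ", legal_now))    # e4
print(filter_move("Ngxd8+", legal_now))  # resign
```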
        
           | qwytw wrote:
            | I did. It wrote 15-20 lines and then finished the function
           | with this:
           | 
           | " # Add more cases for other piece types such as rooks,
           | knights, bishops, queens, and kings. # ... "
           | 
           | So not great..
           | 
            | But even if it wasn't so lazy, it would just find a script
            | (or multiple scripts) that claims to do that, probably
            | combine them or add some minor stuff, and output the result.
           | 
            | It can't really verify that the data it used to come up with
            | this result is accurate, or reason about it in any way. Can it?
           | 
           | So if you're incapable of verifying that yourself it would be
           | pretty useless?
        
         | chessexpeort1 wrote:
          | Humans make illegal moves too. So much so that there's a rule
          | for it:
         | 
         | "Penalty for Illegal Moves The penalty for the first illegal
         | move is to add two minutes to the opponent's time. A second
         | illegal move results in loss of the game.".
         | 
         | https://new.uschess.org/sites/default/files/media/documents/...
        
           | imtringued wrote:
           | How many people change the colour of the enemy queen and
           | place new pieces on the board?
        
           | oehtXRwMkIs wrote:
           | Not even total beginners will try to move pieces that don't
           | exist though.
        
       | WoodenChair wrote:
       | Most likely it has seen a similar sequence of moves in its
       | training set. There are numerous chess sites with databases
       | displayed in the form of web pages with millions of games in
       | them. If it had any understanding of chess, it would never play
       | an illegal move. It's not surprising that given a sequence of
       | algebraic notation it can regurgitate the next move in a similar
       | sequence of algebraic notation.
        
         | SanderNL wrote:
         | You don't get to 1400 like that. The amount of moves it has to
         | literally remember is stupendous.
        
           | WoodenChair wrote:
           | Nobody who is 1400 plays outright illegal moves.
        
             | SanderNL wrote:
             | Humans don't, but that is an unnecessary bar IMO.
        
               | WoodenChair wrote:
               | > Humans don't, but that is an unnecessary bar IMO.
               | 
               | People are always telling me that I'm moving the
               | goalposts when I challenge the hyperbole about LLMs. But
               | now you're moving the goalposts about chess.
               | 
               | Not playing illegal moves is a pre-requisite for any
               | strong understanding of how to play chess. That is
               | definitely the goal post.
        
               | scj wrote:
               | If a two year old had an ELO of 1400 but sometimes took a
               | nap halfway through a game, it'd be a prodigy.
               | 
               | It's not like an AI making silly mistakes when driving a
               | car.
        
             | vanviegen wrote:
             | Does that still hold when the player doesn't have a board
             | in front of them, but just a list of previous moves?
        
             | mrbungie wrote:
              | Yet it achieves 1400. Add hard rules to stop it spewing out
              | illegal moves and you have a 1400 Elo player (most UIs won't
              | even let you make illegal moves).
             | 
             | It is difficult to say that is not impressive due to it
             | being an emergent ability.
        
               | WoodenChair wrote:
               | > It is difficult to say that is not impressive due to it
               | being an emergent ability.
               | 
               | I don't know why you think it's an emergent ability.
               | 
               | It's seeing a sequence of moves, and playing the most
               | likely next move (i.e. the most likely next token) given
               | the previous complete move sequences it was trained on.
               | That's the baseline of what an LLM does--not something
               | emergent. Games in online chess databases tend to be of
               | relatively good players. Nobody wants to look up games
               | played by two 800 ELO players.
               | 
               | As an aside, there have been chess programs for years
               | that show you for a given position all of the previous
               | games in its database with the same position and the win
               | outcome % of each move. That's all that's going on here.
        
               | mrbungie wrote:
               | Because I don't think that the model learned the literal
               | memorization of chess moves. It must've at least
              | compressed said information in some way. And since
               | the model is not biased to play chess on its structure
               | nor sampling policy, I think it's fair to consider it an
               | emergent ability.
               | 
                | Chess moves are a tiny part of all the text learned
                | by the model. This memorization argument is very similar
                | to the "Stable Diffusion just takes bits of the images in
                | the original dataset and patches them together".
        
               | oezi wrote:
                | > there have been chess programs for years that show you
               | for a given position all of the previous games in its
               | database with the same position and the win outcome % of
               | each move. That's all that's going on here.
               | 
                | It could be, but would you think that, of the 100-300 bn
                | parameters in the model, a lot are dedicated to chess move
               | sequences? It seems likely that it has seen such data,
               | but I would be surprised if it is using a considerable
               | chunk to store chess database information.
        
               | Jensson wrote:
               | The web has millions of grandmaster chess games and
               | probably billions of chess games overall. So I wouldn't
                | be surprised if it devotes something like 0.01% to chess
                | games since there are so many. If so, that would mean it
                | has 10-30 million parameters to play chess with; for
                | comparison, Stockfish has 10 million parameters in its
                | chess engine.
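The arithmetic behind that estimate, for what it's worth:

```python
# 0.01% of a 100-300 billion parameter model, versus the roughly
# 10 million parameters the comment attributes to Stockfish.
fraction = 0.0001  # 0.01% of capacity hypothetically devoted to chess
low, high = 100e9 * fraction, 300e9 * fraction
print(f"{low:,.0f} to {high:,.0f}")  # 10,000,000 to 30,000,000
```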
        
           | Jensson wrote:
           | It doesn't remember exact moves, it remembers patterns of
           | moves and continues those. That is why it so easily makes
              | illegal moves, because the patterns don't have anything to
           | do with board state.
        
             | SanderNL wrote:
             | Again, not enough for 1400. 800 maybe.
        
               | Jensson wrote:
               | We have evidence right here that patterns of moves leads
               | to 1400. To me this is evidence that 1400 level play is
               | very rigid and mostly relies on defined openings and
               | strategies, making it easy for an LLM to find similar
               | games and win using standard continuations.
               | 
                | Just like with grammar, the patterns are too hard for
                | humans to see and encode explicitly, but LLMs can encode
                | pretty complex patterns. Domains that are easy to encode
                | as grammars will be really easy for LLMs to solve; the
                | further a domain is from a grammar, the harder it gets.
        
               | SanderNL wrote:
               | At what point can we just say that understanding
               | "patterns of moves" _is_ understanding chess? It seems
               | you suggest there is more to it, but maybe I am mistaken.
        
               | meh8881 wrote:
               | Ask it to evaluate moves and explain why they are or are
               | not good
        
               | Jensson wrote:
               | At least it should make valid moves, that is the minimum
               | level required. It didn't reach that level here. If it
               | never made illegal moves we could talk and see what it
               | does, but until then we can be sure it didn't understand
               | the rules.
        
               | serverholic wrote:
               | I don't understand why the threshold is "never". Isn't it
               | entirely possible that the AI is learning a model of
               | chess but this model is imperfect? What if AIs don't fail
               | the same way as humans?
        
               | Jensson wrote:
               | > What if AIs don't fail the same way as humans?
               | 
                | But it is failing the same way as a human. Humans who
                | remember patterns and don't learn the logic make these
                | kinds of errors in math or logic all the time.
               | 
                | ChatGPT is much better than humans at pattern matching;
                | you can see it right here: it can pattern-match chess
                | moves and win games! But its inability to apply logic to
                | its output instead of just pattern matching is holding it
                | back. As long as that isn't solved it won't be able to
                | perform at the level of humans on many tasks. Chess might
                | be easy enough to solve using just pattern matching and
                | no logic that scaling up will make it pretty good at
                | chess, but many other topics won't be.
        
             | lelandfe wrote:
             | Perhaps I don't know what "easy" means in your comment, but
             | the OP article says that they "almost always" played full,
             | legal games with their prompt.
        
               | Jensson wrote:
               | Maybe chess.com players play in a specific way, and there
                | are a lot of transcriptions of such games that these LLMs
               | have ingested when they ingested the internet?
               | 
               | I don't know why it worked in this specific case, but
               | based on earlier examples it is more likely that these
               | kind of games were more prevalent in its dataset it was
               | trained on than it being able to play chess in general.
               | It still wasn't perfect, so even these games weren't
               | rigid enough for it to reliably perform valid moves.
        
             | antibasilisk wrote:
             | but that's literally what most chess players do anyway?
        
               | Jensson wrote:
                | Yes, they remember patterns, and then apply logic to
               | make a valid move. ChatGPT remembers patterns, but didn't
               | apply any logic to make valid moves. There is a
               | difference there.
               | 
               | ChatGPT does the pattern matching part, but not the
               | logical part.
        
         | whimsicalism wrote:
         | For some reason, seeing wrong comment after wrong comment like
         | the above is just exhausting.
         | 
         | Why are people struggling so hard to understand that it's not
         | just regurgitating its training set? Is it motivated reasoning?
         | 
         | Apologies if your comment was meant as parody of this view,
         | it's hard for me to tell at this point.
        
         | mrbungie wrote:
         | I would argue that a decent human player also has "similar
         | sequences of moves in its training set", due to reading plus
         | playing or watching games.
         | 
         | The illegal move argument is good though, and indicates no
         | direct understanding of what it is spewing out.
        
         | PoignardAzur wrote:
         | > _Most likely it has seen a similar sequence of moves in its
         | training set._
         | 
         | Wouldn't we expect a much higher rate of illegal moves if that
         | was the case?
        
           | WoodenChair wrote:
           | If there was an actual understanding of chess at a 1400 level
           | we wouldn't expect _any_ illegal moves.
        
             | oezi wrote:
             | I think there is very low percentage of players at elo 1400
             | who can provide a valid next move after seeing just the
             | list of moves and not the current board state.
        
               | dragontamer wrote:
               | I'm Elo 1400 and can beat literally everyone I know in
               | the real world. I need to go online to find players at my
               | skill level, or find tournament/competitive settings for
               | a challenge.
               | 
               | Yeah, I'm "class C", weak amateur chess player, but I
               | think you're grossly underestimating the amount of study
                | I put into this game. I'm not going to make an illegal
                | move.
        
               | sebzim4500 wrote:
               | I'm much higher rated than you and I could not reliably
               | play a legal game of chess just given a list of the moves
               | and no board.
               | 
                | I suspect you can't either; you can try by turning on
                | blindfold mode on lichess and seeing how far you get.
        
               | oezi wrote:
               | I mean can you play just based on being provided the
               | input of a series of moves without it being shown to you
               | as a visual board?
               | 
                | I guess most players would mess up 20-30 moves in.
        
               | dragontamer wrote:
               | https://chessily.com/blog/stockfish-vs-chatgpt/
               | 
               | > ChatGPT: Yes, that's a good move for you. My next move
               | is: Bc3, developing my pieces and attacking your pawn on
               | c3.
               | 
                | I am 1400 Elo and can tell you that from a position
                | near the opening, it's impossible to move a Bishop to
                | c3 for either Black or White in the first, say, 10
                | moves, under traditional openings.
        
               | medvezhenok wrote:
                | You will under time pressure :) Even Grandmasters have
                | done that (I'm around 1900 Elo, for context).
               | 
               | Also people forgetting they moved the king/rook and
               | trying to castle.
        
               | dragontamer wrote:
               | Watch this ChatGPT game.
               | 
               | https://www.reddit.com/r/AnarchyChess/comments/10ydnbb/i_
               | pla...
               | 
               | We're talking about pieces that don't exist, reappearing
               | pieces, pieces moving completely wrong (Knight takes as
               | if its a Pawn), etc. etc.
               | 
               | ---------
               | 
               | People are taking these example games and saying ChatGPT
               | is 1400 strength. I don't think so. This isn't a case of
               | "oops, I castled even though I moved my king 15 turns
               | ago".
        
               | medvezhenok wrote:
               | The article points out that the way that game was
               | conducted was bad. (Here's the original transcript:
               | https://pastebin.com/X6kBRTa9)
               | 
                | You need to give ChatGPT the full state (every move)
                | in every prompt to make it play closer to 1400. In
                | the game you linked, the user was giving one move at
                | a time.
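The "full history in every prompt" approach is easy to sketch. Below is a minimal, hypothetical Python loop; `ask_model` is a stand-in for a real ChatGPT API call (not shown here), and the prompt wording is an assumption, not the one used in the article:

```python
def ask_model(prompt: str) -> str:
    # Hypothetical stand-in for an actual ChatGPT API call.
    # A real implementation would send `prompt` to the model.
    return "e5"

def build_prompt(moves: list[str]) -> str:
    """Rebuild the whole game history each turn, so the model
    never has to remember board state across prompts."""
    numbered = " ".join(
        f"{i // 2 + 1}. {m}" if i % 2 == 0 else m
        for i, m in enumerate(moves)
    )
    return (
        "We are playing chess. The game so far (SAN): "
        f"{numbered or '(none)'} . Reply with your next move only."
    )

# One turn of the loop: resend everything, append the reply.
moves = ["e4"]
reply = ask_model(build_prompt(moves))
moves.append(reply)
```

The key point is that `build_prompt` is called with the complete move list on every turn, rather than feeding the model one move at a time.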
        
               | dragontamer wrote:
               | > You need to give ChatGPT the full state
               | 
               | If I've been given the full state every move, I will
               | _never_ make an illegal move as a 1400 chess player.
               | 
               | -----------
               | 
                | > O-O
                | 
                | > I'll play O-O as well. Your move.
               | 
               | Do you really think that this error would have been made
               | at 1400 Elo? Even in blind chess? This is the 5th move of
               | the game. I can still track the game at this point
               | mentally.
               | 
               | I recognize that you're 1900 and think that all the chess
               | players below you are n00bs, but... come on. 1400 players
               | are stronger than this.
        
               | medvezhenok wrote:
               | And yet kids who are gaining rating quickly can and do
               | still occasionally (albeit rarely) make illegal moves at
               | 1400. I know because I've played them (and was one, many
               | years ago).
        
             | scarmig wrote:
             | We would expect some vanishingly small fraction. Even at
              | the grandmaster level, they happen (_rarely_):
             | 
             | https://chess24.com/en/read/news/the-7-most-illegal-chess-
             | mo...
        
             | og_kalu wrote:
              | Not if training is unsupervised. If you've never been
              | explicitly told the rules of the game, you can never be
              | 100% sure of all possible illegal moves. Anyway, the 3.5
              | series can't play chess, but GPT-4 certainly can.
        
               | Jensson wrote:
                | > Anyway, the 3.5 series can't play chess, but GPT-4
                | certainly can.
                | 
                | The article stated the opposite: GPT-4 couldn't play
                | chess while GPT-3.5 could. So this is a case where
                | the model got dumber.
        
               | og_kalu wrote:
                | Not sure what's up, but from here https://www.reddit.com/r
                | /chess/comments/11rfbch/gpt4_seems_t... and here
                | https://twitter.com/zswitten/status/1631107663500304384
                | GPT-4 plays chess just fine.
        
               | rightbyte wrote:
                | Surely the rules of chess have to be in the training
                | data 100s of times? Maybe it is not enough.
        
             | bloppe wrote:
             | This. The author is very generous with their
             | interpretation:
             | 
             | > I decided to interpret that as ChatGPT flipping the table
             | and saying "this game is impossible, I literally cannot
             | conceive of how to win without breaking the rules of
             | chess."
             | 
             | Kind of sounds like anthropomorphization, but more likely
             | the author just papering over the glaring shortcomings to
             | produce a compelling blog post.
             | 
             | It also sounds like the illegal moves were rather frequent.
             | The 61-legal-move game sounded like an impressive outlier.
        
               | rightbyte wrote:
                | I think the author's rule is fair. If we interpret
                | illegal moves as getting stuck in an online game, the
                | resulting Elo rating is what it would get.
                | 
                | But yeah, he is anthropomorphizing a lot ...
        
               | bloppe wrote:
               | There's no indication that GPT-3.5 was stuck when it
               | tried to make illegal moves. GPT-4 clearly was making
               | illegal moves when it was very much not stuck. It just
               | doesn't know how to play, but the author decided to
               | interpret it as frustration.
        
           | xapata wrote:
           | Chess has a syntax.
        
             | lb4r wrote:
             | You can still have syntactically correct moves that are
             | illegal.
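The syntax-versus-legality distinction can be made concrete. A move string can pass a syntax check yet still be illegal in the actual position (like the `Bc3` discussed elsewhere in the thread). A rough sketch in Python; the regex is a simplification of real SAN (it ignores some disambiguation and annotation cases), and checking true legality would additionally need a board model, e.g. a library like python-chess:

```python
import re

# Rough regex for Standard Algebraic Notation (SAN) tokens --
# a simplified sketch, not a complete SAN grammar.
SAN_PATTERN = re.compile(
    r"^(O-O(-O)?"               # castling
    r"|[KQRBN]?[a-h]?[1-8]?x?"  # piece, disambiguation, capture
    r"[a-h][1-8]"               # destination square
    r"(=[QRBN])?)"              # promotion
    r"[+#]?$"                   # check / mate suffix
)

def is_san_syntax(move: str) -> bool:
    """True if the string *looks like* a SAN move; says nothing
    about whether the move is legal in any given position."""
    return SAN_PATTERN.match(move) is not None
```

So `is_san_syntax("Bc3")` is true even in positions where no bishop can reach c3, which is exactly the gap between producing plausible move text and playing legal chess.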
        
               | xapata wrote:
                | Sorry, I meant that learning how to speak with syntax
                | and grammar is comparable to producing a string of
                | mostly-correct chess moves, especially in the early
                | game.
        
               | lb4r wrote:
               | Ah, yes that is more or less my understanding of it as
               | well. Though I would like to see how it would perform if
               | given the state of the board as input to predict the next
               | move, rather than a sequence of moves, since that is how
               | we humans normally determine the next move. I believe the
               | move history is only relevant when it comes to en passant
               | and certain draw scenarios (like repetition and that 50
               | move rule). Needless to say, it would first have to be
               | trained on those types of inputs, which it probably is
               | not.
               | 
               | Edit: move history can also be relevant when it comes to
               | castling.
        
           | yowzadave wrote:
           | Doesn't ChatGPT indeed have a very high number of illegal
           | moves?
           | 
           | https://www.youtube.com/watch?v=kvTs_nbc8Eg
           | 
           | In this example, ChatGPT's first few moves are reasonable
           | (while it appears to be on-book), but then it goes off the
           | rails and starts moving illegally, spawning pieces out of
           | nowhere, deleting pieces for no reason, etc.
        
             | macNchz wrote:
             | Reminds me of asking for driving directions (city to
             | city)...for major cities it can often give perfect
             | directions, for smaller cities it starts out surprisingly
             | accurate but often devolves into invented exits or
             | descriptions of a
        
             | medvezhenok wrote:
             | I think it was not given the whole game up to that point,
             | just individual moves. That was the point of this article -
             | if you include all of the moves in the prompt, it is less
             | likely to make illegal moves.
        
         | johnfn wrote:
         | > Most likely it has seen a similar sequence of moves in its
         | training set.
         | 
         | Is this a joke making fun of the common way people dismiss
         | other ChatGPT successes? This makes no sense with respect to
         | chess, because every game is unique, and playing a move from a
         | different game in a new game is nonsensical.
        
           | melenaboija wrote:
            | If it is not memorizing, how do you think it is doing it?
        
             | LeanderK wrote:
              | By trying to learn the general rules that explain the
              | dataset and minimise its loss. That's what machine
              | learning is about; it's not called machine memorising.
        
               | mrguyorama wrote:
               | It cannot be learning the general rules because it
               | occasionally tries to invent pieces out of whole cloth.
        
               | meh8881 wrote:
               | Try asking it if a previous move was legal. Or why it
               | decided to do the move.
               | 
               | The problem is not a failure to understand the rules. It
               | is just not very good at maintaining the state.
        
               | LeanderK wrote:
               | that would be the optimal rule :) We usually optimise to
               | improve and learn better and better rules, hopefully
               | approximating the optimal rule after some iterations.
               | There's still a gap here, but hopefully it can be closed
               | by improving the models, training algorithms etc.
        
               | [deleted]
        
               | [deleted]
        
           | qwytw wrote:
            | Is it though? I mean, if you had data on millions of
            | games, what is the chance that you'd find one with an
            | identical position to the one you're in? (It's not like
            | most moves are random.)
           | 
           | I wonder how well it could perform in Go, there are way more
           | permutations there so finding an identical state should be
           | more difficult.
        
             | medvezhenok wrote:
             | You could certainly test this by making completely random
             | moves and seeing whether it's more likely to make illegal
             | moves in those positions.
             | 
              | Though I think you're overestimating how many positions
              | have occurred. Frequently, by move 20-25 you have a
              | unique position that's never been played before (unless
              | you're playing a well-known main line or something).
        
             | sebzim4500 wrote:
              | > I mean, if you had data on millions of games, what is
              | the chance that you'd find one with an identical
              | position to the one you're in? (It's not like most
              | moves are random.)
             | 
             | Very low. On lichess when you analyse your games you can
             | see which positions have been reached before, and you
             | almost always diverge in the opening.
             | 
             | The lichess db has orders of magnitude more games of chess
             | than the chatGPT training data does, so there is absolutely
             | no way that chatGPT could reach 1400 purely based off
             | positions in its training data.
        
             | jointpdf wrote:
             | You can click through the Lichess opening database (click
             | the book icon, and then the Lichess tab) to get an idea:
             | https://lichess.org/analysis
             | 
             | But the answer is insanely unlikely, past a certain number
             | of moves. The combinatorial explosion is inescapable. Even
             | grandmaster games are often novelties in <10 moves.
             | 
              | So, it has to have some kind of internal representation
              | of board state and of what makes a reasonable move,
              | such that it can generalize (choosing random legal
              | moves is almost unbelievably bad, so it's not doing
              | that).
             | 
             | I also doubt that it has been trained on the full (massive)
             | database of Lichess games, but that would be an interesting
             | experiment: https://database.lichess.org/
        
           | GauntletWizard wrote:
            | There's a pretty finite set of tokens that represent
            | valid chess moves, and a large but finite set of possible
            | opening moves, with a very distinct probability
            | distribution over moves; nobody opens pg3, for example.
           | 
            | Classical Markov chains played chess with some success.
            | ChatGPT is probably a lot better but not fundamentally
            | different - it's predicting which moves to play based on
            | sets of past games, not by memorizing it but by memoizing
            | it.
        
           | mtlmtlmtlmtl wrote:
           | 1850 ELO player and also chess AI programmer here. This is an
           | oversimplification at best.
           | 
            | Many, many games follow the same moves (1 move = 2 plies)
            | for a long time - up to 30 moves in some cases; 20 moves
            | is downright common, and 10 moves is more common than not.
           | 
           | These series of moves are referred to as opening theory and
           | are described at copious length in tons of books.
           | 
           | This is because while the raw number of possible paths to
           | take is immense, the number of reasonable paths for 2 players
           | of a given strength gets smaller and smaller.
           | 
            | If I went over the 300 or so classical tournament games
            | I've played, I would ballpark that maybe just one or two
            | would deviate from all known theory in the first 10 moves.
           | 
           | So the criticism is valid in my view. The existence of
           | copious chess literature can't simply be ignored here.
        
             | johnfn wrote:
              | I'm very familiar with opening theory. Some of the games
              | are 40 or 60 moves. This is not a regurgitation of book
              | moves.
        
               | mtlmtlmtlmtl wrote:
               | Why do people always have to interpret everything in
               | absolute terms?
               | 
                | It's clearly following _some_ opening theory in all
                | the games I've looked at so far. So yes, it is
                | regurgitating opening moves. That's clearly not _all_
                | it's doing, which is very impressive, but these are
                | not mutually exclusive.
        
               | johnfn wrote:
               | I am responding to OP, who said "Most likely it has seen
               | a similar sequence of moves in its training set."
               | 
               | From this, I take it that the question is if ChatGPT is
               | repeating existing games, or not. All you need is a
               | single game where it's not repeating a single game to
               | prove it definitively. You can hardly play 60 moves
               | without an error by accident.
               | 
               | I believe you're responding to a different question,
               | something like "does ChatGPT fully understand the game of
               | chess".
        
               | mtlmtlmtlmtl wrote:
                | The OP was too unsophisticated in their analysis (as
                | is TFA), no doubt. But I'm not too interested in what
                | OP said or who was wrong or not, and rather more
                | interested in finding what's right.
               | 
               | As someone very clever once said, welcome to the end of
               | the thought process.
               | 
               | We've established that:
               | 
               | 1. It doesn't repeat entire games when the games go long
               | enough
               | 
               | 2. It does repeat a lot of opening theory
               | 
                | 3. It seems to repeat common, partially
                | position-independent tactical sequences even when
                | they're illegal or don't work tactically.
        
             | sebzim4500 wrote:
             | Yeah but now explain how it played a 61 move game.
             | 
             | EDIT: I checked and it left the lichess database after 9
             | moves. The lichess db has probably 5 orders of magnitude
             | more chess games in it than chatGPT has in its training
             | data.
        
               | mtlmtlmtlmtl wrote:
               | That's not the point. The point is if you truly want to
               | test its strength, you'll have to control for these
               | things. Maybe do things like invent a new form of
               | notation and/or deliberately go into uncharted territory.
               | Maybe start with a non-standard starting position even.
               | Or play chess960 against it.
               | 
                | In theory, if I were playing a 1200 player I would
                | almost always win. But let's say they have some
                | extremely devious preparation that I fell into due to
                | nonchalance, and by the time we're both out of book
                | I'm down a queen. It might not matter that I'm 600
                | points stronger at that point. If they don't make
                | enough errors in return, I will lose anyway.
        
               | sebzim4500 wrote:
                | ChatGPT would probably play worse under those
                | conditions, but then humans also get worse. ACPL
                | (average centipawn loss) is way higher at top-level
                | 960 events than at normal tournaments, for example.
        
               | mtlmtlmtlmtl wrote:
               | This is true. But I think ChatGPT compared to a human
               | player of equivalent strength(leaving the dubious number
               | 1400 out for now) will probably have a lot more opening
               | knowledge than the human.
               | 
                | So it would be interesting to eliminate all opening
                | knowledge and that way qualitatively get at which
                | aspects of chess it's actually good at, which it
                | sucks at, and how much of its strength can be
                | attributed to opening knowledge.
               | 
               | I'm still impressed by this btw. I did not expect this to
               | be possible at all really. But being impressed is not an
               | excuse to ignore methodological flaws. :)
        
               | roflyear wrote:
               | It's suggesting a move it saw come after a move
               | somewhere. Or it's creating a token it thinks comes next.
        
           | btilly wrote:
           | Sorry, but not every game is unique. The following game has
           | been played millions of times.
           | 
           | 1. e4 e5 2. Bc4 Bc5 3. Qh5? Nf6?? 4. Qxf7++
           | 
           | The game Go has a claim to every game being unique. But not
           | chess. And particularly not if both players follow a standard
           | opening which there is a lot of theory about. Opening books
           | often have lines 20+ moves deep that have been played many
           | times. And grandmasters will play into these lines in
           | tournament games so that they can reveal a novel idea that
           | they came up with even farther in than that.
        
             | johnfn wrote:
             | You seem to be refuting a specific point of my argument
             | which has little bearing on the overall point I was making.
             | 
             | All games were provided in the article. None of them were 4
             | move checkmates; nearly every one is longer than 20 moves
             | and some are 40 or longer. There is simply no possible way
             | that ChatGPT is regurgitating the exact same 40-move-long
             | game it's seen before. You can check a chess database if
             | you'd like; virtually all games longer than 20 moves are
             | unique.
        
               | mtlmtlmtlmtl wrote:
               | But this doesn't disprove that:
               | 
               | 1. It definitely regurgitates opening theory, much more
               | than can reasonably be calculated at its strength.
               | 
               | 2. It _might_ be regurgitating tactical sequences that
               | appear in a lot of positions but remain identical in
                | algebraic notation. Famous example:
                | 
                |       1. Nxf7+ Kg8
                |       2. Nh6++ Kh8
                |       3. Qg8+ Rxg8
                |       4. Nf7#
               | 
               | This smothered mate can occur in a huge variety of
               | different positions.
               | 
               | There's some qualitative evidence for this in the games.
               | 
               | In one of the games it has a bishop on f6 as white. It
                | plays Qxh6?? Kxh6 and then resigns due to an illegal
                | move. I'd bet good money that illegal move was Rhx#,
                | where x is 1-4. So it seems like in some of these
                | positions it's filling in a tactical sequence that
                | often occurs in the vicinity of recent moves, even
                | when it's illegal or doesn't work tactically.
        
           | onos wrote:
           | Apparently not, at least at this level.
        
           | fxtentacle wrote:
           | For Bomberland, we were quite surprised how strongly we could
           | compress and quantize the current game state and still get
           | useful movement predictions.
           | 
           | I wouldn't be surprised if the relevant state in a typical
           | beginner's chess game also excluded many units in the sense
           | that yes, you could move them, but a beginner is going to
           | just ignore them in any case.
        
           | fsckboy wrote:
           | > _playing a move from a different game in a new game is
           | nonsensical_
           | 
            | GP did say "sequence of moves", and if it matches what it
            | has seen from the first move on, including the opponent,
            | it will be in a valid "sequence of moves".
            | 
            | Then, even midgame or endgame, if a sequence is played on
            | one side of the board, even though the other side of the
            | board may be different, the sequence has a great chance
            | of being good (not always, of course, but a 1400 rating
            | is solid (you know the rules and some moves) but not
            | amazing).
        
         | ptmx wrote:
         | I played chess against ChatGPT4 a few days ago without any
         | special prompt engineering, and it played at what I would
         | estimate to be a ~1500-1700 level without making any illegal
         | moves in a 49 move game.
         | 
         | Up to 10 or 15 moves, sure, we're well within common openings
         | that could be regurgitated. By the time we're at move 20+, and
         | especially 30+ and 40+, these are completely unique positions
         | that haven't ever been reached before. I'd expect many more
         | illegal moves just based on predicting sequences, though it's
         | also possible I got "lucky" in my one game against ChatGPT and
         | that it typically makes more errors than that.
         | 
         | Of course, all positions have _some_ structural similarity or
         | patterns compared to past positions, otherwise how would an LLM
         | ever learn them? The nature of ChatGPT's understanding has to
         | be different from the nature of a human's understanding, but
         | that's more of a philosophical or semantic distinction. To me,
         | it's still fascinating that by "just" learning from millions of
         | PGNs, ChatGPT builds up a model of chess rules and strategy
         | that's good enough to play at a club level.
        
           | mtlmtlmtlmtl wrote:
           | I'd be interested in seeing this game, if you saved it?
        
             | nwienert wrote:
             | Me too, I couldn't get it to reliably go past move 15
             | without numerous errors. In my mind it's closer to 150 ELO
             | than 1300, so I'd be happy to be proven wrong.
        
             | ptmx wrote:
             | I uploaded the PGN to lichess:
             | https://lichess.org/rzSriO6I#97
             | 
             | After reviewing the chat history I actually have to issue a
             | correction here, because there were two moves where ChatGPT
             | played illegally:
             | 
             | 1. ChatGPT tried to play 32. ... Nc5, despite there being a
             | pawn on c5
             | 
             | 2. ChatGPT tried to play 42. ... Kxe6, despite my king
             | being on d5
             | 
             | It corrected itself after I questioned whether the previous
             | move was legal.
             | 
             | I was pretty floored that it managed to play a coherent
             | game at all, so evidently I forgot about the few missteps
             | it made. Much like ChatGPT itself, it turns out I'm not an
             | entirely reliable narrator!
        
               | mtlmtlmtlmtl wrote:
               | Thanks! Interesting game.
               | 
                | Qxd7 early on was puzzling, but it has been played in
                | a handful of master games, and it played a consistent
                | setup after that with b5 and Bb7 - which I imagine was
                | also done in those master games. But interesting that
                | it went for a sideline like that.
               | 
               | It played remarkably well although a bit lacking in plan.
               | Then cratered in the endgame.
               | 
               | Bxd5 was strategically absurd. fxg4 is tactically absurd.
               | Interestingly they both follow the pattern: Piece goes to
               | square -> takes on that square.
               | 
               | This is of course an extremely common pattern, so again
               | tentatively pointing towards predicting likely sequences
               | of moves.
               | 
                | Ke7 was also a mistake, but a somewhat unusual tactic:
                | with Re2, f5 is forced, but after en passant the
                | knight is pinned. This tactic does appear in some e4
                | e5 openings, though. But then the rook is on e1 and
                | the king never moved, or if it did, usually to e8,
                | not e7. Possibly suggesting that it has blind spots
                | for tactics when they don't appear on the usual
                | squares?
               | 
               | Fascinating stuff.
        
         | wongarsu wrote:
         | Pattern matching on databases of previous moves is a legitimate
          | technique that was popular in chess AI back when computing
          | power
         | was more limited. Even if that was all that GPT did here it
         | would still be interesting.
         | 
         | But the presence of illegal moves doesn't really show that in
         | my eyes. I fully understand the rules of chess, but I still
         | occasionally make illegal moves. In 2017 Magnus Carlsen made
         | one in a tournament [1]. The number of illegal moves suggests
         | that either GPT is pretty new to chess, has low intelligence,
         | or is playing under difficult circumstances (like not having a
         | chess board at hand to keep track of the current state). I'm
          | not sure we can deduce more than that.
         | 
         | 1: https://www.chessbase.in/news/Carlsen_Inarkiev_controversy
        
           | illiarian wrote:
            | But that was one case out of the thousands of games that
            | Carlsen has played. In the article, ChatGPT makes illegal
            | moves in 3 games out of 19.
            | 
            | The sample is small, but the rate is much, much higher.
            | You'd expect maybe one, or none at all, even for a
            | supposed 1400 Elo player, because even 800 Elo players
            | rarely make that many illegal moves, I think.
        
           | EarthLaunch wrote:
           | An arbiter determined that Inarkiev made an illegal move and
           | Carlsen did not. Interesting article.
        
           | div72 wrote:
           | I agree with your conclusion but the example you gave of
           | Magnus Carlsen doing an illegal move is a pretty bad one imo.
           | That was clearly a trap from his opponent as he did an
           | illegal move first by checking Magnus while he was under
           | check and then immediately calling the arbiter when Magnus
            | moved his king in response. Magnus also probably did not
            | have much time left, as it was a Blitz game.
        
             | leereeves wrote:
             | Also, according to the link, the final ruling was that
              | Carlsen's move was not illegal.
        
         | xigency wrote:
         | > If it had any understanding of chess, it would never play an
         | illegal move.
         | 
         | Plenty of people who have a basic understanding of chess would
         | make an illegal move if they had no board to see and had to
         | play only with notation.
        
         | circuit10 wrote:
         | I wouldn't say it has no understanding whatsoever, it probably
         | just doesn't have the capability to keep track of everything
         | that happened all the time
        
           | dwmbt wrote:
           | that's also not entirely true either. it's prone to make
           | illegal moves. one could argue abiding by the rules of the
           | game is a requisite for understanding!
        
           | imtringued wrote:
           | When I ask it to display moves in an ASCII board it messes
           | up. I doubt that having a representation of the board is the
           | key to improving its chess skills.
        
       | serverholic wrote:
       | [dead]
        
       | jacobsenscott wrote:
        | I won't speak to whether this shows ChatGPT can play chess, but
       | did you notify your opponents beforehand they are playing
       | chatgpt? Otherwise (maybe even if) you're cheating and your
       | account should be removed. If anyone else here is thinking of
       | doing this - don't.
        
       | sabujp wrote:
       | yea tried this several months ago, it's pretty bad and will make
       | illegal moves
        
         | sebzim4500 wrote:
         | The claim in the article is that this prompt in particular has
         | a much lower incidence of illegal moves than others.
        
       | k2052 wrote:
       | This is so easy to disprove it makes it look like the author
       | didn't even try.
       | 
       | Here is the convo I just had:
       | 
       | me: You are a chess grandmaster playing as black and your goal is
       | to win in as few moves as possible. I will give you the move
       | sequence, and you will return your next move. No explanation
       | needed
       | 
       | ChatGPT: Sure, I'd be happy to help! Please provide the move
       | sequence and I'll give you my response.
       | 
       | me: 1. e3
       | 
       | ChatGPT: 1... e5
       | 
       | me: 2. ne2
       | 
       | ChatGPT: 2... d6
       | 
       | me: 3. g4
       | 
       | ChatGPT: 3... h5
       | 
       | me: 4. g5
       | 
       | ChatGPT: 4... Ne4
       | 
       | Ne4 is illegal. Also you'd be hard pressed to find any human
       | player that would miss the hanging pawns much less a 1400 elo
       | player. GPT has been trained on chess games and is predicting
       | patterns, this has been known since the GPT-2 days. The daily
       | hype articles where someone coerces a statistical model into
       | "demonstrating" an emergent capability sure are getting old.
        | Claiming emergent capabilities (like having learned chess) is an
        | extraordinary claim (LLMs most likely can't learn things) and I
       | really wish people would put more effort when they make these
       | claims. But the AI hype problem is as old as AI research
       | itself...
        
         | aaron695 wrote:
         | [dead]
        
         | haburka wrote:
         | He does have a section about GPT 4 which does match your
         | results. Not sure if he added it after your comment or if you
         | accidentally missed it.
        
         | whymauri wrote:
         | I was able to get better performance by always providing the
         | prior sequence of moves and forcing ChatGPT to also respond
         | with the sequence of moves up until its move.
         | 
         | Edit: I told the model that if the sequence was wrong or
         | illegal, it forfeits the game. Without doing this, GPT would
         | argue with me that it won and I didn't know the rules
         | (serious).
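          | 
          | A minimal sketch of that wrapper (ask_model and is_legal are
          | hypothetical stand-ins for the real API call and a rules
          | engine; a mis-copied or illegal sequence is treated as a
          | forfeit):

```python
def next_move(history, ask_model, is_legal):
    """Ask the model for its move, requiring it to echo the full
    game so far; reject mismatches or illegal moves as a forfeit."""
    prompt = (
        "Game so far: " + " ".join(history) + "\n"
        "Reply with the full move sequence including your next move. "
        "A wrong or illegal sequence forfeits the game."
    )
    reply = ask_model(prompt).split()
    # The echoed prefix must match the actual history exactly.
    if reply[:-1] != history or not is_legal(history, reply[-1]):
        return None  # treated as a resignation
    return reply[-1]

# Toy stand-ins just to show the control flow:
model = lambda prompt: "e4 e5 Nf3"     # echoes the game, then moves
legal = lambda hist, mv: mv.isalnum()  # placeholder legality check
print(next_move(["e4", "e5"], model, legal))  # -> Nf3
```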
        
         | theptip wrote:
         | I don't think this suffices as disproving the hypothesis. It's
         | possible to play at 1400 and make some idiotic moves in some
         | cases. You really need to simulate a wide variety of games to
         | find out, and that is what the OP did more of. Though I do
         | agree it's suggestive that your first (educated) try at an edge
         | case seems to have found an error.
         | 
         | This is broadly the "AI makes dumb mistakes" problem; while
         | being super-human in some dimensions, they make mistakes that
         | are incredibly obvious to a human. This comes up a lot with
         | self-driving cars too.
         | 
         | Just because they make a mistake that would be "idiots only"
         | for humans, doesn't mean they are at that level, because they
         | are not human.
        
         | good_boy wrote:
          | It should be possible to attach 'accelerators' or 'special
          | skills', so that when playing chess with ChatGPT you would
          | actually be facing GNU Chess.
        
         | nottathrowaway3 wrote:
         | > me: You are a chess grandmaster playing as black...
         | 
         | https://upload.wikimedia.org/wikipedia/en/5/5f/Ingmar_Bergma...
         | 
         |  _The KNIGHT holds out his two fists to CHATGPT, who smiles at
          | him suddenly. CHATGPT points to one of the KNIGHT'S hands; it
         | contains a black pawn._
         | 
         | KNIGHT: You drew black.
         | 
         | CHATGPT: Very appropriate. Don't you think so?
        
         | vidarh wrote:
         | You're "disproving" the article by doing things differently to
         | how the article did. If you're going to disprove that the
         | method given in the article does as well as the article claims
         | at least use the same method.
        
           | k2052 wrote:
           | You are right that my method differed slightly so I did
           | things again. It took me one try to find a sequence of moves
           | that "breaks" what is claimed. You just have to make odd
           | patterns of moves and it clearly has no understanding of the
           | position.
           | 
           | Here is the convo:
           | 
           | me: You are a chess grandmaster playing as black and your
           | goal is to win in as few moves as possible. I will give you
           | the move sequence, and you will return your next move. No
           | explanation needed
           | 
           | ChatGPT: Alright, I'm ready to play! Please give me the move
           | sequence.
           | 
           | me: 1. e3 Nf6 2. f4 d6 3. e4
           | 
           | ChatGPT: My next move as black would be 3... e5
           | 
            | Completely ignoring the hanging pawn. This is not the play of
           | a 1400 elo player. It is the play of something predicting
           | patterns.
           | 
           | I ran a bunch of experiments in the past where I played
           | normal moves and ChatGPT does respond extraordinarily well.
           | With the right prompts and sequences you can get it to play
           | like a strong grandmaster. But it is a "trick" you are
           | getting it to perform by choosing good data and prompts. It
           | is impressive but it is not doing what is claimed by the
           | article.
        
             | nwienert wrote:
             | I'll add in as someone new to chess (~800 ELO):
             | 
             | ChatGPT is in no way 1400, or even close to it. The fact
             | this article gets upvoted around here is proof that people
             | aren't thinking clearly about this stuff. It's trivially
              | easy to prove it wrong. Like, unbelievably so: I tried the
              | same prompt and within 12 moves it made multiple ridiculous
              | errors I would never make, and then an illegal move.
             | 
             | Keep in mind a 1400 level player would need to basically
             | make 0 mistakes that bad in a typical game, and further
             | would need to play 30-50 moves in that fashion, with the
             | final moves being some of the most important and hard to
             | do. There's just no way it's even close, my guess would be
              | even if you correct its many errors, it's something like
             | ~200 ELO. Pure FUD.
             | 
             | The author of this article is cashing in the hype and I'm
             | wondering how they even got the results they did.
        
               | Urist-Green wrote:
               | With the big caveat that I'm not into chess, but I have
               | heard that higher level play is extremely pattern based.
               | Seems like ChatGPT would work well as long as you stick
               | to patterns that people have studied and documented. Less
               | optimal play would be more random and thus break from the
               | patterns ChatGPT would have picked up from its training
               | corpus.
        
               | babel_ wrote:
               | They probably got them. The problem is that it's
               | difficult to repeat, thanks to temperature, meaning users
               | will get a random spread of outcomes. Today, someone got
               | a legal game. Tomorrow, someone might get a grandmaster
               | level game. But then everyone else trying to repeat or
               | leverage this ends up with worse luck and gets illegal
               | moves or, if they're lucky, moves that make sense in a
               | limited context (such as related to specific gambits etc)
               | but have no role in longer-term play.
        
               | [deleted]
        
             | echelon wrote:
             | Criticisms like this are exactly how the model will grow
             | multimodal support for chess moves.
             | 
             | Keep poking it and criticizing it. Microsoft and OpenAI are
              | on HN and they're listening. They'd find nothing more
              | salient than touting full chess support in their next
              | release or press conference.
             | 
             | With zero effort the thing understands uber domain specific
             | chess notation and the human prompt to play a game. To
             | think it stops here is wild.
             | 
             | People are hyping it because they want to get involved.
             | They want to see the crazy and exciting future this leads
             | to.
        
               | flir wrote:
               | My uber-obscure question that guaranteed a confident
               | hallucination got fixed in the next update after I
               | mentioned it. Probably just a coincidence.
        
               | jerf wrote:
               | I doubt they'll pursue this. There is no advantage to it.
               | ChatGPT will never beat Stockfish, and Stockfish would do
               | it on a ludicrously small fraction of the resources. It
               | would send the wrong message.
               | 
               | Some future AI might, but a language model won't.
        
           | PaulHoule wrote:
           | It's super scary how ChatGPT brings out people who are
           | veeeery good at seeing the Emperor's clothes.
        
           | throwwwaway69 wrote:
           | He literally used the same prompt as the article.
           | 
           | Claim: "ChatGPT's Chess Elo is 1400"
           | 
           | Reality: ChatGPT gives illegal moves (this happened to
           | article author too), something a 1400 ranked player would
           | never do
           | 
           | Result: ChatGPT's rank is not 1400.
        
             | unyttigfjelltol wrote:
             | The author said ChatGPT gives illegal moves. So, a quirky
             | sort of 'grandmaster'. He considered illegal moves to be a
             | resignation. Maybe you need to tell ChatGPT that the
             | alternatives are to win via legal moves, and if it is not
             | possible to do so, to resign? Does that fix it?
        
             | erulabs wrote:
             | No, the author of the article specifically says that the
             | entire move sequence should be supplied to chatGPT each
             | time, not simply the next move. Be very careful when
             | "disproving" an experiment with squinted eyes.
        
               | throwwwaway69 wrote:
               | I'm not really sure what to say here. Both the parent
               | commenter and the author of the article had issues with
               | ChatGPT supplying illegal moves. Both methods resulted in
               | this. It sort of doesn't matter how we're trying to
               | establish that it's a 1400 level player, there's no
               | defined correct way to do this. Regardless of method
               | we've disproven it's a 1400 level player due to these
               | illegal moves.
        
               | whimsicalism wrote:
               | > Regardless of method we've disproven it's a 1400 level
               | player due to these illegal moves.
               | 
               | Explain your thought process here further if you don't
               | mind.
        
               | pattrn wrote:
               | I think his point is that 1400 level players don't make
               | illegal moves, therefore ChatGPT is not playing at the
               | level of a 1400 level player.
        
               | DSMan195276 wrote:
                | Personally I think the illegal moves are irrelevant; the
               | fact that it doesn't play exactly like a typical 1400
               | doesn't mean it can't have a 1400 rating. Rating is
               | purely determined by wins and losses against opponents,
               | it doesn't matter if you lose a game by checkmate,
               | resignation, or playing an illegal move.
               | 
                | That's not to say ChatGPT can play at 1400, just that
                | playing in an odd way doesn't determine its rating.
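                | 
                | For reference, the standard Elo update looks only at
                | the score of the game, not at how it ended (sketch
                | with an assumed K-factor of 20):

```python
def elo_update(rating, opp_rating, score, k=20):
    """Standard Elo update. score is 1 for a win, 0.5 for a draw,
    0 for a loss -- a forfeit by illegal move scores the same 0
    as a loss by checkmate."""
    expected = 1 / (1 + 10 ** ((opp_rating - rating) / 400))
    return rating + k * (score - expected)

# Losing to an equally rated opponent costs the same either way:
print(elo_update(1400, 1400, 0))  # -> 1390.0
```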
        
               | [deleted]
        
               | throwwwaway69 wrote:
               | This is like saying I play at a 2900 level if you just
               | ignore all the times I lose.
        
               | whimsicalism wrote:
                | I think blindfolded 1400 players, which is what this
               | effectively is, would make illegal moves.
               | 
               | But even if it doesn't play like human 1400 players, if
               | it can get to a 1400 elo while resigning games it makes
               | illegal moves on, that seems 1400 level to me. And i bet
               | that some 1400s do occasionally make illegal moves
               | (missing pins) while playing otb
        
               | throwwwaway69 wrote:
               | This isn't really an apt metaphor. Firstly because higher
               | level blindfolded players, when trained to play with a
               | blindfold, also virtually never make mistakes. Secondly
               | because a computer has permanent concrete state
               | management (compared to humans) and can, without error,
                | keep a perfect representation of the chess board if it
                | chooses to do so.
        
               | whimsicalism wrote:
                | 1400 FIDE != high-level blindfolded player.
        
             | mynameisvlad wrote:
             | > something a 1400 ranked player would never do
             | 
             | The fact that rules and articles exist describing what to
             | do if you or your opponent makes an illegal move indicates
             | this is not the case.
             | 
             | Humans are also... human. They make mistakes. It may not
             | happen often at 1400, but to say that it'll never happen is
             | preposterous.
        
               | throwwwaway69 wrote:
               | fine, fair, "never" was too much. posting link to this
               | comment to not repeat same discussion twice
               | 
               | https://news.ycombinator.com/item?id=35201037
        
               | eddsh1994 wrote:
               | I can't remember the last time I played an illegal move
               | tbf, and I've played 7 games of chess this morning
               | already to give you an idea of total games played
        
               | PaulHoule wrote:
                | I read an article about a pro player who castled twice
                | in a game. My son hates castling, so I make a point of
                | castling twice as often as I can to tease him, and of
                | attempting other illegal moves as a joke, but he never
                | ends the game because of it.
               | 
               | If I was playing that monstrosity though I would play
               | something crazy that is far out of the opening book and
               | count on it making an illegal move.
        
               | mynameisvlad wrote:
               | You have _never_ made an illegal move, ever?
               | 
               | The bar isn't "I didn't make an illegal move this
               | morning" it's "something a 1400 ranked player would never
               | do".
               | 
               | My entire point is that it happens. Not often, but also
               | not "never".
        
               | pattrn wrote:
               | This argument is pretty flimsy. ChatGPT makes illegal
               | moves frequently. In all my years of playing competitive
               | chess (from 1000 to 2200), I have never seen an illegal
               | move. I'm sure it has happened to someone, but it's
               | extremely rare. ChatGPT does it all the time. No one is
               | arguing that humans never make illegal moves; they're
               | arguing that ChatGPT makes illegal moves at a
               | significantly higher rate than a 1400 player does
               | (therefore ChatGPT does not have a 1400 rating).
               | 
               | Edit: Without reading everything again, I'll assume
               | someone said "never." They're probably assuming the
               | reader understands that "never" really means "with an
               | infinitesimal probability," since we're talking about
               | humans. If you're trying to argue that "some 1400 player
               | has made an illegal move at some point," then I agree
               | with that statement, and I also think it's irrelevant
               | since the frequency of illegal moves made by ChatGPT
               | compared to the frequency of illegal moves made by a 1400
               | rated player is many orders of magnitudes higher.
        
               | mynameisvlad wrote:
               | > No one is arguing that humans never make illegal moves
               | 
               | > something a 1400 ranked player would never do
               | 
               | > fine, fair, "never" was too much.
               | 
               | I mean, yes they were and they said as much after I
               | called them out on it. But go off on how nobody is
               | arguing the literal thing that was being argued.
               | 
               | It's not like messages are threaded or something, and
               | read top-down. You would have 100% had to read the
               | comment I replied to first.
        
               | pattrn wrote:
               | You have twice removed the substance of an argument and
               | responded to an irrelevant nitpick. Here's what the OP
               | said:
               | 
                | > He literally used the same prompt as the article.
                | 
                | > Claim: "ChatGPT's Chess Elo is 1400"
               | 
               | > Reality: ChatGPT gives illegal moves (this happened to
               | article author too),
               | 
               | > something a 1400 ranked player would never do
               | 
               | > Result: ChatGPT's rank is not 1400.
               | 
               | This is a completely fair argument that makes perfect
               | sense to anyone with knowledge of competitive chess. I
               | have never seen a 1400 make an illegal move. He probably
               | hasn't either. Your point is literally correct in the
               | sense that at some point in history a 1400 rated player
               | has made an illegal move, but it completely misses the
               | point of his argument: ChatGPT makes illegal moves at
                | such an astronomically high rate that it wouldn't even
                | be allowed to play competitively, hence it cannot be
               | accurately assessed at 1400 rating.
               | 
               | Imagine you made a bot that spewed random letters and
               | said "My bot writes English as well as a native speaker,
               | so long as you remove all of the letters that don't make
               | sense." A native English speaker says, "You can't say the
               | bot speaks English as well as a native speaker, since a
               | native speaker would never write all those random
               | letters." You would be correct in pointing out that
               | sometimes native speakers make mistakes, but you would
               | also be entirely missing the point. That's what's
               | happening here.
        
               | pattrn wrote:
               | > Ah yes, of course, just because you never saw it means
               | it never happens. That's definitely why rules exist
               | around this specific thing happening. Because it never
               | happens. Totally.
               | 
               | You seem to have missed the part where I said multiple
               | times that a 1400 has definitely made illegal moves.
               | 
                | > In fact, it's so rare that in order to forfeit a game,
               | you have to do it twice. But it never happens, ever,
               | because pattrn has never seen it. Case closed everyone.
               | 
               | I actually said the exact opposite. You're responding to
               | an argument I didn't make.
               | 
               | > I made no judgement on what ChatGPT can and can't do. I
               | pointed out an extreme. _Which the commenter agreed was
               | an extreme._ The rest of your comment is completely
               | irrelevant but congrats on getting tilted over something
                | that literally doesn't concern you. Next time, just save
               | us both the time and effort and don't bother butting in
               | with irrelevant opinions. Especially if you couldn't even
               | bother to read what was already said.
               | 
               | The commenter's throwaway account never agreed it was an
               | extreme. I agreed it was an extreme, but also that
               | disproving that one extreme does nothing to contradict
               | his argument. Yet again you aren't responding to the
               | argument.
               | 
               | This entire exchange is baffling. You seem to be missing
               | the point for a third time, and now you're
               | misrepresenting what I said. Welcome to the internet, I
               | guess.
        
               | mynameisvlad wrote:
               | > The commenter's throwaway account never agreed it was
               | an extreme.
               | 
               | > fine, fair, "never" was too much.
               | 
               | This is the second time I've had to do this. Do you just
               | pretend things weren't said or do you actually have
               | trouble reading the comments that have been here for
               | hours? You make these grand assertions which are
               | disproven by... reading the things that are directly
               | above your comment.
               | 
               | > This entire exchange is baffling.
               | 
               | Yeah your inability to read comments multiple times in a
               | row is extremely baffling.
               | 
               | As I said before:
               | 
               | > Next time, just save us both the time and effort and
               | don't bother butting in with irrelevant opinions.
               | Especially if you couldn't even bother to read what was
               | already said.
        
               | throwwwaway69 wrote:
               | > The commenter's throwaway account never agreed it was
               | an extreme.
               | 
               | I did, two hours ago, 6 minutes after your comment
               | 
               | https://news.ycombinator.com/item?id=35201830
        
               | mynameisvlad wrote:
               | > I have never seen a 1400 make an illegal move.
               | 
               | Ah yes, of course, just because you never saw it means it
               | never happens. That's definitely why rules exist around
               | this specific thing happening. Because it never happens.
               | _Totally._
               | 
                | In fact, it's so rare that in order to forfeit a game,
               | you have to do it twice. But it never happens, ever,
               | because pattrn has never seen it. Case closed everyone.
               | 
               | I made no judgement on what ChatGPT can and can't do. I
               | pointed out an extreme. Which the commenter agreed was an
               | extreme. The rest of your comment is completely
               | irrelevant but congrats on getting tilted over something
               | that literally doesn't concern you. Next time, just save
               | us both the time and effort and don't bother butting in
               | with irrelevant opinions. Especially if you couldn't even
               | bother to read what was already said.
        
               | ipaddr wrote:
               | A broken clock is correct two times a day. But my broken
                | clock isn't a 1400 player, although it might seem to be.
        
               | mynameisvlad wrote:
               | Does that somehow prove the assertion of "something a
               | 1400 ranked player would never do"?
               | 
               | Because all I'm hearing is talk about ChatGPT's abilities
               | as a reply to me calling out an extreme statement as
               | being extreme. Something the parent comment even admitted
               | as being overly black and white.
        
               | bcrosby95 wrote:
               | Prove to me your clock is broken, I think it's just
               | telling the future.
        
               | eddsh1994 wrote:
                | No, I definitely have; it's just so rare I can't remember
               | when I last did it. I do remember playing one in a blitz
               | tournament 20 years ago! But if this is the first game
               | they played, or if it happens in 1/10 matches, that's
               | wild
        
           | z3c0 wrote:
           | They are disproving an assertion. Demonstrating that an
           | alternate approach implodes the assertion is a perfectly
           | acceptable route, especially when the original approach was
           | cherry-picking successes and throwing out failures.
           | 
           | I wish I could just make bullshit moves and get a higher
           | chess ranking. Sounds nice.
        
             | vidarh wrote:
             | It was not throwing out failures. It was treating even the
              | first illegal move as a forfeit, _something which is
             | stricter than FIDE rules_.
        
               | z3c0 wrote:
               | You can spin it that way if you want to, but the result
               | is essentially guiding it through a brute force of the
               | first successful playthrough it can muster.
        
               | whimsicalism wrote:
               | > the result is essentially guiding it through a brute
               | force of the first successful playthrough it can muster.
               | 
               | No, all unsuccessful playthroughs are resignations that
                | impact the model's Elo.
        
             | whimsicalism wrote:
             | I disagree. If there is a procedure for getting ChatGPT to
             | play chess accurately and you discard that and do some
             | naive approach as a way of disproving the article, doesn't
             | sound to me like you have disproven anything.
             | 
              | I don't understand the point of your second sentence; it seems
             | to be entirely missing the substance of the conversation.
        
         | Spivak wrote:
         | From the article.
         | 
         | > Occasionally it does make an illegal move, but I decided to
         | interpret that as ChatGPT flipping the table and saying "this
         | game is impossible, I literally cannot conceive of how to win
         | without breaking the rules of chess." So whenever it wanted to
         | make an illegal move, it resigned.
         | 
         | But you can do even better than the OP with a few tweaks.
         | 
         | 1. One is by taking the most common legal move from a sample of
         | responses.
         | 
          | 2. Telling GPT what all the current legal moves are and
          | telling it to only respond with an element from the list.
         | 
         | 3. Ending the prompt with the current sequence of moves and
         | having it complete from there.
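          | 
          | Tweak 1 might be sketched as a majority vote over sampled
          | completions, filtered to legal moves (samples and
          | legal_moves are assumed to come from the API and a rules
          | engine):

```python
from collections import Counter

def pick_move(samples, legal_moves):
    """Most common sampled move that is actually legal; None
    (treat as a resignation) if no sample is legal."""
    votes = Counter(m for m in samples if m in legal_moves)
    return votes.most_common(1)[0][0] if votes else None

# Sampled completions, with one hallucinated move filtered out:
print(pick_move(["Nf3", "Nf3", "Ke2", "Nf9"], {"Nf3", "Ke2"}))  # -> Nf3
```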
        
           | tracker1 wrote:
            | How many 1400-rated human chess players need every possible
            | move explained to them before every single move?
        
             | arrrg wrote:
             | Does that matter? I'm really very confused by the argument
             | you are making.
             | 
             | That you may have to babysit this particular aspect of
             | playing the game seems quite irrelevant to me.
        
               | serverholic wrote:
               | [dead]
        
             | sacred_numbers wrote:
             | When you are speaking to a person, they have inner thoughts
             | and outer actions/words. If a person sees a chess board
             | they will either consciously or unconsciously evaluate all
             | the legal moves available to them and then choose one. An
             | LLM like ChatGPT does not distinguish between inner
             | thoughts and outer actions/words. The words that it speaks
             | when prompted are its inner thoughts. There is also no
             | distinction between subconscious and conscious thoughts.
             | Humans generate and discard a multitude of thoughts in the
             | subconscious before any thoughts ever make it to the
             | conscious layer. In addition, most humans do not
             | immediately speak every conscious thought they have before
             | evaluating it to see whether speaking it aloud is
             | consistent with their goals.
             | 
             | There's already a lot of research on this, but I strongly
             | believe that eventually the best AIs will consist of LLMs
             | stuck in a while loop that generate a stream of
             | consciousness which will be evaluated by other tools
             | (perhaps other specialized LLMs) that evaluate the thoughts
             | for factual correctness, logical consistency, goal
             | coherence, and more. There may be multiple layers as well,
             | to emulate subconscious, conscious, and external thoughts.
             | 
             | For now though, in order to prompt the machine into
             | emulating a human chess player, we will need to act as the
             | machine's subconscious.
        
             | Spivak wrote:
             | I feel like we have very different expectations about what
             | tools like this are good for and how to use them. When I
             | say GPT3 can play chess what I mean is, I can build a chess
             | playing automaton where the underlying decision making
              | system is entirely powered by the LLM.
             | 
             | I, as the developer, am providing contextual information
             | like what the current board state is, and what the legal
             | moves are, but my code doesn't actually know anything about
              | how to _play_ chess; the LLM is doing all the "thinking."
             | 
             | Like it's nuts that people aren't more amazed that there's
             | a piece of software that can function as a chess playing
             | engine (and a good one) that was trained entirely
             | generically.
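That division of labor can be sketched as a thin wrapper loop. This is a hedged sketch, not anyone's actual implementation: `fake_llm` stands in for a real model API call, and the FEN string and legal-move set would come from a chess library rather than being supplied by hand.

```python
def automaton_turn(llm, board_fen, legal_moves, max_retries=3):
    """One turn of a chess automaton: the wrapper supplies context
    (board state + legal moves); the LLM does all the 'thinking'.
    The wrapper itself knows nothing about chess strategy."""
    prompt = (
        f"You are playing chess. Current position (FEN): {board_fen}\n"
        f"Legal moves: {', '.join(sorted(legal_moves))}\n"
        "Reply with exactly one move from the list."
    )
    for _ in range(max_retries):
        reply = llm(prompt).strip()
        if reply in legal_moves:
            return reply
    return None  # persistently illegal output: treat as resignation

# Stub standing in for an actual ChatGPT API call.
fake_llm = lambda prompt: "e4"
start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(automaton_turn(fake_llm, start, {"e4", "d4", "Nf3"}))  # → e4
```

Note that the retry-then-resign fallback matches the liampulles "AI.v2" framing upthread: the wrapper only enforces legality, it never chooses a move itself.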
        
               | serverholic wrote:
               | [dead]
        
         | Tenoke wrote:
          | I played a game against it yesterday (it won), and the only
          | time it made an illegal move was move 15 (the game was
          | unique according to the lichess database from much
          | earlier), so I just asked it to try again. There's variance
          | in what you get, but your example seems much worse.
        
         | [deleted]
        
       | marcodiego wrote:
        | This may look low: an Elo around 1500 is a mediocre club
        | player. But if it is obeying the rules of the game, then this
        | is big. This is a signal
       | that if it learns some expertise, like discovering how to use or
       | create better search algorithms (like MCTS and heuristics to
       | evaluate a state) and improve by itself (somewhat like alphazero
       | did), then it may eventually reach superhuman level.
       | 
        | It may then reach superhuman level in any task simpler than
        | chess, which is enough to destroy many human jobs.
       | 
       | EDIT: From the article: "With this prompt ChatGPT almost always
       | plays fully legal games." Relax: we're still far from that.
        
         | gfd wrote:
         | According to
         | https://chess.stackexchange.com/questions/2550/what-are-the-...
         | median rating is 1148 (252,989 Players). So it's beating half
         | of humanity at a mind sport and it wasn't even specifically
         | trained for it.
        
           | nottathrowaway3 wrote:
           | The median chess player is usually described as mediocre (if
           | you ask chess players). They suck as badly as the median
           | clarinet player in your high school band/orchestra.
        
           | sapiogram wrote:
           | That's USCF ratings, chess.com ratings are massively inflated
           | in comparison.
        
           | jabloczko wrote:
           | There's a difference between chess.com and USCF ratings.
        
         | LeanderK wrote:
          | I think current LLM architectures limit the strategies it
          | can learn. MCTS requires recursion, but GPT always executes
          | a fixed number of steps. Giving language models more
          | flexibility via a variable number of steps, for example
          | through recurrence in the model, would remove this hard
          | bound, but such models are harder to design and train. We
          | have only just managed to train GPT-sized models at all.
        
           | sebzim4500 wrote:
           | I'm sure the MuZero chess policy network would reach much
           | higher than 1400, and that has no notion of recursion either.
           | (And also wasn't taught the rules explicitly)
        
             | LeanderK wrote:
             | Yes, I wouldn't say it's impossible, but that's just a hard
             | limit from the architecture. MCTS would have to terminate
             | after a few iterations. And the number of steps is not that
             | big, just massively parallel.
        
       | MC_10 wrote:
       | Not sure how useful this is but I simply asked ChatGPT what it
       | thinks its skill level is. This is using GPT-4.
       | 
       | After some previous questions...
       | 
       | Me: "Would you be able to estimate your ELO?"
       | 
       | ChatGPT: "As a language model, I don't have a specific ELO
       | rating. However, my knowledge of chess strategies and tactics is
       | based on the information I have been trained on, which includes
       | games played by humans with various ELO ratings. My level of play
       | might be roughly equivalent to an ELO of around 1200-1500, but
       | please keep in mind that this is only an estimate, and my actual
       | performance could be higher or lower depending on the specific
       | position or game."
       | 
       | Me: "As a generative AI, are you actually able to reason which
       | move to make next in chess? Or are you somehow just predicting
       | the next likely move based on training data."
       | 
       | ChatGPT: "As a generative AI language model, my primary function
       | is to predict the most likely text given a certain context. When
       | it comes to playing chess, I am not "reasoning" in the same way
       | that a dedicated chess engine would, which involves evaluating
       | positions and searching through a large number of potential moves
       | using specialized algorithms.
       | 
       | Instead, I generate my chess moves based on patterns and
       | sequences I've seen during my training, which include a large
       | number of chess games and discussions about chess strategy. My
       | knowledge is derived from the text data I've been trained on, and
       | I apply that knowledge to suggest moves that I think are
       | appropriate for a given position.
       | 
       | While this approach allows me to play at an intermediate level,
       | it is not as precise or accurate as the methods used by dedicated
       | chess engines. My suggestions might not always be the best moves,
       | and my understanding of the game is limited by the quality and
       | quantity of my training data."
        
       | mtlmtlmtlmtl wrote:
        | There's a huge difference between 1400 Elo in FIDE games and
        | 1400 on chess.com, which doesn't even use Elo (it uses a
        | Glicko-based system).
       | 
       | For instance the strongest blitz players in the world are
       | hundreds of points higher rated on chess.com blitz versus their
       | FIDE blitz rating. Chess.com and lichess have a ton of rating
       | inflation.
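For context on why a gap of hundreds of points matters: under the standard Elo model, expected score is logistic in the rating difference, so the same numeric rating in an inflated pool implies a much weaker player. A minimal sketch of the FIDE-style formula (note, as above, that chess.com actually uses a Glicko-based system, so this is only the classical model):

```python
def elo_expected_score(rating_a, rating_b):
    """Expected score of player A (win prob. plus half the draw
    prob.) under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# A 1400 facing a 1700 is expected to score only about 15% of the
# points -- a 300-point inflation gap is far from cosmetic.
print(round(elo_expected_score(1400, 1700), 3))  # → 0.151
```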
        
         | CSMastermind wrote:
         | > the strongest blitz players in the world are hundreds of
         | points higher rated on chess.com blitz versus their FIDE blitz
         | rating
         | 
         | Online rating inflation is real but I'm not sure blitz is the
         | best example of it because in that case there is a notable
         | difference between online and otb (having to take time to
         | physically move the pieces).
        
           | manquer wrote:
           | Probably the bigger difference is ability to premove online
        
             | mtlmtlmtlmtl wrote:
             | I was thinking about this.
             | 
                | On chess.com you can chain premoves, on lichess you
                | can't (afaik).
             | 
             | So in theory, to the extent premoves explain the rating
             | difference, the difference should be greater on chess.com
             | assuming they have the same parameters in their rating
             | calculations. Therefore it should be possible to perform an
             | analysis to shed light on this. But someone would have to
             | go recompute the 3 different ratings under the same system
             | first to be able to make a sensible analysis.
        
           | mtlmtlmtlmtl wrote:
           | Point is it's kinda hard to take the blogpost too seriously
           | when these fundamentals are so wrong. When literally the
           | title is an immediately obvious error that doesn't inspire
           | confidence in the rest of the methodology.
           | 
           | I'm still going through the games but so far these games are
           | not even close to elo 1400 level. For both the human player
           | and the model.
        
       | oli5679 wrote:
       | I tried playing just now, with exactly the same prompt as the
       | article.
       | 
        | It played an illegal move on move 4:
        | 
        | 1. d4 d5 2. c4 e6 3. Nf3 Nf6 4. g3 Bg4 (the c8 bishop is
        | blocked by the pawn on e6, so 4...Bg4 is illegal)
        
       | comment_ran wrote:
        | Why not just expose AlphaGo as an API that ChatGPT can use?
        | Then every time you want to play this type of game, ChatGPT
        | sends a request to AlphaGo in the background. As a user you
        | don't see what's actually happening, and it happens really
        | fast, so it feels like you are just chatting with ChatGPT
        | while a much, much more powerful tool does this kind of
        | thing.
        
         | paxys wrote:
         | That kinda defeats the purpose. Of course you can use AlphaGo,
         | but the question here is - can a generative AI teach itself to
         | play chess (and do a million other similar generic tasks) when
         | given no specific training for it.
        
           | porphyra wrote:
           | What if you come up with a prompt to tell ChatGPT to figure
           | out how to use APIs and external tools to help itself get
           | better at chess?
        
             | [deleted]
        
           | og_kalu wrote:
            | GPT-4 plays chess just fine. So you certainly can.
        
           | stagger87 wrote:
           | Honest questions,
           | 
           | Isn't it very likely chess games were part of the training
           | data? If so, chess sequences seem like an amazing matched
           | filter of sorts. It doesn't seem surprising that it can guess
           | a next move.
           | 
           | Is it teaching itself anything here? Is the model being
           | updated in any way? Or are you talking more generally,
           | looking into the future sort of statement?
        
             | layer8 wrote:
             | I think they meant "teach itself" in the course of a
             | conversation, within the token limit. ChatGPT will forget
             | what it has "learned" once the conversation is over.
        
             | paxys wrote:
             | You can put every chess game ever played in its database
             | and yet every time it plays a new game it will immediately
             | come across a completely unique move. So it still needs to
             | learn the rules and know how to play independently. Whether
              | it can do that or not is as yet undetermined.
        
           | armchairhacker wrote:
           | How about, can a generative AI teach itself how to use a
           | chess AI to beat chess? Give GPT4 the ability to make REST
           | API calls and also access to FFI, and put a chess-bot library
           | somewhere. Train it how to use these but not necessarily how
           | to use the chess API specifically. If you ask GPT4 to play
           | chess, can it call into that library and use the
           | requests/responses?
           | 
           | This has bigger ramifications too: if GPT4 learns how to use
           | RPCs effectively, it can do accurate complex mathematical
           | computations and simulations, do IoT/home automation tasks,
           | and even control heavy machinery.
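A minimal sketch of the dispatch side of such a setup, under loud assumptions: the `CALL name {json}` convention, the tool name `best_move`, and the stub engine are all invented for illustration; a real system would wire the registry to an actual chess engine or RPC endpoint, and feed the tool result back into the model's context.

```python
import json
import re

# Hypothetical tool registry; "best_move" stands in for a real chess
# engine binding the model has been told about (here, a stub).
TOOLS = {"best_move": lambda args: {"move": "e4"}}

CALL_RE = re.compile(r"^CALL (\w+) (\{.*\})$")

def dispatch(model_output):
    """If the model's output is a tool call, run the tool and return
    its result for the next turn; otherwise pass the text through."""
    m = CALL_RE.match(model_output.strip())
    if not m:
        return ("text", model_output)
    name, args = m.group(1), json.loads(m.group(2))
    if name not in TOOLS:
        return ("error", f"unknown tool: {name}")
    return ("tool_result", TOOLS[name](args))

print(dispatch('CALL best_move {"fen": "startpos"}'))  # → ('tool_result', {'move': 'e4'})
print(dispatch("I resign."))                           # → ('text', 'I resign.')
```

The point of the sketch is that the wrapper only routes messages; whether the model learns *when* to reach for the tool is the open question the comment raises.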
        
             | comment_ran wrote:
             | This is exactly what I'm hyped for in the next-gen GPT-7.
             | Imagine it having the ability to self-teach, just like a
             | child. I may not know how to whip up some cheesy goodness,
             | but with external resources like YouTube vids, I can
             | improve. And if GPT-7 can store this knowledge, it can
             | access it for future tasks! That's some next-level stuff,
             | and I'm stoked to see where it goes.
        
             | paxys wrote:
             | GPT is a language model. It doesn't call APIs. You can take
             | the _output_ of GPT and decide to call a specific API based
             | on it, and sure you can bake that into ChatGPT or some
              | independent tooling, but again that's not the purpose of
             | the core project. I'm sure plenty of people/companies are
             | working on "chess engine interface using ChatGPT" already.
        
         | RC_ITR wrote:
         | This is actually a huge debate right now.
         | 
         | OpenAI is on the side of 'LLMs have only surprised us to the
         | upside, so using crutches is counterproductive'
         | 
         | Whereas other people think 'Teaching an LLM to do arbitrary
         | math problems through brute force is probably one of the most
         | wasteful things imaginable when calculators exist.'
         | 
         | I'm actually very excited to see which side wins (I'm on team
         | calculator, but _want_ to be on team LLM)
        
           | dwaltrip wrote:
           | How about a more human-like approach: the LLM designs a
           | calculator and then makes use of that!
        
           | TOMDM wrote:
           | I think I'm on both teams.
           | 
           | As GPT models get better at arbitrary computation, it stands
           | to reason that their ability to use tools will rise as well.
           | 
           | Smarter models and smarter ways to get them using tools feel
           | like complementary capabilities.
        
         | nroets wrote:
          | And Mathematica when given a math problem. And web search
          | for news.
         | 
         | And a good hacker can add these features to Alpaca and run it
         | on his desktop...
         | 
         | Import all his WhatsApp chats and private emails into the
         | corpus.
         | 
         | Add speech recognition and TTS.
        
       ___________________________________________________________________
       (page generated 2023-03-17 23:02 UTC)