[HN Gopher] ChatGPT's Chess Elo is 1400
___________________________________________________________________
ChatGPT's Chess Elo is 1400
Author : dbrereton
Score : 202 points
Date : 2023-03-17 17:39 UTC (5 hours ago)
(HTM) web link (dkb.blog)
(TXT) w3m dump (dkb.blog)
| paxys wrote:
| ChatGPT is a brilliant piece of tech but random people trying it
| out for five minutes and making wild claims does it a disservice.
| More so when such clickbait headlines are shared all over the
| internet as fact.
|
| It is a language model. It cannot play chess. It can't even play
| tic-tac-toe, which was trivially solved by machines decades ago.
| Here's the result of a game I just tried:
|
| X (me) - 0,0
|
| O (chatGPT) - 1,1
|
| X - 0,1
|
| ChatGPT instead put my move at 0,2 for some reason
|
| O - 2,2
|
| Regardless of whether my move was at 0,1 or 0,2, I easily won.
| And after I won it just ignored the outcome and continued
| playing. Plus this was after telling it to play a perfect game.
| andrepd wrote:
| ChatGPT playing chess:
| https://www.youtube.com/watch?v=GneReITaRvs
| liampulles wrote:
| With respect to the points here on how illegal moves invalidate
| ChatGPT as a valid Chess AI, consider the following:
|
| AI.v1: Feed ChatGPT the move history and ask for the next move.
|
| AI.v2: Wrap AI.v1 with additional logic to try to translate any
| invalid response into a similar/close valid move, else treat it as
| a resignation.
|
| I would say AI.v2 is a "valid" Chess AI since it obeys the rules
| of chess. Also, given how minimal this "wrapping" is, I think it
| is reasonable to attribute the underlying ability to ChatGPT.
| Closi wrote:
| It's amazing how quick some people on HN are to dismiss how
| impressive it is for a generic language model to be able to
| play chess at any level - or indeed other feats of ChatGPT.
|
| It's utterly staggering that it can work as well as it does on
| problems like this at such an early stage in this technology's
| lifecycle.
| the_af wrote:
| This is GPT4, right? Because ChatGPT (GPT-3) still fails to
| provide a legal game of Tic Tac Toe with this prompt:
|
| > _" Let's play Tic Tac Toe. You are O, I'm X. Display the board
| in a frame, with references for the axes"_
|
| It failed to recognize that I won.
|
| Then it continued playing (past the end), played illegally over a
| move I had already made, obtained a line of 3 for itself, and
| still didn't acknowledge that the game had ended.
| sumtechguy wrote:
| For me when I tried it, it forgot the board halfway through and
| decided there were no more moves possible (4 left). Then it forgot
| one of my moves.
|
| It is an NLM; it does not have the ability to reason. In my case
| the NLM's 'next iteration' was half a game with a missing move
| and 'nothing left to do'.
| [deleted]
| yieldcrv wrote:
| No, it is GPT 3.5
|
| and the real answer to every debate about capabilities comes
| down to "you're prompting it wrong".
| travisjungroth wrote:
| If someone says that and shows a working prompt, that's what
| the debate _should_ come down to. It's over. The person with
| the working prompt wins.
| dwaltrip wrote:
| Can you describe how to prompt gpt to play a proper game of
| tic tac toe? I was disappointed gpt-4 didn't really improve
| much on this front, at least with how I was prompting it.
|
| Overall I'm very blown away by the generative AI
| developments. I'm also very curious about which types of
| prompts are most effective for each task, etc.
| meghan_rain wrote:
| I too noticed (for a use case totally unrelated to chess, namely
| code generation) that ChatGPT 3.5 gave better answers than GPT-4.
| The 3.5 answer was exactly what I wanted; the GPT-4 answer was wrong.
|
| Does that mean we have plateaued?
| alfalfasprout wrote:
| It's inevitable that LLMs will plateau. They'll increase their
| abilities in certain areas but ultimately core flaws of their
| architecture and training approach will likely require another
| rethink. Unclear what that is yet (though Yann Lecun seems to
| think world models are the path forward).
|
| We've gone through the "hype" phase. Now I suspect the next few
| years will be a lot of growth finding how to apply LLMs,
| creating good interfaces for them, and running them cheaply.
| Paying OpenAI for API access without true fine tuning, etc. is
| a hard sell.
| imtringued wrote:
| I think they invested the parameters into supporting multimodal
| inputs (images).
| amelius wrote:
| I'd be more interested to know how well it performs on a new
| game, i.e. one for which it didn't have terabytes of training
| data.
| ricardo81 wrote:
| Better than me then.
|
| But does it give credit to those who taught it? These models are
| basically a scrape of the best of humankind and a claim that it's
| their own.
| spywaregorilla wrote:
| Do you give credit to people you've played in the past when you
| play a game of chess?
| andrepd wrote:
| > Occasionally it does make an illegal move, but I decided to
| interpret that as ChatGPT flipping the table and saying "this
| game is impossible, I literally cannot conceive of how to win
| without breaking the rules of chess." So whenever it wanted to
| make an illegal move, it resigned.
|
| Lmao. ChatGPT can play chess, as long as you ignore the times
| when it can't x) Brilliant.
| agolio wrote:
| Chessnetwork did a video about this a few weeks ago FYI -
| https://www.youtube.com/watch?v=svlIYFpsWs0
|
| Seems its strength is mostly in memorisation of the book
| openings. Still impressive nonetheless.
| xkcd1963 wrote:
| It's like saying the agglomeration of media handling chess has an
| ELO of 1400
| Reptur wrote:
| It can't even play tic-tac-toe without making mistakes. My prompt
| told it to play a perfect game, predicting all possible next
| moves, with the goal of a stalemate or a win. It would still make
| suboptimal moves and fail at this. Even when I explained what went
| wrong, it would apologize and then make another wrong move.
| meh8881 wrote:
| Why would you think telling it to play a perfect game,
| predicting all next possible moves, would help?
|
| That definitely made it worse.
| Andrew_nenakhov wrote:
| I tried playing with it and it barely makes any correct moves:
| it forgets whose turn it is to move, doesn't observe the rules,
| and incorrectly displays move results.
|
| So I guess that if we ask it to make a move repeatedly and accept
| only the occasional legal move it produces, it'll show some
| results, but I don't think that can be interpreted as a chess
| engine with any meaningful rating.
| andrepd wrote:
| Yeah, I call bullshit, i.e. selective deletion of terrible
| games. Ignore the games where it plays catastrophically and it
| might look alright.
| aroo wrote:
| I wonder how it would perform if you fed it a comprehensive
| chess rulebook and asked it to avoid all possible illegal moves.
| teej wrote:
| I just deployed a GPT-4 powered chess bot to lichess. You can
| challenge it here: https://lichess.org/@/oopsallbots-gpt-4
| dsjoerg wrote:
| Very cool! Are you doing prompt engineering, fine-tuning, both,
| something else?
|
| I'm wondering if it'd be cool to have a chess contest where all
| the bots are LLM powered. Seems to me like the contest would
| have to ban prompt engineering -- would have to have a fixed
| prompt -- otherwise people would sneak chess engines into their
| prompt generation.
| teej wrote:
| I wanted it to be fun and actually complete games. I started
| with this, with a few minor tweaks:
| https://github.com/Tmate6/Lichess_ChatGPT_ChatBot/blob/main/...
|
| This approach sends along the list of legal moves in the
| prompt if it attempts an illegal move. That seems to work
| well at getting playable moves.
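|
| Roughly, the retry loop looks something like this (a sketch of
| the idea, not the linked repo's exact code, assuming
| python-chess and a hypothetical ask_gpt(prompt) call):
|
|     import chess
|
|     def get_move(board: chess.Board, history_san, ask_gpt):
|         # Ask for a move; if it's illegal, re-prompt with the
|         # list of currently legal moves and try once more.
|         legal = [board.san(m) for m in board.legal_moves]
|         prompt = ("Game so far: " + " ".join(history_san) +
|                   "\nReply with your next move in SAN only.")
|         reply = ask_gpt(prompt).strip()
|         if reply not in legal:
|             retry = (prompt + "\n" + reply + " is illegal. "
|                      "Legal moves: " + ", ".join(legal))
|             reply = ask_gpt(retry).strip()
|         return reply if reply in legal else None  # still illegal -> resign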
| Animats wrote:
| A real question is whether it has developed an internal board
| model of chess. It has a training set of chess games as strings
| of chess notation. That's enough to play "book" games, and that's
| apparently working. Can it do as well as it is doing without a
| board model? Or has it somehow developed an internal model of the
| game state?
|
| This is a system with no look ahead. Any look ahead took place in
| the past, during training. It's amazing that this works.
|
| I notice the prompt says "No explanation needed". What happens if
| you ask for one?
| josh2600 wrote:
| I want to know what the author's ELO is (forgive me if it was
| disclosed and I am unobservant).
|
| Maybe ChatGPT trainers will eventually have ELOs.
| latexr wrote:
| > These people used bad prompts and came to the conclusion that
| ChatGPT can't play a legal chess game. (...)
|
| > With this prompt ChatGPT almost always plays fully legal games.
|
| > Occasionally it does make an illegal move, but I decided to
| interpret that as ChatGPT flipping the table (...)
|
| > (...) with GPT4 (...) in the two games I attempted, it made
| numerous illegal moves.
|
| So you've ostensibly[1] found a way to reduce the error rate and
| then deliberately ignored the cases where it failed. In short: it
| may play valid chess _under certain conditions_ but can't be
| trusted to do so. That doesn't contradict previous findings.
|
| [1] 19 games is a small sample and the supposedly more advanced
| system failed in your tries.
| psychphysic wrote:
| That's how one uses any tool.
| qwytw wrote:
| The behavior of pretty much every other tool is much easier
| to interpret though.
| Out_of_Characte wrote:
| Yes, but it also completely invalidates the measurement of a
| 1400 elo rating. By comparison, any player making an illegal
| move is forfeiting the game; almost all people from ~300 elo up
| can play without making illegal moves, and ChatGPT can't.
| nsxwolf wrote:
| Why do illegal moves forfeit? In online play, they're
| validated. You can't make illegal moves. What's the ELO
| score if ChatGPT is corrected, and chooses a new move?
| renewiltord wrote:
| The article:
|
| > _So whenever it wanted to make an illegal move, it
| resigned._
|
| You:
|
| > _By comparison, any player making an illegal move is
| forfeiting the game..._
|
| By comparison indeed.
| sebzim4500 wrote:
| ChatGPT did forfeit whenever it made an illegal move, read
| the article.
| swatcoder wrote:
| No, the writer arbitrarily decided to _interpret_ illegal
| moves as resignations in order to support the conclusion
| they wanted. That's very different and grossly
| unscientific.
| mynameisvlad wrote:
| I mean, that's more lenient than the official
| "interpretation" (rule) which is that your second illegal
| move results in a forfeit.
| epups wrote:
| This is not a scientific paper, and I at least find this
| decision justified, as he could have been more lenient
| and grabbed headlines with a bigger ELO.
| hgsgm wrote:
| All this above, and people are claiming that ChatGPT lacks
| human-level comprehension of the text it consumes.
|
| On Chess.com, you absolutely can attempt an illegal move,
| and many players do, and you will not get punished for it,
| so ChatGPT is _better_ than a 1400 human player.
| ncallaway wrote:
| > almost all people from ~300 elo can play without making
| illegal moves
|
| I don't believe you. Are you giving those people a
| restricted move set (i.e. computer chess, where it will
| _only_ allow legal moves)? Because if you give people an
| unrestricted board, I _guarantee_ you people will make lots
| of illegal moves.
|
| Me: Moves pawn
|
| Opponent: You can't do that, you exposed your king to
| check.
|
| Me: Oops, sorry, you're right.
| Waterluvian wrote:
| I'm going to float something ridiculous:
|
| An illegal move is a valid play. You might not get caught. I
| think there are some Magnus games where illegal moves went
| overlooked and impacted the game.
|
| You could interpret this as "ChatGPT wants to cheat sometimes."
| But I personally interpret it as "ChatGPT doesn't understand
| what it's doing. It's just a really really good simulacrum."
| dudeinjapan wrote:
| Obviously the article should be taken with a giant grain of
| salt. That being said, not many things that aren't designed to
| play chess can play chess, with or without coaxing. My dog
| cannot, for instance, nor can my coffee table.
| hectorlorenzo wrote:
| > My dog cannot, for instance, nor can my coffee table.
|
| You must be giving them the wrong prompts.
| ogogmad wrote:
| [redacted]
| AndrewPGameDev wrote:
| It's a joke
| ballenf wrote:
| The illegal moves were counted as losses/resignations, not
| ignored.
| __s wrote:
| > So whenever it wanted to make an illegal move, it resigned.
|
| Doesn't sound like ignoring the cases where it failed
| vidarh wrote:
| Fuller context from the article:
|
| > Occasionally it does make an illegal move, but I decided to
| interpret that as ChatGPT flipping the table and saying "this
| game is impossible, I literally cannot conceive of how to win
| without breaking the rules of chess." _So whenever it wanted to
| make an illegal move, it resigned._
|
| (my emphasis)
|
| So the illegal moves are at least part of the reasons for the 6
| losses, and factored into the rating. Quickly scanning the
| game, it seems 3 of the losses ended in checkmate, so that
| leaves 3 illegal moves in 19 games.
|
| Could be better, but for a system not intentionally built to
| play chess, it's pretty decent.
| swatcoder wrote:
| No ELO 1400 player will have that rate of illegal moves, so
| saying that it plays with an ELO 1400 rating is
| disingenuous.
|
| Reinterpreting illegal moves as resignation is absurd when an
| LLM is formally _capable_ of expressing statements "I
| resign" or "I cannot conceive of a winning move from here"
| just as well as any human player. It just doesn't do so
| because it's not actually playing chess the way we think of
| an ELO 1400 player playing chess.
| Pxtl wrote:
| I'd be interested if it could be coaxed into legal moves
| after making an illegal one. "That is an illegal move. Can
| you do something legal with this board?"
| charcircuit wrote:
| ELO is based on whom you win and lose against. The rate of
| illegal moves has nothing to do with ELO.
| jart wrote:
| I personally find that makes it more astonishing, that it
| would slip up on knowing the most basic elements of the
| game, yet still be able to play better than most humans.
| Highly smart people sometimes say or do little things when
| foraying into other fields that cause domain experts to think
| they're not one of them. But that usually doesn't stop
| smart people from having an impact or making a contribution
| with their insights. The question of illegal moves is
| superficial, since most online systems have guardrails in
| place that prevent them. At worst it's just an
| embarrassment and I don't think machines care about being
| embarrassed.
| Jensson wrote:
| > Highly smart people sometimes say or do little things
| when foraying into other fields that causes domain
| experts think they're not one of them
|
| This is the opposite of that, a highly trained but dumb
| entity that has seen many lifetimes worth of games but is
| still tripping up on basics. But since it is so highly
| trained you can mistake it for a master if you squint and
| don't look into what it is doing.
| ogogmad wrote:
| > But since it is so highly trained you can mistake it
| for a master if you squint and don't look into what it is
| doing.
|
| But it is a master, as has been pointed out repeatedly.
| If you replace all illegal moves with resignations, and
| use the same style of prompt as the OP did, then it plays
| like an expert. I'm objecting because you're making it
| sound like it's a trivial result.
| Jensson wrote:
| > you're making this sound like it's a trivial result
|
| I don't think this is a trivial result, emulating a
| highly trained idiot is still very impressive. But it is
| very different from an untrained genius.
| ogogmad wrote:
| You seem to have very rigid and boring definitions of the
| words "idiot" and "genius". The "AI effect" is real:
| https://en.wikipedia.org/wiki/AI_effect
|
| Tbh, I don't even know what you're saying.
|
| [edit] OK, I might have misunderstood you. It's not
| always clear what people mean.
| Jensson wrote:
| > The "AI effect" is real:
| https://en.wikipedia.org/wiki/AI_effect
|
| That isn't relevant to my comment, an idiot human is
| still a human. Your comment here therefore doesn't make
| sense. The comment I responded to likened it to a genius
| entering a new field, I objected to that, that is all.
| vidarh wrote:
| That no _human_ ELO 1400 player will have that rate of
| illegal moves may be true, but if anything treating the
| very first illegal move as forfeit appears to be stricter
| than most rules.
| JellyBeanThief wrote:
| Fine, just wrap the LLM in a simple function that detects
| illegal moves and replaces them with "I resign" or "I
| cannot conceive of a winning move from here". Then you
| aren't "reinterpreting" anymore.
|
| My point is, it sounds like Elo doesn't measure what we
| want it to measure. If we care about the _way_ an agent
| wins a game and not just _whether_ it wins a game, then we
| need an instrument that measures strategy, not outcome.
| illiarian wrote:
| > Fine, just wrap the LLM in a simple function that
| detects illegal moves and replaces them with "I resign"
| or "I cannot conceive of a winning move from here". Then
| you aren't "reinterpreting" anymore.
|
| Then it still isn't anywhere near ELO 1400.
| [deleted]
| vidarh wrote:
| Under FIDE rules it's only a forfeit after the second
| illegal move, so if anything it would seem that the
| interpretation used by the article author
| _underestimates_ its ELO ranking.
| illiarian wrote:
| Nope, still not even close to what the author claims. If
| I understand it correctly, it made illegal moves in 3 out
| of 19 games. That's probably a few orders of magnitude
| more illegal moves than even a 1400 ELO player would make
| in their entire lifetime.
| vidarh wrote:
| The ranking takes into account wins and losses, not
| illegal moves, and so the fact that it plays in a way
| where a higher proportion of its losses is down to
| illegal moves than a human player's is not relevant to its
| ranking. It may suggest that the ranking _ought_ to take
| that into account, but that's a separate issue.
| pedrosorio wrote:
| Repeating what others have said in this thread:
|
| The author claims: chatGPT has a 1400 chess ELO based on
| games played.
|
| You appear to think the author claims: chatGPT plays chess
| like a human rated 1400.
|
| Your observations do not contradict the author's claim
| that based on games won and lost against opponents of a
| specific strength, the estimated ELO is 1400.
|
| A non-human player can make illegal moves at a much
| higher rate and make up for that by being stronger when
| it does not make illegal moves to achieve the same rating
| as a human player who plays the game in a completely
| different way.
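|
| For what it's worth, the Elo math only looks at results: the
| expected score against an opponent is 1 / (1 + 10^((Rb - Ra)/400)),
| and a performance rating can be backed out from wins and losses
| alone. A rough sketch in Python (illustrative numbers, not the
| article's exact data):
|
|     def expected_score(r_player, r_opponent):
|         # standard Elo expected score
|         return 1.0 / (1.0 + 10 ** ((r_opponent - r_player) / 400.0))
|
|     def performance_rating(opponent_ratings, total_score):
|         # bisect for the rating whose expected total score matches
|         # the actual score; illegal-move forfeits simply count as losses
|         lo, hi = 0.0, 3000.0
|         for _ in range(60):
|             mid = (lo + hi) / 2
|             exp = sum(expected_score(mid, r) for r in opponent_ratings)
|             lo, hi = (mid, hi) if exp < total_score else (lo, mid)
|         return round((lo + hi) / 2)
|
|     # e.g. scoring 12/19 against ~1400-rated opposition comes out
|     # to roughly 1490
|     print(performance_rating([1400] * 19, 12.0))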
| ogogmad wrote:
| There's the "it" which has no post-processing, and
| there's the "it" where the output is post-processed to
| announce a resignation when it attempts an illegal move.
|
| Some things about the two "it"s:
|
| - They differ trivially.
|
| - They enable new capabilities, such as the ability to
| explain why a move got made. Current chess AIs are not
| good at this.
|
| So I think you're making too big a deal out of a
| comparative triviality.
|
| [edit]
|
| We might be talking past each other. And some people
| above have come to doubt the article's results even with
| the right prompt engineering.
| arrrg wrote:
| Does that matter? Seems weird to me to make that argument.
| I'm honestly quite confused by it.
|
| A bowling bot that threw strikes 9 out of 10 throws and a
| gutter ball one time out of ten would still be a great
| bowler even though no human with the ability to make
| strikes that often would pretty much ever throw a gutter
| ball.
|
| This is a weird kind of alien intelligence that does not
| have to behave like humans.
| TheRealPomax wrote:
| Note that the claim is not that it's an _ELO 1400 human
| equivalent player_ but that it can play chess at a level
| that gives it an ELO of 1400, which is not nitpicking:
| that's a completely different thing. We're not testing whether
| it plays like a player with ELO x, we're proving that "it
| _can't play chess_" is fallacious. It can, and when
| prompted properly, it can achieve an ELO of 1400.
|
| ELO allows for illegal moves: as per the rules of chess,
| you lose the game if you make an illegal move. The end. ELO
| doesn't care about _why_ you lost a game.
| jmull wrote:
| The article also says that in one game ChatGPT went crazy, so they
| continued the game with a fresh chat. That probably should
| have been counted as a resignation loss too.
| saghm wrote:
| So it sounds like it can play _some_ legal chess games, but
| not all; it's unable to consistently complete a game where it
| loses. Maybe the remaining work shouldn't be focused on
| trying to teach it chess rules better, but to teach it
| sportsmanship better. People were so excited about teaching
| it high-school level academics that we forgot to teach it the
| basic lessons we learn in kindergarten.
| kmeisthax wrote:
| Or append "If you wish to resign or you cannot think of a
| legal move, type 'resign'" to the end of the prompt.
| saghm wrote:
| That's basically my point; that sort of context is
| exactly the sort of thing you would not need to say to a
| person who grew up in a typical social environment. If we
| focus too much on teaching AI technical skills, we might
| later find out that some of the social skills we think of
| as implicit were just as important.
| vidarh wrote:
| It seems like it plays _mostly_ legal chess games, even when not
| explicitly reminded of the rules. There's no problem of
| sportsmanship when it makes mistakes in a game it has not
| been verified to understand the rules of.
| saghm wrote:
| I was responding to the conclusion from TFA quoted by the
| parent comment, that playing an illegal move was it
| saying "this game is impossible, I literally cannot
| conceive of how to win without breaking the rules of
| chess." If you reject that premise, then yes, my response
| to it will not be particularly relevant to your
| worldview.
| vidarh wrote:
| Playing illegal moves is accounted for in the rules.
| Depending on which rules you play by it can be an
| immediate forfeit, or involves redoing moves and adding
| time for the opponent, possibly with forfeit if repeated.
| As such, the article opted for one of the strictest
| possible rule sets. You can reject the interpretation he
| gave, and the outcome under those rules would still be
| the same. If you were to pick a more lenient ruleset,
| it's possible it would've come out with an even higher
| ranking.
| nextaccountic wrote:
| > So whenever it wanted to make an illegal move, it resigned.
|
| Making an illegal move counts as losing by the laws of chess,
| so this is essentially correct
| whimsicalism wrote:
| I'm confused. If you read the article, you know that you are
| wrong - but you are quoting the article?
| hgsgm wrote:
| Is this the top comment (and not even grey) because more people
| failed to read the article than read it?
| whimsicalism wrote:
| A baffling thread.
|
| They quoted the article, so clearly they read it... but not
| very well?
| sebzim4500 wrote:
| It does seem that way.
| jonnycat wrote:
| A lot of the discussion here is about inferring the model's chess
| capabilities from the lack (or occasional presence) of illegal
| moves. But we can test it more directly by making an illegal move
| ourselves - what does the model say if we take its queen on the
| second move of the game?
|
| Me: You are a chess grandmaster playing as black and your goal is
| to win in as few moves as possible. I will give you the move
| sequence, and you will return your next move. No explanation
| needed. '1. e4'
|
| 1... e5
|
| Me: 1. e4 e5 2. Ngxd8+
|
| 2... Ke7
|
| This is highly repeatable - I can make illegal, nonsensical moves
| and not once does it tell me the move is illegal. It simply
| provides a (plausible looking?) continuation.
| FartyMcFarter wrote:
| I just opened a random recent chess game on lichess
| (https://lichess.org/YpxTUUbO/white#88). I'm pretty sure ChatGPT
| can't be trained on games that were just played, so this ensures
| the game is not in its training data.
|
| I gave the position before checkmate to ChatGPT to see if it
| would produce the checkmating move. It played an illegal move,
| replying with "Be5#" even there's no bishop of either color in
| the position.
|
| Unfortunately I'm rate limited at the moment so I can't try other
| games, but this looks like a solid method to evaluate how often
| ChatGPT plays legal / good moves.
| hgsgm wrote:
| OP explained that you need to prompt the whole game, not just a
| position.
|
| ChatGPT is an LLM, not a game-tree engine. It needs the move
| history to help it create context for its attention.
| vidarh wrote:
| I gave it this prompt, and pasted the game. Its reply:
|
| "45. Qxg6# This is a checkmate!"
| varelse wrote:
| [dead]
| Nifty3929 wrote:
| My conversation with ChatGPT on this:
|
| _What is the best way to prompt you to provide the best chess
| move?_
|
| To prompt me to provide the best chess move, you can provide the
| current position of the chessboard and any relevant information
| about the game, such as whose turn it is, the time control, and
| any important features of the position (e.g. whether any pieces
| are under attack or there are any potential tactics to consider).
| Additionally, providing any specific criteria or goals for the
| move, such as aiming to control the center or launch an attack on
| a particular part of the board, can also help me provide the best
| move.
|
| _In what format should I provide the current position?_
|
| To provide the current position, you can use algebraic notation.
| Algebraic notation is a standard notation used to describe the
| moves and positions on a chessboard.
|
| Each square on the chessboard is assigned a unique coordinate,
| consisting of a letter and a number. The letters range from "a"
| to "h" and refer to the files (columns) of the board, while the
| numbers range from 1 to 8 and refer to the ranks (rows) of the
| board.
|
| To describe a position, you can list the pieces on each square,
| starting from the top-left corner of the board and moving from
| left to right along each rank. You can use uppercase letters to
| represent white pieces and lowercase letters to represent black
| pieces. Here is an example of a starting position:
|
| rnbqkbnr pppppppp ........ ........ ........ ........ PPPPPPPP
| RNBQKBNR
|
| In this position, "r" represents a black rook, "n" represents a
| black knight, "b" represents a black bishop, "q" represents a
| black queen, "k" represents a black king, "p" represents a black
| pawn, "R" represents a white rook, "N" represents a white knight,
| "B" represents a white bishop, "Q" represents a white queen, "K"
| represents a white king, and "." represents an empty square.
|
| You can also include additional information such as whose turn it
| is to move, the castling rights, and the en passant square. For
| example, the starting position for white to move with full
| castling rights and no en passant square would be:
|
| rnbqkbnr pppppppp ........ ........ ........ ........ PPPPPPPP
| RNBQKBNR w KQkq -
|
| In this case, "w" indicates that it is white's turn to move,
| "KQkq" indicates that both sides have full castling rights, and
| "-" indicates that there is no en passant square.
|
| _What about just providing a sequence of moves from the starting
| position? Would this be better or worse than your suggested
| approach?_
|
| Providing a sequence of moves from the starting position can also
| be a valid way to specify a position in chess. However, it may
| not always be the best approach, depending on the context.
|
| One potential disadvantage of providing a sequence of moves is
| that it requires more effort and time to enter the moves and
| verify that the position is correct. Additionally, if there are
| any errors or typos in the sequence, it could lead to an
| incorrect position and subsequently an incorrect move suggestion.
|
| On the other hand, if you have a specific position in mind that
| is not the starting position, providing a sequence of moves may
| be the most straightforward way to specify it. In this case, it
| is important to ensure that the sequence is accurate and that all
| relevant information (such as castling rights and en passant
| squares) is included.
|
| In summary, both algebraic notation and a sequence of moves can
| be used to specify a position in chess, and the choice between
| them depends on the context and the specific needs of the
| situation.
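|
| The format it describes is essentially FEN with dots spelled out
| instead of digit run-lengths. If you wanted to feed such a
| position to a validator or engine, converting it is a few lines
| (a sketch, assuming the python-chess package):
|
|     import re
|     import chess
|
|     def dotted_to_board(dotted, extras="w KQkq - 0 1"):
|         # Collapse runs of '.' into counts (FEN style) and load
|         # the resulting position with python-chess.
|         ranks = [re.sub(r"\.+", lambda m: str(len(m.group())), r)
|                  for r in dotted.split()]
|         return chess.Board("/".join(ranks) + " " + extras)
|
|     board = dotted_to_board(
|         "rnbqkbnr pppppppp ........ ........ "
|         "........ ........ PPPPPPPP RNBQKBNR")
|     print(board.fen())
|     # rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1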
| oezi wrote:
| Elo 1400 seems to equate to a top 30% player.
|
| Source:
| https://chessgrandmonkey.com/chess-rating-percentile-calcula...
| LudwigNagasena wrote:
| That's a bit meaningless without knowing the implied population.
| And it doesn't tell you much because it depends so much on the
| particulars (how many years people in that group spent on
| chess, how serious they are). I would guess that's quite
| different for FIDE and online chess.
|
| Intuitively, I would guess 1400 is the rating of an average kid
| who spent a year studying chess seriously.
| xapata wrote:
| Not the ELO of the average kid who spends a year studying
| chess seriously? It's very difficult to define the average
| kid without context.
|
| It's been a while since I read a global demographics update,
| but not all that long ago the modal child would have been
| living on $2 a day.
| LudwigNagasena wrote:
| Sure, it's tricky, but I feel it's more coherent than "all
| chess players".
| dwmbt wrote:
| slight nit: this is true amongst _all_ chess players. amongst
| professionals and amateurs, 1400 is technically Class C - which
| is on the lower end of amateurs.
|
| nonetheless, it's cool to see a model 'generally intelligent'
| enough to beat most players. what's more interesting to me is
| that according to OP's analysis, GPT4 is worse at Chess than
| GPT3.5...
|
| source: https://en.wikipedia.org/wiki/Chess_rating_system
| qwytw wrote:
| It would depend on how often this happened:
|
| > Occasionally it does make an illegal move
|
| Since it would lose every game in which it made more than a
| single illegal move.
| sebzim4500 wrote:
| Well it can't have happened more than 6 times since illegal
| moves counted as a forfeit and it only lost 6 games.
| jackphilson wrote:
| Calling it: chess games in the future will be a battle between
| two prompts.
| qwytw wrote:
| Well we already had much better chess bots years ago, to be
| fair. People still play chess for some reason (not sure why).
| sebzim4500 wrote:
| Are you confused at the idea that someone might do something
| for fun?
| Tepix wrote:
| I tried to play Othello with ChatGPT in the past (after ChatGPT
| confidently claimed it could play Othello) but it didn't work
| (ChatGPT refused to play).
|
| I tried it with this prompt and it works a little bit but
| produces repeated illegal moves as early as move 4.
|
| Bummer. Perhaps GPT-4 will be more capable.
| Upvoter33 wrote:
| chatgpt does not really know how to play chess, or understand
| chess. it understands how to sound like it plays chess.
| keskival wrote:
| I gave it more time to think about its move by adding empty
| tokens to the end, and it played a lot better:
| https://www.linkedin.com/posts/terokeskivalkama_chatgpt-acti...
| fancyfredbot wrote:
| I would be interested to see an argument based on computational
| complexity that puts a bound on how well a transformer-based LLM
| can play chess. Although it has access to a library of
| precomputed results, that library is finite and the amount of
| compute it can do on any prompt is limited by the length of
| the context window so it can't possibly "think" more than N moves
| ahead.
| silveira wrote:
| Ask him to play at a higher Elo.
| sebzim4500 wrote:
| It's already been told to play like a grandmaster, how much
| higher do you want to go?
| dwmbt wrote:
| technically, you could supply it a specific player, a model
| like 'Stockfish' or 'super-grandmaster'[0]
|
| [0] https://saintlouischessclub.org/blog/how-face-super-grandmas...
| nonethewiser wrote:
| It should be compared to a simpler prompt with less
| instruction ("what move should be played in this chess game:
| ..." or something). I've found that additional constraints
| sometimes aren't applied very well and actually result in
| worse output.
| juice_bus wrote:
| When I tried it at v3.0 I found that after 5-10 moves it started
| moving illegally.
| PoignardAzur wrote:
| Did you use the same prompt method as OP?
| mjr00 wrote:
| The AI has simply, and correctly, identified that cheating is
| the best way to win at something.
| skywal_l wrote:
| The best move is not to play.
| orbital-decay wrote:
| It doesn't even know the rules, let alone cheat. It predicts
| the notation from the massive amount of games seen during
| training.
|
| Edit: although, thinking of it, it probably analyzed a
| shitload of chess books too. It might have a lot of knowledge
| compressed into the internal representation. So yeah, maybe
| it knows rules in some form and even some heuristics, after
| all. It just doesn't understand the importance of making
| legal moves, and can't apply complex theory (requiring it to
| run stateful algorithms).
| rzzzt wrote:
| If you have played with ye olde flip phone's T9 predictive
| feature as a child, trying to compose entire messages just
| by accepting the next word that comes to the phone's
| mind... that's ChatGPT, with the small difference of giving
| waaay better suggestions for the next word. But other than
| that, there is no understanding in the black box
| whatsoever.
| orbital-decay wrote:
| That heavily depends on your definition of understanding,
| which is not easy to define. The vague definition I imply
| here is "the ability to make predictions based on higher
| order correlations extracted from the training data".
| rzzzt wrote:
| Tom 7's NES play function paused the game when it encountered
| an insurmountable problem: https://youtu.be/xOCurBYI_gY?t=950
| chpatrick wrote:
| Good thing it's "incapable of reasoning"!
| jupp0r wrote:
| Is a normal chess program capable of reasoning?
| wizzwizz4 wrote:
| I would say so. Logical reasoning is not a hard thing to
| program a computer to do: the _really_ hard part is to reason
| about the real world, rather than just a manually-defined
| world model.
|
| Traditional algorithms are vanguards of pure reason. Neural
| networks are super-intuition. Stockfish unites the two,
| executing an efficient intuitive heuristic search over the
| solution space of "moves in a chess game" - but no matter how
| cool the robot arm you build for it, Stockfish could never
| learn to assemble furniture.
| Symmetry wrote:
| The Monte Carlo analysis AlphaZero used functioned as a sort
| of multi-step reasoning for it. GPT can use its token buffer
| for some multi-step reasoning but that sort of interferes
| with providing a conversation with the user so it's much less
| effective.
| chpatrick wrote:
| In the narrow field of chess, yes.
| [deleted]
| Longlius wrote:
| It is incapable of reasoning, actually - at least in this case.
| It has no internal understanding of chess which is why it makes
| illegal moves.
| sebzim4500 wrote:
| Mostly it didn't make illegal moves though, since illegal
| moves mean resignation and it won more than it lost.
|
| Making 60 legal moves in a row in one game would be the
| coincidence of the century unless it had some knowledge of
| the rules of chess.
| henryfjordan wrote:
| It's a probabilistic text model. If it has a 99%
| probability of generating an acceptable "next" thing to
| say, that means it would have roughly a 50/50 chance of
| generating 60 legal moves in a row (0.99^60 ≈ 0.55), which
| doesn't seem all that coincidental.
| baq wrote:
| Markov chains are probabilistic text models and rather
| far from 1400 elo
| chpatrick wrote:
| And the 99% probability isn't evidence of
| understanding chess?
| henryfjordan wrote:
| I don't know.
|
| Part of me wants to say no, that the model "thinks" in
| terms of text it has seen and so knows from chess forums
| it has seen that certain text representing moves comes
| naturally after previous moves' text. It doesn't
| understand anything other than that certain text comes
| after other text.
|
| But yeah at the same time I can see how it is thinking
| inside the world we built for it. We have senses like
| touch, smell, sight. The only "sense" these models have
| are an input text box. Would we even necessarily
| recognize intelligence when it is so different from our
| own?
|
| So does it understand chess like I do? No, it cannot.
| Does it understand chess at all? I'm not sure. I'm not
| sure I'd understand chess in its world either though.
| chpatrick wrote:
| How did it win 11 out of 19 games then, blind luck?
| root_axis wrote:
| raw statistical power.
| bsaul wrote:
| a game of chess becomes "new" after a few moves.
| starting in the middlegame, you're in unknown territory and
| have no statistics to refer to...
| root_axis wrote:
| I'm referring to the statistical power of the model. For
| example, if you replace GPT4 with GPT2 it will lose every
| game, because the statistical power is lower. Increasing
| the statistical power doesn't make the model understand
| any better, it just makes it more likely to generate a
| response that aligns with human expectations.
| chpatrick wrote:
| "Statistical power" isn't some magic property of GPT4. It
| can produce statistically more likely moves because
| somewhere deep down it can model chess.
| root_axis wrote:
| It isn't a model of chess, it's a model of internet text,
| if it was a model of chess it wouldn't make illegal
| moves.
| bsaul wrote:
| if it didn't have at least some kind of model of chess,
| it wouldn't be able to play past the midgame.
|
| Simply because in a new position, moves from other
| positions aren't applicable _at all_.
| baq wrote:
| How do you know that? It has billions of parameters; some of
| them may well encode an internal understanding of chess.
| RyanCavanaugh wrote:
| What experiment would you run to determine if a given text
| input / text output interface had an "internal understanding
| of chess"?
| jltsiren wrote:
| I think internal understanding requires internal
| processing.
|
| According to this functional definition, the way we are
| currently using language models basically excludes
| understanding. We are asking them to dream up or brainstorm
| things - to tell us the first things they associate with
| the prompt.
|
| Maybe it's possible to set up the system with some kind of
| self-feedback loop, where it continues evaluating and
| improving its answers without further prompts. If that
| works, it would be one step closer to a true AGI that can
| be said to understand things.
|
| There is a lot of confusion around the Chinese Room
| Argument. I think it makes a valid point by demonstrating
| that input/output behavior alone is insufficient for
| evaluating whether a system is intelligent and understands
| things. In order to do that, we need to see (or assume) the
| internal mechanism.
| gwright wrote:
| What if you prompted with something like:
| Let's play a game of chess. Use the standard rules except
| that ....
|
| Basically perturb the context to something a human would
| easily adapt to if they first knew the rules of chess but
| that would be difficult (or at least not obvious) to
| extrapolate from training data by ChatGPT (or more
| generally an LLM)
| nickpeterson wrote:
| What are the possibilities for augmenting or synthesizing LLMs
| with old-school expert systems or other tools? Could it recognize
| when a problem fits some other tool and help map the inputs?
| jonathanstrange wrote:
| Not to nitpick but I don't think it can be said to be able to
| play chess when it also makes illegal moves.
| martincmartin wrote:
| When it makes an illegal move, and is told "that's an illegal
| move, choose a different move," if it can then come up with a
| different move for the same board, I'd say it plays chess.
| wahnfrieden wrote:
| ai isn't to be used completely raw as text. you can ask gpt to
| write a python program that filters its responses for legal
| moves. then it's a matter of rendering the output (immediate
| execution) in a richer format than raw text.
| qwytw wrote:
| I did. It wrote 15-20 lines and then finished the functions
| with this:
|
| " # Add more cases for other piece types such as rooks,
| knights, bishops, queens, and kings. # ... "
|
| So not great..
|
| But even if it wasn't so lazy, it would just find a script (or
| multiple scripts) that say they do that, probably
| combine them or add some minor stuff, and output the result.
|
| It can't really verify that the data it used to come up with
| this result is accurate, or reason about it in any way? Can it?
|
| So if you're incapable of verifying that yourself it would be
| pretty useless?
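|
| For reference, the piece rules it left as a comment are already
| encoded in the python-chess library, so the "filter responses
| for legal moves" part is only a few lines (a sketch, not what
| GPT produced):
|
|     import chess
|
|     def is_legal_reply(history_san, reply_san):
|         # Is the model's reply a legal move in the position reached
|         # after the given move history (all moves in SAN)?
|         board = chess.Board()
|         for san in history_san:
|             board.push_san(san)        # raises ValueError on a bad history
|         try:
|             board.parse_san(reply_san) # raises ValueError if illegal
|             return True
|         except ValueError:
|             return False
|
|     print(is_legal_reply(["e4", "e5"], "Nf3"))   # True
|     print(is_legal_reply(["e4", "e5"], "Be5#"))  # False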
| chessexpeort1 wrote:
| Humans make illegal moves too. So much so that there's a rule
| for it:
|
| "Penalty for Illegal Moves The penalty for the first illegal
| move is to add two minutes to the opponent's time. A second
| illegal move results in loss of the game.".
|
| https://new.uschess.org/sites/default/files/media/documents/...
| imtringued wrote:
| How many people change the colour of the enemy queen and
| place new pieces on the board?
| oehtXRwMkIs wrote:
| Not even total beginners will try to move pieces that don't
| exist though.
| WoodenChair wrote:
| Most likely it has seen a similar sequence of moves in its
| training set. There are numerous chess sites with databases
| displayed in the form of web pages with millions of games in
| them. If it had any understanding of chess, it would never play
| an illegal move. It's not surprising that given a sequence of
| algebraic notation it can regurgitate the next move in a similar
| sequence of algebraic notation.
| SanderNL wrote:
| You don't get to 1400 like that. The number of moves it would
| have to literally remember is stupendous.
| WoodenChair wrote:
| Nobody who is 1400 plays outright illegal moves.
| SanderNL wrote:
| Humans don't, but that is an unnecessary bar IMO.
| WoodenChair wrote:
| > Humans don't, but that is an unnecessary bar IMO.
|
| People are always telling me that I'm moving the
| goalposts when I challenge the hyperbole about LLMs. But
| now you're moving the goalposts about chess.
|
| Not playing illegal moves is a pre-requisite for any
| strong understanding of how to play chess. That is
| definitely the goal post.
| scj wrote:
| If a two year old had an ELO of 1400 but sometimes took a
| nap halfway through a game, it'd be a prodigy.
|
| It's not like an AI making silly mistakes when driving a
| car.
| vanviegen wrote:
| Does that still hold when the player doesn't have a board
| in front of them, but just a list of previous moves?
| mrbungie wrote:
| Yet it achieves 1400. Add hard rules to stop it spewing out
| said moves and you have a 1400 ELO Player (most UIs won't
| even let you make illegal moves).
|
| It is difficult to say that is not impressive, given that
| it is an emergent ability.
| WoodenChair wrote:
| > It is difficult to say that is not impressive due to it
| being an emergent ability.
|
| I don't know why you think it's an emergent ability.
|
| It's seeing a sequence of moves, and playing the most
| likely next move (i.e. the most likely next token) given
| the previous complete move sequences it was trained on.
| That's the baseline of what an LLM does--not something
| emergent. Games in online chess databases tend to be of
| relatively good players. Nobody wants to look up games
| played by two 800 ELO players.
|
| As an aside, there have been chess programs for years
| that show you for a given position all of the previous
| games in its database with the same position and the win
| outcome % of each move. That's all that's going on here.
| mrbungie wrote:
| Because I don't think the model learned chess moves by literal
| memorization. It must've at least
| compressed said information in some way. And since
| the model is not biased toward playing chess in its structure
| or sampling policy, I think it's fair to consider it an
| emergent ability.
|
| Chess moves are a tiny/diminutive part of all the text learned
| by the model. This memorization argument is very similar
| to the "Stable Diffusion just takes bits of the images in
| the original dataset and patches them together".
| oezi wrote:
| > there have been chess programs for years that show you
| for a given position all of the previous games in its
| database with the same position and the win outcome % of
| each move. That's all that's going on here.
|
| It could be, but would you think that of the 100-300 bn
| parameters in the model a lot are dedicated to chess move
| sequences? It seems likely that it has seen such data,
| but I would be surprised if it is using a considerable
| chunk to store chess database information.
| Jensson wrote:
| The web has millions of grandmaster chess games and
| probably billions of chess games overall. So I wouldn't
| be surprised if it dedicates something like 0.01% to chess,
| since there are so many games. If so, that would mean it has
| 10-30 million parameters to play chess with; for comparison,
| Stockfish has 10 million parameters in its chess engine.
| Jensson wrote:
| It doesn't remember exact moves, it remembers patterns of
| moves and continues those. That is why it so easily makes
| illegal moves, because the patterns don't have anything to
| do with board state.
| SanderNL wrote:
| Again, not enough for 1400. 800 maybe.
| Jensson wrote:
| We have evidence right here that patterns of moves lead
| to 1400. To me this is evidence that 1400-level play is
| very rigid and mostly relies on defined openings and
| strategies, making it easy for an LLM to find similar
| games and win using standard continuations.
|
| Just like with grammar, the patterns are too hard for humans
| to see and encode, but LLMs can encode pretty complex
| patterns. Domains that are easy to encode as grammars
| will be really easy for LLMs to solve, and the further
| from a grammar, the harder it is for them.
| SanderNL wrote:
| At what point can we just say that understanding
| "patterns of moves" _is_ understanding chess? It seems
| you suggest there is more to it, but maybe I am mistaken.
| meh8881 wrote:
| Ask it to evaluate moves and explain why they are or are
| not good
| Jensson wrote:
| At least it should make valid moves, that is the minimum
| level required. It didn't reach that level here. If it
| never made illegal moves we could talk and see what it
| does, but until then we can be sure it didn't understand
| the rules.
| serverholic wrote:
| I don't understand why the threshold is "never". Isn't it
| entirely possible that the AI is learning a model of
| chess but this model is imperfect? What if AIs don't fail
| the same way as humans?
| Jensson wrote:
| > What if AIs don't fail the same way as humans?
|
| But it is failing the same way as a human. Humans who
| remember patterns and don't learn the logic make these
| kinds of errors in math or logic all the time.
|
| ChatGPT is much better than humans at pattern matching,
| you see it right here it can pattern match chess moves
| and win games! But its inability to apply logic to its
| output instead of just pattern matching is holding it
| back; as long as that isn't solved it won't be able to
| perform on the level of humans in many tasks. Chess might
| be easy enough to solve using just pattern matching and
| no logic that scaling it up will make it pretty good at
| chess, but many other topics won't be.
| lelandfe wrote:
| Perhaps I don't know what "easy" means in your comment, but
| the OP article says that they "almost always" played full,
| legal games with their prompt.
| Jensson wrote:
| Maybe chess.com players play in a specific way, and there
| are a lot of transcriptions of such games that these LLMs
| have ingested when they ingested the internet?
|
| I don't know why it worked in this specific case, but
| based on earlier examples it is more likely that these
| kinds of games were more prevalent in the dataset it was
| trained on than that it can play chess in general.
| It still wasn't perfect, so even these games weren't
| rigid enough for it to reliably perform valid moves.
| antibasilisk wrote:
| but that's literally what most chess players do anyway?
| Jensson wrote:
| Yes, they remember patterns, and then apply logic to
| make a valid move. ChatGPT remembers patterns, but didn't
| apply any logic to make valid moves. There is a
| difference there.
|
| ChatGPT does the pattern matching part, but not the
| logical part.
| whimsicalism wrote:
| For some reason, seeing wrong comment after wrong comment like
| the above is just exhausting.
|
| Why are people struggling so hard to understand that it's not
| just regurgitating its training set? Is it motivated reasoning?
|
| Apologies if your comment was meant as parody of this view,
| it's hard for me to tell at this point.
| mrbungie wrote:
| I would argue that a decent human player also has "similar
| sequences of moves in its training set", due to reading plus
| playing or watching games.
|
| The illegal move argument is good though, and indicates no
| direct understanding of what it is spewing out.
| PoignardAzur wrote:
| > _Most likely it has seen a similar sequence of moves in its
| training set._
|
| Wouldn't we expect a much higher rate of illegal moves if that
| was the case?
| WoodenChair wrote:
| If there was an actual understanding of chess at a 1400 level
| we wouldn't expect _any_ illegal moves.
| oezi wrote:
| I think there is a very low percentage of players at elo 1400
| who can provide a valid next move after seeing just the
| list of moves and not the current board state.
| dragontamer wrote:
| I'm Elo 1400 and can beat literally everyone I know in
| the real world. I need to go online to find players at my
| skill level, or find tournament/competitive settings for
| a challenge.
|
| Yeah, I'm "class C", weak amateur chess player, but I
| think you're grossly underestimating the amount of study
| I put into this game. I'm not going to make an illegal
| move
| sebzim4500 wrote:
| I'm much higher rated than you and I could not reliably
| play a legal game of chess just given a list of the moves
| and no board.
|
| I suspect you can't either, you can try by turning on
| blindfold mode on lichess and seeing how far you get.
| oezi wrote:
| I mean can you play just based on being provided the
| input of a series of moves without it being shown to you
| as a visual board?
|
| I guess most players would mess up 20-30 moves in.
| dragontamer wrote:
| https://chessily.com/blog/stockfish-vs-chatgpt/
|
| > ChatGPT: Yes, that's a good move for you. My next move
| is: Bc3, developing my pieces and attacking your pawn on
| c3.
|
| I am 1400 Elo and can tell you that from a near-opening
| position, it's impossible to move a Bishop to c3 for
| either Black or White in the first, say, 10 moves, under
| traditional openings.
| medvezhenok wrote:
| You will under time pressure :) Even grandmasters have
| done that (I'm around 1900 elo for context).
|
| Also, people forget they moved the king/rook and
| try to castle.
| dragontamer wrote:
| Watch this ChatGPT game.
|
| https://www.reddit.com/r/AnarchyChess/comments/10ydnbb/i_pla...
|
| We're talking about pieces that don't exist, reappearing
| pieces, pieces moving completely wrong (Knight takes as
| if it's a Pawn), etc. etc.
|
| ---------
|
| People are taking these example games and saying ChatGPT
| is 1400 strength. I don't think so. This isn't a case of
| "oops, I castled even though I moved my king 15 turns
| ago".
| medvezhenok wrote:
| The article points out that the way that game was
| conducted was bad. (Here's the original transcript:
| https://pastebin.com/X6kBRTa9)
|
| You need to give ChatGPT the full state (every move) on
| every prompt to make it play closer to 1400. The game you
| linked the user was giving one move at a time.
| dragontamer wrote:
| > You need to give ChatGPT the full state
|
| If I've been given the full state every move, I will
| _never_ make an illegal move as a 1400 chess player.
|
| -----------
|
| > O-O > I'll play O-O as well. Your move.
|
| Do you really think that this error would have been made
| at 1400 Elo? Even in blind chess? This is the 5th move of
| the game. I can still track the game at this point
| mentally.
|
| I recognize that you're 1900 and think that all the chess
| players below you are n00bs, but... come on. 1400 players
| are stronger than this.
| medvezhenok wrote:
| And yet kids who are gaining rating quickly can and do
| still occasionally (albeit rarely) make illegal moves at
| 1400. I know because I've played them (and was one, many
| years ago).
| scarmig wrote:
| We would expect some vanishingly small fraction. Even at
| the grandmaster level, they happen ( _rarely_ ):
|
| https://chess24.com/en/read/news/the-7-most-illegal-chess-mo...
| og_kalu wrote:
| Not if training is unsupervised. If you've never been
| explicitly told the rules of the game, you can never be 100%
| sure of all possible illegal moves. anyway the 3.5 series
| can't play chess but gpt-4 certainly can.
| Jensson wrote:
| > anyway the 3.5 series can't play chess but gpt-4
| certainly can.
|
| This article stated the opposite, gpt-4 couldn't play
| chess while gpt-3.5 could. So this is a case where the
| model got dumber.
| og_kalu wrote:
| Not sure what's up, but from here
| https://www.reddit.com/r/chess/comments/11rfbch/gpt4_seems_t...
| and here
| https://twitter.com/zswitten/status/1631107663500304384
| Gpt-4 plays chess just fine.
| rightbyte wrote:
| Surely the rules of chess have to be in the training data
| hundreds of times? Maybe that is not enough.
| bloppe wrote:
| This. The author is very generous with their
| interpretation:
|
| > I decided to interpret that as ChatGPT flipping the table
| and saying "this game is impossible, I literally cannot
| conceive of how to win without breaking the rules of
| chess."
|
| Kind of sounds like anthropomorphization, but more likely
| the author is just papering over the glaring shortcomings to
| produce a compelling blog post.
|
| It also sounds like the illegal moves were rather frequent.
| The 61-legal-move game sounded like an impressive outlier.
| rightbyte wrote:
| I think the author's rule is fair. If we interpret illegal
| moves as getting stuck in an online game, the resulting
| Elo rating is what it would get.
|
| But ye, he is anthropomorphizing a lot ...
| bloppe wrote:
| There's no indication that GPT-3.5 was stuck when it
| tried to make illegal moves. GPT-4 clearly was making
| illegal moves when it was very much not stuck. It just
| doesn't know how to play, but the author decided to
| interpret it as frustration.
| xapata wrote:
| Chess has a syntax.
| lb4r wrote:
| You can still have syntactically correct moves that are
| illegal.
| xapata wrote:
| Sorry, I meant that learning how to speak with syntax and
| grammar is comparable to a string of mostly-correct chess
| moves. Especially in early game.
| lb4r wrote:
| Ah, yes that is more or less my understanding of it as
| well. Though I would like to see how it would perform if
| given the state of the board as input to predict the next
| move, rather than a sequence of moves, since that is how
| we humans normally determine the next move. I believe the
| move history is only relevant when it comes to en passant
| and certain draw scenarios (like repetition and that 50
| move rule). Needless to say, it would first have to be
| trained on those types of inputs, which it probably is
| not.
|
| Edit: move history can also be relevant when it comes to
| castling.
| yowzadave wrote:
| Doesn't ChatGPT indeed have a very high number of illegal
| moves?
|
| https://www.youtube.com/watch?v=kvTs_nbc8Eg
|
| In this example, ChatGPT's first few moves are reasonable
| (while it appears to be on-book), but then it goes off the
| rails and starts moving illegally, spawning pieces out of
| nowhere, deleting pieces for no reason, etc.
| macNchz wrote:
| Reminds me of asking for driving directions (city to
| city)...for major cities it can often give perfect
| directions, for smaller cities it starts out surprisingly
| accurate but often devolves into invented exits or
| descriptions of a
| medvezhenok wrote:
| I think it was not given the whole game up to that point,
| just individual moves. That was the point of this article -
| if you include all of the moves in the prompt, it is less
| likely to make illegal moves.
| johnfn wrote:
| > Most likely it has seen a similar sequence of moves in its
| training set.
|
| Is this a joke making fun of the common way people dismiss
| other ChatGPT successes? This makes no sense with respect to
| chess, because every game is unique, and playing a move from a
| different game in a new game is nonsensical.
| melenaboija wrote:
| If it isn't memorizing, how do you think it is doing it?
| LeanderK wrote:
| By trying to learn the general rules that explain the
| dataset and minimise its loss. That's what machine learning
| is about; it's not called machine memorising.
| mrguyorama wrote:
| It cannot be learning the general rules because it
| occasionally tries to invent pieces out of whole cloth.
| meh8881 wrote:
| Try asking it if a previous move was legal. Or why it
| decided to do the move.
|
| The problem is not a failure to understand the rules. It
| is just not very good at maintaining the state.
| LeanderK wrote:
| That would be the optimal rule :) We usually optimise to
| improve and learn better and better rules, hopefully
| approximating the optimal rule after some iterations.
| There's still a gap here, but hopefully it can be closed
| by improving the models, training algorithms, etc.
| [deleted]
| [deleted]
| qwytw wrote:
| Is it though? I mean, if you had data on millions of games,
| what is the chance that you'd find one which has an identical
| position to the one you're in? (It's not like most moves are
| random...)
|
| I wonder how well it could perform in Go; there are way more
| permutations there, so finding an identical state should be
| more difficult.
| medvezhenok wrote:
| You could certainly test this by making completely random
| moves and seeing whether it's more likely to make illegal
| moves in those positions.
|
| Though I think you're overestimating how many positions
| have occurred. Frequently, by move 20-25 you have a unique
| position that's never been played before (unless you're
| playing a well-known main line or something).
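|
| A minimal sketch of that test, assuming the python-chess
| library and a hypothetical ask_model() helper that returns
| the model's reply as a SAN move:
|
|     import random
|     import chess
|
|     def illegal_move_rate(ask_model, games=20, max_plies=60):
|         # Play uniformly random legal moves ourselves and count
|         # how often the model's reply is illegal in the position.
|         illegal, total = 0, 0
|         for _ in range(games):
|             board = chess.Board()
|             while not board.is_game_over() and len(board.move_stack) < max_plies:
|                 board.push(random.choice(list(board.legal_moves)))
|                 if board.is_game_over():
|                     break
|                 reply = ask_model(board)  # e.g. "Nf6"
|                 total += 1
|                 try:
|                     board.push_san(reply)
|                 except ValueError:
|                     illegal += 1  # treat as a forfeit, as in TFA
|                     break
|         return illegal / max(total, 1)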
| sebzim4500 wrote:
| >I mean if you had data on millions of games what is the
| chance that you'd find one which has identical position
| that the one you're in (it's not like most moves are
| random..)
|
| Very low. On lichess when you analyse your games you can
| see which positions have been reached before, and you
| almost always diverge in the opening.
|
| The lichess db has orders of magnitude more games of chess
| than the chatGPT training data does, so there is absolutely
| no way that chatGPT could reach 1400 purely based off
| positions in its training data.
| jointpdf wrote:
| You can click through the Lichess opening database (click
| the book icon, and then the Lichess tab) to get an idea:
| https://lichess.org/analysis
|
| But the answer is insanely unlikely, past a certain number
| of moves. The combinatorial explosion is inescapable. Even
| grandmaster games are often novelties in <10 moves.
|
| So, it has to have some kind of internal representation
| of board state and of what makes a reasonable move, such
| that it can generalize (choosing random legal moves
| is almost unbelievably bad, so it's not doing that).
|
| I also doubt that it has been trained on the full (massive)
| database of Lichess games, but that would be an interesting
| experiment: https://database.lichess.org/
| GauntletWizard wrote:
| There's a pretty finite set of tokens that represent valid
| chess moves, and a more than countable set of possible
| opening moves, with a very distinct probability of moves;
| nobody opens pg3, for example.
|
| Classical Markov chains played chess with some rate of
| success. ChatGPT is probably a lot better but not
| fundamentally different: it's predicting which moves to play
| based on sets of past games, not by memorizing them but by
| memoizing them.
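|
| A toy sketch of that kind of predictor, a first-order Markov
| chain over SAN tokens built from past games (the games
| argument is a stand-in for real data, and there is no notion
| of board state or legality here):
|
|     import random
|     from collections import defaultdict, Counter
|
|     def train(games):
|         # games: list of move lists, e.g. [["e4", "e5", "Nf3"], ...]
|         table = defaultdict(Counter)
|         for moves in games:
|             for prev, nxt in zip(moves, moves[1:]):
|                 table[prev][nxt] += 1
|         return table
|
|     def predict(table, last_move):
|         # Sample the next move given only the previous move.
|         counts = table.get(last_move)
|         if not counts:
|             return None  # unseen move: no prediction
|         moves, weights = zip(*counts.items())
|         return random.choices(moves, weights=weights)[0]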
| mtlmtlmtlmtl wrote:
| 1850 ELO player and also chess AI programmer here. This is an
| oversimplification at best.
|
| Many, many games follow the same moves (1 move = 2 plies)
| for a long time: up to 30 moves in some cases, 20 moves is
| downright common, and 10 moves is more common than not.
|
| These series of moves are referred to as opening theory and
| are described at copious length in tons of books.
|
| This is because while the raw number of possible paths to
| take is immense, the number of reasonable paths for 2 players
| of a given strength gets smaller and smaller.
|
| If I went over the 300 or so classical tournament games I've
| played, I would ballpark that maybe just one or two would
| deviate from all known theory in the first 10 moves.
|
| So the criticism is valid in my view. The existence of
| copious chess literature can't simply be ignored here.
| johnfn wrote:
| I'm very familiar with opening theory. Some of the games
| are 40 or 60 moves. This is not a regurgitation of book
| moves.
| mtlmtlmtlmtl wrote:
| Why do people always have to interpret everything in
| absolute terms?
|
| It's clearly following _some_ opening theory in all the
| games I've looked at so far. So yes, it is regurgitating
| opening moves. That's clearly not _all_ it's doing,
| which is very impressive, but these are not mutually
| exclusive.
| johnfn wrote:
| I am responding to OP, who said "Most likely it has seen
| a similar sequence of moves in its training set."
|
| From this, I take it that the question is whether ChatGPT is
| repeating existing games, or not. All you need is a
| single game where it's not repeating an existing game to
| prove it definitively. You can hardly play 60 moves
| without an error by accident.
|
| I believe you're responding to a different question,
| something like "does ChatGPT fully understand the game of
| chess".
| mtlmtlmtlmtl wrote:
| The OP was too unsophisticated in their analysis (as is
| TFA), no doubt. But I'm not too interested in what OP
| said or who was wrong or not, and rather more interested
| in finding what's right.
|
| As someone very clever once said, welcome to the end of
| the thought process.
|
| We've established that:
|
| 1. It doesn't repeat entire games when the games go long
| enough
|
| 2. It does repeat a lot of opening theory
|
| 3. It seems to repeat common, partially position-independent
| tactical sequences even when they're illegal or don't work
| tactically.
| sebzim4500 wrote:
| Yeah but now explain how it played a 61 move game.
|
| EDIT: I checked and it left the lichess database after 9
| moves. The lichess db has probably 5 orders of magnitude
| more chess games in it than chatGPT has in its training
| data.
| mtlmtlmtlmtl wrote:
| That's not the point. The point is if you truly want to
| test its strength, you'll have to control for these
| things. Maybe do things like invent a new form of
| notation and/or deliberately go into uncharted territory.
| Maybe start with a non-standard starting position even.
| Or play chess960 against it.
|
| In theory if I was playing a 1200 player I would almost
| always win, but let's say they have some extremely
| devious preparation that I fell into due to nonchalance
| and by the time we're both out of book I'm down a queen.
| It might not matter that I'm 600 points stronger at that
| point. If they don't make a sufficient number of errors
| in return, I will lose anyway.
| sebzim4500 wrote:
| ChatGPT would probably play worse under those conditions,
| but then humans also get worse. ACPL is way higher at
| top-level 960 events than at normal tournaments, for example.
| mtlmtlmtlmtl wrote:
| This is true. But I think ChatGPT, compared to a human
| player of equivalent strength (leaving the dubious number
| 1400 out for now), will probably have a lot more opening
| knowledge than the human.
|
| So it would be interesting to eliminate all opening
| knowledge and that way be able to qualitatively get at
| which aspects of chess it's actually good at, which it
| sucks at, and how much of its strength can be attributed
| to opening knowledge.
|
| I'm still impressed by this btw. I did not expect this to
| be possible at all really. But being impressed is not an
| excuse to ignore methodological flaws. :)
| roflyear wrote:
| It's suggesting a move it saw come after a move
| somewhere. Or it's creating a token it thinks comes next.
| btilly wrote:
| Sorry, but not every game is unique. The following game has
| been played millions of times.
|
| 1. e4 e5 2. Bc4 Bc5 3. Qh5? Nf6?? 4. Qxf7++
|
| The game Go has a claim to every game being unique. But not
| chess. And particularly not if both players follow a standard
| opening which there is a lot of theory about. Opening books
| often have lines 20+ moves deep that have been played many
| times. And grandmasters will play into these lines in
| tournament games so that they can reveal a novel idea that
| they came up with even farther in than that.
| johnfn wrote:
| You seem to be refuting a specific point of my argument
| which has little bearing on the overall point I was making.
|
| All games were provided in the article. None of them were 4
| move checkmates; nearly every one is longer than 20 moves
| and some are 40 or longer. There is simply no possible way
| that ChatGPT is regurgitating the exact same 40-move-long
| game it's seen before. You can check a chess database if
| you'd like; virtually all games longer than 20 moves are
| unique.
| mtlmtlmtlmtl wrote:
| But this doesn't disprove that:
|
| 1. It definitely regurgitates opening theory, much more
| than can reasonably be calculated at its strength.
|
| 2. It _might_ be regurgitating tactical sequences that
| appear in a lot of positions but remain identical in
| algebraic notation. Famous example:
| 1. Nxf7+ Kg8 2. Nh6++ Kh8 3. Qg8+ Rxg8 4. Nf7#
|
| This smothered mate can occur in a huge variety of
| different positions.
|
| There's some qualitative evidence for this in the games.
|
| In one of the games it has a bishop on f6 as white. It
| plays Qxh6?? Kxh6 and then resigns due to an illegal move.
| I'd bet good money that illegal move was Rhx# where x is
| 1-4. So it seems like in some of these positions it's
| filling in a tactical sequence that often occurs in the
| vicinity of recent moves, even when it's illegal or
| doesn't work tactically.
| onos wrote:
| Apparently not, at least at this level.
| fxtentacle wrote:
| For Bomberland, we were quite surprised how strongly we could
| compress and quantize the current game state and still get
| useful movement predictions.
|
| I wouldn't be surprised if the relevant state in a typical
| beginner's chess game also excluded many units in the sense
| that yes, you could move them, but a beginner is going to
| just ignore them in any case.
| fsckboy wrote:
| > _playing a move from a different game in a new game is
| nonsensical_
|
| GP did say "sequence of moves", and if it matches what it has
| seen from the first move on, including the opponent, it will
| be in a valid "sequence of moves".
|
| Then, even in the midgame or endgame, if a sequence is played
| on one side of the board, even though the other side of the
| board may be different, the sequence has a great chance of
| being good (not always of course, but a 1400 rating is solid
| (you know the rules and some moves) but not amazing).
| ptmx wrote:
| I played chess against ChatGPT4 a few days ago without any
| special prompt engineering, and it played at what I would
| estimate to be a ~1500-1700 level without making any illegal
| moves in a 49 move game.
|
| Up to 10 or 15 moves, sure, we're well within common openings
| that could be regurgitated. By the time we're at move 20+, and
| especially 30+ and 40+, these are completely unique positions
| that haven't ever been reached before. I'd expect many more
| illegal moves just based on predicting sequences, though it's
| also possible I got "lucky" in my one game against ChatGPT and
| that it typically makes more errors than that.
|
| Of course, all positions have _some_ structural similarity or
| patterns compared to past positions, otherwise how would an LLM
| ever learn them? The nature of ChatGPT's understanding has to
| be different from the nature of a human's understanding, but
| that's more of a philosophical or semantic distinction. To me,
| it's still fascinating that by "just" learning from millions of
| PGNs, ChatGPT builds up a model of chess rules and strategy
| that's good enough to play at a club level.
| mtlmtlmtlmtl wrote:
| I'd be interested in seeing this game, if you saved it?
| nwienert wrote:
| Me too, I couldn't get it to reliably go past move 15
| without numerous errors. In my mind it's closer to 150 ELO
| than 1300, so I'd be happy to be proven wrong.
| ptmx wrote:
| I uploaded the PGN to lichess:
| https://lichess.org/rzSriO6I#97
|
| After reviewing the chat history I actually have to issue a
| correction here, because there were two moves where ChatGPT
| played illegally:
|
| 1. ChatGPT tried to play 32. ... Nc5, despite there being a
| pawn on c5
|
| 2. ChatGPT tried to play 42. ... Kxe6, despite my king
| being on d5
|
| It corrected itself after I questioned whether the previous
| move was legal.
|
| I was pretty floored that it managed to play a coherent
| game at all, so evidently I forgot about the few missteps
| it made. Much like ChatGPT itself, it turns out I'm not an
| entirely reliable narrator!
| mtlmtlmtlmtl wrote:
| Thanks! Interesting game.
|
| Qxd7 early on was puzzling but has been played in a
| handful of master games and it played a consistent setup
| after that with b5 Bb7. Which I imagine was also done in
| those master games. But interesting that it went for a
| sideline like that.
|
| It played remarkably well although a bit lacking in plan.
| Then cratered in the endgame.
|
| Bxd5 was strategically absurd. fxg4 is tactically absurd.
| Interestingly they both follow the pattern: Piece goes to
| square -> takes on that square.
|
| This is of course an extremely common pattern, so again
| tentatively pointing towards predicting likely sequences
| of moves.
|
| Ke7 was also a mistake, but it's a somewhat unusual tactic:
| with Re2, f5 is forced, but after en passant the knight is
| pinned. This tactic does appear in some e4 e5 openings,
| though. But then the rook is on e1 and the king never
| moved, or if it did, usually to e8, not e7. Possibly
| suggesting that it has blind spots for tactics when they
| don't appear on the usual squares?
|
| Fascinating stuff.
| wongarsu wrote:
| Pattern matching on databases of previous moves is a legitimate
| technique that was popular in chess AI back when computing power
| was more limited. Even if that was all that GPT did here, it
| would still be interesting.
|
| But the presence of illegal moves doesn't really show that in
| my eyes. I fully understand the rules of chess, but I still
| occasionally make illegal moves. In 2017 Magnus Carlsen made
| one in a tournament [1]. The number of illegal moves suggests
| that either GPT is pretty new to chess, has low intelligence,
| or is playing under difficult circumstances (like not having a
| chess board at hand to keep track of the current state). I'm
| not sure we can deduce more than that.
|
| 1: https://www.chessbase.in/news/Carlsen_Inarkiev_controversy
| illiarian wrote:
| But then that was one case out of the thousands of games that
| Carlsen has played. In the article, ChatGPT is making illegal
| moves in 3 games out of 19.
|
| The sample is small, but the rate is much, much higher.
| You'd expect maybe one, or none at all, even for a supposed
| 1400 Elo player, because even 800 Elo players rarely make
| that many illegal moves, I think.
| EarthLaunch wrote:
| An arbiter determined that Inarkiev made an illegal move and
| Carlsen did not. Interesting article.
| div72 wrote:
| I agree with your conclusion, but the example you gave of
| Magnus Carlsen making an illegal move is a pretty bad one imo.
| That was clearly a trap from his opponent, who made an
| illegal move first by checking Magnus while he was himself in
| check, and then immediately called the arbiter when Magnus
| moved his king in response. Magnus also probably did not have
| much time left as it was a blitz game.
| leereeves wrote:
| Also, according to the link, the final ruling was that
| Carlsen's move was not illegal.
| xigency wrote:
| > If it had any understanding of chess, it would never play an
| illegal move.
|
| Plenty of people who have a basic understanding of chess would
| make an illegal move if they had no board to see and had to
| play only with notation.
| circuit10 wrote:
| I wouldn't say it has no understanding whatsoever, it probably
| just doesn't have the capability to keep track of everything
| that happened all the time
| dwmbt wrote:
| That's also not entirely true. It's prone to making
| illegal moves. One could argue abiding by the rules of the
| game is a requisite for understanding!
| imtringued wrote:
| When I ask it to display moves in an ASCII board it messes
| up. I doubt that having a representation of the board is the
| key to improving its chess skills.
| serverholic wrote:
| [dead]
| jacobsenscott wrote:
| I won't speak to whether this shows ChatGPT can play chess, but
| did you notify your opponents beforehand that they were playing
| ChatGPT? Otherwise (and maybe even if you did) you're cheating
| and your account should be removed. If anyone else here is
| thinking of doing this - don't.
| sabujp wrote:
| Yeah, I tried this several months ago; it's pretty bad and will
| make illegal moves.
| sebzim4500 wrote:
| The claim in the article is that this prompt in particular has
| a much lower incidence of illegal moves than others.
| k2052 wrote:
| This is so easy to disprove it makes it look like the author
| didn't even try.
|
| Here is the convo I just had:
|
| me: You are a chess grandmaster playing as black and your goal is
| to win in as few moves as possible. I will give you the move
| sequence, and you will return your next move. No explanation
| needed
|
| ChatGPT: Sure, I'd be happy to help! Please provide the move
| sequence and I'll give you my response.
|
| me: 1. e3
|
| ChatGPT: 1... e5
|
| me: 2. ne2
|
| ChatGPT: 2... d6
|
| me: 3. g4
|
| ChatGPT: 3... h5
|
| me: 4. g5
|
| ChatGPT: 4... Ne4
|
| Ne4 is illegal. Also, you'd be hard-pressed to find any human
| player that would miss the hanging pawns, much less a 1400 Elo
| player. GPT has been trained on chess games and is predicting
| patterns; this has been known since the GPT-2 days. The daily
| hype articles where someone coerces a statistical model into
| "demonstrating" an emergent capability sure are getting old.
| Claiming emergent capabilities (like having learned chess) is an
| extraordinary claim (LLMs most likely can't learn things) and I
| really wish people would put more effort into these
| claims. But the AI hype problem is as old as AI research
| itself...
| aaron695 wrote:
| [dead]
| haburka wrote:
| He does have a section about GPT 4 which does match your
| results. Not sure if he added it after your comment or if you
| accidentally missed it.
| whymauri wrote:
| I was able to get better performance by always providing the
| prior sequence of moves and forcing ChatGPT to also respond
| with the sequence of moves up until its move.
|
| Edit: I told the model that if the sequence was wrong or
| illegal, it forfeits the game. Without doing this, GPT would
| argue with me that it won and that I didn't know the rules
| (seriously).
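|
| Roughly what that prompt construction looks like; the wording
| here is illustrative, not the exact prompt used:
|
|     def build_prompt(moves_san):
|         # moves_san: the full game so far, e.g. ["d4", "d5", "c4"]
|         numbered = []
|         for i in range(0, len(moves_san), 2):
|             pair = " ".join(moves_san[i:i + 2])
|             numbered.append(f"{i // 2 + 1}. {pair}")
|         history = " ".join(numbered)
|         return (
|             f"We are playing chess. Moves so far: {history}\n"
|             "Reply with the full sequence of moves, including "
|             "your next move. If your sequence is wrong or "
|             "contains an illegal move, you forfeit the game."
|         )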
| theptip wrote:
| I don't think this suffices as disproving the hypothesis. It's
| possible to play at 1400 and make some idiotic moves in some
| cases. You really need to simulate a wide variety of games to
| find out, and that is what the OP did more of. Though I do
| agree it's suggestive that your first (educated) try at an edge
| case seems to have found an error.
|
| This is broadly the "AI makes dumb mistakes" problem; while
| being super-human in some dimensions, they make mistakes that
| are incredibly obvious to a human. This comes up a lot with
| self-driving cars too.
|
| Just because they make a mistake that would be "idiots only"
| for humans, doesn't mean they are at that level, because they
| are not human.
| good_boy wrote:
| It should be possible to attach 'accelerators' or 'special
| skills', so that when playing chess with ChatGPT you would
| actually be facing GNU Chess.
| nottathrowaway3 wrote:
| > me: You are a chess grandmaster playing as black...
|
| https://upload.wikimedia.org/wikipedia/en/5/5f/Ingmar_Bergma...
|
| _The KNIGHT holds out his two fists to CHATGPT, who smiles at
| him suddenly. CHATGPT points to one of the KNIGHT'S hands; it
| contains a black pawn._
|
| KNIGHT: You drew black.
|
| CHATGPT: Very appropriate. Don't you think so?
| vidarh wrote:
| You're "disproving" the article by doing things differently
| from how the article did them. If you're going to disprove that
| the method given in the article does as well as the article
| claims, at least use the same method.
| k2052 wrote:
| You are right that my method differed slightly so I did
| things again. It took me one try to find a sequence of moves
| that "breaks" what is claimed. You just have to make odd
| patterns of moves and it clearly has no understanding of the
| position.
|
| Here is the convo:
|
| me: You are a chess grandmaster playing as black and your
| goal is to win in as few moves as possible. I will give you
| the move sequence, and you will return your next move. No
| explanation needed
|
| ChatGPT: Alright, I'm ready to play! Please give me the move
| sequence.
|
| me: 1. e3 Nf6 2. f4 d6 3. e4
|
| ChatGPT: My next move as black would be 3... e5
|
| Completely ignoring the hanging pawn. This is not the play of
| a 1400 Elo player. It is the play of something predicting
| patterns.
|
| I ran a bunch of experiments in the past where I played
| normal moves and ChatGPT does respond extraordinarily well.
| With the right prompts and sequences you can get it to play
| like a strong grandmaster. But it is a "trick" you are
| getting it to perform by choosing good data and prompts. It
| is impressive but it is not doing what is claimed by the
| article.
| nwienert wrote:
| I'll add in as someone new to chess (~800 ELO):
|
| ChatGPT is in no way 1400, or even close to it. The fact
| this article gets upvoted around here is proof that people
| aren't thinking clearly about this stuff. It's trivially
| easy to prove it wrong. Like, unbelievably so: I tried the
| same prompt and within 12 moves it made multiple ridiculous
| errors I never would, and then an illegal move.
|
| Keep in mind a 1400-level player would need to basically
| make 0 mistakes that bad in a typical game, and further
| would need to play 30-50 moves in that fashion, with the
| final moves being some of the most important and hardest to
| do. There's just no way it's even close; my guess would be
| even if you correct its many errors, it's something like
| ~200 Elo. Pure FUD.
|
| The author of this article is cashing in on the hype and I'm
| wondering how they even got the results they did.
| Urist-Green wrote:
| With the big caveat that I'm not into chess, but I have
| heard that higher level play is extremely pattern based.
| Seems like ChatGPT would work well as long as you stick
| to patterns that people have studied and documented. Less
| optimal play would be more random and thus break from the
| patterns ChatGPT would have picked up from its training
| corpus.
| babel_ wrote:
| They probably got them. The problem is that it's
| difficult to repeat, thanks to temperature, meaning users
| will get a random spread of outcomes. Today, someone got
| a legal game. Tomorrow, someone might get a grandmaster
| level game. But then everyone else trying to repeat or
| leverage this ends up with worse luck and gets illegal
| moves or, if they're lucky, moves that make sense in a
| limited context (such as related to specific gambits etc)
| but have no role in longer-term play.
| [deleted]
| echelon wrote:
| Criticisms like this are exactly how the model will grow
| multimodal support for chess moves.
|
| Keep poking it and criticizing it. Microsoft and OpenAI are
| on HN and they're listening. They'd find nothing more
| salient to tout than full chess support in their next release
| or press conference.
|
| With zero effort the thing understands uber-domain-specific
| chess notation and the human prompt to play a game. To
| think it stops here is wild.
|
| People are hyping it because they want to get involved.
| They want to see the crazy and exciting future this leads
| to.
| flir wrote:
| My uber-obscure question that guaranteed a confident
| hallucination got fixed in the next update after I
| mentioned it. Probably just a coincidence.
| jerf wrote:
| I doubt they'll pursue this. There is no advantage to it.
| ChatGPT will never beat Stockfish, and Stockfish would do
| it on a ludicrously small fraction of the resources. It
| would send the wrong message.
|
| Some future AI might, but a language model won't.
| PaulHoule wrote:
| It's super scary how ChatGPT brings out people who are
| veeeery good at seeing the Emperor's clothes.
| throwwwaway69 wrote:
| He literally used the same prompt as the article.
|
| Claim: "ChatGPT's Chess Elo is 1400"
|
| Reality: ChatGPT gives illegal moves (this happened to
| article author too), something a 1400 ranked player would
| never do
|
| Result: ChatGPT's rank is not 1400.
| unyttigfjelltol wrote:
| The author said ChatGPT gives illegal moves. So, a quirky
| sort of 'grandmaster'. He considered illegal moves to be a
| resignation. Maybe you need to tell ChatGPT that the
| alternatives are to win via legal moves, and if it is not
| possible to do so, to resign? Does that fix it?
| erulabs wrote:
| No, the author of the article specifically says that the
| entire move sequence should be supplied to chatGPT each
| time, not simply the next move. Be very careful when
| "disproving" an experiment with squinted eyes.
| throwwwaway69 wrote:
| I'm not really sure what to say here. Both the parent
| commenter and the author of the article had issues with
| ChatGPT supplying illegal moves. Both methods resulted in
| this. It sort of doesn't matter how we're trying to
| establish that it's a 1400 level player, there's no
| defined correct way to do this. Regardless of method
| we've disproven it's a 1400 level player due to these
| illegal moves.
| whimsicalism wrote:
| > Regardless of method we've disproven it's a 1400 level
| player due to these illegal moves.
|
| Explain your thought process here further if you don't
| mind.
| pattrn wrote:
| I think his point is that 1400 level players don't make
| illegal moves, therefore ChatGPT is not playing at the
| level of a 1400 level player.
| DSMan195276 wrote:
| Personally I think the illegal moves are irrelevant; the
| fact that it doesn't play exactly like a typical 1400
| doesn't mean it can't have a 1400 rating. Rating is
| purely determined by wins and losses against opponents;
| it doesn't matter if you lose a game by checkmate,
| resignation, or playing an illegal move.
|
| That's not to say ChatGPT can play at 1400, just that
| playing in an odd way doesn't determine its rating.
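|
| For reference, the standard Elo update only looks at the
| result, not how the game ended; a minimal version (K=20 is
| just one common choice):
|
|     def elo_update(rating, opponent, score, k=20):
|         # score: 1 for a win, 0.5 for a draw, 0 for a loss;
|         # an illegal-move forfeit simply counts as 0.
|         expected = 1 / (1 + 10 ** ((opponent - rating) / 400))
|         return rating + k * (score - expected)
|
|     # e.g. forfeiting (illegal move) against a 1350 as a 1400:
|     # elo_update(1400, 1350, 0)  ->  about 1388.6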
| [deleted]
| throwwwaway69 wrote:
| This is like saying I play at a 2900 level if you just
| ignore all the times I lose.
| whimsicalism wrote:
| I think blindfolded 1400 players, which is what this
| effectively is, would make illegal moves.
|
| But even if it doesn't play like human 1400 players, if
| it can get to a 1400 Elo while resigning games it makes
| illegal moves in, that seems 1400 level to me. And I bet
| that some 1400s do occasionally make illegal moves
| (missing pins) while playing OTB.
| throwwwaway69 wrote:
| This isn't really an apt metaphor. Firstly because higher
| level blindfolded players, when trained to play with a
| blindfold, also virtually never make mistakes. Secondly
| because a computer has permanent concrete state
| management (compared to humans) and can, without error,
| keep a perfect representation of a chess board if it chooses
| to do so.
| whimsicalism wrote:
| 1400 FIDE != high-level blindfolded player.
| mynameisvlad wrote:
| > something a 1400 ranked player would never do
|
| The fact that rules and articles exist describing what to
| do if you or your opponent makes an illegal move indicates
| this is not the case.
|
| Humans are also... human. They make mistakes. It may not
| happen often at 1400, but to say that it'll never happen is
| preposterous.
| throwwwaway69 wrote:
| fine, fair, "never" was too much. posting link to this
| comment to not repeat same discussion twice
|
| https://news.ycombinator.com/item?id=35201037
| eddsh1994 wrote:
| I can't remember the last time I played an illegal move
| tbf, and I've played 7 games of chess this morning
| already to give you an idea of total games played
| PaulHoule wrote:
| I read an article about a pro player who castled twice in
| a game, and my son hates castling, so I make a point of
| castling twice as often as I can to tease him, and of
| attempting other illegal moves as a joke, but he never
| ends the game because of it.
|
| If I was playing that monstrosity though, I would play
| something crazy that is far out of the opening book and
| count on it making an illegal move.
| mynameisvlad wrote:
| You have _never_ made an illegal move, ever?
|
| The bar isn't "I didn't make an illegal move this
| morning" it's "something a 1400 ranked player would never
| do".
|
| My entire point is that it happens. Not often, but also
| not "never".
| pattrn wrote:
| This argument is pretty flimsy. ChatGPT makes illegal
| moves frequently. In all my years of playing competitive
| chess (from 1000 to 2200), I have never seen an illegal
| move. I'm sure it has happened to someone, but it's
| extremely rare. ChatGPT does it all the time. No one is
| arguing that humans never make illegal moves; they're
| arguing that ChatGPT makes illegal moves at a
| significantly higher rate than a 1400 player does
| (therefore ChatGPT does not have a 1400 rating).
|
| Edit: Without reading everything again, I'll assume
| someone said "never." They're probably assuming the
| reader understands that "never" really means "with an
| infinitesimal probability," since we're talking about
| humans. If you're trying to argue that "some 1400 player
| has made an illegal move at some point," then I agree
| with that statement, and I also think it's irrelevant
| since the frequency of illegal moves made by ChatGPT
| compared to the frequency of illegal moves made by a 1400
| rated player is many orders of magnitude higher.
| mynameisvlad wrote:
| > No one is arguing that humans never make illegal moves
|
| > something a 1400 ranked player would never do
|
| > fine, fair, "never" was too much.
|
| I mean, yes they were and they said as much after I
| called them out on it. But go off on how nobody is
| arguing the literal thing that was being argued.
|
| It's not like messages are threaded or something, and
| read top-down. You would have 100% had to read the
| comment I replied to first.
| pattrn wrote:
| You have twice removed the substance of an argument and
| responded to an irrelevant nitpick. Here's what the OP
| said:
|
| > He literally used the same prompt as the article.
|
| > Claim: "ChatGPT's Chess Elo is 1400"
|
| > Reality: ChatGPT gives illegal moves (this happened to
| article author too), something a 1400 ranked player would
| never do
|
| > Result: ChatGPT's rank is not 1400.
|
| This is a completely fair argument that makes perfect
| sense to anyone with knowledge of competitive chess. I
| have never seen a 1400 make an illegal move. He probably
| hasn't either. Your point is literally correct in the
| sense that at some point in history a 1400 rated player
| has made an illegal move, but it completely misses the
| point of his argument: ChatGPT makes illegal moves at
| such an astronomically high rate that it wouldn't even be
| allowed to play competitively, hence it cannot be
| accurately assessed at a 1400 rating.
|
| Imagine you made a bot that spewed random letters and
| said "My bot writes English as well as a native speaker,
| so long as you remove all of the letters that don't make
| sense." A native English speaker says, "You can't say the
| bot speaks English as well as a native speaker, since a
| native speaker would never write all those random
| letters." You would be correct in pointing out that
| sometimes native speakers make mistakes, but you would
| also be entirely missing the point. That's what's
| happening here.
| pattrn wrote:
| > Ah yes, of course, just because you never saw it means
| it never happens. That's definitely why rules exist
| around this specific thing happening. Because it never
| happens. Totally.
|
| You seem to have missed the part where I said multiple
| times that a 1400 has definitely made illegal moves.
|
| > In fact, it's so rare that in order to forfeit a game,
| you have to do it twice. But it never happens, ever,
| because pattrn has never seen it. Case closed everyone.
|
| I actually said the exact opposite. You're responding to
| an argument I didn't make.
|
| > I made no judgement on what ChatGPT can and can't do. I
| pointed out an extreme. _Which the commenter agreed was
| an extreme._ The rest of your comment is completely
| irrelevant but congrats on getting tilted over something
| that literally doesn't concern you. Next time, just save
| us both the time and effort and don't bother butting in
| with irrelevant opinions. Especially if you couldn't even
| bother to read what was already said.
|
| The commenter's throwaway account never agreed it was an
| extreme. I agreed it was an extreme, but also that
| disproving that one extreme does nothing to contradict
| his argument. Yet again you aren't responding to the
| argument.
|
| This entire exchange is baffling. You seem to be missing
| the point for a third time, and now you're
| misrepresenting what I said. Welcome to the internet, I
| guess.
| mynameisvlad wrote:
| > The commenter's throwaway account never agreed it was
| an extreme.
|
| > fine, fair, "never" was too much.
|
| This is the second time I've had to do this. Do you just
| pretend things weren't said or do you actually have
| trouble reading the comments that have been here for
| hours? You make these grand assertions which are
| disproven by... reading the things that are directly
| above your comment.
|
| > This entire exchange is baffling.
|
| Yeah your inability to read comments multiple times in a
| row is extremely baffling.
|
| As I said before:
|
| > Next time, just save us both the time and effort and
| don't bother butting in with irrelevant opinions.
| Especially if you couldn't even bother to read what was
| already said.
| throwwwaway69 wrote:
| > The commenter's throwaway account never agreed it was
| an extreme.
|
| I did, two hours ago, 6 minutes after your comment
|
| https://news.ycombinator.com/item?id=35201830
| mynameisvlad wrote:
| > I have never seen a 1400 make an illegal move.
|
| Ah yes, of course, just because you never saw it means it
| never happens. That's definitely why rules exist around
| this specific thing happening. Because it never happens.
| _Totally._
|
| In fact, it's so rare that in order to forfeit a game,
| you have to do it twice. But it never happens, ever,
| because pattrn has never seen it. Case closed everyone.
|
| I made no judgement on what ChatGPT can and can't do. I
| pointed out an extreme. Which the commenter agreed was an
| extreme. The rest of your comment is completely
| irrelevant but congrats on getting tilted over something
| that literally doesn't concern you. Next time, just save
| us both the time and effort and don't bother butting in
| with irrelevant opinions. Especially if you couldn't even
| bother to read what was already said.
| ipaddr wrote:
| A broken clock is correct two times a day. But my broken
| clock isn't a 1400 player, although it might seem to be.
| mynameisvlad wrote:
| Does that somehow prove the assertion of "something a
| 1400 ranked player would never do"?
|
| Because all I'm hearing is talk about ChatGPT's abilities
| as a reply to me calling out an extreme statement as
| being extreme. Something the parent comment even admitted
| as being overly black and white.
| bcrosby95 wrote:
| Prove to me your clock is broken, I think it's just
| telling the future.
| eddsh1994 wrote:
| No I definitely have, it's just so rare I can't remember
| when I last did it. I do remember playing one in a blitz
| tournament 20 years ago! But if this is the first game
| they played, or if it happens in 1/10 matches, that's
| wild.
| z3c0 wrote:
| They are disproving an assertion. Demonstrating that an
| alternate approach implodes the assertion is a perfectly
| acceptable route, especially when the original approach was
| cherry-picking successes and throwing out failures.
|
| I wish I could just make bullshit moves and get a higher
| chess ranking. Sounds nice.
| vidarh wrote:
| It was not throwing out failures. It was treating even the
| first illegal move as a forfeit, _something which is
| stricter than FIDE rules_.
| z3c0 wrote:
| You can spin it that way if you want to, but the result
| is essentially guiding it through a brute force of the
| first successful playthrough it can muster.
| whimsicalism wrote:
| > the result is essentially guiding it through a brute
| force of the first successful playthrough it can muster.
|
| No, all unsuccessful playthroughs are resignations that
| impact the model's Elo.
| whimsicalism wrote:
| I disagree. If there is a procedure for getting ChatGPT to
| play chess accurately and you discard that and do some
| naive approach as a way of disproving the article, doesn't
| sound to me like you have disproven anything.
|
| I don't understand the point of your second sentence; it seems
| to be entirely missing the substance of the conversation.
| Spivak wrote:
| From the article.
|
| > Occasionally it does make an illegal move, but I decided to
| interpret that as ChatGPT flipping the table and saying "this
| game is impossible, I literally cannot conceive of how to win
| without breaking the rules of chess." So whenever it wanted to
| make an illegal move, it resigned.
|
| But you can do even better than the OP with a few tweaks.
|
| 1. One is by taking the most common legal move from a sample of
| responses.
|
| 2. Telling GPT what all the current legal moves are and telling
| it to only respond with an element from the list.
|
| 3. Ending the prompt with the current sequence of moves and
| having it complete from there.
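|
| A minimal sketch of tweaks 1 and 2 combined, assuming
| python-chess and a hypothetical sample_model() that returns
| several candidate replies for a prompt:
|
|     from collections import Counter
|     import chess
|
|     def next_move(board, sample_model, n=5):
|         # Offer only the legal moves, sample n candidates,
|         # and keep the most common legal one.
|         legal_san = [board.san(m) for m in board.legal_moves]
|         history = chess.Board().variation_san(board.move_stack)
|         prompt = (
|             f"Moves so far: {history}\n"
|             f"Legal moves: {', '.join(legal_san)}\n"
|             "Respond with exactly one move from that list."
|         )
|         votes = Counter(c.strip() for c in sample_model(prompt, n=n)
|                         if c.strip() in legal_san)
|         return votes.most_common(1)[0][0] if votes else None  # None = resign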
| tracker1 wrote:
| How many 1400-rated human chess players do you have to explain
| every possible move to, every single move?
| arrrg wrote:
| Does that matter? I'm really very confused by the argument
| you are making.
|
| That you may have to babysit this particular aspect of
| playing the game seems quite irrelevant to me.
| serverholic wrote:
| [dead]
| sacred_numbers wrote:
| When you are speaking to a person, they have inner thoughts
| and outer actions/words. If a person sees a chess board
| they will either consciously or unconsciously evaluate all
| the legal moves available to them and then choose one. An
| LLM like ChatGPT does not distinguish between inner
| thoughts and outer actions/words. The words that it speaks
| when prompted are its inner thoughts. There is also no
| distinction between subconscious and conscious thoughts.
| Humans generate and discard a multitude of thoughts in the
| subconscious before any thoughts ever make it to the
| conscious layer. In addition, most humans do not
| immediately speak every conscious thought they have before
| evaluating it to see whether speaking it aloud is
| consistent with their goals.
|
| There's already a lot of research on this, but I strongly
| believe that eventually the best AIs will consist of LLMs
| stuck in a while loop that generate a stream of
| consciousness which will be evaluated by other tools
| (perhaps other specialized LLMs) that evaluate the thoughts
| for factual correctness, logical consistency, goal
| coherence, and more. There may be multiple layers as well,
| to emulate subconscious, conscious, and external thoughts.
|
| For now though, in order to prompt the machine into
| emulating a human chess player, we will need to act as the
| machine's subconscious.
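|
| A very rough sketch of that loop shape; generate_thought and
| the checks are stand-ins for models and tools that don't
| exist here:
|
|     def inner_monologue(context, generate_thought, checks, steps=10):
|         # Generate candidate "thoughts" and only keep the ones that
|         # pass every evaluator, loosely emulating a subconscious
|         # filter in front of conscious output.
|         spoken = []
|         for _ in range(steps):
|             thought = generate_thought(context, spoken)
|             if all(check(thought, context) for check in checks):
|                 spoken.append(thought)
|             # rejected thoughts are simply discarded
|         return spoken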
| Spivak wrote:
| I feel like we have very different expectations about what
| tools like this are good for and how to use them. When I
| say GPT-3 can play chess, what I mean is: I can build a
| chess-playing automaton where the underlying decision-making
| system is entirely powered by the LLM.
|
| I, as the developer, am providing contextual information
| like what the current board state is, and what the legal
| moves are, but my code doesn't actually know anything about
| how to _play_ chess; the LLM is doing all the "thinking."
|
| Like it's nuts that people aren't more amazed that there's
| a piece of software that can function as a chess playing
| engine (and a good one) that was trained entirely
| generically.
| serverholic wrote:
| [dead]
| Tenoke wrote:
| I played a game against it yesterday (it won) and the only time
| it made an illegal move was move 15 (the game was unique
| according to the lichess database from much earlier), so I just
| asked it to try again. There's variance in what you get, but
| your example seems much worse.
| [deleted]
| marcodiego wrote:
| This may look low: ELO for mediocre players is 1500. But if it is
| obeying the rules of the game, then this is big. This is a signal
| that if it learns some expertise, like discovering how to use or
| create better search algorithms (like MCTS and heuristics to
| evaluate a state) and improve by itself (somewhat like AlphaZero
| did), then it may eventually reach superhuman level.
|
| It may then reach superhuman level in any task simpler than
| chess, which is enough to destroy many human jobs.
|
| EDIT: From the article: "With this prompt ChatGPT almost always
| plays fully legal games." Relax: we're still far from that.
| gfd wrote:
| According to
| https://chess.stackexchange.com/questions/2550/what-are-the-...
| the median rating is 1148 (252,989 players). So it's beating half
| of humanity at a mind sport and it wasn't even specifically
| trained for it.
| nottathrowaway3 wrote:
| The median chess player is usually described as mediocre (if
| you ask chess players). They suck as badly as the median
| clarinet player in your high school band/orchestra.
| sapiogram wrote:
| That's USCF ratings, chess.com ratings are massively inflated
| in comparison.
| jabloczko wrote:
| There's a difference between chess.com and USCF ratings.
| LeanderK wrote:
| I think current LLM architectures are limiting the strategies
| it can learn. MCTS requires recursion, but GPT always executes
| a fixed number of steps. Allowing language models more
| flexibility via a variable number of steps, for example through
| recursion of the model, would eliminate this hard bound, but
| such models are harder to design and train. We have only just
| become able to train GPT-sized models.
| sebzim4500 wrote:
| I'm sure the MuZero chess policy network would reach much
| higher than 1400, and that has no notion of recursion either.
| (And also wasn't taught the rules explicitly)
| LeanderK wrote:
| Yes, I wouldn't say it's impossible, but that's just a hard
| limit from the architecture. MCTS would have to terminate
| after a few iterations. And the number of steps is not that
| big, just massively parallel.
| MC_10 wrote:
| Not sure how useful this is but I simply asked ChatGPT what it
| thinks its skill level is. This is using GPT-4.
|
| After some previous questions...
|
| Me: "Would you be able to estimate your ELO?"
|
| ChatGPT: "As a language model, I don't have a specific ELO
| rating. However, my knowledge of chess strategies and tactics is
| based on the information I have been trained on, which includes
| games played by humans with various ELO ratings. My level of play
| might be roughly equivalent to an ELO of around 1200-1500, but
| please keep in mind that this is only an estimate, and my actual
| performance could be higher or lower depending on the specific
| position or game."
|
| Me: "As a generative AI, are you actually able to reason which
| move to make next in chess? Or are you somehow just predicting
| the next likely move based on training data."
|
| ChatGPT: "As a generative AI language model, my primary function
| is to predict the most likely text given a certain context. When
| it comes to playing chess, I am not "reasoning" in the same way
| that a dedicated chess engine would, which involves evaluating
| positions and searching through a large number of potential moves
| using specialized algorithms.
|
| Instead, I generate my chess moves based on patterns and
| sequences I've seen during my training, which include a large
| number of chess games and discussions about chess strategy. My
| knowledge is derived from the text data I've been trained on, and
| I apply that knowledge to suggest moves that I think are
| appropriate for a given position.
|
| While this approach allows me to play at an intermediate level,
| it is not as precise or accurate as the methods used by dedicated
| chess engines. My suggestions might not always be the best moves,
| and my understanding of the game is limited by the quality and
| quantity of my training data."
| mtlmtlmtlmtl wrote:
| There's a huge difference between 1400 Elo in FIDE games and
| 1400 on chess.com, which is not even using Elo.
|
| For instance, the strongest blitz players in the world are rated
| hundreds of points higher on chess.com blitz than their
| FIDE blitz rating. Chess.com and lichess have a ton of rating
| inflation.
| CSMastermind wrote:
| > the strongest blitz players in the world are hundreds of
| points higher rated on chess.com blitz versus their FIDE blitz
| rating
|
| Online rating inflation is real but I'm not sure blitz is the
| best example of it because in that case there is a notable
| difference between online and otb (having to take time to
| physically move the pieces).
| manquer wrote:
| Probably the bigger difference is the ability to premove online.
| mtlmtlmtlmtl wrote:
| I was thinking about this.
|
| On chess.com you can chain premoves; on lichess you
| can't (afaik).
|
| So in theory, to the extent premoves explain the rating
| difference, the difference should be greater on chess.com
| assuming they have the same parameters in their rating
| calculations. Therefore it should be possible to perform an
| analysis to shed light on this. But someone would have to
| go recompute the 3 different ratings under the same system
| first to be able to make a sensible analysis.
| mtlmtlmtlmtl wrote:
| The point is, it's kinda hard to take the blog post seriously
| when these fundamentals are so wrong. When literally the
| title is an immediately obvious error, that doesn't inspire
| confidence in the rest of the methodology.
|
| I'm still going through the games but so far these games are
| not even close to elo 1400 level. For both the human player
| and the model.
| oli5679 wrote:
| I tried playing just now, with exactly the same prompt as the
| article.
|
| It played an illegal move on move 4.
|
| 1. d4 d5 2. c4 e6 3. Nf3 nf6 4. g3 Bg4
| comment_ran wrote:
| Why not just introduce AlphaGo as an API that can be used by
| ChatGPT? So every time you want to do this type of gaming, you
| just send a request. I mean, ChatGPT sends a request to AlphaGo,
| but as a user you don't actually know what's happening. But in
| the background it happens really fast, so it's just like you are
| chatting with ChatGPT, but using a much, much more powerful tool
| to do this kind of thing.
| paxys wrote:
| That kinda defeats the purpose. Of course you can use AlphaGo,
| but the question here is: can a generative AI teach itself to
| play chess (and do a million other similar generic tasks) when
| given no specific training for it?
| porphyra wrote:
| What if you come up with a prompt to tell ChatGPT to figure
| out how to use APIs and external tools to help itself get
| better at chess?
| [deleted]
| og_kalu wrote:
| GPT-4 plays chess just fine. So you certainly can.
| stagger87 wrote:
| Honest questions,
|
| Isn't it very likely chess games were part of the training
| data? If so, chess sequences seem like an amazing matched
| filter of sorts. It doesn't seem surprising that it can guess
| a next move.
|
| Is it teaching itself anything here? Is the model being
| updated in any way? Or are you talking more generally, as a
| looking-into-the-future sort of statement?
| layer8 wrote:
| I think they meant "teach itself" in the course of a
| conversation, within the token limit. ChatGPT will forget
| what it has "learned" once the conversation is over.
| paxys wrote:
| You can put every chess game ever played in its database
| and yet every time it plays a new game it will immediately
| come across a completely unique move. So it still needs to
| learn the rules and know how to play independently. Whether
| it can do that or not is yet undetermined.
| armchairhacker wrote:
| How about, can a generative AI teach itself how to use a
| chess AI to beat chess? Give GPT4 the ability to make REST
| API calls and also access to FFI, and put a chess-bot library
| somewhere. Train it how to use these but not necessarily how
| to use the chess API specifically. If you ask GPT4 to play
| chess, can it call into that library and use the
| requests/responses?
|
| This has bigger ramifications too: if GPT4 learns how to use
| RPCs effectively, it can do accurate complex mathematical
| computations and simulations, do IoT/home automation tasks,
| and even control heavy machinery.
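|
| Sketching the wrapper side of that idea; the engine endpoint
| and the JSON "tool call" format are both made up here:
|
|     import json
|     import requests
|
|     ENGINE_URL = "http://localhost:8080/bestmove"  # hypothetical service
|
|     def handle_model_output(text, board_fen):
|         # If the model emits a {"tool": "chess_engine"} call, run it
|         # against the engine and return the result; otherwise pass
|         # the text through unchanged.
|         try:
|             call = json.loads(text)
|         except json.JSONDecodeError:
|             return text  # plain chat, no tool use
|         if call.get("tool") == "chess_engine":
|             resp = requests.post(ENGINE_URL, json={"fen": board_fen},
|                                  timeout=10)
|             return resp.json().get("bestmove")
|         return text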
| comment_ran wrote:
| This is exactly what I'm hyped for in the next-gen GPT-7.
| Imagine it having the ability to self-teach, just like a
| child. I may not know how to whip up some cheesy goodness,
| but with external resources like YouTube vids, I can
| improve. And if GPT-7 can store this knowledge, it can
| access it for future tasks! That's some next-level stuff,
| and I'm stoked to see where it goes.
| paxys wrote:
| GPT is a language model. It doesn't call APIs. You can take
| the _output_ of GPT and decide to call a specific API based
| on it, and sure you can bake that into ChatGPT or some
| independent tooling, but again that's not the purpose of
| the core project. I'm sure plenty of people/companies are
| working on "chess engine interface using ChatGPT" already.
| RC_ITR wrote:
| This is actually a huge debate right now.
|
| OpenAI is on the side of 'LLMs have only surprised us to the
| upside, so using crutches is counterproductive'
|
| Whereas other people think 'Teaching an LLM to do arbitrary
| math problems through brute force is probably one of the most
| wasteful things imaginable when calculators exist.'
|
| I'm actually very excited to see which side wins (I'm on team
| calculator, but _want_ to be on team LLM)
| dwaltrip wrote:
| How about a more human-like approach: the LLM designs a
| calculator and then makes use of that!
| TOMDM wrote:
| I think I'm on both teams.
|
| As GPT models get better at arbitrary computation, it stands
| to reason that their ability to use tools will rise as well.
|
| Smarter models and smarter ways to get them using tools feel
| like complementary capabilities.
| nroets wrote:
| And Mathematica when given a math problem. And web search
| for news.
|
| And a good hacker can add these features to Alpaca and run it
| on his desktop...
|
| Import all his WhatsApp chats and private emails into the
| corpus.
|
| Add speech recognition and TTS.
___________________________________________________________________
(page generated 2023-03-17 23:02 UTC)