[HN Gopher] Watching 03 Model Sweat over a Paul Morphy Mate-in-2
___________________________________________________________________
Watching 03 Model Sweat over a Paul Morphy Mate-in-2
Author : alexop
Score : 46 points
Date : 2025-04-27 16:23 UTC (6 hours ago)
(HTM) web link (alexop.dev)
(TXT) w3m dump (alexop.dev)
| awestroke wrote:
| O3 is massively underwhelming and is obviously tuned to be
| sycophantic.
|
| Claude reigns supreme.
| omneity wrote:
| This somehow reminds me of Agent-3 from [0].
|
| 0: https://ai-2027.com
| tomduncalf wrote:
| Depends on the task I think. O3 is really effective at going
| off and doing research, try giving it a complex task which
| involves lots of browsing/searching and watch how it behaves.
| Claude cannot do anything like that right now. I do find O3's
| tone of voice a bit odd
| tough wrote:
| I've committed the 03 (zero-three) instead of o3 (o-three) typo too,
| but can we rename it in the title please?
| sMarsIntruder wrote:
| So we are talking about OpenAI's o3 model, right?
| alexop wrote:
| yes
| bcraven wrote:
| >"When I gave OpenAI's 03 model a tough chess puzzle..."
|
| Opening sentence
| monktastic1 wrote:
| A little annoying that they use zero instead of o, but yeah.
| janaagaard wrote:
| I was also confused. It looks like the article has been
| corrected, and now uses the familiar 'o3' name.
| freediver wrote:
| On a similar note, I just updated LLM Chess Puzzles repo [1]
| yesterday.
|
| The fact that gpt-4.5 solves 85% correctly is unexpected and
| somewhat scary (if the model was not trained on this).
|
| [1] https://github.com/kagisearch/llm-chess-puzzles
| alexop wrote:
| Oh cool, I wonder how good o3 will be. While using o3, I
| noticed something funny: sometimes I gave it a screenshot
| without any position data. It ended up using Python and spent
| 10 minutes just trying to figure out where exactly the pieces
| were.
| Gimpei wrote:
| Given that o3 is trained on the contents of the Internet, and
| the answers to all these chess problems are almost certainly on
| the Internet in multiple places, in a sense it has been weakly
| trained on this content. The question for me becomes: is the
| LLM doing better on these problems because it's improving in
| reasoning, or is it simply improving in information retrieval.
| ttoinou wrote:
| Where does this obsession with giving binary logic tasks to LLMs
| come from? New LLM breakthroughs are about handling blurry
| logic and imprecise requirements, and producing vague,
| human-like outputs. Who cares how well it can add integers or solve
| chess puzzles? We have decades of computer science on those
| topics already.
| Arainach wrote:
| If we're going to call LLMs intelligent, they should be
| performant at these tasks as well.
| ttoinou wrote:
| We called our computers intelligent when they couldn't do so
| many of the things LLMs can now do easily.
|
| But yeah, calling them intelligent is a marketing trick that
| is very effective.
| tgtweak wrote:
| I remember reading that gpt-3.5-turbo-instruct was oddly good at
| chess - would be curious what it outputs as the next two moves
| here.
| Kapura wrote:
| So... it failed to solve the puzzle? That seems distinctly
| unimpressive, especially for a puzzle with a fixed start state
| and a limited set of possible moves.
| IanCal wrote:
| > That seems distinctly unimpressive
|
| I cannot overstate how impressive this is to me, having been
| involved in AI research projects and robotics in years gone by.
|
| This is a general-purpose model, given an image and a
| human-written request, that then analyses the image step by step,
| iterates through various options, tries to write code to solve
| the problem and then searches the internet for help. It reads
| multiple results and finds an answer, checks to validate it and
| then comes back to the user.
|
| I had a robot that took ages to learn to play tic-tac-toe by
| example and if the robot moved originally there was a solid
| chance it thought the entire world had changed and would freak
| out because it thought it might punch through the table.
|
| This is also a chess puzzle marked as _very hard_ that a person
| who is good at chess should give themselves _fifteen minutes to
| solve_. The author of the chess.com blog containing this puzzle
| only solved about half of them!
|
| This is not an image analysis bot, it's not a chess bot, it's a
| general system I can throw bad English at.
| alexop wrote:
| Yes, I agree. Like I said, in the end it did what a human
| would do: google for the answer. Still, it was interesting to
| see how the reasoning unfolded. Normally, humans train on
| these kinds of puzzles until they become pure pattern
| recognition. That's why you can't become a grandmaster if you
| only start learning chess as an adult -- you need to be a kid
| and see thousands of these problems early on, until
| recognizing them becomes second nature. It's something humans
| are naturally very good at.
| kamranjon wrote:
| I am a human and I figured this puzzle out in under a
| minute by just trying the small set of possible moves until
| I got it correct. I am not a serious chess player. I would
| have expected it to at least try the possible moves? I
| think this maybe lends credence to the idea that these
| models aren't actually reasoning but are doing a great job
| of mimicking what we think humans do.
| Kapura wrote:
| I am sorry, but if this impresses you, you are a rube. If this
| were a machine with the smallest bit of _actual_ intelligence
| it would, upon seeing it's a chess puzzle, remember "hey, i
| am a COMPUTER and a small set of fixed moves should take me
| about 300ms or so to fully solve out" and then do that. If
| the machine _literally has to cheat to solve the puzzle_ then
| we have made technology that is, in fact, less capable than
| we created in the past.
|
| "Well, it's not a chess engine so it's impressive it-" No.
| Stop. At best what we have here is an extremely
| computationally expensive way to just google a problem. We've
| been googling things since I was literally a child. We've had
| voice search with google for, idk, a decade+. A computer that
| can't even solve its own chess problems is an expensive
| regression.
| mhh__ wrote:
| If you mean write code to exhaustively search the solution
| space, then they can actually do that quite happily, provided
| you tell them you will execute the code for them.
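
The exhaustive search mentioned above can be sketched in plain Python. This is an illustration only, not what o3 attempted. Two things are assumptions: the position string, which the thread never quotes and which is taken here to be the commonly published setup of Morphy's problem (White Kc8, Ra1, pawn b6; Black Ka8, Bb8, pawns a7, b7), and every function name, all invented for this example. The move generator is deliberately reduced (no castling, no en passant, promotion to queen only), which is enough for this position but not for chess in general.

```python
# Step directions; rooks, bishops and queens slide, the king takes one step.
ROOK = [(1, 0), (-1, 0), (0, 1), (0, -1)]
BISHOP = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
STEPS = {"R": ROOK, "B": BISHOP, "Q": ROOK + BISHOP, "K": ROOK + BISHOP}

def parse(placement):
    """Turn the piece-placement field of a FEN into {(file, rank): piece}."""
    board = {}
    for r, row in enumerate(placement.split("/")):
        f = 0
        for ch in row:
            if ch.isdigit():
                f += int(ch)
            else:
                board[(f, 7 - r)] = ch  # uppercase = White, lowercase = Black
                f += 1
    return board

def pseudo_moves(board, white):
    """Yield (src, dst) moves, ignoring whether the mover's king is left in check."""
    for (f, r), p in list(board.items()):
        if p.isupper() != white:
            continue
        if p.upper() == "P":
            d, start = (1, 1) if white else (-1, 6)
            if 0 <= r + d <= 7 and (f, r + d) not in board:
                yield (f, r), (f, r + d)
                if r == start and (f, r + 2 * d) not in board:
                    yield (f, r), (f, r + 2 * d)
            for df in (-1, 1):  # diagonal captures only
                t = board.get((f + df, r + d))
                if t and t.isupper() != white:
                    yield (f, r), (f + df, r + d)
            continue
        for df, dr in STEPS[p.upper()]:
            x, y = f + df, r + dr
            while 0 <= x <= 7 and 0 <= y <= 7:
                t = board.get((x, y))
                if t is None or t.isupper() != white:
                    yield (f, r), (x, y)
                if t is not None or p.upper() == "K":
                    break
                x, y = x + df, y + dr

def play(board, move):
    """Return a new board with the move applied (pawns auto-promote to a queen)."""
    src, dst = move
    new = dict(board)
    p = new.pop(src)
    if p in "Pp" and dst[1] in (0, 7):
        p = "Q" if p == "P" else "q"
    new[dst] = p
    return new

def in_check(board, white):
    king = next(sq for sq, p in board.items() if p == ("K" if white else "k"))
    return any(dst == king for _, dst in pseudo_moves(board, not white))

def legal_moves(board, white):
    return [m for m in pseudo_moves(board, white)
            if not in_check(play(board, m), white)]

def is_mate(board, white):
    """True if side `white` is to move, is in check, and has no legal move."""
    return in_check(board, white) and not legal_moves(board, white)

def can_mate_now(board):
    """White to move: is there a move that checkmates immediately?"""
    return any(is_mate(play(board, m), False) for m in legal_moves(board, True))

def mate_in_two_keys(board):
    """White to move: first moves after which every black reply allows mate."""
    keys = []
    for m1 in legal_moves(board, True):
        after = play(board, m1)
        replies = legal_moves(after, False)
        if replies and all(can_mate_now(play(after, r)) for r in replies):
            keys.append(m1)
    return keys

def name(sq):
    return "abcdefgh"[sq[0]] + str(sq[1] + 1)

MORPHY = "kbK5/pp6/1P6/8/8/8/8/R7"  # assumed position, see the note above
keys = [name(s) + name(d) for s, d in mate_in_two_keys(parse(MORPHY))]
print(keys)  # prints ['a1a6'], i.e. the rook move to a6
```

If python-chess is installed, the same three-ply loop could instead be driven by `chess.Board`, `board.legal_moves`, `board.push`/`board.pop` and `board.is_checkmate()`, which removes the need for the hand-rolled move generator.
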
| bobsmooth wrote:
| A computer program that has the agency to google a problem,
| interpret the results, and respond to a human was science
| fiction just 10 years ago. The entire field of natural
| language processing has been solved and it's insane.
| jncfhnb wrote:
| Looks to me like it would have simulated the steps using
| sensible tools but didn't know it was sandboxed out of
| using those tools? I think that's pretty reasonable.
|
| Suppose we removed its ability to google and it conceded to
| doing the tedium of writing a chess engine to simulate the
| steps. Is that "better" for you?
| currymj wrote:
| > "hey, i am a COMPUTER and a small set of fixed moves
| should take me about 300ms or so to fully solve out"
|
| from the article:
|
| "3. Attempt to Use Python When pure reasoning was not
| enough, o3 tried programming its way out of the situation.
|
| "I should probably check using something like a chess
| engine to confirm." (tries to import chess module, but
| fails: "ModuleNotFoundError").
|
| It wanted to run a simulation, but of course, it had no
| real chess engine installed."
|
| this strategy failed, but if OpenAI were to add "pip
| install python-chess" to the environment, it very well
| might have worked. in any case, the machine did exactly the
| thing you claim it should have done.
|
| possibly scrolling down to read the full article makes you
| a rube though.
| andoando wrote:
| I'm a 1600-rated player and this took me 20 seconds to solve. Is
| this really considered a very hard puzzle?
|
| The obvious moves don't work, you can see White's pawn moving
| forward is mate, and you can see Black is essentially trapped
| and has very limited moves, so I immediately thought the first
| move is a waiting move, and there are only two options there.
| Block the black pawn moving and if bishop moves, rook takes
| is mate. So rook has to block, and you can see bishop either
| moves or captures and pawn moving forward is mate
| bubblyworld wrote:
| Agreed, I'm at a similar level (no FIDE rating, but ~2k on
| Lichess) and it took me a few seconds as well. Not a hard
| puzzle, for a regular chess player anyway.
| BXLE_1-1-BitIs1 wrote:
| Nice puzzle with a twist of Zugzwang. Took me about 8 minutes,
| but it's been decades since I was doing chess.
| bfung wrote:
| LLMs are not chess engines, similar to how they don't really
| calculate arithmetic. What's new? Carry on.
___________________________________________________________________
(page generated 2025-04-27 23:00 UTC)