[HN Gopher] Watching 03 Model Sweat over a Paul Morphy Mate-in-2
       ___________________________________________________________________
        
       Watching 03 Model Sweat over a Paul Morphy Mate-in-2
        
       Author : alexop
       Score  : 46 points
       Date   : 2025-04-27 16:23 UTC (6 hours ago)
        
 (HTM) web link (alexop.dev)
 (TXT) w3m dump (alexop.dev)
        
       | awestroke wrote:
       | O3 is massively underwhelming and is obviously tuned to be
       | sycophantic.
       | 
       | Claude reigns supreme.
        
         | omneity wrote:
         | This somehow reminds me of Agent-3 from [0].
         | 
         | 0: https://ai-2027.com
        
         | tomduncalf wrote:
          | Depends on the task, I think. O3 is really effective at going
          | off and doing research; try giving it a complex task that
          | involves lots of browsing/searching and watch how it behaves.
          | Claude cannot do anything like that right now. I do find O3's
          | tone of voice a bit odd, though.
        
       | tough wrote:
        | I've committed the 03 (zero-three) instead of o3 (o-three) typo
        | too, but can we fix it in the title, please?
        
       | sMarsIntruder wrote:
        | So we're talking about the OpenAI o3 model, right?
        
         | alexop wrote:
         | yes
        
         | bcraven wrote:
         | >"When I gave OpenAI's 03 model a tough chess puzzle..."
         | 
         | Opening sentence
        
           | monktastic1 wrote:
           | A little annoying that they use zero instead of o, but yeah.
        
         | janaagaard wrote:
         | I was also confused. It looks like the article has been
         | corrected, and now uses the familiar 'o3' name.
        
       | freediver wrote:
        | On a similar note, I just updated the LLM Chess Puzzles repo [1]
        | yesterday.
       | 
        | The fact that gpt-4.5 solves 85% of them correctly is unexpected
        | and somewhat scary (if the model was not trained on this).
       | 
       | [1] https://github.com/kagisearch/llm-chess-puzzles
        
         | alexop wrote:
          | Oh cool, I wonder how good o3 will be. While using o3, I
          | noticed something funny: sometimes I gave it a screenshot
          | without any position data. It ended up using Python and spent
          | 10 minutes just trying to figure out exactly where the pieces
          | were.
        
         | Gimpei wrote:
         | Given that o3 is trained on the contents of the Internet, and
         | the answers to all these chess problems are almost certainly on
         | the Internet in multiple places, in a sense it has been weakly
         | trained on this content. The question for me becomes: is the
         | LLM doing better on these problems because it's improving in
          | reasoning, or is it simply improving in information retrieval?
        
       | ttoinou wrote:
        | Where does this obsession with giving binary logic tasks to LLMs
        | come from? New LLM breakthroughs are about handling blurry
        | logic and imprecise requirements and spitting out vaguely
        | human-realistic outputs. Who cares how well it can add integers
        | or solve chess puzzles? We have decades of computer science on
        | those topics already.
        
         | Arainach wrote:
         | If we're going to call LLMs intelligent, they should be
         | performant at these tasks as well.
        
           | ttoinou wrote:
            | We called our computers intelligent, and they couldn't do so
            | many things LLMs can now do easily.
            | 
            | But yeah, calling them intelligent is a marketing trick that
            | is very effective.
        
       | tgtweak wrote:
        | I remember reading that gpt-3.5-turbo-instruct was oddly good at
        | chess - would be curious what it outputs as the next two moves
        | here.
        
       | Kapura wrote:
       | So... it failed to solve the puzzle? That seems distinctly
       | unimpressive, especially for a puzzle with a fixed start state
       | and a limited set of possible moves.
        
         | IanCal wrote:
         | > That seems distinctly unimpressive
         | 
          | I cannot overstate how impressive this is to me, having been
          | involved in AI research projects and robotics in years gone by.
         | 
          | This is a general-purpose model that, given an image and a
          | human-written request, step by step analyses the image,
          | iterates through various options, tries to write code to solve
          | the problem, and then searches the internet for help. It reads
          | multiple results, finds an answer, checks to validate it, and
          | then comes back to the user.
         | 
          | I had a robot that took ages to learn to play tic-tac-toe by
          | example, and if the robot was moved from its original position
          | there was a solid chance it thought the entire world had
          | changed and would freak out because it thought it might punch
          | through the table.
         | 
         | This is also a chess puzzle marked as _very hard_ that a person
         | who is good at chess should give themselves _fifteen minutes to
         | solve_. The author of the chess.com blog containing this puzzle
         | only solved about half of them!
         | 
          | This is not an image analysis bot, it's not a chess bot; it's a
          | general system I can throw bad English at.
        
           | alexop wrote:
           | Yes, I agree. Like I said, in the end it did what a human
           | would do: google for the answer. Still, it was interesting to
           | see how the reasoning unfolded. Normally, humans train on
           | these kinds of puzzles until they become pure pattern
           | recognition. That's why you can't become a grandmaster if you
           | only start learning chess as an adult -- you need to be a kid
           | and see thousands of these problems early on, until
           | recognizing them becomes second nature. It's something humans
           | are naturally very good at.
        
             | kamranjon wrote:
             | I am a human and I figured this puzzle out in under a
             | minute by just trying the small set of possible moves until
             | I got it correct. I am not a serious chess player. I would
             | have expected it to at least try the possible moves? I
             | think this maybe lends credence to the idea that these
             | models aren't actually reasoning but are doing a great job
             | of mimicking what we think humans do.
        
           | Kapura wrote:
            | I am sorry, but if this impresses you, you are a rube. If
            | this were a machine with the smallest bit of _actual_
            | intelligence it would, upon seeing it's a chess puzzle,
            | remember "hey, i am a COMPUTER and a small set of fixed moves
            | should take me about 300ms or so to fully solve out" and then
            | do that. If the machine _literally has to cheat to solve the
            | puzzle_ then we have made technology that is, in fact, less
            | capable than what we created in the past.
           | 
           | "Well, it's not a chess engine so its impressive it-" No.
           | Stop. At best what we have here is an extremely
           | computationally expensive way to just google a problem. We've
            | been googling things since I was literally a child. We've had
            | voice search with Google for, idk, a decade+. A computer that
           | can't even solve its own chess problems is an expensive
           | regression.
        
             | mhh__ wrote:
             | If you mean write code to exhaustively search the solution
             | space then they actually can do that quite happily provided
              | you tell it you will execute the code for them.
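              | 
              | For instance, assuming the python-chess library is
              | available (pip install python-chess), a minimal brute-force
              | mate-in-two search might look roughly like this; the FEN is
              | a simple K+R sample position, not the Morphy puzzle:
              | 
              |     import chess
              | 
              |     def mates(board, move):
              |         # True if this move delivers immediate checkmate.
              |         board.push(move)
              |         mate = board.is_checkmate()
              |         board.pop()
              |         return mate
              | 
              |     def refuted(board, reply):
              |         # True if white still has a mating answer to reply.
              |         board.push(reply)
              |         ok = any(mates(board, m)
              |                  for m in list(board.legal_moves))
              |         board.pop()
              |         return ok
              | 
              |     def mate_in_two(fen):
              |         # Return a first move forcing mate on move two,
              |         # or None if there is no such move.
              |         board = chess.Board(fen)
              |         for first in list(board.legal_moves):
              |             board.push(first)
              |             replies = list(board.legal_moves)
              |             # Every reply must run into mate; no replies at
              |             # all would mean mate in one or stalemate.
              |             forced = bool(replies) and all(
              |                 refuted(board, r) for r in replies)
              |             board.pop()
              |             if forced:
              |                 return first
              |         return None
              | 
              |     # Sample K+R position; prints f6g6 (Kg6), the only
              |     # move that forces mate on the next turn.
              |     print(mate_in_two("7k/8/5K2/8/8/8/8/R7 w - - 0 1"))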
        
             | bobsmooth wrote:
             | A computer program that has the agency to google a problem,
             | interpret the results, and respond to a human was science
             | fiction just 10 years ago. The entire field of natural
             | language processing has been solved and it's insane.
        
             | jncfhnb wrote:
             | Looks to me like it would have simulated the steps using
             | sensible tools but didn't know it was sandboxed out of
             | using those tools? I think that's pretty reasonable.
             | 
             | Suppose we removed its ability to google and it conceded to
             | doing the tedium of writing a chess engine to simulate the
             | steps. Is that "better" for you?
        
             | currymj wrote:
             | > "hey, i am a COMPUTER and a small set of fixed moves
             | should take me about 300ms or so to fully solve out"
             | 
             | from the article:
             | 
             | "3. Attempt to Use Python When pure reasoning was not
             | enough, o3 tried programming its way out of the situation.
             | 
             | "I should probably check using something like a chess
             | engine to confirm." (tries to import chess module, but
             | fails: "ModuleNotFoundError").
             | 
             | It wanted to run a simulation, but of course, it had no
             | real chess engine installed."
             | 
             | this strategy failed, but if OpenAI were to add "pip
             | install python-chess" to the environment, it very well
             | might have worked. in any case, the machine did exactly the
             | thing you claim it should have done.
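              | 
              | With python-chess installed, that kind of check is only a
              | few lines; the FEN here is a generic back-rank example, not
              | the puzzle position itself:
              | 
              |     import chess  # pip install python-chess
              | 
              |     board = chess.Board("6k1/5ppp/8/8/8/8/8/R6K w - - 0 1")
              |     print(len(list(board.legal_moves)))  # candidate moves
              |     board.push_san("Ra8")        # try a candidate in SAN
              |     print(board.is_checkmate())  # True: back-rank mate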
             | 
             | possibly scrolling down to read the full article makes you
             | a rube though.
        
           | andoando wrote:
            | I'm a 1600-rated player and this took me 20 seconds to solve.
            | Is this really considered a very hard puzzle?
            | 
            | The obvious moves don't work. You can see white's pawn moving
            | forward is mate, and you can see black is essentially trapped
            | and has very limited moves, so immediately I thought the
            | first move is a waiting move, and there are only two options
            | there. Block the black pawn from moving, and if the bishop
            | moves, rook takes is mate. So the rook has to block, and you
            | can see the bishop either moves or captures, and the pawn
            | moving forward is mate.
        
             | bubblyworld wrote:
              | Agreed, I'm of similar strength (no FIDE rating, but ~2k on
              | Lichess) and it took me a few seconds as well. Not a hard
              | puzzle, for a regular chess player anyway.
        
       | BXLE_1-1-BitIs1 wrote:
        | Nice puzzle with a twist of Zugzwang. Took me about 8 minutes,
        | but it's been decades since I last played chess.
        
       | bfung wrote:
       | LLMs are not chess engines, similar to how they don't really
        | calculate arithmetic. What's new? Carry on.
        
       ___________________________________________________________________
       (page generated 2025-04-27 23:00 UTC)