# Marcus on AI

## ChatGPT in Shambles

After two years of massive investment and endless hype, GPT's reliability problems persist

Gary Marcus, Feb 04, 2025

Legendary investor Masayoshi Son assures us that AGI is imminent. Quoting the Wall Street Journal yesterday: "Just a few months ago, [Masayoshi] Son predicted that artificial general intelligence, or AGI, would be achieved within two to three years. 'I now realize that AGI would come much earlier,' he said Monday."

So I decided this morning to give the latest (unpaid) version of ChatGPT a whirl, asking it to make some basic tables, in hopes of putting its newfound skills to work.

Things started off so well that I wondered if it was I who needed to update my AGI timelines. As the system spat out the first few lines of a neatly formatted table on income and population in the United States, apparently responsive to my question, I started to wonder, "Have I been unfair to Generative AI?" The start was undeniably impressive.

[screenshot]

Only later (at the time I was reading this on my phone and not looking carefully) did I notice that more than half the states were left out. That should have been a clue.

From there, things went downhill fast. I next asked for an extra column on population density.

[screenshot]

That last line made me want to live in Alaska! But wait, I wondered, what happened to Florida? (And a couple dozen other states? South Dakota? The Carolinas? Arizona? Washington? More were left out than included.) I inquired:

[screenshot]

As is its tendency, ChatGPT apologized, "sincerely," and made a much longer list (for brevity, not shown here, but follow the link if you care) ... and this time it omitted Alaska. Oops!
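Omitting states is exactly the kind of failure that conventional software catches trivially. A minimal sketch (the truncated table below is hypothetical, standing in for the one ChatGPT produced) of checking a generated table against a canonical list:

```python
# Canonical list of the 50 U.S. states, used to audit a generated table.
US_STATES = {
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
    "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana",
    "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
    "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
    "New Hampshire", "New Jersey", "New Mexico", "New York",
    "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon",
    "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
    "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
    "West Virginia", "Wisconsin", "Wyoming",
}

def missing_states(table_rows):
    """Return, sorted, the states absent from a table's first column."""
    listed = {row[0] for row in table_rows}
    return sorted(US_STATES - listed)

# A hypothetical truncated table like the ones in this post:
rows = [("California", 39_500_000), ("Texas", 30_000_000)]
print(len(missing_states(rows)))  # 48 states missing
```

The point is not that anyone needs this script; it is that completeness here is a one-line set difference, not a hard problem.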
At least it didn't include Canada.

After another verifying query, and another apology ("You're right to double-check! I'll ensure that all 50 U.S. states are included. Let me verify and provide a complete, accurate table"), Alaska was re-included. Hooray!

At this point I moved on to a more serious challenge: Canadian provinces, and wait for it ... vowels.

[screenshot]

Never mind that I didn't ask for territories (the bottom three lines); the vowel counts looked hinky. So I invited ChatGPT to double-check. AGI, here we come!

[screenshot]

Wait, what? Is h a vowel now? (See the answer for Northwest Territories.) And 9 vowels in British Columbia? I live there, and that doesn't sound right. (And, no, I am not even going to go into the o it alleges lives in the name Prince Edward Island.)

Let's see if maybe we can sort some of this out:

[screenshot]

Nope, still wrong.

Luckily I am patient. Third try on my adopted province is a charm!

[screenshot]

Now hear me out. I do honestly believe that if we spent $7 trillion on infrastructure we could fix all ... oh, never mind.

But one last question, in the interest of future improvements:

[screenshot]

In sum, in the space of a few exchanges, over the course of 10 minutes, ChatGPT:

* failed, multiple times, to properly count to 50
* failed, multiple times, to include a full list of all US states
* reported that the letter h could be a vowel, at least when it appeared in the word Northwest
* couldn't count vowels to save its electronic life
* issued numerous corrections that were wrong, never acknowledging uncertainty until after its errors were called out
* "lied" about having a subconscious (in fairness, ChatGPT doesn't really lie; it just spews text that often bears little resemblance to reality, but you get my drift)

The full conversation, including all the prompts I used, can be found here.
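For the record, counting vowels is a one-liner in any programming language, and the deterministic version has no trouble with h. A minimal sketch:

```python
def count_vowels(name: str) -> int:
    """Count occurrences of a, e, i, o, u (case-insensitive).
    h is not a vowel; y is not counted here either."""
    return sum(ch in "aeiou" for ch in name.lower())

for place in ["British Columbia", "Northwest Territories", "Prince Edward Island"]:
    print(place, count_vowels(place))
# British Columbia 6
# Northwest Territories 7
# Prince Edward Island 6
```

Six vowels in British Columbia, not nine, and they are exactly where you'd expect them.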
Against all the constant claims of exponential progress that I see practically every day, ChatGPT still seems like pretty much the same mix of brilliance and stupidity that I wrote about more than two years ago, in "How come GPT can seem so brilliant one minute and so breathtakingly dumb the next?" (December 1, 2022).

By coincidence, Sayash Kapoor, co-author of AI Snake Oil, reported some tests of OpenAI's new Operator agent this morning, pushing the extreme boundaries of intelligence by testing ... expense reports.

[screenshot]

Things didn't go that well there either. Although (impressively) Operator was able to navigate to the right websites and even uploaded some things correctly, the train soon derailed:

[screenshot]

Great summary. As Davis and I have been arguing since 2019, trust is of the essence, and we still aren't there. But honestly, if AI can't do Kapoor's expense reports or my simple tables, is AGI really imminent? Who is kidding whom?

On my Kindle, this is the book I just started, a classic from an earlier century:

[screenshot]

I'll close with one of the quotes I highlighted in the first chapter:

> In reading the history of nations, we find that, like individuals, they have their whims and their peculiarities; their seasons of excitement and recklessness, when they care not what they do. We find that whole communities suddenly fix their minds upon one object, and go mad in its pursuit; that millions of people become simultaneously impressed with one delusion, and run after it, till their attention is caught by some new folly more captivating than the first.

He first published that in 1841. I am not sure how much has changed.
Gary Marcus first started warning of neural-net-induced hallucinations in his 2001 analysis of multilayer perceptrons and their cognitive limitations, The Algebraic Mind.

Marcus on AI is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.