Subj : Claude just beat GPT-5, Gemini, and Grok in real-world job tasks,
To   : All
From : TechnologyDaily
Date : Mon Sep 29 2025 11:30:08

Claude just beat GPT-5, Gemini, and Grok in real-world job tasks, according 
to OpenAI's own study

Date:
Mon, 29 Sep 2025 10:13:04 +0000

Description:
According to OpenAI, Claude is the top AI model for getting actual work done

FULL STORY
======================================================================
OpenAI has released GDPval, a new evaluation system that tests how AI 
performs at work-related tasks.
Claude Opus 4.1 comes out in the lead, with GPT-5 high in second place.
Tasks include things like emailing a response to a dissatisfied customer.

We're all familiar with AI benchmarks, which measure performance at specific 
tasks, but these tasks often don't reflect the real world or how people 
actually use AI, especially at work. 

To address this problem, OpenAI, the maker of ChatGPT, is introducing GDPval, 
a new way of measuring AI model performance on real-world work tasks, 
comparing the models' output against the work of industry experts across 44 
occupations, from software developers and lawyers to registered nurses and 
mechanical engineers. 

Surprisingly, the OpenAI study shows that the best-performing model was 
Anthropic's Claude Opus 4.1, which outpaced not only OpenAI's GPT-5 but also 
Gemini and Grok. 

[Image: GDPval win rate. Credit: OpenAI]

This graph shows the overall GDPval win rate (how often the AI's work was 
rated better than an industry expert's). Claude Opus 4.1 is out in the lead 
with a win rate of 47.6%, with GPT-5 high second at 38.8% and o3 high at 
34.1%. GPT-4o scores the lowest, with a win rate of 12.4%, significantly 
behind both Grok 4 and Gemini 2.5 Pro. 
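To make the win-rate figures above concrete, here is a minimal sketch of how 
such a percentage can be computed. It assumes each task yields a single 
verdict comparing the model's deliverable with an expert's ("win", "tie", or 
"loss"); the verdict list and the win_rate helper are hypothetical and are 
not OpenAI's grading code. Because the article does not say how ties are 
treated, the helper can report wins alone or wins plus ties.

```python
from collections import Counter

# Hypothetical verdicts for one model, one per graded task (illustrative only):
# "win"  = the model's deliverable was preferred over the expert's,
# "tie"  = the two were judged equally good,
# "loss" = the expert's deliverable was preferred.
verdicts = ["win", "loss", "tie", "win", "loss", "loss", "win", "tie", "loss", "loss"]


def win_rate(verdicts, count_ties=False):
    """Return the percentage of comparisons the model won.

    If count_ties is True, ties are counted as favourable outcomes too
    (an assumption; the article does not specify how ties are scored).
    """
    if not verdicts:
        raise ValueError("need at least one verdict")
    counts = Counter(verdicts)
    favourable = counts["win"] + (counts["tie"] if count_ties else 0)
    return 100 * favourable / len(verdicts)


print(f"{win_rate(verdicts):.1f}%")                   # wins only: 30.0%
print(f"{win_rate(verdicts, count_ties=True):.1f}%")  # wins plus ties: 50.0%
```

On this made-up data the model wins 3 of 10 comparisons (30.0%), or 5 of 10 
if ties count (50.0%), which shows why the tie rule matters when comparing 
published numbers.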

The study found that Claude was the highest-performing model in eight of the 
nine industry sectors tested, including government, health care, and social 
assistance. 

[Image: Claude Opus 4.1 leads across a diverse range of work-related tasks. 
Credit: OpenAI]

Examples of the tasks include things like emailing a response to a 
dissatisfied customer requesting a return, optimizing a table layout for a 
Spring vendor fair, and auditing price inconsistencies in purchase orders. 

What's in a name? 

The name GDPval comes from Gross Domestic Product (GDP), a key economic 
indicator. OpenAI wants GDPval to be widely adopted to help ground 
conversations about future AI improvements in evidence rather than guesswork. 

Releasing results that show a competitor out in front looks like an exercise 
in radical transparency by OpenAI, but it fits the company's stated 
philosophy. "Our mission is to ensure that artificial general intelligence 
benefits all of humanity. As part of our mission, we want to transparently 
communicate progress on how AI models can help people in the real world," 
reads a statement from OpenAI. 

The paper, which is available to read in full online, comes a week after 
OpenAI released a more consumer-focused paper showing that the majority of 
ChatGPT use (70%) was personal rather than work-related. 

That earlier study was conducted by OpenAI's Economic Research team and 
Harvard economist David Deming for the National Bureau of Economic Research 
(NBER). Its results surprised a lot of people, since new ChatGPT releases had 
previously focused heavily on work-related tasks like coding, making 
presentations, and research. 

The news that Claude Opus 4.1 outperforms even GPT-5 high at actual 
work-related tasks, not just benchmarks, could prompt OpenAI to refocus on its 
changing user base. 



======================================================================
Link to news story:
https://www.techradar.com/ai-platforms-assistants/claude/claude-just-beat-gpt-5-gemini-and-grok-in-real-world-job-tasks-according-to-openais-own-study


--- Mystic BBS v1.12 A49 (Linux/64)
 * Origin: tqwNet Technology News (1337:1/100)
