Subj : Claude just beat GPT-5, Gemini, and Grok in real-world job tasks,
To   : All
From : TechnologyDaily
Date : Mon Sep 29 2025 11:30:08

Claude just beat GPT-5, Gemini, and Grok in real-world job tasks, according to OpenAI's own study

Date: Mon, 29 Sep 2025 10:13:04 +0000

Description: According to OpenAI, Claude is the top AI model for getting actual work done

FULL STORY
======================================================================

OpenAI has released GDPval, a new evaluation system to test how AI performs at work-related tasks
Claude Opus 4.1 comes out in the lead, with 'ChatGPT-5 high' in second place
Tasks include things like emailing a response to a dissatisfied customer

We're all familiar with AI benchmarks, which measure performance at certain tasks, but these tasks often don't reflect the real world and how people actually use AI, especially at work.

To combat this problem, OpenAI, the maker of ChatGPT, is introducing GDPval, a new way of measuring AI model performance that compares a model's output on real-world work tasks with that of a human professional across 44 occupations, from software developers and lawyers to registered nurses and mechanical engineers.

Surprisingly, the OpenAI study shows that the best-performing model was Anthropic's Claude Opus 4.1, which outpaced not only OpenAI's GPT-5 but also Gemini and Grok.

[Figure: GDPval win rate (Image credit: OpenAI)]

This graph shows the overall GDPval win rate (how often the AI did better than an industry expert). Claude Opus 4.1 is out in the lead with a win rate of 47.6%, with ChatGPT-5 high coming second at 38.8% and ChatGPT o3 high third at 34.1%. ChatGPT-4o scores the lowest, with a win rate of 12.4%, which is significantly behind both Grok 4 and Gemini 2.5 Pro.

The study found that Claude was the highest-performing model in eight of the nine industry sectors it tested, including government, health care, and social assistance. The results clearly show that Claude Opus 4.1 leads across a diverse range of work-related tasks.
(Image credit: OpenAI)

Examples of the tasks include emailing a response to a dissatisfied customer requesting a return, optimizing a table layout for a spring vendor fair, and auditing price inconsistencies in purchase orders.

What's in a name?

The name used by OpenAI, GDPval, comes from the concept of Gross Domestic Product (GDP) as a key economic indicator. OpenAI wants GDPval to be widely adopted to help ground conversations about future AI improvements in evidence rather than guesswork.

Releasing results that show a competitor out in front appears to be an exercise in radical transparency by OpenAI, but that fits in perfectly with the company's philosophy.

"Our mission is to ensure that artificial general intelligence benefits all of humanity. As part of our mission, we want to transparently communicate progress on how AI models can help people in the real world," reads a statement from OpenAI.

The paper, which is available to read in its entirety online, comes a week after OpenAI released a more consumer-focused paper showing that the majority of ChatGPT users (70%) were actually using it at home, rather than at work. That study was conducted by OpenAI's Economic Research team and Harvard economist David Deming for the National Bureau of Economic Research (NBER).

The results surprised a lot of people, as new ChatGPT releases have previously focused heavily on work-related tasks like coding, making presentations, and being a good research tool.

The news that Claude Opus 4.1 is better than even ChatGPT-5 high at actual work-related tasks, not just benchmarks, could mean a renewed focus by OpenAI on its changing user base.
You might also like

OpenAI responds to furious ChatGPT subscribers who accuse it of secretly switching to inferior models
OpenAI reveals how people use ChatGPT, and the results might surprise you
ChatGPT's new Pulse feature will help you manage your day with handy visual updates

======================================================================
Link to news story:
https://www.techradar.com/ai-platforms-assistants/claude/claude-just-beat-gpt-5-gemini-and-grok-in-real-world-job-tasks-according-to-openais-own-study

--- Mystic BBS v1.12 A49 (Linux/64)
 * Origin: tqwNet Technology News (1337:1/100)