OpenAI claims that GPT-5 matches human performance across numerous professions

By Bitget-RWA, 2025/09/25 17:54

On Thursday, OpenAI introduced a benchmark designed to measure how its AI models compare with human professionals across a range of industries and occupations. The benchmark, called GDPval, is an early attempt to gauge how close OpenAI's technology is to outperforming humans at economically valuable work, a key part of the company's founding mission to develop artificial general intelligence (AGI).

According to OpenAI, its GPT-5 model and Anthropic’s Claude Opus 4.1 “are already nearing the level of work produced by top industry professionals.”

However, this doesn't mean OpenAI's models will immediately take over human jobs. While some business leaders predict AI will replace human workers within a few years, OpenAI concedes that GDPval currently covers only a narrow set of the tasks people perform in their actual jobs. Still, it is one of the latest metrics the company uses to track AI's progress toward AGI.

GDPval focuses on the nine industries that contribute the most to U.S. gross domestic product, including healthcare, finance, manufacturing, and government. Within these industries, the benchmark measures AI performance across 44 occupations, such as software developers, nurses, and journalists.

For the initial version, GDPval-v0, OpenAI asked experienced professionals to compare AI-generated reports with reports written by their human peers and choose the better one. In one task, for instance, investment bankers produced a competitive analysis of the last-mile delivery sector, which was then compared with AI-generated versions. OpenAI then averaged the AI model's "win rate" against the human-written reports across all 44 occupations.
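The averaging of grader verdicts into a single win rate can be sketched in a few lines of Python. This is a hypothetical illustration, not OpenAI's grading code: it assumes each comparison is recorded as a "win", "tie", or "loss" for the model, counts wins and ties together (matching the "as good as or better than" framing of the results), and assumes a per-occupation rate is computed first and then averaged across occupations. The article does not specify OpenAI's exact aggregation, and the sample verdicts are invented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical record: (occupation, outcome), where outcome is an expert's
# verdict on the model's deliverable versus the human-written one.
Judgment = tuple[str, str]

def win_or_tie_rate(judgments: list[Judgment]) -> float:
    """Average, across occupations, of the share of comparisons in which
    the model's output was judged better than or as good as the human's.

    Assumption: every occupation is weighted equally, regardless of how
    many comparisons it contains. Raises StatisticsError if empty.
    """
    by_occupation: dict[str, list[str]] = defaultdict(list)
    for occupation, outcome in judgments:
        if outcome not in {"win", "tie", "loss"}:
            raise ValueError(f"unknown outcome: {outcome!r}")
        by_occupation[occupation].append(outcome)

    # Share of wins plus ties within each occupation.
    per_occupation = [
        sum(o in ("win", "tie") for o in outcomes) / len(outcomes)
        for outcomes in by_occupation.values()
    ]
    return mean(per_occupation)

# Invented sample verdicts, for illustration only.
sample = [
    ("investment banker", "win"),
    ("investment banker", "loss"),
    ("nurse", "tie"),
    ("nurse", "loss"),
    ("nurse", "loss"),
    ("journalist", "win"),
]
print(f"{win_or_tie_rate(sample):.1%}")  # prints 61.1%
```

Weighting occupations equally keeps a heavily sampled profession from dominating the headline number; pooling all comparisons into one fraction would be an equally plausible design, and the two can give different results.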

According to OpenAI, GPT-5-high, a version of GPT-5 that runs with more computing power, was rated as good as or better than industry experts in 40.6% of cases.

OpenAI also evaluated Anthropic's Claude Opus 4.1, which was rated as good as or better than industry experts in 49% of tasks. OpenAI attributes Claude's high score partly to its tendency to produce visually appealing graphics, rather than to raw performance alone.

Image Credits: OpenAI

It's important to note that most professionals do far more than submit research reports, the only kind of work GDPval-v0 tests. OpenAI acknowledges this limitation and plans to develop more comprehensive benchmarks that cover a wider range of industries and more interactive job functions.

Even so, the company considers the progress shown by GDPval to be significant.

Speaking with TechCrunch, OpenAI’s chief economist Dr. Aaron Chatterji noted that GDPval’s findings indicate people in these roles can now leverage AI to focus on more meaningful aspects of their work.

"[Since] the model is becoming proficient at some of these tasks," Chatterji said, "workers in these positions can increasingly rely on the model to handle certain responsibilities, freeing them up to pursue higher-value activities."

Tejal Patwardhan, who leads evaluations at OpenAI, told TechCrunch she is encouraged by the rapid progress GDPval reveals. OpenAI's GPT-4o model, released about 15 months ago, achieved a win-or-tie rate of only 13.7% against humans. GPT-5's score is nearly three times higher, and Patwardhan expects that trend to continue.
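The "nearly three times" comparison is simple arithmetic on the two reported win-or-tie rates. The snippet below assumes the GPT-5 figure being compared is the 40.6% reported for GPT-5-high, since the article does not say which GPT-5 variant Patwardhan was referring to.

```python
# Win-or-tie rates against human experts, as reported in the article (percent).
gpt_4o_rate = 13.7
# Assumption: the GPT-5 figure in the comparison is the GPT-5-high result.
gpt_5_high_rate = 40.6

ratio = gpt_5_high_rate / gpt_4o_rate
print(f"GPT-5-high vs GPT-4o: {ratio:.2f}x")  # prints GPT-5-high vs GPT-4o: 2.96x
```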

The tech industry uses a variety of benchmarks to gauge the progress of AI models and determine whether a model is truly cutting-edge. Popular examples include AIME 2025, which tests competitive math skills, and GPQA Diamond, which assesses PhD-level science knowledge. However, many AI models are now close to maxing out these tests, and researchers are calling for new benchmarks that better evaluate real-world capabilities.

Benchmarks like GDPval may play a growing role in this discussion, as OpenAI argues its models can benefit a broad spectrum of industries. Still, a more thorough version of the test may be necessary before OpenAI can definitively claim its AI surpasses human performance.

