Frontier AI models rival industry experts’ work quality: OpenAI


OpenAI, one of the world’s leading artificial intelligence (AI) developers, has introduced a new evaluation tool designed to track how AI models perform on real-world tasks. It found that work produced by human experts is still rated higher in quality than that of the leading AI models, but the gap is narrowing.

On September 25, OpenAI announced the evaluation tool ‘GDPval,’ which examined 44 occupations selected from the top nine industries contributing to United States gross domestic product (GDP). Based on the results, expert work was still rated higher than the output of every AI model tested in more than half of the tasks.

The best-performing model, Claude Opus 4.1, won or drew in 47% of tasks when compared with the human output. Meanwhile, GPT‑5 “excelled in particular on accuracy,” winning or drawing in 38.8% of tasks.

The evaluation covered 220 tasks representing the types of “day-to-day work” where AI can meaningfully assist professionals. For each task, the output of industry experts was compared with that of several leading AI models, namely GPT‑4o, o4-mini, OpenAI o3, GPT‑5, Claude Opus 4.1, Gemini 2.5 Pro, and Grok 4. The outputs were then graded by professionals with over 14 years of experience in the fields being tested.

The results offer some reassurance to those worried about losing their jobs to AI: for now, humans still have the edge in quality. However, given how quickly AI models have improved, the technology is clearly catching up.

The evaluation also revealed that AI already far outstrips humans on other measures. Specifically, frontier AI models can complete GDPval tasks “roughly 100x faster and 100x cheaper than industry experts,” indicating that, if and when the technology does catch up on quality, businesses may face an obvious cost-cutting choice.

“As AI becomes more capable, it will likely cause changes in the job market,” said OpenAI. “Early GDPval results show that models can already take on some repetitive, well-specified tasks faster and at lower cost than experts.”

However, the company qualified this by adding that “most jobs are more than just a collection of tasks that can be written down. GDPval highlights where AI can handle routine tasks so people can spend more time on the creative, judgment-heavy parts of work.”

This, OpenAI suggested, could be the future of the workplace: rather than simply replacing human jobs, AI can be used to complement and enhance human work.

“When AI complements workers in this way it can translate into significant economic growth,” said the company. “Our goal is to keep everyone on the ‘up elevator’ of AI by democratizing access to these tools, supporting workers through change, and building systems that reward broad contribution.”

After examining the evaluation results, OpenAI said it had incrementally trained an internal, experimental version of GPT‑5 to see whether its performance could be improved. The additional training did produce better results on the same evaluation, the company said, “creating a pathway for further potential improvement.”

According to OpenAI, this release is the first version of GDPval, which will be an ongoing evaluation. The company noted that GDPval builds on earlier evaluations, ranging from Massive Multitask Language Understanding (MMLU), which uses exam-style questions across dozens of subjects, to more “applied evaluations” such as SWE-Bench (software engineering bug-fixing tasks) and Paper-Bench (scientific reasoning and critique on research papers).

“People often speculate about AI’s broader impact on society, but the clearest way to understand its potential is by looking at what models are already capable of doing,” said OpenAI. “Previous AI evaluations like challenging academic tests and competitive coding challenges have been essential in pushing the boundaries of model reasoning capabilities, but they often fall short of the kind of tasks that many people handle in their everyday work.”

In contrast, GDPval measures models’ performance on tasks drawn “directly from the real-world knowledge work of experienced professionals across a wide range of occupations and sectors, providing a clearer picture on how models perform on economically valuable tasks,” said the company.

OpenAI said it plans to expand GDPval to include more occupations, industries, and task types, with increased interactivity, as well as more tasks involving navigating ambiguity. The long-term goal is to better measure the progress of AI on “diverse knowledge” work.

“GDPval is an early step that doesn’t reflect the full nuance of many economic tasks,” said OpenAI. “Future versions will extend to more interactive workflows and context-rich tasks to better reflect the complexity of real-world knowledge work.”

James Field
