{"id":875124,"date":"2025-10-02T21:33:59","date_gmt":"2025-10-03T02:33:59","guid":{"rendered":"https:\/\/newsycanuse.com\/index.php\/2025\/10\/02\/frontier-ai-models-rival-industry-experts-work-quality-openai\/"},"modified":"2025-10-02T21:33:59","modified_gmt":"2025-10-03T02:33:59","slug":"frontier-ai-models-rival-industry-experts-work-quality-openai","status":"publish","type":"post","link":"https:\/\/newsycanuse.com\/index.php\/2025\/10\/02\/frontier-ai-models-rival-industry-experts-work-quality-openai\/","title":{"rendered":"Frontier AI models rival industry experts\u2019 work quality: OpenAI"},"content":{"rendered":"<div>\n<p><a href=\"https:\/\/coingeek.com\/news\/tag\/openai\/\" target=\"_blank\" rel=\"noreferrer noopener\">OpenAI<\/a>, one of the world\u2019s leading artificial intelligence (AI) developers, has introduced a new evaluation tool designed to help track the performance of AI models on real-world tasks. It found that human-produced work still outperforms AI in quality when rated against the leading AI models, but the difference is increasingly narrow.<\/p>\n<p>On September 25, OpenAI <a href=\"https:\/\/openai.com\/index\/gdpval\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">announced<\/a> the evaluation tool \u2018GDPval,\u2019 which examined 44 occupations selected from the top nine industries contributing to the United States gross domestic product (GDP). 
Based on the results, human work was still rated higher in quality than each AI model\u2019s output in at least 50% of the tasks.<\/p>\n<p>The best-performing model, Claude Opus 4.1, won or drew in 47% of tasks when compared to the human output. Meanwhile, <a href=\"https:\/\/coingeek.com\/gpt-5-enterprise-ai-future-of-scalable-blockchain-integration\/\" target=\"_blank\" rel=\"noreferrer noopener\">GPT\u20115<\/a> \u201cexcelled in particular on accuracy,\u201d winning or drawing in 38.8% of tasks.<\/p>\n<p>The evaluation compared the output of industry experts across 220 tasks\u2014representing the types of \u201cday-to-day work\u201d where AI can meaningfully assist professionals\u2014with the output of several leading AI models, namely GPT\u20114o, o4-mini, OpenAI o3, GPT\u20115, Claude Opus 4.1, Gemini 2.5 Pro, and Grok 4. The quality of the output was then vetted by professionals with over 14 years of experience in the fields being tested.<\/p>\n<p>The results offer some reassurance to those <a href=\"https:\/\/coingeek.com\/ai-isnt-taking-your-job-ceos-are\/\" target=\"_blank\" rel=\"noreferrer noopener\">concerned about losing their jobs to AI<\/a>, suggesting that humans still have the edge in quality, for now. However, given the rapid pace of <a href=\"https:\/\/coingeek.com\/chatgpt-evolution-in-2024-what-next\/\" target=\"_blank\" rel=\"noreferrer noopener\">AI\u2019s evolution<\/a>, the technology is clearly catching up.<\/p>\n<p>The evaluation also revealed that AI was already far outstripping humans on other criteria. Specifically, frontier AI models can complete GDPval tasks \u201croughly 100x faster and 100x cheaper than industry experts,\u201d indicating that, if and when the technology does catch up, businesses may be left with an obvious cost-cutting choice.<\/p>\n<p>\u201cAs AI becomes more capable, it will likely cause changes in the job market,\u201d said OpenAI. 
\u201cEarly GDPval results show that models can already take on some repetitive, well-specified tasks faster and at lower cost than experts.\u201d<\/p>\n<p>However, the company qualified this by adding that \u201cmost jobs are more than just a collection of tasks that can be written down. GDPval highlights where AI can handle routine tasks so people can spend more time on the creative, judgment-heavy parts of work.\u201d<\/p>\n<div>\n<p>This, suggested OpenAI, could be the <a href=\"https:\/\/coingeek.com\/ai-in-the-workplace\/\" target=\"_blank\" rel=\"noreferrer noopener\">future of the workplace<\/a>. Rather than simply replacing human jobs, AI can be used to complement and enhance human work.<\/p>\n<p>\u201cWhen AI complements workers in this way it can translate into significant economic growth,\u201d said the company. \u201cOur goal is to keep everyone on the \u2018up elevator\u2019 of AI by democratizing access to these tools, supporting workers through change, and building systems that reward broad contribution.\u201d\n<\/p>\n<\/div>\n<p>After examining the evaluation results, OpenAI said that it incrementally trained an internal, experimental version of GPT\u20115 to assess whether it could improve the model\u2019s performance. It found this process did produce better results for the model on the same evaluation, thus \u201ccreating a pathway for further potential improvement.\u201d<\/p>\n<p>According to OpenAI, GDPval represents the first version of this evaluation, which will be ongoing. 
It also noted that GDPval builds on previous evaluations, from Massive Multitask Language Understanding (MMLU) using exam-style questions across dozens of subjects, to more \u201capplied evaluations\u201d such as <a href=\"https:\/\/openai.com\/index\/introducing-swe-bench-verified\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">SWE-Bench<\/a> (software engineering bug-fixing tasks) and <a href=\"https:\/\/openai.com\/index\/paperbench\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Paper-Bench<\/a> (scientific reasoning and critique on research papers).<\/p>\n<p>\u201cPeople often speculate about AI\u2019s broader impact on society, but the clearest way to understand its potential is by looking at what models are already capable of doing,\u201d said OpenAI. \u201cPrevious AI evaluations like challenging academic tests and competitive coding challenges have been essential in pushing the boundaries of model reasoning capabilities, but they often fall short of the kind of tasks that many people handle in their everyday work.\u201d<\/p>\n<p>In contrast, GDPval measures models\u2019 performance on tasks drawn \u201cdirectly from the real-world knowledge work of experienced professionals across a wide range of occupations and sectors, providing a clearer picture on how models perform on economically valuable tasks,\u201d said the company.<\/p>\n<p>OpenAI said it plans to expand GDPval to include more occupations, industries, and task types, with increased interactivity, as well as more tasks involving navigating ambiguity. The long-term goal is to better measure the progress of AI on \u201cdiverse knowledge\u201d work.<\/p>\n<p>\u201cGDPval is an early step that doesn\u2019t reflect the full nuance of many economic tasks,\u201d said OpenAI. 
\u201cFuture versions will extend to more interactive workflows and context-rich tasks to better reflect the complexity of real-world knowledge work.\u201d<\/p>\n<p><em>In order for artificial intelligence (AI) to work right within the law and thrive in the face of growing challenges, it needs to integrate an enterprise blockchain system that ensures data input quality and ownership\u2014allowing it to keep data safe while also guaranteeing the immutability of data. <a href=\"https:\/\/coingeek.com\/news\/tag\/artificial-intelligence\/\" target=\"_blank\" rel=\"noreferrer noopener\">Check out CoinGeek\u2019s coverage<\/a> on this emerging tech to learn more about <a href=\"https:\/\/coingeek.com\/ai-needs-guardrails-enterprise-blockchain-has-a-role-to-play\/\" target=\"_blank\" rel=\"noreferrer noopener\">why enterprise blockchain will be the backbone of AI<\/a>.<\/em><\/p>\n<p>Watch: Can blockchain keep AI in check?<\/p>\n<p><iframe src=\"https:\/\/www.youtube.com\/embed\/28soehv0Avs?si=74dX3_Ys9GJ0aqk_&#038;controls=0\" title=\"YouTube video player\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe><\/p>\n<\/div>\n<p><a href=\"https:\/\/coingeek.com\/frontier-ai-models-rival-industry-experts-work-quality-openai\/\" class=\"button purchase\" rel=\"nofollow noopener\" target=\"_blank\">Read More<\/a><br \/>\n James Field<\/p>\n","protected":false},"excerpt":{"rendered":"<p>OpenAI, one of the world\u2019s leading artificial intelligence (AI) developers, has introduced a new evaluation tool designed to help track the performance of AI models on real-world tasks. 
It found that human-produced work still outperforms AI in quality when rated<\/p>\n","protected":false},"author":1,"featured_media":875125,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[25290,2483],"tags":[],"class_list":["post-875124","post","type-post","status-publish","format-standard","has-post-thumbnail","category-frontier","category-models"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts\/875124","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/comments?post=875124"}],"version-history":[{"count":0,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts\/875124\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/media\/875125"}],"wp:attachment":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/media?parent=875124"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/categories?post=875124"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/tags?post=875124"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}