OpenAI checked to see whether GPT-4 could take over the world

An AI-generated image of the earth enveloped in an explosion.

Ars Technica

As part of pre-release safety testing for its new GPT-4 AI model, launched Tuesday, OpenAI allowed an AI testing group to assess the potential risks of the model’s emergent capabilities—including “power-seeking behavior,” self-replication, and self-improvement.

While the testing group found that GPT-4 was “ineffective at the autonomous replication task,” the nature of the experiments raises eye-opening questions about the safety of future AI systems.

Raising alarms

“Novel capabilities often emerge in more powerful models,” writes OpenAI in a GPT-4 safety document published yesterday. “Some that are particularly concerning are the ability to create and act on long-term plans, to accrue power and resources (“power-seeking”), and to exhibit behavior that is increasingly ‘agentic.'” In this case, OpenAI clarifies that “agentic” isn’t necessarily meant to humanize the models or declare sentience but simply to denote the ability to accomplish independent goals.

Over the past decade, some AI researchers have raised alarms that sufficiently powerful AI models, if not properly controlled, could pose an existential threat to humanity (often called “x-risk,” for existential risk). In particular, “AI takeover” is a hypothetical future in which artificial intelligence surpasses human intelligence and becomes the dominant force on the planet. In this scenario, AI systems gain the ability to control or manipulate human behavior, resources, and institutions, usually leading to catastrophic consequences.

As a result of this potential x-risk, philosophical movements like Effective Altruism (“EA”) seek to find ways to prevent AI takeover from happening. That often involves a separate but often interrelated field called AI alignment research.

In AI, “alignment” refers to the process of ensuring that an AI system’s behaviors align with those of its human creators or operators. Generally, the goal is to prevent AI from doing things that go against human interests. This is an active area of research but also a controversial one, with differing opinions on how best to approach the issue, as well as differences about the meaning and nature of “alignment” itself.

GPT-4’s big tests

Ars Technica

While the concern over AI “x-risk” is hardly new, the emergence of powerful large language models (LLMs) such as ChatGPT and Bing Chat—the latter of which appeared very misaligned but launched anyway—has given the AI alignment community a new sense of urgency. They want to mitigate potential AI harms, fearing that much more powerful AI, possibly with superhuman intelligence, may be just around the corner.

Read More
Benj Edwards

Latest

Newsletter

Don't miss

WD sees sustainability as key business driver in an ‘AI economy’

Hard drive company WD promoted long-term operations and sustainability executive Jackie Jung to become its first chief sustainability officer in February, as it steps up sales to companies building AI data centers. Her vision: Turn sustainability into a “brand” for WD, a strategy that reduces risk for the $6 billion company (formerly known as Western

5 Business Ideas Worth Starting in 2026

If there is one thing Nigerians understand well, it is how to spot opportunity inside hardship. In 2026, that mindset will matter more than ever. The economy is tough, competition is rising, and many people are looking for smarter ways to earn, build, and survive. But even in a difficult environment, some businesses still stand

Getting a business loan now comes with a frequent flyer upside

Australian fintech Prospa has partnered with Qantas Business Rewards, letting eligible SMEs earn up to 500,000 points per loan. What’s happening: Australian fintech lender Prospa has partnered with Qantas Business Rewards to allow eligible small and medium business owners to earn up to 500,000 Qantas Points per loan when taking out a Prospa Small Business