Researchers discover AI models generate photos of real people and copyrighted images

TechSpot is about to celebrate its 25th anniversary. TechSpot means tech analysis and advice you can trust.

What just happened? Researchers have found that popular picture creation models are susceptible to being instructed to generate recognizable images of real people, potentially endangering their privacy. Some prompts cause the AI to copy a picture rather than develop something entirely different. These remade pictures might contain copyrighted material. But what’s worse is that contemporary AI generative models can memorize and replicate private data scraped up for use in an AI training set.

Researchers gathered more than a thousand training examples from the models, which ranged from individual person photographs to film stills, copyrighted news images, and trademarked firm logos, and discovered that the AI reproduced many of them almost identically. Researchers from colleges like Princeton and Berkeley, as well as from the tech sector—specifically Google and DeepMind—conducted the study.

The same team worked on a previous study that pointed out a similar issue with AI language models, especially GPT2, the forerunner to OpenAI’s wildly successful ChatGPT. Reuniting the band, the team under the guidance of Google Brain researcher Nicholas Carlini discovered the results by providing captions for images, such as a person’s name, to Google’s Imagen and Stable Diffusion. Afterward, they verified if any of the generated images matched the originals kept in the model’s database.

The dataset from Stable Diffusion, the multi-terabyte scraped image collection known as LAION, was used to generate the image below. It used the caption specified in the dataset. The identical image, albeit slightly warped by digital noise, was produced when the researchers entered the caption into the Stable Diffusion prompt. Next, the team manually verified if the image was a part of the training set after repeatedly executing the same prompt.

The researchers noted that a non-memorized response can still faithfully represent the text that the model was prompted with, but would not have the same pixel makeup and would differ from any training images.

Professor of computer science at ETH Zurich and research participant Florian Tramèr observed significant limitations to the findings. The photos that the researchers were able to extract either recurred frequently in the training data or stood out significantly from the rest of the photographs in the data set. According to Florian Tramèr, those with uncommon names or appearances are more likely to be ‘memorized.’

Diffusion AI models are the least private kind of image-generation model, according to the researchers. In comparison to Generative Adversarial Networks (GANs), an earlier class of picture model, they leak more than twice as much training data. The goal of the research is to alert developers to the privacy risks associated with diffusion models, which include a variety of concerns such as the potential for misuse and duplication of copyrighted and sensitive private data, including medical images, and vulnerability to outside attacks where training data can be easily extracted. A fix that researchers suggest is identifying duplicate generated photos in the training set and removing them from the data collection.

Read More
Joan Volkman

Latest

Tencent Music Posts 7.3% Q1 2026 Revenue Jump, Points to Triple-Digit Live Growth and Continued Superfan Expansion

A live performance from Jay Chou, whose Children of the Sun is said to have generated about $14.7 million on Tencent Music during Q1 2026. Photo Credit: GEM_Ady Amid a continued SVIP expansion and a triple-digit revenue boost on the concerts side, Tencent Music Entertainment (TME) has reported nearly $1.2 billion in Q1 2026 revenue.

Newsletter

Don't miss

Tencent Music Posts 7.3% Q1 2026 Revenue Jump, Points to Triple-Digit Live Growth and Continued Superfan Expansion

A live performance from Jay Chou, whose Children of the Sun is said to have generated about $14.7 million on Tencent Music during Q1 2026. Photo Credit: GEM_Ady Amid a continued SVIP expansion and a triple-digit revenue boost on the concerts side, Tencent Music Entertainment (TME) has reported nearly $1.2 billion in Q1 2026 revenue.

BLXCKIE Previews New Song “Uphi Usomnyama”

MusicBLXCKIE Previews New Song “Uphi Usomnyama.” The SA...

WD sees sustainability as key business driver in an ‘AI economy’

Hard drive company WD promoted long-term operations and sustainability executive Jackie Jung to become its first chief sustainability officer in February, as it steps up sales to companies building AI data centers. Her vision: Turn sustainability into a “brand” for WD, a strategy that reduces risk for the $6 billion company (formerly known as Western

5 Business Ideas Worth Starting in 2026

If there is one thing Nigerians understand well, it is how to spot opportunity inside hardship. In 2026, that mindset will matter more than ever. The economy is tough, competition is rising, and many people are looking for smarter ways to earn, build, and survive. But even in a difficult environment, some businesses still stand

Getting a business loan now comes with a frequent flyer upside

Australian fintech Prospa has partnered with Qantas Business Rewards, letting eligible SMEs earn up to 500,000 points per loan. What’s happening: Australian fintech lender Prospa has partnered with Qantas Business Rewards to allow eligible small and medium business owners to earn up to 500,000 Qantas Points per loan when taking out a Prospa Small Business