AI models spit out photos of real people and copyrighted images

Popular image generation models can be prompted to produce identifiable photos of real people, potentially threatening their privacy, according to new research. The work also shows that these AI systems can be made to regurgitate exact copies of medical images and copyrighted work by artists. It’s a finding that could strengthen the case for artists who are currently suing AI companies for copyright violations.

The researchers, from Google, DeepMind, UC Berkeley, ETH Zürich, and Princeton, got their results by prompting Stable Diffusion and Google’s Imagen many times with captions for images, such as a person’s name. Then they analyzed whether any of the generated images matched originals in the model’s training data. The group managed to extract over 100 replicas of images in the AI’s training set.
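At its core, the approach is simple: sample the model repeatedly with the same caption and flag any output that lands unusually close to a known training image. The sketch below illustrates that loop under stated assumptions; the `generate` function, the pixel-distance metric, and the threshold are all illustrative stand-ins, not the paper's actual method.

```python
import numpy as np

def l2_distance(a, b):
    """Mean squared pixel distance between two images (arrays in [0, 1])."""
    return float(np.mean((a - b) ** 2))

def find_memorized(generate, training_images, caption,
                   n_samples=500, threshold=0.01):
    """Sample the model repeatedly with one caption and flag any
    generation that lands unusually close to a training image,
    i.e. a likely memorized copy."""
    matches = []
    for _ in range(n_samples):
        img = generate(caption)
        for idx, train_img in enumerate(training_images):
            if l2_distance(img, train_img) < threshold:
                matches.append((idx, img))
    return matches
```

In practice the paper's authors used more robust near-duplicate detection than raw pixel distance, since a memorized image can differ slightly from its training original; the loop structure, however, is the same.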

These image-generating AI models are trained on vast data sets of images with text descriptions that have been scraped from the internet. The latest generation of the technology, known as diffusion models, works by gradually adding random noise to each training image until the original is nothing but static. The AI model then learns to reverse the process, turning noise back into a new image.
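The forward half of that process (destroying an image with noise) can be sketched in a few lines; this is a minimal illustration with made-up step counts and noise schedule, not any production model's code. Training then teaches a neural network to undo each noising step.

```python
import numpy as np

def add_noise(image, num_steps=1000, beta=0.02, rng=None):
    """Forward diffusion: repeatedly mix a little Gaussian noise into
    the image until it is statistically indistinguishable from pure
    noise. A generative model is trained to reverse these steps."""
    rng = rng or np.random.default_rng()
    x = image.copy()
    for _ in range(num_steps):
        noise = rng.standard_normal(x.shape)
        # Shrink the signal slightly and add a matching amount of noise,
        # so the per-pixel variance stays close to 1 throughout.
        x = np.sqrt(1 - beta) * x + np.sqrt(beta) * noise
    return x
```

After enough steps, essentially none of the original image survives in the output; memorization happens when the learned reverse process nonetheless reproduces a specific training image rather than a genuinely new one.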

The paper is the first time researchers have managed to prove that these AI models memorize images in their training sets, says Ryan Webster, a PhD student at the University of Caen Normandy in France, who has studied privacy in other image generation models but was not involved in the research. This could have implications for startups wanting to use generative AI models in health care, because it shows that these systems risk leaking sensitive private information. OpenAI, Google, and Stability.AI did not respond to our requests for comment. 

Eric Wallace, a PhD student at UC Berkeley who was part of the study group, says they hope to raise the alarm over the potential privacy issues around these AI models before they are rolled out widely in sensitive sectors like medicine. 

“A lot of people are tempted to try to apply these types of generative approaches to sensitive data, and our work is definitely a cautionary tale that that’s probably a bad idea, unless there’s some kind of extreme safeguards taken to prevent [privacy infringements],” Wallace says.

The extent to which these AI models memorize and regurgitate images from their training data is also at the root of a huge feud between AI companies and artists. Stability.AI is facing two lawsuits, one from a group of artists and another from Getty Images, who argue that the company unlawfully scraped and processed their copyrighted material.

The researchers’ findings could strengthen the hand of artists accusing AI companies of copyright violations. If artists whose work was used to train Stable Diffusion can prove that the model has copied their work without permission, the company might have to compensate them.

The findings are timely and important, says Sameer Singh, an associate professor of computer science at the University of California, Irvine, who was not involved in the research. “It is important for general public awareness and to initiate discussions around security and privacy of these large models,” he adds.

The paper demonstrates that it’s possible to work out whether AI models have copied images and measure to what degree this has happened, which are both very valuable in the long term, Singh says. 

Stable Diffusion is open source, meaning anyone can analyze and investigate it. Imagen is closed, but Google granted the researchers access. Singh says the work is a great example of how important it is to give research access to these models for analysis, and he argues that companies should be similarly transparent with other AI models, such as OpenAI’s ChatGPT. 

However, while the results are impressive, they come with some caveats. The images the researchers managed to extract appeared multiple times in the training data or were highly unusual relative to other images in the data set, says Florian Tramèr, an assistant professor of computer science at ETH Zürich, who was part of the group. 

People who look unusual or have unusual names are at higher risk of being memorized, says Tramèr.

The researchers were able to extract only a relatively small number of exact copies of individuals’ photos from the AI model: roughly one in a million generated images was a copy, according to Webster.

But that’s still worrying, Tramèr says: “I really hope that no one’s going to look at these results and say ‘Oh, actually, these numbers aren’t that bad if it’s just one in a million.’” 

“The fact that they’re bigger than zero is what matters,” he adds.

Melissa Heikkilä
