Confidential health records from UK BioBank project exposed online

Confidential health data has been exposed online on dozens of occasions, a Guardian investigation can reveal, raising questions about the safeguarding of patient records by one of the UK’s flagship medical research projects.

UK Biobank, which holds the medical records of 500,000 British volunteers, is one of the world’s most comprehensive stores of health information and is credited with driving breakthroughs in cancer, dementia and diabetes research. But scientists approved to access Biobank’s sensitive data appear to have sometimes been cavalier about its security.

The files, which seem to have been inadvertently posted online by researchers using the data, do not include names or addresses, but they may still pose privacy concerns. One dataset found by the Guardian contained millions of hospital diagnoses and associated dates for more than 400,000 participants.

With the consent of a Biobank volunteer, the Guardian was able to pinpoint what appeared to be extensive hospital diagnosis records for the volunteer, using only their month and year of birth and details of a major surgery they had undergone.

One data expert said the scale and persistence of the problem was “shocking” at a time when AI and social media were making it ever easier to cross-reference information online.

UK Biobank rejected the concerns, saying that no identifying data, such as names and addresses, were provided to researchers.

In a statement, Prof Sir Rory Collins, the chief executive of UK Biobank, said: “We have never seen any evidence of any UK Biobank participant being re-identified by others.”

’They said they would hold our data securely’

Founded in 2003 by the Department of Health and medical research charities, UK Biobank holds genome sequences, scans, blood samples and lifestyle information of 500,000 volunteers. Last month, the government extended Biobank’s access to volunteers’ GP records.

Scientists at universities and private companies across the world apply for access and, until late 2024, were free to download data directly on to their own computer systems.

Before this point, data had been inadvertently published online and Biobank appears to still be grappling with the problem.

The issue emerged because journals and funders increasingly require researchers to publish the code they have used to analyse large datasets. When intending to upload code, some researchers have also accidentally published partial or entire Biobank datasets to GitHub, a popular online code-sharing platform. UK Biobank prohibits researchers from sharing data outside their systems and says it has introduced further training for all researchers.

In the past year, the data leaks appear to have become a more urgent concern to UK Biobank. Between July and December 2025, it issued 80 legal notices to GitHub, which has complied with requests to remove data from the internet. Yet much still remains available.

Some of the data files contain just patient IDs, or test results for small numbers, others are more extensive. One dataset found online by the Guardian in January contained hospital diagnoses and associated diagnosis dates for about 413,000 participants, along with their sex and month and year of birth.

A data expert, who reviewed the file said: “It sent shivers down my spine to even open. I deleted the file immediately. It was very detailed and felt like a gross invasion of privacy even to glance at.”

To test the risk of re-identification, the Guardian approached several Biobank volunteers, two of whom had undergone medical procedures in the timeframe within the data and agreed to share these details with an external data scientist.

One volunteer, who provided treatment dates for a fracture and seizure, could not be located in the dataset. A second volunteer, a woman in her 70s, shared her month and year of birth and the month and year she had a hysterectomy. Only one person in the dataset matched these details. The apparent match was corroborated by five other diagnoses from the records that the volunteer had not initially disclosed.

“Effectively you were rehearsing the main parts of my medical history to me without me having given you any information at all. I didn’t expect that,” the volunteer said.

The woman said she was not too concerned about her own data being exposed and intended to remain a participant, saying that she viewed UK Biobank’s work as “extremely important”. But, she added: “I’m more concerned about whether Biobank has broken its agreement with people. They said they would hold our data securely … I just feel as though that has to come into the equation.”

UK Biobank said the re-identification scenario tested by the Guardian did not highlight a privacy risk because without additional information it would be impossible to identify individuals.

A Biobank spokesperson said: “As we have communicated to our participants, including on our website: ‘If a participant puts information that reveals something about their health and identity, such as genealogy data, on a public website, this could make it possible for their identity to be discovered by cross-referencing UK Biobank research data.’

“You have simply demonstrated why we tell participants not to do this.”

The spokesperson added that Biobank had taken extensive measures to protect participants’ privacy, including proactively searching GitHub, contacting researchers directly and issuing legal takedown notices, actions which they said had led to about 500 repositories being removed. Many of these, it said, contained only patient IDs, not health data.

‘There are tensions between driving research with data and protecting privacy’

Privacy experts said UK Biobank’s approach appeared at odds with the reality that many people, reasonably, shared some health information online and that in an age of AI this could readily be identified and cross-referenced.

“Are these people aware that the internet exists?” asked Prof Felix Ritchie, an economist at the University of the West of England. “The idea that they can rely on their volunteers never putting any other information out there about themselves is an entirely unreasonable thing to expect.”

Dr Luc Rocher, associate professor at the Oxford Internet Institute, who reviewed several Biobank datasets found online, said that removing identifiers often did not guarantee anonymity and that simply knowing a person’s birthday and, say, the date they broke a leg might be enough to pinpoint their record with high confidence.

“Once identified, that record could reveal sensitive information such as a psychiatric diagnosis, an HIV test result, or a history of drug abuse,” they said.

Prof Niels Peek, professor of data science and healthcare improvement at the University of Cambridge, said the scale of the problem was “shocking”. “If it had happened once or 10 times I’d probably say: ‘It’s not great that it’s happened but at the same time zero risk is impossible,’” he said. “Hundreds. That’s a little bit too much.”

In Peek’s view, Biobank’s actions show it has taken the issue seriously and “done everything that one can reasonably expect”. But, he added: “The scale and persistence with which this has happened demonstrates that there are huge tensions between the ambition to drive health research with data at scale and the legal and ethical imperative to protect people’s privacy.”

Experts questioned whether Biobank will be able to fully regain control of the data released online. Despite researchers and GitHub having taken down most of the offending repositories in response to Biobank’s requests, many of the relevant files remained available on a code archive website until shortly before publication.

Additional reporting by Luke Hoyland

Michele Lupo
Read More

Latest

Abortion bans lead to worse outcomes for miscarriages

🛡️ Just a quick check We’re checking your connection to prevent automated abuse

Kids Keep Getting Stuck in Hospitals, Even After Being Cleared for Discharge

Overwhelmed by the demands of caregiving, Quette dialed 911 when she found her teenage son downstairs in their kitchen struggling to breathe. He had rolled his wheelchair to the oven to keep himself warm as he tried to regulate his temperature, she recalled, and was drenched in sweat from an apparent infection. In that moment

Edimakor Mac V4.8.0 Elevates AI Music with Lyria 3 Pro & Integrates Seedance 2.0

Music NEW YORK, NY, April 18, 2026 /24-7PressRelease/ --...

Justin Bieber turns Coachella 2026 into a $5M merch empire

Music Please enable JS and disable any ad blockerRead...

Newsletter

Don't miss

Abortion bans lead to worse outcomes for miscarriages

🛡️ Just a quick check We’re checking your connection to prevent automated abuse

Kids Keep Getting Stuck in Hospitals, Even After Being Cleared for Discharge

Overwhelmed by the demands of caregiving, Quette dialed 911 when she found her teenage son downstairs in their kitchen struggling to breathe. He had rolled his wheelchair to the oven to keep himself warm as he tried to regulate his temperature, she recalled, and was drenched in sweat from an apparent infection. In that moment

Edimakor Mac V4.8.0 Elevates AI Music with Lyria 3 Pro & Integrates Seedance 2.0

Music NEW YORK, NY, April 18, 2026 /24-7PressRelease/ --...

Justin Bieber turns Coachella 2026 into a $5M merch empire

Music Please enable JS and disable any ad blockerRead...

Apple Music announces RCee as the latest ‘Up Next’ artiste in Ghana

Music Music Something's wrong here... It looks like nothing was...

Tesla’s Business Has Become Much More Diversified in Just the Past Five Years. Does That Make Its Stock a Better Buy Today?

Key Points Tesla's energy generation and storage segment generated 27% revenue growth last year. The company's non-automotive segments were able to help offset a double-digit decline in auto revenue in 2025. These 10 stocks could mint the next wave of millionaires › Tesla (NASDAQ: TSLA) is known for its electric vehicles (EVs), and while they

WD sees sustainability as key business driver in an ‘AI economy’

Hard drive company WD promoted long-term operations and sustainability executive Jackie Jung to become its first chief sustainability officer in February, as it steps up sales to companies building AI data centers. Her vision: Turn sustainability into a “brand” for WD, a strategy that reduces risk for the $6 billion company (formerly known as Western

5 Business Ideas Worth Starting in 2026

If there is one thing Nigerians understand well, it is how to spot opportunity inside hardship. In 2026, that mindset will matter more than ever. The economy is tough, competition is rising, and many people are looking for smarter ways to earn, build, and survive. But even in a difficult environment, some businesses still stand