Drug discovery companies are customizing ChatGPT: here’s how


Large language models are helping scientists to converse with artificial intelligence and even to generate potential drug targets.

Much of the world has been transfixed in recent months by the appearance of text generation engines such as OpenAI’s ChatGPT, artificial intelligence (AI) algorithms capable of producing text that seems as if it were written by a human. While tech companies like Microsoft and Google are focused on using such engines to improve search, and others worry they could cause a rash of plagiarized essays, fake news and bad poetry, biotechs are looking to these algorithms to bolster their businesses by contributing to drug discovery in a variety of ways.

Companies are adopting large language models to aid drug discovery.
Credit: REUTERS / Alamy Stock Photo

Biotechs that already rely on AI in their search for new drugs can turn to text generation as a simple, intuitive way to interact with some of their other AI and machine learning tools. Andrew Beam, an epidemiologist at the Harvard T.H. Chan School of Public Health and head of machine learning at Generate Biomedicines, calls ChatGPT “a really interesting interface” that allows users to work with other forms of AI more easily than their current interfaces allow.

For example, Insilico Medicine of New York and Hong Kong, a company set up to search for potential drug targets with its AI-driven platform, is now using ChatGPT as a new way to interact with its target discovery platform, augmenting the relationships and integration provided by knowledge graphs, previously the main method for integrating data. Petrina Kamya, a computational chemist who is head of AI platforms and president at Insilico Medicine in Montreal, says they can talk to their own discovery system thanks to ChatGPT: “Instead of clicking and clicking and clicking, you just ask a question and it composes this text that you read and you understand.”

Beyond embracing chatbots to help produce written materials, such as papers, patents or grant applications, others can repurpose them specifically for drug discovery, as a sort of advanced search engine specifically geared to biological science. “We can have a more specific, for example, Bio ChatGPT or Med ChatGPT,” says Lurong Pan, a computational chemist at the University of Alabama, Birmingham and founder and CEO of Ainnocence, a biotech with a platform to aid drug discovery. “It may change the way people are searching.” For instance, Google and DeepMind earlier this year released Med-PaLM, a chatbot designed to provide answers to medical questions.

All these chatbots are based on large language models (LLMs), algorithms trained on millions of examples of text collected from the internet. LLMs are one type of generative AI, meaning algorithms capable of creating data that did not previously exist. For text, LLMs learn the statistical relationships between words. Then, given a prompt such as a question, they generate text one word at a time by predicting which word is most likely to follow the words that came before. The results seem remarkably natural, though the chatbots often make statements at odds with reality, essentially ‘hallucinating’ facts. ChatGPT is based on an LLM called Generative Pre-trained Transformer, Med-PaLM draws on Google’s Pathways Language Model, and Bard, a more generalized chatbot that Google is incorporating into its search engine, relies on Language Model for Dialogue Applications (LaMDA).
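
To make the idea of predicting a likely next word concrete, here is a minimal sketch in Python. It is a toy bigram counter, not how ChatGPT or any production LLM works: real models use neural networks that condition on long stretches of preceding text, whereas this sketch looks only at the single previous word. The function names and the three training sentences are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for each word, how often each other word directly follows it."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def generate(follows, start, max_words=4):
    """Greedily extend `start` one word at a time, up to `max_words` words."""
    words = [start.lower()]
    while len(words) < max_words:
        nxt = predict_next(follows, words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)


# Made-up training sentences, used only to illustrate the counting.
toy_corpus = [
    "the drug binds the target",
    "the drug inhibits the enzyme",
    "the drug binds the receptor",
]

model = train_bigram(toy_corpus)
print(predict_next(model, "drug"))
print(generate(model, "the", max_words=4))
```

Because the sketch always picks the single most frequent follower, it quickly loops back on itself. Real systems sample from a probability distribution over many candidate words, which is part of why their output reads as natural but can also drift into statements that are not true.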

These LLMs are already proving useful for drug hunters, says Kamya. Previously, users of Insilico’s platform were able to look at a knowledge graph, a visual representation of the genes linked to a particular disease and the substances known to interact with those genes. That was useful information, but the way researchers worked with it was limited. Now, with the addition of a chat function, Kamya says the data have become much more accessible. “Being able to have a conversation with the tool is very empowering. It makes it more interesting and more fun if you’re able to query our biomedical knowledge graphs in the way you want to,” she says.

If a scientist wants to investigate psoriasis, for example, the chat function can look at the knowledge graph for that disease. It will deliver a text description that includes the major signaling pathways and genes involved in psoriasis and the compounds known to interact with them. The user can then ask any question — for example, “How many genes are in this graph?” — and get an instant response, or look for associations between genes and specific diseases, such as sarcoma. The Insilico platform, called PandaOmics, will show that the top target gene for sarcoma is PLK1. The user could interrogate further, requesting links to specific pathways — for instance, apoptosis — and get an immediate answer.
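
The kinds of questions described above can be pictured as simple lookups against a graph of typed nodes and labeled edges. The Python sketch below is an assumption-laden illustration, not Insilico’s PandaOmics implementation, whose internals the article does not describe. The class, method and relation names are hypothetical, and apart from the PLK1 and sarcoma link mentioned above, the entries (GENE_A, PATHWAY_P, COMPOUND_X) are placeholders rather than real biomedical data.

```python
from collections import defaultdict


class ToyKnowledgeGraph:
    """Hypothetical in-memory graph of typed nodes and labeled edges."""

    def __init__(self):
        self.node_types = {}              # node name -> type, e.g. "gene"
        self.forward = defaultdict(set)   # (subject, relation) -> objects
        self.backward = defaultdict(set)  # (relation, object) -> subjects

    def add_node(self, name, node_type):
        self.node_types[name] = node_type

    def add_edge(self, subject, relation, obj):
        self.forward[(subject, relation)].add(obj)
        self.backward[(relation, obj)].add(subject)

    def count(self, node_type):
        """Answer questions such as 'How many genes are in this graph?'"""
        return sum(1 for t in self.node_types.values() if t == node_type)

    def subjects_of(self, relation, obj):
        """Find everything linked to `obj` by `relation`, sorted by name."""
        return sorted(self.backward.get((relation, obj), set()))

    def objects_of(self, subject, relation):
        """Find everything `subject` points to via `relation`, sorted by name."""
        return sorted(self.forward.get((subject, relation), set()))


def build_example_graph():
    """Build a tiny graph; all entries except PLK1/sarcoma are placeholders."""
    kg = ToyKnowledgeGraph()
    kg.add_node("PLK1", "gene")
    kg.add_node("GENE_A", "gene")
    kg.add_node("sarcoma", "disease")
    kg.add_node("PATHWAY_P", "pathway")
    kg.add_node("COMPOUND_X", "compound")
    kg.add_edge("PLK1", "associated_with", "sarcoma")
    kg.add_edge("GENE_A", "participates_in", "PATHWAY_P")
    kg.add_edge("COMPOUND_X", "interacts_with", "GENE_A")
    return kg


kg = build_example_graph()
print(kg.count("gene"))                              # how many genes?
print(kg.subjects_of("associated_with", "sarcoma"))  # genes linked to sarcoma
```

In a chat-driven setup, the language model’s job would be to turn a typed question into a structured lookup like these and then phrase the result as readable text. How any particular company wires those two steps together is not something this sketch attempts to show.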

ChatGPT produces the conversational output. Insilico then validates what comes out in the chat with additional predictive AI programs trained on their own data, collected over many years. As a result, “Our output is extremely accurate,” says Alex Zhavoronkov, founder and CEO of the company. Zhavoronkov, not a native English speaker, also uses ChatGPT to help him improve his grammar when writing papers, and he stirred controversy recently by listing ChatGPT as a co-author of a journal article.

Scientists also find LLMs helpful for linking data and representing it in different ways. Exscientia, a pharmatech based in Oxford, UK, has been experimenting with LLMs to translate ordinary English statements into carefully structured, mechanistic assertions that help generate their knowledge graphs, says Garry Pairaudeau, the company’s chief technology officer.

LLMs are still evolving, with developers adding features at a furious pace. The ChatGPT released in December was based on OpenAI’s GPT version 3.5. An update, GPT-4, was released mid-March and vastly outperforms its predecessor. In late March, ChatGPT added a so-called retrieval plug-in that could prove particularly useful to drug discovery. It is a module that allows the software to search personal or company documents, and Dan Neil, chief technology officer at BenevolentAI, an AI-driven biotech in London, is excited about that as a way to customize the chat function on the basis of the company’s own data. “If you had a specialized assay that you wrote up and described in internal company documents, you can say, ‘Hey, looking over these results that we’ve gotten internally, how does this update your thinking? Can you find or imagine other new approaches in life sciences that actually leverage this information that we found?’” he says.

Despite their name, language models need not be trained on English or other human languages. The same techniques of deriving statistical associations can be applied to the ‘language’ of DNA or protein sequences. Then, instead of a new sentence, they can generate new proteins that might make good drug targets. “It’s the same idea,” Beam says, “but we’re showing it biological data instead of text from the Internet.”

Some people worry that training AI systems to design molecules with a high likelihood of hitting their targets requires large volumes of data, hand-labeled by humans. And such collections are not always forthcoming because companies that regularly produce this information are not always keen to share it. But the same methods that allow ChatGPT to write sentences could potentially provide the perfect solution for new molecule design, Pan says. A language model supplied with abundant unlabeled data, such as the nearly 250 million protein sequences contained in the UniProt database, could derive the right relationships between molecular building blocks on its own.

Bioxcel Therapeutics, a company that uses AI to identify drugs for repurposing that were shelved in phase 2 or 3 trials, or even after approval, is considering LLMs to pick out potential winners from the different databases. But LLMs will only prove valuable, says Frank Yocca, a neuroscientist and the company’s CSO, if they fit into Bioxcel’s suite of AI tools. “Right now it’s not very accurate in terms of what you get back,” he cautions. “But we’re in the beginning stages of this.”

One way to ensure results are accurate and avoid AI hallucinations is what Neil calls ‘evidence surfacing’. When an LLM produces what it purports to be a fact, his company has added an algorithm to provide citations and references to back that up. Their system uses semantic search — a way of assessing the meaning of words — to extract sentences from papers and biology texts that support an assertion. The system selects a few relevant sentences from the millions of documents at its disposal and presents them to a human expert, who can then look at this small subset of data to judge whether the purported fact seems true.
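
The retrieval step in this kind of evidence surfacing can be sketched in a few lines of Python. This is a simplified stand-in, not BenevolentAI’s system: true semantic search typically compares learned embeddings that capture meaning, whereas the sketch below swaps in plain word-overlap cosine similarity so that it runs without any external libraries. The function names and the example sentences are invented placeholders, not real findings.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def surface_evidence(claim, sentences, top_k=2):
    """Return up to `top_k` (score, sentence) pairs most similar to `claim`.

    Sentences sharing no words with the claim are dropped, so the human
    reviewer only sees candidates with at least some overlap.
    """
    claim_vec = Counter(tokenize(claim))
    scored = [(cosine(claim_vec, Counter(tokenize(s))), s) for s in sentences]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Placeholder "literature" sentences, invented purely for illustration.
library = [
    "gene a activates pathway b",
    "compound x binds gene a",
    "the weather was sunny",
]

for score, sentence in surface_evidence("gene a activates pathway b in cells", library):
    print(round(score, 3), sentence)
```

The design point the sketch preserves is the division of labor: the software narrows a large collection down to a handful of candidate sentences, and a human expert still decides whether those sentences actually support the claim.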

Yocca says people can be seduced by the latest technology and lose sight of whether it really helps them achieve their goals. “You can get sort of consumed by just getting the machine to do what you wanted to do and not necessarily give you a functional answer,” he says. “We try to avoid that.”

Not everyone is hopping aboard the ChatGPT bandwagon. “Basically we already have all the tools to generate what we want and we are already exploring a lot of information, and we are not trying to expand more for now,” says Joao Magalhaes, head of immunology research at Enterome in Paris. For one thing, he worries that providing patient information to train the LLM might compromise privacy.

He’s not averse to adopting new AI techniques, though. For instance, the company uses AlphaFold, an AI system developed by DeepMind that looks at amino acid sequences and uses those to predict the three-dimensional structure of a protein, including many that had previously been unknown. “It was a huge improvement for us,” Magalhaes says. He will be keeping an eye on ChatGPT, and if it looks like it might be useful, the company will consider adopting it.

Beam points out that other types of generative AI, such as diffusion models that can create images out of random noise, have already made their way into biology. Because those models can create new images of protein structures, they “are arguably a more direct line to drug discovery and drug development,” Beam says.

If nothing else, he says, the rise of ChatGPT has created widespread awareness of the potential of generative AI and encouraged biotechs to take a closer look. “What ChatGPT has made everyone realize is the power of generative models,” Beam says.

Neil Savage
