Artificial intelligence has solved one of the greatest puzzles in biology by predicting the shape of every protein expressed in the human body.
The research was carried out by London-based AI company DeepMind, which used its AlphaFold algorithm to build the most complete and accurate database yet of the human proteome, the full set of proteins that underpins human health and disease.
Last week, DeepMind published the methods and code for its model, AlphaFold2, in Nature, showing that it could predict the structures of known proteins with almost perfect accuracy.
It followed that with its second Nature paper in as many weeks, published on Thursday, showing that the model could confidently predict the structural positions of almost 60 per cent of the amino acids, the building blocks of proteins, in the human proteome. The paper also covered a host of other organisms, including the fruit fly, the mouse and E. coli bacteria.
The structural positions of only about 30 per cent of amino acids were previously known. Knowing the positions of amino acids allows researchers to predict the three-dimensional structure of a protein.
The set of 350,000 protein structure predictions is now available via a public database hosted by the European Bioinformatics Institute at the European Molecular Biology Laboratory (EMBL-EBI).
“Accurately predicting [proteins’] structures has a huge range of scientific applications from developing new drugs and treatments for disease, right through to designing future crops that can withstand climate change, or enzymes that can degrade plastics,” said Edith Heard, director-general of the EMBL. “The applications are limited only by our imaginations.”
Protein structures matter because they dictate how proteins do their jobs. Knowing a protein’s shape, such as the Y shape of an antibody, tells scientists more about the role that protein plays.
Misshapen proteins can cause diseases such as Alzheimer’s, Parkinson’s and cystic fibrosis. Being able to predict a protein’s shape easily could allow scientists to control and modify it, for example by changing the DNA sequence that encodes it to improve its function, or by designing drugs that attach to it.
Accurate prediction of a protein’s structure from its amino acid sequence has been one of biology’s grandest challenges. Experimental methods can take months or years of laboratory work to determine the shape of a single protein, which is why only about 180,000 protein structures have been solved out of the more than 200m proteins known in living things.
“We believe that this will represent the most significant contribution AI has made to advancing the state of scientific knowledge to date,” said DeepMind’s chief executive Demis Hassabis. “Our ambitions are to expand [the database] in coming months to the entire protein universe of over 200m proteins.”
Scientists not involved in DeepMind’s research used phrases such as “spine-tingling” and “transformative” to describe the impact of the advance, likening the data set to the human genome.
“It was one of those moments when my hair stood up on the back of my neck,” said John McGeehan, director of the Centre for Enzyme Innovation at the University of Portsmouth, and a structural biologist who has been testing out the AlphaFold algorithm over the past few months.
“We are able to use that information directly to develop faster enzymes for breaking down plastics. Those experiments are under way immediately, so the acceleration to that project here is multiple years.”
AlphaFold is not without limitations. Proteins are dynamic molecules that constantly change shape depending on what they bind to, but DeepMind’s algorithm can predict only a protein’s static structure, said Minkyung Baek, a researcher at the University of Washington’s Institute for Protein Design.
However, its biggest contribution to scientists was that it had been open-sourced, she said. “Last year they showed [this] is all possible but did not provide any code, so people knew it was there, but could not use it.”
In the seven months after DeepMind’s announcement, Baek and her colleagues used DeepMind’s ideas to build their own open-source algorithm, which they called RoseTTAFold and which was published in the journal Science last week. “I’m really glad they have made it all publicly available, that is a huge contribution to biological research and also for commercial pharma,” she said. “Now more people can benefit from their method [and] it advances the field much more quickly.”