NEWS
01 November 2022

AlphaFold's new rival? Meta AI predicts shape of 600 million proteins

Microbial molecules from soil, seawater and human bodies are among the planet's least understood proteins.

Ewen Callaway

The ESM Metagenomic Atlas database contains structure predictions for 617 million proteins. Credit: ESM Metagenomic Atlas (CC BY 4.0)

When London-based DeepMind unveiled predicted structures for some 220 million proteins this year, it covered nearly every protein from known organisms in DNA databases. Now, another tech giant is filling in the dark matter of our protein universe.

Researchers at Meta (formerly Facebook, headquartered in Menlo Park, California) have used artificial intelligence (AI) to predict the structures of some 600 million proteins from bacteria, viruses and other microbes that haven't been characterized.

"These are the structures we know the least about. These are incredibly mysterious proteins.
I think they offer the potential for great insight into biology," says Alexander Rives, the research lead for Meta AI's protein team.

The team generated the predictions -- described in a 1 November preprint^1 -- using a 'large language model', a type of AI that is the basis for tools that can predict text from just a few letters or words.

Normally, language models are trained on large volumes of text. To apply them to proteins, Rives and his colleagues fed them the sequences of known proteins, which can be expressed as a chain of 20 different amino acids, each represented by a letter. The network then learned to 'autocomplete' proteins with a proportion of amino acids obscured.

Protein 'autocomplete'

This training imbued the network with an intuitive understanding of protein sequences, which hold information about their shapes, says Rives. A second step -- inspired by DeepMind's pioneering protein structure AI AlphaFold -- combines such insights with information about the relationships between known protein structures and sequences, to generate predicted structures from protein sequences.

Meta's network, called ESMFold, isn't quite as accurate as AlphaFold, Rives' team reported earlier this summer^2, but it is about 60 times faster at predicting structures, he says. "What this means is that we can scale structure prediction to much larger databases."

As a test case, they decided to wield their model on a database of bulk-sequenced 'metagenomic' DNA from environmental sources including soil, seawater, the human gut, skin and other microbial habitats. The vast majority of the DNA entries -- which encode potential proteins -- come from organisms that have never been cultured and are unknown to science.

In total, the Meta team predicted the structures of more than 617 million proteins. The effort took just 2 weeks (AlphaFold can take minutes to generate a single prediction).
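The 'autocomplete' training described above can be illustrated with a toy sketch. It is not Meta's code: the masking fraction of 0.15, the `_` placeholder, the example sequence and the function names `mask_sequence` and `accuracy` are all assumptions made for illustration. The sketch only shows the data side of the task, hiding a proportion of amino-acid letters and keeping the hidden letters as the answers a model would be scored against.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one letter each
MASK = "_"  # assumed placeholder for an obscured residue


def mask_sequence(sequence, fraction=0.15, seed=0):
    """Hide a proportion of residues; return the masked string and the hidden answers."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    rng = random.Random(seed)
    # At least one residue is hidden, and never more than the sequence holds.
    n_masked = min(len(sequence), max(1, round(len(sequence) * fraction)))
    positions = sorted(rng.sample(range(len(sequence)), n_masked))
    masked = list(sequence)
    targets = {}
    for pos in positions:
        targets[pos] = sequence[pos]  # remember the true residue
        masked[pos] = MASK            # obscure it in the model's input
    return "".join(masked), targets


def accuracy(predictions, targets):
    """Fraction of masked positions where the guess matches the hidden residue."""
    hits = sum(predictions.get(pos) == aa for pos, aa in targets.items())
    return hits / len(targets)


example = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical sequence, for illustration only
masked, targets = mask_sequence(example)
print(masked)
print(targets)
```

A real language model would be trained to fill in the `_` positions from the surrounding letters; `accuracy` stands in for the kind of score such training would push upwards.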
The predictions are freely available for anyone to use, as is the code underlying the model, says Rives.

Of these 617 million predictions, the model deemed more than one-third to be high quality, such that researchers can have confidence that the overall protein shape is correct and, in some cases, can discern finer atomic-level details. Millions of these structures are entirely novel, and unlike anything in databases of protein structures determined experimentally or in the AlphaFold database of predictions from known organisms.

A good chunk of the AlphaFold database is made of structures that are nearly identical to each other, and 'metagenomic' databases "should cover a large part of the previously unseen protein universe", says Martin Steinegger, a computational biologist at Seoul National University. "There's a big opportunity now to unravel more of the darkness."

Sergey Ovchinnikov, an evolutionary biologist at Harvard University in Cambridge, Massachusetts, wonders about the hundreds of millions of predictions that ESMFold made with low confidence. Some might lack a defined structure, at least in isolation, whereas others might be non-coding DNA mistaken for protein-coding material. "It seems there is still more than half of protein space we know nothing about," he says.

Leaner, simpler, cheaper

Burkhard Rost, a computational biologist at the Technical University of Munich in Germany, is impressed with the combination of speed and accuracy of Meta's model. But he questions whether it really offers an advantage over AlphaFold's precision when it comes to predicting proteins from metagenomic databases. Language model-based prediction methods -- including one developed by his team^3 -- are better suited to quickly determining how mutations alter protein structure, which is not possible with AlphaFold.

"We will see structure prediction become leaner, simpler, cheaper and that will open the door for new things," he says.

DeepMind doesn't currently have plans to include metagenomic structure predictions in its database, but hasn't ruled this out for future releases, according to a company representative. But Steinegger and his collaborators have used a version of AlphaFold to predict the structures of some 30 million metagenomic proteins. They are hoping to find new kinds of RNA viruses by looking for novel forms of their genome-copying enzymes.

Steinegger sees trawling biology's dark matter as an obvious next step for such tools. "I do think we will quite soon have an explosion in the analysis of these metagenomic structures."

doi: https://doi.org/10.1038/d41586-022-03539-1

References

1. Lin, Z. et al. Preprint at bioRxiv https://www.biorxiv.org/content/10.1101/2022.07.20.500902v2 (2022).
2. Lin, Z. et al. Preprint at bioRxiv https://www.biorxiv.org/content/10.1101/2022.07.20.500902v1 (2022).
3. Weissenow, K., Heinzinger, M. & Rost, B. Structure 30, 1169-1137 (2022).