Printed from https://ideas.repec.org/a/nat/natcom/v15y2024i1d10.1038_s41467-024-50712-3.html

Neural network extrapolation to distant regions of the protein fitness landscape

Authors

  • Chase R. Freschlin

    (University of Wisconsin–Madison)

  • Sarah A. Fahlberg

    (University of Wisconsin–Madison)

  • Pete Heinzelman

    (University of Wisconsin–Madison)

  • Philip A. Romero

    (University of Wisconsin–Madison)

Abstract

Machine learning (ML) has transformed protein engineering by constructing models of the underlying sequence-function landscape to accelerate the discovery of new biomolecules. ML-guided protein design requires models, trained on local sequence-function information, to accurately predict distant fitness peaks. In this work, we evaluate neural networks’ capacity to extrapolate beyond their training data. We perform model-guided design using a panel of neural network architectures trained on protein G (GB1)-Immunoglobulin G (IgG) binding data and experimentally test thousands of GB1 designs to systematically evaluate the models’ extrapolation. We find each model architecture infers markedly different landscapes from the same data, which give rise to unique design preferences. We find simpler models excel in local extrapolation to design high fitness proteins, while more sophisticated convolutional models can venture deep into sequence space to design proteins that fold but are no longer functional. We also find that implementing a simple ensemble of convolutional neural networks enables robust design of high-performing variants in the local landscape. Our findings highlight how each architecture’s inductive biases prime them to learn different aspects of the protein fitness landscape and how a simple ensembling approach makes protein engineering more robust.

Suggested Citation

  • Chase R. Freschlin & Sarah A. Fahlberg & Pete Heinzelman & Philip A. Romero, 2024. "Neural network extrapolation to distant regions of the protein fitness landscape," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
  • Handle: RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-50712-3
    DOI: 10.1038/s41467-024-50712-3

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-024-50712-3
    File Function: Abstract
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Simon d’Oelsnitz & Daniel J. Diaz & Wantae Kim & Daniel J. Acosta & Tyler L. Dangerfield & Mason W. Schechter & Matthew B. Minus & James R. Howard & Hannah Do & James M. Loy & Hal S. Alper & Y. Jessie, 2024. "Biosensor and machine learning-aided engineering of an amaryllidaceae enzyme," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    2. Aika Iwama & Ryoji Kise & Hiroaki Akasaka & Fumiya K. Sano & Hidetaka S. Oshima & Asuka Inoue & Wataru Shihoya & Osamu Nureki, 2024. "Structure and dynamics of the pyroglutamylated RF-amide peptide QRFP receptor GPR103," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    3. Laura Shub & Wenjin Liu & Georgios Skiniotis & Michael J. Keiser & Michael J. Robertson, 2025. "MIC: A deep learning tool for assigning ions and waters in cryo-EM and crystal structures," Nature Communications, Nature, vol. 16(1), pages 1-14, December.
    4. Lucien F. Krapp & Fernando A. Meireles & Luciano A. Abriata & Jean Devillard & Sarah Vacle & Maria J. Marcaida & Matteo Dal Peraro, 2024. "Context-aware geometric deep learning for protein sequence design," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    5. Arne Matthys & Jan Felix & Joao Paulo Portela Catani & Kenny Roose & Wim Nerinckx & Benthe Buyten & Daria Fijalkowska & Nico Callewaert & Savvas N. Savvides & Xavier Saelens, 2025. "Single-domain antibodies directed against hemagglutinin and neuraminidase protect against influenza B viruses," Nature Communications, Nature, vol. 16(1), pages 1-19, December.
    6. Daniel R. Fox & Kazem Asadollahi & Imogen Samuels & Bradley A. Spicer & Ashleigh Kropp & Christopher J. Lupton & Kevin Lim & Chunxiao Wang & Hari Venugopal & Marija Dramicanin & Gavin J. Knott & Rhys , 2025. "Inhibiting heme piracy by pathogenic Escherichia coli using de novo-designed proteins," Nature Communications, Nature, vol. 16(1), pages 1-15, December.
    7. Yash Chainani & Jacob Diaz & Margaret Guilarte-Silva & Vincent Blay & Quan Zhang & William Sprague & Keith E. J. Tyo & Linda J. Broadbelt & Aindrila Mukhopadhyay & Jay D. Keasling & Hector Garcia Mart, 2025. "Merging the computational design of chimeric type I polyketide synthases with enzymatic pathways for chemical biosynthesis," Nature Communications, Nature, vol. 16(1), pages 1-17, December.
    8. Wei Lu & Jixian Zhang & Weifeng Huang & Ziqiao Zhang & Xiangyu Jia & Zhenyu Wang & Leilei Shi & Chengtao Li & Peter G. Wolynes & Shuangjia Zheng, 2024. "DynamicBind: predicting ligand-specific protein-ligand complex structure with a deep equivariant generative model," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    9. Meghana Kshirsagar & Artur Meller & Ian R. Humphreys & Samuel Sledzieski & Yixi Xu & Rahul Dodhia & Eric Horvitz & Bonnie Berger & Gregory R. Bowman & Juan Lavista Ferres & David Baker & Minkyung Baek, 2025. "Rapid and accurate prediction of protein homo-oligomer symmetry using Seq2Symm," Nature Communications, Nature, vol. 16(1), pages 1-11, December.
    10. Isak S. Pretorius & Thomas A. Dixon & Michael Boers & Ian T. Paulsen & Daniel L. Johnson, 2025. "The coming wave of confluent biosynthetic, bioinformational and bioengineering technologies," Nature Communications, Nature, vol. 16(1), pages 1-8, December.
    11. Enrico Orsi & Lennart Schada von Borzyskowski & Stephan Noack & Pablo I. Nikel & Steffen N. Lindner, 2024. "Automated in vivo enzyme engineering accelerates biocatalyst optimization," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    12. Cheyenne Ziegler & Jonathan Martin & Claude Sinner & Faruck Morcos, 2023. "Latent generative landscapes as maps of functional diversity in protein sequence space," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    13. Erika Erickson & Japheth E. Gado & Luisana Avilán & Felicia Bratti & Richard K. Brizendine & Paul A. Cox & Raj Gill & Rosie Graham & Dong-Jin Kim & Gerhard König & William E. Michener & Saroj Poudel &, 2022. "Sourcing thermotolerant poly(ethylene terephthalate) hydrolase scaffolds from natural diversity," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    14. Pantelis Livanos & Choy Kriechbaum & Sophia Remers & Arvid Herrmann & Sabine Müller, 2025. "Kinesin-12 POK2 polarization is a prerequisite for a fully functional division site and aids cell plate positioning," Nature Communications, Nature, vol. 16(1), pages 1-17, December.
    15. Surabhi Kokane & Ashutosh Gulati & Pascal F. Meier & Rei Matsuoka & Tanadet Pipatpolkai & Giuseppe Albano & Tin Manh Ho & Lucie Delemotte & Daniel Fuster & David Drew, 2025. "PIP2-mediated oligomerization of the endosomal sodium/proton exchanger NHE9," Nature Communications, Nature, vol. 16(1), pages 1-17, December.
    16. Justin Riper & Arleth O. Martinez-Claros & Lie Wang & Hannah E. Schneiderman & Sweta Maheshwari & Monica C. Pillon, 2025. "CryoEM structure of the SLFN14 endoribonuclease reveals insight into RNA binding and cleavage," Nature Communications, Nature, vol. 16(1), pages 1-15, December.
    17. Pierre Azoulay & Joshua Krieger & Abhishek Nagaraj, 2024. "Old Moats for New Models: Openness, Control, and Competition in Generative Artificial Intelligence," NBER Chapters, in: Entrepreneurship and Innovation Policy and the Economy, volume 4, pages 7-46, National Bureau of Economic Research, Inc.
    18. Xin Yong & Guowen Jia & Qin Yang & Chunzhuang Zhou & Sitao Zhang & Huaqing Deng & Daniel D. Billadeau & Zhaoming Su & Da Jia, 2025. "Cryo-EM structure of the BLOC-3 complex provides insights into the pathogenesis of Hermansky-Pudlak syndrome," Nature Communications, Nature, vol. 16(1), pages 1-15, December.
    19. Jun-Yu Si & Yuan-Mei Chen & Ye-Hui Sun & Meng-Xue Gu & Mei-Ling Huang & Lu-Lu Shi & Xiao Yu & Xiao Yang & Qing Xiong & Cheng-Bao Ma & Peng Liu & Zheng-Li Shi & Huan Yan, 2024. "Sarbecovirus RBD indels and specific residues dictating multi-species ACE2 adaptiveness," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    20. Deyun Qiu & Jinxin V. Pei & James E. O. Rosling & Vandana Thathy & Dongdi Li & Yi Xue & John D. Tanner & Jocelyn Sietsma Penington & Yi Tong Vincent Aw & Jessica Yi Han Aw & Guoyue Xu & Abhai K. Tripa, 2022. "A G358S mutation in the Plasmodium falciparum Na+ pump PfATP4 confers clinically-relevant resistance to cipargamin," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
