
Ig-VAE: Generative modeling of protein structure by direct 3D coordinate generation

Authors
  • Raphael R Eguchi
  • Christian A Choe
  • Po-Ssu Huang

Abstract

While deep learning models have seen increasing applications in protein science, few have been implemented for protein backbone generation, an important task in structure-based problems such as active site and interface design. We present a new approach to building class-specific backbones, using a variational auto-encoder to directly generate the 3D coordinates of immunoglobulins. Our model is torsion- and distance-aware, learns a high-resolution embedding of the dataset, and generates novel, high-quality structures compatible with existing design tools. We show that the Ig-VAE can be used with Rosetta to create a computational model of a SARS-CoV2-RBD binder via latent space sampling. We further demonstrate that the model’s generative prior is a powerful tool for guiding computational protein design, motivating a new paradigm under which backbone design is solved as a constrained optimization problem in the latent space of a generative model.

Author summary

Many essential biochemical processes are governed by protein-protein interactions (PPIs), and our ability to make binding proteins that modulate PPIs is crucial to the creation of therapeutics and the study of cell signaling. One critical aspect of PPI design is to capture protein conformational flexibility. Deep generative models are a class of mathematical models that are able to synthesize novel data from a finite set of training examples. Here, we make advances in computational protein design methodology by developing a deep generative model that creates protein backbones adopting the immunoglobulin fold, which is found in natural binding proteins such as antibodies. While generative models have been powerful in tasks such as image generation, using them to create proteins has remained a challenge. We solve this problem with a new model that allows for the direct generation of novel 3D molecules and show that they are of high chemical accuracy. Generated structures work well with existing protein design methods such as Rosetta, providing access to a large collection of novel immunoglobulin structures. Finally, we present a new protein design framework, called “generative design,” that shows how deep generative models such as ours can be applied to virtually any protein design problem.
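The abstract calls the model "distance-aware" but does not spell out a loss. As a minimal, hypothetical illustration only (not the authors' loss), the sketch below shows one common way a coordinate-generating model can be made distance-aware: compare the pairwise distance matrices of a generated and a reference structure. Assumptions: each structure is a list of (x, y, z) backbone coordinates in ångströms with matching atom order, and the helper names `distance_matrix`, `distance_loss`, and `rotate_z` are invented for this example.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def distance_matrix(coords: List[Coord]) -> List[List[float]]:
    """Pairwise Euclidean distances between all atoms (an N x N matrix)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def distance_loss(pred: List[Coord], target: List[Coord]) -> float:
    """Hypothetical distance-aware loss: mean squared error between the
    pairwise distance matrices of two structures with the same atom order.

    Only inter-atom distances are compared, so the value does not change
    when either structure is rigidly rotated or translated.
    """
    if len(pred) != len(target):
        raise ValueError("structures must have the same number of atoms")
    dp, dt = distance_matrix(pred), distance_matrix(target)
    n = len(pred)
    total = sum((dp[i][j] - dt[i][j]) ** 2 for i in range(n) for j in range(n))
    return total / (n * n)


def rotate_z(coords: List[Coord], theta: float) -> List[Coord]:
    """Rotate every coordinate by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


if __name__ == "__main__":
    # Toy four-atom trace with roughly 3.8 Å spacing between neighbours.
    target = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 1.0)]
    moved = [(x + 1.0, y - 2.0, z + 0.5) for x, y, z in rotate_z(target, 0.7)]
    print(distance_loss(moved, target))  # effectively zero: rigid motion only
    perturbed = target[:3] + [(0.0, 3.8, 3.0)]
    print(distance_loss(perturbed, target))  # positive: geometry changed
```

The design point is invariance: a loss on raw coordinates would penalize a correct structure that happens to be rotated, whereas a distance-matrix loss penalizes only changes in internal geometry. A torsion-aware term would analogously compare backbone dihedral angles; how the Ig-VAE combines such terms is described in the full paper, not here.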

Suggested Citation

  • Raphael R Eguchi & Christian A Choe & Po-Ssu Huang, 2022. "Ig-VAE: Generative modeling of protein structure by direct 3D coordinate generation," PLOS Computational Biology, Public Library of Science, vol. 18(6), pages 1-18, June.
  • Handle: RePEc:plo:pcbi00:1010271
    DOI: 10.1371/journal.pcbi.1010271

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010271
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1010271&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lauren L. Porter & Allen K. Kim & Swechha Rimal & Loren L. Looger & Ananya Majumdar & Brett D. Mensh & Mary R. Starich & Marie-Paule Strub, 2022. "Many dissimilar NusG protein domains switch between α-helix and β-sheet folds," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    2. Zachary C. Drake & Justin T. Seffernick & Steffen Lindert, 2022. "Protein complex prediction using Rosetta, AlphaFold, and mass spectrometry covalent labeling," Nature Communications, Nature, vol. 13(1), pages 1-9, December.
    3. Krzysztof Rusek & Agnieszka Kleszcz & Albert Cabellos-Aparicio, 2022. "Bayesian inference of spatial and temporal relations in AI patents for EU countries," Papers 2201.07168, arXiv.org.
    4. Niklas W. A. Gebauer & Michael Gastegger & Stefaan S. P. Hessmann & Klaus-Robert Müller & Kristof T. Schütt, 2022. "Inverse design of 3d molecular structures with conditional generative neural networks," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    5. Felix Lorenz & Jonas Willwersch & Marcelo Cajias & Franz Fuerst, 2023. "Interpretable machine learning for real estate market analysis," Real Estate Economics, American Real Estate and Urban Economics Association, vol. 51(5), pages 1178-1208, September.
    6. Hajkowicz, Stefan & Naughtin, Claire & Sanderson, Conrad & Schleiger, Emma & Karimi, Sarvnaz & Bratanova, Alexandra & Bednarz, Tomasz, 2022. "Artificial intelligence for science – adoption trends and future development pathways," MPRA Paper 115464, University Library of Munich, Germany.
    7. Agnese I. Curatolo & Ofer Kimchi & Carl P. Goodrich & Ryan K. Krueger & Michael P. Brenner, 2023. "A computational toolbox for the assembly yield of complex and heterogeneous structures," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    8. Noelia Ferruz & Steffen Schmidt & Birte Höcker, 2022. "ProtGPT2 is a deep unsupervised language model for protein design," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    9. Pengcheng Zhang & Qixiu Du & Ye Wang & Lei Wei & Xiaowo Wang, 2025. "Systematic representation and optimization enable the inverse design of cross-species regulatory sequences in bacteria," Nature Communications, Nature, vol. 16(1), pages 1-14, December.
    10. Huiyu Li & Ao Ma, 2025. "Enhanced sampling of protein conformational changes via true reaction coordinates from energy relaxation," Nature Communications, Nature, vol. 16(1), pages 1-12, December.
    11. Simone Vannuccini & Ekaterina Prytkova, 2021. "Artificial Intelligence’s New Clothes? From General Purpose Technology to Large Technical System," SPRU Working Paper Series 2021-02, SPRU - Science Policy Research Unit, University of Sussex Business School.
    12. Aaron Gupta & Kevin S. Kao & Rachel Yamin & Deena A. Oren & Yehuda Goldgur & Jonathan Du & Pete Lollar & Eric J. Sundberg & Jeffrey V. Ravetch, 2023. "Mechanism of glycoform specificity and in vivo protection by an anti-afucosylated IgG nanobody," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    13. Md Tauhidul Islam & Zixia Zhou & Hongyi Ren & Masoud Badiei Khuzani & Daniel Kapp & James Zou & Lu Tian & Joseph C. Liao & Lei Xing, 2023. "Revealing hidden patterns in deep neural network feature space continuum via manifold learning," Nature Communications, Nature, vol. 14(1), pages 1-20, December.
    14. Naughtin, Claire & Hajkowicz, Stefan & Schleiger, Emma & Bratanova, Alexandra & Cameron, Alicia & Zamin, T & Dutta, A, 2022. "Our Future World: Global megatrends impacting the way we live over coming decades," MPRA Paper 113900, University Library of Munich, Germany.
    15. Lei Wang & Jiangguo Zhang & Dali Wang & Chen Song, 2022. "Membrane contact probability: An essential and predictive character for the structural and functional studies of membrane proteins," PLOS Computational Biology, Public Library of Science, vol. 18(3), pages 1-27, March.
    16. Gustavo Arango-Argoty & Elly Kipkogei & Ross Stewart & Gerald J. Sun & Arijit Patra & Ioannis Kagiampakis & Etai Jacob, 2025. "Pretrained transformers applied to clinical studies improve predictions of treatment efficacy and associated biomarkers," Nature Communications, Nature, vol. 16(1), pages 1-18, December.
    17. Lu Liu & Benjamin F. Jones & Brian Uzzi & Dashun Wang, 2023. "Data, measurement and empirical methods in the science of science," Nature Human Behaviour, Nature, vol. 7(7), pages 1046-1058, July.
    18. Zhiye Guo & Jian Liu & Jeffrey Skolnick & Jianlin Cheng, 2022. "Prediction of inter-chain distance maps of protein complexes with 2D attention-based deep neural networks," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    19. Nicolas Renaud & Cunliang Geng & Sonja Georgievska & Francesco Ambrosetti & Lars Ridder & Dario F. Marzella & Manon F. Réau & Alexandre M. J. J. Bonvin & Li C. Xue, 2021. "DeepRank: a deep learning framework for data mining 3D protein-protein interfaces," Nature Communications, Nature, vol. 12(1), pages 1-8, December.
    20. Willow Coyote-Maestas & David Nedrud & Antonio Suma & Yungui He & Kenneth A. Matreyek & Douglas M. Fowler & Vincenzo Carnevale & Chad L. Myers & Daniel Schmidt, 2021. "Probing ion channel functional architecture and domain recombination compatibility by massively parallel domain insertion profiling," Nature Communications, Nature, vol. 12(1), pages 1-16, December.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pcbi00:1010271. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ploscompbiol. General contact details of provider: https://journals.plos.org/ploscompbiol/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.