
Sequence-structure-function relationships in the microbial protein universe

Authors
  • Julia Koehler Leman

    (Flatiron Institute, Simons Foundation
    New York University)

  • Pawel Szczerbiak

    (Jagiellonian University)

  • P. Douglas Renfrew

    (Flatiron Institute, Simons Foundation
    New York University)

  • Vladimir Gligorijevic

    (Flatiron Institute, Simons Foundation
    Prescient Design, a Genentech accelerator)

  • Daniel Berenberg

    (Flatiron Institute, Simons Foundation
    Prescient Design, a Genentech accelerator
    New York University)

  • Tommi Vatanen

    (Broad Institute
    University of Auckland
    Faculty of Medicine, University of Helsinki)

  • Bryn C. Taylor

    (University of California San Diego
    Janssen Research and Development)

  • Chris Chandler

    (Flatiron Institute, Simons Foundation)

  • Stefan Janssen

    (University of California, San Diego
    Justus Liebig University Giessen)

  • Andras Pataki

    (Flatiron Institute, Simons Foundation)

  • Nick Carriero

    (Flatiron Institute, Simons Foundation)

  • Ian Fisk

    (Flatiron Institute, Simons Foundation)

  • Ramnik J. Xavier

    (Broad Institute
    MIT)

  • Rob Knight

    (University of California San Diego)

  • Richard Bonneau

    (Flatiron Institute, Simons Foundation
    New York University)

  • Tomasz Kosciolek

    (Jagiellonian University)

Abstract

For the past half-century, structural biologists have relied on the notion that similar protein sequences give rise to similar structures and functions. While this assumption has driven research to explore certain parts of the protein universe, it disregards spaces that don't rely on this assumption. Here we explore areas of the protein universe where similar protein functions can be achieved by different sequences and different structures. We predict ~200,000 structures for diverse protein sequences from 1,003 representative genomes across the microbial tree of life and annotate them functionally on a per-residue basis. Structure prediction is accomplished using the World Community Grid, a large-scale citizen science initiative. The resulting database of structural models is complementary to the AlphaFold database with regard to domains of life, sequence diversity, and sequence length. We identify 148 novel folds and describe examples where we map specific functions to structural motifs. We also show that the structural space is continuous and largely saturated, highlighting the need for a shift in focus across all branches of biology, from obtaining structures to putting them into context and from sequence-based to sequence-structure-function based meta-omics analyses.

Suggested Citation

  • Julia Koehler Leman & Pawel Szczerbiak & P. Douglas Renfrew & Vladimir Gligorijevic & Daniel Berenberg & Tommi Vatanen & Bryn C. Taylor & Chris Chandler & Stefan Janssen & Andras Pataki & Nick Carriero et al., 2023. "Sequence-structure-function relationships in the microbial protein universe," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
  • Handle: RePEc:nat:natcom:v:14:y:2023:i:1:d:10.1038_s41467-023-37896-w
    DOI: 10.1038/s41467-023-37896-w

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-023-37896-w

    References listed on IDEAS

    1. Kathryn Tunyasuvunakool & Jonas Adler & Zachary Wu & Tim Green & Michal Zielinski & Augustin Žídek & Alex Bridgland & Andrew Cowie & Clemens Meyer & Agata Laydon & Sameer Velankar & Gerard J. Kleywegt, 2021. "Highly accurate protein structure prediction for the human proteome," Nature, Nature, vol. 596(7873), pages 590-596, August.
    2. John Jumper & Richard Evans & Alexander Pritzel & Tim Green & Michael Figurnov & Olaf Ronneberger & Kathryn Tunyasuvunakool & Russ Bates & Augustin Žídek & Anna Potapenko & Alex Bridgland & Clemens Me, 2021. "Highly accurate protein structure prediction with AlphaFold," Nature, Nature, vol. 596(7873), pages 583-589, August.
    3. Vladimir Gligorijević & P. Douglas Renfrew & Tomasz Kosciolek & Julia Koehler Leman & Daniel Berenberg & Tommi Vatanen & Chris Chandler & Bryn C. Taylor & Ian M. Fisk & Hera Vlamakis & Ramnik J. Xavie, 2021. "Structure-based protein function prediction using graph convolutional networks," Nature Communications, Nature, vol. 12(1), pages 1-14, December.
    4. Joe G. Greener & Shaun M. Kandathil & David T. Jones, 2019. "Deep learning extends de novo protein modelling coverage of genomes using iteratively predicted structural constraints," Nature Communications, Nature, vol. 10(1), pages 1-13, December.
    5. Martin Steinegger & Johannes Söding, 2018. "Clustering huge protein sequence sets in linear time," Nature Communications, Nature, vol. 9(1), pages 1-8, December.
    6. Yibei Xiao & Sherwin Ng & Ki Hyun Nam & Ailong Ke, 2017. "How type II CRISPR–Cas establish immunity through Cas1–Cas2-mediated spacer integration," Nature, Nature, vol. 550(7674), pages 137-141, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jeffrey A. Ruffolo & Lee-Shin Chu & Sai Pooja Mahajan & Jeffrey J. Gray, 2023. "Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    2. Ivan Koludarov & Tobias Senoner & Timothy N. W. Jackson & Daniel Dashevsky & Michael Heinzinger & Steven D. Aird & Burkhard Rost, 2023. "Domain loss enabled evolution of novel functions in the snake three-finger toxin gene superfamily," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    3. Junhui Peng & Li Zhao, 2024. "The origin and structural evolution of de novo genes in Drosophila," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    4. Ziqi Gao & Chenran Jiang & Jiawen Zhang & Xiaosen Jiang & Lanqing Li & Peilin Zhao & Huanming Yang & Yong Huang & Jia Li, 2023. "Hierarchical graph learning for protein–protein interaction," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    5. David Moi & Shunsuke Nishio & Xiaohui Li & Clari Valansi & Mauricio Langleib & Nicolas G. Brukman & Kateryna Flyak & Christophe Dessimoz & Daniele de Sanctis & Kathryn Tunyasuvunakool & John Jumper & , 2022. "Discovery of archaeal fusexins homologous to eukaryotic HAP2/GCS1 gamete fusion proteins," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    6. Marco Malatesta & Emanuele Fornasier & Martino Luigi Salvo & Angela Tramonti & Erika Zangelmi & Alessio Peracchi & Andrea Secchi & Eugenia Polverini & Gabriele Giachin & Roberto Battistutta & Roberto , 2024. "One substrate many enzymes virtual screening uncovers missing genes of carnitine biosynthesis in human and mouse," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    7. Deyun Qiu & Jinxin V. Pei & James E. O. Rosling & Vandana Thathy & Dongdi Li & Yi Xue & John D. Tanner & Jocelyn Sietsma Penington & Yi Tong Vincent Aw & Jessica Yi Han Aw & Guoyue Xu & Abhai K. Tripa, 2022. "A G358S mutation in the Plasmodium falciparum Na+ pump PfATP4 confers clinically-relevant resistance to cipargamin," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    8. Shuo-Shuo Liu & Tian-Xia Jiang & Fan Bu & Ji-Lan Zhao & Guang-Fei Wang & Guo-Heng Yang & Jie-Yan Kong & Yun-Fan Qie & Pei Wen & Li-Bin Fan & Ning-Ning Li & Ning Gao & Xiao-Bo Qiu, 2024. "Molecular mechanisms underlying the BIRC6-mediated regulation of apoptosis and autophagy," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    9. Xiaoke Yang & Mingqi Zhu & Xue Lu & Yuxin Wang & Junyu Xiao, 2024. "Architecture and activation of human muscle phosphorylase kinase," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    10. Kristy Rochon & Brianna L. Bauer & Nathaniel A. Roethler & Yuli Buckley & Chih-Chia Su & Wei Huang & Rajesh Ramachandran & Maria S. K. Stoll & Edward W. Yu & Derek J. Taylor & Jason A. Mears, 2024. "Structural basis for regulated assembly of the mitochondrial fission GTPase Drp1," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    11. Fan Lu & Liang Zhu & Thomas Bromberger & Jun Yang & Qiannan Yang & Jianmin Liu & Edward F. Plow & Markus Moser & Jun Qin, 2022. "Mechanism of integrin activation by talin and its cooperation with kindlin," Nature Communications, Nature, vol. 13(1), pages 1-19, December.
    12. Martin F. Peter & Christian Gebhardt & Rebecca Mächtel & Gabriel G. Moya Muñoz & Janin Glaenzer & Alessandra Narducci & Gavin H. Thomas & Thorben Cordes & Gregor Hagelueken, 2022. "Cross-validation of distance measurements in proteins by PELDOR/DEER and single-molecule FRET," Nature Communications, Nature, vol. 13(1), pages 1-19, December.
    13. Jutta Diessl & Jens Berndtsson & Filomena Broeskamp & Lukas Habernig & Verena Kohler & Carmela Vazquez-Calvo & Arpita Nandy & Carlotta Peselj & Sofia Drobysheva & Ludovic Pelosi & F.-Nora Vögtle & Fab, 2022. "Manganese-driven CoQ deficiency," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    14. Alexander Kroll & Sahasra Ranjan & Martin K. M. Engqvist & Martin J. Lercher, 2023. "A general model to predict small molecule substrates of enzymes based on machine and deep learning," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    15. Lisa-Marie Appel & Vedran Franke & Johannes Benedum & Irina Grishkovskaya & Xué Strobl & Anton Polyansky & Gregor Ammann & Sebastian Platzer & Andrea Neudolt & Anna Wunder & Lena Walch & Stefanie Kais, 2023. "The SPOC domain is a phosphoserine binding module that bridges transcription machinery with co- and post-transcriptional regulators," Nature Communications, Nature, vol. 14(1), pages 1-22, December.
    16. Maciej K. Kocylowski & Hande Aypek & Wolfgang Bildl & Martin Helmstädter & Philipp Trachte & Bernhard Dumoulin & Sina Wittösch & Lukas Kühne & Ute Aukschun & Carolin Teetzen & Oliver Kretz & Botond Ga, 2022. "A slit-diaphragm-associated protein network for dynamic control of renal filtration," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    17. Peicong Lin & Yumeng Yan & Huanyu Tao & Sheng-You Huang, 2023. "Deep transfer learning for inter-chain contact predictions of transmembrane protein complexes," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    18. Michael A. Longo & Sunetra Roy & Yue Chen & Karl-Heinz Tomaszowski & Andrew S. Arvai & Jordan T. Pepper & Rebecca A. Boisvert & Selvi Kunnimalaiyaan & Caezanne Keshvani & David Schild & Albino Bacolla, 2023. "RAD51C-XRCC3 structure and cancer patient mutations define DNA replication roles," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    19. Zachary C. Drake & Justin T. Seffernick & Steffen Lindert, 2022. "Protein complex prediction using Rosetta, AlphaFold, and mass spectrometry covalent labeling," Nature Communications, Nature, vol. 13(1), pages 1-9, December.
    20. Leonardo Betancurt-Anzola & Markel Martínez-Carranza & Marc Delarue & Kelly M. Zatopek & Andrew F. Gardner & Ludovic Sauguet, 2023. "Molecular basis for proofreading by the unique exonuclease domain of Family-D DNA polymerases," Nature Communications, Nature, vol. 14(1), pages 1-15, December.

