
OneProt: Towards multi-modal protein foundation models via latent space alignment of sequence, structure, binding sites and text encoders

Author

Listed:
  • Klemens Flöge
  • Srisruthi Udayakumar
  • Johanna Sommer
  • Marie Piraud
  • Stefan Kesselheim
  • Vincent Fortuin
  • Stephan Günnemann
  • Karel J van der Weg
  • Holger Gohlke
  • Erinc Merdivan
  • Alina Bazarova

Abstract

Recent advances in Artificial Intelligence have enabled multi-modal systems to model and translate diverse information spaces. Extending beyond text and vision, we introduce OneProt, a multi-modal Deep Learning model for proteins that integrates structural, sequence, text, and binding site data. Using the ImageBind framework, OneProt aligns the latent spaces of protein modality encoders in a lightweight fine-tuning scheme that focuses on pairwise alignment with sequence data, rather than requiring full matches. This novel approach comprises a mix of Graph Neural Networks and transformer architectures. It demonstrates good performance in retrieval tasks and showcases the efficacy of multi-modal systems in Protein Machine Learning through a broad spectrum of downstream baselines, including enzyme function prediction and binding site analysis. Furthermore, OneProt enables the transfer of representational information from specialized encoders to the sequence encoder, enhancing capabilities for distinguishing evolutionarily related and unrelated sequences and exhibiting representational properties where evolutionarily related proteins align in similar directions within the latent space. In addition, we extensively investigate modality ablations to identify the encoders that contribute the most to predictive performance, highlighting the significance of the binding site encoder, which has not been used in similar models previously. This work expands the horizons of multi-modal protein models, paving the way for transformative applications in drug discovery, biocatalytic reaction planning, and protein engineering.

Author summary: In this study, we introduce OneProt, a novel, versatile Artificial Intelligence system designed for protein analysis. In order to integrate different types of data, structural, sequence, text, and binding sites, OneProt uses the ImageBind framework, efficiently aligning protein data without needing full matches. Combining Graph Neural Networks and transformer architectures, OneProt excels in tasks like enzyme function prediction and binding site analysis. It enhances the understanding of protein relationships by transferring information between different data types, making it easier to identify related proteins. The OneProt framework stands out for two key features: the ability to incorporate custom modalities during pre-training and a simple fine-tuning process that requires only a Multi-Layer Perceptron projection. Notably, we also show that incorporating multiple modalities can reduce the need for extensive datasets and training, leading to competitive downstream performance. In addition, we conduct an exhaustive ablation study, where we highlight the crucial role of the binding site encoder, which has not been used in similar models before. Overall, OneProt represents a significant step forward in multi-modal protein modeling, with promising applications in drug discovery and protein engineering.
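The pairwise alignment with sequence data described in the abstract can be illustrated with a small sketch. ImageBind-style training is generally built on a symmetric InfoNCE contrastive objective between an anchor modality and one other modality at a time; the code below assumes OneProt follows that pattern, which this listing does not confirm. The function names, the temperature value, and the three-dimensional toy embeddings are all hypothetical and stand in for real encoder outputs. It is not the authors' implementation.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(anchor, other, temperature=0.07):
    """One-directional InfoNCE: row i of `anchor` should match row i of `other`.

    The temperature of 0.07 is an illustrative assumption, not a value from the paper.
    """
    a = [l2_normalize(v) for v in anchor]
    b = [l2_normalize(v) for v in other]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def symmetric_info_nce(seq_emb, mod_emb, temperature=0.07):
    """Average the sequence-to-modality and modality-to-sequence losses."""
    return 0.5 * (
        info_nce(seq_emb, mod_emb, temperature)
        + info_nce(mod_emb, seq_emb, temperature)
    )


def retrieve(query, candidates):
    """Return the index of the candidate with the highest cosine similarity."""
    q = l2_normalize(query)
    scores = [sum(x * y for x, y in zip(q, l2_normalize(c))) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy embeddings (hypothetical): three proteins, sequence vs. structure modality.
seq = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
struct_aligned = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
# Same structure embeddings, but paired with the wrong sequences.
struct_shuffled = [struct_aligned[1], struct_aligned[2], struct_aligned[0]]

aligned_loss = symmetric_info_nce(seq, struct_aligned)
shuffled_loss = symmetric_info_nce(seq, struct_shuffled)
best_match = retrieve(seq[1], struct_aligned)
```

Because every modality is paired only with the sequence anchor, a protein needs a sequence plus one other modality to contribute a training pair, rather than a full match across all four. In this toy setting the correctly paired batch gives a much lower loss than the shuffled one, and the same cosine similarity then serves as the score for cross-modal retrieval.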

Suggested Citation

  • Klemens Flöge & Srisruthi Udayakumar & Johanna Sommer & Marie Piraud & Stefan Kesselheim & Vincent Fortuin & Stephan Günnemann & Karel J van der Weg & Holger Gohlke & Erinc Merdivan & Alina Bazarova, 2025. "OneProt: Towards multi-modal protein foundation models via latent space alignment of sequence, structure, binding sites and text encoders," PLOS Computational Biology, Public Library of Science, vol. 21(11), pages 1-27, November.
  • Handle: RePEc:plo:pcbi00:1013679
    DOI: 10.1371/journal.pcbi.1013679

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1013679
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1013679&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1013679?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Karel Weg & Erinc Merdivan & Marie Piraud & Holger Gohlke, 2025. "TopEC: prediction of Enzyme Commission classes by 3D graph neural networks and localized 3D protein descriptor," Nature Communications, Nature, vol. 16(1), pages 1-16, December.
    2. Ziqi Gao & Chenran Jiang & Jiawen Zhang & Xiaosen Jiang & Lanqing Li & Peilin Zhao & Huanming Yang & Yong Huang & Jia Li, 2023. "Hierarchical graph learning for protein–protein interaction," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    3. Steven A. Sullivan & Jordan C. Orosco & Francisco Callejas-Hernández & Frances Blow & Hayan Lee & T. Rhyker Ranallo-Benavidez & Andrew Peters & Shane R. Raidal & Yvette A. Girard & Christine K. Johnso, 2025. "Comparative genomics of the parasite Trichomonas vaginalis reveals genes involved in spillover from birds to humans," Nature Communications, Nature, vol. 16(1), pages 1-15, December.
    4. Stefanie Duller & Simone Vrbancic & Łukasz Szydłowski & Alexander Mahnert & Marcus Blohs & Michael Predl & Christina Kumpitsch & Verena Zrim & Christoph Högenauer & Tomasz Kosciolek & Ruth A. Schmitz , 2024. "Targeted isolation of Methanobrevibacter strains from fecal samples expands the cultivated human archaeome," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    5. Kerr Ding & Jiaqi Luo & Yunan Luo, 2024. "Leveraging conformal prediction to annotate enzyme function space with limited false positives," PLOS Computational Biology, Public Library of Science, vol. 20(5), pages 1-21, May.
    6. Yaan J. Jang & Qi-Qi Qin & Si-Yu Huang & Arun T. John Peter & Xue-Ming Ding & Benoît Kornmann, 2024. "Accurate prediction of protein function using statistics-informed graph networks," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    7. Chuwen Zhang & Yong He & Jieni Wang & Tengkai Chen & Federico Baltar & Minjie Hu & Jing Liao & Xi Xiao & Zhao-Rong Li & Xiyang Dong, 2025. "LucaPCycle: Illuminating microbial phosphorus cycling in deep-sea cold seep sediments using protein language models," Nature Communications, Nature, vol. 16(1), pages 1-16, December.
    8. Paweł Szczerbiak & Lukasz M. Szydlowski & Witold Wydmański & P. Douglas Renfrew & Julia Koehler Leman & Tomasz Kosciolek, 2025. "Large protein databases reveal structural complementarity and functional locality," Nature Communications, Nature, vol. 16(1), pages 1-15, December.
    9. Samuel Miravet-Verde & Rocco Mazzolini & Carolina Segura-Morales & Alicia Broto & Maria Lluch-Senar & Luis Serrano, 2024. "ProTInSeq: transposon insertion tracking by ultra-deep DNA sequencing to identify translated large and small ORFs," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    10. Julia Koehler Leman & Pawel Szczerbiak & P. Douglas Renfrew & Vladimir Gligorijevic & Daniel Berenberg & Tommi Vatanen & Bryn C. Taylor & Chris Chandler & Stefan Janssen & Andras Pataki & Nick Carrier, 2023. "Sequence-structure-function relationships in the microbial protein universe," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    11. Marco Malatesta & Emanuele Fornasier & Martino Luigi Salvo & Angela Tramonti & Erika Zangelmi & Alessio Peracchi & Andrea Secchi & Eugenia Polverini & Gabriele Giachin & Roberto Battistutta & Roberto , 2024. "One substrate many enzymes virtual screening uncovers missing genes of carnitine biosynthesis in human and mouse," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    12. Shunshi Kohyama & Béla P. Frohn & Leon Babl & Petra Schwille, 2024. "Machine learning-aided design and screening of an emergent protein function in synthetic cells," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    13. William Mo & Christopher A. Vaiana & Chris J. Myers, 2024. "The need for adaptability in detection, characterization, and attribution of biosecurity threats," Nature Communications, Nature, vol. 15(1), pages 1-9, December.
