
PepLM-GNN: A graph neural network framework leveraging pre-trained language models for peptide-protein binding prediction

Author

Listed:
  • Ke Yan
  • Meijing Li
  • Shutao Chen
  • Tianyi Liu
  • Jing Hao
  • Bin Liu
  • Zhen Li

Abstract

Motivation: Accurate prediction of peptide-protein interactions (PepPI) underpins progress in peptide drug research and helps explain the regulatory mechanisms of biomolecules. Several computational methods have been developed to predict PepPI, but they have significant limitations. At the level of feature characterisation, PepPI data are non-Euclidean, which makes it difficult for conventional prediction methods to measure the underlying correlations between peptides and proteins. At the level of generalisation, existing approaches degrade markedly in cold-start scenarios involving novel peptides, novel proteins, and novel binding pairs.

Results: In this study, we propose a computational framework, PepLM-GNN, that integrates the pre-trained ProtT5 language model with a hybrid graph network for accurate identification of PepPI. The model constructs a graph in which ProtT5-extracted semantic context features of peptides and proteins form heterogeneous nodes, and edges connect interacting peptide-protein pairs. The hybrid graph network uses a Graph Convolutional Network (GCN) to provide comprehensive information on the peptide and protein sequences, and a Graph Isomorphism Network (GIN) to capture the global interactions between them. Specifically, the GCN aggregates both the semantic context information of node sequences and local neighbourhood information, effectively representing non-Euclidean data. To capture global associations, the GIN optimizes cross-node feature interaction and transfer, thereby enhancing generalisation in the cold-start scenario. Compared with existing advanced methods, PepLM-GNN demonstrated highly accurate and robust performance in predicting PepPI. We further demonstrated the capabilities of PepLM-GNN in virtual peptide drug screening, which is expected to facilitate the discovery of peptide drugs and the elucidation of protein functions.

Author summary: We propose a computational framework, PepLM-GNN, that integrates the ProtT5 pre-trained language model with a hybrid graph network. Specifically, the semantic features of peptides and proteins are extracted using ProtT5 to construct a graph. Within the hybrid graph network, the GCN aggregates semantic and local neighborhood information from node sequences, enabling an adequate representation of non-Euclidean data. Meanwhile, the GIN optimizes cross-node feature interaction and transmission, thereby enhancing generalization in cold-start scenarios. Experimental results demonstrate that PepLM-GNN outperforms existing state-of-the-art methods in both accuracy and robustness for PepPI prediction. Moreover, PepLM-GNN can be applied to virtual peptide drug screening, thereby accelerating the development of peptide drugs. Furthermore, we have established a public online service platform (http://bliulab.net/PepLM-GNN) to facilitate practical application.
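
Illustrative sketch: the abstract describes a graph whose nodes carry ProtT5 features and whose edges join interacting peptide-protein pairs, processed by a GCN and a GIN. The toy example below shows one standard GCN update and one standard GIN update on such a graph, written in plain Python for readability. It is not the authors' implementation. The random vectors standing in for ProtT5 embeddings, the toy dimensions, the choice to stack the GCN before the GIN, and the dot-product scoring head (`score_pair`) are all assumptions made for illustration; a real model would use a tensor library and trained weights.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    """Element-wise ReLU."""
    return [[max(0.0, v) for v in row] for row in m]

def build_adjacency(n_pep, n_prot, pairs):
    """Symmetric adjacency; peptide nodes come first, then protein nodes."""
    n = n_pep + n_prot
    adj = [[0.0] * n for _ in range(n)]
    for pep, prot in pairs:
        i, j = pep, n_pep + prot
        adj[i][j] = adj[j][i] = 1.0
    return adj

def gcn_layer(adj, h, w):
    """GCN update: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # >= 1 because of the self-loop
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    return relu(matmul(matmul(norm, h), w))

def gin_layer(adj, h, w1, w2, eps=0.0):
    """GIN update: MLP((1 + eps) * h_v + sum of neighbour features)."""
    neigh = matmul(adj, h)
    mixed = [[(1.0 + eps) * hv + nv for hv, nv in zip(h_row, n_row)]
             for h_row, n_row in zip(h, neigh)]
    return matmul(relu(matmul(mixed, w1)), w2)  # two-layer MLP

def score_pair(h, i, j):
    """Hypothetical scoring head: sigmoid of the embedding dot product."""
    z = sum(a * b for a, b in zip(h[i], h[j]))
    return 1.0 / (1.0 + math.exp(-z))

rng = random.Random(0)

def rand_matrix(rows, cols):
    """Small random matrix, used for stand-in features and untrained weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Toy sizes (assumptions): 3 peptides, 2 proteins, 8-dim features, 4-dim hidden.
n_pep, n_prot, dim, hidden = 3, 2, 8, 4
features = rand_matrix(n_pep + n_prot, dim)  # stand-in for ProtT5 embeddings
adj = build_adjacency(n_pep, n_prot, [(0, 0), (1, 0), (2, 1)])

h = gcn_layer(adj, features, rand_matrix(dim, hidden))
h = gin_layer(adj, h, rand_matrix(hidden, hidden), rand_matrix(hidden, hidden))

# Score peptide 0 against protein 1, a pair with no edge in the toy graph.
prob = score_pair(h, 0, n_pep + 1)
```

With untrained random weights the value of `prob` carries no meaning; the point is only the data flow from node features, through neighbourhood aggregation, to a pair-level score.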

Suggested Citation

  • Ke Yan & Meijing Li & Shutao Chen & Tianyi Liu & Jing Hao & Bin Liu & Zhen Li, 2026. "PepLM-GNN: A graph neural network framework leveraging pre-trained language models for peptide-protein binding prediction," PLOS Computational Biology, Public Library of Science, vol. 22(3), pages 1-19, March.
  • Handle: RePEc:plo:pcbi00:1014084
    DOI: 10.1371/journal.pcbi.1014084

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1014084
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1014084&type=printable
    Download Restriction: no

