Authors
- Hua Shi
- Yihang Lin
- Dachen Liu
- Quan Zou
Abstract
Identifying effector proteins of secretion systems in Gram-negative bacteria is crucial for deciphering their pathogenic mechanisms and guiding the development of antimicrobial strategies. Extracting evolutionary and sequence features using pre-trained protein language models (PLMs) has emerged as an effective approach to improve the performance of effector protein prediction. However, the high-dimensional features generated by PLMs contain extensive general biological information, making it difficult to focus on core features when applied directly to effector protein tasks, which in turn limits prediction performance. In this study, we propose MoCETSE, a deep learning model for predicting effector proteins in Gram-negative bacteria. Specifically, MoCETSE first extracts contextual representations of sequences using the pre-trained protein language model ESM-1b. Subsequently, it refines key functional features via a target preprocessing network to construct more expressive sequence representations. Finally, integrated with a transformer module incorporating relative positional encoding, MoCETSE explicitly models the relative spatial relationships between residues, enabling highly accurate prediction of secreted effector proteins. MoCETSE exhibits excellent and robust performance in both five-fold cross-validation and independent testing. Benchmark results demonstrate that it maintains strong competitiveness compared to existing binary and multi-class predictors. Additionally, the model can effectively perform genome-wide effector protein prediction, showing outstanding specificity and reliability. MoCETSE provides an efficient and robust computational framework for the accurate identification of bacterial effector substrates and offers key biological insights.
Author summary
Secreted effector proteins are a class of key virulence factors in Gram-negative bacteria. After being injected into host cells, they interfere with normal cellular functions, leading to the development of diseases. Accurate identification of these virulence proteins is crucial for understanding bacterial pathogenic mechanisms and developing therapeutic strategies. However, existing methods suffer from issues such as feature redundancy and insufficient capture of long-range dependency signals. Here, we developed a novel computational framework called MoCETSE that enables end-to-end intelligent prediction of effector proteins directly from raw protein sequence information. The model leverages a pre-trained protein language model to extract deep biological information from raw sequences; a target preprocessing network then refines the extracted information to focus on features most relevant to effector protein identification. During the learning of secretion signal features, we introduced relative positional encoding to effectively capture associations between distant positions in the sequence. In cross-category prediction, MoCETSE outperformed tools such as DeepSecE. Furthermore, we provide interpretable biological mechanisms supporting the model, revealing which key sequence motifs and functional regions play core roles in distinguishing different types of effector proteins.
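The abstract mentions a transformer module with relative positional encoding but does not say which variant is used. As a rough illustration only, the sketch below shows one common form: single-head scaled dot-product attention with an additive bias looked up by the clipped offset between residue positions. The function names, the clipping scheme, and the `rel_bias` table are assumptions made for this example and are not taken from MoCETSE; in a real model the bias values would be learned.

```python
import math


def relative_position_index(i, j, max_distance):
    """Map the signed offset j - i to a bucket in [0, 2 * max_distance].

    Offsets beyond +/- max_distance share the outermost buckets
    (an assumed clipping scheme, not the paper's stated design).
    """
    offset = max(-max_distance, min(max_distance, j - i))
    return offset + max_distance


def attention_with_relative_bias(queries, keys, values, rel_bias, max_distance):
    """Single-head attention with an additive relative-position bias.

    queries, keys, values: lists of float vectors, one per residue.
    rel_bias: 2 * max_distance + 1 scalars (hypothetical learned parameters).
    """
    dim = len(queries[0])
    out_dim = len(values[0])
    outputs = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            dot = sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            bias = rel_bias[relative_position_index(i, j, max_distance)]
            scores.append(dot + bias)
        # Numerically stable softmax over all key positions.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(out_dim)]
        )
    return outputs
```

Because the bias depends only on the offset between two positions, the same table applies at every location in the sequence, which is one way a model can weight nearby and distant residues differently regardless of where a signal sits in the protein.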
Suggested Citation
Hua Shi & Yihang Lin & Dachen Liu & Quan Zou, 2026.
"MoCETSE: A mixture-of-convolutional experts and transformer-based model for predicting Gram-negative bacterial secreted effectors,"
PLOS Computational Biology, Public Library of Science, vol. 22(3), pages 1-23, March.
Handle: RePEc:plo:pcbi00:1013397
DOI: 10.1371/journal.pcbi.1013397