HyperACP: A cutting-edge hybrid framework for anticancer peptide classification via scalable feature extraction and adaptive neighbor-based synthesis

Authors

Bangyi Zhang
Yun Zuo
Jun Wan
Jiayue Liu
Xiangrong Liu
Xiangxiang Zeng
Zhaohong Deng

Abstract

Cancer remains a major contributor to global mortality, constituting a significant and escalating threat to human health. Anticancer peptides (ACPs) have emerged as promising therapeutic agents due to their specific mechanisms of action, pronounced tumor-targeting capability, and low toxicity. Nevertheless, traditional approaches for ACP identification are constrained by their reliance on shallow, hand-crafted sequence features, which fail to capture deeper semantic and structural characteristics. Moreover, such models exhibit limited robustness and interpretability when confronted with practical challenges such as severe class imbalance. To address these limitations, this study proposes HyperACP, an innovative framework for ACP recognition that integrates deep representation learning, adaptive sampling, and mechanistic interpretability. The framework leverages the ESMC protein language model to extract comprehensive sequence features and employs a novel adaptive algorithm, ANBS, to mitigate class imbalance at the decision boundary. For enhanced model transparency, SHAP-Res is incorporated to elucidate the contributions of individual residues to the final predictions. Comprehensive evaluations demonstrate that HyperACP consistently outperforms state-of-the-art methods across multiple datasets and validation protocols (including 10-fold cross-validation and independent test sets) according to metrics such as Accuracy (ACC), Sensitivity (SN), Specificity (SP), Matthews Correlation Coefficient (MCC), and Area Under the Curve (AUC). Furthermore, the model yields biologically interpretable results, pinpointing key residues (K, L, F, G) known to play pivotal roles in anticancer activity. These findings provide not only a robust predictive tool (available at www.hyperacp.com) but also novel insights into the structure-function relationships underlying ACPs.

Author summary

Accurately identifying anticancer peptides (ACPs) is crucial for the discovery of next-generation cancer therapeutics, but existing computational methods often struggle with feature limitations, data imbalance, and poor interpretability. In this study, we present HyperACP, a novel computational framework that combines advanced deep protein representation, an adaptive sampling algorithm, and an ensemble learning strategy to systematically address these challenges. HyperACP utilizes a cutting-edge protein language model to capture the complex structural and functional information embedded in peptide sequences. Our newly designed sampling method generates informative minority class samples in challenging boundary regions, improving the model's robustness to imbalanced data. Most importantly, we introduce SHAP-Res, a residue-level interpretability module that reveals how individual amino acids drive model predictions and links computational insights directly to biological function. HyperACP not only achieves state-of-the-art prediction accuracy but also provides transparent and biologically meaningful explanations, paving the way for more reliable and interpretable peptide-based drug discovery.
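The evaluation metrics named in the abstract (ACC, SN, SP, MCC, AUC) have standard textbook definitions for binary classification. The sketch below computes them from scratch for readers who want to check reported numbers. It is not taken from the HyperACP code; the function names `binary_metrics` and `auc_score`, the label convention (1 = ACP, 0 = non-ACP), and the toy data are assumptions made for illustration only.

```python
import math


def binary_metrics(y_true, y_pred):
    """Return ACC, SN, SP, and MCC for binary labels (assumed: 1 = ACP, 0 = non-ACP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    sn = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity (recall on positives)
    sp = tn / (tn + fp) if (tn + fp) else 0.0  # specificity (recall on negatives)

    # MCC stays informative under class imbalance; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SN": sn, "SP": sp, "MCC": mcc}


def auc_score(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up data: 3 positives and 5 negatives (mildly imbalanced).
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.3, 0.1, 0.2, 0.4, 0.05, 0.7]
    print(binary_metrics(y_true, y_pred))
    print(auc_score(y_true, scores))
```

The pairwise AUC loop is quadratic in the number of samples, which is fine for a sanity check; for large test sets a rank-based implementation from an established library would be the more practical choice.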

Suggested Citation

Bangyi Zhang & Yun Zuo & Jun Wan & Jiayue Liu & Xiangrong Liu & Xiangxiang Zeng & Zhaohong Deng, 2025. "HyperACP: A cutting-edge hybrid framework for anticancer peptide classification via scalable feature extraction and adaptive neighbor-based synthesis," PLOS Computational Biology, Public Library of Science, vol. 21(9), pages 1-25, September.

Handle: RePEc:plo:pcbi00:1013489
DOI: 10.1371/journal.pcbi.1013489


