Author
Listed:
- Yun Zuo
- Chenyi Zhang
- Ge Hua
- Qiao Ning
- Xiangrong Liu
- Xiangxiang Zeng
- Zhaohong Deng
Abstract
In drug discovery and therapeutic research, predicting drug-disease associations (DDAs) has significant scientific and clinical value. Drug molecules exert their effects by precisely identifying disease-related biological targets, systematically modulating the entire pharmacological process from absorption, distribution, and metabolism to final efficacy. Accurate prediction of drug-disease associations not only facilitates an in-depth understanding of the molecular mechanisms of drug action but also provides a critical theoretical foundation for drug repositioning and personalized medicine. Traditional prediction methods based on in vitro experiments and clinical statistics yield reliable results, but they suffer from long development cycles, substantial resource consumption, and low throughput. Machine learning techniques offer a promising way around these bottlenecks, enabling efficient discovery of potential drug–disease association networks and improving drug development efficiency. However, existing machine learning methods still face significant challenges in practice: complex feature construction raises the threshold for data processing; data sparsity constrains the depth of information mining; and pervasive sample imbalance degrades the model’s predictive accuracy and generalization performance. In this study, we developed an efficient and accurate framework for drug-disease association prediction named FKSUDDAPre. The model employs a multi-modal feature fusion strategy: on one hand, it leverages an ensemble of Mol2vec and K-BERT to capture the semantic features of drug molecular fingerprints; on the other hand, it integrates Medical Subject Headings (MeSH) with DeepWalk to reduce the dimensionality of disease features while preserving their relational structure.
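As a rough illustration of the DeepWalk step mentioned above, the sketch below generates truncated uniform random walks over a small graph; DeepWalk then feeds such walks to a skip-gram model to learn node embeddings. The abstract does not say how the MeSH graph is built or which hyperparameters are used, so the toy graph, the function name `random_walks`, and the values of `walk_length` and `walks_per_node` are all assumptions made for illustration only.

```python
import random


def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Generate DeepWalk-style truncated uniform random walks.

    adjacency maps each node to a list of its neighbours.
    """
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # fresh random order of start nodes on each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Hypothetical toy hierarchy (not taken from the paper), stored as an
# undirected adjacency list.
toy_graph = {
    "Nervous System Diseases": ["Neurodegenerative Diseases"],
    "Neurodegenerative Diseases": [
        "Nervous System Diseases",
        "Alzheimer Disease",
        "Parkinson Disease",
    ],
    "Alzheimer Disease": ["Neurodegenerative Diseases"],
    "Parkinson Disease": ["Neurodegenerative Diseases"],
}

walks = random_walks(toy_graph, walk_length=5, walks_per_node=3)
```

Each walk acts as a "sentence" of node names, so diseases that sit close together in the graph tend to co-occur in walks and end up with similar embeddings after skip-gram training.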
To address the class imbalance problem, FKSUDDAPre includes an optimization algorithm called AMDKSU, which combines clustering with an improved distance metric strategy, significantly enhancing the discriminative power of the sample set. For data processing, an F-test was employed for feature importance ranking, reducing data dimensionality and improving model generalization. For the predictive architecture, FKSUDDAPre uses a novel ensemble framework composed of XGBoost, Decision Tree, Random Forest, and HyperFast. By employing a dynamic weight allocation strategy, this ensemble harnesses the complementary strengths of these models to achieve significantly enhanced predictive performance. Rigorous validation demonstrated strong performance across multiple evaluation metrics, with an average AUC of 0.9725, improving the AUC by approximately 3.88% compared to the best-performing baseline model. In the prediction of Alzheimer’s disease and Parkinson’s disease, 80% and 60% of the top 10 candidate drugs recommended by FKSUDDAPre, respectively, have been confirmed by the literature, demonstrating the model’s practical application potential. Furthermore, we conducted a LIME-based feature importance analysis on the model’s predictions, visualizing the correlations between features and the target variable to demonstrate the model’s interpretability. A cross-platform, user-friendly visualization tool was also developed using the PyQt5 framework.
Author summary
Drug repurposing offers a cost-effective alternative to traditional drug discovery, yet accurately predicting which existing drugs can treat specific diseases remains computationally challenging. In this study, we present FKSUDDAPre, a novel framework designed to identify potential drug-disease associations with high precision.
Our approach is driven by three key innovations: first, the integration of pre-trained Large Language Models (specifically K-BERT) to capture deep semantic features of drug molecules; second, the development of the AMDKSU resampling algorithm, which effectively solves the critical issue of data imbalance to enhance model robustness; and third, the incorporation of HyperFast, a cutting-edge hypernetwork architecture, to boost classification performance. By combining these advanced components with a dynamic weighting strategy, FKSUDDAPre significantly outperforms existing baselines, achieving an average AUC of 0.9725. The framework’s practical utility was validated through case studies on Alzheimer’s and Parkinson’s diseases, where it successfully identified numerous literature-confirmed drug candidates. Furthermore, we prioritize transparency and usability by incorporating LIME-based interpretability analysis and providing a user-friendly visualization tool, making FKSUDDAPre a powerful resource for accelerating biomedical research.
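The F-test feature ranking named in the title and abstract can be illustrated with a minimal sketch. It computes the standard one-way ANOVA F statistic (between-class variance over within-class variance) for each feature and sorts features by that score. This is the generic textbook statistic, not necessarily the authors' exact procedure; the function names `f_statistic` and `rank_features` and the toy data are hypothetical, and the code assumes at least two classes and more samples than classes.

```python
def f_statistic(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Between-class sum of squares: spread of class means around the grand mean.
    between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Within-class sum of squares: spread of samples around their class mean.
    within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def rank_features(rows, labels):
    """Return feature indices sorted from highest to lowest F statistic."""
    columns = list(zip(*rows))
    scores = [f_statistic(list(col), labels) for col in columns]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Hypothetical toy data: six samples, three features, two classes.
rows = [
    (1.0, 5.0, 2.0),
    (1.2, 3.0, 2.5),
    (0.9, 4.0, 1.5),
    (3.0, 4.5, 2.8),
    (3.1, 3.5, 3.2),
    (2.9, 4.0, 2.2),
]
labels = [0, 0, 0, 1, 1, 1]

ranking = rank_features(rows, labels)
```

Here feature 0 separates the two classes cleanly and ranks first, while feature 1 has identical class means and ranks last. Keeping only the top-ranked features is one common way to reduce dimensionality before training a classifier.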
Suggested Citation
Yun Zuo & Chenyi Zhang & Ge Hua & Qiao Ning & Xiangrong Liu & Xiangxiang Zeng & Zhaohong Deng, 2026.
"FKSUDDAPre: A drug–disease association prediction framework based on F-TEST feature selection and AMDKSU resampling with interpretability analysis,"
PLOS Computational Biology, Public Library of Science, vol. 22(2), pages 1-29, February.
Handle:
RePEc:plo:pcbi00:1013947
DOI: 10.1371/journal.pcbi.1013947