Authors
- Shihong Liu
- Yunfang Xie
- Xuanli Gong
- Jieyu He
- Wei Zou
Abstract
Objective: Gliomas are among the most common and heterogeneous primary tumours of the central nervous system. Accurate grading is essential for treatment planning and prognosis, yet conventional histopathological approaches are limited by subjectivity and poor reproducibility. This study aimed to develop a machine learning-based prediction model that integrates clinical and molecular characteristics to improve early glioma grading, thereby enhancing diagnostic accuracy and supporting individualized treatment strategies.
Methods: A prediction model distinguishing low-grade gliomas (LGGs) from glioblastoma (grade IV, GBM) was developed using the clinical and molecular characteristics of gliomas in The Cancer Genome Atlas (TCGA) dataset. Features were selected by combining recursive feature elimination (RFE) with random forest (RF) and elastic net regression (ENR). The synthetic minority oversampling technique (SMOTE) was applied to balance the training set, and K-nearest neighbours (KNN), support vector machine (SVM), and other algorithms were tuned through random-search hyper-parameter optimization (HPO) with five-fold cross-validation, yielding nine distinct machine learning (ML) models. From these, 34 ensemble learning models were constructed using voting and stacking algorithms. All models were externally validated on the Chinese Glioma Genome Atlas (CGGA) dataset. Finally, SHapley Additive exPlanations (SHAP) analysis was conducted to elucidate the prediction processes of the ensemble models.
Results: Feature selection revealed 11 key grading features, including Tumour Protein 53 (TP53) and Isocitrate Dehydrogenase 1 (IDH1). Among the nine base models built with optimization techniques such as SMOTE, the RF model performed best, with an area under the curve (AUC) of 0.916 on TCGA and 0.797 on CGGA. Among the 34 ensemble models, the Voting25 model, which integrates RF, Extreme Gradient Boosting (XGBoost), and KNN, achieved AUC values of 0.928 on TCGA and 0.794 on CGGA, demonstrating the best overall predictive performance.
Conclusion: Eleven key features were identified that facilitate molecular detection and personalized targeted therapy for glioma. Nine models were developed and optimized, and the RF model was observed to provide the best performance, potentially guiding future ML-related research in glioma. Additionally, the voting ensemble method, which integrates RF, XGBoost, and KNN, was shown to achieve superior performance, thereby enhancing both accuracy and robustness. Finally, all the models were successfully validated on the CGGA dataset, indicating strong generalizability.
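The idea behind a voting ensemble and the AUC metric reported in the abstract can be illustrated with a minimal sketch. It assumes soft voting (averaging each base model's predicted probability of GBM), which the abstract does not specify. The probabilities and labels are made-up toy values, not study data, and the function names soft_vote and auc are hypothetical. AUC is computed here as the fraction of (GBM, LGG) pairs in which the GBM case receives the higher score.

```python
from itertools import product


def soft_vote(prob_lists):
    """Average per-sample positive-class probabilities across models."""
    n_models = len(prob_lists)
    return [sum(sample_probs) / n_models for sample_probs in zip(*prob_lists)]


def auc(labels, scores):
    """Rank-based AUC: share of (positive, negative) pairs ordered correctly.

    A tie between a positive and a negative score counts as half a pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Toy labels (illustrative only): 1 = GBM, 0 = LGG.
labels = [1, 1, 1, 0, 0, 0]

# Hypothetical predicted probabilities of GBM from three base models.
rf_probs = [0.9, 0.6, 0.4, 0.5, 0.2, 0.1]
xgb_probs = [0.8, 0.3, 0.7, 0.4, 0.3, 0.2]
knn_probs = [0.7, 0.6, 0.4, 0.2, 0.6, 0.1]

# Soft-voting ensemble: mean probability per sample across the three models.
ensemble_probs = soft_vote([rf_probs, xgb_probs, knn_probs])

for name, probs in [
    ("RF", rf_probs),
    ("XGBoost", xgb_probs),
    ("KNN", knn_probs),
    ("Voting", ensemble_probs),
]:
    print(f"{name}: AUC = {auc(labels, probs):.3f}")
```

In this toy data each base model misranks a different case, so averaging their probabilities cancels the individual errors. That behaviour is specific to the constructed example and says nothing about the magnitude of any gain on real data.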
Suggested Citation
Shihong Liu & Yunfang Xie & Xuanli Gong & Jieyu He & Wei Zou, 2025.
"Machine learning-based prediction of glioma grading,"
PLOS ONE, Public Library of Science, vol. 20(12), pages 1-23, December.
Handle: RePEc:plo:pone00:0314831
DOI: 10.1371/journal.pone.0314831