Pathway importance by graph convolutional network and Shapley additive explanations in gene expression phenotype of diffuse large B-cell lymphoma

Author

Listed:

Jin Hayakawa
Tomohisa Seki
Yoshimasa Kawazoe
Kazuhiko Ohe

Abstract

Deep learning techniques have recently been applied to analyze associations between gene expression data and disease phenotypes. However, deep learning models raise the black box problem: it is difficult to interpret from the model parameters why a given prediction was obtained. New methods have been proposed for interpreting deep learning model predictions, but they have not been applied to genetics. In this study, we demonstrated that applying SHapley Additive exPlanations (SHAP) to a deep learning model that uses graph convolutions over genetic pathways can provide pathway-level feature importance for classifying diffuse large B-cell lymphoma (DLBCL) gene expression subtypes. A graph convolutional network (GCN) model was implemented on graphs of nodes and edges constructed from Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. DLBCL datasets, including microarray gene expression data and clinical information on subtypes (germinal center B-cell-like type and activated B-cell-like type), were retrieved from the Gene Expression Omnibus to evaluate the model. On the test datasets, the GCN model achieved an accuracy of 0.914, precision of 0.948, recall of 0.868, and F1 score of 0.906. The pathways that SHAP ranked as highly important included pathways that were highly enriched in the gene set enrichment analysis. Moreover, a logistic regression model that used genes from the high-importance pathways as explanatory variables showed good performance in predicting DLBCL subtypes. In conclusion, our GCN model for classifying DLBCL subtypes is useful for interpreting important regulatory pathways that contribute to the prediction.
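To make the idea of pathway-level feature importance concrete, the following is a minimal sketch of exact Shapley values computed over groups of genes. It is not the authors' GCN and does not use the SHAP library: the pathway names, gene names, weights, expression values, the all-zero baseline, and the toy scoring function are all hypothetical, chosen only so that the attribution can be checked by hand. Each pathway is treated as one "player"; a pathway outside the coalition has its genes set to the baseline.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy setup (not from the paper): three pathways, five genes.
PATHWAYS = {
    "pathway_A": ["g1", "g2"],
    "pathway_B": ["g3"],
    "pathway_C": ["g4", "g5"],
}
WEIGHTS = {"g1": 0.8, "g2": -0.3, "g3": 1.5, "g4": 0.1, "g5": 0.2}
SAMPLE = {"g1": 1.0, "g2": 2.0, "g3": 2.0, "g4": 0.5, "g5": 1.0}
BASELINE = {g: 0.0 for g in WEIGHTS}  # assumed reference expression profile


def model(expr):
    """Toy score: linear terms plus one g1*g3 interaction across pathways."""
    linear = sum(WEIGHTS[g] * expr[g] for g in WEIGHTS)
    return linear + 0.5 * expr["g1"] * expr["g3"]


def coalition_value(present):
    """Model output when only the pathways in `present` take sample values."""
    expr = dict(BASELINE)
    for pathway in present:
        for gene in PATHWAYS[pathway]:
            expr[gene] = SAMPLE[gene]
    return model(expr)


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition (small n only)."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


phi = shapley_values(list(PATHWAYS), coalition_value)
for pathway, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{pathway}: {contribution:.3f}")
```

The attributions sum to the difference between the score for the sample and the score for the baseline, which is the additivity property that gives SHAP its name. Exact enumeration grows exponentially with the number of players, so it is only practical for a handful of pathways; for a real network, an approximation such as those provided by SHAP would be needed.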

Suggested Citation

Jin Hayakawa & Tomohisa Seki & Yoshimasa Kawazoe & Kazuhiko Ohe, 2022. "Pathway importance by graph convolutional network and Shapley additive explanations in gene expression phenotype of diffuse large B-cell lymphoma," PLOS ONE, Public Library of Science, vol. 17(6), pages 1-17, June.

Handle: RePEc:plo:pone00:0269570
DOI: 10.1371/journal.pone.0269570
