IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0269570.html

Pathway importance by graph convolutional network and Shapley additive explanations in gene expression phenotype of diffuse large B-cell lymphoma

Author

Listed:
  • Jin Hayakawa
  • Tomohisa Seki
  • Yoshimasa Kawazoe
  • Kazuhiko Ohe

Abstract

Deep learning techniques have recently been applied to analyze associations between gene expression data and disease phenotypes. However, there are concerns regarding the black box problem: it is difficult to interpret why the prediction results are obtained using deep learning models from model parameters. New methods have been proposed for interpreting deep learning model predictions but have not been applied to genetics. In this study, we demonstrated that applying SHapley Additive exPlanations (SHAP) to a deep learning model using graph convolutions of genetic pathways can provide pathway-level feature importance for classification prediction of diffuse large B-cell lymphoma (DLBCL) gene expression subtypes. Using Kyoto Encyclopedia of Genes and Genomes pathways, a graph convolutional network (GCN) model was implemented to construct graphs with nodes and edges. DLBCL datasets, including microarray gene expression data and clinical information on subtypes (germinal center B-cell-like type and activated B-cell-like type), were retrieved from the Gene Expression Omnibus to evaluate the model. The GCN model showed an accuracy of 0.914, precision of 0.948, recall of 0.868, and F1 score of 0.906 in analysis of the classification performance for the test datasets. The pathways with high feature importance by SHAP included highly enriched pathways in the gene set enrichment analysis. Moreover, a logistic regression model with explanatory variables of genes in pathways with high feature importance showed good performance in predicting DLBCL subtypes. In conclusion, our GCN model for classifying DLBCL subtypes is useful for interpreting important regulatory pathways that contribute to the prediction.
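As a rough illustration of what "pathway-level feature importance" can mean, the sketch below computes exact Shapley values with whole pathways as the players. It is not the authors' GCN or their SHAP configuration: the gene names, pathway membership, weights, and the linear `model` are all hypothetical stand-ins. The pathways are assumed to be disjoint, which real KEGG pathways are not, and absent pathways are filled in from an all-zero baseline.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy data: gene names, pathway membership and weights are invented.
PATHWAYS = {
    "pathway_A": ["g1", "g2"],
    "pathway_B": ["g3"],
    "pathway_C": ["g4", "g5"],
}
WEIGHTS = {"g1": 0.8, "g2": -0.3, "g3": 1.5, "g4": 0.1, "g5": 0.2}
BIAS = -0.5


def model(expr):
    """Toy linear score standing in for a trained subtype classifier."""
    return BIAS + sum(WEIGHTS[g] * expr[g] for g in WEIGHTS)


def coalition_value(present, sample, baseline):
    """Model output when only the pathways in `present` use the sample's values."""
    expr = dict(baseline)
    for pathway in present:
        for gene in PATHWAYS[pathway]:
            expr[gene] = sample[gene]
    return model(expr)


def pathway_shapley(sample, baseline):
    """Exact Shapley value of each pathway for one sample (exponential in #pathways)."""
    players = list(PATHWAYS)
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # k = size of the coalition that p joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                gain = (coalition_value(subset + (p,), sample, baseline)
                        - coalition_value(subset, sample, baseline))
                total += weight * gain
        phi[p] = total
    return phi


sample = {"g1": 2.0, "g2": 1.0, "g3": 0.5, "g4": 3.0, "g5": -1.0}
baseline = {g: 0.0 for g in WEIGHTS}
phi = pathway_shapley(sample, baseline)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.3f}")
```

The pathway attributions sum to the difference between the model output for the sample and for the baseline, which is the additivity property that SHAP relies on. Exact enumeration is only feasible for a handful of players; with hundreds of pathways an approximation would be needed.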

Suggested Citation

  • Jin Hayakawa & Tomohisa Seki & Yoshimasa Kawazoe & Kazuhiko Ohe, 2022. "Pathway importance by graph convolutional network and Shapley additive explanations in gene expression phenotype of diffuse large B-cell lymphoma," PLOS ONE, Public Library of Science, vol. 17(6), pages 1-17, June.
  • Handle: RePEc:plo:pone00:0269570
    DOI: 10.1371/journal.pone.0269570

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0269570
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0269570&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0269570?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Stan Lipovetsky & Michael Conklin, 2001. "Analysis of regression in game theory approach," Applied Stochastic Models in Business and Industry, John Wiley & Sons, vol. 17(4), pages 319-330, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Pera, Rebecca & Viglia, Giampaolo & Furlan, Roberto, 2016. "Who Am I? How Compelling Self-storytelling Builds Digital Personal Reputation," Journal of Interactive Marketing, Elsevier, vol. 35(C), pages 44-55.
    2. Hugh Chen & Scott M. Lundberg & Su-In Lee, 2022. "Explaining a series of models by propagating Shapley values," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    3. Emrah Arbak, 2017. "Identifying the provisioning policies of Belgian banks," Working Paper Research 326, National Bank of Belgium.
    4. Xingwei Hu, 2020. "A theory of dichotomous valuation with applications to variable selection," Econometric Reviews, Taylor & Francis Journals, vol. 39(10), pages 1075-1099, November.
    5. Dmitry Sharapov & Paul Kattuman & Diego Rodriguez & F. Javier Velazquez, 2021. "Using the SHAPLEY value approach to variance decomposition in strategy research: Diversification, internationalization, and corporate group effects on affiliate profitability," Strategic Management Journal, Wiley Blackwell, vol. 42(3), pages 608-623, March.
    6. Elena Pokryshevskaya & Evgeny Antipov, 2013. "Importance-performance analysis for internet stores: a system based on publicly available panel data," HSE Working papers WP BRP 08/MAN/2013, National Research University Higher School of Economics.
    7. Pelin Ayranci & Phung Lai & Nhathai Phan & Han Hu & Alexander Kolinowski & David Newman & Deijing Dou, 2022. "OnML: an ontology-based approach for interpretable machine learning," Journal of Combinatorial Optimization, Springer, vol. 44(1), pages 770-793, August.
    8. Khoa Tran & Hai-Canh Vu & Lam Pham & Nassim Boudaoud & Ho-Si-Hung Nguyen, 2024. "Robust-MBDL: A Robust Multi-Branch Deep-Learning-Based Model for Remaining Useful Life Prediction of Rotating Machines," Mathematics, MDPI, vol. 12(10), pages 1-25, May.
    9. Gabriel Ferrettini & Elodie Escriva & Julien Aligon & Jean-Baptiste Excoffier & Chantal Soulé-Dupuy, 2022. "Coalitional Strategies for Efficient Individual Prediction Explanation," Information Systems Frontiers, Springer, vol. 24(1), pages 49-75, February.
    10. Riccardo Colini-Baldeschi & Marco Scarsini & Stefano Vaccari, 2018. "Variance Allocation and Shapley Value," Methodology and Computing in Applied Probability, Springer, vol. 20(3), pages 919-933, September.
    11. Ruiqiao Bai & Jacqueline C. K. Lam & Victor O. K. Li, 2023. "What dictates income in New York City? SHAP analysis of income estimation based on Socio-economic and Spatial Information Gaussian Processes (SSIG)," Palgrave Communications, Palgrave Macmillan, vol. 10(1), pages 1-14, December.
    12. repec:jss:jstsof:33:i10 is not listed on IDEAS
    13. Salas, Patricio & De la Fuente, Rodrigo & Astroza, Sebastian & Carrasco, Juan Antonio, 2025. "Analysis of attribute importance in multinomial logit models using Shapley values-based methods," Journal of choice modelling, Elsevier, vol. 54(C).
    14. Liu, Jiefeng & Zhang, Zhenhao & Fan, Xianhao & Zhang, Yiyi & Wang, Jiaqi & Zhou, Ke & Liang, Shuo & Yu, Xiaoyong & Zhang, Wei, 2022. "Power system load forecasting using mobility optimization and multi-task learning in COVID-19," Applied Energy, Elsevier, vol. 310(C).
    15. Yung-Hsiang Ying & Wen-Li Lee & Ying-Chen Chi & Mei-Jung Chen & Koyin Chang, 2022. "Demographics, Socioeconomic Context, and the Spread of Infectious Disease: The Case of COVID-19," IJERPH, MDPI, vol. 19(4), pages 1-24, February.
    16. Jacobs, Martin & Requate, Till, 2016. "Demand rationing in Bertrand-Edgeworth markets with fixed capacities: An experiment," Economics Working Papers 2016-03, Christian-Albrechts-University of Kiel, Department of Economics.
    17. Marcus Buckmann & Andreas Joseph, 2022. "An interpretable machine learning workflow with an application to economic forecasting," Bank of England working papers 984, Bank of England.
    18. Aas Kjersti & Nagler Thomas & Jullum Martin & Løland Anders, 2021. "Explaining predictive models using Shapley values and non-parametric vine copulas," Dependence Modeling, De Gruyter, vol. 9(1), pages 62-81, January.
    19. Borgonovo, Emanuele & Plischke, Elmar & Rabitti, Giovanni, 2024. "The many Shapley values for explainable artificial intelligence: A sensitivity analysis perspective," European Journal of Operational Research, Elsevier, vol. 318(3), pages 911-926.
    20. Stan Lipovetsky, 2021. "Predictor Analysis in Group Decision Making," Stats, MDPI, vol. 4(1), pages 1-14, February.
    21. Viglia, Giampaolo & Abrate, Graziano, 2017. "When distinction does not pay off - Investigating the determinants of European agritourism prices," Journal of Business Research, Elsevier, vol. 80(C), pages 45-52.

