
Pathway importance by graph convolutional network and Shapley additive explanations in gene expression phenotype of diffuse large B-cell lymphoma

Authors

  • Jin Hayakawa
  • Tomohisa Seki
  • Yoshimasa Kawazoe
  • Kazuhiko Ohe

Abstract

Deep learning techniques have recently been applied to analyze associations between gene expression data and disease phenotypes. However, there are concerns regarding the black box problem: it is difficult to interpret from the model parameters why a deep learning model produces a given prediction. New methods have been proposed for interpreting deep learning model predictions but have not been applied to genetics. In this study, we demonstrated that applying SHapley Additive exPlanations (SHAP) to a deep learning model using graph convolutions of genetic pathways can provide pathway-level feature importance for classification prediction of diffuse large B-cell lymphoma (DLBCL) gene expression subtypes. Using Kyoto Encyclopedia of Genes and Genomes pathways, a graph convolutional network (GCN) model was implemented to construct graphs with nodes and edges. DLBCL datasets, including microarray gene expression data and clinical information on subtypes (germinal center B-cell-like type and activated B-cell-like type), were retrieved from the Gene Expression Omnibus to evaluate the model. On the test datasets, the GCN model achieved an accuracy of 0.914, precision of 0.948, recall of 0.868, and F1 score of 0.906. The pathways with high feature importance by SHAP included highly enriched pathways in the gene set enrichment analysis. Moreover, a logistic regression model with explanatory variables of genes in pathways with high feature importance showed good performance in predicting DLBCL subtypes. In conclusion, our GCN model for classifying DLBCL subtypes is useful for interpreting important regulatory pathways that contribute to the prediction.
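
The additive attribution idea behind SHAP can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the SHAP library or a GCN. It computes exact Shapley values by enumerating feature subsets for a hypothetical three-feature model, where each feature stands in for an assumed pathway-level score. The function names, the toy model, the input values, and the all-zero baseline are illustrative assumptions only.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x relative to a baseline input.

    Features outside a coalition are set to their baseline value.
    Enumerates all subsets, so it is only practical for a few features.
    """
    n = len(x)

    def value(subset):
        # Model output when only the features in `subset` take their real value.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # k = size of the coalition that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                s = set(s)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    # Hypothetical score over three assumed pathway-level features,
    # with an interaction between feature 0 and feature 2.
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]


x = [1.0, 2.0, 4.0]          # assumed input
baseline = [0.0, 0.0, 0.0]   # assumed reference input
phi = shapley_values(toy_model, x, baseline)

# Rank features by absolute contribution, as one would rank pathways.
ranking = sorted(range(len(phi)), key=lambda i: abs(phi[i]), reverse=True)
print(phi, ranking)
```

The contributions sum to the difference between the model output at the input and at the baseline, which is the additivity property that makes the per-feature values directly comparable. Exact enumeration grows exponentially with the number of features, so practical tools rely on approximations instead.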

Suggested Citation

  • Jin Hayakawa & Tomohisa Seki & Yoshimasa Kawazoe & Kazuhiko Ohe, 2022. "Pathway importance by graph convolutional network and Shapley additive explanations in gene expression phenotype of diffuse large B-cell lymphoma," PLOS ONE, Public Library of Science, vol. 17(6), pages 1-17, June.
  • Handle: RePEc:plo:pone00:0269570
    DOI: 10.1371/journal.pone.0269570

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0269570
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0269570&type=printable
    Download Restriction: no



