Author
Listed:
- Bettina Soós
- Gonzalo Nápoles
- Pieter Spronck
- Çiçek Güven
Abstract
This study introduces a post-hoc approach for explainable machine learning on graph-structured data by identifying relevant subgraphs and their corresponding subgraph patterns, the graphlet motifs. Unlike traditional motif detection methods, which rely solely on statistical occurrence frequencies and can therefore be decoupled from the learning task, our method optimizes important motifs based on fidelity and sparsity, together with sufficiency and necessity, which are deemed key properties when generating explanations for the learning task. The resulting motifs are relevant because they are explanatory for the prediction task, and their subgraph coverings correspond to the explanatory subgraphs. The method outperforms state-of-the-art techniques when comparing performance on the explanatory subgraph. Finding subgraph monomorphisms of any form, for all possible motifs, is an NP-hard problem, so without constraining the motif structure (e.g., fixing the motif size) an exhaustive search is computationally intractable. Hence, a genetic algorithm is used to search the space of possible subgraphs for those with the properties of desirable explanations. Fidelity ensures that the selected subgraphs maintain predictive power, meaning that marginalizing or altering the subgraph would significantly impact the model’s output. Sparsity guarantees that the explanation is as concise as possible, avoiding redundant or overly complex subgraphs while still capturing the core reasoning behind the prediction. Minimizing fidelity on the part of the graph that is not in the explanation (the non-covering subgraph) ensures that the explanations are not only sufficient but also necessary. We frame the search for explanatory graphlets as a multi-objective optimization problem that balances these properties.
The methodology is demonstrated with single motifs as building blocks of subgraph explanations and evaluated on two complementary synthetic datasets designed for graph prediction tasks, also in comparison with state-of-the-art methodologies. The motifs found not only cover relevant structural patterns but also contribute meaningfully to the model’s decision-making process.
Author summary: In this work we present an explainability method for machine learning on graphs. Using neural networks as machine learning models obscures the interpretability of the models’ predictions, given the complicated functions with large numbers of parameters that they implement. Graphs represent entities and their relations, and are increasingly used as primary data structures for machine learning. However, due to the complex nature of graph-structured data, additional considerations are needed when applying explainability methods directly in this domain. We use a genetic algorithm to find relevant substructures in the graph that explain the predictions. The explanations remain higher-order by searching for combinations of nodes and edges, i.e., subgraphs, relevant to the predictions. Furthermore, not only the relevant subgraph within the graph, but also the repeating pattern in the subgraph, i.e., the graphlet motif, is analyzed for explainability. The genetic algorithm effectively explores the search space and finds meaningful explanations, which are evaluated with three neural networks on two synthetic datasets.
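To make the abstract's objectives concrete, the sketch below shows how a genetic search over candidate subgraphs might combine fidelity, sparsity, and necessity (low fidelity on the non-covering subgraph) into a single scalar fitness. Everything here is a hypothetical illustration, not the authors' implementation: the toy `model_score` stand-in for a trained GNN, the weights `alpha`/`beta`/`gamma`, and all function names are assumptions.

```python
import random

def model_score(node_mask, graph):
    # Toy stand-in for a trained graph model (an assumption): the prediction
    # depends on how many of the "ground-truth motif" nodes 0-2 are present.
    motif = {0, 1, 2}
    present = sum(1 for n in motif if node_mask[n])
    return present / len(motif)

def fitness(subgraph_nodes, graph, n_nodes, alpha=1.0, beta=0.1, gamma=1.0):
    full = [True] * n_nodes
    # Fidelity: drop in the model's score when the explanation is removed.
    without = [i not in subgraph_nodes for i in range(n_nodes)]
    fidelity = model_score(full, graph) - model_score(without, graph)
    # Necessity: fidelity measured on the non-covering subgraph should be low,
    # i.e. the complement alone should not carry the prediction.
    only = [i in subgraph_nodes for i in range(n_nodes)]
    complement_fidelity = model_score(full, graph) - model_score(only, graph)
    # Sparsity: penalize large explanations.
    sparsity = len(subgraph_nodes) / n_nodes
    return alpha * fidelity - beta * sparsity - gamma * complement_fidelity

def genetic_search(graph, n_nodes, pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [frozenset(rng.sample(range(n_nodes), rng.randint(1, n_nodes)))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda s: fitness(s, graph, n_nodes), reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            # Crossover (intersection) plus a one-node mutation.
            child = set(a) & set(b) | {rng.randrange(n_nodes)}
            children.append(frozenset(child))
        pop = survivors + children
    return max(pop, key=lambda s: fitness(s, graph, n_nodes))
```

With these weights, the fitness of the exact motif `{0, 1, 2}` exceeds that of both a strict subset (lower fidelity, nonzero complement fidelity) and a strict superset (higher sparsity penalty), which is the trade-off the multi-objective formulation is meant to balance.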
Suggested Citation
Bettina Soós & Gonzalo Nápoles & Pieter Spronck & Çiçek Güven, 2025.
"Determining graphlet explanations for machine learning on graphs,"
PLOS Complex Systems, Public Library of Science, vol. 2(8), pages 1-23, August.
Handle:
RePEc:plo:pcsy00:0000067
DOI: 10.1371/journal.pcsy.0000067
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pcsy00:0000067. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.
We have no bibliographic references for this item. You can help add them by using this form.
If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: complexsystem (email available below). General contact details of provider: https://journals.plos.org/complexsystems/ .
Please note that corrections may take a couple of weeks to filter through the various RePEc services.