Authors
- Kashyap Chhatbar
- Adrian Bird
- Guido Sanguinetti
Abstract
Transcriptional regulation involves complex interactions with chromatin-associated proteins, but disentangling these mechanistically remains challenging. Here, we generate deep learning models to predict RNA Pol-II occupancy from chromatin-associated protein profiles in unperturbed conditions. We evaluate the suitability of Shapley Additive Explanations (SHAP), a widely used explainable AI (XAI) approach, to infer functional relevance and analyse regulatory mechanisms across diverse datasets. We aim to validate these insights using data from degron-based perturbation experiments. Remarkably, genes ranked by SHAP importance predict direct targets of perturbation even from unperturbed data, enabling inference without costly experimental interventions. Our analysis reveals that SHAP not only predicts differential gene expression but also captures the magnitude of transcriptional changes. We validate the cooperative roles of SET1A and ZC3H4 at promoters and uncover novel regulatory contributions of ZC3H4 at gene bodies in influencing transcription. Cross-dataset validation uncovers unexpected connections between ZC3H4, a component of the Restrictor complex, and INTS11, part of the Integrator complex, suggesting crosstalk mediated by H3K4me3 and the SET1/COMPASS complex in transcriptional regulation. These findings highlight the power of integrating predictive modelling and experimental validation to unravel complex context-dependent regulatory networks and generate novel biological hypotheses.
Author summary
Genes are turned on or off through complex processes involving many proteins that interact with DNA-wrapped histones and modify their structure. These changes, known as epigenetic modifications, help control how genes are expressed without altering the DNA sequence itself.
In this study, we wanted to understand how different proteins influence gene activity in mouse stem cells by looking at their positions along the genome, particularly whether they act near the gene’s start site (promoter) or within the gene body. To do this, we used machine learning models and a method called SHAP, which helps explain the model’s decisions. By comparing our predictions to data from experiments where specific proteins were removed, we found that some proteins have context-specific effects, acting not only at the promoter but also along the whole gene body. Our approach highlighted both well-known and unexpected regulators of transcription and revealed that gene body signals, which are often overlooked, can play key roles. These findings show how explainable AI can help uncover new insights into how epigenetic features shape gene regulation, and offer a powerful way to generate testable hypotheses from complex genomic data.
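The attribution-and-ranking idea described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract describes deep learning models explained with SHAP, whereas this example swaps in a toy linear predictor and computes exact Shapley values by enumerating feature coalitions against a fixed baseline. The feature names, weights, baseline, gene names and binding values are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names and weights, chosen only for illustration.
FEATURES = ["SET1A_promoter", "ZC3H4_promoter", "ZC3H4_gene_body"]
WEIGHTS = [0.5, 0.25, -0.75]
BASELINE = [1.0, 1.0, 1.0]  # assumed background signal for each feature


def model(x):
    """Toy linear stand-in for a trained predictor of Pol II occupancy."""
    return sum(w * v for w, v in zip(WEIGHTS, x))


def shapley_values(x, baseline=BASELINE):
    """Exact Shapley values; features outside a coalition take the baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                with_i = [x[j] if (j in coalition or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in coalition else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def rank_genes(profiles, feature_index):
    """Rank genes by the absolute attribution of one feature, largest first."""
    scores = {gene: abs(shapley_values(x)[feature_index])
              for gene, x in profiles.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Made-up binding profiles for three placeholder genes.
PROFILES = {
    "geneA": [3.0, 1.0, 1.0],
    "geneB": [1.0, 1.0, 5.0],
    "geneC": [1.0, 2.0, 2.0],
}

if __name__ == "__main__":
    for gene, x in PROFILES.items():
        print(gene, dict(zip(FEATURES, shapley_values(x))))
    print(rank_genes(PROFILES, FEATURES.index("ZC3H4_gene_body")))
```

In this toy setting, the genes at the top of the ranking for a given feature are the ones whose predicted occupancy depends most on that feature, which is the sense in which a ranking from unperturbed data could be compared against targets observed after removing the corresponding protein. Exhaustive enumeration is only feasible for a handful of features; it is used here to keep the sketch dependency-free.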
Suggested Citation
Kashyap Chhatbar & Adrian Bird & Guido Sanguinetti, 2025.
"Modelling transcription with explainable AI uncovers context-specific epigenetic gene regulation at promoters and gene bodies,"
PLOS Genetics, Public Library of Science, vol. 21(10), pages 1-22, October.
Handle: RePEc:plo:pgen00:1011908
DOI: 10.1371/journal.pgen.1011908