Authors
- Paola Cornejo-Páramo
(Victor Chang Cardiac Research Institute; UNSW Sydney, School of Biotechnology and Biomolecular Sciences)
- Xuan Zhang
(Victor Chang Cardiac Research Institute)
- Lithin Louis
(Victor Chang Cardiac Research Institute)
- Zelun Li
(Victor Chang Cardiac Research Institute)
- Yihua Yang
(Victor Chang Cardiac Research Institute)
- Emily S. Wong
(Victor Chang Cardiac Research Institute; UNSW Sydney, School of Biotechnology and Biomolecular Sciences)
Abstract
Deciphering how DNA sequence specifies cell-type-specific regulatory activity is a central challenge in gene regulation. We present Bag-of-Motifs (BOM), a computational framework that represents distal cis-regulatory elements as unordered counts of transcription factor (TF) motifs. This minimalist representation, combined with gradient-boosted trees, enables the accurate prediction of cell-type-specific enhancers across mouse, human, zebrafish, and Arabidopsis datasets. Despite its simplicity, BOM outperforms more complex deep-learning models while using fewer parameters. We validate BOM’s predictions experimentally by constructing synthetic enhancers from the most predictive motifs, demonstrating that these motif sets drive cell-type-specific expression. By providing direct interpretability and broad applicability, BOM reveals a highly predictive sequence code at distal regulatory regions and offers a scalable framework for dissecting cis-regulatory grammar across diverse species and conditions.
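The representation described in the abstract, unordered motif counts per sequence, can be illustrated with a minimal sketch. This is not the authors' implementation: the motif names and consensus strings below are hypothetical placeholders, and exact string matching stands in for the position-weight-matrix scanning that real motif annotation would use.

```python
# Hypothetical motif consensus strings, for illustration only.
# Real motif scanning would use position weight matrices, not exact matches.
MOTIFS = {
    "motif_A": "GATA",
    "motif_B": "CACGTG",
    "motif_C": "TGACTCA",
}


def count_occurrences(sequence, pattern):
    """Count exact matches of pattern in sequence, allowing overlaps."""
    count = 0
    start = sequence.find(pattern)
    while start != -1:
        count += 1
        start = sequence.find(pattern, start + 1)
    return count


def bag_of_motifs(sequence, motifs=MOTIFS):
    """Return motif counts in a fixed (alphabetical) motif order.

    Motif positions and ordering within the sequence are discarded,
    so only the number of occurrences of each motif is kept.
    """
    seq = sequence.upper()
    return [count_occurrences(seq, motifs[name]) for name in sorted(motifs)]


# Two sequences with the same motifs in a different order
# yield the same feature vector.
features_1 = bag_of_motifs("GATAcacgtgGATA")
features_2 = bag_of_motifs("GATAGATACACGTG")
```

Stacking such vectors for many sequences gives a count matrix that could then be passed, with cell-type labels, to any gradient-boosted tree classifier; which library and settings the authors used is not stated in this record.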
Suggested Citation
Paola Cornejo-Páramo & Xuan Zhang & Lithin Louis & Zelun Li & Yihua Yang & Emily S. Wong, 2025.
"Motif-based models accurately predict cell type-specific distal regulatory elements,"
Nature Communications, Nature, vol. 16(1), pages 1-15, December.
Handle:
RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-65362-2
DOI: 10.1038/s41467-025-65362-2