Printed from https://ideas.repec.org/a/nat/natcom/v16y2025i1d10.1038_s41467-025-65066-7.html

OmniReg-GPT: a high-efficiency foundation model for comprehensive genomic sequence understanding

Author

Listed:
  • Aowen Wang

    (Zhejiang University, College of Computer Science and Technology)

  • Jiaqi Li

    (Zhejiang University Medical Center, Liangzhu Laboratory
    Zhejiang University, Centre for Evolutionary and Organismal Biology)

  • Hongyu Dong

    (Zhejiang University, College of Computer Science and Technology
    Westlake University, School of Engineering
    Zhongguancun Academy)

  • Bocheng Xu

    (Zhejiang University)

  • Qingyu Yin

    (Zhejiang University)

  • Yanchao Xu

    (Zhejiang University, College of Computer Science and Technology)

  • Jie Fu

    (Shanghai Artificial Intelligence Laboratory)

  • Junbo Zhao

    (Zhejiang University, College of Computer Science and Technology)

Abstract

The human genome contains a sophisticated array of elements that regulate gene activity and organismal functions. Developing a large-window foundation model that can efficiently process long sequence inputs is essential, yet challenging, for decoding the complex, multi-layered landscape of cis-regulatory elements. Here, we introduce OmniReg-GPT, a generative foundation model designed for low-resource pretraining on long genomic sequences using an optimized attention mechanism. During pretraining, OmniReg-GPT captures the complete distribution of regulatory elements across nucleotide to megabase scales with efficient training speed and memory usage. We demonstrate exceptional performance in downstream regulatory applications spanning the entire spectrum of genomic scales, including identification of various cis-regulatory elements, context-dependent gene expression prediction, single-cell chromatin accessibility analysis, and 3D chromatin contact modeling. As a generative model, OmniReg-GPT also has the potential to generate candidate cell-type-specific enhancers through prompt engineering. Overall, OmniReg-GPT extends the boundaries of foundation models in genomics and provides a valuable pretrained model resource that can be applied extensively in genomic research.

Suggested Citation

  • Aowen Wang & Jiaqi Li & Hongyu Dong & Bocheng Xu & Qingyu Yin & Yanchao Xu & Jie Fu & Junbo Zhao, 2025. "OmniReg-GPT: a high-efficiency foundation model for comprehensive genomic sequence understanding," Nature Communications, Nature, vol. 16(1), pages 1-17, December.
  • Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-65066-7
    DOI: 10.1038/s41467-025-65066-7

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-025-65066-7
    File Function: Abstract
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Samir Rachid Zaim & Mark-Phillip Pebworth & Imran McGrath & Lauren Okada & Morgan Weiss & Julian Reading & Julie L. Czartoski & Troy R. Torgerson & M. Juliana McElrath & Thomas F. Bumol & Peter J. Ske, 2024. "MOCHA’s advanced statistical modeling of scATAC-seq data enables functional genomic inference in large human cohorts," Nature Communications, Nature, vol. 15(1), pages 1-24, December.
    2. Alan Yue Yang Teo & Jordan W. Squair & Gregoire Courtine & Michael A. Skinnider, 2024. "Best practices for differential accessibility analysis in single-cell epigenomics," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    3. Zhen Miao & Jianqiao Wang & Kernyu Park & Da Kuang & Junhyong Kim, 2025. "Depth-corrected multi-factor dissection of chromatin accessibility for scATAC-seq data with PACS," Nature Communications, Nature, vol. 16(1), pages 1-15, December.
    4. Alan Selewa & Kaixuan Luo & Michael Wasney & Linsin Smith & Xiaotong Sun & Chenwei Tang & Heather Eckart & Ivan P. Moskowitz & Anindita Basu & Xin He & Sebastian Pott, 2023. "Single-cell genomics improves the discovery of risk variants and genes of atrial fibrillation," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    5. Leslie A. Smith & James A. Cahill & Ji-Hyun Lee & Kiley Graim, 2025. "Equitable machine learning counteracts ancestral bias in precision medicine," Nature Communications, Nature, vol. 16(1), pages 1-17, December.
    6. Amelia K. Haj & David S. Paul & Sean J. Jurgens & Harish Eswaran & Lu-Chen Weng & Justine Ryu & Alfonso Rodriguez Espada & Sharjeel Chaudhry & Louis M. Feingold & Kristen Burke & Satoshi Koyama & Xin , 2025. "Coagulation factor XII haploinsufficiency is protective against venous thromboembolism in a population-scale multidimensional analysis," Nature Communications, Nature, vol. 16(1), pages 1-12, December.
    7. Tianyu Liu & Tinglin Huang & Lijun Wang & Yingxin Lin & Rex Ying & Hongyu Zhao, 2025. "UNICORN: Towards universal cellular expression prediction with a multi-task learning framework," Nature Communications, Nature, vol. 16(1), pages 1-14, December.
    8. Julie A. I. Thoms & Feng Yan & Henry R. Hampton & Sarah Davidson & Swapna Joshi & Jesslyn Saw & Chowdhury H. Sarowar & Xin Ying Lim & Andrea C. Nunez & Purvi M. Kakadia & Golam Sarower Bhuyan & Xiaohe, 2025. "Clinical response to azacitidine in MDS is associated with distinct DNA methylation changes in HSPCs," Nature Communications, Nature, vol. 16(1), pages 1-20, December.
    9. Kehui Xiang & David P. Bartel, 2025. "PAL-AI reveals genetic determinants that control poly(A)-tail length during oocyte maturation, with relevance to human fertility," Nature Communications, Nature, vol. 16(1), pages 1-18, December.
    10. Parker C. Wilson & Yoshiharu Muto & Haojia Wu & Anil Karihaloo & Sushrut S. Waikar & Benjamin D. Humphreys, 2022. "Multimodal single cell sequencing implicates chromatin accessibility and genetic background in diabetic kidney disease progression," Nature Communications, Nature, vol. 13(1), pages 1-20, December.
    11. Jun Pyo Kim & Minyoung Cho & Chanhee Kim & Hyunwoo Lee & Beomjin Jang & Sang-Hyuk Jung & Yujin Kim & In Gyeong Koh & Seoyeon Kim & Daeun Shin & Eun Hye Lee & Jong-Young Lee & YoungChan Park & Hyemin J, 2025. "Whole-genome sequencing analyses suggest novel genetic factors associated with Alzheimer’s disease and a cumulative effects model for risk liability," Nature Communications, Nature, vol. 16(1), pages 1-18, December.
    12. Sudha Sunil Rajderkar & Kitt Paraiso & Maria Luisa Amaral & Michael Kosicki & Laura E. Cook & Fabrice Darbellay & Cailyn H. Spurrell & Marco Osterwalder & Yiwen Zhu & Han Wu & Sarah Yasmeen Afzal & Ma, 2024. "Dynamic enhancer landscapes in human craniofacial development," Nature Communications, Nature, vol. 15(1), pages 1-18, December.
    13. Fábio J. Ferreira & Mafalda Galhardo & João M. Nogueira & Joana Teixeira & Elsa Logarinho & José Bessa, 2025. "FOXM1 expression reverts aging chromatin profiles through repression of the senescence-associated pioneer factor AP-1," Nature Communications, Nature, vol. 16(1), pages 1-15, December.
    14. Shengen Shawn Hu & Lin Liu & Qi Li & Wenjing Ma & Michael J. Guertin & Clifford A. Meyer & Ke Deng & Tingting Zhang & Chongzhi Zang, 2022. "Intrinsic bias estimation for improved analysis of bulk and single-cell chromatin accessibility profiles using SELMA," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    15. Chenhui Zhao & Xueyan Hu & Xiudong Guan & Xiaojun Fu & Tingting Wang & Mengyuan Li & Xinze Liu & Jiarui Zhao & Di Wu & Fan Zhang & Jiaying Fu & Jiang Li & Tieqiang Zhang & Xiaochun Jiang & Changxiang , 2025. "Molecular landscape, subtypes, and therapeutic vulnerabilities of central nervous system solitary fibrous tumors," Nature Communications, Nature, vol. 16(1), pages 1-16, December.
    16. Jack W. J. Welland & Henry G. Barrow & Phillip J. Stansfeld & Janet E. Deane, 2025. "Conformational dynamics and membrane insertion mechanism of B4GALNT1 in ganglioside synthesis," Nature Communications, Nature, vol. 16(1), pages 1-14, December.
    17. Marie C. Sadler & Alexander Apostolov & Caterina Cevallos & Chiara Auwerx & Diogo M. Ribeiro & Russ B. Altman & Zoltán Kutalik, 2025. "Leveraging large-scale biobank EHRs to enhance pharmacogenetics of cardiometabolic disease medications," Nature Communications, Nature, vol. 16(1), pages 1-18, December.
    18. Songming Tang & Xuejian Cui & Rongxiang Wang & Sijie Li & Siyu Li & Xin Huang & Shengquan Chen, 2024. "scCASE: accurate and interpretable enhancement for single-cell chromatin accessibility sequencing data," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    19. Jonathan E. Shoag & Amoolya Srinivasa & Caitlin A. Loh & Mei Hong Liu & Emilie Lassen & Shana Melanaphy & Benjamin M. Costa & Marta Grońska-Pęski & Nisrine T. Jabara & Shany Picciotto & Una Choi & Any, 2025. "Direct measurement of the male germline mutation rate in individuals using sequential sperm samples," Nature Communications, Nature, vol. 16(1), pages 1-13, December.
    20. Michael Kosicki & Dianne Laboy Cintrón & Pia Keukeleire & Max Schubach & Nicholas F. Page & Ilias Georgakopoulos-Soares & Jennifer A. Akiyama & Ingrid Plajzer-Frick & Catherine S. Novak & Momoe Kato &, 2025. "Massively parallel reporter assays and mouse transgenic assays provide correlated and complementary information about neuronal enhancer activity," Nature Communications, Nature, vol. 16(1), pages 1-13, December.
