
DNALONGBENCH: a benchmark suite for long-range DNA prediction tasks

Authors

  • Wenduo Cheng (Carnegie Mellon University)
  • Zhenqiao Song (Carnegie Mellon University)
  • Yang Zhang (Carnegie Mellon University)
  • Shike Wang (Carnegie Mellon University)
  • Danqing Wang (Carnegie Mellon University)
  • Muyu Yang (Carnegie Mellon University)
  • Lei Li (Carnegie Mellon University)
  • Jian Ma (Carnegie Mellon University)

Abstract

Modeling long-range DNA dependencies is crucial for understanding genome structure and function across diverse biological contexts. However, effectively capturing these dependencies, which may span millions of base pairs in tasks such as three-dimensional (3D) chromatin folding prediction, remains a major challenge. A comprehensive benchmark suite for evaluating tasks that rely on long-range dependencies is notably absent. To address this gap, we introduce DNALONGBENCH, a benchmark dataset covering five key genomics tasks with long-range dependencies up to 1 million base pairs: enhancer-target gene interaction, expression quantitative trait loci, 3D genome organization, regulatory sequence activity, and transcription initiation signals. We assess DNALONGBENCH using five methods: a task-specific expert model, a convolutional neural network (CNN)-based model, and three fine-tuned DNA foundation models – HyenaDNA, Caduceus-Ph, and Caduceus-PS. We envision DNALONGBENCH as a standardized resource to enable comprehensive comparisons and rigorous evaluations of emerging DNA sequence-based deep learning models that account for long-range dependencies.

Suggested Citation

  • Wenduo Cheng & Zhenqiao Song & Yang Zhang & Shike Wang & Danqing Wang & Muyu Yang & Lei Li & Jian Ma, 2025. "DNALONGBENCH: a benchmark suite for long-range DNA prediction tasks," Nature Communications, Nature, vol. 16(1), pages 1-9, December.
  • Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-65077-4
    DOI: 10.1038/s41467-025-65077-4

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-025-65077-4
    File Function: Abstract
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Natalie DeForest & Yuqi Wang & Zhiyi Zhu & Jacqueline S. Dron & Ryan Koesterer & Pradeep Natarajan & Jason Flannick & Tiffany Amariuta & Gina M. Peloso & Amit R. Majithia, 2024. "Genome-wide discovery and integrative genomic characterization of insulin resistance loci using serum triglycerides to HDL-cholesterol ratio as a proxy," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    2. Zhenhua Zhang & Wenchao Li & Qiuyao Zhan & Michelle Aillaud & Javier Botey-Bataller & Martijn Zoodsma & Rob Horst & Leo A. B. Joosten & Christoph Bock & Leon N. Schulte & Cheng-Jian Xu & Mihai G. Nete, 2025. "Unveiling genetic signatures of immune response in immune-related diseases through single-cell eQTL analysis across diverse conditions," Nature Communications, Nature, vol. 16(1), pages 1-16, December.
    3. Sylvia Hartmann & Summaira Yasmeen & Benjamin M. Jacobs & Spiros Denaxas & Munir Pirmohamed & Eric R. Gamazon & Mark J. Caulfield & Harry Hemingway & Maik Pietzner & Claudia Langenberg, 2023. "ADRA2A and IRX1 are putative risk genes for Raynaud’s phenomenon," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    4. Aowen Wang & Jiaqi Li & Hongyu Dong & Bocheng Xu & Qingyu Yin & Yanchao Xu & Jie Fu & Junbo Zhao, 2025. "Omnireg-gpt: a high-efficiency foundation model for comprehensive genomic sequence understanding," Nature Communications, Nature, vol. 16(1), pages 1-17, December.
    5. Emma Hazelwood & Daffodil M. Canson & Benedita Deslandes & Xuemin Wang & Pik Fang Kho & Danny Legge & Andrei-Emil Constantinescu & Matthew A. Lee & D. Timothy Bishop & Andrew T. Chan & Stephen B. Grub, 2025. "Multi-tissue expression and splicing data prioritise anatomical subsite- and sex-specific colorectal cancer susceptibility genes," Nature Communications, Nature, vol. 16(1), pages 1-13, December.
    6. Eugene Lin & Yu-Ting Yan & Mu-Hong Chen & Albert C. Yang & Po-Hsiu Kuo & Shih-Jen Tsai, 2025. "Gene clusters linked to insulin resistance identified in a genome-wide study of the Taiwan Biobank population," Nature Communications, Nature, vol. 16(1), pages 1-14, December.
    7. Isabelle Austin-Zimmerman & Daniel F. Levey & Olga Giannakopoulou & Joseph D. Deak & Marco Galimberti & Keyrun Adhikari & Hang Zhou & Spiros Denaxas & Haritz Irizar & Karoline Kuchenbaecker & Andrew M, 2023. "Genome-wide association studies and cross-population meta-analyses investigating short and long sleep duration," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    8. Aino Salminen & Kati Hyvärinen & Jarmo Ritari & Jussi M. Leppilahti & Ulla Palotie & Ville Vuollo & Oleg Kambur & Kadri Reis & Anu Reigo & Priit Palta & Markus Perola & Juha Sinisalo & Aki S. Havulinn, 2025. "Genome-wide association study of pulpal and apical diseases," Nature Communications, Nature, vol. 16(1), pages 1-14, December.
    9. Nathan LaPierre & Kodi Taraszka & Helen Huang & Rosemary He & Farhad Hormozdiari & Eleazar Eskin, 2021. "Identifying causal variants by fine mapping across multiple studies," PLOS Genetics, Public Library of Science, vol. 17(9), pages 1-19, September.
    10. Bayram Cevdet Akdeniz & Oleksandr Frei & Alexey Shadrin & Dmitry Vetrov & Dmitry Kropotov & Eivind Hovig & Ole A Andreassen & Anders M Dale, 2024. "Finemap-MiXeR: A variational Bayesian approach for genetic finemapping," PLOS Genetics, Public Library of Science, vol. 20(8), pages 1-21, August.
    11. Yunfeng Huang & Dora Bodnar & Chia-Yen Chen & Gabriela Sanchez-Andrade & Mark Sanderson & Jun Shi & Katherine G. Meilleur & Matthew E. Hurles & Sebastian S. Gerety & Ellen A. Tsai & Heiko Runz, 2023. "Rare genetic variants impact muscle strength," Nature Communications, Nature, vol. 14(1), pages 1-8, December.
    12. Huiying He & Yue Leng & Xinglan Cao & Yiwang Zhu & Xiaoxia Li & Qiaoling Yuan & Bin Zhang & Wenchuang He & Hua Wei & Xiangpei Liu & Qiang Xu & Mingliang Guo & Hong Zhang & Longbo Yang & Yang Lv & Xian, 2024. "The pan-tandem repeat map highlights multiallelic variants underlying gene expression and agronomic traits in rice," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    13. Mingxuan Cai & Zhiwei Wang & Jiashun Xiao & Xianghong Hu & Gang Chen & Can Yang, 2023. "XMAP: Cross-population fine-mapping by leveraging genetic diversity and accounting for confounding bias," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    14. Linda Ottensmann & Rubina Tabassum & Sanni E. Ruotsalainen & Mathias J. Gerl & Christian Klose & Elisabeth Widén & Kai Simons & Samuli Ripatti & Matti Pirinen, 2023. "Genome-wide association analysis of plasma lipidome identifies 495 genetic associations," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    15. N. Hernández & J. Soenksen & P. Newcombe & M. Sandhu & I. Barroso & C. Wallace & J. L. Asimit, 2021. "The flashfm approach for fine-mapping multiple quantitative traits," Nature Communications, Nature, vol. 12(1), pages 1-14, December.
    16. Priya Gupta & Marco Galimberti & Yue Liu & Sarah Beck & Aliza Wingo & Thomas Wingo & Keyrun Adhikari & Henry R. Kranzler & Murray B. Stein & Joel Gelernter & Daniel F. Levey, 2024. "A genome-wide investigation into the underlying genetic architecture of personality traits and overlap with psychopathology," Nature Human Behaviour, Nature, vol. 8(11), pages 2235-2249, November.
    17. Tzu-Ting Chen & Jaeyoung Kim & Max Lam & Yi-Fang Chuang & Yen-Ling Chiu & Shu-Chin Lin & Sang-Hyuk Jung & Beomsu Kim & Soyeon Kim & Chamlee Cho & Injeong Shim & Sanghyeon Park & Yeeun Ahn & Aysu Okbay, 2024. "Shared genetic architectures of educational attainment in East Asian and European populations," Nature Human Behaviour, Nature, vol. 8(3), pages 562-575, March.
    18. Hongru Li & Jingyi Zhao & Jinglan Dai & Dongfang You & Yang Zhao & David C. Christiani & Feng Chen & Sipeng Shen, 2025. "Multi-ancestry sequencing-based genome-wide association study of C-reactive protein in 513,273 genomes," Nature Communications, Nature, vol. 16(1), pages 1-11, December.
    19. Alan E. Murphy & William Beardall & Marek Rei & Mike Phuycharoen & Nathan G. Skene, 2024. "Predicting cell type-specific epigenomic profiles accounting for distal genetic effects," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    20. Jiapei Yuan & Yang Tong & Le Wang & Xiaoxiao Yang & Xiaochuan Liu & Meng Shu & Zekun Li & Wen Jin & Chenchen Guan & Yuting Wang & Qiang Zhang & Yang Yang, 2024. "A compendium of genetic variations associated with promoter usage across 49 human tissues," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
