Random access and semantic search in DNA data storage enabled by Cas9 and machine-guided design

Authors

  • Carina Imburgia

    (Paul G. Allen School of Computer Science and Engineering)

  • Lee Organick

    (Paul G. Allen School of Computer Science and Engineering)

  • Karen Zhang

    (Paul G. Allen School of Computer Science and Engineering)

  • Nicolas Cardozo

    (Paul G. Allen School of Computer Science and Engineering)

  • Jeff McBride

    (Paul G. Allen School of Computer Science and Engineering)

  • Callista Bee

    (Paul G. Allen School of Computer Science and Engineering)

  • Delaney Wilde

    (Paul G. Allen School of Computer Science and Engineering)

  • Gwendolin Roote

    (Paul G. Allen School of Computer Science and Engineering)

  • Sophia Jorgensen

    (Paul G. Allen School of Computer Science and Engineering)

  • David Ward

    (Paul G. Allen School of Computer Science and Engineering)

  • Charlie Anderson

    (Paul G. Allen School of Computer Science and Engineering)

  • Karin Strauss

    (Microsoft Research)

  • Luis Ceze

    (Paul G. Allen School of Computer Science and Engineering)

  • Jeff Nivala

    (Paul G. Allen School of Computer Science and Engineering; Molecular Engineering and Sciences Institute)

Abstract

DNA is a promising medium for digital data storage due to its exceptional data density and longevity. Practical DNA-based storage systems require selective data retrieval to minimize decoding time and costs. In this work, we introduce CRISPR-Cas9 as a user-friendly tool for multiplexed, low-latency molecular data extraction. We first present a one-pot, multiplexed random access method in which specific data files are selectively cleaved using a CRISPR-Cas9 addressing system and then sequenced via nanopore technology. This approach was validated on a pool of 1.6 million DNA sequences, comprising 25 unique data files. We then developed a molecular similarity-search approach combining machine learning with Cas9-based retrieval. Using a deep neural network, we mapped a database of 1.74 million images into a reduced-dimensional embedding, encoding each embedding as a Cas9 target sequence. These target sequences act as molecular addresses, capturing clusters of semantically related images. By leveraging Cas9’s off-target cleavage activity, query sequences cleave both exact and closely related targets, enabling high-fidelity retrieval of molecular addresses corresponding to in silico image clusters similar to the query. These approaches move towards addressing key challenges in molecular data retrieval by offering simplified, rapid isothermal protocols and new DNA data access capabilities.
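The similarity-search idea in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the authors' method: `embedding_to_target` is a hypothetical quantizer standing in for the paper's learned, machine-guided encoding, and a plain Hamming-distance threshold (`max_mismatches`) is only a crude proxy for Cas9 off-target cleavage, whose real mismatch tolerance is not modeled here.

```python
BASES = "ACGT"


def embedding_to_target(embedding):
    """Toy encoding (hypothetical): map each coordinate in [0, 1) to one base."""
    seq = []
    for x in embedding:
        if not 0.0 <= x < 1.0:
            raise ValueError("coordinates must lie in [0, 1)")
        seq.append(BASES[int(x * 4)])  # four equal-width bins -> A, C, G, T
    return "".join(seq)


def hamming(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def retrieve(query_seq, address_pool, max_mismatches=2):
    """Return names of addresses within max_mismatches of the query.

    The threshold is an assumed stand-in for off-target cleavage tolerance.
    """
    return sorted(
        name
        for name, target in address_pool.items()
        if hamming(query_seq, target) <= max_mismatches
    )


# Illustrative pool of 20-nt molecular addresses (made-up sequences).
pool = {
    "cluster_a": "ACGTACGTACGTACGTACGT",
    "cluster_b": "ACGTACGTACGTACGTACGA",  # one mismatch from cluster_a
    "cluster_c": "TTTTGGGGCCCCAAAATTTT",  # unrelated address
}
hits = retrieve("ACGTACGTACGTACGTACGT", pool)
print(hits)
```

In this sketch a query retrieves its exact address plus any address a few mismatches away, which mirrors the abstract's description of query sequences cleaving both exact and closely related targets.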

Suggested Citation

  • Carina Imburgia & Lee Organick & Karen Zhang & Nicolas Cardozo & Jeff McBride & Callista Bee & Delaney Wilde & Gwendolin Roote & Sophia Jorgensen & David Ward & Charlie Anderson & Karin Strauss & Luis Ceze & Jeff Nivala, 2025. "Random access and semantic search in DNA data storage enabled by Cas9 and machine-guided design," Nature Communications, Nature, vol. 16(1), pages 1-11, December.
  • Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-61264-5
    DOI: 10.1038/s41467-025-61264-5

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-025-61264-5
    File Function: Abstract
    Download Restriction: no

