Printed from https://ideas.repec.org/a/gam/jmathe/v10y2022i5p845-d765759.html

Bio-Constrained Codes with Neural Network for Density-Based DNA Data Storage

Authors
  • Abdur Rasool

    (Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
    Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen 518055, China)

  • Qiang Qu

    (Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China)

  • Yang Wang

    (Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China)

  • Qingshan Jiang

    (Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China)

Abstract

DNA has emerged as a cutting-edge medium for digital information storage, offering the extremely high density and long-term durability needed to accommodate the data explosion. However, DNA strands are prone to errors during hybridization. In addition, the cost of DNA synthesis and sequencing depends on the number of nucleotides. An efficient model that stores a large amount of data in a small number of nucleotides is therefore essential, and it must control hybridization errors among base pairs. In this paper, a novel computational model is presented for designing large DNA libraries of oligonucleotides. It integrates a neural network (NN) with combinatorial biological constraints, including constant GC-content, Hamming distance, and reverse-complement constraints. We develop a simple and efficient NN implementation to produce optimal DNA codes, which opens the door to applying neural networks to DNA-based data storage. Further, the combinatorial bio-constraints are introduced to improve the lower bounds on code size and to prevent errors in the DNA codes. Our goal is to compute large DNA codes with short sequences that avoid non-specific hybridization errors by satisfying the bio-constraints. The proposed model significantly improves the DNA library by explicitly constructing larger codes than previously published ones.
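The three combinatorial constraints named in the abstract can be made concrete with a short sketch. It assumes the definitions commonly used in the DNA-code literature: every codeword has length n and exactly `gc` G/C bases; distinct codewords differ in at least d positions (Hamming distance); and every codeword is at Hamming distance at least d from the reverse complement of every codeword, including its own. The greedy lexicographic search below is only a simple baseline for building such a code, not the neural-network model proposed in the paper, and all function names and parameters are illustrative.

```python
from __future__ import annotations

from itertools import product

# Watson-Crick complement of each base
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def gc_content(word: str) -> int:
    """Number of G and C bases in the word."""
    return sum(base in "GC" for base in word)


def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length words differ."""
    return sum(a != b for a, b in zip(x, y))


def reverse_complement(word: str) -> str:
    """Reverse the word and replace each base by its complement."""
    return "".join(COMPLEMENT[base] for base in reversed(word))


def compatible(word: str, code: list[str], d: int) -> bool:
    """Check the Hamming-distance and reverse-complement constraints of
    `word` against itself and every word already in `code`."""
    rc_word = reverse_complement(word)
    # Self reverse-complement constraint: H(x, rc(x)) >= d
    if hamming(word, rc_word) < d:
        return False
    for other in code:
        # Hamming-distance constraint between distinct codewords
        if hamming(word, other) < d:
            return False
        # Reversal and complementation both preserve Hamming distance,
        # so H(x, rc(y)) == H(rc(x), y): one check covers both directions.
        if hamming(rc_word, other) < d:
            return False
    return True


def greedy_dna_code(n: int, d: int, gc: int) -> list[str]:
    """Hypothetical baseline: scan all 4**n words in lexicographic order and
    keep each word with GC-content `gc` that is compatible with the code so far."""
    code: list[str] = []
    for letters in product("ACGT", repeat=n):
        word = "".join(letters)
        if gc_content(word) == gc and compatible(word, code, d):
            code.append(word)
    return code


if __name__ == "__main__":
    code = greedy_dna_code(n=8, d=4, gc=4)
    print(len(code), code[:5])
```

For fixed n, d and GC-content, the number of codewords is the figure of merit the abstract refers to when it reports larger codes than previously published ones; an exhaustive greedy scan like this one grows as 4^n and is only practical for short words.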

Suggested Citation

  • Abdur Rasool & Qiang Qu & Yang Wang & Qingshan Jiang, 2022. "Bio-Constrained Codes with Neural Network for Density-Based DNA Data Storage," Mathematics, MDPI, vol. 10(5), pages 1-21, March.
  • Handle: RePEc:gam:jmathe:v:10:y:2022:i:5:p:845-:d:765759

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/10/5/845/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/10/5/845/
    Download Restriction: no

    References listed on IDEAS

    1. Qian Liu & Li Fang & Guoliang Yu & Depeng Wang & Chuan-Le Xiao & Kai Wang, 2019. "Detection of DNA base modifications by deep recurrent neural network on Oxford Nanopore sequencing data," Nature Communications, Nature, vol. 10(1), pages 1-11, December.
    2. Nick Goldman & Paul Bertone & Siyuan Chen & Christophe Dessimoz & Emily M. LeProust & Botond Sipos & Ewan Birney, 2013. "Towards practical, high-capacity, low-maintenance information storage in synthesized DNA," Nature, Nature, vol. 494(7435), pages 77-80, February.
    3. Xin Jin & Rencan Nie & Dongming Zhou & Shaowen Yao & Yanyan Chen & Jiefu Yu & Quan Wang, 2016. "A novel DNA sequence similarity calculation based on simplified pulse-coupled neural network and Huffman coding," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 461(C), pages 325-338.
    4. Jinny X. Zhang & Boyan Yordanov & Alexander Gaunt & Michael X. Wang & Peng Dai & Yuan-Jyue Chen & Kerou Zhang & John Z. Fang & Neil Dalchau & Jiaming Li & Andrew Phillips & David Yu Zhang, 2021. "A deep learning model for predicting next-generation sequencing depth from DNA sequence," Nature Communications, Nature, vol. 12(1), pages 1-10, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Eseosa Halima Ighile & Hiroaki Shirakawa & Hiroki Tanikawa, 2022. "Application of GIS and Machine Learning to Predict Flood Areas in Nigeria," Sustainability, MDPI, vol. 14(9), pages 1-33, April.
    2. Mian Umair Ahsan & Anagha Gouru & Joe Chan & Wanding Zhou & Kai Wang, 2024. "A signal processing and deep learning framework for methylation detection using Oxford Nanopore sequencing," Nature Communications, Nature, vol. 15(1), pages 1-21, December.
    3. Chao Pan & S. Kasra Tabatabaei & S. M. Hossein Tabatabaei Yazdi & Alvaro G. Hernandez & Charles M. Schroeder & Olgica Milenkovic, 2022. "Rewritable two-dimensional DNA-based data storage with machine learning reconstruction," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    4. Cheng Kai Lim & Jing Wui Yeoh & Aurelius Andrew Kunartama & Wen Shan Yew & Chueh Loo Poh, 2023. "A biological camera that captures and stores images directly into DNA," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    5. Gaolian Xu & Hao Yang & Jiani Qiu & Julien Reboud & Linqing Zhen & Wei Ren & Hong Xu & Jonathan M. Cooper & Hongchen Gu, 2023. "Sequence terminus dependent PCR for site-specific mutation and modification detection," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    6. Ashkan Shekaari & Mahmoud Jafari, 2019. "Statistical mechanical modeling of a DNA nanobiostructure at the base-pair level," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 518(C), pages 80-88.
    7. Afsaneh Sadremomtaz & Robert F. Glass & Jorge Eduardo Guerrero & Dennis R. LaJeunesse & Eric A. Josephs & Reza Zadegan, 2023. "Digital data storage on DNA tape using CRISPR base editors," Nature Communications, Nature, vol. 14(1), pages 1-10, December.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.