
Direct high-throughput deconvolution of non-canonical bases via nanopore sequencing and bootstrapped learning

Author

Listed:
  • Mauricio Perez

    (Genome)

  • Michiko Kimoto

    (Nanos
    #02-08 The Cavendish)

  • Priscilla Rajakumar

    (Genome)

  • Chayaporn Suphavilai

    (Genome)

  • Rafael Peres da Silva

    (Genome)

  • Hui Pen Tan

    (Nanos
    #02-08 The Cavendish)

  • Nicholas Ting Xun Ong

    (Genome)

  • Hannah Nicholas

    (Genome)

  • Ichiro Hirao

    (Nanos
    #02-08 The Cavendish)

  • Wei Leong Chew

    (Genome
    National University of Singapore)

  • Niranjan Nagarajan

    (Genome
    National University of Singapore)

Abstract

The discovery of non-canonical bases (NCBs) and development of synthetic xeno-nucleic acids (XNAs) has spawned interest in many applications in viral genomics, synthetic biology and DNA storage. However, inability to do high-throughput sequencing of NCBs has been a significant limitation. We demonstrate that XNAs with NCBs can be robustly sequenced on a MinION system (>2.3×10^6 reads/flowcell) to obtain significantly distinct signals from controls (median fold-change >6×). To enable AI-model training, we synthesized and sequenced a complex pool of 1,024 NCB-containing oligonucleotides with varied 6-mer contexts and high purity (>90%). Bootstrapped models assisted in data preparation, and data augmentation with spliced reads provided high context diversity, enabling learning of generalizable models to decipher NCB-containing sequences with high accuracy (>80%) and specificity (99%). These results highlight the versatility of nanopore sequencing for interrogating unusual nucleic acids, and the potential to transform the study of genetic material beyond those that use canonical bases.
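
The abstract does not say how the 1,024 oligonucleotides were designed. One reading that matches the number, offered here only as an assumption, is a 6-mer window with a single NCB at a fixed position and a canonical base at each of the five remaining positions, which gives 4^5 = 1,024 contexts. The sketch below enumerates such contexts; the placeholder symbol `X`, the function name `ncb_contexts` and the chosen NCB position are all hypothetical and not taken from the paper.

```python
from itertools import product

CANONICAL = "ACGT"
NCB = "X"  # hypothetical placeholder symbol for a non-canonical base


def ncb_contexts(k=6, ncb_pos=2):
    """Enumerate k-mers with one NCB at ncb_pos and canonical bases elsewhere.

    Assumption for illustration only: the NCB sits at a single fixed
    position inside the k-mer window.
    """
    contexts = []
    for flank in product(CANONICAL, repeat=k - 1):
        kmer = list(flank)
        kmer.insert(ncb_pos, NCB)  # place the NCB among the canonical bases
        contexts.append("".join(kmer))
    return contexts


contexts = ncb_contexts()
print(len(contexts))  # 4**5 = 1024
```

Under this assumption the count does not depend on where in the window the NCB is placed, only on the number of canonical positions around it.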

Suggested Citation

  • Mauricio Perez & Michiko Kimoto & Priscilla Rajakumar & Chayaporn Suphavilai & Rafael Peres da Silva & Hui Pen Tan & Nicholas Ting Xun Ong & Hannah Nicholas & Ichiro Hirao & Wei Leong Chew & Niranjan Nagarajan, 2025. "Direct high-throughput deconvolution of non-canonical bases via nanopore sequencing and bootstrapped learning," Nature Communications, Nature, vol. 16(1), pages 1-12, December.
  • Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-62347-z
    DOI: 10.1038/s41467-025-62347-z

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-025-62347-z
    File Function: Abstract
    Download Restriction: no


    References listed on IDEAS

    1. Denis A. Malyshev & Kirandeep Dhami & Thomas Lavergne & Tingjian Chen & Nan Dai & Jeremy M. Foster & Ivan R. Corrêa & Floyd E. Romesberg, 2014. "A semi-synthetic organism with an expanded genetic alphabet," Nature, Nature, vol. 509(7500), pages 385-388, May.
    2. Nick Goldman & Paul Bertone & Siyuan Chen & Christophe Dessimoz & Emily M. LeProust & Botond Sipos & Ewan Birney, 2013. "Towards practical, high-capacity, low-maintenance information storage in synthesized DNA," Nature, Nature, vol. 494(7435), pages 77-80, February.
    3. Yorke Zhang & Jerod L. Ptacin & Emil C. Fischer & Hans R. Aerni & Carolina E. Caffaro & Kristine San Jose & Aaron W. Feldman & Court R. Turner & Floyd E. Romesberg, 2017. "A semi-synthetic organism that stores and retrieves increased genetic information," Nature, Nature, vol. 551(7682), pages 644-647, November.
    4. Giorgino, Toni, 2009. "Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 31(i07).
    5. Hinako Kawabe & Christopher A. Thomas & Shuichi Hoshika & Myong-Jung Kim & Myong-Sang Kim & Logan Miessner & Nicholas Kaplan & Jonathan M. Craig & Jens H. Gundlach & Andrew H. Laszlo & Steven A. Benner, 2023. "Enzymatic synthesis and nanopore sequencing of 12-letter supernumerary DNA," Nature Communications, Nature, vol. 14(1), pages 1-16, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Christopher A. Thomas & Henry Brinkerhoff & Jonathan M. Craig & Shuichi Hoshika & Desislava Mihaylova & Akira M. Pfeffer & Michaela C. Franzi & Sarah J. Abell & Jessica D. Carrasco & Jens H. Gundlach, 2025. "Sequencing a DNA analog composed of artificial bases," Nature Communications, Nature, vol. 16(1), pages 1-11, December.
    2. Bang Wang & Kevin M. Bradley & Myong-Jung Kim & Roberto Laos & Cen Chen & Dietlind L. Gerloff & Luran Manfio & Zunyi Yang & Steven A. Benner, 2024. "Enzyme-assisted high throughput sequencing of an expanded genetic alphabet at single base resolution," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    3. Amato, Umberto & Antoniadis, Anestis & De Feis, Italia & Goude, Yannig & Lagache, Audrey, 2021. "Forecasting high resolution electricity demand data with additive models including smooth and jagged components," International Journal of Forecasting, Elsevier, vol. 37(1), pages 171-185.
    4. Mastroeni, Loretta & Mazzoccoli, Alessandro & Quaresima, Greta & Vellucci, Pierluigi, 2021. "Decoupling and recoupling in the crude oil price benchmarks: An investigation of similarity patterns," Energy Economics, Elsevier, vol. 94(C).
    5. Christoph J. Borner & Ingo Hoffmann & Jonas Krettek & Lars M. Kurzinger & Tim Schmitz, 2021. "Bitcoin: Like a Satellite or Always Hardcore? A Core-Satellite Identification in the Cryptocurrency Market," Papers 2105.12336, arXiv.org.
    6. Qingyuan Fan & Xuyang Zhao & Junyao Li & Ronghui Liu & Ming Liu & Qishun Feng & Yanping Long & Yang Fu & Jixian Zhai & Qing Pan & Yi Li, 2025. "De novo non-canonical nanopore basecalling enables private communication using heavily-modified DNA data at single-molecule level," Nature Communications, Nature, vol. 16(1), pages 1-11, December.
    7. Vatsa, Puneet & Miljkovic, Tatjana & Miljkovic, Dragan, 2024. "Price discovery redux—Analyzing energy spot and futures prices using a dynamic programming approach," Energy Economics, Elsevier, vol. 140(C).
    8. Hanjo Odendaal & Monique Reid & Johann F. Kirsten, 2020. "Media‐Based Sentiment Indices as an Alternative Measure of Consumer Confidence," South African Journal of Economics, Economic Society of South Africa, vol. 88(4), pages 409-434, December.
    9. Krzysztof Dmytrow & Beata Bieszk-Stolorz, 2021. "Comparison of changes in the labour markets of post-communist countries with other EU member states," Equilibrium. Quarterly Journal of Economics and Economic Policy, Institute of Economic Research, vol. 16(4), pages 741-764, December.
    10. Chao Pan & S. Kasra Tabatabaei & S. M. Hossein Tabatabaei Yazdi & Alvaro G. Hernandez & Charles M. Schroeder & Olgica Milenkovic, 2022. "Rewritable two-dimensional DNA-based data storage with machine learning reconstruction," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    11. Yangchen Di & Mingyue Lu & Min Chen & Zhangjian Chen & Zaiyang Ma & Manzhu Yu, 2022. "A quantitative method for the similarity assessment of typhoon tracks," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 112(1), pages 587-602, May.
    12. De Gregorio, Alessandro & Maria Iacus, Stefano, 2010. "Clustering of discretely observed diffusion processes," Computational Statistics & Data Analysis, Elsevier, vol. 54(2), pages 598-606, February.
    13. Szczepocki Piotr, 2019. "Clustering Companies Listed on the Warsaw Stock Exchange According to Time-Varying Beta," Econometrics. Advances in Applied Data Analysis, Sciendo, vol. 23(2), pages 63-79, June.
    14. Klose, Jens & Barry, Mamdou-Lamine & Bruns, Brenton Joey & Kandemir, Sinem & Smirnov, Victor & Tillmann, Peter, 2025. "The Emotions of Monetary Policy," VfS Annual Conference 2025 (Cologne): Revival of Industrial Policy 325365, Verein für Socialpolitik / German Economic Association.
    15. Corey Ducharme & Bruno Agard & Martin Trépanier, 2024. "Improving demand forecasting for customers with missing downstream data in intermittent demand supply chains with supervised multivariate clustering," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 43(5), pages 1661-1681, August.
    16. Sokhna Dieng & Pierre Michel & Abdoulaye Guindo & Kankoe Sallah & El-Hadj Ba & Badara Cissé & Maria Patrizia Carrieri & Cheikh Sokhna & Paul Milligan & Jean Gaudart, 2020. "Application of Functional Data Analysis to Identify Patterns of Malaria Incidence, to Guide Targeted Control Strategies," IJERPH, MDPI, vol. 17(11), pages 1-23, June.
    17. Krzysztof Dmytrów & Joanna Landmesser & Beata Bieszk-Stolorz, 2021. "The Connections between COVID-19 and the Energy Commodities Prices: Evidence through the Dynamic Time Warping Method," Energies, MDPI, vol. 14(13), pages 1-23, July.
    18. Jingwei Hong & Abdur Rasool & Shuo Wang & Djemel Ziou & Qingshan Jiang, 2024. "VSD: A Novel Method for Video Segmentation and Storage in DNA Using RS Code," Mathematics, MDPI, vol. 12(8), pages 1-21, April.
    19. Abdur Rasool & Qiang Qu & Yang Wang & Qingshan Jiang, 2022. "Bio-Constrained Codes with Neural Network for Density-Based DNA Data Storage," Mathematics, MDPI, vol. 10(5), pages 1-21, March.
    20. Lei Pei & Michele Garfinkel & Markus Schmidt, 2022. "Bottlenecks and opportunities for synthetic biology biosafety standards," Nature Communications, Nature, vol. 13(1), pages 1-4, December.

