Authors Listed:
- Jun Mencius
(Fudan University)
- Wenjun Chen
(Xinhua Hospital affiliated to Shanghai Jiao Tong University School of Medicine)
- Youqi Zheng
(Fudan University)
- Tingyi An
(Fudan University)
- Yongguo Yu
(Xinhua Hospital affiliated to Shanghai Jiao Tong University School of Medicine)
- Kun Sun
(Xinhua Hospital affiliated to Shanghai Jiao Tong University School of Medicine)
- Huijuan Feng
(Fudan University)
- Zhixing Feng
(Xinhua Hospital affiliated to Shanghai Jiao Tong University School of Medicine)
Abstract
With the wide adoption of nanopore sequencing, data accumulation has surged, yielding over 700,000 public datasets. While these data hold immense potential for advancing genomic research, their utility is compromised because the flowcell type and basecaller configuration are missing from about 85% of the datasets and their associated publications. Many analysis algorithms require these parameters, and misspecifying them can cause significant drops in performance. To address this issue, we present LongBow, a tool that infers the flowcell type and basecaller configuration directly from the base quality value patterns in FASTQ files. LongBow was tested on 66 in-house basecalled FAST5/POD5 datasets and 1989 public FASTQ datasets, achieving accuracies of 95.33% and 91.45%, respectively. We demonstrate its utility by reanalyzing nanopore sequencing data from the COVID-19 Genomics UK (COG-UK) project. The results show that LongBow is essential for reproducing the reported genomic variants. They also show that a LongBow-based analysis pipeline uncovers substantially more functionally important variants while improving the accuracy of lineage assignment. Overall, LongBow is poised to play a critical role in maximizing the utility of public nanopore sequencing data while significantly enhancing the reproducibility of related research.
Suggested Citation
Jun Mencius & Wenjun Chen & Youqi Zheng & Tingyi An & Yongguo Yu & Kun Sun & Huijuan Feng & Zhixing Feng, 2025.
"Restoring flowcell type and basecaller configuration from FASTQ files of nanopore sequencing data,"
Nature Communications, Nature, vol. 16(1), pages 1-19, December.
Handle:
RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-59378-x
DOI: 10.1038/s41467-025-59378-x