Authors listed:
- Yongtao Ye
- Marcus H Shum
- Joseph L Tsui
- Guangchuang Yu
- David K Smith
- Huachen Zhu
- Joseph T Wu
- Yi Guan
- Tommy Tsan-Yuk Lam
Abstract
Massive sequencing of SARS-CoV-2 genomes has urged novel methods that employ existing phylogenies to add new samples efficiently instead of de novo inference. ‘TIPars’ was developed for such challenge integrating parsimony analysis with pre-computed ancestral sequences. It took about 21 seconds to insert 100 SARS-CoV-2 genomes into a 100k-taxa reference tree using 1.4 gigabytes. Benchmarking on four datasets, TIPars achieved the highest accuracy for phylogenies of moderately similar sequences. For highly similar and divergent scenarios, fully parsimony-based and likelihood-based phylogenetic placement methods performed the best respectively while TIPars was the second best. TIPars accomplished efficient and accurate expansion of phylogenies of both similar and divergent sequences, which would have broad biological applications beyond SARS-CoV-2. TIPars is accessible from https://tipars.hku.hk/ and source codes are available at https://github.com/id-bioinfo/TIPars.
Author summary
Since the beginning of the COVID-19 pandemic, over 15 million SARS-CoV-2 genome sequences have been made publicly available. As sequencing cost decreases, the rate of genome sequencing is expected to greatly increase in the future and will generate numerous sequences where conventional de novo phylogenetic inference may no longer be suitable. TIPars allows rapid and memory-efficient expansion of phylogeny at high accuracy. This enables real-time monitoring of pathogen transmission during a pandemic using large-scale global phylogenetic analysis such as the ever-increasing SARS-CoV-2 genome sequences. We believe that the development of next-generation phylogenetic methods is imperative for analysing enormous, fast-growing genome sequence datasets to gain critical evolutionary insights that, as evident in this pandemic, have real-world applications.
Suggested Citation
Yongtao Ye & Marcus H Shum & Joseph L Tsui & Guangchuang Yu & David K Smith & Huachen Zhu & Joseph T Wu & Yi Guan & Tommy Tsan-Yuk Lam, 2024.
"Robust expansion of phylogeny for fast-growing genome sequence data,"
PLOS Computational Biology, Public Library of Science, vol. 20(2), pages 1-22, February.
Handle:
RePEc:plo:pcbi00:1011871
DOI: 10.1371/journal.pcbi.1011871