Printed from https://ideas.repec.org/a/nat/natcom/v11y2020i1d10.1038_s41467-020-19777-8.html

Scalable multiple whole-genome alignment and locally collinear block construction with SibeliaZ

Author

Listed:
  • Ilia Minkin

    (Department of Computer Science and Engineering, The Pennsylvania State University)

  • Paul Medvedev

    (Department of Computer Science and Engineering, The Pennsylvania State University;
    Department of Biochemistry and Molecular Biology, The Pennsylvania State University;
    Center for Computational Biology and Bioinformatics, The Pennsylvania State University)

Abstract

Multiple whole-genome alignment is a challenging problem in bioinformatics. Despite many successes, current methods are not able to keep up with the growing number, length, and complexity of assembled genomes, especially when computational resources are limited. Approaches based on compacted de Bruijn graphs to identify and extend anchors into locally collinear blocks have potential for scalability, but current methods do not scale to mammalian genomes. We present an algorithm, SibeliaZ-LCB, for identifying collinear blocks in closely related genomes based on analysis of the de Bruijn graph. We further incorporate this into a multiple whole-genome alignment pipeline called SibeliaZ. SibeliaZ shows run-time improvements over other methods while maintaining accuracy. On sixteen recently-assembled strains of mice, SibeliaZ runs in under 16 hours on a single machine, while other tools did not run to completion for eight mice within a week. SibeliaZ makes a significant step towards improving scalability of multiple whole-genome alignment and collinear block reconstruction algorithms on a single machine.

Suggested Citation

  • Ilia Minkin & Paul Medvedev, 2020. "Scalable multiple whole-genome alignment and locally collinear block construction with SibeliaZ," Nature Communications, Nature, vol. 11(1), pages 1-11, December.
  • Handle: RePEc:nat:natcom:v:11:y:2020:i:1:d:10.1038_s41467-020-19777-8
    DOI: 10.1038/s41467-020-19777-8
    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-020-19777-8
    File Function: Abstract
    Download Restriction: no


    Citations

    Cited by:

    1. A. Talenti & J. Powell & J. D. Hemmink & E. A. J. Cook & D. Wragg & S. Jayaraman & E. Paxton & C. Ezeasor & E. T. Obishakin & E. R. Agusi & A. Tijjani & W. Amanyire & D. Muhanguzi & K. Marshall & A. F, 2022. "A cattle graph genome incorporating global breed diversity," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
