
elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling

Authors
  • Charlotte Herzeel
  • Pascal Costanza
  • Dries Decap
  • Jan Fostier
  • Joke Reumers

Abstract

elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, and reordering contigs, while producing identical results. What sets elPrep apart is its software architecture, which executes a preparation pipeline in a single pass through the data, no matter how many preparation steps the pipeline contains. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly reduce execution time. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1 hour 40 minutes, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23 GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can remove several hundred hours of computing time, and thus substantially reduce analysis time and cost.
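The single-pass idea in the abstract can be illustrated with a minimal Python sketch. This is not elPrep's implementation or API: the record layout (a plain dict), the step functions `remove_unmapped` and `add_read_group`, and the helpers `compose`, `single_pass`, and `multi_pass` are all hypothetical names chosen for illustration. The only detail taken from the SAM format is that flag bit 0x4 marks an unmapped read. The sketch contrasts running each step as its own pass over the data with fusing all per-record steps into one function that is applied in a single pass.

```python
from typing import Callable, Iterable, Iterator, List, Optional

# A record is modeled as a plain dict (assumption); real SAM/BAM records
# carry many more fields.
Record = dict
# A step returns the (possibly modified) record, or None to drop it.
Step = Callable[[Record], Optional[Record]]


def remove_unmapped(rec: Record) -> Optional[Record]:
    """Drop reads whose SAM flag has the 0x4 (unmapped) bit set."""
    return None if rec["flag"] & 0x4 else rec


def add_read_group(rec: Record) -> Optional[Record]:
    """Attach a hypothetical read-group tag to the record."""
    return {**rec, "rg": "group1"}


def compose(steps: List[Step]) -> Step:
    """Fuse several per-record steps into one function."""
    def fused(rec: Record) -> Optional[Record]:
        for step in steps:
            rec = step(rec)
            if rec is None:  # a filter dropped the record; stop early
                return None
        return rec
    return fused


def single_pass(records: Iterable[Record], steps: List[Step]) -> Iterator[Record]:
    """Traverse the data once, applying the fused steps to each record."""
    fused = compose(steps)
    for rec in records:
        out = fused(rec)
        if out is not None:
            yield out


def multi_pass(records: Iterable[Record], steps: List[Step]) -> List[Record]:
    """Baseline: each step makes its own full pass over the data."""
    current = list(records)
    for step in steps:
        current = [r for r in map(step, current) if r is not None]
    return current
```

Both functions produce the same records for per-record steps like these; the difference is that `single_pass` traverses the input once regardless of how many steps are listed. Steps that need a global view of the data, such as sorting or duplicate marking, do not fit this per-record model and are not covered by the sketch.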

Suggested Citation

  • Charlotte Herzeel & Pascal Costanza & Dries Decap & Jan Fostier & Joke Reumers, 2015. "elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling," PLOS ONE, Public Library of Science, vol. 10(7), pages 1-16, July.
  • Handle: RePEc:plo:pone00:0132868
    DOI: 10.1371/journal.pone.0132868

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0132868
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0132868&type=printable
    Download Restriction: no


    Citations


    Cited by:

    1. Charlotte Herzeel & Pascal Costanza & Dries Decap & Jan Fostier & Wilfried Verachtert, 2019. "elPrep 4: A multithreaded framework for sequence analysis," PLOS ONE, Public Library of Science, vol. 14(2), pages 1-16, February.
