
Managing genomic variant calling workflows with Swift/T

Authors

  • Azza E Ahmed
  • Jacob Heldenbrand
  • Yan Asmann
  • Faisal M Fadlelmola
  • Daniel S Katz
  • Katherine Kendig
  • Matthew C Kendzior
  • Tiffany Li
  • Yingxue Ren
  • Elliott Rodriguez
  • Matthew R Weber
  • Justin M Wozniak
  • Jennie Zermeno
  • Liudmila S Mainzer

Abstract

Bioinformatics research is frequently performed using complex workflows with multiple steps, fans, merges, and conditionals. This complexity makes a workflow difficult to manage on a computer cluster, especially when running in parallel on large batches of data: hundreds or thousands of samples at a time. Scientific workflow management systems could help. Many are now being proposed, but is there yet a “best” workflow management system for bioinformatics? Such a system would need to satisfy numerous, sometimes conflicting requirements: from ease of use, to seamless deployment at peta- and exa-scale, and portability to the cloud. We evaluated Swift/T as a candidate for such a role by implementing a primary genomic variant calling workflow in the Swift/T language, focusing on the workflow management, performance, and scalability issues that arise from production-grade big data genomic analyses. In the process we introduced novel features into the language, which are now part of its open repository. Additionally, we formalized a set of design criteria for quality, robust, maintainable workflows that must function at scale in a production setting, such as a large genomic sequencing facility or a major hospital system. The use of Swift/T conveys two key advantages. (1) It operates transparently in multiple cluster scheduling environments (PBS Torque, SLURM, Cray aprun environment, etc.), so a single workflow is trivially portable across numerous clusters. (2) The leaf functions of Swift/T permit developers to easily swap executables in and out of the workflow, which makes the workflow easy to maintain and makes it easy to request resources optimal for each stage of the pipeline. While Swift/T’s data-level parallelism eliminates the need to code parallel analysis of multiple samples, it does make debugging more difficult, as is common for implicitly parallel code. Nonetheless, the language gives users a powerful and portable way to scale up analyses in many computing architectures.
The code for our implementation of a variant calling workflow using Swift/T can be found on GitHub at https://github.com/ncsa/Swift-T-Variant-Calling, with full documentation provided at http://swift-t-variant-calling.readthedocs.io/en/latest/.
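
The two advantages named in the abstract, swappable leaf executables with per-stage resources and parallel processing of many samples, can be illustrated with a small sketch. This is not Swift/T code and is not taken from the authors' repository; it is a Python analogy under stated assumptions. The stage names, the executable names (`aligner_a`, `caller_x`), and the core counts are hypothetical placeholders, and the sample-level parallelism is written explicitly with a thread pool, whereas the abstract describes Swift/T providing it implicitly.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stage table: executable names and core counts are placeholders,
# not tools or settings from the paper.
STAGES = {
    "align": {"exe": "aligner_a", "cores": 8},
    "call_variants": {"exe": "caller_x", "cores": 4},
}


def leaf(stage, sample, stages=STAGES):
    """Describe the command one leaf step would run for one sample.

    Swapping a tool means editing the stage table, not the workflow logic.
    """
    cfg = stages[stage]
    return {"sample": sample, "cores": cfg["cores"], "argv": [cfg["exe"], sample]}


def run_sample(sample, stages=STAGES):
    """Walk one sample through every stage, in table order."""
    return [leaf(stage, sample, stages) for stage in stages]


def run_batch(samples, stages=STAGES, workers=4):
    """Process samples concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: run_sample(s, stages), samples))
```

In this sketch, replacing the aligner is a one-line change such as `dict(STAGES, align={"exe": "aligner_b", "cores": 16})` passed to `run_batch`; the functions only build command descriptions and do not launch any process.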

Suggested Citation

  • Azza E Ahmed & Jacob Heldenbrand & Yan Asmann & Faisal M Fadlelmola & Daniel S Katz & Katherine Kendig & Matthew C Kendzior & Tiffany Li & Yingxue Ren & Elliott Rodriguez & Matthew R Weber & Justin M , 2019. "Managing genomic variant calling workflows with Swift/T," PLOS ONE, Public Library of Science, vol. 14(7), pages 1-20, July.
  • Handle: RePEc:plo:pone00:0211608
    DOI: 10.1371/journal.pone.0211608

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0211608
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0211608&type=printable
    Download Restriction: no



