
Wrangling distributed computing for high-throughput environmental science: An introduction to HTCondor

Authors

  • Richard A Erickson
  • Michael N Fienen
  • S Grace McCalla
  • Emily L Weiser
  • Melvin L Bower
  • Jonathan M Knudson
  • Greg Thain

Abstract

Biologists and environmental scientists now routinely solve computational problems that were unimaginable a generation ago. Examples include processing geospatial data, analyzing -omics data, and running large-scale simulations. Conventional desktop computing cannot handle these tasks when they are large, and high-performance computing is not always available, nor is it the most appropriate solution for every computationally intense problem. High-throughput computing (HTC) is one method for handling computationally intense research. In contrast to high-performance computing, which uses a single "supercomputer," HTC can distribute tasks over many computers (e.g., idle desktop computers, dedicated servers, or cloud-based resources). HTC facilities exist at many academic and government institutes and are relatively easy to create from commodity hardware. Additionally, consortia such as Open Science Grid facilitate HTC, and commercial entities sell cloud-based solutions for researchers who lack HTC at their institution. We provide an introduction to HTC for biologists and environmental scientists. Our examples from biology and the environmental sciences use HTCondor, an open source HTC system.

Author summary: Computational biology often requires processing large amounts of data, running many simulations, or performing other computationally intensive tasks. In this hybrid primer/tutorial, we describe how high-throughput computing (HTC) can be used to solve these problems. First, we present an overview of high-throughput computing. Second, we describe how to break jobs down so that they can run with HTC. Third, we describe how to use HTCondor software as a method for HTC. Fourth, we describe how HTCondor may be applied to other situations and introduce a series of online tutorials.
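
To make the idea of breaking a job down for HTC concrete, here is a minimal Python sketch. It is not taken from the article. It assumes a hypothetical simulation with two parameters (`growth_rate` and `carrying_capacity`) and a hypothetical wrapper script named `run_simulation.sh`. The sketch splits the parameter sweep into independent tasks, then writes a small HTCondor submit description that queues one job per task. Each job receives its task number through the `$(Process)` macro.

```python
from itertools import product


def build_tasks(growth_rates, carrying_capacities):
    """Return one independent task (a dict of parameters) per combination."""
    return [
        {"task_id": i, "growth_rate": r, "carrying_capacity": k}
        for i, (r, k) in enumerate(product(growth_rates, carrying_capacities))
    ]


def submit_description(n_tasks, executable="run_simulation.sh"):
    """Build the text of an HTCondor submit description for n_tasks jobs.

    The executable and the output file names are hypothetical placeholders.
    """
    lines = [
        f"executable = {executable}",
        "arguments  = $(Process)",       # task number, 0 .. n_tasks - 1
        "output     = sim_$(Process).out",
        "error      = sim_$(Process).err",
        "log        = sim.log",
        f"queue {n_tasks}",
    ]
    return "\n".join(lines) + "\n"


# Example sweep: 3 growth rates x 2 carrying capacities = 6 independent tasks.
tasks = build_tasks([0.1, 0.5, 1.0], [100, 1000])
print(submit_description(len(tasks)))
```

Because no task depends on the result of another, the tasks can run in any order and on any available machine, which is the property that makes a workload suitable for HTC. In practice the wrapper script would look up the parameters for its task number, for example from a file written from `tasks`.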

Suggested Citation

  • Richard A Erickson & Michael N Fienen & S Grace McCalla & Emily L Weiser & Melvin L Bower & Jonathan M Knudson & Greg Thain, 2018. "Wrangling distributed computing for high-throughput environmental science: An introduction to HTCondor," PLOS Computational Biology, Public Library of Science, vol. 14(10), pages 1-8, October.
  • Handle: RePEc:plo:pcbi00:1006468
    DOI: 10.1371/journal.pcbi.1006468

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006468
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1006468&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Cameron Mura & Mike Chalupa & Abigail M Newbury & Jack Chalupa & Philip E Bourne, 2020. "Ten simple rules for starting research in your late teens," PLOS Computational Biology, Public Library of Science, vol. 16(11), pages 1-11, November.
