BatchJobs and BatchExperiments: Abstraction Mechanisms for Using R in Batch Environments

Author

Listed:
  • Bischl, Bernd
  • Lang, Michel
  • Mersmann, Olaf
  • Rahnenführer, Jörg
  • Weihs, Claus

Abstract

Empirical analysis of statistical algorithms often demands time-consuming experiments. We present two R packages which greatly simplify working in batch computing environments. The package BatchJobs implements the basic objects and procedures to control any batch cluster from within R. It is structured around cluster versions of the well-known higher order functions Map, Reduce and Filter from functional programming. Computations are performed asynchronously and all job states are persistently stored in a database, which can be queried at any point in time. The second package, BatchExperiments, is tailored for the still very general scenario of analyzing arbitrary algorithms on problem instances. It extends package BatchJobs by letting the user define an array of jobs of the kind “apply algorithm A to problem instance P and store results”. It is possible to associate statistical designs with parameters of problems and algorithms and therefore to systematically study their influence on the results. The packages’ main features are: (a) Convenient usage: All relevant batch system operations are either handled internally or mapped to simple R functions. (b) Portability: Both packages use a clear and well-defined interface to the batch system which makes them applicable in most high-performance computing environments. (c) Reproducibility: Every computational part has an associated seed to ensure reproducibility even when the underlying batch system changes. (d) Abstraction and good software design: The code layers for algorithms, experiment definitions and execution are cleanly separated and enable the writing of readable and maintainable code.
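The ideas in the abstract (a Map step that defines jobs, job states kept in a queryable database, a per-job seed, a Filter step over job states, and a Reduce step over results) can be illustrated with a small sketch. This is not the BatchJobs or BatchExperiments API, which is written in R: the class `Registry`, its methods, the `experiment` function, and the seeding rule `base_seed + job id` are all hypothetical names and choices made for illustration, and jobs are run in-process instead of being submitted to a real batch system.

```python
import json
import random
import sqlite3
from functools import reduce
from itertools import product


class Registry:
    """Toy job registry (hypothetical): definitions, states and results live in SQLite."""

    def __init__(self, path=":memory:", base_seed=1):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS jobs (id INTEGER PRIMARY KEY, "
            "args TEXT, seed INTEGER, status TEXT, result TEXT)"
        )
        self.base_seed = base_seed
        self.fun = None

    def batch_map(self, fun, arg_dicts):
        """Map step: define one job per argument set; nothing is executed yet."""
        self.fun = fun
        for args in arg_dicts:
            cur = self.db.execute(
                "INSERT INTO jobs (args, status) VALUES (?, 'defined')",
                (json.dumps(args),),
            )
            # Assumed rule: the seed depends only on the job id, not on where it runs.
            self.db.execute(
                "UPDATE jobs SET seed = ? WHERE id = ?",
                (self.base_seed + cur.lastrowid, cur.lastrowid),
            )
        self.db.commit()

    def run_pending(self):
        """Stand-in for submission to a scheduler: run unfinished jobs in-process."""
        rows = self.db.execute(
            "SELECT id, args, seed FROM jobs WHERE status != 'done'"
        ).fetchall()
        for job_id, args, seed in rows:
            rng = random.Random(seed)  # each job gets its own seeded generator
            try:
                result = self.fun(rng=rng, **json.loads(args))
                update = ("done", json.dumps(result), job_id)
            except Exception as exc:
                update = ("error", json.dumps(str(exc)), job_id)
            self.db.execute(
                "UPDATE jobs SET status = ?, result = ? WHERE id = ?", update
            )
        self.db.commit()

    def find(self, status):
        """Filter step: ids of all jobs currently in the given state."""
        rows = self.db.execute(
            "SELECT id FROM jobs WHERE status = ? ORDER BY id", (status,)
        )
        return [row[0] for row in rows]

    def reduce_results(self, fun, init):
        """Reduce step: fold the stored results of finished jobs, in id order."""
        rows = self.db.execute(
            "SELECT result FROM jobs WHERE status = 'done' ORDER BY id"
        )
        return reduce(fun, (json.loads(row[0]) for row in rows), init)


def experiment(rng, n, algorithm):
    """Apply an 'algorithm' (an estimator) to a 'problem instance' (a random sample)."""
    data = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
    if algorithm == "mean":
        return sum(data) / n
    if algorithm == "median":
        return data[n // 2]
    raise ValueError(f"unknown algorithm: {algorithm}")


# Full factorial design over problem and algorithm parameters;
# "mode" is deliberately unsupported so that some jobs end in the error state.
design = [
    {"n": n, "algorithm": a}
    for n, a in product([10, 100], ["mean", "median", "mode"])
]

reg = Registry(base_seed=42)
reg.batch_map(experiment, design)
reg.run_pending()
done_ids = reg.find("done")
error_ids = reg.find("error")
total = reg.reduce_results(lambda acc, x: acc + x, 0.0)
```

Because each job's random generator is seeded from the job id alone, rebuilding the registry with the same base seed and design reproduces the same results regardless of the order or location in which jobs run, which is the property the abstract refers to under reproducibility. Failed jobs stay queryable through their stored state and can be retried by calling `run_pending()` again after fixing the cause.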

Suggested Citation

  • Bischl, Bernd & Lang, Michel & Mersmann, Olaf & Rahnenführer, Jörg & Weihs, Claus, 2015. "BatchJobs and BatchExperiments: Abstraction Mechanisms for Using R in Batch Environments," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 64(i11).
  • Handle: RePEc:jss:jstsof:v:064:i11
    DOI: http://hdl.handle.net/10.18637/jss.v064.i11

    Download full text from publisher

    File URL: https://www.jstatsoft.org/index.php/jss/article/view/v064i11/v64i11.pdf
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v064i11/BatchJobs_1.6.tar.gz
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v064i11/BatchExperiments_1.4.1.tar.gz
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v064i11/v64i11.R
    Download Restriction: no


    References listed on IDEAS

    1. Kane, Michael & Emerson, John W. & Weston, Stephen, 2013. "Scalable Strategies for Computing with Massive Data," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 55(i14).
    2. Hoffmann, Thomas J., 2011. "Passing in Command Line Arguments and Parallel Cluster/Multicore Batching in R with batch," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 39(c01).

    Citations

    Cited by:

    1. Daniel Horn & Aydın Demircioğlu & Bernd Bischl & Tobias Glasmachers & Claus Weihs, 2018. "A comparative study on large scale kernelized support vector machines," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(4), pages 867-883, December.
    2. Wright, Marvin N. & Ziegler, Andreas, 2017. "ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 77(i01).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Aaron T L Lun & Hervé Pagès & Mike L Smith, 2018. "beachmat: A Bioconductor C++ API for accessing high-throughput biological data from a variety of R matrix types," PLOS Computational Biology, Public Library of Science, vol. 14(5), pages 1-15, May.
    2. Fulya Gokalp Yavuz & Barret Schloerke, 2020. "Parallel computing in linear mixed models," Computational Statistics, Springer, vol. 35(3), pages 1273-1289, September.
    3. Hofert, Marius & Mächler, Martin, 2016. "Parallel and Other Simulations in R Made Easy: An End-to-End Study," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 69(i04).
    4. Lining Yu & Wolfgang Karl Hardle & Lukas Borke & Thijs Benschop, 2020. "An AI approach to measuring financial risk," Papers 2009.13222, arXiv.org.
    5. Junyang Qian & Yosuke Tanigawa & Wenfei Du & Matthew Aguirre & Chris Chang & Robert Tibshirani & Manuel A Rivas & Trevor Hastie, 2020. "A fast and scalable framework for large-scale and ultrahigh-dimensional sparse regression with application to the UK Biobank," PLOS Genetics, Public Library of Science, vol. 16(10), pages 1-30, October.
    6. Bandara, Kanchana & Varpe, Øystein & Ji, Rubao & Eiane, Ketil, 2018. "A high-resolution modeling study on diel and seasonal vertical migrations of high-latitude copepods," Ecological Modelling, Elsevier, vol. 368(C), pages 357-376.
    7. Fang, Jianglin, 2023. "A split-and-conquer variable selection approach for high-dimensional general semiparametric models with massive data," Journal of Multivariate Analysis, Elsevier, vol. 194(C).
    8. Marin FOTACHE, 2016. "Data Processing Languages for Business Intelligence. SQL vs. R," Informatica Economica, Academy of Economic Studies - Bucharest, Romania, vol. 20(1), pages 48-61.
    9. Lining Yu & Wolfgang Karl Härdle & Lukas Borke & Thijs Benschop, 2017. "FRM: a Financial Risk Meter based on penalizing tail events occurrence," SFB 649 Discussion Papers SFB649DP2017-003, Sonderforschungsbereich 649, Humboldt University, Berlin, Germany.
