
Fitting generalized linear models when the data exceeds available memory

Author

Listed:
  • Joseph Canner

    (Johns Hopkins University School of Medicine, Department of Surgery)

  • Krisztian Sebestyen

    (Johns Hopkins University School of Medicine, Department of Surgery)

Abstract

Although random access memory (RAM) capacity has increased and RAM prices have fallen in the years since Stata was first released, data sets have grown as well and can still exceed available RAM. This is particularly true for those who use Stata on a personal laptop or desktop rather than an enterprise server. Accordingly, there is a need for statistical tools that can read small chunks of data from disk, perform calculations on those chunks, accumulate intermediate results, and produce final results identical to those obtained by performing the entire calculation in memory. The generalized linear model (GLM) is one of the most widely used statistical methods, and mathematical methods for updating the QR or Cholesky decomposition with small chunks of data have been available for many years. Thomas Lumley’s R command bigglm uses Fortran functions published by Alan J. Miller in 1992 and freely available as Algorithm AS 274. We have developed bigglm for Stata using the same functions and have expanded the library of available family and link functions. The current version can read Stata datasets as well as import data from an ODBC source. In the presentation we will discuss the limitations of the current approach and suggest areas for improvement.
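
The chunk-and-accumulate idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Stata implementation and does not use the AS 274 routines: as a stand-in for Miller's row-by-row updates, it folds each chunk into the triangular factor by re-factorizing the stacked matrix with `numpy.linalg.qr`. It covers only the binomial family with a logit link, and the names `update_qr`, `fit_logistic_chunked`, and `chunks` are hypothetical. Because the working weights depend on the current coefficients, every iteration makes a fresh pass over all chunks.

```python
import numpy as np


def update_qr(R, X, z, w):
    """Fold one weighted chunk into the triangular factor of [X | z].

    R is the factor accumulated so far (None before the first chunk).
    Assumption: re-factorizing the stacked matrix stands in for the
    incremental updates of AS 274; it is not that algorithm.
    """
    sw = np.sqrt(w)[:, None]
    block = np.hstack([X, z[:, None]]) * sw
    stacked = block if R is None else np.vstack([R, block])
    # mode="r" returns only the upper-triangular factor, which is all we keep.
    return np.linalg.qr(stacked, mode="r")


def fit_logistic_chunked(chunks, n_coef, n_iter=25, tol=1e-10):
    """Fit a logistic regression by IRLS without holding all rows at once.

    chunks is a zero-argument callable returning a fresh iterator of
    (X, y) pairs; it is called once per iteration.
    """
    beta = np.zeros(n_coef)
    for _ in range(n_iter):
        R = None
        for X, y in chunks():
            eta = X @ beta
            mu = 1.0 / (1.0 + np.exp(-eta))
            w = mu * (1.0 - mu)        # working weights (assumed nonzero)
            z = eta + (y - mu) / w     # working response
            R = update_qr(R, X, z, w)
        # The weighted least-squares solution solves R11 @ beta = r12,
        # where R = [[R11, r12], [0, r22]].
        new_beta = np.linalg.solve(R[:n_coef, :n_coef], R[:n_coef, n_coef])
        if np.max(np.abs(new_beta - beta)) < tol:
            return new_beta
        beta = new_beta
    return beta
```

Only the current chunk and a triangular factor with one more row and column than there are coefficients are held in memory at any time, so the memory footprint is set by the chunk size rather than by the number of observations. In practice `chunks` would read successive blocks from disk or a database; fitted values that saturate at 0 or 1 would make a weight zero and need separate handling.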

Suggested Citation

  • Joseph Canner & Krisztian Sebestyen, 2019. "Fitting generalized linear models when the data exceeds available memory," 2019 Stata Conference 49, Stata Users Group.
  • Handle: RePEc:boc:scon19:49

    Download full text from publisher

    File URL: http://fmwww.bc.edu/repec/scon2019/chicago19_Canner.pdf
    Download Restriction: no