Printed from https://ideas.repec.org/a/gam/jijerp/v22y2025i4p464-d1617354.html

Wrangling Real-World Data: Optimizing Clinical Research Through Factor Selection with LASSO Regression

Authors
  • Kerry A. Howard

    (Department of Public Health Sciences, Clemson University, Clemson, SC 29634, USA
    Center for Public Health Modeling and Response, Clemson University, Clemson, SC 29634, USA
    Co-first authors.)

  • Wes Anderson

    (Critical Path Institute, Tucson, AZ 85718, USA
    Co-first authors.)

  • Jagdeep T. Podichetty

    (Critical Path Institute, Tucson, AZ 85718, USA)

  • Ruth Gould

    (Centers for Disease Control and Prevention, Atlanta, GA 30329, USA)

  • Danielle Boyce

    (Tufts University School of Medicine, Tufts University, Medford, MA 02155, USA)

  • Pam Dasher

    (Critical Path Institute, Tucson, AZ 85718, USA)

  • Laura Evans

    (Division of Pulmonary, Critical Care and Sleep Medicine, University of Washington, Seattle, WA 98195, USA)

  • Cindy Kao

    (IR Research & Academic Systems, University of Texas Southwestern, Dallas, TX 75390, USA)

  • Vishakha K. Kumar

    (Society of Critical Care Medicine, Mount Prospect, IL 60056, USA)

  • Chase Hamilton

    (Society of Critical Care Medicine, Mount Prospect, IL 60056, USA)

  • Ewy Mathé

    (National Institutes of Health National Center for Advancing Translational Sciences (NCATS), Rockville, MD 20850, USA)

  • Philippe J. Guerin

    (Infectious Diseases Data Observatory (IDDO), Nuffield Department of Medicine, University of Oxford, Oxford, Oxfordshire OX3 LF, UK)

  • Kenneth Dodd

    (Department of Emergency Medicine, Advocate Christ Medical Center, Oak Lawn, IL 60453, USA)

  • Aneesh K. Mehta

    (Department of Medicine, Emory University, Atlanta, GA 30322, USA)

  • Chris Ortman

    (Institute for Translational and Clinical Science, University of Iowa, Iowa City, IA 52242, USA)

  • Namrata Patil

    (Brigham and Women’s Hospital, Boston, MA 02115, USA)

  • Jeselyn Rhodes

    (Department of Medicine, Emory University, Atlanta, GA 30322, USA)

  • Matthew Robinson

    (Division of Infectious Diseases, Johns Hopkins University, Baltimore, MD 21205, USA)

  • Heather Stone

    (US Food and Drug Administration, Silver Spring, MD 20993, USA)

  • Smith F. Heavner

    (Department of Public Health Sciences, Clemson University, Clemson, SC 29634, USA
    Critical Path Institute, Tucson, AZ 85718, USA
    Department of Biomedical Sciences, University of South Carolina School of Medicine Greenville, Greenville, SC 29605, USA)

Abstract

Data-driven approaches to clinical research are necessary for understanding and effectively treating infectious diseases. However, challenges such as issues with data validity, lack of collaboration, and difficult-to-treat infectious diseases (e.g., those that are rare or newly emerging) hinder research. Prioritizing innovative methods to facilitate the continued use of data generated during routine clinical care for research, but in an organized, accelerated, and shared manner, is crucial. This study investigates the potential of CURE ID, an open-source platform to accelerate drug-repurposing research for difficult-to-treat diseases, with COVID-19 as a use case. Data from eight US health systems were analyzed using least absolute shrinkage and selection operator (LASSO) regression to identify key predictors of 28-day all-cause mortality in COVID-19 patients, including demographics, comorbidities, treatments, and laboratory measurements captured during the first two days of hospitalization. Key findings indicate that age, laboratory measures, severity of illness indicators, oxygen support administration, and comorbidities significantly influenced all-cause 28-day mortality, aligning with previous studies. This work underscores the value of collaborative repositories like CURE ID in providing robust datasets for prognostic research and the importance of factor selection in identifying key variables, helping to streamline future research and drug-repurposing efforts.
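The factor-selection idea named in the abstract can be illustrated with a small sketch. This listing does not show the authors' code, so the following is not their pipeline: it is a toy L1-penalized (LASSO) logistic regression for a binary outcome, fitted by proximal gradient descent in plain Python. The data, the names `predictor_a` and `predictor_b`, the penalty value, and the helper functions are all hypothetical and chosen only to show how the L1 penalty sets the coefficient of an uninformative factor to exactly zero.

```python
import math

def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within the threshold becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_l1_logistic(rows, outcomes, penalty, step=0.5, iterations=500):
    """Proximal gradient descent for logistic loss plus penalty * sum(|weights|).

    The intercept is left unpenalized, as is conventional for LASSO.
    """
    n = len(rows)
    weights = [0.0] * len(rows[0])
    intercept = 0.0
    for _ in range(iterations):
        # Residuals (predicted probability minus observed outcome) at the current fit.
        residuals = [
            sigmoid(intercept + sum(w * x for w, x in zip(weights, row))) - y
            for row, y in zip(rows, outcomes)
        ]
        intercept -= step * sum(residuals) / n
        for j in range(len(weights)):
            gradient = sum(r * row[j] for r, row in zip(residuals, rows)) / n
            # Gradient step on the smooth loss, then the L1 proximal (shrinkage) step.
            weights[j] = soft_threshold(weights[j] - step * gradient, step * penalty)
    return intercept, weights

def selected_factors(names, weights):
    """Factors whose coefficient survived the penalty (nonzero)."""
    return [name for name, w in zip(names, weights) if w != 0.0]

# Hypothetical toy data: predictor_a tracks the outcome, predictor_b is unrelated to it.
NAMES = ["predictor_a", "predictor_b"]
ROWS = [[-2, 1], [-1, -1], [-1, 1], [-2, -1], [1, 1], [2, -1], [2, 1], [1, -1]]
OUTCOMES = [0, 0, 0, 0, 1, 1, 1, 1]

_, toy_weights = fit_l1_logistic(ROWS, OUTCOMES, penalty=0.1)
print(selected_factors(NAMES, toy_weights))
```

In this toy setting only `predictor_a` is retained, and raising the penalty far enough removes every factor. In practice the penalty strength is usually chosen by cross-validation, and predictors are standardized first so that the penalty treats them comparably.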

Suggested Citation

  • Kerry A. Howard & Wes Anderson & Jagdeep T. Podichetty & Ruth Gould & Danielle Boyce & Pam Dasher & Laura Evans & Cindy Kao & Vishakha K. Kumar & Chase Hamilton & Ewy Mathé & Philippe J. Guerin & Kenn, 2025. "Wrangling Real-World Data: Optimizing Clinical Research Through Factor Selection with LASSO Regression," IJERPH, MDPI, vol. 22(4), pages 1-13, March.
  • Handle: RePEc:gam:jijerp:v:22:y:2025:i:4:p:464-:d:1617354

    Download full text from publisher

    File URL: https://www.mdpi.com/1660-4601/22/4/464/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1660-4601/22/4/464/
    Download Restriction: no
