Author
Listed:
- Xueda Shen
- Aaron Rumack
- Bryan Wilder
- Ryan J Tibshirani
Abstract
We propose, implement, and evaluate a method for nowcasting the daily number of new COVID-19 hospitalizations, at the level of individual US states, based on de-identified, aggregated medical insurance claims data. Our analysis proceeds under a hypothetical scenario in which, during the Delta wave, states only report data on the first day of each month, and on this day, report COVID-19 hospitalization counts for each day in the previous month. In this hypothetical scenario (just as in reality), medical insurance claims data continues to be available daily. At the beginning of each month, we train a regression model, using all data available thus far, to predict hospitalization counts from medical insurance claims. We then use this model to nowcast the (unseen) values of COVID-19 hospitalization counts from medical insurance claims, at each day in the following month. Our analysis uses properly-versioned data, which would have been available in real-time at the time predictions are produced (instead of using data that would have only been available in hindsight). In spite of the difficulties inherent to real-time estimation (e.g., latency and backfill) and the complex dynamics behind COVID-19 hospitalizations themselves, we find altogether that medical insurance claims can be an accurate predictor of hospitalization reports, with mean absolute errors typically around 0.4 hospitalizations per 100,000 people, i.e., proportion of variance explained around 75%. Perhaps more importantly, we find that nowcasts made using medical insurance claims are able to qualitatively capture the dynamics (upswings and downswings) of hospitalization waves, which are key features that inform public health decision-making.
Author summary: Daily reported COVID-19 hospitalizations have been a topline indicator throughout the pandemic in the US, and an up-to-date awareness of the load on the hospital system has been a key factor in public health decision-making. However, collecting and maintaining this indicator comes at a high price, as frequent reporting of hospitalizations is itself burdensome on the health system. This is especially true at times when it is needed the most: staff shortages in hospitals tended to coincide with surges in hospitalizations, making reporting even more challenging in peak times. In this paper, we explore the use of auxiliary indicators based on de-identified, aggregated medical insurance claims data, and build relatively simple statistical models to track hospitalizations using these auxiliary indicators, so that reporting may be (hypothetically) reduced in frequency, thereby reducing the burden on hospitals. We find that these models can track reported hospitalizations closely, even in critical times (surges), suggesting that our approach and similar ones may be good candidates for reducing reporting frequency in future public health crises.
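Illustrative sketch (not the authors' code): the monthly retrain-and-nowcast scheme described in the abstract might look roughly like the following. Everything here is an assumption made for illustration: the function names (fit_line, monthly_nowcasts, mean_absolute_error), the single-predictor least-squares model, and the synthetic data. It also ignores data versioning, latency, and backfill, which the paper treats carefully.

```python
from datetime import date, timedelta
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x with one predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    return y_bar - b * x_bar, b


def monthly_nowcasts(claims, hosp, first_month):
    """Retrain on the 1st of each month, then nowcast every day of that month.

    claims, hosp: dicts mapping date -> rate per 100,000 (hypothetical inputs).
    first_month:  (year, month) of the first month to nowcast.
    """
    days = sorted(claims)
    months = sorted({(d.year, d.month) for d in days
                     if (d.year, d.month) >= first_month})
    nowcasts = {}
    for year, month in months:
        report_day = date(year, month, 1)
        # Hospitalization reports are only known for days before the report day.
        train = [d for d in days if d < report_day and d in hosp]
        if len(train) < 2:
            continue
        a, b = fit_line([claims[d] for d in train], [hosp[d] for d in train])
        # Claims keep arriving daily, so they drive the nowcast for this month.
        for d in days:
            if (d.year, d.month) == (year, month):
                nowcasts[d] = a + b * claims[d]
    return nowcasts


def mean_absolute_error(nowcasts, hosp):
    """Average absolute gap between nowcasts and the later-reported values."""
    return mean(abs(nowcasts[d] - hosp[d]) for d in nowcasts)


# Synthetic example: hospitalizations are an exact linear function of claims.
start = date(2021, 6, 1)
claims = {start + timedelta(days=i): 1.0 + 0.05 * i for i in range(92)}
hosp = {d: 1.0 + 2.0 * c for d, c in claims.items()}

nowcasts = monthly_nowcasts(claims, hosp, first_month=(2021, 7))
mae = mean_absolute_error(nowcasts, hosp)
print(len(nowcasts), mae)
```

In this toy setup the June data trains the model used for July, and the June and July data together train the model used for August. Real claims signals are noisy and revised over time, so errors would not be near zero as they are here.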
Suggested Citation
Xueda Shen & Aaron Rumack & Bryan Wilder & Ryan J Tibshirani, 2025.
"Nowcasting reported covid-19 hospitalizations using de-identified, aggregated medical insurance claims data,"
PLOS Computational Biology, Public Library of Science, vol. 21(2), pages 1-26, February.
Handle:
RePEc:plo:pcbi00:1012717
DOI: 10.1371/journal.pcbi.1012717