Printed from https://ideas.repec.org/a/wly/jforec/v44y2025i8p2405-2424.html

Integrating Google Mobility Indices for Forecasting Infectious Diseases Incidence: A Multi‐Country Study on COVID‐19 With LightGBM

Author

  • Milton Soto‐Ferrari

Abstract

Reliable forecasts of infectious disease trajectories are indispensable for timely public health action and the allocation of medical resources. However, most time‐series forecasting frameworks still rely solely on historical case counts and thus struggle to capture sudden shifts in population behavior. To quantify the value of external behavioral signals during the COVID‐19 pandemic, this research assembled a 124‐week panel (May 31, 2020, to October 9, 2022) covering 20 countries across six continents. The panel fuses Google Community‐Mobility indices with standard surveillance indicators (new cases, deaths, tests, and vaccinations), population density, and the Oxford policy‐stringency score. We assess two families of forecasting methods for predicting new cases over an 8‐week hold‐out window. The target‐variable‐only family comprises a 4‐week rolling average, autoregressive integrated moving average (ARIMA), Prophet, and long short‐term memory (LSTM) models. In contrast, the data‐integration family employs two light gradient boosting machine (LightGBM) variants: LightGBM‐Direct, which learns a single multi‐output mapping for all periods in the horizon, and LightGBM‐Recursive, which updates a one‐step model and rolls its predictions forward. Performance is evaluated using root mean square error (RMSE) and two optimized weight indices (OWIs), which benchmark improvements over the rolling‐average baseline and ARIMA, respectively. The results demonstrate that a mobility‐enhanced LightGBM achieves the lowest RMSE in every country, reducing the overall median error by 83% compared with the baseline and by 87% compared with ARIMA. LightGBM‐Direct excels in twelve countries characterized by smoother trends, whereas LightGBM‐Recursive dominates in the remaining eight, which exhibit rapid fluctuations in incidence. Notably, a SHapley Additive exPlanations analysis (TreeSHAP) identifies workplace and transit‐station mobility, testing intensity, vaccinations, and policy stringency as the most influential predictors, underscoring the importance of external behavioral signals in improving pandemic forecast accuracy.
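The difference between the direct and recursive strategies described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: a NumPy least-squares model stands in for LightGBM, the weekly series is synthetic, no mobility or policy features are included, and the lag count `N_LAGS` and all function names are assumptions made for illustration. The percentage printed at the end is a plain RMSE reduction relative to the rolling-average baseline, not the paper's OWI.

```python
import numpy as np

HORIZON = 8  # weeks held out, as in the abstract
N_LAGS = 4   # assumption: number of lagged weeks used as features


def make_supervised(series, n_lags, horizon):
    """Turn a 1-D series into lag features X and multi-step targets Y."""
    X, Y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])
        Y.append(series[t:t + horizon])
    return np.array(X), np.array(Y)


def fit_linear(X, Y):
    """Least-squares fit with intercept (stand-in for a LightGBM regressor)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return coef


def predict_linear(coef, x):
    """Apply the fitted coefficients to one lag vector."""
    return np.concatenate([[1.0], x]) @ coef


def forecast_direct(train, n_lags, horizon):
    """Direct: one multi-output mapping from the lags to all horizon steps."""
    X, Y = make_supervised(train, n_lags, horizon)
    coef = fit_linear(X, Y)  # shape (n_lags + 1, horizon)
    return predict_linear(coef, train[-n_lags:])


def forecast_recursive(train, n_lags, horizon):
    """Recursive: a one-step model whose predictions are fed back as lags."""
    X, Y = make_supervised(train, n_lags, 1)
    coef = fit_linear(X, Y[:, 0])  # shape (n_lags + 1,)
    window = list(train[-n_lags:])
    preds = []
    for _ in range(horizon):
        nxt = float(predict_linear(coef, np.array(window)))
        preds.append(nxt)
        window = window[1:] + [nxt]  # drop oldest lag, append the prediction
    return np.array(preds)


def rolling_average_baseline(train, horizon, window=4):
    """Repeat the mean of the last `window` observations over the horizon."""
    return np.full(horizon, np.mean(train[-window:]))


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic 124-week series (illustrative only, not real surveillance data).
rng = np.random.default_rng(0)
weeks = np.arange(124)
cases = 1000 + 10 * weeks + 200 * np.sin(2 * np.pi * weeks / 26) + rng.normal(0, 20, 124)
train, test = cases[:-HORIZON], cases[-HORIZON:]

baseline_rmse = rmse(test, rolling_average_baseline(train, HORIZON))
print(f"rolling average RMSE: {baseline_rmse:.1f}")
for name, forecaster in [("direct", forecast_direct), ("recursive", forecast_recursive)]:
    score = rmse(test, forecaster(train, N_LAGS, HORIZON))
    change = 100 * (1 - score / baseline_rmse)
    print(f"{name} RMSE: {score:.1f} ({change:.0f}% lower than baseline)")
```

In the recursive variant, errors can compound because each prediction becomes an input to the next step, while the direct variant avoids that feedback at the cost of fitting one output per horizon step. Which trade-off wins depends on the series, which is consistent with the abstract's finding that neither variant dominates in all countries.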

Suggested Citation

  • Milton Soto‐Ferrari, 2025. "Integrating Google Mobility Indices for Forecasting Infectious Diseases Incidence: A Multi‐Country Study on COVID‐19 With LightGBM," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 44(8), pages 2405-2424, December.
  • Handle: RePEc:wly:jforec:v:44:y:2025:i:8:p:2405-2424
    DOI: 10.1002/for.70006

    Download full text from publisher

    File URL: https://doi.org/10.1002/for.70006
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kohns, David & Bhattacharjee, Arnab, 2023. "Nowcasting growth using Google Trends data: A Bayesian Structural Time Series model," International Journal of Forecasting, Elsevier, vol. 39(3), pages 1384-1412.
    2. Zhongchen Song & Tom Coupé, 2023. "Predicting Chinese consumption series with Baidu," Journal of Chinese Economic and Business Studies, Taylor & Francis Journals, vol. 21(3), pages 429-463, July.
    3. Schaer, Oliver & Kourentzes, Nikolaos & Fildes, Robert, 2019. "Demand forecasting with user-generated online information," International Journal of Forecasting, Elsevier, vol. 35(1), pages 197-212.
    4. Vera Z. Eichenauer & Ronald Indergand & Isabel Z. Martínez & Christoph Sax, 2022. "Obtaining consistent time series from Google Trends," Economic Inquiry, Western Economic Association International, vol. 60(2), pages 694-705, April.
    5. Jiam Song & Kwangmin Jung & Jonghun Kam, 2023. "Correction: Evidence of the time-varying impacts of the COVID-19 pandemic on online search activities relating to shopping products in South Korea," Humanities and Social Sciences Communications, Palgrave Macmillan, vol. 10(1), pages 1-1, December.
    6. Fu, Chun & Miller, Clayton, 2022. "Using Google Trends as a proxy for occupant behavior to predict building energy consumption," Applied Energy, Elsevier, vol. 310(C).
    7. Spiliotis, Evangelos & Makridakis, Spyros & Kaltsounis, Anastasios & Assimakopoulos, Vassilios, 2021. "Product sales probabilistic forecasting: An empirical evaluation using the M5 competition data," International Journal of Production Economics, Elsevier, vol. 240(C).
    8. Oikonomou, Konstantinos & Damigos, Dimitris & Dimitriou, Dimitrios, 2025. "Globality in the metal markets: Leveraging cross-learning to forecast aluminum and copper prices," Resources Policy, Elsevier, vol. 103(C).
    9. Monge, Manuel & Claudio-Quiroga, Gloria & Poza, Carlos, 2024. "Chinese economic behavior in times of covid-19. A new leading economic indicator based on Google trends," International Economics, Elsevier, vol. 177(C).
    10. Spiliotis, Evangelos & Petropoulos, Fotios, 2024. "On the update frequency of univariate forecasting models," European Journal of Operational Research, Elsevier, vol. 314(1), pages 111-121.
    11. Wang, Shengjie & Kang, Yanfei & Petropoulos, Fotios, 2024. "Combining probabilistic forecasts of intermittent demand," European Journal of Operational Research, Elsevier, vol. 315(3), pages 1038-1048.
    12. Vaia I. Kontopoulou & Athanasios D. Panagopoulos & Ioannis Kakkos & George K. Matsopoulos, 2023. "A Review of ARIMA vs. Machine Learning Approaches for Time Series Forecasting in Data Driven Networks," Future Internet, MDPI, vol. 15(8), pages 1-31, July.
    13. Voyant, Cyril & Notton, Gilles & Duchaud, Jean-Laurent & Gutiérrez, Luis Antonio García & Bright, Jamie M. & Yang, Dazhi, 2022. "Benchmarks for solar radiation time series forecasting," Renewable Energy, Elsevier, vol. 191(C), pages 747-762.
    14. Puhr, Harald & Müllner, Jakob, 2024. "Vox populi, vox dei: A concept and measure for grassroots socio-political risk using Google Trends," Journal of International Management, Elsevier, vol. 30(2).
    15. Long Wen & Chang Liu & Haiyan Song, 2019. "Forecasting tourism demand using search query data: A hybrid modelling approach," Tourism Economics, vol. 25(3), pages 309-329, May.
    16. Caperna, Giulio & Colagrossi, Marco & Geraci, Andrea & Mazzarella, Gianluca, 2022. "A babel of web-searches: Googling unemployment during the pandemic," Labour Economics, Elsevier, vol. 74(C).
    17. N. F. Dyachkova & E. V. Sinelnikova-Muryleva, 2026. "Calculation and Application of High-Frequency Macroeconomic Indicators: A Case Study Using Russian Data," Studies on Russian Economic Development, Springer, vol. 37(2), pages 206-216, April.
    18. Diebold, Céline, 2025. "Using Google search data to examine factory automation and its effect on employment," Economic Analysis and Policy, Elsevier, vol. 86(C), pages 1301-1328.
    19. Sarun Kamolthip, 2021. "Macroeconomic Forecasting with LSTM and Mixed Frequency Time Series Data," PIER Discussion Papers 165, Puey Ungphakorn Institute for Economic Research.
    20. Gillmann, Niels & Kim, Alisa, 2021. "Quantification of Economic Uncertainty: a deep learning approach," VfS Annual Conference 2021 (Virtual Conference): Climate Economics 242421, Verein für Socialpolitik / German Economic Association.
