
Machine Learning for Analyzing Non-Countermeasure Factors Affecting Early Spread of COVID-19

Author

Listed:
  • Vito Janko

    (Jožef Stefan Institute, 1000 Ljubljana, Slovenia)

  • Gašper Slapničar

    (Jožef Stefan Institute, 1000 Ljubljana, Slovenia)

  • Erik Dovgan

    (Jožef Stefan Institute, 1000 Ljubljana, Slovenia)

  • Nina Reščič

    (Jožef Stefan Institute, 1000 Ljubljana, Slovenia)

  • Tine Kolenik

    (Jožef Stefan Institute, 1000 Ljubljana, Slovenia)

  • Martin Gjoreski

    (Jožef Stefan Institute, 1000 Ljubljana, Slovenia)

  • Maj Smerkol

    (Jožef Stefan Institute, 1000 Ljubljana, Slovenia)

  • Matjaž Gams

    (Jožef Stefan Institute, 1000 Ljubljana, Slovenia)

  • Mitja Luštrek

    (Jožef Stefan Institute, 1000 Ljubljana, Slovenia)

Abstract

The COVID-19 pandemic affected the whole world, but not all countries were impacted equally. This raises the question of which factors can explain the faster initial spread in some countries than in others. Many such factors are overshadowed by the effect of countermeasures, so we studied the early phases of the infection, before countermeasures were in place. We collected the most diverse dataset of potentially relevant factors and infection metrics to date for this task. Using it, we show the importance of different factors and factor categories as determined by both statistical methods and machine learning (ML) feature selection (FS) approaches. Factors related to culture (e.g., individualism, openness), development, and travel proved the most important. A more thorough factor analysis was then made using a novel rule discovery algorithm. We also show how interconnected these factors are and caution against relying on ML analysis in isolation. Importantly, we explore potential pitfalls found in the methodology of similar work and demonstrate their impact on COVID-19 data analysis. Our best models using the decision tree classifier can predict the infection class with roughly 80% accuracy.

Suggested Citation

  • Vito Janko & Gašper Slapničar & Erik Dovgan & Nina Reščič & Tine Kolenik & Martin Gjoreski & Maj Smerkol & Matjaž Gams & Mitja Luštrek, 2021. "Machine Learning for Analyzing Non-Countermeasure Factors Affecting Early Spread of COVID-19," IJERPH, MDPI, vol. 18(13), pages 1-33, June.
  • Handle: RePEc:gam:jijerp:v:18:y:2021:i:13:p:6750-:d:580451

    Download full text from publisher

    File URL: https://www.mdpi.com/1660-4601/18/13/6750/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1660-4601/18/13/6750/
    Download Restriction: no


