Printed from https://ideas.repec.org/a/eee/ejores/v295y2021i2p648-663.html

On sparse ensemble methods: An application to short-term predictions of the evolution of COVID-19

Authors

  • Benítez-Peña, Sandra
  • Carrizosa, Emilio
  • Guerrero, Vanesa
  • Jiménez-Gamero, M. Dolores
  • Martín-Barragán, Belén
  • Molero-Río, Cristina
  • Ramírez-Cobo, Pepa
  • Romero Morales, Dolores
  • Sillero-Denamiel, M. Remedios

Abstract

Since the seminal paper by Bates and Granger in 1969, a vast number of ensemble methods that combine different base regressors into a single one have been proposed in the literature. The resulting regressor may be more accurate than its components, but it may also overfit, be distorted by base regressors with low accuracy, or be too complex to understand and explain. This paper proposes and studies a novel Mathematical Optimization model to build a sparse ensemble, which trades off the accuracy of the ensemble against the number of base regressors used. The latter is controlled by means of a regularization term that penalizes regressors with poor individual performance. Our approach is flexible enough to incorporate desirable properties of the ensemble, such as controlling its performance in critical groups of records, or the costs associated with the base regressors involved in it. We illustrate our approach with real data sets arising in the COVID-19 context.
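The abstract does not spell out the optimization model, so what follows is only a minimal sketch of the trade-off it describes, under stated assumptions: a squared-error loss, non-negative ensemble weights, and a regularization term in which each base regressor's weight is multiplied by its own mean squared error, so that poor individual performers are penalized more heavily. The function `sparse_ensemble_weights`, the parameter `lam` and the cyclic coordinate-descent solver are hypothetical illustrations (a weighted non-negative lasso), not the authors' model, which also accommodates requirements on critical groups of records and base-regressor costs.

```python
import random


def sparse_ensemble_weights(P, y, lam, n_passes=500, tol=1e-12):
    """Hypothetical sketch, not the authors' model.

    P[i][k] is the prediction of base regressor k for record i, y[i] the target.
    Minimizes (1/n) * sum_i (y[i] - sum_k P[i][k] * w[k])**2 + lam * sum_k c[k] * w[k]
    subject to w[k] >= 0, by cyclic coordinate descent. c[k] is the individual
    mean squared error of regressor k (an assumed penalty weight), so poorly
    performing regressors pay a larger price per unit of weight.
    """
    n, K = len(y), len(P[0])
    cols = [[row[k] for row in P] for k in range(K)]
    c = [sum((p - t) ** 2 for p, t in zip(col, y)) / n for col in cols]
    col_sq = [sum(p * p for p in col) / n for col in cols]
    w = [0.0] * K
    r = list(y)  # current residual y - P w
    for _ in range(n_passes):
        max_change = 0.0
        for k in range(K):
            if col_sq[k] == 0.0:
                continue  # an all-zero column can never help the fit
            col = cols[k]
            # Residual with regressor k removed from the ensemble.
            r_k = [ri + pk * w[k] for ri, pk in zip(r, col)]
            corr = sum(pk * ri for pk, ri in zip(col, r_k)) / n
            # Closed-form minimizer of the one-dimensional subproblem, clipped at 0.
            new_wk = max(0.0, (corr - lam * c[k] / 2.0) / col_sq[k])
            r = [ri - pk * new_wk for ri, pk in zip(r_k, col)]
            max_change = max(max_change, abs(new_wk - w[k]))
            w[k] = new_wk
        if max_change < tol:
            break
    return w


# Usage on synthetic data: two informative base regressors and one unrelated one.
rng = random.Random(0)
n = 200
y = [rng.gauss(0.0, 1.0) for _ in range(n)]
P = [[t + rng.gauss(0.0, 0.1),   # accurate base regressor
      t + rng.gauss(0.0, 0.3),   # less accurate base regressor
      rng.gauss(0.0, 1.0)]       # unrelated to the target
     for t in y]
w = sparse_ensemble_weights(P, y, lam=0.1)
print([round(wk, 3) for wk in w], "base regressors used:", sum(wk > 0 for wk in w))
```

Increasing `lam` typically drives more weights to exactly zero, and because the penalty scales with individual error, the least accurate regressors tend to be dropped first. Plain Python keeps the sketch dependency-free; a numerical library would be preferable on real data.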

Suggested Citation

  • Benítez-Peña, Sandra & Carrizosa, Emilio & Guerrero, Vanesa & Jiménez-Gamero, M. Dolores & Martín-Barragán, Belén & Molero-Río, Cristina & Ramírez-Cobo, Pepa & Romero Morales, Dolores & Sillero-Denamiel, M. Remedios, 2021. "On sparse ensemble methods: An application to short-term predictions of the evolution of COVID-19," European Journal of Operational Research, Elsevier, vol. 295(2), pages 648-663.
  • Handle: RePEc:eee:ejores:v:295:y:2021:i:2:p:648-663
    DOI: 10.1016/j.ejor.2021.04.016

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0377221721003283
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.ejor.2021.04.016?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Blanquero, Rafael & Carrizosa, Emilio & Molero-Río, Cristina & Romero Morales, Dolores, 2020. "Sparsity in optimal randomized classification trees," European Journal of Operational Research, Elsevier, vol. 284(1), pages 255-272.
    2. Bradley Efron, 2020. "Prediction, Estimation, and Attribution," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 115(530), pages 636-655, April.
    3. Emilio Carrizosa & Belen Martin-Barragan & Dolores Romero Morales, 2010. "Binarized Support Vector Machines," INFORMS Journal on Computing, INFORMS, vol. 22(1), pages 154-167, February.
    4. Bradley Efron, 2020. "Prediction, Estimation, and Attribution," International Statistical Review, International Statistical Institute, vol. 88(S1), pages 28-59, December.
    5. Martin-Barragan, Belen & Lillo, Rosa & Romo, Juan, 2014. "Interpretable support vector machines for functional data," European Journal of Operational Research, Elsevier, vol. 232(1), pages 146-155.
    6. Carrizosa, Emilio & Martín-Barragán, Belén & Morales, Dolores Romero, 2011. "Detecting relevant variables and interactions in supervised classification," European Journal of Operational Research, Elsevier, vol. 213(1), pages 260-269, August.
    7. Tomohiro Ando & Ker-Chau Li, 2014. "A Model-Averaging Approach for High-Dimensional Regression," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(505), pages 254-265, March.
    8. Nikolopoulos, Konstantinos & Punia, Sushil & Schäfers, Andreas & Tsinopoulos, Christos & Vasilakis, Chrysovalantis, 2021. "Forecasting and planning during a pandemic: COVID-19 growth rates, supply chain disruptions, and governmental decisions," European Journal of Operational Research, Elsevier, vol. 290(1), pages 99-115.
    9. Carrizosa, Emilio & Nogales-Gómez, Amaya & Romero Morales, Dolores, 2017. "Clustering categories in support vector machines," Omega, Elsevier, vol. 66(PA), pages 28-37.
    10. Sandra Benítez-Peña & Rafael Blanquero & Emilio Carrizosa & Pepa Ramírez-Cobo, 2019. "On support vector machines under a multiple-cost scenario," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(3), pages 663-682, September.
    11. Roger Koenker & Kevin F. Hallock, 2001. "Quantile Regression," Journal of Economic Perspectives, American Economic Association, vol. 15(4), pages 143-156, Fall.
    12. Emilio Carrizosa & Cristina Molero-Río & Dolores Romero Morales, 2021. "Mathematical optimization in classification and regression trees," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 29(1), pages 5-33, April.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Navarro-García, Manuel & Guerrero, Vanesa & Durban, María, 2023. "On constrained smoothing and out-of-range prediction using P-splines: A conic optimization approach," Applied Mathematics and Computation, Elsevier, vol. 441(C).
    2. Li, Dong & Dong, Chuanwen, 2022. "Government regulations to mitigate the shortage of life-saving goods in the face of a pandemic," European Journal of Operational Research, Elsevier, vol. 301(3), pages 942-955.
    3. Víctor Blanco & Ricardo Gázquez & Marina Leal, 2023. "Mathematical optimization models for reallocating and sharing health equipment in pandemic situations," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(2), pages 355-390, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Blanquero, Rafael & Carrizosa, Emilio & Molero-Río, Cristina & Morales, Dolores Romero, 2022. "On sparse optimal regression trees," European Journal of Operational Research, Elsevier, vol. 299(3), pages 1045-1054.
    2. Gambella, Claudio & Ghaddar, Bissan & Naoum-Sawaya, Joe, 2021. "Optimization problems for machine learning: A survey," European Journal of Operational Research, Elsevier, vol. 290(3), pages 807-828.
    3. Manski, Charles F., 2023. "Probabilistic prediction for binary treatment choice: With focus on personalized medicine," Journal of Econometrics, Elsevier, vol. 234(2), pages 647-663.
    4. Weishampel, Anthony & Staicu, Ana-Maria & Rand, William, 2023. "Classification of social media users with generalized functional data analysis," Computational Statistics & Data Analysis, Elsevier, vol. 179(C).
    5. Rich, Jeppe & Myhrmann, Marcus Skyum & Mabit, Stefan Eriksen, 2023. "Our children cycle less - A Danish pseudo-panel analysis," Journal of Transport Geography, Elsevier, vol. 106(C).
    6. Jack Jewson & David Rossell, 2022. "General Bayesian loss function selection and the use of improper models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(5), pages 1640-1665, November.
    7. Pedro Duarte Silva, A., 2017. "Optimization approaches to Supervised Classification," European Journal of Operational Research, Elsevier, vol. 261(2), pages 772-788.
    8. Emilio Carrizosa & Cristina Molero-Río & Dolores Romero Morales, 2021. "Mathematical optimization in classification and regression trees," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 29(1), pages 5-33, April.
    9. Martin-Barragan, Belen & Lillo, Rosa & Romo, Juan, 2014. "Interpretable support vector machines for functional data," European Journal of Operational Research, Elsevier, vol. 232(1), pages 146-155.
    10. Nelson P. Rayl & Nitish R. Sinha, 2022. "Integrating Prediction and Attribution to Classify News," Finance and Economics Discussion Series 2022-042, Board of Governors of the Federal Reserve System (U.S.).
    11. Blanquero, Rafael & Carrizosa, Emilio & Molero-Río, Cristina & Romero Morales, Dolores, 2020. "Sparsity in optimal randomized classification trees," European Journal of Operational Research, Elsevier, vol. 284(1), pages 255-272.
    12. Siyi Wang & Xing Yan & Bangqi Zheng & Hu Wang & Wangli Xu & Nanbo Peng & Qi Wu, 2021. "Risk and return prediction for pricing portfolios of non-performing consumer credit," Papers 2110.15102, arXiv.org.
    13. Denis A Shah & Erick D De Wolf & Pierce A Paul & Laurence V Madden, 2021. "Accuracy in the prediction of disease epidemics when ensembling simple but highly correlated models," PLOS Computational Biology, Public Library of Science, vol. 17(3), pages 1-23, March.
    14. Carrizosa, Emilio & Kurishchenko, Kseniia & Marín, Alfredo & Romero Morales, Dolores, 2022. "Interpreting clusters via prototype optimization," Omega, Elsevier, vol. 107(C).
    15. Maurizio Maravalle & Federica Ricca & Bruno Simeone & Vincenzo Spinelli, 2015. "Carpal Tunnel Syndrome automatic classification: electromyography vs. ultrasound imaging," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 23(1), pages 100-123, April.
    16. M. Merz & R. Richman & T. Tsanakas & M. V. Wuthrich, 2021. "Interpreting Deep Learning Models with Marginal Attribution by Conditioning on Quantiles," Papers 2103.11706, arXiv.org.
    17. Chun Chieh Fan & Robert Loughnan & Carolina Makowski & Diliana Pecheva & Chi-Hua Chen & Donald J. Hagler & Wesley K. Thompson & Nadine Parker & Dennis van der Meer & Oleksandr Frei & Ole A. Andreassen, 2022. "Multivariate genome-wide association study on tissue-sensitive diffusion metrics highlights pathways that shape the human brain," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    18. Emilio Carrizosa & Vanesa Guerrero & Dolores Romero Morales, 2023. "On mathematical optimization for clustering categories in contingency tables," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(2), pages 407-429, June.
    19. Anna Gottard & Giulia Vannucci & Leonardo Grilli & Carla Rampichini, 2023. "Mixed-effect models with trees," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(2), pages 431-461, June.
    20. COJOCARIU Irina-Cristina, 2023. "Analysis Of Sports Performances Using Machine Learning And Statistical Models - A General Analysis Of The Literature," Revista Economica, Lucian Blaga University of Sibiu, Faculty of Economic Sciences, vol. 75(2), pages 34-39, June.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.