Printed from https://ideas.repec.org/a/eee/ejores/v323y2025i1p224-240.html

Estimating non-overfitted convex production technologies: A stochastic machine learning approach

Authors

  • Guillen, Maria D.
  • Charles, Vincent
  • Aparicio, Juan

Abstract

Overfitting is a classical statistical issue that occurs when a model fits a particular observed data sample too closely, potentially limiting its generalizability. While Data Envelopment Analysis (DEA) is a powerful non-parametric method for assessing the relative efficiency of decision-making units (DMUs), its reliance on the minimal extrapolation principle can lead to concerns about overfitting, particularly when the goal extends beyond evaluating the specific DMUs in the sample to making broader inferences. In this paper, we propose an adaptation of Stochastic Gradient Boosting to estimate production possibility sets that mitigate overfitting while satisfying shape constraints such as convexity and free disposability. Our approach is not intended to replace DEA but to complement it, offering an additional tool for scenarios where generalization is important. Through simulation experiments, we demonstrate that the proposed method performs well compared to DEA, especially in high-dimensional settings. Furthermore, the new machine learning-based technique is compared to the Corrected Concave Non-parametric Least Squares (C2NLS), showing competitive performance. We also illustrate how the usual efficiency measures in DEA can be implemented under our approach. Finally, we provide an empirical example based on data from the Program for International Student Assessment (PISA) to demonstrate the applicability of the new method.

Suggested Citation

  • Guillen, Maria D. & Charles, Vincent & Aparicio, Juan, 2025. "Estimating non-overfitted convex production technologies: A stochastic machine learning approach," European Journal of Operational Research, Elsevier, vol. 323(1), pages 224-240.
  • Handle: RePEc:eee:ejores:v:323:y:2025:i:1:p:224-240
    DOI: 10.1016/j.ejor.2024.11.030

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0377221724008993
    Download Restriction: Full text for ScienceDirect subscribers only

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Guillen, Maria D. & Aparicio, Juan & Kapelko, Magdalena & Esteve, Miriam, 2025. "Measuring environmental inefficiency through machine learning: An approach based on efficiency analysis trees and by-production technology," European Journal of Operational Research, Elsevier, vol. 321(2), pages 529-542.
    2. España, Victor J. & Aparicio, Juan & Barber, Xavier & Esteve, Miriam, 2024. "Estimating production functions through additive models based on regression splines," European Journal of Operational Research, Elsevier, vol. 312(2), pages 684-699.
    3. Guerrero, Nadia M. & Moragues, Raul & Aparicio, Juan & Valero-Carreras, Daniel, 2024. "Support Vector Frontiers with kernel splines," Omega, Elsevier, vol. 128(C).
    4. Moragues, Raul & Aparicio, Juan & Esteve, Miriam, 2023. "An unsupervised learning-based generalization of Data Envelopment Analysis," Operations Research Perspectives, Elsevier, vol. 11(C).
    5. Raul Moragues & Juan Aparicio & Miriam Esteve, 2023. "Ranking the Importance of Variables in a Nonparametric Frontier Analysis Using Unsupervised Machine Learning Techniques," Mathematics, MDPI, vol. 11(11), pages 1-24, June.
    6. Esteve, Miriam & Aparicio, Juan & Rodriguez-Sala, Jesus J. & Zhu, Joe, 2023. "Random Forests and the measurement of super-efficiency in the context of Free Disposal Hull," European Journal of Operational Research, Elsevier, vol. 304(2), pages 729-744.
    7. Valero-Carreras, Daniel & Aparicio, Juan & Guerrero, Nadia M., 2021. "Support vector frontiers: A new approach for estimating production functions through support vector machines," Omega, Elsevier, vol. 104(C).
    8. Raul Moragues & Juan Aparicio & Miriam Esteve, 2023. "Measuring technical efficiency for multi-input multi-output production processes through OneClass Support Vector Machines: a finite-sample study," Operational Research, Springer, vol. 23(3), pages 1-33, September.
    9. Nadia M. Guerrero & Juan Aparicio & Daniel Valero-Carreras, 2022. "Combining Data Envelopment Analysis and Machine Learning," Mathematics, MDPI, vol. 10(6), pages 1-22, March.
    10. Guillen, Maria D. & Charles, Vincent & Aparicio, Juan, 2025. "Enhanced efficiency assessment in manufacturing: Leveraging machine learning for improved performance analysis," Omega, Elsevier, vol. 134(C).
    11. Amir Moradi-Motlagh & Ali Emrouznejad, 2022. "The origins and development of statistical approaches in non-parametric frontier models: a survey of the first two decades of scholarly literature (1998–2020)," Annals of Operations Research, Springer, vol. 318(1), pages 713-741, November.
    12. Valentin Zelenyuk, 2019. "Data Envelopment Analysis and Business Analytics: The Big Data Challenges and Some Solutions," CEPA Working Papers Series WP072019, School of Economics, University of Queensland, Australia.
    13. Halkos, George & Tzeremes, Nickolaos, 2008. "Measuring regional public health provision," MPRA Paper 23762, University Library of Munich, Germany.
    14. Zervopoulos, Panagiotis & Emrouznejad, Ali & Sklavos, Sokratis, 2019. "A Bayesian approach for correcting bias of data envelopment analysis estimators," MPRA Paper 91886, University Library of Munich, Germany.
    15. Halkos, George & Tzeremes, Nickolaos, 2009. "Exploring the effect of countries’ economic prosperity on their biodiversity performance," MPRA Paper 32102, University Library of Munich, Germany.
    16. Léopold Simar & Paul W. Wilson, 2015. "Statistical Approaches for Non-parametric Frontier Models: A Guided Tour," International Statistical Review, International Statistical Institute, vol. 83(1), pages 77-110, April.
    17. Halkos, George & Tzeremes, Nickolaos, 2011. "A conditional full frontier modelling for analyzing environmental efficiency and economic growth," MPRA Paper 32839, University Library of Munich, Germany.
    18. George Halkos & Nickolaos Tzeremes, 2010. "The effect of foreign ownership on SMEs performance: An efficiency analysis perspective," Journal of Productivity Analysis, Springer, vol. 34(2), pages 167-180, October.
    19. Halkos, George & Tzeremes, Nickolaos, 2011. "The effect of national culture on countries’ innovation efficiency," MPRA Paper 30100, University Library of Munich, Germany.
    20. Halkos, George E. & Tzeremes, Nickolaos G., 2011. "A conditional nonparametric analysis for measuring the efficiency of regional public healthcare delivery: An application to Greek prefectures," Health Policy, Elsevier, vol. 103(1), pages 73-82.
