Printed from https://ideas.repec.org/p/qld/uqcepa/152.html

LASSO DEA for small and big data

Abstract

In data envelopment analysis (DEA), the curse of dimensionality may jeopardize the accuracy, or even the relevance, of results when the number of inputs and outputs is relatively large, even for relatively large samples. Recently, a machine learning approach based on the least absolute shrinkage and selection operator (LASSO) for variable selection was combined with sign-constrained convex nonparametric least squares (SCNLS, a special case of DEA) and dubbed LASSO-SCNLS, as a way to circumvent the curse of dimensionality. In this paper, we revisit this interesting approach by considering various data generating processes. We also explore a more advanced version of LASSO, the so-called elastic net (EN), adapt it to DEA, and propose EN-DEA. Our Monte Carlo simulations provide additional and, to some extent, new evidence and conclusions. In particular, we find that none of the considered approaches clearly dominates the others. To circumvent the curse of dimensionality of DEA in the context of big wide data, we also propose a simplified two-step approach, which we call LASSO+DEA. We find that this simplified approach could be more useful than the existing, more sophisticated approaches for reducing very large dimensions into sparser, more parsimonious DEA models that attain greater discriminatory power and suffer less from the curse of dimensionality.
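
The abstract does not spell out the two steps of LASSO+DEA, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that step one is an ordinary LASSO regression of a single output on all candidate inputs, and that step two is a standard input-oriented, constant-returns-to-scale DEA run on the inputs that survive. The function names `lasso_select` and `dea_input_crs`, the simulated data, and the penalty level `lam=1.0` are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def lasso_select(X, y, lam, n_iter=200):
    """Coordinate-descent LASSO on standardized inputs.

    Minimises (1/(2n)) * ||y - Z b||^2 + lam * ||b||_1 and returns the
    indices of the inputs whose coefficients are not shrunk to zero.
    """
    n, p = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # zero mean, unit variance
    r = y - y.mean()                           # residual when all b_j = 0
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r += Z[:, j] * beta[j]             # drop column j from the fit
            rho = Z[:, j] @ r / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0)  # soft threshold
            r -= Z[:, j] * beta[j]             # put column j back
    return [j for j in range(p) if beta[j] != 0.0]


def dea_input_crs(X, y):
    """Input-oriented CRS DEA efficiency scores for a single output."""
    n, m = X.shape
    scores = np.empty(n)
    c = np.r_[1.0, np.zeros(n)]                # variables: [theta, lambda_1..lambda_n]
    for o in range(n):
        A_in = np.c_[-X[o], X.T]               # sum_j lambda_j x_ij - theta x_io <= 0
        A_out = np.r_[0.0, -y][None, :]        # -sum_j lambda_j y_j <= -y_o
        res = linprog(c,
                      A_ub=np.vstack([A_in, A_out]),
                      b_ub=np.r_[np.zeros(m), -y[o]],
                      method="highs")          # default bounds keep all variables >= 0
        scores[o] = res.fun
    return scores


# Simulated data (an assumption): six candidate inputs, only the first two matter.
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(1.0, 10.0, size=(n, 6))
u = np.abs(rng.normal(0.0, 0.1, size=n))       # one-sided inefficiency term
y = (2.0 * X[:, 0] + 3.0 * X[:, 1]) * np.exp(-u)

selected = lasso_select(X, y, lam=1.0)         # step 1: variable selection
scores = dea_input_crs(X[:, selected], y)      # step 2: DEA on the kept inputs
print("inputs kept:", selected)
print("mean efficiency:", round(scores.mean(), 3))
```

In practice the penalty level would be tuned (for example by cross-validation) rather than fixed, and the DEA step could use whatever orientation and returns-to-scale assumption suits the application.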

Suggested Citation

  • Ya Chen & Mike Tsionas & Valentin Zelenyuk, 2020. "LASSO DEA for small and big data," CEPA Working Papers Series WP092020, School of Economics, University of Queensland, Australia.
  • Handle: RePEc:qld:uqcepa:152

    Download full text from publisher

    File URL: https://economics.uq.edu.au/files/22126/WP092020.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Zou, Hui, 2006. "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1418-1429, December.
    2. Tsionas, Mike G. & Izzeldin, Marwan, 2018. "Smooth approximations to monotone concave functions in production analysis: An alternative to nonparametric concave least squares," European Journal of Operational Research, Elsevier, vol. 271(3), pages 797-807.
    3. Keshvari, Abolfazl & Kuosmanen, Timo, 2013. "Stochastic non-convex envelopment of data: Applying isotonic regression to frontier estimation," European Journal of Operational Research, Elsevier, vol. 231(2), pages 481-491.
    4. Khezrimotlagh, Dariush & Zhu, Joe & Cook, Wade D. & Toloo, Mehdi, 2019. "Data envelopment analysis and big data," European Journal of Operational Research, Elsevier, vol. 274(3), pages 1047-1054.
    5. Léopold Simar & Valentin Zelenyuk, 2011. "Stochastic FDH/DEA estimators for frontier analysis," Journal of Productivity Analysis, Springer, vol. 36(1), pages 1-20, August.
    6. Lukas Meier & Sara Van De Geer & Peter Bühlmann, 2008. "The group lasso for logistic regression," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(1), pages 53-71, February.
    7. Alexandre Belloni & Victor Chernozhukov & Christian Hansen, 2011. "Inference on Treatment Effects After Selection Amongst High-Dimensional Controls," Papers 1201.0224, arXiv.org, revised May 2012.
    8. Wilson, Paul W., 2018. "Dimension reduction in nonparametric models of production," European Journal of Operational Research, Elsevier, vol. 267(1), pages 349-367.
    9. Charles, Vincent & Aparicio, Juan & Zhu, Joe, 2019. "The curse of dimensionality of decision-making units: A simple approach to increase the discriminatory power of data envelopment analysis," European Journal of Operational Research, Elsevier, vol. 279(3), pages 929-940.
    10. Lee, Chia-Yen & Cai, Jia-Ying, 2020. "LASSO variable selection in data envelopment analysis with small datasets," Omega, Elsevier, vol. 91(C).
    11. Emrouznejad, Ali & Yang, Guo-liang, 2018. "A survey and analysis of the first 40 years of scholarly literature in DEA: 1978–2016," Socio-Economic Planning Sciences, Elsevier, vol. 61(C), pages 4-8.
    12. Susan Athey & Guido W. Imbens, 2019. "Machine Learning Methods That Economists Should Know About," Annual Review of Economics, Annual Reviews, vol. 11(1), pages 685-725, August.
    13. A. Belloni & D. Chen & V. Chernozhukov & C. Hansen, 2012. "Sparse Models and Methods for Optimal Instruments With an Application to Eminent Domain," Econometrica, Econometric Society, vol. 80(6), pages 2369-2429, November.
    14. Rajarshi Guhaniyogi & David B. Dunson, 2015. "Bayesian Compressed Regression," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(512), pages 1500-1514, December.
    15. Liu, John S. & Lu, Louis Y.Y. & Lu, Wen-Min, 2016. "Research fronts in data envelopment analysis," Omega, Elsevier, vol. 58(C), pages 33-45.
    16. Charnes, A. & Cooper, W. W. & Rhodes, E., 1978. "Measuring the efficiency of decision making units," European Journal of Operational Research, Elsevier, vol. 2(6), pages 429-444, November.
    17. Robert Tibshirani, 2011. "Regression shrinkage and selection via the lasso: a retrospective," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 73(3), pages 273-282, June.
    18. Simar, Leopold & Wilson, Paul W., 2007. "Estimation and inference in two-stage, semi-parametric models of production processes," Journal of Econometrics, Elsevier, vol. 136(1), pages 31-64, January.
    19. Misiunas, Nicholas & Oztekin, Asil & Chen, Yao & Chandra, Kavitha, 2016. "DEANN: A healthcare analytic methodology of data envelopment analysis and artificial neural networks for the prediction of organ recipient functional status," Omega, Elsevier, vol. 58(C), pages 46-54.
    20. Sendhil Mullainathan & Jann Spiess, 2017. "Machine Learning: An Applied Econometric Approach," Journal of Economic Perspectives, American Economic Association, vol. 31(2), pages 87-106, Spring.
    21. Kuosmanen, Timo & Johnson, Andrew, 2017. "Modeling joint production of multiple outputs in StoNED: Directional distance function approach," European Journal of Operational Research, Elsevier, vol. 262(2), pages 792-801.
    22. Kuosmanen, Timo, 2006. "Stochastic Nonparametric Envelopment of Data: Combining Virtues of SFA and DEA in a Unified Framework," Discussion Papers 11864, MTT Agrifood Research Finland.
    23. Timo Kuosmanen & Andrew L. Johnson, 2010. "Data Envelopment Analysis as Nonparametric Least-Squares Regression," Operations Research, INFORMS, vol. 58(1), pages 149-160, February.
    24. Cook, Wade D. & Seiford, Larry M., 2009. "Data envelopment analysis (DEA) - Thirty years on," European Journal of Operational Research, Elsevier, vol. 192(1), pages 1-17, January.
    25. Athey, Susan & Imbens, Guido W., 2019. "Machine Learning Methods Economists Should Know About," Research Papers 3776, Stanford University, Graduate School of Business.
    26. R. D. Banker & A. Charnes & W. W. Cooper, 1984. "Some Models for Estimating Technical and Scale Inefficiencies in Data Envelopment Analysis," Management Science, INFORMS, vol. 30(9), pages 1078-1092, September.
    27. Liu, John S. & Lu, Louis Y.Y. & Lu, Wen-Min & Lin, Bruce J.Y., 2013. "Data envelopment analysis 1978–2010: A citation-based literature survey," Omega, Elsevier, vol. 41(1), pages 3-15.
    28. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    29. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.
    30. Ming Yuan & Yi Lin, 2006. "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 68(1), pages 49-67, February.
    31. Zelenyuk, Valentin, 2020. "Aggregation of inputs and outputs prior to Data Envelopment Analysis under big data," European Journal of Operational Research, Elsevier, vol. 282(1), pages 172-187.
    32. Timo Kuosmanen & Mika Kortelainen, 2012. "Stochastic non-smooth envelopment of data: semi-parametric frontier estimation subject to shape constraints," Journal of Productivity Analysis, Springer, vol. 38(1), pages 11-28, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chen, Ya & Tsionas, Mike G. & Zelenyuk, Valentin, 2021. "LASSO+DEA for small and big wide data," Omega, Elsevier, vol. 102(C).
    2. Duras, Toni & Javed, Farrukh & Månsson, Kristofer & Sjölander, Pär & Söderberg, Magnus, 2023. "Using machine learning to select variables in data envelopment analysis: Simulations and application using electricity distribution data," Energy Economics, Elsevier, vol. 120(C).
    3. Valentin Zelenyuk, 2019. "Data Envelopment Analysis and Business Analytics: The Big Data Challenges and Some Solutions," CEPA Working Papers Series WP072019, School of Economics, University of Queensland, Australia.
    4. Lee, Chia-Yen & Cai, Jia-Ying, 2020. "LASSO variable selection in data envelopment analysis with small datasets," Omega, Elsevier, vol. 91(C).
    5. Esteve, Miriam & Aparicio, Juan & Rodriguez-Sala, Jesus J. & Zhu, Joe, 2023. "Random Forests and the measurement of super-efficiency in the context of Free Disposal Hull," European Journal of Operational Research, Elsevier, vol. 304(2), pages 729-744.
    6. Valero-Carreras, Daniel & Aparicio, Juan & Guerrero, Nadia M., 2021. "Support vector frontiers: A new approach for estimating production functions through support vector machines," Omega, Elsevier, vol. 104(C).
    7. Yang, Guo-liang & Fukuyama, Hirofumi & Chen, Kun, 2019. "Investigating the regional sustainable performance of the Chinese real estate industry: A slack-based DEA approach," Omega, Elsevier, vol. 84(C), pages 141-159.
    8. Michael Lechner, 2023. "Causal Machine Learning and its use for public policy," Swiss Journal of Economics and Statistics, Springer;Swiss Society of Economics and Statistics, vol. 159(1), pages 1-15, December.
    9. Tsionas, Mike G., 2021. "Optimal combinations of stochastic frontier and data envelopment analysis models," European Journal of Operational Research, Elsevier, vol. 294(2), pages 790-800.
    10. Yanfang Zhang & Chuanhua Wei & Xiaolin Liu, 2022. "Group Logistic Regression Models with l p,q Regularization," Mathematics, MDPI, vol. 10(13), pages 1-15, June.
    11. Elena Ivona DUMITRESCU & Sullivan HUE & Christophe HURLIN & Sessi TOKPAVI, 2020. "Machine Learning or Econometrics for Credit Scoring: Let’s Get the Best of Both Worlds," LEO Working Papers / DR LEO 2839, Orleans Economics Laboratory / Laboratoire d'Economie d'Orleans (LEO), University of Orleans.
    12. Picazo-Tadeo, Andrés J. & Beltrán-Esteve, Mercedes & Gómez-Limón, José A., 2012. "Assessing eco-efficiency with directional distance functions," European Journal of Operational Research, Elsevier, vol. 220(3), pages 798-809.
    13. Julien Chevallier & Dominique Guégan & Stéphane Goutte, 2021. "Is It Possible to Forecast the Price of Bitcoin?," Forecasting, MDPI, vol. 3(2), pages 1-44, May.
    14. Hahn, G.J. & Brandenburg, M. & Becker, J., 2021. "Valuing supply chain performance within and across manufacturing industries: A DEA-based approach," International Journal of Production Economics, Elsevier, vol. 240(C).
    15. Vladimír Holý, 2022. "The impact of operating environment on efficiency of public libraries," Central European Journal of Operations Research, Springer;Slovak Society for Operations Research;Hungarian Operational Research Society;Czech Society for Operations Research;Österr. Gesellschaft für Operations Research (ÖGOR);Slovenian Society Informatika - Section for Operational Research;Croatian Operational Research Society, vol. 30(1), pages 395-414, March.
    16. Hoang, Daniel & Wiegratz, Kevin, 2022. "Machine learning methods in finance: Recent applications and prospects," Working Paper Series in Economics 158, Karlsruhe Institute of Technology (KIT), Department of Economics and Management.
    17. Quaranta, Anna Grazia & Raffoni, Anna & Visani, Franco, 2018. "A multidimensional approach to measuring bank branch efficiency," European Journal of Operational Research, Elsevier, vol. 266(2), pages 746-760.
    18. Zelenyuk, Valentin, 2020. "Aggregation of inputs and outputs prior to Data Envelopment Analysis under big data," European Journal of Operational Research, Elsevier, vol. 282(1), pages 172-187.
    19. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    20. Kuosmanen, Timo & Johnson, Andrew, 2017. "Modeling joint production of multiple outputs in StoNED: Directional distance function approach," European Journal of Operational Research, Elsevier, vol. 262(2), pages 792-801.

    More about this item

    Keywords

    Data envelopment analysis; Data enabled analytics; Sign-constrained convex nonparametric least squares (SCNLS); Machine learning; LASSO; Elastic net; Big wide data
