
LASSO variable selection in data envelopment analysis with small datasets

Authors

  • Lee, Chia-Yen
  • Cai, Jia-Ying

Abstract

The curse of dimensionality arises when a limited number of observations is used to estimate a high-dimensional frontier, in particular by data envelopment analysis (DEA). This study simulates a data generating process (DGP) to argue that the typical “rule of thumb” used in DEA, e.g. that the number of observations should be at least twice the number of inputs and outputs, is ambiguous and will produce large deviations in the estimated technical efficiency. To address this issue, we propose a Least Absolute Shrinkage and Selection Operator (LASSO) variable selection technique, commonly used in data science to extract significant factors, and combine it with sign-constrained convex nonparametric least squares (SCNLS), which can be regarded as a DEA estimator. Simulation results demonstrate that the proposed LASSO-SCNLS method and its variants provide useful guidelines for applying DEA to small datasets.
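
To make the variable-selection idea in the abstract concrete, the following is a minimal sketch of how an L1 (LASSO) penalty sets the coefficients of weak candidate variables exactly to zero. It is an assumption-laden illustration, not the authors' method: it fits an ordinary linear LASSO by cyclic coordinate descent and omits the shape and sign constraints of SCNLS. The function names, the toy data, and the penalty value `lam=0.5` are all hypothetical.

```python
def soft_threshold(rho, lam):
    """Soft-thresholding operator: shrink rho toward zero by lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows, each holding p candidate-variable values;
    y is a list of n responses. Plain linear LASSO only (no SCNLS constraints).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    for _ in range(n_sweeps):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the partial residual
            # (column j's own current contribution is added back).
            rho = sum(col[i] * residual[i] for i in range(n)) / n + z * beta[j]
            new_bj = soft_threshold(rho, lam) / z
            delta = new_bj - beta[j]
            if delta != 0.0:
                residual = [residual[i] - delta * col[i] for i in range(n)]
                beta[j] = new_bj
    return beta


def selected_variables(beta):
    """Indices of the variables whose coefficients survived the penalty."""
    return [j for j, b in enumerate(beta) if b != 0.0]


# Hypothetical toy data: three mutually orthogonal candidate variables.
# y = 2 * (variable 0) + 0.1 * (variable 2); variable 1 is irrelevant.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.1, 1.9, -2.1, -1.9]

beta = lasso_coordinate_descent(X, y, lam=0.5)
print(beta, selected_variables(beta))
```

With this toy design only variable 0 is retained, with a coefficient shrunk from 2 to about 1.5, while the weak and irrelevant variables are dropped. A larger penalty removes more variables, which is the lever a LASSO-based selection step uses when observations are scarce relative to the number of inputs and outputs.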

Suggested Citation

  • Lee, Chia-Yen & Cai, Jia-Ying, 2020. "LASSO variable selection in data envelopment analysis with small datasets," Omega, Elsevier, vol. 91(C).
  • Handle: RePEc:eee:jomega:v:91:y:2020:i:c:s0305048318305759
    DOI: 10.1016/j.omega.2018.12.008

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0305048318305759
    Download Restriction: Full text for ScienceDirect subscribers only


    References listed on IDEAS

    1. Leopold Simar & Paul Wilson, 2000. "A general methodology for bootstrapping in non-parametric frontier models," Journal of Applied Statistics, Taylor & Francis Journals, vol. 27(6), pages 779-802.
    2. Wilson, Paul W., 2018. "Dimension reduction in nonparametric models of production," European Journal of Operational Research, Elsevier, vol. 267(1), pages 349-367.
    3. Afriat, Sidney N, 1972. "Efficiency Estimation of Production Function," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 13(3), pages 568-598, October.
    4. Cinzia Daraio & Léopold Simar, 2005. "Introducing Environmental Variables in Nonparametric Frontier Models: a Probabilistic Approach," Journal of Productivity Analysis, Springer, vol. 24(1), pages 93-121, September.
    5. Boussofiane, A. & Dyson, R. G. & Thanassoulis, E., 1991. "Applied data envelopment analysis," European Journal of Operational Research, Elsevier, vol. 52(1), pages 1-15, May.
    6. Adler, Nicole & Yazhemsky, Ekaterina, 2010. "Improving discrimination in data envelopment analysis: PCA-DEA or variable reduction," European Journal of Operational Research, Elsevier, vol. 202(1), pages 273-284, April.
    7. John Ruggiero, 2005. "Impact Assessment of Input Omission on DEA," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 4(03), pages 359-368.
    8. Jesús T. Pastor & José L. Ruiz & Inmaculada Sirvent, 2002. "A Statistical Test for Nested Radial DEA Models," Operations Research, INFORMS, vol. 50(4), pages 728-735, August.
    9. Richard Bellman, 1957. "On a Dynamic Programming Approach to the Caterer Problem--I," Management Science, INFORMS, vol. 3(3), pages 270-278, April.
    10. Nataraja, Niranjan R. & Johnson, Andrew L., 2011. "Guidelines for using variable selection techniques in data envelopment analysis," European Journal of Operational Research, Elsevier, vol. 215(3), pages 662-669, December.
    11. N Adler & B Golany, 2002. "Including principal component weights to improve discrimination in data envelopment analysis," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 53(9), pages 985-991, September.
    12. Charnes, A. & Cooper, W. W. & Golany, B. & Seiford, L. & Stutz, J., 1985. "Foundations of data envelopment analysis for Pareto-Koopmans efficient empirical production functions," Journal of Econometrics, Elsevier, vol. 30(1-2), pages 91-107.
    13. Liu, John S. & Lu, Louis Y.Y. & Lu, Wen-Min & Lin, Bruce J.Y., 2013. "Data envelopment analysis 1978–2010: A citation-based literature survey," Omega, Elsevier, vol. 41(1), pages 3-15.
    14. Dyson, R. G. & Allen, R. & Camanho, A. S. & Podinovski, V. V. & Sarrico, C. S. & Shale, E. A., 2001. "Pitfalls and protocols in DEA," European Journal of Operational Research, Elsevier, vol. 132(2), pages 245-259, July.
    15. William W. Cooper & Lawrence M. Seiford & Kaoru Tone, 2006. "Introduction to Data Envelopment Analysis and Its Uses," Springer Books, Springer, number 978-0-387-29122-2, September.
    16. Zou, Hui, 2006. "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1418-1429, December.
    17. Golany, B & Roll, Y, 1989. "An application procedure for DEA," Omega, Elsevier, vol. 17(3), pages 237-250.
    18. Lee, Chia-Yen & Johnson, Andrew L. & Moreno-Centeno, Erick & Kuosmanen, Timo, 2013. "A more efficient algorithm for Convex Nonparametric Least Squares," European Journal of Operational Research, Elsevier, vol. 227(2), pages 391-400.
    19. Liu, John S. & Lu, Louis Y.Y. & Lu, Wen-Min, 2016. "Research fronts in data envelopment analysis," Omega, Elsevier, vol. 58(C), pages 33-45.
    20. Timo Kuosmanen, 2008. "Representation theorem for convex nonparametric least squares," Econometrics Journal, Royal Economic Society, vol. 11(2), pages 308-325, July.
    21. Kuosmanen, Timo & Johnson, Andrew, 2017. "Modeling joint production of multiple outputs in StoNED: Directional distance function approach," European Journal of Operational Research, Elsevier, vol. 262(2), pages 792-801.
    22. Timo Kuosmanen & Andrew L. Johnson, 2010. "Data Envelopment Analysis as Nonparametric Least-Squares Regression," Operations Research, INFORMS, vol. 58(1), pages 149-160, February.
    23. Aigner, Dennis & Lovell, C. A. Knox & Schmidt, Peter, 1977. "Formulation and estimation of stochastic frontier production function models," Journal of Econometrics, Elsevier, vol. 6(1), pages 21-37, July.
    24. Ming Yuan & Yi Lin, 2006. "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 68(1), pages 49-67, February.
    25. Timo Kuosmanen & Mika Kortelainen, 2012. "Stochastic non-smooth envelopment of data: semi-parametric frontier estimation subject to shape constraints," Journal of Productivity Analysis, Springer, vol. 38(1), pages 11-28, August.

    Citations


    Cited by:

    1. Esteve, Miriam & Aparicio, Juan & Rodriguez-Sala, Jesus J. & Zhu, Joe, 2023. "Random Forests and the measurement of super-efficiency in the context of Free Disposal Hull," European Journal of Operational Research, Elsevier, vol. 304(2), pages 729-744.
    2. He Jiang, 2023. "Robust forecasting in spatial autoregressive model with total variation regularization," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 42(2), pages 195-211, March.
    3. Raul Moragues & Juan Aparicio & Miriam Esteve, 2023. "Ranking the Importance of Variables in a Nonparametric Frontier Analysis Using Unsupervised Machine Learning Techniques," Mathematics, MDPI, vol. 11(11), pages 1-24, June.
    4. Chen, Ya & Tsionas, Mike G. & Zelenyuk, Valentin, 2021. "LASSO+DEA for small and big wide data," Omega, Elsevier, vol. 102(C).
    5. Martina Mokrišová & Jarmila Horváthová, 2023. "Domain Knowledge Features versus LASSO Features in Predicting Risk of Corporate Bankruptcy—DEA Approach," Risks, MDPI, vol. 11(11), pages 1-18, November.
    6. Imad Bou-Hamad & Abdel Latef Anouze & Ibrahim H. Osman, 2022. "A cognitive analytics management framework to select input and output variables for data envelopment analysis modeling of performance efficiency of banks using random forest and entropy of information," Annals of Operations Research, Springer, vol. 308(1), pages 63-92, January.
    7. Dai, Sheng, 2023. "Variable selection in convex quantile regression: L1-norm or L0-norm regularization?," European Journal of Operational Research, Elsevier, vol. 305(1), pages 338-355.
    8. Olawale Ogunrinde & Ekundayo Shittu, 2023. "Benchmarking performance of photovoltaic power plants in multiple periods," Environment Systems and Decisions, Springer, vol. 43(3), pages 489-503, September.
    9. Karagiannis, Roxani & Karagiannis, Giannis, 2023. "Nonparametric estimates of price efficiency for the Greek infant milk market: Curing the curse of dimensionality with shannon entropy," Economic Modelling, Elsevier, vol. 121(C).
    10. Anna Łozowicka & Bartłomiej Lach, 2022. "CI-DEA: A Way to Improve the Discriminatory Power of DEA—Using the Example of the Efficiency Assessment of the Digitalization in the Life of the Generation 50+," Sustainability, MDPI, vol. 14(6), pages 1-22, March.
    11. Valero-Carreras, Daniel & Aparicio, Juan & Guerrero, Nadia M., 2021. "Support vector frontiers: A new approach for estimating production functions through support vector machines," Omega, Elsevier, vol. 104(C).
    12. Alda A. Henriques & Milton Fontes & Ana S. Camanho & Giovanna D’Inverno & Pedro Amorim & Jaime Gabriel Silva, 2022. "Performance evaluation of problematic samples: a robust nonparametric approach for wastewater treatment plants," Annals of Operations Research, Springer, vol. 315(1), pages 193-220, August.
    13. Ya Chen & Mike Tsionas & Valentin Zelenyuk, 2020. "LASSO DEA for small and big data," CEPA Working Papers Series WP092020, School of Economics, University of Queensland, Australia.
    14. Duras, Toni & Javed, Farrukh & Månsson, Kristofer & Sjölander, Pär & Söderberg, Magnus, 2023. "Using machine learning to select variables in data envelopment analysis: Simulations and application using electricity distribution data," Energy Economics, Elsevier, vol. 120(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Esteve, Miriam & Aparicio, Juan & Rodriguez-Sala, Jesus J. & Zhu, Joe, 2023. "Random Forests and the measurement of super-efficiency in the context of Free Disposal Hull," European Journal of Operational Research, Elsevier, vol. 304(2), pages 729-744.
    2. Dai, Sheng, 2023. "Variable selection in convex quantile regression: L1-norm or L0-norm regularization?," European Journal of Operational Research, Elsevier, vol. 305(1), pages 338-355.
    3. Chen, Ya & Tsionas, Mike G. & Zelenyuk, Valentin, 2021. "LASSO+DEA for small and big wide data," Omega, Elsevier, vol. 102(C).
    4. Ya Chen & Mike Tsionas & Valentin Zelenyuk, 2020. "LASSO DEA for small and big data," CEPA Working Papers Series WP092020, School of Economics, University of Queensland, Australia.
    5. Valentin Zelenyuk, 2019. "Data Envelopment Analysis and Business Analytics: The Big Data Challenges and Some Solutions," CEPA Working Papers Series WP072019, School of Economics, University of Queensland, Australia.
    6. Toloo, Mehdi & Tone, Kaoru & Izadikhah, Mohammad, 2023. "Selecting slacks-based data envelopment analysis models," European Journal of Operational Research, Elsevier, vol. 308(3), pages 1302-1318.
    7. Peyrache, Antonio & Rose, Christiern & Sicilia, Gabriela, 2020. "Variable selection in Data Envelopment Analysis," European Journal of Operational Research, Elsevier, vol. 282(2), pages 644-659.
    8. Nataraja, Niranjan R. & Johnson, Andrew L., 2011. "Guidelines for using variable selection techniques in data envelopment analysis," European Journal of Operational Research, Elsevier, vol. 215(3), pages 662-669, December.
    9. Raul Moragues & Juan Aparicio & Miriam Esteve, 2023. "Ranking the Importance of Variables in a Nonparametric Frontier Analysis Using Unsupervised Machine Learning Techniques," Mathematics, MDPI, vol. 11(11), pages 1-24, June.
    10. Villanueva-Cantillo, Jeyms & Munoz-Marquez, Manuel, 2021. "Methodology for calculating critical values of relevance measures in variable selection methods in data envelopment analysis," European Journal of Operational Research, Elsevier, vol. 290(2), pages 657-670.
    11. Léopold Simar & Paul W. Wilson, 2015. "Statistical Approaches for Non-parametric Frontier Models: A Guided Tour," International Statistical Review, International Statistical Institute, vol. 83(1), pages 77-110, April.
    12. Lee, Chia-Yen & Wang, Ke, 2019. "Nash marginal abatement cost estimation of air pollutant emissions using the stochastic semi-nonparametric frontier," European Journal of Operational Research, Elsevier, vol. 273(1), pages 390-400.
    13. Wei, Xiao & Zhang, Ning, 2020. "The shadow prices of CO2 and SO2 for Chinese Coal-fired Power Plants: A partial frontier approach," Energy Economics, Elsevier, vol. 85(C).
    14. Delnava, Haleh & Khosravi, Ali & El Haj Assad, Mamdouh, 2023. "Metafrontier frameworks for estimating solar power efficiency in the United States using stochastic nonparametric envelopment of data (StoNED)," Renewable Energy, Elsevier, vol. 213(C), pages 195-204.
    15. Mike Tsionas & Valentin Zelenyuk, 2021. "Goodness-of-fit in Optimizing Models of Production: A Generalization with a Bayesian Perspective," CEPA Working Papers Series WP182021, School of Economics, University of Queensland, Australia.
    16. Eskelinen, Juha & Kuosmanen, Timo, 2013. "Intertemporal efficiency analysis of sales teams of a bank: Stochastic semi-nonparametric approach," Journal of Banking & Finance, Elsevier, vol. 37(12), pages 5163-5175.
    17. Mekaroonreung, Maethee & Johnson, Andrew L., 2014. "A nonparametric method to estimate a technical change effect on marginal abatement costs of U.S. coal power plants," Energy Economics, Elsevier, vol. 46(C), pages 45-55.
    18. Quaranta, Anna Grazia & Raffoni, Anna & Visani, Franco, 2018. "A multidimensional approach to measuring bank branch efficiency," European Journal of Operational Research, Elsevier, vol. 266(2), pages 746-760.
    19. Zelenyuk, Valentin, 2020. "Aggregation of inputs and outputs prior to Data Envelopment Analysis under big data," European Journal of Operational Research, Elsevier, vol. 282(1), pages 172-187.
    20. Kuosmanen, Timo & Johnson, Andrew, 2017. "Modeling joint production of multiple outputs in StoNED: Directional distance function approach," European Journal of Operational Research, Elsevier, vol. 262(2), pages 792-801.
