
Ranking the Importance of Variables in a Nonparametric Frontier Analysis Using Unsupervised Machine Learning Techniques

Authors

  • Raul Moragues

    (Center of Operations Research (CIO), Miguel Hernandez University of Elche (UMH), 03202 Elche, Spain
    Ph.D. Program in Economics (DEcIDE), Miguel Hernandez University of Elche (UMH), 03202 Elche, Spain)

  • Juan Aparicio

    (Center of Operations Research (CIO), Miguel Hernandez University of Elche (UMH), 03202 Elche, Spain
    Joint Research Unit, Valencian Graduate School and Research Network of Artificial Intelligence (valgrAI), 46022 Valencia, Spain)

  • Miriam Esteve

    (Center of Operations Research (CIO), Miguel Hernandez University of Elche (UMH), 03202 Elche, Spain)

Abstract

In this paper, we propose and compare new methodologies for ranking the importance of variables in productive processes via an adaptation of OneClass Support Vector Machines. In particular, we adapt two methodologies inspired by the machine learning literature: one based on randomly shuffling the values of a variable, and another based on the objective value of the dual formulation of the model. Additionally, we motivate the use of this type of algorithm in the production context and compare the performance of both methodologies via a computational experiment. The methodology based on shuffling the values of a variable outperforms the one based on the dual formulation: it correctly ranks the variables in 94% of the scenarios with one relevant input and one irrelevant input, and it correctly ranks each variable in at least 65% of the replications of a scenario with three relevant inputs and one irrelevant input.
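The shuffling-based idea resembles permutation feature importance from machine learning: permute one variable's values across observations, re-evaluate a fitted model, and measure how much its output changes. The Python sketch below illustrates only that generic idea under stated assumptions; it is not the authors' procedure. The fitted OneClass SVM model is replaced by a hypothetical `efficiency_scores` function built on an assumed known frontier g(x) = sqrt(x0), and the importance measure (mean absolute change in per-unit scores), the helper names `shuffle_column`, `shuffle_importance` and `rank_variables`, and the simulated data are all illustrative assumptions.

```python
import math
import random
from typing import Callable, List

Matrix = List[List[float]]


def shuffle_column(X: Matrix, k: int, rng: random.Random) -> Matrix:
    """Return a copy of X whose k-th column is randomly permuted across rows."""
    col = [row[k] for row in X]
    shuffled = col[:]
    # Keep shuffling until the order actually changes (only possible when the
    # column holds at least two distinct values).
    if len(set(col)) > 1:
        while shuffled == col:
            rng.shuffle(shuffled)
    X_new = [row[:] for row in X]
    for i, value in enumerate(shuffled):
        X_new[i][k] = value
    return X_new


def shuffle_importance(
    score_fn: Callable[[Matrix], List[float]],
    X: Matrix,
    n_repeats: int = 20,
    seed: int = 0,
) -> List[float]:
    """Mean absolute change in per-unit scores after shuffling each variable."""
    rng = random.Random(seed)
    base = score_fn(X)
    importances = []
    for k in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            perm = score_fn(shuffle_column(X, k, rng))
            total += sum(abs(a - b) for a, b in zip(base, perm)) / len(base)
        importances.append(total / n_repeats)
    return importances


def rank_variables(importances: List[float]) -> List[int]:
    """Variable indices ordered from most to least important."""
    return sorted(range(len(importances)), key=lambda k: importances[k], reverse=True)


# Hypothetical simulated data: input 0 drives output, input 1 is irrelevant.
data_rng = random.Random(42)
X = [[data_rng.uniform(1, 10), data_rng.uniform(1, 10)] for _ in range(30)]
# Output lies on or below the assumed frontier g(x) = sqrt(x0).
y = [math.sqrt(row[0]) * data_rng.uniform(0.5, 1.0) for row in X]


def efficiency_scores(X_eval: Matrix) -> List[float]:
    # Stand-in for a fitted frontier model (NOT the paper's OneClass SVM):
    # observed output divided by the assumed frontier value at the unit's inputs.
    return [y_i / math.sqrt(row[0]) for y_i, row in zip(y, X_eval)]


imp = shuffle_importance(efficiency_scores, X)
print(imp, rank_variables(imp))
```

Because the stand-in score never reads input 1, shuffling it leaves every score unchanged and its importance is exactly zero, so input 0 is ranked first. With an actual estimated frontier model, an irrelevant input would generally receive a small but nonzero importance.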

Suggested Citation

  • Raul Moragues & Juan Aparicio & Miriam Esteve, 2023. "Ranking the Importance of Variables in a Nonparametric Frontier Analysis Using Unsupervised Machine Learning Techniques," Mathematics, MDPI, vol. 11(11), pages 1-24, June.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:11:p:2590-:d:1164478

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/11/2590/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/11/2590/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Esteve, Miriam & Aparicio, Juan & Rodriguez-Sala, Jesus J. & Zhu, Joe, 2023. "Random Forests and the measurement of super-efficiency in the context of Free Disposal Hull," European Journal of Operational Research, Elsevier, vol. 304(2), pages 729-744.
    2. Imad Bou-Hamad & Abdel Latef Anouze & Ibrahim H. Osman, 2022. "A cognitive analytics management framework to select input and output variables for data envelopment analysis modeling of performance efficiency of banks using random forest and entropy of information," Annals of Operations Research, Springer, vol. 308(1), pages 63-92, January.
    3. Raul Moragues & Juan Aparicio & Miriam Esteve, 2023. "Measuring technical efficiency for multi-input multi-output production processes through OneClass Support Vector Machines: a finite-sample study," Operational Research, Springer, vol. 23(3), pages 1-33, September.
    4. España, Victor J. & Aparicio, Juan & Barber, Xavier & Esteve, Miriam, 2024. "Estimating production functions through additive models based on regression splines," European Journal of Operational Research, Elsevier, vol. 312(2), pages 684-699.
    5. Villanueva-Cantillo, Jeyms & Munoz-Marquez, Manuel, 2021. "Methodology for calculating critical values of relevance measures in variable selection methods in data envelopment analysis," European Journal of Operational Research, Elsevier, vol. 290(2), pages 657-670.
    6. Toloo, Mehdi & Tone, Kaoru & Izadikhah, Mohammad, 2023. "Selecting slacks-based data envelopment analysis models," European Journal of Operational Research, Elsevier, vol. 308(3), pages 1302-1318.
    7. Nataraja, Niranjan R. & Johnson, Andrew L., 2011. "Guidelines for using variable selection techniques in data envelopment analysis," European Journal of Operational Research, Elsevier, vol. 215(3), pages 662-669, December.
    8. Peyrache, Antonio & Rose, Christiern & Sicilia, Gabriela, 2020. "Variable selection in Data Envelopment Analysis," European Journal of Operational Research, Elsevier, vol. 282(2), pages 644-659.
    9. Nadia M. Guerrero & Juan Aparicio & Daniel Valero-Carreras, 2022. "Combining Data Envelopment Analysis and Machine Learning," Mathematics, MDPI, vol. 10(6), pages 1-22, March.
    10. Jamal Ouenniche & Skarleth Carrales, 2018. "Assessing efficiency profiles of UK commercial banks: a DEA analysis with regression-based feedback," Annals of Operations Research, Springer, vol. 266(1), pages 551-587, July.
    11. Anna Łozowicka & Bartłomiej Lach, 2022. "CI-DEA: A Way to Improve the Discriminatory Power of DEA—Using the Example of the Efficiency Assessment of the Digitalization in the Life of the Generation 50+," Sustainability, MDPI, vol. 14(6), pages 1-22, March.
    12. Karagiannis, Roxani & Karagiannis, Giannis, 2023. "Nonparametric estimates of price efficiency for the Greek infant milk market: Curing the curse of dimensionality with shannon entropy," Economic Modelling, Elsevier, vol. 121(C).
    13. Davtalab-Olyaie, Mostafa & Asgharian, Masoud & Nia, Vahid Partovi, 2019. "Stochastic ranking and dominance in DEA," International Journal of Production Economics, Elsevier, vol. 214(C), pages 125-138.
    14. Léopold Simar & Paul W. Wilson, 2015. "Statistical Approaches for Non-parametric Frontier Models: A Guided Tour," International Statistical Review, International Statistical Institute, vol. 83(1), pages 77-110, April.
    15. Kao, Chiang & Liu, Shiang-Tai, 2009. "Stochastic data envelopment analysis in measuring the efficiency of Taiwan commercial banks," European Journal of Operational Research, Elsevier, vol. 196(1), pages 312-322, July.
    16. Adler, Nicole & Yazhemsky, Ekaterina, 2010. "Improving discrimination in data envelopment analysis: PCA-DEA or variable reduction," European Journal of Operational Research, Elsevier, vol. 202(1), pages 273-284, April.
    17. Luis R. Murillo‐Zamorano, 2004. "Economic Efficiency and Frontier Techniques," Journal of Economic Surveys, Wiley Blackwell, vol. 18(1), pages 33-77, February.
    18. Qiwei Xie & Yuanyuan Li & Lizheng Wang & Chao Liu, 2018. "Improving discrimination in data envelopment analysis without losing information based on Renyi’s entropy," Central European Journal of Operations Research, Springer;Slovak Society for Operations Research;Hungarian Operational Research Society;Czech Society for Operations Research;Österr. Gesellschaft für Operations Research (ÖGOR);Slovenian Society Informatika - Section for Operational Research;Croatian Operational Research Society, vol. 26(4), pages 1053-1068, December.
    19. Manuel Salas-Velasco, 2020. "Measuring and explaining the production efficiency of Spanish universities using a non-parametric approach and a bootstrapped-truncated regression," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(2), pages 825-846, February.
    20. Dai, Sheng, 2023. "Variable selection in convex quantile regression: L1-norm or L0-norm regularization?," European Journal of Operational Research, Elsevier, vol. 305(1), pages 338-355.