IDEAS home Printed from https://ideas.repec.org/a/eee/csdana/v96y2016icp57-73.html

Random forest for ordinal responses: Prediction and variable selection

Author

Listed:
  • Janitza, Silke
  • Tutz, Gerhard
  • Boulesteix, Anne-Laure

Abstract

The random forest method is a commonly used tool for classification with high-dimensional data that is able to rank candidate predictors through its inbuilt variable importance measures. It can be applied to various kinds of regression problems including nominal, metric and survival response variables. While classification and regression problems using random forest methodology have been extensively investigated in the past, there is no standard procedure for an ordinal response. Extensive studies using random forests based on conditional inference trees are conducted to explore whether incorporating the ordering information yields any improvement in either prediction performance or variable selection. Two novel permutation variable importance measures are presented that are reasonable alternatives to the currently implemented importance measure, which was developed for nominal responses and makes no use of the ordering in the levels of an ordinal response variable. Results based on simulated and real data suggest that predictor rankings can be improved in some settings by using the new permutation importance measures, which explicitly use the ordering in the response levels, in combination with ordinal regression trees. With respect to prediction accuracy, the performance of ordinal regression trees was similar to, and in most settings even slightly better than, that of classification trees.
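
The general idea of a permutation variable importance measure that uses the ordering of the response levels can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a fixed toy predictor (`toy_predict`, a hypothetical stand-in for a fitted forest), evaluates on the full toy data rather than on out-of-bag observations, and uses the mean absolute difference between integer level codes as one possible ordinal-aware error. All function names and the toy data are illustrative assumptions.

```python
import random

def error_rate(y_true, y_pred):
    """Nominal error: share of misclassified observations (ignores ordering)."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_abs_error(y_true, y_pred):
    """Ordinal-aware error: mean distance between integer level codes."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, col, error_fn, rng):
    """Increase in error after randomly permuting column `col` of X."""
    baseline = error_fn(y, [predict(row) for row in X])
    shuffled = [row[col] for row in X]
    rng.shuffle(shuffled)
    # Rebuild each row with the permuted value in position `col`.
    X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
    permuted = error_fn(y, [predict(row) for row in X_perm])
    return permuted - baseline

def toy_predict(row):
    """Hypothetical stand-in for a fitted forest: uses only column 0."""
    return min(3, max(1, round(row[0])))

rng = random.Random(0)
# Toy data: column 0 drives the ordinal response (levels 1..3), column 1 is noise.
X = [[rng.uniform(0.5, 3.5), rng.uniform(0.0, 1.0)] for _ in range(200)]
y = [toy_predict(row) for row in X]

for name, fn in [("error rate", error_rate), ("mean absolute error", mean_abs_error)]:
    imp_signal = permutation_importance(toy_predict, X, y, 0, fn, random.Random(1))
    imp_noise = permutation_importance(toy_predict, X, y, 1, fn, random.Random(1))
    print(f"{name}: signal={imp_signal:.3f}, noise={imp_noise:.3f}")
```

The error rate counts every misclassification equally, whereas the ordinal-aware error penalizes a prediction more the further it lies from the true level, so the two measures can rank predictors differently.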

Suggested Citation

  • Janitza, Silke & Tutz, Gerhard & Boulesteix, Anne-Laure, 2016. "Random forest for ordinal responses: Prediction and variable selection," Computational Statistics & Data Analysis, Elsevier, vol. 96(C), pages 57-73.
  • Handle: RePEc:eee:csdana:v:96:y:2016:i:c:p:57-73
    DOI: 10.1016/j.csda.2015.10.005

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947315002601
    Download Restriction: Full text for ScienceDirect subscribers only.



    Citations

    Cited by:

    1. Weidong Guo & Zach Zhizhong Zhou, 2022. "A comparative study of combining tree‐based feature selection methods and classifiers in personal loan default prediction," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 41(6), pages 1248-1313, September.
    2. Wang, Yong & Ma, Yinjie & Xie, Deyi & Yu, Zhenhuan & E, Jiaqiang, 2021. "Numerical study on the influence of gasoline properties and thermodynamic conditions on premixed laminar flame velocity at multiple conditions," Energy, Elsevier, vol. 233(C).
    3. Roman Hornung, 2020. "Ordinal Forests," Journal of Classification, Springer;The Classification Society, vol. 37(1), pages 4-17, April.
    4. Gerhard Tutz, 2022. "Ordinal Trees and Random Forests: Score-Free Recursive Partitioning and Improved Ensembles," Journal of Classification, Springer;The Classification Society, vol. 39(2), pages 241-263, July.
    5. Gairaa, Kacem & Voyant, Cyril & Notton, Gilles & Benkaciali, Saïd & Guermoui, Mawloud, 2022. "Contribution of ordinal variables to short-term global solar irradiation forecasting for sites with low variabilities," Renewable Energy, Elsevier, vol. 183(C), pages 890-902.
    6. Aleix Alcacer & Irene Epifanio & Jorge Valero & Alfredo Ballester, 2021. "Combining Classification and User-Based Collaborative Filtering for Matching Footwear Size," Mathematics, MDPI, vol. 9(7), pages 1-15, April.
    7. Marcella Corduas & Alfonso Piscitelli, 2017. "Modeling university student satisfaction: the case of the humanities and social studies degree programs," Quality & Quantity: International Journal of Methodology, Springer, vol. 51(2), pages 617-628, March.
    8. Ha, Tran Vinh & Asada, Takumi & Arimura, Mikiharu, 2019. "Determination of the influence factors on household vehicle ownership patterns in Phnom Penh using statistical and machine learning methods," Journal of Transport Geography, Elsevier, vol. 78(C), pages 70-86.
    9. Guoqiang Chen & Tianyu Long & Jiangong Xiong & Yun Bai, 2017. "Multiple Random Forests Modelling for Urban Water Consumption Forecasting," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 31(15), pages 4715-4729, December.
    10. Gabriel Okasa, 2022. "Meta-Learners for Estimation of Causal Effects: Finite Sample Cross-Fit Performance," Papers 2201.12692, arXiv.org.
    11. Maljkovic, Danica & Basic, Bojana Dalbelo, 2020. "Determination of influential parameters for heat consumption in district heating systems using machine learning," Energy, Elsevier, vol. 201(C).
    12. Yifei Jiang & Honglei Zhang & Xianting Cao & Ge Wei & Yang Yang, 2023. "How to better incorporate geographic variation in Airbnb price modeling?," Tourism Economics, vol. 29(5), pages 1181-1203, August.
    13. Odey Alshboul & Ali Shehadeh & Ghassan Almasabha & Ali Saeed Almuflih, 2022. "Extreme Gradient Boosting-Based Machine Learning Approach for Green Building Cost Prediction," Sustainability, MDPI, vol. 14(11), pages 1-20, May.
    14. Lechner, Michael & Okasa, Gabriel, 2019. "Random Forest Estimation of the Ordered Choice Model," Economics Working Paper Series 1908, University of St. Gallen, School of Economics and Political Science.
    15. Mohammad Mehedy Hassan & Jane Southworth, 2017. "Analyzing Land Cover Change and Urban Growth Trajectories of the Mega-Urban Region of Dhaka Using Remotely Sensed Data and an Ensemble Classifier," Sustainability, MDPI, vol. 10(1), pages 1-24, December.
    16. Riccardo Di Francesco, 2023. "Ordered Correlation Forest," Papers 2309.08755, arXiv.org.
    17. Yaser Abdollahfard & Mehdi Sedighi & Mostafa Ghasemi, 2023. "A New Approach for Improving Microbial Fuel Cell Performance Using Artificial Intelligence," Sustainability, MDPI, vol. 15(2), pages 1-14, January.
    18. Silke Janitza & Ender Celik & Anne-Laure Boulesteix, 2018. "A computationally fast variable importance test for random forests for high-dimensional data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(4), pages 885-915, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tomáš Želinský, 2015. "Nekonzistentnosť časových preferencií ľudí z marginalizovaných rómskych komunít [On inconsistency of time preferences of people from the marginalised Roma communities]," Politická ekonomie, Prague University of Economics and Business, vol. 2015(2), pages 204-222.
    2. repec:jss:jstsof:36:i02 is not listed on IDEAS
    3. Georgina Milne & Andrew William Byrne & Emma Campbell & Jordon Graham & John McGrath & Raymond Kirke & Wilma McMaster & Jesko Zimmermann & Adewale Henry Adenuga, 2022. "Quantifying Land Fragmentation in Northern Irish Cattle Enterprises," Land, MDPI, vol. 11(3), pages 1-16, March.
    4. Payton J. Jones & Patrick Mair & Thorsten Simon & Achim Zeileis, 2020. "Network Trees: A Method for Recursively Partitioning Covariance Structures," Psychometrika, Springer;The Psychometric Society, vol. 85(4), pages 926-945, December.
    5. Seibold Heidi & Hothorn Torsten & Zeileis Achim, 2016. "Model-Based Recursive Partitioning for Subgroup Analyses," The International Journal of Biostatistics, De Gruyter, vol. 12(1), pages 45-63, May.
    6. M. Perakis & P. Maravelakis & S. Psarakis & E. Xekalaki & J. Panaretos, 2005. "On Certain Indices for Ordinal Data with Unequally Weighted Classes," Quality & Quantity: International Journal of Methodology, Springer, vol. 39(5), pages 515-536, October.
    7. McGinlay, James & Parsons, David J. & Morris, Joe & Hubatova, Marie & Graves, Anil & Bradbury, Richard B. & Bullock, James M., 2017. "Do charismatic species groups generate more cultural ecosystem service benefits?," Ecosystem Services, Elsevier, vol. 27(PA), pages 15-24.
    8. Marc Ditzhaus & Arnold Janssen, 2020. "Bootstrap and permutation rank tests for proportional hazards under right censoring," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 26(3), pages 493-517, July.
    9. Elsäßer Amelie & Victor Anja & Hommel Gerhard, 2011. "Multiple Testing in Candidate Gene Situations: A Comparison of Classical, Discrete, and Resampling-Based Procedures," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-21, November.
    10. repec:jss:jstsof:28:i08 is not listed on IDEAS
    11. Torsten Hothorn & Achim Zeileis, 2008. "Generalized Maximally Selected Statistics," Biometrics, The International Biometric Society, vol. 64(4), pages 1263-1269, December.
    12. Jorge M. Arevalillo & Hilario Navarro, 2021. "Skewness-Kurtosis Model-Based Projection Pursuit with Application to Summarizing Gene Expression Data," Mathematics, MDPI, vol. 9(9), pages 1-18, April.
    13. Henning Sommermeyer & Hanna Krauss & Zuzanna Chęcińska-Maciejewska & Marcin Pszczola & Jacek Piątek, 2020. "Infantile Colic—The Perspective of German and Polish Pediatricians in 2020," IJERPH, MDPI, vol. 17(19), pages 1-11, September.
    14. Raffaella Piccarreta, 2008. "Classification trees for ordinal variables," Computational Statistics, Springer, vol. 23(3), pages 407-427, July.
    15. Chavez, Alex K. & Bicchieri, Cristina, 2013. "Third-party sanctioning and compensation behavior: Findings from the ultimatum game," Journal of Economic Psychology, Elsevier, vol. 39(C), pages 268-277.
    16. Raphael Knevels & Alexander Brenning & Simone Gingrich & Gerhard Heiss & Theresia Lechner & Philip Leopold & Christoph Plutzar & Herwig Proske & Helene Petschko, 2021. "Towards the Use of Land Use Legacies in Landslide Modeling: Current Challenges and Future Perspectives in an Austrian Case Study," Land, MDPI, vol. 10(9), pages 1-29, September.
    17. Hothorn, Torsten & Hornik, Kurt & van de Wiel, Mark A. & Zeileis, Achim, 2008. "Implementing a Class of Permutation Tests: The coin Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 28(i08).
    18. repec:jss:jstsof:34:i01 is not listed on IDEAS
    19. Bryan Keller, 2012. "Detecting Treatment Effects with Small Samples: The Power of Some Tests Under the Randomization Model," Psychometrika, Springer;The Psychometric Society, vol. 77(2), pages 324-338, April.
    20. Zeileis, Achim & Croissant, Yves, 2010. "Extended Model Formulas in R: Multiple Parts and Multiple Responses," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 34(i01).
    21. Nordhausen, Klaus & Oja, Hannu, 2011. "Multivariate L1 Statistical Methods: The Package MNM," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 43(i05).
    22. Santiago Carbo-Valverde & Pedro Cuadros-Solas & Francisco Rodríguez-Fernández, 2020. "A machine learning approach to the digitalization of bank customers: Evidence from random and causal forests," PLOS ONE, Public Library of Science, vol. 15(10), pages 1-39, October.
    23. Ribas, Giovana Ghisleni & Zanon, Alencar Junior & Streck, Nereu Augusto & Pilecco, Isabela Bulegon & de Souza, Pablo Mazzuco & Heinemann, Alexandre Bryan & Grassini, Patricio, 2021. "Assessing yield and economic impact of introducing soybean to the lowland rice system in southern Brazil," Agricultural Systems, Elsevier, vol. 188(C).
