IDEAS home Printed from https://ideas.repec.org/p/arx/papers/1907.02436.html

Random Forest Estimation of the Ordered Choice Model

Authors

  • Michael Lechner
  • Gabriel Okasa

Abstract

In this paper we develop a new machine learning estimator for ordered choice models based on the random forest. The proposed Ordered Forest flexibly estimates the conditional choice probabilities while taking the ordering information explicitly into account. Going beyond common machine learning estimators, it enables the estimation of marginal effects as well as statistical inference, and thus provides the same output as classical econometric estimators. An extensive simulation study reveals good predictive performance, particularly in settings with non-linearities and near-multicollinearity. An empirical application contrasts the estimated marginal effects and their standard errors with those of an ordered logit model. A software implementation of the Ordered Forest is provided in both R and Python in the package orf, available on CRAN and PyPI, respectively.
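
The abstract does not spell out how the ordering information enters the estimator. One way to picture it, offered here as an assumption rather than as the authors' exact procedure, is to estimate the cumulative probabilities P(Y <= m | x) for each threshold m from binary indicators and then difference them into class probabilities. The minimal sketch below uses only the Python standard library; it is not the orf package API, all function names are hypothetical, and a nearest-neighbour average stands in for the regression forest.

```python
from typing import List, Sequence


def cumulative_indicators(y: Sequence[int], n_classes: int) -> List[List[int]]:
    """For each threshold m = 1..M-1, build the binary target 1(y_i <= m)."""
    return [[1 if yi <= m else 0 for yi in y] for m in range(1, n_classes)]


def class_probabilities(cum_preds: Sequence[float]) -> List[float]:
    """Difference M-1 cumulative estimates into M class probabilities.

    Negative differences (possible when separately estimated cumulative
    probabilities are not monotone) are set to zero, then the result is
    rescaled to sum to one.
    """
    bounds = [0.0, *cum_preds, 1.0]
    diffs = [max(hi - lo, 0.0) for lo, hi in zip(bounds, bounds[1:])]
    total = sum(diffs)  # at least 1, because the raw differences sum to 1
    return [d / total for d in diffs]


def knn_mean(x_train: Sequence[float], t_train: Sequence[int],
             x_new: float, k: int = 3) -> float:
    """Stand-in learner (NOT a random forest): mean target of the k nearest points."""
    nearest = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x_new))[:k]
    return sum(t_train[i] for i in nearest) / k


def ordered_probabilities(x_train: Sequence[float], y_train: Sequence[int],
                          x_new: float, n_classes: int, k: int = 3) -> List[float]:
    """Estimate P(Y = m | x_new) for ordered classes 1..n_classes."""
    targets = cumulative_indicators(y_train, n_classes)
    cum = [knn_mean(x_train, t, x_new, k) for t in targets]
    return class_probabilities(cum)


# Toy data: one covariate, three ordered outcome classes.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 1, 1, 2, 2, 2, 3, 3, 3]
print(ordered_probabilities(x, y, x_new=5, n_classes=3))  # [0.0, 1.0, 0.0]
```

In practice the stand-in `knn_mean` would be replaced by a regression forest fitted to each binary indicator; the differencing and rescaling step stays the same whatever learner produces the cumulative estimates.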

Suggested Citation

  • Michael Lechner & Gabriel Okasa, 2019. "Random Forest Estimation of the Ordered Choice Model," Papers 1907.02436, arXiv.org, revised Sep 2022.
  • Handle: RePEc:arx:papers:1907.02436

    Download full text from publisher

    File URL: http://arxiv.org/pdf/1907.02436
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Matzkin, Rosa L, 1992. "Nonparametric and Distribution-Free Estimation of the Binary Threshold Crossing and the Binary Choice Models," Econometrica, Econometric Society, vol. 60(2), pages 239-270, March.
    2. J. S. Butler & T. Aldrich Finegan & John J. Siegfried, 1998. "Does more calculus improve student learning in intermediate micro- and macroeconomic theory?," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 13(2), pages 185-202.
    3. Michael Lechner, 2002. "Program Heterogeneity And Propensity Score Matching: An Application To The Evaluation Of Active Labor Market Policies," The Review of Economics and Statistics, MIT Press, vol. 84(2), pages 205-220, May.
    4. Stefan Wager & Susan Athey, 2018. "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(523), pages 1228-1242, July.
    5. Powell, James L. & Stoker, Thomas M., 1996. "Optimal bandwidth choice for density-weighted averages," Journal of Econometrics, Elsevier, vol. 75(2), pages 291-316, December.
    6. Seunghoon Kim & Youngbin Lym & Ki-Jung Kim, 2021. "Developing Crash Severity Model Handling Class Imbalance and Implementing Ordered Nature: Focusing on Elderly Drivers," IJERPH, MDPI, vol. 18(4), pages 1-23, February.
    7. Wright, Marvin N. & Ziegler, Andreas, 2017. "ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 77(i01).
    8. Greene, William H. & Hensher, David A., 2010. "Modeling Ordered Choices," Cambridge Books, Cambridge University Press, number 9780521194204, July.
    9. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
    10. Lechner, Michael, 2018. "Modified Causal Forests for Estimating Heterogeneous Causal Effects," IZA Discussion Papers 12040, Institute of Labor Economics (IZA).
    11. Susan Athey & Julie Tibshirani & Stefan Wager, 2016. "Generalized Random Forests," Papers 1610.01271, arXiv.org, revised Apr 2018.
    12. Stefan Boes & Rainer Winkelmann, 2006. "Ordered Response Models," Springer Books, in: Olaf Hübler & Joachim Frohn (ed.), Modern Econometric Analysis, chapter 12, pages 167-181, Springer.
    13. repec:pri:cheawb:case_paxson_economic_status_paper is not listed on IDEAS
    14. Jeremy T. Fox, 2007. "Semiparametric estimation of multinomial discrete-choice models using a subset of choices," RAND Journal of Economics, RAND Corporation, vol. 38(4), pages 1002-1019, December.
    15. Stewart, Mark B., 2005. "A comparison of semiparametric estimators for the ordered response model," Computational Statistics & Data Analysis, Elsevier, vol. 49(2), pages 555-573, April.
    16. Raffaella Piccarreta, 2008. "Classification trees for ordinal variables," Computational Statistics, Springer, vol. 23(3), pages 407-427, July.
    17. Anne Case & Darren Lubotsky & Christina Paxson, 2002. "Economic Status and Health in Childhood: The Origins of the Gradient," American Economic Review, American Economic Association, vol. 92(5), pages 1308-1334, December.
    18. Lee, Lung-fei, 1995. "Semiparametric maximum likelihood estimation of polychotomous and sequential choice models," Journal of Econometrics, Elsevier, vol. 65(2), pages 381-428, February.
    19. Jeffrey M Wooldridge, 2010. "Econometric Analysis of Cross Section and Panel Data," MIT Press Books, The MIT Press, edition 2, volume 1, number 0262232588, December.
    20. Daniel Goller & Michael C. Knaus & Michael Lechner & Gabriel Okasa, 2021. "Predicting match outcomes in football by an Ordered Forest estimator," Chapters, in: Ruud H. Koning & Stefan Kesenne (ed.), A Modern Guide to Sports Economics, chapter 22, pages 335-355, Edward Elgar Publishing.
    21. Gneiting, Tilmann & Raftery, Adrian E., 2007. "Strictly Proper Scoring Rules, Prediction, and Estimation," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 359-378, March.
    22. Antonio Afonso & Pedro Gomes & Philipp Rother, 2009. "Ordered response models for sovereign debt ratings," Applied Economics Letters, Taylor & Francis Journals, vol. 16(8), pages 769-773.
    23. repec:pri:cheawb:case_paxson_economic_status_paper.pdf is not listed on IDEAS
    24. Lewbel, Arthur, 2000. "Semiparametric qualitative response model estimation with unknown heteroscedasticity or instrumental variables," Journal of Econometrics, Elsevier, vol. 97(1), pages 145-177, July.
    25. Boes, Stefan & Staub, Kevin & Winkelmann, Rainer, 2010. "Relative status and satisfaction," Economics Letters, Elsevier, vol. 109(3), pages 168-170, December.
    26. Gérard Biau & Erwan Scornet, 2016. "Rejoinder on: A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 264-268, June.
    27. Racine, Jeffrey S., 2008. "Nonparametric Econometrics: A Primer," Foundations and Trends(R) in Econometrics, now publishers, vol. 3(1), pages 1-88, March.
    28. Klein, Roger W & Spady, Richard H, 1993. "An Efficient Semiparametric Estimator for Binary Response Models," Econometrica, Econometric Society, vol. 61(2), pages 387-421, March.
    29. Gérard Biau & Erwan Scornet, 2016. "A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 197-227, June.
    30. Lin, Zhongjian & Li, Qi & Sun, Yiguo, 2014. "A consistent nonparametric test of parametric regression functional form in fixed effects panel data models," Journal of Econometrics, Elsevier, vol. 178(P1), pages 167-179.
    31. Stoker, Thomas M., 1996. "Smoothing bias in the measurement of marginal effects," Journal of Econometrics, Elsevier, vol. 72(1-2), pages 49-84.
    32. Roger W. Klein & Robert P. Sherman, 2002. "Shift Restrictions and Semiparametric Estimation in Ordered Response Models," Econometrica, Econometric Society, vol. 70(2), pages 663-691, March.
    33. Janitza, Silke & Tutz, Gerhard & Boulesteix, Anne-Laure, 2016. "Random forest for ordinal responses: Prediction and variable selection," Computational Statistics & Data Analysis, Elsevier, vol. 96(C), pages 57-73.
    34. Alberto Abadie & Guido W. Imbens, 2006. "Large Sample Properties of Matching Estimators for Average Treatment Effects," Econometrica, Econometric Society, vol. 74(1), pages 235-267, January.
    35. Constantinou Anthony Costa & Fenton Norman Elliott, 2012. "Solving the Problem of Inadequate Scoring Rules for Assessing Probabilistic Football Forecast Models," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 8(1), pages 1-14, March.
    36. Gerhard Tutz, 2022. "Ordinal Trees and Random Forests: Score-Free Recursive Partitioning and Improved Ensembles," Journal of Classification, Springer;The Classification Society, vol. 39(2), pages 241-263, July.
    37. Murasko, Jason E., 2008. "An evaluation of the age-profile in the relationship between household income and the health of children in the United States," Journal of Health Economics, Elsevier, vol. 27(6), pages 1489-1502, December.
    38. Jeffrey Racine, 2008. "Nonparametric econometrics: a primer (in Russian)," Quantile, Quantile, issue 4, pages 7-56, March.
    39. Young S. Kwon & Ingoo Han & Kun Chang Lee, 1997. "Ordinal Pairwise Partitioning (OPP) Approach to Neural Networks Training in Bond rating," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 6(1), pages 23-40, March.

    Citations

    Cited by:

    1. Franziska Braschke & Patrick A. Puhani, 2023. "Population Adjustment to Asymmetric Labour Market Shocks in India: A Comparison to Europe and the United States at Two Different Regional Levels," The Indian Journal of Labour Economics, Springer;The Indian Society of Labour Economics (ISLE), vol. 66(1), pages 7-35, March.
    2. Qinglong Shao, 2022. "Does less working time improve life satisfaction? Evidence from European Social Survey," Health Economics Review, Springer, vol. 12(1), pages 1-18, December.
    3. Daniel Goller & Sandro Heiniger, 2024. "A general framework to quantify the event importance in multi-event contests," Annals of Operations Research, Springer, vol. 341(1), pages 71-93, October.
    4. Michael M. Lokshin & Michael Hannon & Miguel Purroy & Ivan Torre, 2024. "Do More Informed Citizens Make Better Climate Policy Decisions?," Policy Research Working Paper Series 10921, The World Bank.
    5. Wang, Shixuan & Syntetos, Aris A. & Liu, Ying & Di Cairano-Gilfedder, Carla & Naim, Mohamed M., 2023. "Improving automotive garage operations by categorical forecasts using a large number of variables," European Journal of Operational Research, Elsevier, vol. 306(2), pages 893-908.
    6. Riccardo Di Francesco, 2023. "Ordered Correlation Forest," Papers 2309.08755, arXiv.org.
    7. Paul S. Clarke & Annalivia Polselli, 2023. "Double Machine Learning for Static Panel Models with Fixed Effects," Papers 2312.08174, arXiv.org, revised Dec 2024.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gabriel Okasa, 2022. "Meta-Learners for Estimation of Causal Effects: Finite Sample Cross-Fit Performance," Papers 2201.12692, arXiv.org.
    2. Valente, Marica, 2023. "Policy evaluation of waste pricing programs using heterogeneous causal effect estimation," Journal of Environmental Economics and Management, Elsevier, vol. 117(C).
    3. Susan Athey & Julie Tibshirani & Stefan Wager, 2016. "Generalized Random Forests," Papers 1610.01271, arXiv.org, revised Apr 2018.
    4. William H. Greene & David A. Hensher, 2008. "Modeling Ordered Choices: A Primer and Recent Developments," Working Papers 08-26, New York University, Leonard N. Stern School of Business, Department of Economics.
    5. Zhexiao Lin & Fang Han, 2022. "On regression-adjusted imputation estimators of the average treatment effect," Papers 2212.05424, arXiv.org, revised Jan 2023.
    6. Qi Li & Jeffrey Scott Racine, 2006. "Nonparametric Econometrics: Theory and Practice," Economics Books, Princeton University Press, edition 1, volume 1, number 8355.
    7. Daniel Goller, 2023. "Analysing a built-in advantage in asymmetric darts contests using causal machine learning," Annals of Operations Research, Springer, vol. 325(1), pages 649-679, June.
    8. Yiyi Huo & Yingying Fan & Fang Han, 2023. "On the adaptation of causal forests to manifold data," Papers 2311.16486, arXiv.org, revised Dec 2023.
    9. Yingying Dong & Arthur Lewbel, 2015. "A Simple Estimator for Binary Choice Models with Endogenous Regressors," Econometric Reviews, Taylor & Francis Journals, vol. 34(1-2), pages 82-105, February.
    10. Ackerberg, Daniel A. & Xu, Haiqing, 2024. "On extending Powell, Stock, and Stoker (1989) to indexes with functionally dependent covariates," Economics Letters, Elsevier, vol. 242(C).
    11. Daniel Boller & Michael Lechner & Gabriel Okasa, 2021. "The Effect of Sport in Online Dating: Evidence from Causal Machine Learning," Papers 2104.04601, arXiv.org.
    12. Sayed Alim Samim & Zhiquan Hu & Sebastian Stepien & Sayed Younus Amini & Ramin Rayee & Kunyu Niu & George Mgendi, 2021. "Food Insecurity and Related Factors among Farming Families in Takhar Region, Afghanistan," Sustainability, MDPI, vol. 13(18), pages 1-17, September.
    13. Rahul Singh & Liyuan Xu & Arthur Gretton, 2020. "Kernel Methods for Causal Functions: Dose, Heterogeneous, and Incremental Response Curves," Papers 2010.04855, arXiv.org, revised Oct 2022.
    14. Hoderlein, Stefan & Sherman, Robert, 2015. "Identification and estimation in a correlated random coefficients binary response model," Journal of Econometrics, Elsevier, vol. 188(1), pages 135-149.
    15. Giuseppe De Luca & Valeria Perotti, 2011. "Estimation of ordered response models with sample selection," Stata Journal, StataCorp LLC, vol. 11(2), pages 213-239, June.
    16. Jonathan A. Cook & Saad Siddiqui, 2020. "Random forests and selected samples," Bulletin of Economic Research, Wiley Blackwell, vol. 72(3), pages 272-287, July.
    17. Kayo Murakami & Hideki Shimada & Yoshiaki Ushifusa & Takanori Ida, 2022. "Heterogeneous Treatment Effects Of Nudge And Rebate: Causal Machine Learning In A Field Experiment On Electricity Conservation," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 63(4), pages 1779-1803, November.
    18. Andree, Bo Pieter Johannes & Chamorro Elizondo, Andres Fernando & Kraay, Aart C. & Spencer, Phoebe Girouard & Wang, Dieter, 2020. "Predicting Food Crises," Policy Research Working Paper Series 9412, The World Bank.
    19. Taisuke Otsu & Mengshan Xu, 2022. "Isotonic propensity score matching," STICERD - Econometrics Paper Series 623, Suntory and Toyota International Centres for Economics and Related Disciplines, LSE.
    20. Yan, Jin & Yoo, Hong Il, 2019. "Semiparametric estimation of the random utility model with rank-ordered choice data," Journal of Econometrics, Elsevier, vol. 211(2), pages 414-438.

    More about this item

    JEL classification:

    • C14 - Mathematical and Quantitative Methods > Econometric and Statistical Methods and Methodology: General > Semiparametric and Nonparametric Methods: General
    • C25 - Mathematical and Quantitative Methods > Single Equation Models; Single Variables > Discrete Regression and Qualitative Choice Models; Discrete Regressors; Proportions; Probabilities
    • C40 - Mathematical and Quantitative Methods > Econometric and Statistical Methods: Special Topics > General
