
Random Forest Estimation of the Ordered Choice Model

Authors

  • Michael Lechner
  • Gabriel Okasa

Abstract

In this paper we develop a new machine learning estimator for ordered choice models based on the random forest. The proposed Ordered Forest flexibly estimates the conditional choice probabilities while taking the ordering information explicitly into account. In addition to common machine learning estimators, it enables the estimation of marginal effects as well as conducting inference and thus provides the same output as classical econometric estimators. An extensive simulation study reveals a good predictive performance, particularly in settings with non-linearities and near-multicollinearity. An empirical application contrasts the estimation of marginal effects and their standard errors with an ordered logit model. A software implementation of the Ordered Forest is provided both in R and Python in the package orf available on CRAN and PyPI, respectively.
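One way to see how ordering information can enter a forest-based estimator is through cumulative probabilities. The sketch below is an illustration only, not the API of the orf package. It assumes that estimates of P(Y <= m | x) for m = 1, ..., M-1 are already available (for example, from separate regression forests fit to the binary indicators 1(Y <= m)) and converts them into class probabilities by differencing, clipping negative values at zero, and renormalizing. The function names `cumulative_indicators` and `class_probabilities` are hypothetical.

```python
def cumulative_indicators(y, n_classes):
    """Build the binary outcomes 1(y_i <= m) for m = 1, ..., n_classes - 1.

    Assumes classes are coded as the integers 1, ..., n_classes.
    Each returned list could serve as the target of one regression forest.
    """
    return [[1 if yi <= m else 0 for yi in y] for m in range(1, n_classes)]


def class_probabilities(cumulative):
    """Convert estimates of P(Y <= m | x), m = 1..M-1, into M class probabilities."""
    # P(Y <= 0 | x) = 0 and P(Y <= M | x) = 1 hold by definition.
    bounds = [0.0] + list(cumulative) + [1.0]
    # Differences of adjacent cumulative probabilities give P(Y = m | x).
    raw = [bounds[m + 1] - bounds[m] for m in range(len(bounds) - 1)]
    # Separately estimated cumulative probabilities need not be monotone,
    # so negative differences are set to zero.
    clipped = [max(p, 0.0) for p in raw]
    # Renormalize so the probabilities sum to one.
    total = sum(clipped)
    return [p / total for p in clipped]


if __name__ == "__main__":
    print(cumulative_indicators([1, 2, 3, 2], n_classes=3))
    print(class_probabilities([0.2, 0.7]))   # monotone estimates
    print(class_probabilities([0.6, 0.5]))   # non-monotone estimates
```

Because the raw differences always sum to one, clipping can only increase the total, so the normalizing constant is never zero for finite inputs.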

Suggested Citation

  • Michael Lechner & Gabriel Okasa, 2019. "Random Forest Estimation of the Ordered Choice Model," Papers 1907.02436, arXiv.org, revised Sep 2022.
  • Handle: RePEc:arx:papers:1907.02436

    Download full text from publisher

    File URL: http://arxiv.org/pdf/1907.02436
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Matzkin, Rosa L, 1992. "Nonparametric and Distribution-Free Estimation of the Binary Threshold Crossing and the Binary Choice Models," Econometrica, Econometric Society, vol. 60(2), pages 239-270, March.
    2. Stefan Wager & Susan Athey, 2018. "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(523), pages 1228-1242, July.
    3. Powell, James L. & Stoker, Thomas M., 1996. "Optimal bandwidth choice for density-weighted averages," Journal of Econometrics, Elsevier, vol. 75(2), pages 291-316, December.
    4. Wright, Marvin N. & Ziegler, Andreas, 2017. "ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 77(i01).
    5. Greene,William H. & Hensher,David A., 2010. "Modeling Ordered Choices," Cambridge Books, Cambridge University Press, number 9780521194204, January.
    6. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
    7. Lechner, Michael, 2018. "Modified Causal Forests for Estimating Heterogeneous Causal Effects," IZA Discussion Papers 12040, Institute of Labor Economics (IZA).
    8. Susan Athey & Julie Tibshirani & Stefan Wager, 2016. "Generalized Random Forests," Papers 1610.01271, arXiv.org, revised Apr 2018.
    9. Stefan Boes & Rainer Winkelmann, 2006. "Ordered Response Models," Springer Books, in: Olaf Hübler & Joachim Frohn (ed.), Modern Econometric Analysis, chapter 12, pages 167-181, Springer.
    10. Jeremy T. Fox, 2007. "Semiparametric estimation of multinomial discrete-choice models using a subset of choices," RAND Journal of Economics, RAND Corporation, vol. 38(4), pages 1002-1019, December.
    11. Stewart, Mark B., 2005. "A comparison of semiparametric estimators for the ordered response model," Computational Statistics & Data Analysis, Elsevier, vol. 49(2), pages 555-573, April.
    12. Raffaella Piccarreta, 2008. "Classification trees for ordinal variables," Computational Statistics, Springer, vol. 23(3), pages 407-427, July.
    13. Lee, Lung-fei, 1995. "Semiparametric maximum likelihood estimation of polychotomous and sequential choice models," Journal of Econometrics, Elsevier, vol. 65(2), pages 381-428, February.
    14. Jeffrey M Wooldridge, 2010. "Econometric Analysis of Cross Section and Panel Data," MIT Press Books, The MIT Press, edition 2, volume 1, number 0262232588, December.
    15. Daniel Goller & Michael C. Knaus & Michael Lechner & Gabriel Okasa, 2021. "Predicting match outcomes in football by an Ordered Forest estimator," Chapters, in: Ruud H. Koning & Stefan Kesenne (ed.), A Modern Guide to Sports Economics, chapter 22, pages 335-355, Edward Elgar Publishing.
    16. Antonio Afonso & Pedro Gomes & Philipp Rother, 2009. "Ordered response models for sovereign debt ratings," Applied Economics Letters, Taylor & Francis Journals, vol. 16(8), pages 769-773.
    17. Lewbel, Arthur, 2000. "Semiparametric qualitative response model estimation with unknown heteroscedasticity or instrumental variables," Journal of Econometrics, Elsevier, vol. 97(1), pages 145-177, July.
    18. Racine, Jeffrey S., 2008. "Nonparametric Econometrics: A Primer," Foundations and Trends(R) in Econometrics, now publishers, vol. 3(1), pages 1-88, March.
    19. Klein, Roger W & Spady, Richard H, 1993. "An Efficient Semiparametric Estimator for Binary Response Models," Econometrica, Econometric Society, vol. 61(2), pages 387-421, March.
    20. Lin, Zhongjian & Li, Qi & Sun, Yiguo, 2014. "A consistent nonparametric test of parametric regression functional form in fixed effects panel data models," Journal of Econometrics, Elsevier, vol. 178(P1), pages 167-179.
    21. Stoker, Thomas M., 1996. "Smoothing bias in the measurement of marginal effects," Journal of Econometrics, Elsevier, vol. 72(1-2), pages 49-84.
    22. Roger W. Klein & Robert P. Sherman, 2002. "Shift Restrictions and Semiparametric Estimation in Ordered Response Models," Econometrica, Econometric Society, vol. 70(2), pages 663-691, March.
    23. Janitza, Silke & Tutz, Gerhard & Boulesteix, Anne-Laure, 2016. "Random forest for ordinal responses: Prediction and variable selection," Computational Statistics & Data Analysis, Elsevier, vol. 96(C), pages 57-73.
    24. Constantinou Anthony Costa & Fenton Norman Elliott, 2012. "Solving the Problem of Inadequate Scoring Rules for Assessing Probabilistic Football Forecast Models," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 8(1), pages 1-14, March.
    25. Alberto Abadie & Guido W. Imbens, 2006. "Large Sample Properties of Matching Estimators for Average Treatment Effects," Econometrica, Econometric Society, vol. 74(1), pages 235-267, January.
    26. Jeffrey Racine, 2008. "Nonparametric econometrics: a primer (in Russian)," Quantile, Quantile, issue 4, pages 7-56, March.

    Citations


    Cited by:

    1. Franziska Braschke & Patrick A. Puhani, 2023. "Population Adjustment to Asymmetric Labour Market Shocks in India: A Comparison to Europe and the United States at Two Different Regional Levels," The Indian Journal of Labour Economics, Springer;The Indian Society of Labour Economics (ISLE), vol. 66(1), pages 7-35, March.
    2. Qinglong Shao, 2022. "Does less working time improve life satisfaction? Evidence from European Social Survey," Health Economics Review, Springer, vol. 12(1), pages 1-18, December.
    3. Daniel Goller & Sandro Heiniger, 2024. "A general framework to quantify the event importance in multi-event contests," Annals of Operations Research, Springer, vol. 341(1), pages 71-93, October.
    4. Wang, Shixuan & Syntetos, Aris A. & Liu, Ying & Di Cairano-Gilfedder, Carla & Naim, Mohamed M., 2023. "Improving automotive garage operations by categorical forecasts using a large number of variables," European Journal of Operational Research, Elsevier, vol. 306(2), pages 893-908.
    5. Riccardo Di Francesco, 2023. "Ordered Correlation Forest," Papers 2309.08755, arXiv.org.
    6. Paul S. Clarke & Annalivia Polselli, 2023. "Double Machine Learning for Static Panel Models with Fixed Effects," Papers 2312.08174, arXiv.org, revised Dec 2024.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Qi Li & Jeffrey Scott Racine, 2006. "Nonparametric Econometrics: Theory and Practice," Economics Books, Princeton University Press, edition 1, volume 1, number 8355.
    2. Gabriel Okasa, 2022. "Meta-Learners for Estimation of Causal Effects: Finite Sample Cross-Fit Performance," Papers 2201.12692, arXiv.org.
    3. Daniel Goller, 2023. "Analysing a built-in advantage in asymmetric darts contests using causal machine learning," Annals of Operations Research, Springer, vol. 325(1), pages 649-679, June.
    4. Yingying Dong & Arthur Lewbel, 2015. "A Simple Estimator for Binary Choice Models with Endogenous Regressors," Econometric Reviews, Taylor & Francis Journals, vol. 34(1-2), pages 82-105, February.
    5. Sayed Alim Samim & Zhiquan Hu & Sebastian Stepien & Sayed Younus Amini & Ramin Rayee & Kunyu Niu & George Mgendi, 2021. "Food Insecurity and Related Factors among Farming Families in Takhar Region, Afghanistan," Sustainability, MDPI, vol. 13(18), pages 1-17, September.
    6. Rahul Singh & Liyuan Xu & Arthur Gretton, 2020. "Kernel Methods for Causal Functions: Dose, Heterogeneous, and Incremental Response Curves," Papers 2010.04855, arXiv.org, revised Oct 2022.
    7. Hoderlein, Stefan & Sherman, Robert, 2015. "Identification and estimation in a correlated random coefficients binary response model," Journal of Econometrics, Elsevier, vol. 188(1), pages 135-149.
    8. Giuseppe De Luca & Valeria Perotti, 2011. "Estimation of ordered response models with sample selection," Stata Journal, StataCorp LP, vol. 11(2), pages 213-239, June.
    9. Jonathan A. Cook & Saad Siddiqui, 2020. "Random forests and selected samples," Bulletin of Economic Research, Wiley Blackwell, vol. 72(3), pages 272-287, July.
    10. Taisuke Otsu & Mengshan Xu, 2022. "Isotonic propensity score matching," STICERD - Econometrics Paper Series 623, Suntory and Toyota International Centres for Economics and Related Disciplines, LSE.
    11. Yan, Jin & Yoo, Hong Il, 2019. "Semiparametric estimation of the random utility model with rank-ordered choice data," Journal of Econometrics, Elsevier, vol. 211(2), pages 414-438.
    12. William H. Greene & David A. Hensher, 2008. "Modeling Ordered Choices: A Primer and Recent Developments," Working Papers 08-26, New York University, Leonard N. Stern School of Business, Department of Economics.
    13. Yixiao Jiang, 2021. "Semiparametric Estimation of a Corporate Bond Rating Model," Econometrics, MDPI, vol. 9(2), pages 1-20, May.
    14. Youmi Suk & Hyunseung Kang, 2022. "Robust Machine Learning for Treatment Effects in Multilevel Observational Studies Under Cluster-level Unmeasured Confounding," Psychometrika, Springer;The Psychometric Society, vol. 87(1), pages 310-343, March.
    15. Xi Wang & Songnian Chen, 2022. "Partial Identification and Estimation of Semiparametric Ordered Response Models with Interval Regressor Data," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 84(4), pages 830-849, August.
    16. Paul S. Clarke & Annalivia Polselli, 2023. "Double Machine Learning for Static Panel Models with Fixed Effects," Papers 2312.08174, arXiv.org, revised Dec 2024.
    17. Mengshan Xu & Taisuke Otsu, 2022. "Isotonic propensity score matching," Papers 2207.08868, arXiv.org, revised Aug 2024.
    18. Lechner, Michael, 2018. "Modified Causal Forests for Estimating Heterogeneous Causal Effects," IZA Discussion Papers 12040, Institute of Labor Economics (IZA).
    19. Ichimura, Hidehiko & Todd, Petra E., 2007. "Implementing Nonparametric and Semiparametric Estimators," Handbook of Econometrics, in: J.J. Heckman & E.E. Leamer (ed.), Handbook of Econometrics, edition 1, volume 6, chapter 74, Elsevier.
    20. Roberto Martino & Phu Nguyen-Van, 2014. "Labour market regulation and fiscal parameters: A structural model for European regions," Working Papers of BETA 2014-19, Bureau d'Economie Théorique et Appliquée, UDS, Strasbourg.

    More about this item

    JEL classification:

    • C14 - Mathematical and Quantitative Methods > Econometric and Statistical Methods and Methodology: General > Semiparametric and Nonparametric Methods: General
    • C25 - Mathematical and Quantitative Methods > Single Equation Models; Single Variables > Discrete Regression and Qualitative Choice Models; Discrete Regressors; Proportions; Probabilities
    • C40 - Mathematical and Quantitative Methods > Econometric and Statistical Methods: Special Topics > General
