
Random Forest Estimation of the Ordered Choice Model

Author

Listed:
  • Lechner, Michael
  • Okasa, Gabriel

Abstract

In econometrics, ordered choice models are widely used to estimate the probabilities that a categorical outcome variable with an inherent ordering takes particular values, conditional on covariates. In this paper, we develop a new machine learning estimator for such models, based on the random forest algorithm, that does not impose any distributional assumptions. The proposed Ordered Forest estimator provides a flexible method for estimating the conditional choice probabilities that naturally accommodates nonlinearities in the data while explicitly taking the ordering information into account. Beyond what common machine learning estimators offer, it enables the estimation of marginal effects and inference on them, thus providing the same output as classical econometric estimators based on ordered logit or probit models. An extensive simulation study examines the finite-sample properties of the Ordered Forest and reveals good predictive performance, particularly in settings with multicollinearity among the predictors and nonlinear functional forms. An empirical application further illustrates the estimation of the marginal effects and their standard errors and demonstrates the advantages of flexible estimation over a parametric benchmark model.
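
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way ordered class probabilities and marginal effects could be obtained from a flexible learner. It assumes that cumulative probabilities P(Y <= m | x) have already been estimated separately for each threshold (for example by random forests fitted to binary indicators), which is an assumption of this sketch and not a statement of the authors' method. The names `class_probs_from_cumulative`, `marginal_effect`, `toy_cumulative`, and `toy_predict`, the step size `h`, and the logistic stand-in for fitted forests are all hypothetical.

```python
import math
from typing import Callable, List, Sequence


def class_probs_from_cumulative(cum: Sequence[float]) -> List[float]:
    """Turn estimates of P(Y <= m | x), m = 1..M-1, into M class probabilities.

    Adjacent cumulative estimates are differenced. Separately estimated
    values need not be monotone, so negative differences are set to zero
    and the result is rescaled to sum to one.
    """
    bounds = [0.0] + [float(c) for c in cum] + [1.0]
    raw = [max(hi - lo, 0.0) for lo, hi in zip(bounds, bounds[1:])]
    total = sum(raw)  # at least 1, because the unclipped differences sum to 1
    return [r / total for r in raw]


def marginal_effect(
    predict: Callable[[Sequence[float]], List[float]],
    x: Sequence[float],
    j: int,
    h: float = 1e-3,
) -> List[float]:
    """Central finite-difference effect of covariate j on each class probability.

    The step size h is an assumption of this sketch; a piecewise-constant
    predictor such as a forest would need a larger step than a smooth one.
    """
    up, down = list(x), list(x)
    up[j] += h
    down[j] -= h
    p_up, p_down = predict(up), predict(down)
    return [(a - b) / (2.0 * h) for a, b in zip(p_up, p_down)]


def toy_cumulative(x: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for fitted learners: logistic cumulative probabilities."""
    index = 0.8 * x[0] - 0.5 * x[1]
    return [1.0 / (1.0 + math.exp(-(cut - index))) for cut in (-1.0, 0.5)]


def toy_predict(x: Sequence[float]) -> List[float]:
    """Class probabilities for three ordered categories from the toy learner."""
    return class_probs_from_cumulative(toy_cumulative(x))


if __name__ == "__main__":
    x0 = [0.3, -0.2]
    print("class probabilities:", toy_predict(x0))
    print("marginal effects of x[0]:", marginal_effect(toy_predict, x0, j=0))
```

Because the class probabilities sum to one at every covariate value, the marginal effects of a covariate sum to approximately zero across categories, which gives a quick sanity check on any implementation along these lines.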

Suggested Citation

  • Lechner, Michael & Okasa, Gabriel, 2019. "Random Forest Estimation of the Ordered Choice Model," Economics Working Paper Series 1908, University of St. Gallen, School of Economics and Political Science.
  • Handle: RePEc:usg:econwp:2019:08

    Download full text from publisher

    File URL: http://ux-tauri.unisg.ch/RePEc/usg/econwp/EWP-1908.pdf
    Download Restriction: no


    Citations

    Cited by:

    1. Paul S. Clarke & Annalivia Polselli, 2023. "Double Machine Learning for Static Panel Models with Fixed Effects," Papers 2312.08174, arXiv.org, revised Dec 2024.
    2. Qinglong Shao, 2022. "Does less working time improve life satisfaction? Evidence from European Social Survey," Health Economics Review, Springer, vol. 12(1), pages 1-18, December.
    3. Daniel Goller & Sandro Heiniger, 2024. "A general framework to quantify the event importance in multi-event contests," Annals of Operations Research, Springer, vol. 341(1), pages 71-93, October.
    4. Franziska Braschke & Patrick A. Puhani, 2023. "Population Adjustment to Asymmetric Labour Market Shocks in India: A Comparison to Europe and the United States at Two Different Regional Levels," The Indian Journal of Labour Economics, Springer;The Indian Society of Labour Economics (ISLE), vol. 66(1), pages 7-35, March.
    5. Wang, Shixuan & Syntetos, Aris A. & Liu, Ying & Di Cairano-Gilfedder, Carla & Naim, Mohamed M., 2023. "Improving automotive garage operations by categorical forecasts using a large number of variables," European Journal of Operational Research, Elsevier, vol. 306(2), pages 893-908.
    6. Riccardo Di Francesco, 2023. "Ordered Correlation Forest," Papers 2309.08755, arXiv.org.
    7. Michael M. Lokshin & Hannon, Michael & Miguel Purroy & Ivan Torre, 2024. "Do More Informed Citizens Make Better Climate Policy Decisions?," Policy Research Working Paper Series 10921, The World Bank.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gabriel Okasa, 2022. "Meta-Learners for Estimation of Causal Effects: Finite Sample Cross-Fit Performance," Papers 2201.12692, arXiv.org.
    2. Sayed Alim Samim & Zhiquan Hu & Sebastian Stepien & Sayed Younus Amini & Ramin Rayee & Kunyu Niu & George Mgendi, 2021. "Food Insecurity and Related Factors among Farming Families in Takhar Region, Afghanistan," Sustainability, MDPI, vol. 13(18), pages 1-17, September.
    3. Valente, Marica, 2023. "Policy evaluation of waste pricing programs using heterogeneous causal effect estimation," Journal of Environmental Economics and Management, Elsevier, vol. 117(C).
    4. Susan Athey & Julie Tibshirani & Stefan Wager, 2016. "Generalized Random Forests," Papers 1610.01271, arXiv.org, revised Apr 2018.
    5. William H. Greene & David A. Hensher, 2008. "Modeling Ordered Choices: A Primer and Recent Developments," Working Papers 08-26, New York University, Leonard N. Stern School of Business, Department of Economics.
    6. Xi Wang & Songnian Chen, 2022. "Partial Identification and Estimation of Semiparametric Ordered Response Models with Interval Regressor Data," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 84(4), pages 830-849, August.
    7. Zhexiao Lin & Fang Han, 2022. "On regression-adjusted imputation estimators of the average treatment effect," Papers 2212.05424, arXiv.org, revised Jan 2023.
    8. Qi Li & Jeffrey Scott Racine, 2006. "Nonparametric Econometrics: Theory and Practice," Economics Books, Princeton University Press, edition 1, volume 1, number 8355.
    9. Stefan Boes, 2013. "Nonparametric analysis of treatment effects in ordered response models," Empirical Economics, Springer, vol. 44(1), pages 81-109, February.
    10. William H. Greene & Mark N. Harris & Rachel J. Knott & Nigel Rice, 2021. "Specification and testing of hierarchical ordered response models with anchoring vignettes," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(1), pages 31-64, January.
    11. Daniel Goller, 2023. "Analysing a built-in advantage in asymmetric darts contests using causal machine learning," Annals of Operations Research, Springer, vol. 325(1), pages 649-679, June.
    12. Yiyi Huo & Yingying Fan & Fang Han, 2023. "On the adaptation of causal forests to manifold data," Papers 2311.16486, arXiv.org, revised Dec 2023.
    13. Yingying Dong & Arthur Lewbel, 2015. "A Simple Estimator for Binary Choice Models with Endogenous Regressors," Econometric Reviews, Taylor & Francis Journals, vol. 34(1-2), pages 82-105, February.
    14. Ackerberg, Daniel A. & Xu, Haiqing, 2024. "On extending Powell, Stock, and Stoker (1989) to indexes with functionally dependent covariates," Economics Letters, Elsevier, vol. 242(C).
    15. Daniel Boller & Michael Lechner & Gabriel Okasa, 2021. "The Effect of Sport in Online Dating: Evidence from Causal Machine Learning," Papers 2104.04601, arXiv.org.
    16. Rahul Singh & Liyuan Xu & Arthur Gretton, 2020. "Kernel Methods for Causal Functions: Dose, Heterogeneous, and Incremental Response Curves," Papers 2010.04855, arXiv.org, revised Oct 2022.
    17. Hoderlein, Stefan & Sherman, Robert, 2015. "Identification and estimation in a correlated random coefficients binary response model," Journal of Econometrics, Elsevier, vol. 188(1), pages 135-149.
    18. Giuseppe De Luca & Valeria Perotti, 2011. "Estimation of ordered response models with sample selection," Stata Journal, StataCorp LLC, vol. 11(2), pages 213-239, June.
    19. Rosati, Nicoletta & Bellia, Mario & Matos, Pedro Verga & Oliveira, Vasco, 2020. "Ratings matter: Announcements in times of crisis and the dynamics of stock markets," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 64(C).
    20. Jonathan A. Cook & Saad Siddiqui, 2020. "Random forests and selected samples," Bulletin of Economic Research, Wiley Blackwell, vol. 72(3), pages 272-287, July.

    More about this item

    Keywords

    Ordered choice models; random forests; probabilities; marginal effects; machine learning

    JEL classification:

    • C14 - Mathematical and Quantitative Methods - Econometric and Statistical Methods and Methodology: General - Semiparametric and Nonparametric Methods: General
    • C25 - Mathematical and Quantitative Methods - Single Equation Models; Single Variables - Discrete Regression and Qualitative Choice Models; Discrete Regressors; Proportions; Probabilities
    • C40 - Mathematical and Quantitative Methods - Econometric and Statistical Methods: Special Topics - General

