Random forests and selected samples

Author

Listed:
  • Jonathan A. Cook
  • Saad Siddiqui

Abstract

This paper presents a procedure for recovering causal coefficients from selected samples that uses random forests, a popular machine‐learning algorithm. This proposed method makes few assumptions regarding the selection equation and the distribution of the error terms. Our Monte Carlo results indicate that our method performs well, even when the selection and outcome equations contain the same variables, as long as the selection equation is nonlinear. The method can also be used when there are many variables in the selection equation. We also compare the results of our procedure with other parametric and semiparametric methods using real data.
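
For orientation, the sample-selection setup that this literature typically works with can be written as below. This is a sketch of the standard semiparametric formulation, not necessarily the exact specification used in the paper; the symbols $x_i$, $z_i$, $g$, $\lambda$, and $p$ are notation assumed here and do not appear in the abstract.

```latex
\begin{align*}
  y_i^{*} &= x_i'\beta + \varepsilon_i
    && \text{(outcome equation)} \\
  d_i &= \mathbf{1}\{\, g(z_i) + u_i > 0 \,\}
    && \text{(selection equation, } g \text{ unrestricted)} \\
  y_i &= y_i^{*} \ \text{ observed only if } d_i = 1 \\[4pt]
  \mathbb{E}[\, y_i \mid x_i, z_i, d_i = 1 \,]
    &= x_i'\beta + \lambda\bigl(p(z_i)\bigr),
    && p(z_i) = \Pr(d_i = 1 \mid z_i)
\end{align*}
```

Here $\lambda(\cdot)$ is an unknown selection-correction term; under joint normality of $(\varepsilon_i, u_i)$ and a linear index it reduces to a multiple of the inverse Mills ratio, as in Heckman's two-step estimator. In a random-forest approach, the forest would plausibly serve to estimate $p(z_i)$ without assuming a functional form for $g$. When $z_i = x_i$, $\beta$ can be separated from $\lambda(p(x_i))$ only if the latter is not linear in $x_i$, which is one way to read the abstract's requirement that the selection equation be nonlinear.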

Suggested Citation

  • Jonathan A. Cook & Saad Siddiqui, 2020. "Random forests and selected samples," Bulletin of Economic Research, Wiley Blackwell, vol. 72(3), pages 272-287, July.
  • Handle: RePEc:bla:buecrs:v:72:y:2020:i:3:p:272-287
    DOI: 10.1111/boer.12222

    Download full text from publisher

    File URL: https://doi.org/10.1111/boer.12222
    Download Restriction: no

    File URL: https://libkey.io/10.1111/boer.12222?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Robert Jonsson, 2012. "When does Heckman’s two-step procedure for censored data work and when does it not?," Statistical Papers, Springer, vol. 53(1), pages 33-49, February.
    2. Robinson, Peter M, 1982. "On the Asymptotic Properties of Estimators of Models Containing Limited Dependent Variables," Econometrica, Econometric Society, vol. 50(1), pages 27-41, January.
    3. Richard Blundell & Alan Duncan, 1998. "Kernel Regression in Empirical Microeconomics," Journal of Human Resources, University of Wisconsin Press, vol. 33(1), pages 62-87.
    4. Stefan Wager & Susan Athey, 2018. "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(523), pages 1228-1242, July.
    5. Sendhil Mullainathan & Jann Spiess, 2017. "Machine Learning: An Applied Econometric Approach," Journal of Economic Perspectives, American Economic Association, vol. 31(2), pages 87-106, Spring.
    6. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
    7. James J. Heckman, 1976. "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models," NBER Chapters, in: Annals of Economic and Social Measurement, Volume 5, number 4, pages 475-492, National Bureau of Economic Research, Inc.
    8. Newey, Whitney K & Powell, James L & Walker, James R, 1990. "Semiparametric Estimation of Selection Models: Some Empirical Results," American Economic Review, American Economic Association, vol. 80(2), pages 324-328, May.
    9. Thomas W. Zuehlke, 2017. "Use of quadratic terms in Type 2 Tobit models," Applied Economics, Taylor & Francis Journals, vol. 49(17), pages 1706-1714, April.
    10. Maria Fraga O. Martins, 2001. "Parametric and semiparametric estimation of sample selection models: an empirical application to the female labour force in Portugal," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 16(1), pages 23-39.
    11. Ahn, Hyungtaik & Powell, James L., 1993. "Semiparametric estimation of censored selection models with a nonparametric selection mechanism," Journal of Econometrics, Elsevier, vol. 58(1-2), pages 3-29, July.
    12. Arabmazar, Abbas & Schmidt, Peter, 1982. "An Investigation of the Robustness of the Tobit Estimator to Non-Normality," Econometrica, Econometric Society, vol. 50(4), pages 1055-1063, July.
    13. Klein, Roger W & Spady, Richard H, 1993. "An Efficient Semiparametric Estimator for Binary Response Models," Econometrica, Econometric Society, vol. 61(2), pages 387-421, March.
    14. Jonathan A. Cook & Fred Gale, 2019. "Using food prices and consumption to examine Chinese cost of living," Pacific Economic Review, Wiley Blackwell, vol. 24(1), pages 3-26, February.
    15. Susan Athey & Julie Tibshirani & Stefan Wager, 2016. "Generalized Random Forests," Papers 1610.01271, arXiv.org, revised Apr 2018.

    Citations

    Cited by:

    1. Quiroga Gutierrez, Ana Cecilia, 2024. "Picture this: Making health insurance choices easier for those who need it," Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), Elsevier, vol. 111(C).
    2. Wayne Taylor & Brett Hollenbeck, 2021. "Leveraging loyalty programs using competitor based targeting," Quantitative Marketing and Economics (QME), Springer, vol. 19(3), pages 417-455, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ichimura, Hidehiko & Todd, Petra E., 2007. "Implementing Nonparametric and Semiparametric Estimators," Handbook of Econometrics, in: J.J. Heckman & E.E. Leamer (ed.), Handbook of Econometrics, edition 1, volume 6, chapter 74, Elsevier.
    2. Pigini Claudia, 2015. "Bivariate Non-Normality in the Sample Selection Model," Journal of Econometric Methods, De Gruyter, vol. 4(1), pages 123-144, January.
    3. Schafgans, Marcia M. A., 2000. "Gender wage differences in Malaysia: parametric and semiparametric estimation," Journal of Development Economics, Elsevier, vol. 63(2), pages 351-378, December.
    4. Huber, Martin & Melly, Blaise, 2011. "Quantile Regression in the Presence of Sample Selection," Economics Working Paper Series 1109, University of St. Gallen, School of Economics and Political Science.
    5. Michael Lechner & Gabriel Okasa, 2025. "Random Forest estimation of the ordered choice model," Empirical Economics, Springer, vol. 68(1), pages 1-106, January.
    6. Shi, Ruoyao, 2024. "An Averaging Estimator For Two-Step M-Estimation In Semiparametric Models," Econometric Theory, Cambridge University Press, vol. 40(3), pages 652-687, June.
    7. Claudia PIGINI, 2012. "Of Butterflies and Caterpillars: Bivariate Normality in the Sample Selection Model," Working Papers 377, Universita' Politecnica delle Marche (I), Dipartimento di Scienze Economiche e Sociali.
    8. Kea BARET, 2021. "Fiscal rules’ compliance and Social Welfare," Working Papers of BETA 2021-38, Bureau d'Economie Théorique et Appliquée, UDS, Strasbourg.
    9. Sonia Bhalotra & Claudia Sanhueza, 2004. "Parametric and Semi-parametric Estimations of the Return to Schooling in South Africa," Econometric Society 2004 Latin American Meetings 294, Econometric Society.
    10. Khashayar Khosravi & Greg Lewis & Vasilis Syrgkanis, 2019. "Non-Parametric Inference Adaptive to Intrinsic Dimension," Papers 1901.03719, arXiv.org, revised Jun 2019.
    11. Valente, Marica, 2023. "Policy evaluation of waste pricing programs using heterogeneous causal effect estimation," Journal of Environmental Economics and Management, Elsevier, vol. 117(C).
    12. Dylan Brewer & Alyssa Carlson, 2024. "Addressing sample selection bias for machine learning methods," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 39(3), pages 383-400, April.
    13. Michael Lechner, 2023. "Causal Machine Learning and its use for public policy," Swiss Journal of Economics and Statistics, Springer;Swiss Society of Economics and Statistics, vol. 159(1), pages 1-15, December.
    14. Augusto Cerqua & Marco Letta & Gabriele Pinto, 2024. "On the (Mis)Use of Machine Learning with Panel Data," Papers 2411.09218, arXiv.org, revised May 2025.
    15. Michela Bia & Martin Huber & Lukáš Lafférs, 2024. "Double Machine Learning for Sample Selection Models," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 42(3), pages 958-969, July.
    16. Miruna Oprescu & Vasilis Syrgkanis & Zhiwei Steven Wu, 2018. "Orthogonal Random Forest for Causal Inference," Papers 1806.03467, arXiv.org, revised Sep 2019.
    17. Tatsuru Kikuchi, 2025. "Stochastic Boundaries in Spatial General Equilibrium: A Diffusion-Based Approach to Causal Inference with Spillover Effects," Papers 2508.06594, arXiv.org.
    18. Huffman, Sonya Kostova, 1999. "Changes of household consumption behavior during the transition from centrally-planned to market-oriented economy," ISU General Staff Papers 1999010108000013568, Iowa State University, Department of Economics.
    19. Michael C Knaus & Michael Lechner & Anthony Strittmatter, 2021. "Machine learning estimation of heterogeneous causal effects: Empirical Monte Carlo evidence," The Econometrics Journal, Royal Economic Society, vol. 24(1), pages 134-161.
    20. Marra, Giampiero & Radice, Rosalba, 2013. "Estimation of a regression spline sample selection model," Computational Statistics & Data Analysis, Elsevier, vol. 61(C), pages 158-173.
