
Generalized Random Forests

Author

Listed:
  • Susan Athey
  • Julie Tibshirani
  • Stefan Wager

Abstract

We propose generalized random forests, a method for non-parametric statistical estimation based on random forests (Breiman, 2001) that can be used to fit any quantity of interest identified as the solution to a set of local moment equations. Following the literature on local maximum likelihood estimation, our method considers a weighted set of nearby training examples; however, instead of using classical kernel weighting functions that are prone to a strong curse of dimensionality, we use an adaptive weighting function derived from a forest designed to express heterogeneity in the specified quantity of interest. We propose a flexible, computationally efficient algorithm for growing generalized random forests, develop a large sample theory for our method showing that our estimates are consistent and asymptotically Gaussian, and provide an estimator for their asymptotic variance that enables valid confidence intervals. We use our approach to develop new methods for three statistical tasks: non-parametric quantile regression, conditional average partial effect estimation, and heterogeneous treatment effect estimation via instrumental variables. A software implementation, grf for R and C++, is available from CRAN.
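As a rough illustration of the weighting idea in the abstract, the sketch below averages per-tree leaf co-membership into weights for a query point and then solves two simple weighted moment equations (a conditional mean and a conditional quantile). It is written in Python for brevity and is not the grf package's API: the function names, the pre-computed leaf assignments, and the toy data are all hypothetical, and the step that actually grows the forest is not shown.

```python
def forest_weights(train_leaves, query_leaves):
    """Average leaf co-membership over trees.

    train_leaves[b][i] is the leaf id of training point i in tree b
    (hypothetical input; a real forest would produce these).
    query_leaves[b] is the leaf id of the query point in tree b.
    Each tree spreads weight 1/n_trees evenly over the training
    points that share the query point's leaf.
    """
    n_trees = len(train_leaves)
    n_points = len(train_leaves[0])
    alpha = [0.0] * n_points
    for leaves, query_leaf in zip(train_leaves, query_leaves):
        members = [i for i, leaf in enumerate(leaves) if leaf == query_leaf]
        if not members:
            continue  # an empty leaf contributes no weight
        for i in members:
            alpha[i] += 1.0 / (len(members) * n_trees)
    return alpha


def weighted_mean(y, alpha):
    """Solve sum_i alpha_i * (y_i - theta) = 0 for theta."""
    return sum(a * yi for yi, a in zip(y, alpha)) / sum(alpha)


def weighted_quantile(y, alpha, q):
    """Smallest theta whose weighted CDF reaches q, i.e. an approximate
    root of sum_i alpha_i * (1{y_i <= theta} - q) = 0."""
    total = sum(alpha)
    cumulative = 0.0
    for yi, a in sorted(zip(y, alpha)):
        cumulative += a
        if cumulative >= q * total:
            return yi
    raise ValueError("q must lie in (0, 1] and weights must be positive")


# Toy data: four training outcomes and two trees (assumed, for illustration).
y = [1.0, 2.0, 3.0, 10.0]
train_leaves = [[0, 0, 1, 1], [0, 1, 1, 1]]
query_leaves = [0, 1]

alpha = forest_weights(train_leaves, query_leaves)
local_mean = weighted_mean(y, alpha)
local_median = weighted_quantile(y, alpha, 0.5)
```

The same weights are reused for both targets; only the moment equation being solved changes, which is the sense in which the weighting scheme is generic across quantities of interest.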

Suggested Citation

  • Susan Athey & Julie Tibshirani & Stefan Wager, 2016. "Generalized Random Forests," Papers 1610.01271, arXiv.org, revised Apr 2018.
  • Handle: RePEc:arx:papers:1610.01271

    Download full text from publisher

    File URL: http://arxiv.org/pdf/1610.01271
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Achim Zeileis, 2005. "A Unified Approach to Structural Change Tests Based on ML Scores, F Statistics, and OLS Residuals," Econometric Reviews, Taylor & Francis Journals, vol. 24(4), pages 445-466.
    2. S. Darolles & Y. Fan & J. P. Florens & E. Renault, 2011. "Nonparametric Instrumental Regression," Econometrica, Econometric Society, vol. 79(5), pages 1541-1565, September.
    3. Angrist, Joshua D, 1990. "Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records," American Economic Review, American Economic Association, vol. 80(3), pages 313-336, June.
    4. Stefan Wager & Susan Athey, 2018. "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(523), pages 1228-1242, July.
    5. A. Belloni & D. Chen & V. Chernozhukov & C. Hansen, 2012. "Sparse Models and Methods for Optimal Instruments With an Application to Eminent Domain," Econometrica, Econometric Society, vol. 80(6), pages 2369-2429, November.
    6. Imbens, Guido W & Angrist, Joshua D, 1994. "Identification and Estimation of Local Average Treatment Effects," Econometrica, Econometric Society, vol. 62(2), pages 467-475, March.
    7. Newey, Whitney K., 1994. "Kernel Estimation of Partial Means and a General Variance Estimator," Econometric Theory, Cambridge University Press, vol. 10(2), pages 233-253, June.
    8. Abadie, Alberto, 2003. "Semiparametric instrumental variable estimation of treatment response models," Journal of Econometrics, Elsevier, vol. 113(2), pages 231-263, April.
    9. van der Vaart, A. W., 2000. "Asymptotic Statistics," Cambridge Books, Cambridge University Press, number 9780521784504.
    10. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney K. Newey, 2016. "Double machine learning for treatment and causal parameters," CeMMAP working papers CWP49/16, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    11. Molinaro, Annette M. & Dudoit, Sandrine & van der Laan, Mark J., 2004. "Tree-based multivariate regression and density estimation with right-censored data," Journal of Multivariate Analysis, Elsevier, vol. 90(1), pages 154-177, July.
    12. Angrist, Joshua D & Evans, William N, 1998. "Children and Their Parents' Labor Supply: Evidence from Exogenous Variation in Family Size," American Economic Review, American Economic Association, vol. 88(3), pages 450-477, June.
    13. Ishwaran, Hemant & Kogalur, Udaya B., 2010. "Consistency of random survival forests," Statistics & Probability Letters, Elsevier, vol. 80(13-14), pages 1056-1064, July.
    14. Lewbel, Arthur, 2007. "A local generalized method of moments estimator," Economics Letters, Elsevier, vol. 94(1), pages 124-128, January.
    15. Ruoqing Zhu & Donglin Zeng & Michael R. Kosorok, 2015. "Reinforcement Learning Trees," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(512), pages 1770-1784, December.
    16. Ploberger, Werner & Kramer, Walter, 1992. "The CUSUM Test with OLS Residuals," Econometrica, Econometric Society, vol. 60(2), pages 271-285, March.
    17. Newey, Whitney K, 1994. "The Asymptotic Variance of Semiparametric Estimators," Econometrica, Econometric Society, vol. 62(6), pages 1349-1382, November.
    18. Victor Chernozhukov & Ivan Fernandez-Val & Christian Hansen, 2013. "Program evaluation with high-dimensional data," CeMMAP working papers CWP57/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    19. Hal R. Varian, 2014. "Big Data: New Tricks for Econometrics," Journal of Economic Perspectives, American Economic Association, vol. 28(2), pages 3-28, Spring.
    20. Jeffrey M Wooldridge, 2010. "Econometric Analysis of Cross Section and Panel Data," MIT Press Books, The MIT Press, edition 2, volume 1, number 0262232588, December.
    21. Wright, Marvin N. & Ziegler, Andreas, 2017. "ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 77(i01).
    22. Bradley Efron, 2014. "Estimation and Accuracy After Model Selection," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 991-1007, September.
    23. Achim Zeileis & Kurt Hornik, 2007. "Generalized M‐fluctuation tests for parameter instability," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 61(4), pages 488-508, November.
    24. van der Laan, Mark J. & Rubin, Daniel, 2006. "Targeted Maximum Likelihood Learning," The International Journal of Biostatistics, De Gruyter, vol. 2(1), pages 1-40, December.
    25. Sexton, Joseph & Laake, Petter, 2009. "Standard errors for bagged and random forest estimators," Computational Statistics & Data Analysis, Elsevier, vol. 53(3), pages 801-811, January.
    26. Gérard Biau & Erwan Scornet, 2016. "A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 197-227, June.
    27. Andrews, Donald W K, 1993. "Tests for Parameter Instability and Structural Change with Unknown Change Point," Econometrica, Econometric Society, vol. 61(4), pages 821-856, July.
    28. Hansen, Bruce E., 1992. "Testing for parameter instability in linear models," Journal of Policy Modeling, Elsevier, vol. 14(4), pages 517-533, August.
    29. Biau, Gérard & Devroye, Luc, 2010. "On the layered nearest neighbour estimate, the bagged nearest neighbour estimate and the random forest method in regression and classification," Journal of Multivariate Analysis, Elsevier, vol. 101(10), pages 2499-2518, November.
    30. Liangjun Su & Irina Murtazashvili & Aman Ullah, 2013. "Local Linear GMM Estimation of Functional Coefficient IV Models With an Application to Estimating the Rate of Return to Schooling," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 31(2), pages 184-207, April.
    31. Angrist, Joshua D, 1990. "Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records: Errata," American Economic Review, American Economic Association, vol. 80(5), pages 1284-1286, December.
    32. Lin, Yi & Jeon, Yongho, 2006. "Random Forests and Adaptive Nearest Neighbors," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 578-590, June.
    33. Gérard Biau & Erwan Scornet, 2016. "Rejoinder on: A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 264-268, June.
    34. Bo E. Honoré & Ekaterini Kyriazidou, 2000. "Panel Data Discrete Choice Models with Lagged Dependent Variables," Econometrica, Econometric Society, vol. 68(4), pages 839-874, July.
    35. Imbens, Guido W. & Rubin, Donald B., 2015. "Causal Inference for Statistics, Social, and Biomedical Sciences," Cambridge Books, Cambridge University Press, number 9780521885881.
    36. Ciampi, Antonio & Thiffault, Johanne & Nakache, Jean-Pierre & Asselain, Bernard, 1986. "Stratification by stepwise regression, correspondence analysis and recursive partition: a comparison of three methods of analysis for survival data with covariates," Computational Statistics & Data Analysis, Elsevier, vol. 4(3), pages 185-204, October.
    37. Whitney K. Newey, 2013. "Nonparametric Instrumental Variables Estimation," American Economic Review, American Economic Association, vol. 103(3), pages 550-556, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yiyi Huo & Yingying Fan & Fang Han, 2023. "On the adaptation of causal forests to manifold data," Papers 2311.16486, arXiv.org, revised Dec 2023.
    2. Guido W. Imbens, 2022. "Causality in Econometrics: Choice vs Chance," Econometrica, Econometric Society, vol. 90(6), pages 2541-2566, November.
    3. Frolich, Markus, 2007. "Nonparametric IV estimation of local average treatment effects with covariates," Journal of Econometrics, Elsevier, vol. 139(1), pages 35-75, July.
    4. Joshua D. Angrist, 2022. "Empirical Strategies in Economics: Illuminating the Path From Cause to Effect," Econometrica, Econometric Society, vol. 90(6), pages 2509-2539, November.
    5. Athey, Susan & Imbens, Guido W., 2019. "Machine Learning Methods Economists Should Know About," Research Papers 3776, Stanford University, Graduate School of Business.
    6. Zhexiao Lin & Fang Han, 2022. "On regression-adjusted imputation estimators of the average treatment effect," Papers 2212.05424, arXiv.org, revised Jan 2023.
    7. Patrick Krennmair & Timo Schmid, 2022. "Flexible domain prediction using mixed effects random forests," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1865-1894, November.
    8. Huber, Martin & Wüthrich, Kaspar, 2019. "Local Average and Quantile Treatment Effects Under Endogeneity: A Review," Journal of Econometric Methods, De Gruyter, vol. 8(1), pages 1-27, January.
    9. Michael Lechner, 2023. "Causal Machine Learning and its use for public policy," Swiss Journal of Economics and Statistics, Springer;Swiss Society of Economics and Statistics, vol. 159(1), pages 1-15, December.
    10. Manuel Denzer, 2019. "Estimating Causal Effects in Binary Response Models with Binary Endogenous Explanatory Variables - A Comparison of Possible Estimators," Working Papers 1916, Gutenberg School of Management and Economics, Johannes Gutenberg-Universität Mainz.
    11. Guido W. Imbens & Jeffrey M. Wooldridge, 2009. "Recent Developments in the Econometrics of Program Evaluation," Journal of Economic Literature, American Economic Association, vol. 47(1), pages 5-86, March.
    12. Guido W. Imbens, 2020. "Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics," Journal of Economic Literature, American Economic Association, vol. 58(4), pages 1129-1179, December.
    13. Susan Athey & Stefan Wager, 2021. "Policy Learning With Observational Data," Econometrica, Econometric Society, vol. 89(1), pages 133-161, January.
    14. Edgar Merkle & Achim Zeileis, 2013. "Tests of Measurement Invariance Without Subgroups: A Generalization of Classical Methods," Psychometrika, Springer;The Psychometric Society, vol. 78(1), pages 59-82, January.
    15. Augustine Denteh & Helge Liebert, 2022. "Who Increases Emergency Department Use? New Insights from the Oregon Health Insurance Experiment," Papers 2201.07072, arXiv.org, revised Apr 2023.
    16. Valente, Marica, 2023. "Policy evaluation of waste pricing programs using heterogeneous causal effect estimation," Journal of Environmental Economics and Management, Elsevier, vol. 117(C).
    17. Öberg, Stefan, 2021. "Treatment for natural experiments: How to improve causal estimates using conceptual definitions and substantive interpretations," SocArXiv pkyue, Center for Open Science.
    18. A. Belloni & V. Chernozhukov & I. Fernández‐Val & C. Hansen, 2017. "Program Evaluation and Causal Inference With High‐Dimensional Data," Econometrica, Econometric Society, vol. 85, pages 233-298, January.
    19. Susan Athey & Guido W. Imbens, 2017. "The State of Applied Econometrics: Causality and Policy Evaluation," Journal of Economic Perspectives, American Economic Association, vol. 31(2), pages 3-32, Spring.
    20. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
