IDEAS home Printed from https://ideas.repec.org/a/eee/stapro/v151y2019icp49-57.html
   My bibliography  Save this article

Consistent estimation of residual variance with random forest Out-Of-Bag errors

Author

Listed:
  • Ramosaj, Burim
  • Pauly, Markus

Abstract

The issue of estimating residual variance in regression models with unknown and eventually complex link-function is still an open problem. Predictions of such outcomes are usually conducted by black-box procedures without seriously restricting the link-function class. However, quantifying uncertainty by means of residual variance estimators is of primary interest in many practical applications, e.g. as a primary step towards the construction of prediction intervals. Here, we consider this issue for the random forest. Therein, the functional relationship between covariates and response variable is modeled by a weighted sum of the latter. The dependence structure is, however, involved in the weights that are constructed during the tree construction process making the model complex in mathematical analysis. Restricting to L2-consistent random forest models, we provide random forest based residual variance estimators and prove their consistency.

Suggested Citation

  • Ramosaj, Burim & Pauly, Markus, 2019. "Consistent estimation of residual variance with random forest Out-Of-Bag errors," Statistics & Probability Letters, Elsevier, vol. 151(C), pages 49-57.
  • Handle: RePEc:eee:stapro:v:151:y:2019:i:c:p:49-57
    DOI: 10.1016/j.spl.2019.03.017
    as

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167715219300938
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.spl.2019.03.017?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    as
    1. Stefan Wager & Susan Athey, 2018. "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(523), pages 1228-1242, July.
    2. Scornet, Erwan, 2016. "On the asymptotics of random forests," Journal of Multivariate Analysis, Elsevier, vol. 146(C), pages 72-83.
    3. Biau, Gérard & Devroye, Luc, 2010. "On the layered nearest neighbour estimate, the bagged nearest neighbour estimate and the random forest method in regression and classification," Journal of Multivariate Analysis, Elsevier, vol. 101(10), pages 2499-2518, November.
    4. Lin, Yi & Jeon, Yongho, 2006. "Random Forests and Adaptive Nearest Neighbors," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 578-590, June.
    5. Mendez, Guillermo & Lohr, Sharon, 2011. "Estimating residual variance in random forest regression," Computational Statistics & Data Analysis, Elsevier, vol. 55(11), pages 2937-2950, November.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhexiao Lin & Fang Han, 2022. "On regression-adjusted imputation estimators of the average treatment effect," Papers 2212.05424, arXiv.org, revised Jan 2023.
    2. Susan Athey & Julie Tibshirani & Stefan Wager, 2016. "Generalized Random Forests," Papers 1610.01271, arXiv.org, revised Apr 2018.
    3. Uguccioni, James, 2022. "The long-run effects of parental unemployment in childhood," CLEF Working Paper Series 45, Canadian Labour Economics Forum (CLEF), University of Waterloo.
    4. Patrick Krennmair & Timo Schmid, 2022. "Flexible domain prediction using mixed effects random forests," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1865-1894, November.
    5. Mendez, Guillermo & Lohr, Sharon, 2011. "Estimating residual variance in random forest regression," Computational Statistics & Data Analysis, Elsevier, vol. 55(11), pages 2937-2950, November.
    6. Jincheng Shen & Lu Wang & Jeremy M. G. Taylor, 2017. "Estimation of the optimal regime in treatment of prostate cancer recurrence from observational data using flexible weighting models," Biometrics, The International Biometric Society, vol. 73(2), pages 635-645, June.
    7. Raval, Devesh & Rosenbaum, Ted & Wilson, Nathan E., 2021. "How do machine learning algorithms perform in predicting hospital choices? evidence from changing environments," Journal of Health Economics, Elsevier, vol. 78(C).
    8. Emilio Carrizosa & Cristina Molero-Río & Dolores Romero Morales, 2021. "Mathematical optimization in classification and regression trees," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 29(1), pages 5-33, April.
    9. Lundberg, Ian & Brand, Jennie E. & Jeon, Nanum, 2022. "Researcher reasoning meets computational capacity: Machine learning for social science," SocArXiv s5zc8, Center for Open Science.
    10. Yifei Sun & Sy Han Chiou & Mei‐Cheng Wang, 2020. "ROC‐guided survival trees and ensembles," Biometrics, The International Biometric Society, vol. 76(4), pages 1177-1189, December.
    11. Charles B. Perkins & J. Christina Wang, 2019. "How Magic a Bullet Is Machine Learning for Credit Analysis? An Exploration with FinTech Lending Data," Working Papers 19-16, Federal Reserve Bank of Boston.
    12. Borup, Daniel & Christensen, Bent Jesper & Mühlbach, Nicolaj Søndergaard & Nielsen, Mikkel Slot, 2023. "Targeting predictors in random forest regression," International Journal of Forecasting, Elsevier, vol. 39(2), pages 841-868.
    13. Gérard Biau & Erwan Scornet, 2016. "A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 197-227, June.
    14. Guoyi Zhang & Yan Lu, 2012. "Bias-corrected random forests in regression," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(1), pages 151-160, March.
    15. Lechner, Michael, 2018. "Modified Causal Forests for Estimating Heterogeneous Causal Effects," IZA Discussion Papers 12040, Institute of Labor Economics (IZA).
    16. William Arbour, 2021. "Can Recidivism be Prevented from Behind Bars? Evidence from a Behavioral Program," Working Papers tecipa-683, University of Toronto, Department of Economics.
    17. Dimitris Bertsimas & Agni Orfanoudaki & Rory B. Weiner, 2020. "Personalized treatment for coronary artery disease patients: a machine learning approach," Health Care Management Science, Springer, vol. 23(4), pages 482-506, December.
    18. Stephen Jarvis & Olivier Deschenes & Akshaya Jha, 2022. "The Private and External Costs of Germany’s Nuclear Phase-Out," Journal of the European Economic Association, European Economic Association, vol. 20(3), pages 1311-1346.
    19. Hayakawa, Kazunobu & Keola, Souknilanh & Silaphet, Korrakoun & Yamanouchi, Kenta, 2022. "Estimating the impacts of international bridges on foreign firm locations: a machine learning approach," IDE Discussion Papers 847, Institute of Developing Economies, Japan External Trade Organization(JETRO).
    20. Davide Viviano & Jelena Bradic, 2019. "Synthetic learner: model-free inference on treatments over time," Papers 1904.01490, arXiv.org, revised Aug 2022.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:stapro:v:151:y:2019:i:c:p:49-57. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu (email available below). General contact details of provider: http://www.elsevier.com/wps/find/journaldescription.cws_home/622892/description#description .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.