Printed from https://ideas.repec.org/p/arx/papers/2511.13934.html

Empirical Likelihood for Random Forests and Ensembles

Author

Listed:
  • Harold D. Chiang
  • Yukitoshi Matsushita
  • Taisuke Otsu

Abstract

We develop an empirical likelihood (EL) framework for random forests and related ensemble methods, providing a likelihood-based approach to quantify their statistical uncertainty. Exploiting the incomplete $U$-statistic structure inherent in ensemble predictions, we construct an EL statistic that is asymptotically chi-squared when subsampling induced by incompleteness is not overly sparse. Under sparser subsampling regimes, the EL statistic tends to over-cover due to loss of pivotality; we therefore propose a modified EL that restores pivotality through a simple adjustment. Our method retains key properties of EL while remaining computationally efficient. Theory for honest random forests and simulations demonstrate that modified EL achieves accurate coverage and practical reliability relative to existing inference methods.
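The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' construction: `incomplete_u_statistic` and `el_statistic_for_mean` are hypothetical names, the first averages a user-supplied kernel over randomly drawn subsamples (the incomplete $U$-statistic form of an ensemble prediction), and the second computes the textbook empirical likelihood ratio statistic for a scalar mean. The paper's EL statistic for forests and its modified version are not reproduced here.

```python
import math
import random


def incomplete_u_statistic(data, kernel, subsample_size, num_subsamples, rng):
    """Average `kernel` over randomly drawn subsamples (without replacement).

    A complete U-statistic would average over every subset of the given size;
    using only `num_subsamples` random subsets makes it "incomplete".
    """
    total = 0.0
    for _ in range(num_subsamples):
        subset = rng.sample(data, subsample_size)
        total += kernel(subset)
    return total / num_subsamples


def el_statistic_for_mean(values, theta, tol=1e-12, max_iter=200):
    """Textbook EL statistic -2 log R(theta) for the mean of `values`.

    Solves sum_i z_i / (1 + lam * z_i) = 0 with z_i = values[i] - theta by
    bisection, then returns 2 * sum_i log(1 + lam * z_i).
    """
    z = [v - theta for v in values]
    z_min, z_max = min(z), max(z)
    if not (z_min < 0.0 < z_max):
        # theta is outside the convex hull of the data: EL ratio is zero.
        return math.inf

    # All implied weights stay positive only for lam in (-1/z_max, -1/z_min).
    lo, hi = -1.0 / z_max, -1.0 / z_min
    width = hi - lo
    lo += 1e-10 * width
    hi -= 1e-10 * width

    def score(lam):
        return sum(zi / (1.0 + lam * zi) for zi in z)

    # score is strictly decreasing in lam, so bisection finds its unique root.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
    # Toy "ensemble": each base learner is the mean of a subsample of size 20.
    prediction = incomplete_u_statistic(
        sample, lambda s: sum(s) / len(s), 20, 500, rng
    )
    print("ensemble prediction:", prediction)
    print("EL statistic at theta = 0:", el_statistic_for_mean(sample, 0.0))
```

In the textbook i.i.d. mean setting, a confidence set collects the values of `theta` whose statistic falls below a chi-squared critical value with one degree of freedom. Whether that calibration carries over to ensemble predictions is exactly what the abstract addresses, so this sketch should not be read as a valid inference procedure for forests.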

Suggested Citation

  • Harold D. Chiang & Yukitoshi Matsushita & Taisuke Otsu, 2025. "Empirical Likelihood for Random Forests and Ensembles," Papers 2511.13934, arXiv.org.
  • Handle: RePEc:arx:papers:2511.13934

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2511.13934
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Emre Demirkaya & Yingying Fan & Lan Gao & Jinchi Lv & Patrick Vossler & Jingbo Wang, 2024. "Optimal Nonparametric Inference with Two-Scale Distributional Nearest Neighbors," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 119(545), pages 297-307, January.
    2. Scornet, Erwan, 2016. "On the asymptotics of random forests," Journal of Multivariate Analysis, Elsevier, vol. 146(C), pages 72-83.
    3. Yuichi Kitamura, 2006. "Empirical Likelihood Methods in Econometrics: Theory and Practice," Levine's Bibliography 321307000000000307, UCLA Department of Economics.
    4. Lixing Zhu & Liugen Xue, 2006. "Empirical likelihood confidence regions in a partially linear single‐index model," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 68(3), pages 549-570, June.
    5. Stefan Wager & Susan Athey, 2018. "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(523), pages 1228-1242, July.
    6. Gérard Biau & Erwan Scornet, 2016. "A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 197-227, June.
    7. Jing, Bing-Yi & Yuan, Junqing & Zhou, Wang, 2009. "Jackknife Empirical Likelihood," Journal of the American Statistical Association, American Statistical Association, vol. 104(487), pages 1224-1232.
    8. Bravo, Francesco & Escanciano, Juan Carlos & Van Keilegom, Ingrid, 2020. "Two-Step Semiparametric Empirical Likelihood Inference," LIDAM Reprints ISBA 2020046, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    9. Yuichi Kitamura, 2006. "Empirical Likelihood Methods in Econometrics: Theory and Practice," Cowles Foundation Discussion Papers 1569, Cowles Foundation for Research in Economics, Yale University.
    10. Song Chen & Ingrid Van Keilegom, 2009. "A review on empirical likelihood methods for regression," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 18(3), pages 415-447, November.
    11. Yuichi Kitamura, 2006. "Empirical Likelihood Methods in Econometrics: Theory and Practice," CIRJE F-Series CIRJE-F-430, CIRJE, Faculty of Economics, University of Tokyo.
    12. Song Chen & Ingrid Van Keilegom, 2009. "Rejoinder on: A review on empirical likelihood methods for regression," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 18(3), pages 468-474, November.
    13. Yukitoshi Matsushita & Taisuke Otsu, 2024. "Empirical Likelihood for Network Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 119(547), pages 2117-2128, July.
    14. Gérard Biau & Erwan Scornet, 2016. "Rejoinder on: A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 264-268, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chiang, Harold D. & Matsushita, Yukitoshi & Otsu, Taisuke, 2025. "Multiway empirical likelihood," Journal of Econometrics, Elsevier, vol. 249(PA).
    2. Song, Yichun, 2025. "A Frisch-Waugh-Lovell theorem for empirical likelihood," Computational Statistics & Data Analysis, Elsevier, vol. 211(C).
    3. Lin, Zhexiao & Han, Fang, 2025. "On regression-adjusted imputation estimators of average treatment effects," Journal of Econometrics, Elsevier, vol. 251(C).
    4. Emilio Carrizosa & Cristina Molero-Río & Dolores Romero Morales, 2021. "Mathematical optimization in classification and regression trees," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 29(1), pages 5-33, April.
    5. Harold D Chiang & Yukitoshi Matsushita & Taisuke Otsu, 2021. "Multiway empirical likelihood," Papers 2108.04852, arXiv.org, revised Aug 2024.
    6. Zhexiao Lin & Fang Han, 2022. "On regression-adjusted imputation estimators of the average treatment effect," Papers 2212.05424, arXiv.org, revised Jan 2023.
    7. Harold D Chiang & Yukitoshi Matsushita & Taisuke Otsu, 2021. "Multiway empirical likelihood," STICERD - Econometrics Paper Series 617, Suntory and Toyota International Centres for Economics and Related Disciplines, LSE.
    8. Patrick Krennmair & Timo Schmid, 2022. "Flexible domain prediction using mixed effects random forests," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1865-1894, November.
    9. Borup, Daniel & Christensen, Bent Jesper & Mühlbach, Nicolaj Søndergaard & Nielsen, Mikkel Slot, 2023. "Targeting predictors in random forest regression," International Journal of Forecasting, Elsevier, vol. 39(2), pages 841-868.
    10. Yiyi Huo & Yingying Fan & Fang Han, 2023. "On the adaptation of causal forests to manifold data," Papers 2311.16486, arXiv.org, revised Dec 2023.
    11. Escribano, Álvaro & Wang, Dandan, 2021. "Mixed random forest, cointegration, and forecasting gasoline prices," International Journal of Forecasting, Elsevier, vol. 37(4), pages 1442-1462.
    12. Yigit Aydede & Jan Ditzen, 2022. "Identifying the regional drivers of influenza-like illness in Nova Scotia with dominance analysis," Papers 2212.06684, arXiv.org.
    13. Lotfi Boudabsa & Damir Filipović, 2022. "Ensemble learning for portfolio valuation and risk management," Papers 2204.05926, arXiv.org.
    14. Li, Minqiang & Peng, Liang & Qi, Yongcheng, 2011. "Reduce computation in profile empirical likelihood method," MPRA Paper 33744, University Library of Munich, Germany.
    15. Daniel Boller & Michael Lechner & Gabriel Okasa, 2025. "The effect of sport in online dating: evidence from causal machine learning," Humanities and Social Sciences Communications, Palgrave Macmillan, vol. 12(1), pages 1-13, December.
    16. Stefan Boes, 2007. "Count Data Models with Unobserved Heterogeneity: An Empirical Likelihood Approach," SOI - Working Papers 0704, Socioeconomic Institute - University of Zurich.
    17. Zhang, Rongmao & Peng, Liang & Qi, Yongcheng, 2012. "Jackknife-blockwise empirical likelihood methods under dependence," Journal of Multivariate Analysis, Elsevier, vol. 104(1), pages 56-72, February.
    18. Max Biggs & Rim Hariss & Georgia Perakis, 2023. "Constrained optimization of objective functions determined from random forests," Production and Operations Management, Production and Operations Management Society, vol. 32(2), pages 397-415, February.
    19. Valente, Marica, 2023. "Policy evaluation of waste pricing programs using heterogeneous causal effect estimation," Journal of Environmental Economics and Management, Elsevier, vol. 117(C).
    20. Xiao, Zhiguo, 2010. "The weighted method of moments approach for moment condition models," Economics Letters, Elsevier, vol. 107(2), pages 183-186, May.

