IDEAS record: https://ideas.repec.org/a/bla/jorssc/v71y2022i5p1865-1894.html

Flexible domain prediction using mixed effects random forests

Authors

  • Patrick Krennmair
  • Timo Schmid

Abstract

This paper promotes the use of random forests as versatile tools for estimating spatially disaggregated indicators when area‐specific sample sizes are small. Small area estimators are predominantly conceptualised within the regression setting and rely on linear mixed models to account for the hierarchical structure of the survey data. In contrast, machine learning methods offer non‐linear and non‐parametric alternatives, combining excellent predictive performance with a reduced risk of model misspecification. Mixed effects random forests combine the advantages of regression forests with the ability to model hierarchical dependencies. This paper provides a coherent framework based on mixed effects random forests for estimating small area averages and proposes a non‐parametric bootstrap estimator for assessing the uncertainty of the estimates. We illustrate the advantages of the proposed methodology using Mexican income data from the state of Nuevo León. Finally, the methodology is evaluated in model‐based and design‐based simulations that compare it with traditional regression‐based approaches for estimating small area averages.
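The core idea of the abstract can be illustrated with a minimal, pure-Python sketch, assuming a random-intercept model y_ij = f(x_ij) + u_i + e_ij. An EM-type loop, in the spirit of the mixed effects tree and forest algorithms of Hajjem and co-authors, alternates between fitting a forest to y_ij - u_i and shrinking each area's mean residual towards zero. An area mean is then predicted as the average of f over that area's population units plus u_i, with u_i = 0 for unsampled areas. Everything here is an illustrative assumption rather than the authors' code: the toy forest, the hypothetical function names `fit_forest`, `fit_merf` and `area_means`, the tuning defaults, the variance-component updates (standard random-intercept EM) and the availability of unit-level population covariates. The paper's non-parametric bootstrap MSE estimator is not shown.

```python
import random
from statistics import mean


def _best_split(rows, ys, k, min_leaf):
    """Best threshold on feature k by total squared error, via a prefix-sum scan."""
    pairs = sorted(zip((r[k] for r in rows), ys))
    n = len(pairs)
    tot, tot2 = sum(y for _, y in pairs), sum(y * y for _, y in pairs)
    s = s2 = 0.0
    best = None
    for i in range(n - 1):
        x, y = pairs[i]
        s, s2 = s + y, s2 + y * y
        nl, nr = i + 1, n - i - 1
        if nl < min_leaf or nr < min_leaf or x == pairs[i + 1][0]:
            continue
        sse = (s2 - s * s / nl) + ((tot2 - s2) - (tot - s) ** 2 / nr)
        if best is None or sse < best[0]:
            best = (sse, (x + pairs[i + 1][0]) / 2)
    return best


def _grow(rows, ys, depth, min_leaf, mtry, rng):
    """Regression tree: leaves are floats, internal nodes are (k, t, left, right)."""
    if depth == 0 or len(ys) < 2 * min_leaf:
        return mean(ys)
    cands = []
    for k in rng.sample(range(len(rows[0])), min(mtry, len(rows[0]))):  # random feature subset
        split = _best_split(rows, ys, k, min_leaf)
        if split is not None:
            cands.append((split[0], k, split[1]))
    if not cands:
        return mean(ys)
    _, k, t = min(cands)
    li = [i for i, r in enumerate(rows) if r[k] <= t]
    ri = [i for i, r in enumerate(rows) if r[k] > t]
    left = _grow([rows[i] for i in li], [ys[i] for i in li], depth - 1, min_leaf, mtry, rng)
    right = _grow([rows[i] for i in ri], [ys[i] for i in ri], depth - 1, min_leaf, mtry, rng)
    return (k, t, left, right)


def _predict(node, x):
    while isinstance(node, tuple):
        k, t, left, right = node
        node = left if x[k] <= t else right
    return node


def fit_forest(X, y, n_trees=30, depth=3, min_leaf=5, mtry=1, seed=0):
    """Toy random forest (hypothetical): bagged trees, random feature subset per split."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]  # bootstrap sample
        trees.append(_grow([X[i] for i in idx], [y[i] for i in idx], depth, min_leaf, mtry, rng))
    return lambda x: mean(_predict(t, x) for t in trees)


def fit_merf(X, y, area, n_iter=10, **forest_args):
    """EM-type fit of y_ij = f(x_ij) + u_i + e_ij (runs at least one iteration)."""
    areas = sorted(set(area))
    u = {a: 0.0 for a in areas}
    s2_u, s2_e = 1.0, 1.0  # assumed starting values for the variance components
    for _ in range(max(n_iter, 1)):
        # 1) Fit the forest to the response with the current random effects removed.
        f = fit_forest(X, [yi - u[a] for yi, a in zip(y, area)], **forest_args)
        fx = [f(x) for x in X]
        sse_e = sum_u = 0.0
        for a in areas:
            r = [yi - fi for yi, fi, ai in zip(y, fx, area) if ai == a]
            # 2) BLUP-type shrinkage of the area's mean residual towards zero.
            gamma = s2_u / (s2_u + s2_e / len(r))
            u[a] = gamma * mean(r)
            sse_e += sum((ri - u[a]) ** 2 for ri in r) + s2_e * gamma
            sum_u += u[a] ** 2 + s2_u * (1 - gamma)
        # 3) EM updates of the residual and random-intercept variances.
        s2_e, s2_u = sse_e / len(y), sum_u / len(areas)
    return f, u


def area_means(f, u, population):
    """Mean of f over each area's population units plus u (0 for unsampled areas)."""
    return {a: mean(f(x) for x in rows) + u.get(a, 0.0) for a, rows in population.items()}


# Usage on synthetic data: a step-shaped f plus area intercepts +2, 0 and -2.
rng = random.Random(1)
X, y, area = [], [], []
for a, ua in {"A": 2.0, "B": 0.0, "C": -2.0}.items():
    for _ in range(20):
        x = rng.uniform(0, 10)
        X.append([x])
        area.append(a)
        y.append((3.0 if x > 5 else 0.0) + ua + rng.gauss(0, 0.5))
f, u = fit_merf(X, y, area)
grid = [[float(v)] for v in range(11)]
means = area_means(f, u, {"A": grid, "B": grid, "C": grid, "D": grid})  # "D" is unsampled
```

In a real application `fit_forest` would be replaced by a full random forest implementation. Because this sketch uses in-sample forest predictions, which tend to overfit, the residual variance `s2_e` it estimates is likely understated.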

Suggested Citation

  • Patrick Krennmair & Timo Schmid, 2022. "Flexible domain prediction using mixed effects random forests," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1865-1894, November.
  • Handle: RePEc:bla:jorssc:v:71:y:2022:i:5:p:1865-1894
    DOI: 10.1111/rssc.12600

    Download full text from publisher

    File URL: https://doi.org/10.1111/rssc.12600
    Download Restriction: no

    References listed on IDEAS

    1. Stefan Wager & Susan Athey, 2018. "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(523), pages 1228-1242, July.
    2. Monique Graf & J. Miguel Marín & Isabel Molina, 2019. "A generalized mixed model for skewed distributions applied to small area estimation," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(2), pages 565-597, June.
    3. Bilton, Penny & Jones, Geoff & Ganesh, Siva & Haslett, Steve, 2017. "Classification trees for poverty mapping," Computational Statistics & Data Analysis, Elsevier, vol. 115(C), pages 53-66.
    4. Peter Hall & Tapabrata Maiti, 2006. "On parametric bootstrap methods for small area prediction," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 68(2), pages 221-238, April.
    5. Alfons, Andreas & Templ, Matthias, 2013. "Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 54(i15).
    6. Nikos Tzavidis & Li‐Chun Zhang & Angela Luna & Timo Schmid & Natalia Rojas‐Perilla, 2018. "From start to finish: a framework for the production of small area official statistics," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 181(4), pages 927-979, October.
    7. Gérard Biau & Erwan Scornet, 2016. "Rejoinder on: A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 264-268, June.
    8. Mendez, Guillermo & Lohr, Sharon, 2011. "Estimating residual variance in random forest regression," Computational Statistics & Data Analysis, Elsevier, vol. 55(11), pages 2937-2950, November.
    9. Sexton, Joseph & Laake, Petter, 2009. "Standard errors for bagged and random forest estimators," Computational Statistics & Data Analysis, Elsevier, vol. 53(3), pages 801-811, January.
    10. Sugasawa, Shonosuke & Kubokawa, Tatsuya, 2017. "Transforming response values in small area prediction," Computational Statistics & Data Analysis, Elsevier, vol. 114(C), pages 47-60.
    11. Gérard Biau & Erwan Scornet, 2016. "A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 197-227, June.
    12. Hal R. Varian, 2014. "Big Data: New Tricks for Econometrics," Journal of Economic Perspectives, American Economic Association, vol. 28(2), pages 3-28, Spring.
    13. Timo Schmid & Fabian Bruckschen & Nicola Salvati & Till Zbiranski, 2017. "Constructing sociodemographic indicators for national statistical institutes by using mobile phone data: estimating literacy rates in Senegal," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 180(4), pages 1163-1190, October.
    14. Hajjem, Ahlem & Bellavance, François & Larocque, Denis, 2011. "Mixed effects regression trees for clustered data," Statistics & Probability Letters, Elsevier, vol. 81(4), pages 451-459, April.
    15. Frederic Lambert & Hyunmin Park, 2019. "Income Inequality and Government Transfers in Mexico," IMF Working Papers 2019/148, International Monetary Fund.
    16. J. D. Opsomer & G. Claeskens & M. G. Ranalli & G. Kauermann & F. J. Breidt, 2008. "Non‐parametric small area estimation using penalized spline regression," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(1), pages 265-286, February.

    Citations

    Cited by:

    1. Tomasz Żądło & Adam Chwila, 2024. "A step towards the integration of machine learning and small area estimation," Papers 2402.07521, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Susan Athey & Julie Tibshirani & Stefan Wager, 2016. "Generalized Random Forests," Papers 1610.01271, arXiv.org, revised Apr 2018.
    2. Valente, Marica, 2023. "Policy evaluation of waste pricing programs using heterogeneous causal effect estimation," Journal of Environmental Economics and Management, Elsevier, vol. 117(C).
    3. Nikos Tzavidis & Li‐Chun Zhang & Angela Luna & Timo Schmid & Natalia Rojas‐Perilla, 2018. "From start to finish: a framework for the production of small area official statistics," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 181(4), pages 927-979, October.
    4. Manuel J. García Rodríguez & Vicente Rodríguez Montequín & Francisco Ortega Fernández & Joaquín M. Villanueva Balsera, 2019. "Public Procurement Announcements in Spain: Regulations, Data Analysis, and Award Price Estimator Using Machine Learning," Complexity, Hindawi, vol. 2019, pages 1-20, November.
    5. Borup, Daniel & Christensen, Bent Jesper & Mühlbach, Nicolaj Søndergaard & Nielsen, Mikkel Slot, 2023. "Targeting predictors in random forest regression," International Journal of Forecasting, Elsevier, vol. 39(2), pages 841-868.
    6. Yiyi Huo & Yingying Fan & Fang Han, 2023. "On the adaptation of causal forests to manifold data," Papers 2311.16486, arXiv.org, revised Dec 2023.
    7. Escribano, Álvaro & Wang, Dandan, 2021. "Mixed random forest, cointegration, and forecasting gasoline prices," International Journal of Forecasting, Elsevier, vol. 37(4), pages 1442-1462.
    8. Yigit Aydede & Jan Ditzen, 2022. "Identifying the regional drivers of influenza-like illness in Nova Scotia with dominance analysis," Papers 2212.06684, arXiv.org.
    9. Daniel Boller & Michael Lechner & Gabriel Okasa, 2021. "The Effect of Sport in Online Dating: Evidence from Causal Machine Learning," Papers 2104.04601, arXiv.org.
    10. Filmer,Deon P. & Nahata,Vatsal & Sabarwal,Shwetlena, 2021. "Preparation, Practice, and Beliefs : A Machine Learning Approach to Understanding Teacher Effectiveness," Policy Research Working Paper Series 9847, The World Bank.
    11. Max Biggs & Rim Hariss & Georgia Perakis, 2023. "Constrained optimization of objective functions determined from random forests," Production and Operations Management, Production and Operations Management Society, vol. 32(2), pages 397-415, February.
    12. Marchetti Stefano & Tzavidis Nikos, 2021. "Robust Estimation of the Theil Index and the Gini Coefficient for Small Areas," Journal of Official Statistics, Sciendo, vol. 37(4), pages 955-979, December.
    13. Kayo Murakami & Hideki Shimada & Yoshiaki Ushifusa & Takanori Ida, 2022. "Heterogeneous Treatment Effects Of Nudge And Rebate: Causal Machine Learning In A Field Experiment On Electricity Conservation," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 63(4), pages 1779-1803, November.
    14. Nora Würz & Timo Schmid & Nikos Tzavidis, 2022. "Estimating regional income indicators under transformations and access to limited population auxiliary information," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(4), pages 1679-1706, October.
    15. Tzavidis, Nikos & Zhang, Li-Chun & Luna Hernandez, Angela & Schmid, Timo & Rojas-Perilla, Natalia, 2016. "From start to finish: A framework for the production of small area official statistics," Discussion Papers 2016/13, Free University Berlin, School of Business & Economics.
    16. Natalia Rojas‐Perilla & Sören Pannier & Timo Schmid & Nikos Tzavidis, 2020. "Data‐driven transformations in small area estimation," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 183(1), pages 121-148, January.
    17. Paul Walter & Marcus Groß & Timo Schmid & Nikos Tzavidis, 2021. "Domain prediction with grouped income data," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(4), pages 1501-1523, October.
    18. Ann-Kristin Kreutzmann, 2018. "Estimation of sample quantiles: challenges and issues in the context of income and wealth distributions [Die Schätzung von Quantilen: Herausforderungen und Probleme im Kontext von Einkommens- und V," AStA Wirtschafts- und Sozialstatistisches Archiv, Springer;Deutsche Statistische Gesellschaft - German Statistical Society, vol. 12(3), pages 245-270, December.
    19. Pedro Forquesato, 2022. "Who Benefits from Political Connections in Brazilian Municipalities," Papers 2204.09450, arXiv.org.
    20. Emilio Carrizosa & Cristina Molero-Río & Dolores Romero Morales, 2021. "Mathematical optimization in classification and regression trees," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 29(1), pages 5-33, April.
