
Distilling interpretable causal trees from causal forests

Author

  • Patrick Rehill

Abstract

Machine learning methods for estimating treatment effect heterogeneity promise greater flexibility than existing methods that test a few pre-specified hypotheses. However, it can be challenging to extract insights from complicated machine learning models. A high-dimensional distribution of conditional average treatment effects may give accurate, individual-level estimates, but it can be hard to understand the underlying patterns and to know what the implications of the analysis are. This paper proposes the Distilled Causal Tree, a method for distilling a single, interpretable causal tree from a causal forest. The method compares well to existing methods of extracting a single tree, particularly in noisy data or high-dimensional data with many correlated features, where it even outperforms the base causal forest in most simulations. Its estimates are doubly robust and asymptotically normal, just as those of the causal forest are.
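The abstract does not spell out the algorithm, so the following is only a generic sketch of the distillation idea it describes: fitting one shallow regression tree to the effect estimates produced by a more complex "teacher" model. It is not the paper's Distilled Causal Tree procedure and it does not reproduce the doubly robust estimation step. The teacher estimates here are a simulated stand-in for causal forest output, and every name (`fit_tree`, `predict`, `teacher_cate`) is hypothetical.

```python
import random


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_tree(X, y, depth, min_leaf=20):
    """Greedy CART-style regression tree fitted to teacher estimates y."""
    node = {"value": sum(y) / len(y), "n": len(y)}
    if depth == 0:
        return node
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            loss = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or loss < best[0]:
                best = (loss, j, t, left, right)
    if best is None:  # no admissible split: keep this node as a leaf
        return node
    _, j, t, left, right = best
    node.update(
        feature=j,
        threshold=t,
        left=fit_tree([X[i] for i in left], [y[i] for i in left], depth - 1, min_leaf),
        right=fit_tree([X[i] for i in right], [y[i] for i in right], depth - 1, min_leaf),
    )
    return node


def predict(node, row):
    """Walk from the root to a leaf and return that leaf's mean effect."""
    while "feature" in node:
        go_left = row[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["value"]


# Simulated stand-in for causal forest output (an assumption for illustration):
# the true effect is 2 when feature 0 exceeds 0.5 and 0 otherwise, plus noise.
rng = random.Random(0)
n = 300
X = [[rng.random() for _ in range(3)] for _ in range(n)]
teacher_cate = [(2.0 if row[0] > 0.5 else 0.0) + rng.gauss(0.0, 0.3) for row in X]

# Distil the noisy individual-level estimates into a single one-split tree.
tree = fit_tree(X, teacher_cate, depth=1)
print("split feature:", tree["feature"], "threshold:", round(tree["threshold"], 3))
print("leaf effects:", round(tree["left"]["value"], 2), round(tree["right"]["value"], 2))
```

The design point this illustrates is the trade-off the abstract raises: the student tree discards individual-level detail in exchange for a small set of subgroups that can be read directly from the split rules.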

Suggested Citation

  • Patrick Rehill, 2024. "Distilling interpretable causal trees from causal forests," Papers 2408.01023, arXiv.org.
  • Handle: RePEc:arx:papers:2408.01023

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2408.01023
    File Function: Latest version
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Patrick Rehill, 2024. "How do applied researchers use the Causal Forest? A methodological review of a method," Papers 2404.13356, arXiv.org, revised Dec 2024.
    2. Daniel Goller, 2023. "Analysing a built-in advantage in asymmetric darts contests using causal machine learning," Annals of Operations Research, Springer, vol. 325(1), pages 649-679, June.
    3. Patrick Rehill & Nicholas Biddle, 2024. "Heterogeneous treatment effect estimation with high-dimensional data in public policy evaluation -- an application to the conditioning of cash transfers in Morocco using causal machine learning," Papers 2401.07075, arXiv.org, revised Mar 2024.
    4. Heejun Shin & Joseph Antonelli, 2023. "Improved inference for doubly robust estimators of heterogeneous treatment effects," Biometrics, The International Biometric Society, vol. 79(4), pages 3140-3152, December.
    5. Heiler, Phillip, 2024. "Heterogeneous treatment effect bounds under sample selection with an application to the effects of social media on political polarization," Journal of Econometrics, Elsevier, vol. 244(1).
    6. Huber, Martin & Meier, Jonas & Wallimann, Hannes, 2022. "Business analytics meets artificial intelligence: Assessing the demand effects of discounts on Swiss train tickets," Transportation Research Part B: Methodological, Elsevier, vol. 163(C), pages 22-39.
    7. Henrika Langen & Martin Huber, 2023. "How causal machine learning can leverage marketing strategies: Assessing and improving the performance of a coupon campaign," PLOS ONE, Public Library of Science, vol. 18(1), pages 1-37, January.
    8. Liu, Nan & Liu, Yanbo & Sasaki, Yuya, 2026. "Estimation and inference for causal functions with multi-way clustered data," Journal of Econometrics, Elsevier, vol. 253(C).
    9. Martin Huber & Jannis Kueck, 2022. "Testing the identification of causal effects in observational data," Papers 2203.15890, arXiv.org, revised May 2026.
    10. Phillip Heiler & Michael C. Knaus, 2021. "Effect or Treatment Heterogeneity? Policy Evaluation with Aggregated and Disaggregated Treatments," Papers 2110.01427, arXiv.org, revised Aug 2023.
    11. Patrick Rehill & Nicholas Biddle, 2023. "Transparency challenges in policy evaluation with causal machine learning -- improving usability and accountability," Papers 2310.13240, arXiv.org, revised Mar 2024.
    12. Ganesh Karapakula, 2023. "Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap," Papers 2301.05703, arXiv.org, revised Jan 2023.
    13. Patrick Rehill & Nicholas Biddle, 2025. "Policy Learning for Many Outcomes of Interest: Combining Optimal Policy Trees with Multi-objective Bayesian Optimisation," Computational Economics, Springer;Society for Computational Economics, vol. 66(2), pages 971-1001, August.
    14. Yingheng Zhang & Haojie Li & Gang Ren, 2025. "Data-driven exploration of heterogeneous gasoline price elasticities using generalized random forests," Transportation, Springer, vol. 52(1), pages 215-237, February.
    15. Yao Cui & Andrew M. Davis, 2022. "Tax-Induced Inequalities in the Sharing Economy," Management Science, INFORMS, vol. 68(10), pages 7202-7220, October.
    16. Emilio Carrizosa & Cristina Molero-Río & Dolores Romero Morales, 2021. "Mathematical optimization in classification and regression trees," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 29(1), pages 5-33, April.
    17. Franziska Zimmert & Michael Zimmert, 2024. "Part‐time subsidies and maternal reemployment: Evidence from a difference‐in‐differences analysis," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 39(6), pages 1149-1171, September.
    18. Dana Turjeman & Fred M. Feinberg, 2024. "When the Data Are Out: Measuring Behavioral Changes Following a Data Breach," Marketing Science, INFORMS, vol. 43(2), pages 440-461, March.
    19. Mohammed Alyakoob & Mohammad Rahman, 2026. "Market Design Choices, Racial Discrimination, and Equitable Microentrepreneurship in Digital Marketplaces," Management Science, INFORMS, vol. 72(3), pages 1878-1903, March.
    20. Achim Ahrens & Victor Chernozhukov & Christian Hansen & Damian Kozbur & Mark Schaffer & Thomas Wiemann, 2025. "An Introduction to Double/Debiased Machine Learning," Papers 2504.08324, arXiv.org, revised Feb 2026.
