IDEAS home Printed from https://ideas.repec.org/p/arx/papers/2502.15072.html
   My bibliography  Save this paper

Modifying Final Splits of Classification Tree for Fine-tuning Subpopulation Target in Policy Making

Author

Listed:
  • Lei Bill Wang
  • Zhenbang Jiao
  • Fangyi Wang

Abstract

Policymakers often use Classification and Regression Trees (CART) to partition populations based on binary outcomes and target subpopulations whose probability of the binary event exceeds a threshold. However, classic CART and knowledge distillation method whose student model is a CART (referred to as KD-CART) do not minimize the misclassification risk associated with classifying the latent probabilities of these binary events. To reduce the misclassification risk, we propose two methods, Penalized Final Split (PFS) and Maximizing Distance Final Split (MDFS). PFS incorporates a tunable penalty into the standard CART splitting criterion function. MDFS maximizes a weighted sum of distances between node means and the threshold. It can point-identify the optimal split under the unique intersect latent probability assumption. In addition, we develop theoretical result for MDFS splitting rule estimation, which has zero asymptotic risk. Through extensive simulation studies, we demonstrate that these methods predominately outperform classic CART and KD-CART in terms of misclassification error. Furthermore, in our empirical evaluations, these methods provide deeper insights than the two baseline methods.

Suggested Citation

  • Lei Bill Wang & Zhenbang Jiao & Fangyi Wang, 2025. "Modifying Final Splits of Classification Tree for Fine-tuning Subpopulation Target in Policy Making," Papers 2502.15072, arXiv.org.
  • Handle: RePEc:arx:papers:2502.15072
    as

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2502.15072
    File Function: Latest version
    Download Restriction: no
    ---><---

    References listed on IDEAS

    as
    1. Ghysels, Eric & Babii, Andrii & Chen, Xi & Kumar, Rohit, 2020. "Binary Choice with Asymmetric Loss in a Data-Rich Environment: Theory and an Application to Racial Justice," CEPR Discussion Papers 15418, C.E.P.R. Discussion Papers.
    2. Stefan Wager & Susan Athey, 2018. "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(523), pages 1228-1242, July.
    3. Lin, Yi & Jeon, Yongho, 2006. "Random Forests and Adaptive Nearest Neighbors," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 578-590, June.
    4. Ruoqing Zhu & Donglin Zeng & Michael R. Kosorok, 2015. "Reinforcement Learning Trees," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(512), pages 1770-1784, December.
    5. Andini, Monica & Ciani, Emanuele & de Blasio, Guido & D'Ignazio, Alessio & Salvestrini, Viola, 2018. "Targeting with machine learning: An application to a tax rebate program in Italy," Journal of Economic Behavior & Organization, Elsevier, vol. 156(C), pages 86-102.
    6. Waheed, T. & Bonnell, R.B. & Prasher, S.O. & Paulet, E., 2006. "Measuring performance in precision agriculture: CART--A decision tree approach," Agricultural Water Management, Elsevier, vol. 84(1-2), pages 173-185, July.
    7. Sexton, Joseph & Laake, Petter, 2009. "Standard errors for bagged and random forest estimators," Computational Statistics & Data Analysis, Elsevier, vol. 53(3), pages 801-811, January.
    8. Susan Athey & Stefan Wager, 2021. "Policy Learning With Observational Data," Econometrica, Econometric Society, vol. 89(1), pages 133-161, January.
    9. Hal R. Varian, 2014. "Big Data: New Tricks for Econometrics," Journal of Economic Perspectives, American Economic Association, vol. 28(2), pages 3-28, Spring.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Susan Athey & Julie Tibshirani & Stefan Wager, 2016. "Generalized Random Forests," Papers 1610.01271, arXiv.org, revised Apr 2018.
    2. Patrick Krennmair & Timo Schmid, 2022. "Flexible domain prediction using mixed effects random forests," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1865-1894, November.
    3. David M. Ritzwoller & Vasilis Syrgkanis, 2024. "Simultaneous Inference for Local Structural Parameters with Random Forests," Papers 2405.07860, arXiv.org, revised Sep 2024.
    4. Augustine Denteh & Helge Liebert, 2022. "Who Increases Emergency Department Use? New Insights from the Oregon Health Insurance Experiment," Working Papers 2201, Tulane University, Department of Economics.
    5. Gérard Biau & Erwan Scornet, 2016. "A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 197-227, June.
    6. Battiston, Pietro & Gamba, Simona & Santoro, Alessandro, 2024. "Machine learning and the optimization of prediction-based policies," Technological Forecasting and Social Change, Elsevier, vol. 199(C).
    7. Cerqua, Augusto & Letta, Marco, 2022. "Local inequalities of the COVID-19 crisis," Regional Science and Urban Economics, Elsevier, vol. 92(C).
    8. Gabriel Okasa, 2022. "Meta-Learners for Estimation of Causal Effects: Finite Sample Cross-Fit Performance," Papers 2201.12692, arXiv.org.
    9. Rina Friedberg & Julie Tibshirani & Susan Athey & Stefan Wager, 2018. "Local Linear Forests," Papers 1807.11408, arXiv.org, revised Sep 2020.
    10. Combes, Pierre-Philippe & Gobillon, Laurent & Zylberberg, Yanos, 2022. "Urban economics in a historical perspective: Recovering data with machine learning," Regional Science and Urban Economics, Elsevier, vol. 94(C).
    11. Bokelmann, Björn & Lessmann, Stefan, 2024. "Improving uplift model evaluation on randomized controlled trial data," European Journal of Operational Research, Elsevier, vol. 313(2), pages 691-707.
    12. Garbero, Alessandra & Sakos, Grayson & Cerulli, Giovanni, 2023. "Towards data-driven project design: Providing optimal treatment rules for development projects," Socio-Economic Planning Sciences, Elsevier, vol. 89(C).
    13. Ta-Wei Huang & Eva Ascarza, 2024. "Doing More with Less: Overcoming Ineffective Long-Term Targeting Using Short-Term Signals," Marketing Science, INFORMS, vol. 43(4), pages 863-884, July.
    14. Borup, Daniel & Christensen, Bent Jesper & Mühlbach, Nicolaj Søndergaard & Nielsen, Mikkel Slot, 2023. "Targeting predictors in random forest regression," International Journal of Forecasting, Elsevier, vol. 39(2), pages 841-868.
    15. Yiyi Huo & Yingying Fan & Fang Han, 2023. "On the adaptation of causal forests to manifold data," Papers 2311.16486, arXiv.org, revised Dec 2023.
    16. Black, Dan A. & Grogger, Jeffrey & Kirchmaier, Tom & Sanders, Koen, 2023. "Criminal charges, risk assessment and violent recidivism in cases of domestic abuse," LSE Research Online Documents on Economics 121374, London School of Economics and Political Science, LSE Library.
    17. Michael Lechner, 2023. "Causal Machine Learning and its use for public policy," Swiss Journal of Economics and Statistics, Springer;Swiss Society of Economics and Statistics, vol. 159(1), pages 1-15, December.
    18. Domenico Giannone & Michele Lenza & Giorgio E. Primiceri, 2021. "Economic Predictions With Big Data: The Illusion of Sparsity," Econometrica, Econometric Society, vol. 89(5), pages 2409-2437, September.
    19. Grodecka, Anna & Hull, Isaiah, 2019. "The Impact of Local Taxes and Public Services on Property Values," Working Paper Series 374, Sveriges Riksbank (Central Bank of Sweden).
    20. Nathan Kallus & Miruna Oprescu, 2022. "Robust and Agnostic Learning of Conditional Distributional Treatment Effects," Papers 2205.11486, arXiv.org, revised Feb 2023.

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2502.15072. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: arXiv administrators (email available below). General contact details of provider: http://arxiv.org/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.