Printed from https://ideas.repec.org/p/arx/papers/2601.15360.html

Robust X-Learner: Breaking the Curse of Imbalance and Heavy Tails via Robust Cross-Imputation

Author

  • Eichi Uehara

Abstract

Estimating Heterogeneous Treatment Effects (HTE) in industrial applications such as AdTech and healthcare presents a dual challenge: extreme class imbalance and heavy-tailed outcome distributions. While the X-Learner framework effectively addresses imbalance through cross-imputation, we demonstrate that it is fundamentally vulnerable to "Outlier Smearing" when it relies on Mean Squared Error (MSE) minimization. In this failure mode, the bias introduced by a few extreme observations ("whales") in the minority group propagates to the entire majority group during the imputation step, corrupting the estimated treatment-effect structure. To resolve this, we propose the Robust X-Learner (RX-Learner), which integrates a redescending γ-divergence objective (structurally equivalent to the Welsch loss under Gaussian assumptions) into the gradient boosting machinery. We further stabilize the resulting non-convex optimization with a Proxy Hessian strategy grounded in Majorization-Minimization (MM) principles. Empirical evaluation on a semi-synthetic Criteo Uplift dataset demonstrates that the RX-Learner reduces the Precision in Estimation of Heterogeneous Effect (PEHE) error by 98.6% relative to the standard X-Learner, effectively decoupling the stable "Core" population from the volatile "Periphery".
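The abstract names the ingredients but not the exact formulas. What follows is a minimal sketch of one plausible reading, not the paper's implementation. It assumes the Welsch loss ρ(r) = c²(1 − exp(−r²/2c²)) with a user-chosen scale c. It takes the "Proxy Hessian" to be the curvature w(r) = exp(−r²/2c²) of the standard quadratic majorizer of ρ (the IRLS weight), which stays positive where the true second derivative turns negative. It then uses constant outcome models on hypothetical toy data to show how one whale in the treated minority shifts every imputed control effect under MSE. All function names, scale values and data are illustrative assumptions. PEHE is computed in its common root-mean-square form.

```python
import numpy as np


def welsch_weight(r, c):
    """MM/IRLS weight w(r) = exp(-r^2 / (2 c^2)), which lies in (0, 1]."""
    return np.exp(-0.5 * (np.asarray(r, dtype=float) / c) ** 2)


def welsch_loss(r, c):
    """Welsch loss rho(r) = c^2 (1 - w(r)): about r^2 / 2 near 0, saturates at c^2."""
    return c**2 * (1.0 - welsch_weight(r, c))


def welsch_objective(y_true, y_pred, c=1.0, hess_floor=1e-6):
    """Gradient and MM proxy Hessian of rho(y_true - y_pred) w.r.t. y_pred.

    The exact second derivative w(r) * (1 - r^2 / c^2) is negative for |r| > c.
    The proxy Hessian w(r) is the curvature of the quadratic that majorizes rho
    at the current residual, so it stays positive (floored to avoid zeros).
    """
    r = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    w = welsch_weight(r, c)
    grad = -r * w                     # d rho(y - f) / d f = -psi(r), psi(r) = r * w(r)
    hess = np.maximum(w, hess_floor)  # majorizer curvature instead of rho''(r)
    return grad, hess


def robust_location(y, c=1.0, n_iter=50):
    """Constant model fitted by MM steps f <- f - sum(grad) / sum(hess)."""
    y = np.asarray(y, dtype=float)
    f = float(np.median(y))  # redescending losses need a robust starting point
    for _ in range(n_iter):
        grad, hess = welsch_objective(y, np.full_like(y, f), c)
        f -= grad.sum() / hess.sum()
    return f


def pehe(tau_hat, tau_true):
    """Root-mean-squared error between estimated and true individual effects."""
    tau_hat = np.asarray(tau_hat, dtype=float)
    tau_true = np.asarray(tau_true, dtype=float)
    return float(np.sqrt(np.mean((tau_hat - tau_true) ** 2)))


# Hypothetical toy data: a large control group and a small treated group
# (true effect 0.5) in which one unit is a "whale".
rng = np.random.default_rng(0)
y0 = 1.0 + 0.1 * rng.standard_normal(1000)
y1 = 1.5 + 0.1 * rng.standard_normal(20)
y1[0] = 500.0

# X-Learner imputation for controls, D0 = mu1_hat - Y0, with constant outcome
# models; the second stage is also a constant (the mean of the imputed effects).
tau_true = np.full_like(y0, 0.5)
for name, mu1_hat in [("MSE", y1.mean()), ("Welsch", robust_location(y1, c=0.5))]:
    d0 = mu1_hat - y0  # every control unit inherits mu1_hat
    tau_hat = np.full_like(y0, d0.mean())
    print(name, round(pehe(tau_hat, tau_true), 3))
```

Under MSE, the single whale inflates μ̂₁ and, through the imputation D0 = μ̂₁ − Y0, every control unit's pseudo-effect. The redescending fit gives it essentially zero weight. In an actual boosting run, a function shaped like `welsch_objective` could be supplied as a custom objective in the `(y_true, y_pred) -> (grad, hess)` form that, for example, LightGBM's scikit-learn interface accepts. Because the loss is non-convex, a robust initial score (here the median) matters.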

Suggested Citation

  • Eichi Uehara, 2026. "Robust X-Learner: Breaking the Curse of Imbalance and Heavy Tails via Robust Cross-Imputation," Papers 2601.15360, arXiv.org.
  • Handle: RePEc:arx:papers:2601.15360

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2601.15360
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Robinson, Peter M, 1988. "Root-N-Consistent Semiparametric Regression," Econometrica, Econometric Society, vol. 56(4), pages 931-954, July.
    2. Xinkun Nie & Stefan Wager, 2021. "Quasi-oracle estimation of heterogeneous treatment effects," Biometrika, Biometrika Trust, vol. 108(2), pages 299-319.
    3. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yiyi Huo & Yingying Fan & Fang Han, 2023. "On the adaptation of causal forests to manifold data," Papers 2311.16486, arXiv.org, revised Dec 2023.
    2. Hua Chen & Jianing Xing & Xiaoxu Yang & Kai Zhan, 2021. "Heterogeneous Effects of Health Insurance on Rural Children’s Health in China: A Causal Machine Learning Approach," IJERPH, MDPI, vol. 18(18), pages 1-14, September.
    3. Krantz, Sebastian, 2024. "Mapping Africa's infrastructure potential with geospatial big data and causal ML," Kiel Working Papers 2276, Kiel Institute for the World Economy.
    4. Ganesh Karapakula, 2023. "Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap," Papers 2301.05703, arXiv.org, revised Jan 2023.
    5. Valente, Marica, 2023. "Policy evaluation of waste pricing programs using heterogeneous causal effect estimation," Journal of Environmental Economics and Management, Elsevier, vol. 117(C).
    6. Wang, Jingyuan & Terabe, Shintaro & Yaginuma, Hideki, 2026. "Evaluating the long-term urban effects of high-speed rail in Japan: An integrated approach using synthetic difference-in-differences and double/debiased machine learning," Transportation Research Part A: Policy and Practice, Elsevier, vol. 203(C).
    7. Newham, Melissa & Valente, Marica, 2024. "The cost of influence: How gifts to physicians shape prescriptions and drug costs," Journal of Health Economics, Elsevier, vol. 95(C).
    8. Michael Lechner & Jana Mareckova, 2024. "Comprehensive Causal Machine Learning," Papers 2405.10198, arXiv.org, revised Feb 2025.
    9. Paul S. Clarke & Annalivia Polselli, 2023. "Double Machine Learning for Static Panel Models with Fixed Effects," Papers 2312.08174, arXiv.org, revised Dec 2024.
    10. Vinish Shrestha, 2024. "Heterogeneous Impacts of ACA-Medicaid Expansion on Insurance and Labor Market Outcomes in the American South," Working Papers 2024-08, Towson University, Department of Economics, revised Jun 2024.
    11. Keyon Vafa & Susan Athey & David M. Blei, 2025. "Estimating wage disparities using foundation models," Proceedings of the National Academy of Sciences, Proceedings of the National Academy of Sciences, vol. 122(22), pages 2427298122-, June.
    12. Marica Valente & Timm Gries & Lorenzo Trapani, 2023. "Informal employment from migration shocks," Working Papers 2023-09, Faculty of Economics and Statistics, Universität Innsbruck.
    13. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP72/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    14. Xinkun Nie & Stefan Wager, 2017. "Quasi-Oracle Estimation of Heterogeneous Treatment Effects," Papers 1712.04912, arXiv.org, revised Aug 2020.
    15. Juan Carlos Escanciano & Telmo Pérez-Izquierdo, 2023. "Automatic Locally Robust GMM with Machine-Learning-Generated Regressors," Papers 2301.10643, arXiv.org, revised Mar 2026.
    16. Stéphane Bonhomme & Koen Jochmans & Martin Weidner, 2024. "A Neyman-Orthogonalization Approach to the Incidental Parameter Problem," Papers 2412.10304, arXiv.org, revised Feb 2026.
    17. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP54/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    18. Bokelmann, Björn & Lessmann, Stefan, 2024. "Improving uplift model evaluation on randomized controlled trial data," European Journal of Operational Research, Elsevier, vol. 313(2), pages 691-707.
    19. Daniel Goller, 2023. "Analysing a built-in advantage in asymmetric darts contests using causal machine learning," Annals of Operations Research, Springer, vol. 325(1), pages 649-679, June.
    20. Harsh Parikh & Trang Quynh Nguyen & Elizabeth A. Stuart & Kara E. Rudolph & Caleb H. Miles, 2025. "A Cautionary Tale on Integrating Studies with Disparate Outcome Measures for Causal Inference," Papers 2505.11014, arXiv.org.
