Printed from https://ideas.repec.org/p/arx/papers/2601.08643.html

Automatic debiased machine learning and sensitivity analysis for sample selection models

Author

Listed:
  • Jakob Bjelac
  • Victor Chernozhukov
  • Phil-Adrian Klotz
  • Jannis Kueck
  • Theresa M. A. Schmitz

Abstract

In this paper, we extend the Riesz representation framework to causal inference under sample selection, where both treatment assignment and outcome observability are non-random. Formulating the problem in terms of a Riesz representer enables stable estimation and a transparent decomposition of omitted variable bias into three interpretable components: a data-identified scale factor, outcome confounding strength, and selection confounding strength. For estimation, we employ the ForestRiesz estimator, which accounts for selective outcome observability while avoiding the instability associated with direct propensity score inversion. We assess finite-sample performance through a simulation study and show that conventional double machine learning approaches can be highly sensitive to tuning parameters due to their reliance on inverse probability weighting, whereas the ForestRiesz estimator delivers more stable performance by leveraging automatic debiased machine learning. In an empirical application to the gender wage gap in the U.S., we find that our ForestRiesz approach yields larger treatment effect estimates than a standard double machine learning approach, suggesting that ignoring sample selection leads to an underestimation of the gender wage gap. Sensitivity analysis indicates that implausibly strong unobserved confounding would be required to overturn our results. Overall, our approach provides a unified, robust, and computationally attractive framework for causal inference under sample selection.
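The contrast between ignoring sample selection and correcting for it by inverse probability weighting can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not the paper's ForestRiesz estimator or its simulation design: the data-generating process, the coefficient values, and the function names (`simulate`, `naive_difference`, `ipw_ate`) are hypothetical, and the true treatment and selection propensities are treated as known rather than estimated.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def simulate(n=200_000, theta=1.0, seed=0):
    """Hypothetical DGP: treatment D and selection S both depend on X."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    p = sigmoid(0.5 * x)                    # P(D = 1 | X), assumed known
    d = rng.binomial(1, p)
    pi = sigmoid(0.5 + 0.5 * x - 0.5 * d)   # P(S = 1 | X, D), assumed known
    s = rng.binomial(1, pi)
    y = theta * d + x + rng.normal(size=n)  # outcome, used only where S = 1
    return d, s, y, p, pi


def naive_difference(d, s, y):
    """Difference in observed means, ignoring confounding and selection."""
    obs = s == 1
    return y[obs & (d == 1)].mean() - y[obs & (d == 0)].mean()


def ipw_ate(d, s, y, p, pi):
    """Inverse probability weighting with treatment and selection propensities.

    Units with S = 0 get weight zero, so unobserved outcomes never enter.
    """
    w = s * (d / p - (1 - d) / (1 - p)) / pi
    return np.mean(w * y)


d, s, y, p, pi = simulate()
naive = naive_difference(d, s, y)
ipw = ipw_ate(d, s, y, p, pi)
# Largest inverse weight among observed units: it grows without bound as
# either propensity approaches zero, which is the source of instability.
max_weight = np.max(s / (pi * np.where(d == 1, p, 1 - p)))

print(f"true effect: 1.0, naive: {naive:.3f}, IPW: {ipw:.3f}")
print(f"largest inverse weight: {max_weight:.1f}")
```

In this sketch the weights divide by a product of two propensities, so a small value of either one inflates the weight. With estimated rather than known propensities, that inversion is where sensitivity to tuning would be expected to arise.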

Suggested Citation

  • Jakob Bjelac & Victor Chernozhukov & Phil-Adrian Klotz & Jannis Kueck & Theresa M. A. Schmitz, 2026. "Automatic debiased machine learning and sensitivity analysis for sample selection models," Papers 2601.08643, arXiv.org.
  • Handle: RePEc:arx:papers:2601.08643

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2601.08643
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Newey, Whitney K, 1994. "The Asymptotic Variance of Semiparametric Estimators," Econometrica, Econometric Society, vol. 62(6), pages 1349-1382, November.
    2. Victor Chernozhukov & Carlos Cinelli & Whitney Newey & Amit Sharma & Vasilis Syrgkanis, 2021. "Long Story Short: Omitted Variable Bias in Causal Machine Learning," Papers 2112.13398, arXiv.org, revised May 2024.
    3. Philipp Bach & Oliver Schacht & Victor Chernozhukov & Sven Klaassen & Martin Spindler, 2024. "Hyperparameter Tuning for Causal Inference with Double Machine Learning: A Simulation Study," Papers 2402.04674, arXiv.org.
    4. Victor Chernozhukov & Whitney K Newey & Rahul Singh, 2022. "Debiased machine learning of global and local parameters using regularized Riesz representers," The Econometrics Journal, Royal Economic Society, vol. 25(3), pages 576-601.
    5. Joseph G. Altonji & Todd E. Elder & Christopher R. Taber, 2005. "Selection on Observed and Unobserved Variables: Assessing the Effectiveness of Catholic Schools," Journal of Political Economy, University of Chicago Press, vol. 113(1), pages 151-184, February.
    6. Victor Chernozhukov & Whitney K. Newey & Victor Quintas-Martinez & Vasilis Syrgkanis, 2021. "RieszNet and ForestRiesz: Automatic Debiased Machine Learning with Neural Nets and Random Forests," Papers 2110.03031, arXiv.org, revised Jun 2022.
    7. Sofiia Dolgikh & Bodan Potanin, 2025. "Double machine learning for causal inference in a multivariate sample selection model," Papers 2511.12640, arXiv.org.
    8. Victor Chernozhukov & Whitney K. Newey & Rahul Singh, 2022. "Automatic Debiased Machine Learning of Causal and Structural Effects," Econometrica, Econometric Society, vol. 90(3), pages 967-1027, May.
    9. Carlos Cinelli & Chad Hazlett, 2020. "Making sense of sensitivity: extending omitted variable bias," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 82(1), pages 39-67, February.
    10. Bach, Philipp & Klaaßen, Sven & Kueck, Jannis & Mattes, Mara & Spindler, Martin, 2025. "Sensitivity analysis for treatment effects in difference-in-differences models using Riesz Representation," Discussion Papers 2025/7, Free University Berlin, School of Business & Economics.
    11. Philipp Bach & Sven Klaassen & Jannis Kueck & Mara Mattes & Martin Spindler, 2025. "Sensitivity Analysis for Treatment Effects in Difference-in-Differences Models using Riesz Representation," Papers 2510.09064, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yuya Sasaki & Takuya Ura & Yichong Zhang, 2022. "Unconditional quantile regression with high‐dimensional data," Quantitative Economics, Econometric Society, vol. 13(3), pages 955-978, July.
    2. Kyle Colangelo & Ying-Ying Lee, 2020. "Double Debiased Machine Learning Nonparametric Inference with Continuous Treatments," Papers 2004.03036, arXiv.org, revised Sep 2023.
    3. Zequn Jin & Lihua Lin & Zhengyu Zhang, 2022. "Identification and Auto-debiased Machine Learning for Outcome Conditioned Average Structural Derivatives," Papers 2211.07903, arXiv.org.
    4. Brenda Prallon, 2026. "How Robust are Robustness Checks?," Papers 2602.19384, arXiv.org.
    5. Gyungbae Park, 2024. "Debiased Machine Learning when Nuisance Parameters Appear in Indicator Functions," Papers 2403.15934, arXiv.org, revised Mar 2025.
    6. Liu, Lin & Mukherjee, Rajarshi & Robins, James M., 2024. "Assumption-lean falsification tests of rate double-robustness of double-machine-learning estimators," Journal of Econometrics, Elsevier, vol. 240(2).
    7. Jikai Jin & Vasilis Syrgkanis, 2025. "Sharp Structure-Agnostic Lower Bounds for General Linear Functional Estimation," Papers 2512.17341, arXiv.org, revised Jan 2026.
    8. Zhengyu Zhang & Zequn Jin & Lihua Lin, 2024. "Identification and inference of outcome conditioned partial effects of general interventions," Papers 2407.16950, arXiv.org.
    9. Alexandre Belloni & Victor Chernozhukov & Denis Chetverikov & Christian Hansen & Kengo Kato, 2018. "High-dimensional econometrics and regularized GMM," CeMMAP working papers CWP35/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    10. Juan Carlos Escanciano & Telmo Pérez-Izquierdo, 2023. "Automatic Locally Robust GMM with Machine-Learning-Generated Regressors," Papers 2301.10643, arXiv.org, revised Mar 2026.
    11. Craig S Wright, 2026. "Design-Robust Event-Study Estimation under Staggered Adoption Diagnostics, Sensitivity, and Orthogonalisation," Papers 2601.18801, arXiv.org.
    12. Stéphane Bonhomme & Martin Weidner, 2020. "Minimizing Sensitivity to Model Misspecification," CeMMAP working papers CWP37/20, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    13. Stéphane Bonhomme & Martin Weidner, 2022. "Minimizing sensitivity to model misspecification," Quantitative Economics, Econometric Society, vol. 13(3), pages 907-954, July.
    14. Pontus af Buren & Jurg Schweri, 2024. "Firms' training processes and their apprentices' education success," Economics of Education Working Paper Series 0225, University of Zurich, Department of Business Administration (IBW).
    15. Betts, Alexander & Flinder Stierna, Maria & Omata, Naohiko & Sterck, Olivier, 2023. "Refugees welcome? Inter-group interaction and host community attitude formation," World Development, Elsevier, vol. 161(C).
    16. Betts, Alexander Milton Stedman & Stierna, Maria Flinder & Omata, Naohiko & Sterck, Olivier Christian Brigitte, 2022. "Social Cohesion and Refugee-Host Interactions: Evidence from East Africa," Policy Research Working Paper Series 9917, The World Bank.
    17. Melody Huang & Cory McCartan, 2025. "Relative Bias Under Imperfect Identification in Observational Causal Inference," Papers 2507.23743, arXiv.org, revised Mar 2026.
    18. Stéphane Bonhomme & Martin Weidner, 2018. "Minimizing Sensitivity to Model Misspecification," Papers 1807.02161, arXiv.org, revised Oct 2021.
    19. Christoph Breunig & Ruixuan Liu & Zhengfei Yu, 2025. "Robust Semiparametric Inference for Bayesian Additive Regression Trees," Papers 2509.24634, arXiv.org, revised Oct 2025.
    20. Juan Carlos Escanciano & Lin Zhu, 2013. "Set inferences and sensitivity analysis in semiparametric conditionally identified models," CeMMAP working papers 55/13, Institute for Fiscal Studies.
