Automatic debiased machine learning and sensitivity analysis for sample selection models

Authors

Listed:

Jakob Bjelac
Victor Chernozhukov
Phil-Adrian Klotz
Jannis Kueck
Theresa M. A. Schmitz

Abstract

In this paper, we extend the Riesz representation framework to causal inference under sample selection, where both treatment assignment and outcome observability are non-random. Formulating the problem in terms of a Riesz representer enables stable estimation and a transparent decomposition of omitted variable bias into three interpretable components: a data-identified scale factor, outcome confounding strength, and selection confounding strength. For estimation, we employ the ForestRiesz estimator, which accounts for selective outcome observability while avoiding the instability associated with direct propensity score inversion. We assess finite-sample performance through a simulation study and show that conventional double machine learning approaches can be highly sensitive to tuning parameters due to their reliance on inverse probability weighting, whereas the ForestRiesz estimator delivers more stable performance by leveraging automatic debiased machine learning. In an empirical application to the gender wage gap in the U.S., we find that our ForestRiesz approach yields larger treatment effect estimates than a standard double machine learning approach, suggesting that ignoring sample selection leads to an underestimation of the gender wage gap. Sensitivity analysis indicates that implausibly strong unobserved confounding would be required to overturn our results. Overall, our approach provides a unified, robust, and computationally attractive framework for causal inference under sample selection.
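To make the Riesz representer idea in the abstract concrete, the following is a minimal sketch and not the authors' ForestRiesz estimator. It assumes selection on observables and uses one commonly used closed form of the representer for the average treatment effect when the outcome is seen only for selected units. All nuisance functions (pi, p1, p0, g1, g0), the simulated design, and the helper names riesz_selection and debiased_ate are hypothetical choices made for illustration. The outcome model is deliberately biased so that the correction term built from the representer has visible work to do. The products pi * p1 and (1 - pi) * p0 in the denominators are the inverse probability weights that the abstract describes as a source of instability; automatic debiased machine learning instead learns the representer directly.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def riesz_selection(d, s, pi, p1, p0):
    """Closed-form representer, assuming selection on observables.
    pi = P(D=1|X), p1 = P(S=1|D=1,X), p0 = P(S=1|D=0,X)."""
    return s * (d / (pi * p1) - (1 - d) / ((1 - pi) * p0))

def debiased_ate(y, d, s, g1, g0, alpha):
    """Plug-in effect plus a representer-weighted residual correction."""
    g_d = np.where(d == 1, g1, g0)
    # Residuals exist only for selected units; unselected units contribute 0.
    resid = np.where(s == 1, y - g_d, 0.0)
    psi = g1 - g0 + alpha * resid
    return psi.mean(), psi.std(ddof=1) / np.sqrt(psi.size)

# Simulated data (illustrative design, true effect = 1.0).
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
pi = expit(0.5 * x)                      # treatment propensity
d = rng.binomial(1, pi)
p1 = expit(1.0 + 0.5 * x)                # selection probability if treated
p0 = expit(0.5 + 0.5 * x)                # selection probability if untreated
s = rng.binomial(1, np.where(d == 1, p1, p0))
y_full = x + 1.0 * d + rng.normal(size=n)
y = np.where(s == 1, y_full, np.nan)     # outcome missing when s == 0

# Deliberately biased outcome model: it implies an effect of 0.5, not 1.0.
g1, g0 = x + 0.5, x
plug_in = (g1 - g0).mean()

alpha = riesz_selection(d, s, pi, p1, p0)
est, se = debiased_ate(y, d, s, g1, g0, alpha)
print(f"plug-in: {plug_in:.3f}, debiased: {est:.3f} (se {se:.3f})")
```

In this sketch the correction term recovers the true effect despite the biased outcome model because the representer is correct. In practice neither nuisance is known, and small estimated values of pi * p1 or (1 - pi) * p0 inflate the weights.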

Suggested Citation

Jakob Bjelac & Victor Chernozhukov & Phil-Adrian Klotz & Jannis Kueck & Theresa M. A. Schmitz, 2026. "Automatic debiased machine learning and sensitivity analysis for sample selection models," Papers 2601.08643, arXiv.org.

Handle: RePEc:arx:papers:2601.08643

References listed on IDEAS

Newey, Whitney K, 1994. "The Asymptotic Variance of Semiparametric Estimators," Econometrica, Econometric Society, vol. 62(6), pages 1349-1382, November.
- Newey, W.K., 1989. "The Asymptotic Variance of Semiparametric Estimators," Papers 346, Princeton, Department of Economics - Econometric Research Program.
- Newey, W.K., 1991. "The Asymptotic Variance of Semiparametric Estimators," Working papers 583, Massachusetts Institute of Technology (MIT), Department of Economics.
Victor Chernozhukov & Carlos Cinelli & Whitney Newey & Amit Sharma & Vasilis Syrgkanis, 2021. "Long Story Short: Omitted Variable Bias in Causal Machine Learning," Papers 2112.13398, arXiv.org, revised May 2024.
- Victor Chernozhukov & Carlos Cinelli & Whitney Newey & Amit Sharma & Vasilis Syrgkanis, 2022. "Long Story Short: Omitted Variable Bias in Causal Machine Learning," NBER Working Papers 30302, National Bureau of Economic Research, Inc.
Philipp Bach & Oliver Schacht & Victor Chernozhukov & Sven Klaassen & Martin Spindler, 2024. "Hyperparameter Tuning for Causal Inference with Double Machine Learning: A Simulation Study," Papers 2402.04674, arXiv.org.
Victor Chernozhukov & Whitney K Newey & Rahul Singh, 2022. "Debiased machine learning of global and local parameters using regularized Riesz representers [Semiparametric instrumental variable estimation of treatment response models]," The Econometrics Journal, Royal Economic Society, vol. 25(3), pages 576-601.
- Victor Chernozhukov & Whitney Newey & Rahul Singh, 2018. "De-Biased Machine Learning of Global and Local Parameters Using Regularized Riesz Representers," Papers 1802.08667, arXiv.org, revised Oct 2022.
Joseph G. Altonji & Todd E. Elder & Christopher R. Taber, 2005. "Selection on Observed and Unobserved Variables: Assessing the Effectiveness of Catholic Schools," Journal of Political Economy, University of Chicago Press, vol. 113(1), pages 151-184, February.
- Joseph G. Altonji & Todd E. Elder & Christopher R. Taber, 2000. "Selection on Observed and Unobserved Variables: Assessing the Effectiveness of Catholic Schools," NBER Working Papers 7831, National Bureau of Economic Research, Inc.
Victor Chernozhukov & Whitney K. Newey & Victor Quintas-Martinez & Vasilis Syrgkanis, 2021. "RieszNet and ForestRiesz: Automatic Debiased Machine Learning with Neural Nets and Random Forests," Papers 2110.03031, arXiv.org, revised Jun 2022.
Sofiia Dolgikh & Bodan Potanin, 2025. "Double machine learning for causal inference in a multivariate sample selection model," Papers 2511.12640, arXiv.org.
Victor Chernozhukov & Whitney K. Newey & Rahul Singh, 2022. "Automatic Debiased Machine Learning of Causal and Structural Effects," Econometrica, Econometric Society, vol. 90(3), pages 967-1027, May.
- Victor Chernozhukov & Whitney K Newey & Rahul Singh, 2018. "Automatic Debiased Machine Learning of Causal and Structural Effects," Papers 1809.05224, arXiv.org, revised Oct 2022.
Carlos Cinelli & Chad Hazlett, 2020. "Making sense of sensitivity: extending omitted variable bias," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 82(1), pages 39-67, February.
Bach, Philipp & Klaaßen, Sven & Kueck, Jannis & Mattes, Mara & Spindler, Martin, 2025. "Sensitivity analysis for treatment effects in difference-in-differences models using Riesz Representation," Discussion Papers 2025/7, Free University Berlin, School of Business & Economics.
Philipp Bach & Sven Klaassen & Jannis Kueck & Mara Mattes & Martin Spindler, 2025. "Sensitivity Analysis for Treatment Effects in Difference-in-Differences Models using Riesz Representation," Papers 2510.09064, arXiv.org.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Yuya Sasaki & Takuya Ura & Yichong Zhang, 2022. "Unconditional quantile regression with high‐dimensional data," Quantitative Economics, Econometric Society, vol. 13(3), pages 955-978, July.
- Yuya Sasaki & Takuya Ura & Yichong Zhang, 2020. "Unconditional Quantile Regression with High Dimensional Data," Papers 2007.13659, arXiv.org, revised Feb 2022.
Kyle Colangelo & Ying-Ying Lee, 2020. "Double Debiased Machine Learning Nonparametric Inference with Continuous Treatments," Papers 2004.03036, arXiv.org, revised Sep 2023.
Zequn Jin & Lihua Lin & Zhengyu Zhang, 2022. "Identification and Auto-debiased Machine Learning for Outcome Conditioned Average Structural Derivatives," Papers 2211.07903, arXiv.org.
Brenda Prallon, 2026. "How Robust are Robustness Checks?," Papers 2602.19384, arXiv.org.
Gyungbae Park, 2024. "Debiased Machine Learning when Nuisance Parameters Appear in Indicator Functions," Papers 2403.15934, arXiv.org, revised Mar 2025.
Liu, Lin & Mukherjee, Rajarshi & Robins, James M., 2024. "Assumption-lean falsification tests of rate double-robustness of double-machine-learning estimators," Journal of Econometrics, Elsevier, vol. 240(2).
Jikai Jin & Vasilis Syrgkanis, 2025. "Sharp Structure-Agnostic Lower Bounds for General Linear Functional Estimation," Papers 2512.17341, arXiv.org, revised Jan 2026.
Zhengyu Zhang & Zequn Jin & Lihua Lin, 2024. "Identification and inference of outcome conditioned partial effects of general interventions," Papers 2407.16950, arXiv.org.
Alexandre Belloni & Victor Chernozhukov & Denis Chetverikov & Christian Hansen & Kengo Kato, 2018. "High-dimensional econometrics and regularized GMM," CeMMAP working papers CWP35/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Alexandre Belloni & Victor Chernozhukov & Denis Chetverikov & Christian Hansen & Kengo Kato, 2018. "High-Dimensional Econometrics and Regularized GMM," Papers 1806.01888, arXiv.org, revised Jun 2018.
Juan Carlos Escanciano & Telmo Pérez-Izquierdo, 2023. "Automatic Locally Robust GMM with Machine-Learning-Generated Regressors," Papers 2301.10643, arXiv.org, revised Mar 2026.
Craig S Wright, 2026. "Design-Robust Event-Study Estimation under Staggered Adoption Diagnostics, Sensitivity, and Orthogonalisation," Papers 2601.18801, arXiv.org.
Stéphane Bonhomme & Martin Weidner, 2020. "Minimizing Sensitivity to Model Misspecification," CeMMAP working papers CWP37/20, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
Stéphane Bonhomme & Martin Weidner, 2022. "Minimizing sensitivity to model misspecification," Quantitative Economics, Econometric Society, vol. 13(3), pages 907-954, July.
Pontus af Buren & Jurg Schweri, 2024. "Firms' training processes and their apprentices' education success," Economics of Education Working Paper Series 0225, University of Zurich, Department of Business Administration (IBW).
Betts, Alexander & Flinder Stierna, Maria & Omata, Naohiko & Sterck, Olivier, 2023. "Refugees welcome? Inter-group interaction and host community attitude formation," World Development, Elsevier, vol. 161(C).
Betts, Alexander Milton Stedman & Stierna, Maria Flinder & Omata, Naohiko & Sterck, Olivier Christian Brigitte, 2022. "Social Cohesion and Refugee-Host Interactions: Evidence from East Africa," Policy Research Working Paper Series 9917, The World Bank.
Melody Huang & Cory McCartan, 2025. "Relative Bias Under Imperfect Identification in Observational Causal Inference," Papers 2507.23743, arXiv.org, revised Mar 2026.
Stéphane Bonhomme & Martin Weidner, 2018. "Minimizing Sensitivity to Model Misspecification," Papers 1807.02161, arXiv.org, revised Oct 2021.
- Stéphane Bonhomme & Martin Weidner, 2018. "Minimizing sensitivity to model misspecification," CeMMAP working papers CWP59/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
Christoph Breunig & Ruixuan Liu & Zhengfei Yu, 2025. "Robust Semiparametric Inference for Bayesian Additive Regression Trees," Papers 2509.24634, arXiv.org, revised Oct 2025.
Juan Carlos Escanciano & Lin Zhu, 2013. "Set inferences and sensitivity analysis in semiparametric conditionally identified models," CeMMAP working papers 55/13, Institute for Fiscal Studies.
- Juan Carlos Escanciano & Lin Zhu, 2013. "Set inferences and sensitivity analysis in semiparametric conditionally identified models," CeMMAP working papers CWP55/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-BIG-2026-01-26 (Big Data)
NEP-CMP-2026-01-26 (Computational Economics)
NEP-ECM-2026-01-26 (Econometrics)
