Double machine learning for sample selection models

Authors

  • Michela Bia
  • Martin Huber
  • Lukáš Lafférs

Abstract

This paper considers the evaluation of discretely distributed treatments when outcomes are only observed for a subpopulation due to sample selection or outcome attrition. For identification, we combine a selection-on-observables assumption for treatment assignment with either selection-on-observables or instrumental variable assumptions concerning the outcome attrition/sample selection process. We also consider dynamic confounding, meaning that covariates that jointly affect sample selection and the outcome may (at least partly) be influenced by the treatment. To control in a data-driven way for a potentially high-dimensional set of pre- and/or post-treatment covariates, we adapt the double machine learning framework for treatment evaluation to sample selection problems. We make use of (a) Neyman-orthogonal, doubly robust, and efficient score functions, which imply the robustness of treatment effect estimation to moderate regularization biases in the machine learning-based estimation of the outcome, treatment, or sample selection models, and (b) sample splitting (or cross-fitting) to prevent overfitting bias. We demonstrate that the proposed estimators are asymptotically normal and root-n consistent under specific regularity conditions concerning the machine learners, and we investigate their finite-sample properties in a simulation study. We also apply our proposed methodology to the Job Corps data to evaluate the effect of training on hourly wages, which are only observed conditional on employment. The estimator is available in the causalweight package for the statistical software R.
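
The combination of a doubly robust score with cross-fitting described in the abstract can be illustrated with a minimal Python sketch. It is not the causalweight implementation, and it has not been checked against the paper's exact formulas. It assumes a binary treatment and the selection-on-observables (missing-at-random) case only, and it uses a standard augmented inverse-probability-weighting score in which the indicator "treated with d and outcome observed" is weighted by the product of the treatment and selection propensities. All function names (fit_logit, fit_ols, dml_scores, dml_ate, simulate) are hypothetical, plain logistic and linear regressions stand in for the machine learners, and clipping the weight denominator at 0.01 is an arbitrary choice made for this illustration.

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Logistic regression via Newton steps; returns a predict function."""
    Z = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        hess = Z.T @ (Z * (p * (1.0 - p))[:, None]) + 1e-8 * np.eye(Z.shape[1])
        beta += np.linalg.solve(hess, Z.T @ (y - p))
    return lambda Xn: 1.0 / (1.0 + np.exp(-(np.column_stack([np.ones(len(Xn)), Xn]) @ beta)))

def fit_ols(X, y):
    """Least-squares regression; returns a predict function."""
    Z = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta

def dml_scores(y, d, s, X, d_val, n_folds=3, trim=0.01, seed=0):
    """Cross-fitted score for E[Y(d_val)]; y may be NaN wherever s == 0."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    psi = np.empty(n)
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        # Nuisances are fitted on the training folds only (cross-fitting).
        p_hat = fit_logit(X[tr], (d[tr] == d_val).astype(float))(X[te])   # Pr(D=d | X)
        sub = tr & (d == d_val)
        pi_hat = fit_logit(X[sub], s[sub].astype(float))(X[te])           # Pr(S=1 | D=d, X)
        sel = sub & (s == 1)
        mu_hat = fit_ols(X[sel], y[sel])(X[te])                           # E[Y | D=d, S=1, X]
        # Illustrative choice: clip small weight denominators instead of dropping units.
        denom = np.clip(p_hat * pi_hat, trim, None)
        ind = ((d[te] == d_val) & (s[te] == 1)).astype(float)
        resid = np.where(s[te] == 1, y[te] - mu_hat, 0.0)
        # Weighted residual correction plus the outcome-regression prediction.
        psi[te] = ind * resid / denom + mu_hat
    return psi

def dml_ate(y, d, s, X, **kw):
    """ATE estimate and standard error from the difference of the two scores."""
    psi = dml_scores(y, d, s, X, 1, **kw) - dml_scores(y, d, s, X, 0, **kw)
    return psi.mean(), psi.std(ddof=1) / np.sqrt(len(psi))

def simulate(n=4000, seed=1):
    """Toy data with a true ATE of 1; selection depends on observables only."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    expit = lambda v: 1.0 / (1.0 + np.exp(-v))
    d = (rng.random(n) < expit(0.5 * X[:, 0])).astype(int)
    s = (rng.random(n) < expit(0.5 + 0.5 * X[:, 0] + 0.5 * d)).astype(int)
    y = 1.0 * d + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
    return np.where(s == 1, y, np.nan), d, s, X

if __name__ == "__main__":
    ate, se = dml_ate(*simulate())
    print(f"ATE estimate: {ate:.3f} (s.e. {se:.3f})")
```

In this toy design the outcome is missing whenever s equals 0, yet the estimate should land near the true effect of 1, because both the treatment and the selection propensities are functions of the observed covariates. The instrumental-variable and dynamic-confounding cases mentioned in the abstract require different scores and are not covered by this sketch.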

Suggested Citation

  • Michela Bia & Martin Huber & Lukáš Lafférs, 2020. "Double machine learning for sample selection models," Papers 2012.00745, arXiv.org, revised Jul 2021.
  • Handle: RePEc:arx:papers:2012.00745

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2012.00745
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Hausman, Jerry A & Wise, David A, 1979. "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, Econometric Society, vol. 47(2), pages 455-473, March.
    2. Stefan Wager & Susan Athey, 2018. "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(523), pages 1228-1242, July.
    3. James J. Heckman, 1976. "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models," NBER Chapters, in: Annals of Economic and Social Measurement, Volume 5, number 4, pages 475-492, National Bureau of Economic Research, Inc.
    4. Guido W. Imbens & Whitney K. Newey, 2009. "Identification and Estimation of Triangular Simultaneous Equations Models Without Additivity," Econometrica, Econometric Society, vol. 77(5), pages 1481-1512, September.
    5. Martin Huber, 2012. "Identification of Average Treatment Effects in Social Experiments Under Alternative Forms of Attrition," Journal of Educational and Behavioral Statistics, vol. 37(3), pages 443-474, June.
    6. Jeffrey M. Wooldridge, 2002. "Inverse probability weighted M-estimators for sample selection, attrition, and stratification," Portuguese Economic Journal, Springer;Instituto Superior de Economia e Gestao, vol. 1(2), pages 117-139, August.
    7. Whitney K. Newey & James L. Powell & Francis Vella, 1999. "Nonparametric Estimation of Triangular Simultaneous Equations Models," Econometrica, Econometric Society, vol. 67(3), pages 565-604, May.
    8. Ahn, Hyungtaik & Powell, James L., 1993. "Semiparametric estimation of censored selection models with a nonparametric selection mechanism," Journal of Econometrics, Elsevier, vol. 58(1-2), pages 3-29, July.
    9. John Fitzgerald & Peter Gottschalk & Robert Moffitt, 1998. "An Analysis of Sample Attrition in Panel Data: The Michigan Panel Study of Income Dynamics," Journal of Human Resources, University of Wisconsin Press, vol. 33(2), pages 251-299.
    10. Richard W. Blundell & James L. Powell, 2004. "Endogeneity in Semiparametric Binary Response Models," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 71(3), pages 655-679.
    11. Heckman, James, 2013. "Sample selection bias as a specification error," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 31(3), pages 129-137.
    12. Guido W. Imbens & Jeffrey M. Wooldridge, 2009. "Recent Developments in the Econometrics of Program Evaluation," Journal of Economic Literature, American Economic Association, vol. 47(1), pages 5-86, March.
    13. Martin Huber & Blaise Melly, 2015. "A Test of the Conditional Independence Assumption in Sample Selection Models," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 30(7), pages 1144-1168, November.
    14. Kosuke Imai, 2009. "Statistical analysis of randomized experiments with non‐ignorable missing binary outcomes: an application to a voting experiment," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 58(1), pages 83-104, February.
    15. Abowd J.M. & Crepon B. & Kramarz F., 2001. "Moment Estimation With Attrition: An Application to Economic Models," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1223-1231, December.
    16. Gronau, Reuben, 1974. "Wage Comparisons-A Selectivity Bias," Journal of Political Economy, University of Chicago Press, vol. 82(6), pages 1119-1143, Nov.-Dec.
    17. Alexandre Belloni & Victor Chernozhukov & Christian Hansen, 2014. "Inference on Treatment Effects after Selection among High-Dimensional Controls," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 81(2), pages 608-650.
    18. Joshua Angrist & Eric Bettinger & Michael Kremer, 2006. "Long-Term Educational Consequences of Secondary School Vouchers: Evidence from Administrative Records in Colombia," American Economic Review, American Economic Association, vol. 96(3), pages 847-862, June.
    19. Mitali Das & Whitney K. Newey & Francis Vella, 2003. "Nonparametric Estimation of Sample Selection Models," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 70(1), pages 33-58.
    20. Ye Luo & Martin Spindler & Jannis Kuck, 2016. "High-Dimensional $L_2$Boosting: Rate of Convergence," Papers 1602.08927, arXiv.org, revised Jul 2022.
    21. Guido W. Imbens, 2004. "Nonparametric Estimation of Average Treatment Effects Under Exogeneity: A Review," The Review of Economics and Statistics, MIT Press, vol. 86(1), pages 4-29, February.
    22. Heejung Bang & James M. Robins, 2005. "Doubly Robust Estimation in Missing Data and Causal Inference Models," Biometrics, The International Biometric Society, vol. 61(4), pages 962-973, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Martin Huber & Anna Solovyeva, 2020. "Direct and Indirect Effects under Sample Selection and Outcome Attrition," Econometrics, MDPI, vol. 8(4), pages 1-25, December.
    2. Martin Huber, 2012. "Identification of Average Treatment Effects in Social Experiments Under Alternative Forms of Attrition," Journal of Educational and Behavioral Statistics, vol. 37(3), pages 443-474, June.
    3. Martin Huber, 2014. "Treatment Evaluation in the Presence of Sample Selection," Econometric Reviews, Taylor & Francis Journals, vol. 33(8), pages 869-905, November.
    4. Martin Huber, 2010. "Identification of average treatment effects in social experiments under different forms of attrition," University of St. Gallen Department of Economics working paper series 2010 2010-22, Department of Economics, University of St. Gallen.
    5. Markus Frölich & Martin Huber, 2014. "Treatment Evaluation With Multiple Outcome Periods Under Endogeneity and Attrition," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(508), pages 1697-1711, December.
    6. Guido W. Imbens & Jeffrey M. Wooldridge, 2009. "Recent Developments in the Econometrics of Program Evaluation," Journal of Economic Literature, American Economic Association, vol. 47(1), pages 5-86, March.
    7. Blundell, Richard & Powell, James L., 2007. "Censored regression quantiles with endogenous regressors," Journal of Econometrics, Elsevier, vol. 141(1), pages 65-83, November.
    8. Martin Huber & Giovanni Mellace, 2015. "Sharp Bounds on Causal Effects under Sample Selection," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 77(1), pages 129-151, February.
    9. Martin Huber & Giovanni Mellace, 2014. "Testing exclusion restrictions and additive separability in sample selection models," Empirical Economics, Springer, vol. 47(1), pages 75-92, August.
    10. Lewbel, Arthur, 2007. "Endogenous selection or treatment model estimation," Journal of Econometrics, Elsevier, vol. 141(2), pages 777-806, December.
    11. Richard Blundell & Monica Costa Dias, 2009. "Alternative Approaches to Evaluation in Empirical Microeconomics," Journal of Human Resources, University of Wisconsin Press, vol. 44(3).
    12. Hans Fricke & Markus Frölich & Martin Huber & Michael Lechner, 2020. "Endogeneity and non‐response bias in treatment evaluation – nonparametric identification of causal effects by instruments," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 35(5), pages 481-504, August.
    13. Escanciano, Juan Carlos & Jacho-Chávez, David T. & Lewbel, Arthur, 2014. "Uniform convergence of weighted sums of non and semiparametric residuals for estimation and testing," Journal of Econometrics, Elsevier, vol. 178(P3), pages 426-443.
    14. Nicoletti, Cheti, 2006. "Nonresponse in dynamic panel data models," Journal of Econometrics, Elsevier, vol. 132(2), pages 461-489, June.
    15. Ruoyao Shi, 2021. "An Averaging Estimator for Two Step M Estimation in Semiparametric Models," Working Papers 202105, University of California at Riverside, Department of Economics.
    16. Huber, Martin & Mellace, Giovanni, 2011. "Testing instrument validity in sample selection models," Economics Working Paper Series 1145, University of St. Gallen, School of Economics and Political Science.
    17. Rahul Singh, 2021. "Generalized Kernel Ridge Regression for Causal Inference with Missing-at-Random Sample Selection," Papers 2111.05277, arXiv.org.
    18. Bodory, Hugo & Huber, Martin, 2018. "The causalweight package for causal inference in R," FSES Working Papers 493, Faculty of Economics and Social Sciences, University of Freiburg/Fribourg Switzerland.
    19. Juan Carlos Escanciano & Telmo Pérez-Izquierdo, 2023. "Automatic Locally Robust Estimation with Generated Regressors," Papers 2301.10643, arXiv.org, revised Nov 2023.
    20. Hamermesh, Daniel S. & Donald, Stephen G., 2008. "The effect of college curriculum on earnings: An affinity identifier for non-ignorable non-response bias," Journal of Econometrics, Elsevier, vol. 144(2), pages 479-491, June.
