
Double machine learning for causal inference in a multivariate sample selection model

Authors

  • Sofiia Dolgikh
  • Bogdan Potanin

Abstract

We propose plug-in (PI) and double machine learning (DML) estimators of the average treatment effect (ATE), the average treatment effect on the treated (ATET), and the local average treatment effect (LATE) in the multivariate sample selection model with ordinal selection equations. Our DML estimators are doubly robust and based on efficient influence functions. We study and compare the finite-sample properties of the proposed estimators on simulated data. The results suggest that estimates of the causal parameters may be highly biased when multivariate sample selection is not addressed, whereas the proposed estimators avoid these biases.
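
For readers unfamiliar with doubly robust DML, the following is a minimal sketch of a generic cross-fitted augmented inverse probability weighting (AIPW) estimator of the ATE. It is not the estimator proposed in the paper: it ignores sample selection entirely, and it uses a linear outcome model and a logit propensity model as stand-ins for machine learning learners. All function names (`fit_ols`, `fit_logit`, `dml_ate`, `simulate`) and the simulated design are assumptions made for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients with an intercept."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return beta

def predict_ols(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def fit_logit(X, d, n_iter=25):
    """Logistic regression by Newton-Raphson (illustrative, unregularized)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        grad = Xb.T @ (d - p)
        hess = (Xb * (p * (1.0 - p))[:, None]).T @ Xb
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def predict_logit(beta, X):
    Xb = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ beta))

def dml_ate(y, d, X, n_folds=2, seed=0, clip=0.01):
    """Cross-fitted AIPW estimate of the ATE and its standard error."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # random, balanced fold labels
    psi = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Nuisance functions are fitted on the training folds only.
        b1 = fit_ols(X[train & (d == 1)], y[train & (d == 1)])
        b0 = fit_ols(X[train & (d == 0)], y[train & (d == 0)])
        bp = fit_logit(X[train], d[train])
        mu1, mu0 = predict_ols(b1, X[test]), predict_ols(b0, X[test])
        p = np.clip(predict_logit(bp, X[test]), clip, 1 - clip)
        # Doubly robust (AIPW) score evaluated on the held-out fold.
        psi[test] = (mu1 - mu0
                     + d[test] * (y[test] - mu1) / p
                     - (1 - d[test]) * (y[test] - mu0) / (1 - p))
    return psi.mean(), psi.std(ddof=1) / np.sqrt(n)

def simulate(n=4000, seed=1):
    """Hypothetical design with confounding and a true ATE of 2."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    p = 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1])))
    d = (rng.uniform(size=n) < p).astype(float)
    y = 1.0 + 2.0 * d + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
    return y, d, X

y, d, X = simulate()
ate_hat, se = dml_ate(y, d, X)
print(f"ATE estimate: {ate_hat:.3f} (s.e. {se:.3f})")
```

The score stays consistent if either the outcome regressions or the propensity model is correctly specified, which is the sense of "doubly robust" used here. Handling outcomes that are observed only for a selected subsample, as in the paper, would require additional nuisance functions that this sketch does not attempt.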

Suggested Citation

  • Sofiia Dolgikh & Bogdan Potanin, 2025. "Double machine learning for causal inference in a multivariate sample selection model," Papers 2511.12640, arXiv.org.
  • Handle: RePEc:arx:papers:2511.12640

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2511.12640
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. François Bourguignon & Martin Fournier & Marc Gurgand, 2007. "Selection Bias Corrections Based On The Multinomial Logit Model: Monte Carlo Comparisons," Journal of Economic Surveys, Wiley Blackwell, vol. 21(1), pages 174-205, February.
    2. Angela Vossmeyer, 2016. "Sample Selection and Treatment Effect Estimation of Lender of Last Resort Policies," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 34(2), pages 197-212, April.
    3. Francis Vella, 1998. "Estimating Models with Sample Selection Bias: A Survey," Journal of Human Resources, University of Wisconsin Press, vol. 33(1), pages 127-169.
    4. Elena Kossova & Bogdan Potanin, 2022. "Estimation of Gaussian multinomial endogenous switching model," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 67, pages 121-143.
    5. Dubin, Jeffrey A & McFadden, Daniel L, 1984. "An Econometric Analysis of Residential Electric Appliance Holdings and Consumption," Econometrica, Econometric Society, vol. 52(2), pages 345-362, March.
    6. Alireza Rezaee & Mojtaba Ganjali & Ehsan Bahrami Samani, 2022. "Sample selection bias with multiple dependent selection rules: an application to survey data analysis with multilevel nonresponse," Swiss Journal of Economics and Statistics, Springer;Swiss Society of Economics and Statistics, vol. 158(1), pages 1-15, December.
    7. Mitali Das & Whitney K. Newey & Francis Vella, 2003. "Nonparametric Estimation of Sample Selection Models," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 70(1), pages 33-58.
    8. Henning, Christian H.C.A. & Henningsen, Arne, 2007. "AJAE Appendix: Modeling Farm Households' Price Responses in the Presence of Transaction Costs and Heterogeneity in Labor Markets," American Journal of Agricultural Economics APPENDICES, Agricultural and Applied Economics Association, vol. 89(3), pages 1-44, August.
    9. Elena Kossova & Bogdan Potanin, 2018. "Heckman method and switching regression model multivariate generalization," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 50, pages 114-143.
    10. Michela Bia & Martin Huber & Lukáš Lafférs, 2024. "Double Machine Learning for Sample Selection Models," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 42(3), pages 958-969, July.
    11. James Heckman, 2013. "Sample selection bias as a specification error," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 31(3), pages 129-137.
    12. Guido W. Imbens & Whitney K. Newey, 2009. "Identification and Estimation of Triangular Simultaneous Equations Models Without Additivity," Econometrica, Econometric Society, vol. 77(5), pages 1481-1512, September.
    13. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
    14. Jeff Grogger & Eric Eide, 1995. "Changes in College Skills and the Rise in the College Wage Premium," Journal of Human Resources, University of Wisconsin Press, vol. 30(2), pages 280-310.
    15. Christian H.C.A. Henning & Arne Henningsen, 2007. "Modeling Farm Households' Price Responses in the Presence of Transaction Costs and Heterogeneity in Labor Markets," American Journal of Agricultural Economics, Agricultural and Applied Economics Association, vol. 89(3), pages 665-681.
    16. Harald Tauchmann, 2010. "Consistency of Heckman-type two-step estimators for the multivariate sample-selection model," Applied Economics, Taylor & Francis Journals, vol. 42(30), pages 3895-3902.
    17. Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2010. "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i01).
    18. Poirier, Dale J., 1980. "Partial observability in bivariate probit models," Journal of Econometrics, Elsevier, vol. 12(2), pages 209-217, February.
    19. Sofiia Dolgikh & Bogdan Potanin, 2024. "Returns to different levels of education in Russia," Journal of Economic Studies, Emerald Group Publishing Limited, vol. 51(8), pages 1647-1663, April.
    20. Harrison H. Li & Art B. Owen, 2024. "Double machine learning and design in batch adaptive experiments," Journal of Causal Inference, De Gruyter, vol. 12(1), pages 1-27.
    21. Yulia V. Marchenko & Marc G. Genton, 2012. "A Heckman Selection-t Model," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(497), pages 304-317, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Insan Tunali & Berk Yavuzoglu, 2018. "Edgeworth Expansion Based Correction Of Selectivity Bias In Models Of Double Selection," Working Papers 1802, Nazarbayev University, Department of Economics, revised Nov 2018.
    2. Victor Chernozhukov & Ivan Fernandez-Val & Siyi Luo, 2018. "Distribution regression with sample selection, with an application to wage decompositions in the UK," CeMMAP working papers CWP68/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    3. Mogstad, Magne & Torgovitsky, Alexander, 2024. "Instrumental variables with unobserved heterogeneity in treatment effects," Handbook of Labor Economics, Elsevier.
    4. Lachos, Victor H. & Prates, Marcos O. & Dey, Dipak K., 2021. "Heckman selection-t model: Parameter estimation via the EM-algorithm," Journal of Multivariate Analysis, Elsevier, vol. 184(C).
    5. Hamermesh, Daniel S. & Donald, Stephen G., 2008. "The effect of college curriculum on earnings: An affinity identifier for non-ignorable non-response bias," Journal of Econometrics, Elsevier, vol. 144(2), pages 479-491, June.
    6. Elena Kossova & Bogdan Potanin, 2018. "Heckman method and switching regression model multivariate generalization," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 50, pages 114-143.
    7. Laurent Lamy & Manasa Patnam & Michael Visser, 2023. "Distinguishing incentive from selection effects in auction-determined contracts," Post-Print hal-04382099, HAL.
    8. Wojtyś, Małgorzata & Marra, Giampiero & Radice, Rosalba, 2018. "Copula based generalized additive models for location, scale and shape with non-random sample selection," Computational Statistics & Data Analysis, Elsevier, vol. 127(C), pages 1-14.
    9. Emmanuel O. Ogundimu & Jane L. Hutton, 2016. "A Sample Selection Model with Skew-normal Distribution," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 43(1), pages 172-190, March.
    10. Marra, Giampiero & Radice, Rosalba, 2013. "Estimation of a regression spline sample selection model," Computational Statistics & Data Analysis, Elsevier, vol. 61(C), pages 158-173.
    11. Lamy, Laurent & Patnam, Manasa & Visser, Michael, 2023. "Distinguishing incentive from selection effects in auction-determined contracts," Journal of Econometrics, Elsevier, vol. 235(2), pages 1172-1202.
    12. Giuseppe De Luca & Franco Peracchi, 2012. "Estimating Engel curves under unit and item nonresponse," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 27(7), pages 1076-1099, November.
    13. Elena Kossova & Bogdan Potanin, 2022. "Estimation of Gaussian multinomial endogenous switching model," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 67, pages 121-143.
    14. Liu, Ruixuan & Yu, Zhengfei, 2022. "Sample selection models with monotone control functions," Journal of Econometrics, Elsevier, vol. 226(2), pages 321-342.
    15. Mikhail Zhelonkin & Marc G. Genton & Elvezio Ronchetti, 2016. "Robust inference in sample selection models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(4), pages 805-827, September.
    16. Rahul Singh, 2021. "Generalized Kernel Ridge Regression for Causal Inference with Missing-at-Random Sample Selection," Papers 2111.05277, arXiv.org.
    17. Evan J. Miller-Tait & Sandeep Mohapatra & M. K. (Marty) Luckert & Brent M. Swallow, 2019. "Processing technologies for undervalued grains in rural India: on target to help the poor?," Food Security: The Science, Sociology and Economics of Food Production and Access to Food, Springer;The International Society for Plant Pathology, vol. 11(1), pages 151-166, February.
    18. Paul Ellickson & Sanjog Misra, 2012. "Enriching interactions: Incorporating outcome data into static discrete games," Quantitative Marketing and Economics (QME), Springer, vol. 10(1), pages 1-26, March.
    19. Breustedt, Gunnar & Schulz, Norbert & Latacz-Lohmann, Uwe, 2013. "Kalibrierung von Vertragsnaturschutzprogrammen mittels eines zweistufigen Discrete-Choice-Experimentes," German Journal of Agricultural Economics, Humboldt-Universitaet zu Berlin, Department for Agricultural Economics, vol. 62(04), pages 1-17, November.
    20. Ayadi, Rym & Bongini, Paola & Casu, Barbara & Cucinelli, Doriana, 2025. "The origin of financial instability and systemic risk: Do bank business models matter?," Journal of Financial Stability, Elsevier, vol. 78(C).
