
Correcting sample selection bias with categorical outcomes

Author

  • Onil Boussim

Abstract

In this paper, I propose a method for correcting sample selection bias when the outcome of interest is categorical, such as occupational choice, health status, or field of study. Classical approaches to sample selection rely on strong parametric distributional assumptions, which may be restrictive in practice. I develop a local representation that decomposes each joint probability into marginal probabilities and a category-specific association parameter that captures how selection differentially affects each outcome. Under some exclusion restrictions, I establish nonparametric point identification of the latent categorical distribution. Building on this identification result, I introduce a semiparametric multinomial logit model with sample selection, propose a computationally tractable two-step estimator, and derive its asymptotic properties. I illustrate the method by studying the determinants of healthcare utilization in Côte d'Ivoire.

Suggested Citation

  • Onil Boussim, 2025. "Correcting sample selection bias with categorical outcomes," Papers 2510.05551, arXiv.org, revised Nov 2025.
  • Handle: RePEc:arx:papers:2510.05551

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2510.05551
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Sukjin Han & Sungwon Lee, 2019. "Estimation in a generalization of bivariate probit models with dummy endogenous regressors," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 34(6), pages 994-1015, September.
    2. Francis Vella, 1998. "Estimating Models with Sample Selection Bias: A Survey," Journal of Human Resources, University of Wisconsin Press, vol. 33(1), pages 127-169.
    3. Ali, Mir M. & Mikhail, N. N. & Haq, M. Safiul, 1978. "A class of bivariate distributions including the bivariate logistic," Journal of Multivariate Analysis, Elsevier, vol. 8(3), pages 405-412, September.
    4. Victor Chernozhukov & Iván Fernández-Val & Sukjin Han & Kaspar Wüthrich, 2024. "Estimating Causal Effects of Discrete and Continuous Treatments with Binary Instruments," Papers 2403.05850, arXiv.org, revised Dec 2024.
    5. Van de Ven, Wynand P. M. M. & Van Praag, Bernard M. S., 1981. "The demand for deductibles in private health insurance : A probit model with sample selection," Journal of Econometrics, Elsevier, vol. 17(2), pages 229-252, November.
    6. James Heckman, 2013. "Sample selection bias as a specification error," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 31(3), pages 129-137.
    7. Adelchi Azzalini & Hyoung-Moon Kim & Hea-Jung Kim, 2019. "Sample selection models for discrete and other non-Gaussian response variables," Statistical Methods & Applications, Springer; Società Italiana di Statistica, vol. 28(1), pages 27-56, March.
    8. Han, Sukjin & Vytlacil, Edward J., 2017. "Identification in a generalization of bivariate probit models with dummy endogenous regressors," Journal of Econometrics, Elsevier, vol. 199(1), pages 63-73.
    9. Freedman, David A. & Sekhon, Jasjeet S., 2010. "Endogeneity in Probit Response Models," Political Analysis, Cambridge University Press, vol. 18(2), pages 138-150, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Onil Boussim, 2025. "Identifying treatment effects on categorical outcomes in IV models," Papers 2510.10946, arXiv.org, revised Nov 2025.
    2. de Grange, Louis & González, Felipe & Marechal, Matthieu & Troncoso, Rodrigo, 2024. "A consistent moment equations for binary probit models with endogenous variables using instrumental variables," Journal of Choice Modelling, Elsevier, vol. 53(C).
    3. Watanabe, Hajime & Maruyama, Takuya, 2024. "A Bayesian sample selection model with a binary outcome for handling residential self-selection in individual car ownership," Journal of Choice Modelling, Elsevier, vol. 51(C).
    4. S. I. Dolgikh & B. S. Potanin, 2023. "The Impact of Public Administration on the Efficiency of Russian Firms," Studies on Russian Economic Development, Springer, vol. 34(1), pages 59-67, February.
    5. Tocco, Barbara & Bailey, Alastair & Davidova, Sophia, 2013. "Determinants to Leave Agriculture and Change Occupational Sector: Evidence from an Enlarged EU," Working papers 155704, Factor Markets, Centre for European Policy Studies.
    6. Hessami, Zohal & Resnjanskij, Sven, 2019. "Complex ballot propositions, individual voting behavior, and status quo bias," European Journal of Political Economy, Elsevier, vol. 58(C), pages 82-101.
    7. Glenn W. Harrison & Morten I. Lau & Hong Il Yoo, 2020. "Risk Attitudes, Sample Selection, and Attrition in a Longitudinal Field Experiment," The Review of Economics and Statistics, MIT Press, vol. 102(3), pages 552-568, July.
    8. Sukjin Han & Sungwon Lee, 2019. "Estimation in a generalization of bivariate probit models with dummy endogenous regressors," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 34(6), pages 994-1015, September.
    9. Victor Chernozhukov & Ivan Fernandez-Val & Siyi Luo, 2023. "Distribution regression with sample selection and UK wage decomposition," CeMMAP working papers 09/23, Institute for Fiscal Studies.
    10. Victor Chernozhukov & Ivan Fernandez-Val & Siyi Luo, 2018. "Distribution regression with sample selection, with an application to wage decompositions in the UK," CeMMAP working papers CWP68/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    11. Alfonso Miranda & Sophia Rabe-Hesketh, 2006. "Maximum likelihood estimation of endogenous switching and sample selection models for binary, ordinal, and count variables," Stata Journal, StataCorp LLC, vol. 6(3), pages 285-308, September.
    12. Lina Zhang & David T. Frazier & D.S. Poskitt & Xueyan Zhao, 2025. "Decomposing identification gains and evaluating instrument identification power for partially identified average treatment effects," Econometric Reviews, Taylor & Francis Journals, vol. 44(7), pages 915-938, August.
    13. Dominik Becker, 2013. "The impact of teachers’ expectations on students’ educational opportunities in the life course: An empirical test of a subjective expected utility explanation," Rationality and Society, vol. 25(4), pages 422-469, November.
    14. Kumar, Kaushalendra & Singh, Abhishek & James, K.S. & McDougal, Lotus & Raj, Anita, 2020. "Gender bias in hospitalization financing from borrowings, selling of assets, contribution from relatives or friends in India," Social Science & Medicine, Elsevier, vol. 260(C).
    15. Artjoms Ivlevs & Timothy Hinks, 2015. "Bribing Behaviour and Sample Selection: Evidence from Post-Socialist Countries and Western Europe," Journal of Economics and Statistics (Jahrbuecher fuer Nationaloekonomie und Statistik), De Gruyter, vol. 235(2), pages 139-167, April.
    16. Giampiero Marra & Rosalba Radice & Till Bärnighausen & Simon N. Wood & Mark E. McGovern, 2017. "A Simultaneous Equation Approach to Estimating HIV Prevalence With Nonignorable Missing Responses," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(518), pages 484-496, April.
    17. Maarten Goos & Anna Salomons, 2017. "Measuring teaching quality in higher education: assessing selection bias in course evaluations," Research in Higher Education, Springer; Association for Institutional Research, vol. 58(4), pages 341-364, June.
    18. Eun-Ju Lee & David Eastwood & Jinkook Lee, 2004. "A Sample Selection Model of Consumer Adoption of Computer Banking," Journal of Financial Services Research, Springer; Western Finance Association, vol. 26(3), pages 263-275, December.
    19. Klein, Roger & Shen, Chan & Vella, Francis, 2015. "Estimation of marginal effects in semiparametric selection models with binary outcomes," Journal of Econometrics, Elsevier, vol. 185(1), pages 82-94.
    20. Maria Felice Arezzo & Giuseppina Guagnano, 2018. "Response-Based Sampling for Binary Choice Models With Sample Selection," Econometrics, MDPI, vol. 6(1), pages 1-17, March.



    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.