A Binary Choice Model with Sample Selection and Covariate-Related Misclassification


  • Jorge González Chapela

    (Academia General Militar, Centro Universitario de la Defensa de Zaragoza, 50090 Zaragoza, Spain)


Misclassification of a binary response variable and nonrandom sample selection are data issues frequently encountered by empirical researchers. For cases in which both issues feature simultaneously in a data set, we formulate a sample selection model for a misclassified binary outcome in which the conditional probabilities of misclassification are allowed to depend on covariates. Assuming the availability of validation data, the pseudo-maximum likelihood technique can be used to estimate the model. The performance of the estimator accounting for misclassification and sample selection is compared to that of estimators offering partial corrections. An empirical example illustrates the proposed framework.
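The abstract does not state the likelihood, so the following is only a minimal sketch of how one observation could enter a log-likelihood that combines sample selection with covariate-related misclassification. It rests on assumptions that are not taken from the paper: probit margins with bivariate normal errors (correlation `rho`), misclassification that is independent of selection given the true outcome and covariates, and misclassification probabilities `alpha0` (report 1 when the truth is 0) and `alpha1` (report 0 when the truth is 1) that have already been predicted for this observation from a first-step fit on validation data. All function and parameter names are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bvn_cdf(a, b, rho, n=2000):
    """P(U <= a, V <= b) for a standard bivariate normal with |rho| < 1.

    Uses P = integral over v <= b of phi(v) * Phi((a - rho*v) / sqrt(1 - rho^2)),
    approximated with the midpoint rule on [-8, b] (illustrative helper only).
    """
    lo = -8.0
    if b <= lo:
        return 0.0
    h = (b - lo) / n
    s = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for i in range(n):
        v = lo + (i + 0.5) * h
        total += norm_pdf(v) * norm_cdf((a - rho * v) / s)
    return total * h


def loglik_obs(selected, y_obs, xb, zg, rho, alpha0, alpha1):
    """Log-likelihood contribution of one observation (assumed specification).

    xb, zg : outcome and selection index values (x'beta and z'gamma)
    alpha0 : P(reported 1 | true 0), may vary with covariates
    alpha1 : P(reported 0 | true 1), may vary with covariates
    """
    if not selected:
        # Outcome is not observed: only the selection equation contributes.
        return math.log(norm_cdf(-zg))
    p_sel = norm_cdf(zg)
    p11 = bvn_cdf(xb, zg, rho)   # P(true y = 1, selected)
    p01 = p_sel - p11            # P(true y = 0, selected)
    # Reported 1 arises from false positives and correctly reported positives.
    p_rep1 = alpha0 * p01 + (1.0 - alpha1) * p11
    p_rep0 = p_sel - p_rep1
    return math.log(p_rep1 if y_obs == 1 else p_rep0)


if __name__ == "__main__":
    # Hypothetical index values and misclassification probabilities.
    print(loglik_obs(True, 1, xb=0.3, zg=0.5, rho=0.4, alpha0=0.05, alpha1=0.20))
```

In a pseudo-maximum likelihood setting of the kind the abstract describes, `alpha0` and `alpha1` would be held fixed at their first-step predictions while the sum of these contributions is maximized over the remaining parameters; setting both to zero gives the log-likelihood of a probit model with sample selection.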

Suggested Citation

  • Jorge González Chapela, 2022. "A Binary Choice Model with Sample Selection and Covariate-Related Misclassification," Econometrics, MDPI, vol. 10(2), pages 1-20, March.
  • Handle: RePEc:gam:jecnmx:v:10:y:2022:i:2:p:13-:d:777879


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jorge González Chapela, 2022. "Is there a patience premium on migration?," Empirical Economics, Springer, vol. 63(4), pages 2025-2055, October.
    2. González Chapela, Jorge, 2020. "Patience goes a long way: Evidence from Spain," MPRA Paper 98711, University Library of Munich, Germany.
    3. Massimiliano Bratti & Alfonso Miranda, 2010. "Non‐pecuniary returns to higher education: the effect on smoking intensity in the UK," Health Economics, John Wiley & Sons, Ltd., vol. 19(8), pages 906-920, August.
    4. Meyer, Bruce D. & Mittag, Nikolas, 2021. "An empirical total survey error decomposition using data combination," Journal of Econometrics, Elsevier, vol. 224(2), pages 286-305.
    5. Molinari, Francesca, 2008. "Partial identification of probability distributions with misclassified data," Journal of Econometrics, Elsevier, vol. 144(1), pages 81-117, May.
    6. Kureishi, Wataru & Paule-Paludkiewicz, Hannah & Tsujiyama, Hitoshi & Wakabayashi, Midori, 2021. "Time preferences over the life cycle and household saving puzzles," Journal of Monetary Economics, Elsevier, vol. 124(C), pages 123-139.
    7. Bruckmeier, Kerstin & Riphahn, Regina T. & Wiemers, Jürgen, 2019. "Benefit underreporting in survey data and its consequences for measuring non-take-up: new evidence from linked administrative and survey data," IAB-Discussion Paper 201906, Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nürnberg [Institute for Employment Research, Nuremberg, Germany].
    8. Zhang, Han, 2021. "How Using Machine Learning Classification as a Variable in Regression Leads to Attenuation Bias and What to Do About It," SocArXiv 453jk, Center for Open Science.
    9. Meyer, Bruce D. & Mittag, Nikolas, 2017. "Using Linked Survey and Administrative Data to Better Measure Income: Implications for Poverty, Program Effectiveness and Holes in the Safety Net," IZA Discussion Papers 10943, Institute of Labor Economics (IZA).
    10. Yingyao Hu & Zhongjian Lin, 2018. "Misclassification and the hidden silent rivalry," CeMMAP working papers CWP12/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    11. Bruce Meyer & Nikolas Mittag, 2017. "Using Linked Survey and Administrative Data to Better Measure Income: Implications for Poverty, Program Effectiveness and Holes in the Safety Net," Working Papers 2017-075, Human Capital and Economic Opportunity Working Group.
    12. Meyer, Bruce D. & Mittag, Nikolas, 2017. "Misclassification in binary choice models," Journal of Econometrics, Elsevier, vol. 200(2), pages 295-311.
    13. Colm Harmon & Claire Finn, 2006. "A dynamic model of demand for private health insurance in Ireland," Open Access publications 10197/666, School of Economics, University College Dublin.
    14. Yokoo, Hide-Fumi & Arimura, Toshi H. & Chattopadhyay, Mriduchhanda & Katayama, Hajime, 2023. "Subjective risk belief function in the field: Evidence from cooking fuel choices and health in India," Journal of Development Economics, Elsevier, vol. 161(C).
    15. Brenøe, Anne Ardila & Epper, Thomas, 2022. "Parenting values and the intergenerational transmission of time preferences," European Economic Review, Elsevier, vol. 148(C).
    16. Reinhard A. Weisser, 2020. "How Personality Shapes Study Location Choices," Research in Higher Education, Springer;Association for Institutional Research, vol. 61(1), pages 88-116, February.
    17. Aysit Tansel & Ceyhan Ozturk & Erkan Erdil, 2021. "The Impact of Body Mass Index on Growth, Schooling, Productivity, and Savings: A Cross-Country Study," Koç University-TUSIAD Economic Research Forum Working Papers 2118, Koc University-TUSIAD Economic Research Forum.
    18. Kelley, Clare & Lanot, Gauthier, 2002. "Consumption Patterns Over Pay Periods," Economic Research Papers 269469, University of Warwick - Department of Economics.
    19. Giovanni Compiani & Yuichi Kitamura, 2016. "Using mixtures in econometric models: a brief review and some new results," Econometrics Journal, Royal Economic Society, vol. 19(3), pages 95-127, October.
    20. Bastianin, Andrea & Castelnovo, Paolo & Florio, Massimo, 2018. "Evaluating regulatory reform of network industries: a survey of empirical models based on categorical proxies," Utilities Policy, Elsevier, vol. 55(C), pages 115-128.

