
Sample selection models for count data in R

Author

Listed:
  • Karol Wyszynski

    (University College London)

  • Giampiero Marra

    (University College London)

Abstract

We provide a detailed hands-on tutorial for the R package SemiParSampleSel (version 1.5). The package implements selection models for count responses fitted by penalized maximum likelihood estimation. The approach can deal with non-random sample selection, flexible covariate effects, heterogeneous selection mechanisms and varying distributional parameters. We provide an overview of the theoretical background and then demonstrate how SemiParSampleSel can be used to fit interpretable models of different complexity. We use data from the German Socio-Economic Panel survey (SOEP v28, 2012, doi: 10.5684/soep.v28) throughout the tutorial.

Suggested Citation

  • Karol Wyszynski & Giampiero Marra, 2018. "Sample selection models for count data in R," Computational Statistics, Springer, vol. 33(3), pages 1385-1412, September.
  • Handle: RePEc:spr:compst:v:33:y:2018:i:3:d:10.1007_s00180-017-0762-y
    DOI: 10.1007/s00180-017-0762-y

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00180-017-0762-y
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Marra, Giampiero & Wyszynski, Karol, 2016. "Semi-parametric copula sample selection models for count responses," Computational Statistics & Data Analysis, Elsevier, vol. 104(C), pages 110-129.
    2. Wojtyś, Małgorzata & Marra, Giampiero & Radice, Rosalba, 2018. "Copula based generalized additive models for location, scale and shape with non-random sample selection," Computational Statistics & Data Analysis, Elsevier, vol. 127(C), pages 1-14.
    3. Wojtyś, Małgorzata & Marra, Giampiero & Radice, Rosalba, 2016. "Copula Regression Spline Sample Selection Models: The R Package SemiParSampleSel," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 71(i06).
    4. Marra, Giampiero & Radice, Rosalba, 2013. "Estimation of a regression spline sample selection model," Computational Statistics & Data Analysis, Elsevier, vol. 61(C), pages 158-173.
    5. Marra, Giampiero & Radice, Rosalba, 2017. "A joint regression modeling framework for analyzing bivariate binary data in R," Dependence Modeling, De Gruyter, vol. 5(1), pages 268-294, December.
    6. Wiemann, Paul F.V. & Klein, Nadja & Kneib, Thomas, 2022. "Correcting for sample selection bias in Bayesian distributional regression models," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).
    7. Mikhail Zhelonkin & Marc G. Genton & Elvezio Ronchetti, 2016. "Robust inference in sample selection models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(4), pages 805-827, September.
    8. Marra, Giampiero & Radice, Rosalba, 2017. "Bivariate copula additive models for location, scale and shape," Computational Statistics & Data Analysis, Elsevier, vol. 112(C), pages 99-113.
    9. Maike Hohberg & Francesco Donat & Giampiero Marra & Thomas Kneib, 2021. "Beyond unidimensional poverty analysis using distributional copula models for mixed ordered‐continuous outcomes," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(5), pages 1365-1390, November.
    10. Giampiero Marra & Rosalba Radice & Till Bärnighausen & Simon N. Wood & Mark E. McGovern, 2017. "A Simultaneous Equation Approach to Estimating HIV Prevalence With Nonignorable Missing Responses," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(518), pages 484-496, April.
    11. Liu, Ruixuan & Yu, Zhengfei, 2022. "Sample selection models with monotone control functions," Journal of Econometrics, Elsevier, vol. 226(2), pages 321-342.
    12. Schmidt, Rouven & Kneib, Thomas, 2023. "Multivariate distributional stochastic frontier models," Computational Statistics & Data Analysis, Elsevier, vol. 187(C).
    13. James E. Prieger, "undated". "A Generalized Parametric Selection Model for Non-Normal Data," Department of Economics 00-09, California Davis - Department of Economics.
    14. Nadja Klein & Thomas Kneib & Giampiero Marra & Rosalba Radice & Slawa Rokicki & Mark E. McGovern, 2018. "Mixed Binary-Continuous Copula Regression Models with Application to Adverse Birth Outcomes," CHaRMS Working Papers 18-06, Centre for HeAlth Research at the Management School (CHaRMS).
    15. Massimiliano Bratti & Alfonso Miranda, 2011. "Endogenous treatment effects for count data models with endogenous participation or sample selection," Health Economics, John Wiley & Sons, Ltd., vol. 20(9), pages 1090-1109, September.
    16. Chen, Heng & Fan, Yanqin & Wu, Jisong, 2014. "A flexible parametric approach for estimating switching regime models and treatment effect parameters," Journal of Econometrics, Elsevier, vol. 181(2), pages 77-91.
    17. Zhewen Pan, 2023. "On semiparametric estimation of the intercept of the sample selection model: a kernel approach," Papers 2302.05089, arXiv.org.
    18. Adelchi Azzalini & Hyoung-Moon Kim & Hea-Jung Kim, 2019. "Sample selection models for discrete and other non-Gaussian response variables," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 28(1), pages 27-56, March.
    19. Klein, Nadja & Denuit, Michel & Lang, Stefan & Kneib, Thomas, 2013. "Nonlife Ratemaking and Risk Management with Bayesian Additive Models for Location, Scale and Shape," LIDAM Discussion Papers ISBA 2013045, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    20. Qi Li & Jeffrey Scott Racine, 2006. "Nonparametric Econometrics: Theory and Practice," Economics Books, Princeton University Press, edition 1, volume 1, number 8355.
