
Copula Regression Spline Sample Selection Models: The R Package SemiParSampleSel

Authors

  • Wojtyś, Małgorzata
  • Marra, Giampiero
  • Radice, Rosalba

Abstract

Sample selection models deal with the situation in which an outcome of interest is observed for a restricted, non-randomly selected sample of the population. The estimation of these models is based on a binary equation, which describes the selection process, and an outcome equation, which is used to examine the substantive question of interest. Classic sample selection models assume a priori that continuous covariates have a linear or pre-specified non-linear relationship to the outcome, and that the distribution linking the two equations is bivariate normal. We introduce the R package SemiParSampleSel, which implements copula regression spline sample selection models. The proposed implementation can deal with non-random sample selection, non-linear covariate-response relationships, and non-normal bivariate distributions between the model equations. We provide details of the model and algorithm and describe the implementation in SemiParSampleSel. The package is illustrated using simulated and real data examples.
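
To make the two-equation structure concrete, the Python sketch below simulates the classic special case that the abstract describes as the starting point: a binary selection equation and a continuous outcome equation whose errors are bivariate normal (equivalently, a Gaussian copula with normal margins). It is a hypothetical illustration, not the SemiParSampleSel R interface, and it implements neither regression splines nor non-Gaussian copulas. The function names, the parameter values and the intercept-only equations are assumptions, chosen so that the simulated selection bias can be checked against the closed-form inverse Mills ratio term.

```python
import math
import random


def std_normal_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    """Standard normal distribution function Phi(x), computed via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_selection(n, a0, b0, sigma, rho, seed=1):
    """Simulate a hypothetical intercept-only bivariate normal selection model.

    Selection equation: y1* = a0 + e1; a unit is selected iff y1* > 0.
    Outcome equation:   y2  = b0 + e2; y2 is recorded only for selected units.
    (e1, e2) are bivariate normal with Var(e1) = 1, Var(e2) = sigma**2 and
    Corr(e1, e2) = rho. Returns (outcomes of all units, outcomes of selected units).
    """
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        e1 = rng.gauss(0.0, 1.0)
        # Correlate e2 with e1 using the 2x2 Cholesky factor of the correlation matrix.
        e2 = sigma * (rho * e1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0))
        y2 = b0 + e2
        full.append(y2)        # outcome for every unit (unobservable in practice)
        if a0 + e1 > 0.0:      # selection rule from the binary equation
            observed.append(y2)
    return full, observed


def heckman_selected_mean(a0, b0, sigma, rho):
    """E[y2 | selected] = b0 + sigma * rho * phi(a0) / Phi(a0) (inverse Mills ratio)."""
    return b0 + sigma * rho * std_normal_pdf(a0) / std_normal_cdf(a0)


# Hypothetical parameter values: a0 = 0 selects about half of the population.
full, observed = simulate_selection(n=200_000, a0=0.0, b0=1.0, sigma=2.0, rho=0.6)
naive = sum(observed) / len(observed)
print(f"population mean of y2:     {sum(full) / len(full):.3f}")
print(f"naive mean over selected:  {naive:.3f}")
print(f"theoretical selected mean: {heckman_selected_mean(0.0, 1.0, 2.0, 0.6):.3f}")
```

With rho > 0 the naive mean of the observed outcomes sits above b0 by about sigma * rho * phi(a0) / Phi(a0), which is the bias a sample selection model is meant to correct; setting rho = 0 removes it. The package described here replaces the bivariate normal link with a copula and the fixed intercepts with spline-based predictors.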

Suggested Citation

  • Wojtyś, Małgorzata & Marra, Giampiero & Radice, Rosalba, 2016. "Copula Regression Spline Sample Selection Models: The R Package SemiParSampleSel," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 71(i06).
  • Handle: RePEc:jss:jstsof:v:071:i06
    DOI: 10.18637/jss.v071.i06 (http://hdl.handle.net/10.18637/jss.v071.i06)

    Download full text from publisher

    File URL: https://www.jstatsoft.org/index.php/jss/article/view/v071i06/v71i06.pdf
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v071i06/SemiParSampleSel_1.4.tar.gz
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v071i06/v71i06.R
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v071i06/ND.dat
    Download Restriction: no

    File URL: https://libkey.io/http://hdl.handle.net/10.18637/jss.v071.i06?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Bhat, Chandra R. & Eluru, Naveen, 2009. "A copula-based approach to accommodate residential self-selection effects in travel behavior modeling," Transportation Research Part B: Methodological, Elsevier, vol. 43(7), pages 749-765, August.
    2. James J. Heckman, 1976. "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models," NBER Chapters, in: Annals of Economic and Social Measurement, Volume 5, number 4, pages 475-492, National Bureau of Economic Research, Inc.
    3. Whitney K. Newey, 2009. "Two-step series estimation of sample selection models," Econometrics Journal, Royal Economic Society, vol. 12(s1), pages 217-229, January.
    4. Young‐Ju Kim & Chong Gu, 2004. "Smoothing spline Gaussian regression: more scalable computation via efficient approximation," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 66(2), pages 337-356, May.
    5. Hasebe, Takuya & Vijverberg, Wim P., 2012. "A Flexible Sample Selection Model: A GTL-Copula Approach," IZA Discussion Papers 7003, Institute of Labor Economics (IZA).
    6. Manuel Wiesenfarth & Thomas Kneib, 2010. "Bayesian geoadditive sample selection models," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 59(3), pages 381-404, May.
    7. Lee, Lung-fei, 1994. "Semiparametric two-stage estimation of sample selection models subject to Tobit-type selection rules," Journal of Econometrics, Elsevier, vol. 61(2), pages 305-344, April.
    8. Ruppert, David & Wand, M. P. & Carroll, R. J., 2003. "Semiparametric Regression," Cambridge Books, Cambridge University Press, number 9780521785167.
    9. Ding, Peng, 2014. "Bayesian robust inference of sample selection using selection-t models," Journal of Multivariate Analysis, Elsevier, vol. 124(C), pages 451-464.
    10. Ahn, Hyungtaik & Powell, James L., 1993. "Semiparametric estimation of censored selection models with a nonparametric selection mechanism," Journal of Econometrics, Elsevier, vol. 58(1-2), pages 3-29, July.
    11. Brechmann, Eike Christian & Schepsmeier, Ulf, 2013. "Modeling Dependence with C- and D-Vine Copulas: The R Package CDVine," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 52(i03).
    12. Margarita Genius & Elisabetta Strazzera, 2008. "Applying the copula approach to sample selection modelling," Applied Economics, Taylor & Francis Journals, vol. 40(11), pages 1443-1455.
    13. Vella, F, 1992. "Simple Tests for Sample Selection Bias in Censored and Discrete Choice Models," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 7(4), pages 413-421, Oct.-Dec.
    14. Ruppert, David & Wand, M. P. & Carroll, R. J., 2003. "Semiparametric Regression," Cambridge Books, Cambridge University Press, number 9780521780506.
    15. Zimmer, David M. & Trivedi, Pravin K., 2006. "Using Trivariate Copulas to Model Sample Selection and Treatment Effects: Application to Family Health Care Demand," Journal of Business & Economic Statistics, American Statistical Association, vol. 24, pages 63-76, January.
    16. Giampiero Marra & Simon N. Wood, 2012. "Coverage Properties of Confidence Intervals for Generalized Additive Model Components," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 39(1), pages 53-74, March.
    17. Inyoung Kim & Noah D. Cohen & Raymond J. Carroll, 2003. "Semiparametric Regression Splines in Matched Case-Control Studies," Biometrics, The International Biometric Society, vol. 59(4), pages 1158-1169, December.
    18. Yulia V. Marchenko & Marc G. Genton, 2012. "A Heckman Selection-t Model," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(497), pages 304-317, March.

    Citations

    Citations are extracted by the CitEc Project; you can subscribe to its RSS feed for this item.


    Cited by:

    1. Elena Geminiani & Giampiero Marra & Irini Moustaki, 2021. "Single- and Multiple-Group Penalized Factor Analysis: A Trust-Region Algorithm Approach with Integrated Automatic Multiple Tuning Parameter Selection," Psychometrika, Springer;The Psychometric Society, vol. 86(1), pages 65-95, March.
    2. Marra Giampiero & Radice Rosalba, 2017. "A joint regression modeling framework for analyzing bivariate binary data in R," Dependence Modeling, De Gruyter, vol. 5(1), pages 268-294, December.
    3. Giampiero Marra & Rosalba Radice & David Zimmer, 2021. "Did the ACA's “guaranteed issue” provision cause adverse selection into nongroup insurance? Analysis using a copula‐based hurdle model," Health Economics, John Wiley & Sons, Ltd., vol. 30(9), pages 2246-2263, September.
    4. Maike Hohberg & Francesco Donat & Giampiero Marra & Thomas Kneib, 2021. "Beyond unidimensional poverty analysis using distributional copula models for mixed ordered‐continuous outcomes," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(5), pages 1365-1390, November.
    5. David Zimmer, 2018. "Using copulas to estimate the coefficient of a binary endogenous regressor in a Poisson regression: Application to the effect of insurance on doctor visits," Health Economics, John Wiley & Sons, Ltd., vol. 27(3), pages 545-556, March.
    6. Maciej Berȩsewicz & Dagmara Nikulin, 2021. "Estimation of the size of informal employment based on administrative records with non‐ignorable selection mechanism," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(3), pages 667-690, June.
    7. Tibi Didier Zoungrana, 2021. "The effect of wealth on the choice of household drinking water sources in West Africa," International Journal of Finance & Economics, John Wiley & Sons, Ltd., vol. 26(2), pages 2241-2250, April.
    8. Marra, Giampiero & Wyszynski, Karol, 2016. "Semi-parametric copula sample selection models for count responses," Computational Statistics & Data Analysis, Elsevier, vol. 104(C), pages 110-129.
    9. Karol Wyszynski & Giampiero Marra, 2018. "Sample selection models for count data in R," Computational Statistics, Springer, vol. 33(3), pages 1385-1412, September.
    10. David M. Zimmer, 2022. "Investigating the dynamic interdependency between poverty and marital separation," Review of Economics of the Household, Springer, vol. 20(4), pages 1239-1254, December.
    11. Mussida Chiara & Zanin Luca, 2019. "Voluntary Mobility of Employees for Better Job Opportunities Given a Temporary Contract: Insights Regarding an Age-Varying Association Between the Two Events," The B.E. Journal of Economic Analysis & Policy, De Gruyter, vol. 19(2), pages 1-27, April.
    12. Sengupta, Reshmi & Rooj, Debasis, 2019. "The effect of health insurance on hospitalization: Identification of adverse selection, moral hazard and the vulnerable population in the Indian healthcare market," World Development, Elsevier, vol. 122(C), pages 110-129.
    13. Maciej Berȩsewicz & Dagmara Nikulin, 2019. "Estimation of the size of informal employment based on administrative records with non-ignorable selection mechanism," Papers 1906.10957, arXiv.org.
    14. Nicolai Hans & Nadja Klein & Florian Faschingbauer & Michael Schneider & Andreas Mayr, 2023. "Boosting distributional copula regression," Biometrics, The International Biometric Society, vol. 79(3), pages 2298-2310, September.
    15. Wojtyś, Małgorzata & Marra, Giampiero & Radice, Rosalba, 2018. "Copula based generalized additive models for location, scale and shape with non-random sample selection," Computational Statistics & Data Analysis, Elsevier, vol. 127(C), pages 1-14.
    16. Giampiero Marra & Rosalba Radice & David M. Zimmer, 2020. "Estimating the binary endogenous effect of insurance on doctor visits by copula‐based regression additive models," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 69(4), pages 953-971, August.
    17. Schmidt, Rouven & Kneib, Thomas, 2023. "Multivariate distributional stochastic frontier models," Computational Statistics & Data Analysis, Elsevier, vol. 187(C).
    18. Hamori, Shigeyuki & Motegi, Kaiji & Zhang, Zheng, 2019. "Calibration estimation of semiparametric copula models with data missing at random," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 85-109.
    19. Machado, Robson J.M. & van den Hout, Ardo & Marra, Giampiero, 2021. "Penalised maximum likelihood estimation in multi-state models for interval-censored data," Computational Statistics & Data Analysis, Elsevier, vol. 153(C).
    20. Pierfrancesco Alaimo Di Loro & Daria Scacciatelli & Giovanna Tagliaferri, 2023. "2-step Gradient Boosting approach to selectivity bias correction in tax audit: an application to the VAT gap in Italy," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 32(1), pages 237-270, March.
    21. Chiara Mussida & Luca Zanin, 2020. "I found a better job opportunity! Voluntary job mobility of employees and temporary contracts before and after the great recession in France, Italy and Spain," Empirical Economics, Springer, vol. 59(1), pages 47-98, July.
    22. Geminiani, Elena & Marra, Giampiero & Moustaki, Irini, 2021. "Single and multiple-group penalized factor analysis: a trust-region algorithm approach with integrated automatic multiple tuning parameter selection," LSE Research Online Documents on Economics 108873, London School of Economics and Political Science, LSE Library.
    23. Huihui Lin & N. Rao Chaganty, 2021. "Multivariate distributions of correlated binary variables generated by pair-copulas," Journal of Statistical Distributions and Applications, Springer, vol. 8(1), pages 1-14, December.
    24. Burli, Pralhad & Lal, Pankaj & Wolde, Bernabas & Jose, Shibu & Bardhan, Sougata, 2021. "Perceptions about switchgrass and land allocation decisions: Evidence from a farmer survey in Missouri," Land Use Policy, Elsevier, vol. 109(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Karol Wyszynski & Giampiero Marra, 2018. "Sample selection models for count data in R," Computational Statistics, Springer, vol. 33(3), pages 1385-1412, September.
    2. Marra, Giampiero & Wyszynski, Karol, 2016. "Semi-parametric copula sample selection models for count responses," Computational Statistics & Data Analysis, Elsevier, vol. 104(C), pages 110-129.
    3. Wojtyś, Małgorzata & Marra, Giampiero & Radice, Rosalba, 2018. "Copula based generalized additive models for location, scale and shape with non-random sample selection," Computational Statistics & Data Analysis, Elsevier, vol. 127(C), pages 1-14.
    4. Marra, Giampiero & Radice, Rosalba, 2013. "Estimation of a regression spline sample selection model," Computational Statistics & Data Analysis, Elsevier, vol. 61(C), pages 158-173.
    5. Wiemann, Paul F.V. & Klein, Nadja & Kneib, Thomas, 2022. "Correcting for sample selection bias in Bayesian distributional regression models," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).
    6. Qi Li & Jeffrey Scott Racine, 2006. "Nonparametric Econometrics: Theory and Practice," Economics Books, Princeton University Press, edition 1, volume 1, number 8355.
    7. Marra, Giampiero & Radice, Rosalba, 2017. "Bivariate copula additive models for location, scale and shape," Computational Statistics & Data Analysis, Elsevier, vol. 112(C), pages 99-113.
    8. Pigini Claudia, 2015. "Bivariate Non-Normality in the Sample Selection Model," Journal of Econometric Methods, De Gruyter, vol. 4(1), pages 1-22, January.
    9. Emmanuel O. Ogundimu & Jane L. Hutton, 2016. "A Sample Selection Model with Skew-normal Distribution," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 43(1), pages 172-190, March.
    10. Giampiero Marra & Rosalba Radice & Till Bärnighausen & Simon N. Wood & Mark E. McGovern, 2017. "A Simultaneous Equation Approach to Estimating HIV Prevalence With Nonignorable Missing Responses," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(518), pages 484-496, April.
    11. Nathaniel E. Helwig, 2022. "Robust Permutation Tests for Penalized Splines," Stats, MDPI, vol. 5(3), pages 1-18, September.
    12. Longhi, Christian & Musolesi, Antonio & Baumont, Catherine, 2014. "Modeling structural change in the European metropolitan areas during the process of economic integration," Economic Modelling, Elsevier, vol. 37(C), pages 395-407.
    13. Lauren N. Berry & Nathaniel E. Helwig, 2021. "Cross-Validation, Information Theory, or Maximum Likelihood? A Comparison of Tuning Methods for Penalized Splines," Stats, MDPI, vol. 4(3), pages 1-24, September.
    14. Nagler Thomas & Czado Claudia & Schellhase Christian, 2017. "Nonparametric estimation of simplified vine copula models: comparison of methods," Dependence Modeling, De Gruyter, vol. 5(1), pages 99-120, January.
    15. Nadja Klein & Thomas Kneib & Giampiero Marra & Rosalba Radice & Slawa Rokicki & Mark E. McGovern, 2018. "Mixed Binary-Continuous Copula Regression Models with Application to Adverse Birth Outcomes," CHaRMS Working Papers 18-06, Centre for HeAlth Research at the Management School (CHaRMS).
    16. Øystein Sørensen & Anders M. Fjell & Kristine B. Walhovd, 2023. "Longitudinal Modeling of Age-Dependent Latent Traits with Generalized Additive Latent and Mixed Models," Psychometrika, Springer;The Psychometric Society, vol. 88(2), pages 456-486, June.
    17. Maike Hohberg & Francesco Donat & Giampiero Marra & Thomas Kneib, 2021. "Beyond unidimensional poverty analysis using distributional copula models for mixed ordered‐continuous outcomes," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(5), pages 1365-1390, November.
    18. Chen, Heng & Fan, Yanqin & Wu, Jisong, 2014. "A flexible parametric approach for estimating switching regime models and treatment effect parameters," Journal of Econometrics, Elsevier, vol. 181(2), pages 77-91.
    19. Hasebe, Takuya & Vijverberg, Wim P., 2012. "A Flexible Sample Selection Model: A GTL-Copula Approach," IZA Discussion Papers 7003, Institute of Labor Economics (IZA).
    20. Seebens, Holger, 2009. "Child Welfare and Old-Age Security in Female Headed Households in Tanzania," IZA Discussion Papers 3929, Institute of Labor Economics (IZA).


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:jss:jstsof:v:071:i06. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do so. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Christopher F. Baum. General contact details of provider: http://www.jstatsoft.org/.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.