
Estimation of a regression spline sample selection model


  • Marra, Giampiero
  • Radice, Rosalba


It is often the case that an outcome of interest is observed for a restricted non-randomly selected sample of the population. In such a situation, standard statistical analysis yields biased results. This issue can be addressed using sample selection models which are based on the estimation of two regressions: a binary selection equation determining whether a particular statistical unit will be available in the outcome equation, and an outcome equation describing the response of interest. Classic sample selection models assume a priori that continuous regressors have a pre-specified linear or non-linear relationship to the outcome, which can lead to erroneous conclusions. In the case of a continuous response, methods in which covariate effects are modeled flexibly have been previously proposed, the most recent being based on a Bayesian Markov chain Monte Carlo approach. A frequentist counterpart which has the advantage of being computationally fast is introduced. The proposed algorithm is based on the penalized likelihood estimation framework. The construction of confidence intervals is also discussed. The empirical properties of the existing and proposed methods are studied through a simulation study. The approaches are finally illustrated by analyzing data from the RAND Health Insurance Experiment on annual health expenditures.
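The bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' estimator: it only generates data from a two-equation setup (a binary selection rule and a linear outcome with correlated errors) and compares the ordinary least squares slope on the full sample with the slope on the selected sample. The function name, the coefficients, the extra covariate z, and the error correlation rho are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_selection_bias(n=20000, rho=0.7, seed=0):
    """Return (slope on full sample, slope on selected sample).

    Hypothetical setup, for illustration only:
      selection: unit is observed if 0.5 + x + z + e_sel > 0
      outcome:   y = 1 + 2 * x + e_out   (true slope is 2)
    with corr(e_sel, e_out) = rho, so selection is non-random.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)  # covariate in both equations
    z = rng.normal(size=n)  # covariate in the selection equation only

    # Draw correlated errors for the two equations.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    errs = rng.multivariate_normal(np.zeros(2), cov, size=n)
    e_sel, e_out = errs[:, 0], errs[:, 1]

    selected = (0.5 + x + z + e_sel) > 0
    y = 1.0 + 2.0 * x + e_out

    def ols_slope(xv, yv):
        # Least squares fit of yv on an intercept and xv.
        design = np.column_stack([np.ones_like(xv), xv])
        beta, *_ = np.linalg.lstsq(design, yv, rcond=None)
        return beta[1]

    return ols_slope(x, y), ols_slope(x[selected], y[selected])


if __name__ == "__main__":
    full, sel = simulate_selection_bias()
    print(f"slope, full sample:     {full:.3f}")
    print(f"slope, selected sample: {sel:.3f}")
```

Under these assumptions the full-sample slope should sit close to the true value of 2, while the selected-sample slope is pulled away from it, because the outcome error is correlated with the event of being observed. Setting rho to 0 in this sketch removes that correlation, and with it the source of the bias.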

Suggested Citation

  • Marra, Giampiero & Radice, Rosalba, 2013. "Estimation of a regression spline sample selection model," Computational Statistics & Data Analysis, Elsevier, vol. 61(C), pages 158-173.
  • Handle: RePEc:eee:csdana:v:61:y:2013:i:c:p:158-173
    DOI: 10.1016/j.csda.2012.12.010

    Download full text from publisher

    Download Restriction: Full text for ScienceDirect subscribers only.





    Cited by:

    1. Sengupta, Reshmi & Rooj, Debasis, 2019. "The effect of health insurance on hospitalization: Identification of adverse selection, moral hazard and the vulnerable population in the Indian healthcare market," World Development, Elsevier, vol. 122(C), pages 110-129.
    2. Mikhail Zhelonkin & Marc G. Genton & Elvezio Ronchetti, 2016. "Robust inference in sample selection models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(4), pages 805-827, September.
    3. Hajime Seya & Junyi Zhang & Makoto Chikaraishi & Ying Jiang, 2020. "Decisions on truck parking place and time on expressways: an analysis using digital tachograph data," Transportation, Springer, vol. 47(2), pages 555-583, April.
    4. Marra, Giampiero & Wyszynski, Karol, 2016. "Semi-parametric copula sample selection models for count responses," Computational Statistics & Data Analysis, Elsevier, vol. 104(C), pages 110-129.
    5. Wojtyś, Małgorzata & Marra, Giampiero & Radice, Rosalba, 2018. "Copula based generalized additive models for location, scale and shape with non-random sample selection," Computational Statistics & Data Analysis, Elsevier, vol. 127(C), pages 1-14.
    6. Karol Wyszynski & Giampiero Marra, 2018. "Sample selection models for count data in R," Computational Statistics, Springer, vol. 33(3), pages 1385-1412, September.

