Printed from https://ideas.repec.org/a/spr/stpapr/v62y2021i4d10.1007_s00362-019-01152-5.html

Semi-parametric regression when some (expensive) covariates are missing by design

Author

Listed:
  • Göran Kauermann

    (Ludwig-Maximilians-Universität München)

  • Mehboob Ali

    (Ludwig-Maximilians-Universität München)

Abstract

The paper deals with the scenario where some covariates are observed, by design, for only a subset of the observations. In the example treated in the paper, this arises from a two-phase sampling scheme. In the first phase, a relatively large sample is drawn to record a response variable Y and a set of (cheap) covariates x. In the second phase, a smaller sample is drawn from the first-phase sample, and additional (usually expensive) covariates z are recorded for it. The second phase can be drawn with unequal probability sampling, where the sampling weights depend on the observed Y and x. The overall aim is to fit a regression model of Y on both x and z. Because of the design of the data collection, z is missing for the majority of observations. We propose an approximate estimation approach that uses semi-parametric mean and variance regression of Y on x only and augments this fit with a full regression model of Y on x and z. The idea extends the approach of Little (1992) to non-normal data and non-linear models. The proposed estimation is numerically simple and performs well in simulation studies compared with alternatives such as complete-case analysis and multiple imputation.
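
The two-phase design described in the abstract can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the estimator proposed in the paper: the linear model, its coefficients, the logistic selection rule, and the names `simulate_two_phase` and `weighted_least_squares` are all hypothetical choices made for illustration. It compares a complete-case fit with an inverse-probability-weighted fit (a standard technique substituted here for the paper's semi-parametric approach) when the second-phase selection depends on Y and x.

```python
import numpy as np

# Hypothetical "true" coefficients for intercept, x and z (illustration only).
true_beta = np.array([1.0, 2.0, -1.0])


def simulate_two_phase(n=20000, seed=0):
    """Simulate a two-phase sample in which z is recorded only in phase 2."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                # cheap covariate, seen for everyone
    z = 0.5 * x + rng.normal(size=n)      # expensive covariate
    y = true_beta[0] + true_beta[1] * x + true_beta[2] * z + rng.normal(size=n)
    # Phase-2 inclusion probability depends on the observed y and x only
    # (assumed logistic form, bounded below so the weights stay finite).
    p = 1.0 / (1.0 + np.exp(-(-1.5 + 0.8 * y + 0.3 * x)))
    p = np.clip(p, 0.05, 1.0)
    selected = rng.random(n) < p
    z_obs = np.where(selected, z, np.nan)  # z is missing by design otherwise
    return x, y, z_obs, p, selected


def weighted_least_squares(X, y, w):
    """Solve the weighted normal equations (X' W X) b = X' W y."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)


x, y, z_obs, p, selected = simulate_two_phase()
X2 = np.column_stack([np.ones(selected.sum()), x[selected], z_obs[selected]])
y2 = y[selected]

# Complete-case analysis: ordinary least squares on phase-2 units only.
beta_cc = weighted_least_squares(X2, y2, np.ones(len(y2)))
# Inverse-probability weighting with the known design probabilities.
beta_ipw = weighted_least_squares(X2, y2, 1.0 / p[selected])

print("true         :", true_beta)
print("complete case:", np.round(beta_cc, 3))
print("IPW          :", np.round(beta_ipw, 3))
```

Because selection depends on Y in this setup, the complete-case fit is expected to be biased, while the weighted fit uses the known sampling probabilities to correct for the design. Neither fit uses the first-phase observations with missing z, which is the information the approach in the paper aims to exploit.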

Suggested Citation

  • Göran Kauermann & Mehboob Ali, 2021. "Semi-parametric regression when some (expensive) covariates are missing by design," Statistical Papers, Springer, vol. 62(4), pages 1675-1696, August.
  • Handle: RePEc:spr:stpapr:v:62:y:2021:i:4:d:10.1007_s00362-019-01152-5
    DOI: 10.1007/s00362-019-01152-5

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00362-019-01152-5
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bravo, Francesco, 2015. "Semiparametric estimation with missing covariates," Journal of Multivariate Analysis, Elsevier, vol. 139(C), pages 329-346.
    2. Michael Wegener & Göran Kauermann, 2017. "Forecasting in nonlinear univariate time series using penalized splines," Statistical Papers, Springer, vol. 58(3), pages 557-576, September.
    3. Dlugosz, Stephan & Mammen, Enno & Wilke, Ralf A., 2017. "Generalized partially linear regression with misclassified data and an application to labour market transitions," Computational Statistics & Data Analysis, Elsevier, vol. 110(C), pages 145-159.
    4. Basile, Roberto & Durbán, María & Mínguez, Román & María Montero, Jose & Mur, Jesús, 2014. "Modeling regional economic dynamics: Spatial dependence, spatial heterogeneity and nonlinearities," Journal of Economic Dynamics and Control, Elsevier, vol. 48(C), pages 229-245.
    5. Takuma Yoshida, 2019. "Two stage smoothing in additive models with missing covariates," Statistical Papers, Springer, vol. 60(6), pages 1803-1826, December.
    6. Hübler, Olaf, 2017. "Health and Body Mass Index: No Simple Relationship," IZA Discussion Papers 10620, Institute of Labor Economics (IZA).
    7. Rachid Muleia & Makini Boothe & Osvaldo Loquiha & Marc Aerts & Christel Faes, 2020. "Spatial Distribution of HIV Prevalence among Young People in Mozambique," IJERPH, MDPI, vol. 17(3), pages 1-20, January.
    8. Lee, Wang-Sheng & McKinnish, Terra, 2019. "Locus of control and marital satisfaction: Couple perspectives using Australian data," Journal of Economic Psychology, Elsevier, vol. 74(C).
    9. Lemmens, Aurélie & Croux, Christophe & Stremersch, Stefan, 2012. "Dynamics in the international market segmentation of new product growth," International Journal of Research in Marketing, Elsevier, vol. 29(1), pages 81-92.
    10. Chandra, Hukum & Salvati, Nicola & Chambers, Ray, 2018. "Small area estimation under a spatially non-linear model," Computational Statistics & Data Analysis, Elsevier, vol. 126(C), pages 19-38.
    11. Gressani, Oswaldo & Lambert, Philippe, 2020. "The Laplace-P-spline methodology for fast approximate Bayesian inference in additive partial linear models," LIDAM Discussion Papers ISBA 2020020, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    12. Takuma Yoshida, 2016. "Asymptotics and smoothing parameter selection for penalized spline regression with various loss functions," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 70(4), pages 278-303, November.
    13. Zhangong Zhou & Linjun Tang, 2019. "Testing for parametric component of partially linear models with missing covariates," Statistical Papers, Springer, vol. 60(3), pages 747-760, June.
    14. F. Y. Kuo & W. T. M. Dunsmuir & I. H. Sloan & M. P. Wand & R. S. Womersley, 2008. "Quasi-Monte Carlo for Highly Structured Generalised Response Models," Methodology and Computing in Applied Probability, Springer, vol. 10(2), pages 239-275, June.
    15. Klein, Nadja & Denuit, Michel & Lang, Stefan & Kneib, Thomas, 2013. "Nonlife Ratemaking and Risk Management with Bayesian Additive Models for Location, Scale and Shape," LIDAM Discussion Papers ISBA 2013045, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    16. Otto-Sobotka, Fabian & Salvati, Nicola & Ranalli, Maria Giovanna & Kneib, Thomas, 2019. "Adaptive semiparametric M-quantile regression," Econometrics and Statistics, Elsevier, vol. 11(C), pages 116-129.
    17. Timothy K.M. Beatty & Erling Røed Larsen, 2005. "Using Engel curves to estimate bias in the Canadian CPI as a cost of living index," Canadian Journal of Economics/Revue canadienne d'économique, John Wiley & Sons, vol. 38(2), pages 482-499, May.
    18. Arthur Charpentier & Emmanuel Flachaire & Antoine Ly, 2017. "Econométrie et Machine Learning," Papers 1708.06992, arXiv.org, revised Mar 2018.
    19. Hyunju Son & Youyi Fong, 2021. "Fast grid search and bootstrap‐based inference for continuous two‐phase polynomial regression models," Environmetrics, John Wiley & Sons, Ltd., vol. 32(3), May.
    20. Bernhard Baumgartner & Daniel Guhl & Thomas Kneib & Winfried J. Steiner, 2018. "Flexible estimation of time-varying effects for frequently purchased retail goods: a modeling approach based on household panel data," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 40(4), pages 837-873, October.
