
Multiple imputation and functional methods in the presence of measurement error and missingness in explanatory variables

Author

Listed:
  • Firouzeh Noghrehchi

    (The University of New South Wales)

  • Jakub Stoklosa

    (The University of New South Wales)

  • Spiridon Penev

    (The University of New South Wales)

Abstract

In many applications involving regression analysis, explanatory variables (or covariates) may be imprecisely measured or may contain missing values. Although a vast literature exists on measurement error modeling to account for errors-in-variables, and on missing data methodology to handle missingness, very few methods have been developed to address both simultaneously. In this paper, we consider likelihood-based multiple imputation to handle missing data, and combine this with two well-known functional measurement error methods: simulation-extrapolation and corrected score. This unified approach has several appealing characteristics: the model fitting procedure is easy to understand and off-the-shelf software can be incorporated into the modeling framework; neither calibration data nor a validation subset is required in the model fitting procedure; and the missing data component of the proposed approach is likelihood-based, which allows the use of standard likelihood machinery. We demonstrate our methods on simulated datasets and apply them to daily ozone pollution measurements in Los Angeles, where the observed covariates contain both missing values and imprecise measurements. We conclude that the proposed methods substantially reduce bias and mean squared error in regression coefficients, in comparison to methods that ignore either measurement error or missingness in covariates.
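
As a rough illustration of the simulation-extrapolation (SIMEX) idea named in the abstract, the following Python sketch applies it to a simple linear regression with one error-prone covariate. It is not the authors' implementation and it does not cover the missing-data or multiple-imputation part of their approach. It assumes classical additive measurement error with a known standard deviation `sigma_u`, fully observed data, and a quadratic extrapolant through three values of the added-noise multiplier. All function names (`ols_slope`, `lagrange_extrapolate`, `simex_slope`, `demo`) and the simulation settings are hypothetical and chosen only for illustration.

```python
import math
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def lagrange_extrapolate(xs, ys, x0):
    """Evaluate at x0 the polynomial passing through the points (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x0 - xj) / (xi - xj)
        total += yi * weight
    return total


def simex_slope(w, y, sigma_u, lambdas=(0.0, 1.0, 2.0), n_sim=30, rng=None):
    """SIMEX estimate of the slope when w = x + u and sd(u) = sigma_u is known.

    Simulation step: for each lambda, add extra noise with variance
    lambda * sigma_u**2 to w and average the naive slope over n_sim draws.
    Extrapolation step: pass a polynomial through the (lambda, mean slope)
    points and evaluate it at lambda = -1, the notional error-free case.
    """
    rng = rng or random.Random(0)
    mean_slopes = []
    for lam in lambdas:
        extra_sd = math.sqrt(lam) * sigma_u
        slopes = []
        for _ in range(n_sim):
            w_lam = [wi + extra_sd * rng.gauss(0.0, 1.0) for wi in w]
            slopes.append(ols_slope(w_lam, y))
        mean_slopes.append(sum(slopes) / n_sim)
    return lagrange_extrapolate(lambdas, mean_slopes, -1.0)


def demo(seed=1, n=1000, beta0=1.0, beta1=2.0, sigma_u=0.5, sigma_e=0.5):
    """Simulate error-prone data and return (naive slope, SIMEX slope)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]             # true covariate
    w = [xi + rng.gauss(0.0, sigma_u) for xi in x]          # observed with error
    y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma_e) for xi in x]
    naive = ols_slope(w, y)                                 # ignores the error
    simex = simex_slope(w, y, sigma_u, rng=rng)
    return naive, simex


if __name__ == "__main__":
    naive, simex = demo()
    print("naive slope:", naive)
    print("SIMEX slope:", simex)
```

Under these assumptions the naive slope is attenuated toward zero, while the extrapolated value should lie closer to the true slope, although a quadratic extrapolant typically removes only part of the bias. In a combined procedure of the kind the abstract describes, a step like this would presumably be applied within each imputed dataset before pooling, but that wiring is not shown here.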

Suggested Citation

  • Firouzeh Noghrehchi & Jakub Stoklosa & Spiridon Penev, 2020. "Multiple imputation and functional methods in the presence of measurement error and missingness in explanatory variables," Computational Statistics, Springer, vol. 35(3), pages 1291-1317, September.
  • Handle: RePEc:spr:compst:v:35:y:2020:i:3:d:10.1007_s00180-020-00976-2
    DOI: 10.1007/s00180-020-00976-2

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00180-020-00976-2
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.




