Printed from https://ideas.repec.org/a/plo/pone00/0283798.html

A general algorithm for error-in-variables regression modelling using Monte Carlo expectation maximization

Authors
  • Jakub Stoklosa
  • Wen-Han Hwang
  • David I Warton

Abstract

In regression modelling, measurement error models are often needed to correct for uncertainty arising from measurements of covariates/predictor variables. The literature on measurement error (or errors-in-variables) modelling is plentiful; however, general algorithms and software for maximum likelihood estimation of models with measurement error are not as readily available in a form that can be used by applied researchers without relatively advanced statistical expertise. In this study, we develop a novel algorithm for measurement error modelling, which could in principle take any regression model fitted by maximum likelihood, or penalised likelihood, and extend it to account for uncertainty in covariates. This is achieved by exploiting an interesting property of the Monte Carlo Expectation-Maximization (MCEM) algorithm, namely that it can be expressed as an iteratively reweighted maximisation of complete data likelihoods (formed by imputing the missing values). Thus we can take any regression model for which we have an algorithm for (penalised) likelihood estimation when covariates are error-free, nest it within our proposed iteratively reweighted MCEM algorithm, and thereby account for uncertainty in covariates. The approach is demonstrated on examples involving generalized linear models, point process models, generalized additive models and capture–recapture models. Because the proposed method uses maximum (penalised) likelihood, it inherits advantageous optimality and inferential properties, as illustrated by simulation. We also study the robustness of the model to some violations of the distributional assumptions on the predictors. Software is provided as the refitME package in R, whose key function behaves like a refit() function, taking a fitted regression model object and re-fitting it with a pre-specified amount of measurement error.
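
The reweighting idea in the abstract can be illustrated with a minimal Python sketch for the simplest possible case. It assumes a normal linear model y = b0 + b1 x + e with a single covariate observed with classical additive error w = x + u, u ~ N(0, sigma_u2), sigma_u2 known, and a normal true covariate whose mean and variance are held at moment estimates. The function name mcem_linear_me and all its parameters are hypothetical; this is not the refitME implementation. The Monte Carlo imputations are drawn once from p(x | w), and each iteration only recomputes their weights (E-step) and refits an ordinary error-free weighted regression to the stacked complete data (M-step).

```python
import numpy as np


def mcem_linear_me(y, w, sigma_u2, B=100, max_iter=500, tol=1e-8, seed=0):
    """Hypothetical sketch of an iteratively reweighted MCEM fit.

    Assumed model: y = b0 + b1 * x + e, e ~ N(0, s2); observed w = x + u,
    u ~ N(0, sigma_u2) with sigma_u2 known; true covariate x ~ Normal.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    # Moment estimates for the normal distribution of the true covariate x,
    # held fixed throughout (a simplification made for this sketch).
    mu_x = w.mean()
    s2_x = w.var() - sigma_u2
    lam = s2_x / (s2_x + sigma_u2)  # reliability ratio
    # Impute B values of x per observation from p(x | w), once. The draws stay
    # fixed; only their weights change from one iteration to the next.
    cond_mean = mu_x + lam * (w - mu_x)
    cond_sd = np.sqrt(lam * sigma_u2)
    x_imp = cond_mean[:, None] + cond_sd * rng.standard_normal((n, B))
    y_rep = np.repeat(y[:, None], B, axis=1)

    # Start from the naive fit that ignores measurement error.
    b1, b0 = np.polyfit(w, y, 1)
    s2 = np.mean((y - b0 - b1 * w) ** 2)
    for _ in range(max_iter):
        # E-step: self-normalised importance weights proportional to f(y | x; theta),
        # which targets p(x | y, w) because the draws come from p(x | w).
        logf = -0.5 * (y_rep - b0 - b1 * x_imp) ** 2 / s2
        logf -= logf.max(axis=1, keepdims=True)  # numerical stability
        wt = np.exp(logf)
        wt /= wt.sum(axis=1, keepdims=True)  # each row now sums to 1
        # M-step: weighted least squares on the stacked complete-data set
        # (n * B rows), i.e. an ordinary error-free fit with case weights.
        xbar = (wt * x_imp).sum() / n
        ybar = (wt * y_rep).sum() / n
        new_b1 = (wt * (x_imp - xbar) * (y_rep - ybar)).sum() / (
            wt * (x_imp - xbar) ** 2
        ).sum()
        new_b0 = ybar - new_b1 * xbar
        new_s2 = (wt * (y_rep - new_b0 - new_b1 * x_imp) ** 2).sum() / n
        change = abs(new_b0 - b0) + abs(new_b1 - b1) + abs(new_s2 - s2)
        b0, b1, s2 = new_b0, new_b1, new_s2
        if change < tol:
            break
    return b0, b1, s2


# Demo on simulated data: true b0 = 1, b1 = 2, s2 = 1, error variance 0.5.
rng = np.random.default_rng(1)
n = 1000
sigma_u2 = 0.5
x = rng.normal(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)
w = x + rng.normal(0.0, np.sqrt(sigma_u2), n)

naive_slope = np.polyfit(w, y, 1)[0]  # attenuated towards zero
b0_hat, b1_hat, s2_hat = mcem_linear_me(y, w, sigma_u2)
print(f"naive slope {naive_slope:.3f}, MCEM slope {b1_hat:.3f}, intercept {b0_hat:.3f}")
```

Because each M-step is just a weighted fit of the error-free model, the weighted least squares step could in principle be replaced by any fitter that accepts case weights, which is the interchangeability the abstract describes.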

Suggested Citation

  • Jakub Stoklosa & Wen-Han Hwang & David I Warton, 2023. "A general algorithm for error-in-variables regression modelling using Monte Carlo expectation maximization," PLOS ONE, Public Library of Science, vol. 18(4), pages 1-21, April.
  • Handle: RePEc:plo:pone00:0283798
    DOI: 10.1371/journal.pone.0283798

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0283798
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0283798&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Leandro, Camila & Jay-Robert, Pierre & Mériguet, Bruno & Houard, Xavier & Renner, Ian W., 2020. "Is my SDM good enough? Insights from a citizen science dataset in a point process modeling framework," Ecological Modelling, Elsevier, vol. 438(C).
    2. Jeffrey Daniel & Julie Horrocks & Gary J. Umphrey, 2020. "Efficient Modelling of Presence-Only Species Data via Local Background Sampling," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 25(1), pages 90-111, March.
    3. Gressani, Oswaldo & Lambert, Philippe, 2020. "The Laplace-P-spline methodology for fast approximate Bayesian inference in additive partial linear models," LIDAM Discussion Papers ISBA 2020020, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    4. Christophe Botella & Alexis Joly & Pascal Monestiez & Pierre Bonnet & François Munoz, 2020. "Bias in presence-only niche models related to sampling effort and species niches: Lessons for background point selection," PLOS ONE, Public Library of Science, vol. 15(5), pages 1-18, May.
    5. Hemant Kulkarni & Jayabrata Biswas & Kiranmoy Das, 2019. "A joint quantile regression model for multiple longitudinal outcomes," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 103(4), pages 453-473, December.
    6. Dong, C. & Li, S., 2021. "Specification Lasso and an Application in Financial Markets," Cambridge Working Papers in Economics 2139, Faculty of Economics, University of Cambridge.
    7. Aeryn Ng & Sarah E. Gergel & Maya Fromstein & Terry Sunderland & Hisham Zerriffi & Jedidah Nankaya, 2025. "Moving beyond forest cover: Linking forest density, age, and fragmentation to diet," Food Security: The Science, Sociology and Economics of Food Production and Access to Food, Springer;The International Society for Plant Pathology, vol. 17(3), pages 625-640, June.
    8. Timo Dimitriadis & Xiaochun Liu & Julie Schnaitmann, 2023. "Encompassing Tests for Value at Risk and Expected Shortfall Multistep Forecasts Based on Inference on the Boundary," Journal of Financial Econometrics, Oxford University Press, vol. 21(2), pages 412-444.
    9. Jiang, Xianfeng & Packer, Frank, 2019. "Credit ratings of Chinese firms by domestic and global agencies: Assessing the determinants and impact," Journal of Banking & Finance, Elsevier, vol. 105(C), pages 178-193.
    10. Amanda M E D’Andrea & Vera L D Tomazella & Hassan M Aljohani & Pedro L Ramos & Marco P Almeida & Francisco Louzada & Bruna A W Verssani & Amanda B Gazon & Ahmed Z Afify, 2021. "Objective Bayesian analysis for multiple repairable systems," PLOS ONE, Public Library of Science, vol. 16(11), pages 1-19, November.
    11. Chuan-hua Wei & Chunling Liu, 2012. "Statistical inference on semi-parametric partial linear additive models," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 24(4), pages 809-823, December.
    12. Brent A. Coull & Alan Agresti, 2000. "Random Effects Modeling of Multiple Binomial Responses Using the Multivariate Binomial Logit-Normal Distribution," Biometrics, The International Biometric Society, vol. 56(1), pages 73-80, March.
    13. Ball, Laurence & Carvalho, Carlos & Evans, Christopher & Antonio Ricci, Luca, 2024. "Weighted Median Inflation Around the World: A Measure of Core Inflation," Journal of International Money and Finance, Elsevier, vol. 142(C).
    14. J. E. Mills & C. A. Field & D. J. Dupuis, 2002. "Marginally Specified Generalized Linear Mixed Models: A Robust Approach," Biometrics, The International Biometric Society, vol. 58(4), pages 727-734, December.
    15. Sviták, Jan & Tichem, Jan & Haasbeek, Stefan, 2021. "Price effects of search advertising restrictions," International Journal of Industrial Organization, Elsevier, vol. 77(C).
    16. Li, Xinyi & Wang, Li & Nettleton, Dan, 2019. "Sparse model identification and learning for ultra-high-dimensional additive partially linear models," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 204-228.
    17. Christopher F. Parmeter, 2018. "Estimation of the two-tiered stochastic frontier model with the scaling property," Journal of Productivity Analysis, Springer, vol. 49(1), pages 37-47, February.
    18. Mario Hasler, 2013. "Multiple Contrasts for Repeated Measures," The International Journal of Biostatistics, De Gruyter, vol. 9(1), pages 49-61, July.
    19. Abdollah Jalilian, 2017. "Modelling and classification of species abundance: a case study in the Barro Colorado Island plot," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(13), pages 2401-2409, October.
    20. Jan Pablo Burgard & Patricia Dörr & Ralf Münnich, 2020. "Monte-Carlo Simulation Studies in Survey Statistics – An Appraisal," Research Papers in Economics 2020-04, University of Trier, Department of Economics.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.