
A general algorithm for error-in-variables regression modelling using Monte Carlo expectation maximization

Authors

  • Jakub Stoklosa
  • Wen-Han Hwang
  • David I Warton

Abstract

In regression modelling, measurement error models are often needed to correct for uncertainty arising from measurements of covariates (predictor variables). The literature on measurement error (or errors-in-variables) modelling is plentiful; however, general algorithms and software for maximum likelihood estimation of models with measurement error are less readily available in a form that applied researchers without advanced statistical expertise can use. In this study, we develop a novel algorithm for measurement error modelling that could, in principle, take any regression model fitted by maximum likelihood or penalised likelihood and extend it to account for uncertainty in covariates. This is achieved by exploiting an interesting property of the Monte Carlo Expectation-Maximization (MCEM) algorithm: it can be expressed as an iteratively reweighted maximisation of complete-data likelihoods, formed by imputing the missing (true) covariate values. Thus, any regression model for which an algorithm for (penalised) likelihood estimation with error-free covariates exists can be nested within our proposed iteratively reweighted MCEM algorithm to account for uncertainty in covariates. The approach is demonstrated on examples involving generalized linear models, point process models, generalized additive models and capture–recapture models. Because the proposed method uses maximum (penalised) likelihood, it inherits advantageous optimality and inferential properties, as illustrated by simulation. We also study the robustness of the model to some violations of the distributional assumptions on the predictors. Software is provided as the refitME package in R, whose key function behaves like a refit() function: it takes a fitted regression model object and re-fits it with a pre-specified amount of measurement error.
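The "iteratively reweighted maximisation of complete-data likelihoods" idea can be illustrated with a small Python sketch. It is not the refitME implementation. It assumes the simplest case: a straight-line regression of y on a true covariate x, classical additive normal measurement error w = x + u with known error standard deviation sigma_u, and a normal model for x whose mean and variance are estimated by moments from w. For each observation, B values of x are drawn once from p(x | w); each iteration then reweights these imputed values by f(y_i | x_ib; θ), normalised over b (E-step), and refits the error-free model by weighted least squares on the stacked complete data (M-step). The names `weighted_ls` and `mcem_refit_linear` are hypothetical.

```python
import numpy as np


def weighted_ls(X, y, wts):
    """Error-free 'base model' fitter (hypothetical): weighted least squares.

    Stands in for any (penalised) likelihood fitter that accepts prior weights.
    """
    sw = np.sqrt(wts)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef


def mcem_refit_linear(y, w, sigma_u, n_draws=100, n_iter=100, tol=1e-8, seed=0):
    """Hypothetical MCEM sketch for y = b0 + b1*x + e when only w = x + u is observed."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Assumed normal model for the true covariate, fitted by moments from w.
    mu_w, var_w = w.mean(), w.var()
    var_x = var_w - sigma_u**2
    if var_x <= 0:
        raise ValueError("sigma_u**2 must be smaller than var(w)")
    lam = var_x / var_w                    # reliability ratio
    cond_mean = mu_w + lam * (w - mu_w)    # E[x | w]
    cond_sd = np.sqrt(lam) * sigma_u       # sd(x | w)
    # Monte Carlo imputation: B draws per observation, held fixed across iterations.
    x_imp = cond_mean[:, None] + cond_sd * rng.standard_normal((n, n_draws))
    y_stack = np.repeat(y, n_draws)        # y_i repeated once per draw, row-major order
    X_stack = np.column_stack([np.ones(n * n_draws), x_imp.ravel()])
    # Start from the naive fit that ignores measurement error.
    X_naive = np.column_stack([np.ones(n), w])
    coef = weighted_ls(X_naive, y, np.ones(n))
    sigma2 = np.mean((y - X_naive @ coef) ** 2)
    for _ in range(n_iter):
        # E-step: weight each draw by f(y_i | x_ib; current theta), normalised over b.
        resid = y[:, None] - coef[0] - coef[1] * x_imp
        log_f = -0.5 * resid**2 / sigma2
        log_f -= log_f.max(axis=1, keepdims=True)  # numerical stability
        wts = np.exp(log_f)
        wts /= wts.sum(axis=1, keepdims=True)
        # M-step: refit the error-free model to the weighted complete data.
        new_coef = weighted_ls(X_stack, y_stack, wts.ravel())
        sigma2 = np.sum(wts.ravel() * (y_stack - X_stack @ new_coef) ** 2) / n
        done = np.max(np.abs(new_coef - coef)) < tol
        coef = new_coef
        if done:
            break
    return coef, sigma2


# Simulated check: true slope 2, reliability 1 / (1 + 0.5**2) = 0.8.
rng = np.random.default_rng(1)
n, sigma_u = 2000, 0.5
x = rng.normal(0.0, 1.0, n)
w = x + rng.normal(0.0, sigma_u, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)
naive = np.polyfit(w, y, 1)  # [slope, intercept]; slope expected near 2 * 0.8 = 1.6
coef, sigma2 = mcem_refit_linear(y, w, sigma_u)
print("naive slope:", naive[0], "MCEM slope:", coef[1])
```

Because the imputed draws are held fixed and only the weights change between iterations, each M-step is a single weighted call to the error-free fitter. Replacing `weighted_ls` with another weighted likelihood fitter (and using that model's likelihood for the E-step weights) corresponds to nesting a different base model in the loop. The moment-based normal model for x and the number of draws are simplifying assumptions made for this illustration.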

Suggested Citation

  • Jakub Stoklosa & Wen-Han Hwang & David I Warton, 2023. "A general algorithm for error-in-variables regression modelling using Monte Carlo expectation maximization," PLOS ONE, Public Library of Science, vol. 18(4), pages 1-21, April.
  • Handle: RePEc:plo:pone00:0283798
    DOI: 10.1371/journal.pone.0283798

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0283798
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0283798&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0283798?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Leandro, Camila & Jay-Robert, Pierre & Mériguet, Bruno & Houard, Xavier & Renner, Ian W., 2020. "Is my SDM good enough? Insights from a citizen science dataset in a point process modeling framework," Ecological Modelling, Elsevier, vol. 438(C).
    2. Christophe Botella & Alexis Joly & Pascal Monestiez & Pierre Bonnet & François Munoz, 2020. "Bias in presence-only niche models related to sampling effort and species niches: Lessons for background point selection," PLOS ONE, Public Library of Science, vol. 15(5), pages 1-18, May.
    3. Jeffrey Daniel & Julie Horrocks & Gary J. Umphrey, 2020. "Efficient Modelling of Presence-Only Species Data via Local Background Sampling," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 25(1), pages 90-111, March.
    4. Gressani, Oswaldo & Lambert, Philippe, 2020. "The Laplace-P-spline methodology for fast approximate Bayesian inference in additive partial linear models," LIDAM Discussion Papers ISBA 2020020, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    5. Shu Yang & Jae Kwang Kim, 2016. "Likelihood-based Inference with Missing Data Under Missing-at-Random," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 43(2), pages 436-454, June.
    6. Wiltshire, Kathryn H & Tanner, Jason E, 2020. "Comparing maximum entropy modelling methods to inform aquaculture site selection for novel seaweed species," Ecological Modelling, Elsevier, vol. 429(C).
    7. Giuseppe Espa & Giuseppe Arbia & Diego Giuliani, 2013. "Conditional versus unconditional industrial agglomeration: disentangling spatial dependence and spatial heterogeneity in the analysis of ICT firms’ distribution in Milan," Journal of Geographical Systems, Springer, vol. 15(1), pages 31-50, January.
    8. Hemant Kulkarni & Jayabrata Biswas & Kiranmoy Das, 2019. "A joint quantile regression model for multiple longitudinal outcomes," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 103(4), pages 453-473, December.
    9. Tatiyana V. Apanasovich & David Ruppert & Joanne R. Lupton & Natasa Popovic & Nancy D. Turner & Robert S. Chapkin & Raymond J. Carroll, 2008. "Aberrant Crypt Foci and Semiparametric Modeling of Correlated Binary Data," Biometrics, The International Biometric Society, vol. 64(2), pages 490-500, June.
    10. Matteo Barigozzi & Matteo Luciani, 2019. "Quasi Maximum Likelihood Estimation and Inference of Large Approximate Dynamic Factor Models via the EM algorithm," Papers 1910.03821, arXiv.org, revised Sep 2024.
    11. Dong, C. & Li, S., 2021. "Specification Lasso and an Application in Financial Markets," Cambridge Working Papers in Economics 2139, Faculty of Economics, University of Cambridge.
    12. Timo Dimitriadis & Xiaochun Liu & Julie Schnaitmann, 2023. "Encompassing Tests for Value at Risk and Expected Shortfall Multistep Forecasts Based on Inference on the Boundary," Journal of Financial Econometrics, Oxford University Press, vol. 21(2), pages 412-444.
    13. Jiang, Xianfeng & Packer, Frank, 2019. "Credit ratings of Chinese firms by domestic and global agencies: Assessing the determinants and impact," Journal of Banking & Finance, Elsevier, vol. 105(C), pages 178-193.
    14. Ricardo Smith Ramírez, 2007. "FIML estimation of treatment effect models with endogenous selection and multiple censored responses via a Monte Carlo EM Algorithm," Working Papers DTE 403, CIDE, División de Economía.
    15. Amanda M E D’Andrea & Vera L D Tomazella & Hassan M Aljohani & Pedro L Ramos & Marco P Almeida & Francisco Louzada & Bruna A W Verssani & Amanda B Gazon & Ahmed Z Afify, 2021. "Objective Bayesian analysis for multiple repairable systems," PLOS ONE, Public Library of Science, vol. 16(11), pages 1-19, November.
    16. Chuan-hua Wei & Chunling Liu, 2012. "Statistical inference on semi-parametric partial linear additive models," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 24(4), pages 809-823, December.
    17. Brent A. Coull & Alan Agresti, 2000. "Random Effects Modeling of Multiple Binomial Responses Using the Multivariate Binomial Logit-Normal Distribution," Biometrics, The International Biometric Society, vol. 56(1), pages 73-80, March.
    18. Ball, Laurence & Carvalho, Carlos & Evans, Christopher & Antonio Ricci, Luca, 2024. "Weighted Median Inflation Around the World: A Measure of Core Inflation," Journal of International Money and Finance, Elsevier, vol. 142(C).
    19. Stefan Seifert & Christoph Kahle & Silke Hüttel, 2021. "Price Dispersion in Farmland Markets: What Is the Role of Asymmetric Information?," American Journal of Agricultural Economics, John Wiley & Sons, vol. 103(4), pages 1545-1568, August.
    20. Chen, Songxi, 2012. "Estimation in semiparametric models with missing data," MPRA Paper 46216, University Library of Munich, Germany.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pone00:0283798. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. Registering allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help by using the form provided on IDEAS.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosone. General contact details of the provider: https://journals.plos.org/plosone/

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.