Printed from https://ideas.repec.org/a/bla/biomet/v78y2022i1p324-336.html

Simultaneous spatial smoothing and outlier detection using penalized regression, with application to childhood obesity surveillance from electronic health records

Authors
  • Young‐Geun Choi
  • Lawrence P. Hanrahan
  • Derek Norton
  • Ying‐Qi Zhao

Abstract

Electronic health records (EHRs) have become a platform for data‐driven granular‐level surveillance in recent years. In this paper, we make use of EHRs for early prevention of childhood obesity. The proposed method simultaneously provides smooth disease mapping and outlier information for obesity prevalence that are useful for raising public awareness and facilitating targeted intervention. More precisely, we consider a penalized multilevel generalized linear model. We decompose regional contribution into smooth and sparse signals, which are automatically identified by a combination of fusion and sparse penalties imposed on the likelihood function. In addition, we weigh the proposed likelihood to account for the missingness and potential nonrepresentativeness arising from the EHR data. We develop a novel alternating minimization algorithm, which is computationally efficient, easy to implement, and guarantees convergence. Simulation studies demonstrate superior performance of the proposed method. Finally, we apply our method to the University of Wisconsin Population Health Information Exchange database.
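The abstract describes the objective only in words, so the following is a minimal sketch of what such a penalized, weighted likelihood could look like, not the authors' implementation. It assumes a logistic model for a binary obesity indicator, absolute-value (L1) forms for both the fusion and the sparse penalty, and a fixed list of neighboring region pairs; the paper may use different penalty functions. All names (`penalized_objective`, `gamma`, `delta`, `edges`, `lam_fuse`, `lam_sparse`) are hypothetical.

```python
import numpy as np

def penalized_objective(y, X, region, weights, beta, gamma, delta,
                        edges, lam_fuse, lam_sparse):
    """Weighted logistic negative log-likelihood plus fusion and sparse penalties.

    Assumed (hypothetical) setup: each region's contribution is gamma[r] + delta[r],
    where gamma is the smooth signal and delta is the sparse (outlier) signal.
    """
    # Linear predictor: covariate effects plus the region's smooth and sparse parts.
    eta = X @ beta + gamma[region] + delta[region]
    # Weighted Bernoulli negative log-likelihood; logaddexp(0, eta) = log(1 + exp(eta)).
    nll = np.sum(weights * (np.logaddexp(0.0, eta) - y * eta))
    # Fusion penalty: shrinks differences in gamma between neighboring regions.
    fuse = sum(abs(gamma[j] - gamma[k]) for j, k in edges)
    # Sparse penalty: shrinks most delta entries to zero, leaving outlying regions.
    sparse = np.sum(np.abs(delta))
    return nll + lam_fuse * fuse + lam_sparse * sparse

# Toy data: four children in three regions arranged in a chain 0 - 1 - 2.
y = np.array([0.0, 1.0, 1.0, 0.0])            # obesity indicator
X = np.array([[0.5], [-1.0], [0.2], [1.5]])   # one covariate
region = np.array([0, 1, 2, 2])               # region index per child
w = np.ones(4)                                # sampling weights (all equal here)
edges = [(0, 1), (1, 2)]                      # neighboring region pairs

value = penalized_objective(y, X, region, w,
                            beta=np.zeros(1),
                            gamma=np.array([0.0, 0.0, 1.0]),
                            delta=np.array([0.0, 0.5, 0.0]),
                            edges=edges, lam_fuse=2.0, lam_sparse=1.0)
print(value)
```

In this sketch, regions with a nonzero entry in `delta` after fitting would be the ones flagged as outliers, while `gamma` gives the smoothed map. An alternating scheme would minimize this objective over one block of parameters at a time while holding the others fixed.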

Suggested Citation

  • Young‐Geun Choi & Lawrence P. Hanrahan & Derek Norton & Ying‐Qi Zhao, 2022. "Simultaneous spatial smoothing and outlier detection using penalized regression, with application to childhood obesity surveillance from electronic health records," Biometrics, The International Biometric Society, vol. 78(1), pages 324-336, March.
  • Handle: RePEc:bla:biomet:v:78:y:2022:i:1:p:324-336
    DOI: 10.1111/biom.13404

    Download full text from publisher

    File URL: https://doi.org/10.1111/biom.13404
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Howard D. Bondell & Brian J. Reich, 2009. "Simultaneous Factor Selection and Collapsing Levels in ANOVA," Biometrics, The International Biometric Society, vol. 65(1), pages 169-177, March.
    2. Ryan J. Parker & Brian J. Reich & Jo Eidsvik, 2016. "A Fused Lasso Approach to Nonstationary Spatial Covariance Estimation," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 21(3), pages 569-587, September.
    3. Hosik Choi & Eunjung Song & Seung-sik Hwang & Woojoo Lee, 2018. "A modified generalized lasso algorithm to detect local spatial clusters for count data," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 102(4), pages 537-563, October.
    4. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    5. Margherita Giuzio, 2017. "Genetic algorithm versus classical methods in sparse index tracking," Decisions in Economics and Finance, Springer;Associazione per la Matematica, vol. 40(1), pages 243-256, November.
    6. Mkhadri, Abdallah & Ouhourane, Mohamed, 2013. "An extended variable inclusion and shrinkage algorithm for correlated variables," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 631-644.
    7. Doyo G Enki & Paul H Garthwaite & C Paddy Farrington & Angela Noufaily & Nick J Andrews & Andre Charlett, 2016. "Comparison of Statistical Algorithms for the Detection of Infectious Disease Outbreaks in Large Multiple Surveillance Systems," PLOS ONE, Public Library of Science, vol. 11(8), pages 1-25, August.
    8. Katie Wilson & Jon Wakefield, 2022. "A probabilistic model for analyzing summary birth history data," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 47(11), pages 291-344.
    9. Yize Zhao & Matthias Chung & Brent A. Johnson & Carlos S. Moreno & Qi Long, 2016. "Hierarchical Feature Selection Incorporating Known and Novel Biological Information: Identifying Genomic Features Related to Prostate Cancer Recurrence," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1427-1439, October.
    10. Eibich, Peter & Ziebarth, Nicolas, 2014. "Examining the Structure of Spatial Health Effects in Germany Using Hierarchical Bayes Models," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 49, pages 305-320.
    11. Shreosi Sanyal & Thierry Rochereau & Cara Nichole Maesano & Laure Com-Ruelle & Isabella Annesi-Maesano, 2018. "Long-Term Effect of Outdoor Air Pollution on Mortality and Morbidity: A 12-Year Follow-Up Study for Metropolitan France," IJERPH, MDPI, vol. 15(11), pages 1-8, November.
    12. Mayer Alvo & Jingrui Mu, 2023. "COVID-19 Data Analysis Using Bayesian Models and Nonparametric Geostatistical Models," Mathematics, MDPI, vol. 11(6), pages 1-13, March.
    13. Francis X. Diebold & Kamil Yilmaz, 2016. "Trans-Atlantic Equity Volatility Connectedness: U.S. and European Financial Institutions, 2004–2014," Journal of Financial Econometrics, Oxford University Press, vol. 14(1), pages 81-127.
    14. Gil, Guilherme Dôco Roberti & Costa, Marcelo Azevedo & Lopes, Ana Lúcia Miranda & Mayrink, Vinícius Diniz, 2017. "Spatial statistical methods applied to the 2015 Brazilian energy distribution benchmarking model: Accounting for unobserved determinants of inefficiencies," Energy Economics, Elsevier, vol. 64(C), pages 373-383.
    15. Jian Guo & Elizaveta Levina & George Michailidis & Ji Zhu, 2010. "Pairwise Variable Selection for High-Dimensional Model-Based Clustering," Biometrics, The International Biometric Society, vol. 66(3), pages 793-804, September.
    16. Umberto Amato & Anestis Antoniadis & Italia De Feis & Irene Gijbels, 2021. "Penalised robust estimators for sparse and high-dimensional linear models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(1), pages 1-48, March.
    17. Franck Rapaport & Christina Leslie, 2010. "Determining Frequent Patterns of Copy Number Alterations in Cancer," PLOS ONE, Public Library of Science, vol. 5(8), pages 1-10, August.
    18. Vanessa Santos-Sánchez & Juan Antonio Córdoba-Doña & Javier García-Pérez & Antonio Escolar-Pujolar & Lucia Pozzi & Rebeca Ramis, 2020. "Cancer Mortality and Deprivation in the Proximity of Polluting Industrial Facilities in an Industrial Region of Spain," IJERPH, MDPI, vol. 17(6), pages 1-15, March.
    19. Berti, Patrizia & Dreassi, Emanuela & Rigo, Pietro, 2014. "Compatibility results for conditional distributions," Journal of Multivariate Analysis, Elsevier, vol. 125(C), pages 190-203.
    20. Louise Choo & Stephen G. Walker, 2008. "A new approach to investigating spatial variations of disease," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 171(2), pages 395-405, April.
