
Variable selection for high‐dimensional generalized linear model with block‐missing data

Authors
  • Yifan He
  • Yang Feng
  • Xinyuan Song

Abstract

In modern scientific research, multiblock missing data arise when information is synthesized across multiple studies. However, existing imputation methods for block‐wise missing data either handle only the single‐block missing pattern or rely heavily on the model structure. In this study, we propose a single regression‐based imputation algorithm for multiblock missing data. First, we estimate a sparse precision matrix based on the structure of the block‐wise missing data. Second, we impute the missing blocks with their means conditional on the observed blocks. Theoretical results on variable selection and estimation consistency are established in the context of a generalized linear model. Moreover, simulation studies show that, compared with existing methods, the proposed imputation procedure is robust to various missing‐data mechanisms because of the good properties of regression imputation. An application to Alzheimer's Disease Neuroimaging Initiative data also confirms the superiority of our proposed method.
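
The second step of the abstract, imputing missing blocks with their means conditional on the observed blocks, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a Gaussian working model in which the mean vector `mu` and precision matrix `omega` are already available (in the paper the precision matrix is estimated sparsely from the block‐missing data, which is not reproduced here). The function name `conditional_mean_impute` is hypothetical. Under these assumptions the conditional mean of the missing block is mu_m - Omega_mm^{-1} Omega_mo (x_o - mu_o).

```python
import numpy as np

def conditional_mean_impute(x, mu, omega):
    """Fill NaN entries of x with their conditional mean given the observed entries.

    Assumes a Gaussian working model with mean `mu` and precision matrix `omega`
    (both taken as given here; estimating `omega` is outside this sketch).
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    omega = np.asarray(omega, dtype=float)

    miss = np.isnan(x)          # mask of the missing block
    if not miss.any():
        return x.copy()
    obs = ~miss                 # mask of the observed block

    omega_mm = omega[np.ix_(miss, miss)]   # missing-by-missing sub-block
    omega_mo = omega[np.ix_(miss, obs)]    # missing-by-observed sub-block

    # E[x_m | x_o] = mu_m - Omega_mm^{-1} Omega_mo (x_o - mu_o)
    rhs = omega_mo @ (x[obs] - mu[obs])
    filled = x.copy()
    filled[miss] = mu[miss] - np.linalg.solve(omega_mm, rhs)
    return filled

# Toy example: two standardized variables with correlation 0.5.
sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
omega = np.linalg.inv(sigma)
row = np.array([2.0, np.nan])
imputed = conditional_mean_impute(row, np.zeros(2), omega)
```

In the toy example the second variable is imputed as 0.5 × 2 = 1, matching the usual regression of one standardized variable on the other. Working with the precision matrix directly means only the missing‐by‐missing sub‐block has to be solved against, which is one reason a sparse precision estimate is convenient for this kind of imputation.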

Suggested Citation

  • Yifan He & Yang Feng & Xinyuan Song, 2023. "Variable selection for high‐dimensional generalized linear model with block‐missing data," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 50(3), pages 1279-1297, September.
  • Handle: RePEc:bla:scjsta:v:50:y:2023:i:3:p:1279-1297
    DOI: 10.1111/sjos.12632

    Download full text from publisher

    File URL: https://doi.org/10.1111/sjos.12632
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Aaron J Molstad & Adam J Rothman, 2018. "Shrinking characteristics of precision matrix estimators," Biometrika, Biometrika Trust, vol. 105(3), pages 563-574.
    2. Benjamin Poignard & Manabu Asai, 2023. "Estimation of high-dimensional vector autoregression via sparse precision matrix," The Econometrics Journal, Royal Economic Society, vol. 26(2), pages 307-326.
    3. Huangdi Yi & Qingzhao Zhang & Cunjie Lin & Shuangge Ma, 2022. "Information‐incorporated Gaussian graphical model for gene expression data," Biometrics, The International Biometric Society, vol. 78(2), pages 512-523, June.
    4. Katayama, Shota & Imori, Shinpei, 2014. "Lasso penalized model selection criteria for high-dimensional multivariate linear regression analysis," Journal of Multivariate Analysis, Elsevier, vol. 132(C), pages 138-150.
    5. Ziqi Chen & Chenlei Leng, 2016. "Dynamic Covariance Models," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(515), pages 1196-1207, July.
    6. Souvik Seal & Qunhua Li & Elle Butler Basner & Laura M Saba & Katerina Kechris, 2023. "RCFGL: Rapid Condition adaptive Fused Graphical Lasso and application to modeling brain region co-expression networks," PLOS Computational Biology, Public Library of Science, vol. 19(1), pages 1-26, January.
    7. Kang, Xiaoning & Kang, Lulu & Chen, Wei & Deng, Xinwei, 2022. "A generative approach to modeling data with quantitative and qualitative responses," Journal of Multivariate Analysis, Elsevier, vol. 190(C).
    8. S Klaassen & J Kueck & M Spindler & V Chernozhukov, 2023. "Uniform inference in high-dimensional Gaussian graphical models," Biometrika, Biometrika Trust, vol. 110(1), pages 51-68.
    9. Bailey, Natalia & Pesaran, M. Hashem & Smith, L. Vanessa, 2019. "A multiple testing approach to the regularisation of large sample correlation matrices," Journal of Econometrics, Elsevier, vol. 208(2), pages 507-534.
    10. Zachary D Kurtz & Christian L Müller & Emily R Miraldi & Dan R Littman & Martin J Blaser & Richard A Bonneau, 2015. "Sparse and Compositionally Robust Inference of Microbial Ecological Networks," PLOS Computational Biology, Public Library of Science, vol. 11(5), pages 1-25, May.
    11. Lafit, Ginette & Nogales Martín, Francisco Javier & Zamar, Rubén, 2015. "Ranking Edges and Model Selection in High-Dimensional Graphs," DES - Working Papers. Statistics and Econometrics. WS ws1511, Universidad Carlos III de Madrid. Departamento de Estadística.
    12. Lam, Clifford, 2020. "High-dimensional covariance matrix estimation," LSE Research Online Documents on Economics 101667, London School of Economics and Political Science, LSE Library.
    13. Khai X. Chiong & Hyungsik Roger Moon, 2017. "Estimation of Graphical Models using the $L_{1,2}$ Norm," Papers 1709.10038, arXiv.org, revised Oct 2017.
    14. Wang, Luheng & Chen, Zhao & Wang, Christina Dan & Li, Runze, 2020. "Ultrahigh dimensional precision matrix estimation via refitted cross validation," Journal of Econometrics, Elsevier, vol. 215(1), pages 118-130.
    15. Gautam Sabnis & Debdeep Pati & Anirban Bhattacharya, 2019. "Compressed Covariance Estimation with Automated Dimension Learning," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 81(2), pages 466-481, December.
    16. Liu, Weidong & Luo, Xi, 2015. "Fast and adaptive sparse precision matrix estimation in high dimensions," Journal of Multivariate Analysis, Elsevier, vol. 135(C), pages 153-162.
    17. Maboudou-Tchao, Edgard M. & Agboto, Vincent, 2013. "Monitoring the covariance matrix with fewer observations than variables," Computational Statistics & Data Analysis, Elsevier, vol. 64(C), pages 99-112.
    18. Pircalabelu, Eugen & Claeskens, Gerda, 2021. "Linear manifold modeling and graph estimation based on multivariate functional data with different coarseness scales," LIDAM Discussion Papers ISBA 2021032, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    19. Vahe Avagyan & Andrés M. Alonso & Francisco J. Nogales, 2018. "D-trace estimation of a precision matrix using adaptive Lasso penalties," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(2), pages 425-447, June.
    20. Wang, Ke & Franks, Alexander & Oh, Sang-Yun, 2023. "Learning Gaussian graphical models with latent confounders," Journal of Multivariate Analysis, Elsevier, vol. 198(C).
