Printed from https://ideas.repec.org/a/gam/jmathe/v13y2025i8p1293-d1635229.html

Variable Selection for High-Dimensional Longitudinal Data via Within-Cluster Resampling

Authors
  • Yue Ma

    (Department of Statistics and Data Science, Southern University of Science and Technology, Shenzhen 518055, China)

  • Xuejun Jiang

    (Department of Statistics and Data Science, Southern University of Science and Technology, Shenzhen 518055, China)

Abstract

The phenomenon of informative cluster size (ICS) emerges when the number of repeated measurements is correlated with the outcome variable. In such scenarios, the prevailing generalized estimating equation (GEE) method often yields biased estimates due to nonignorable cluster size. This study proposes an integrated methodology that explicitly accounts for ICS and provides a robust solution to mitigate its effects. Our approach combines within-cluster resampling (WCR) with a penalized likelihood framework, ensuring consistent model selection and parameter estimation across resampled datasets. Additionally, we introduce a penalized mean regression method to aggregate the estimators from multiple resampled datasets, producing a final estimator that improves the true positive discovery rate while controlling false positives. The proposed penalized likelihood method via WCR (PL_WCR) is evaluated through extensive simulations and an application to yeast cell-cycle gene expression data. The results demonstrate its robustness and superior performance in high-dimensional longitudinal data analysis with ICS.
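The resampling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: a toy data-generating process (the hypothetical `simulate_ics`) in which clusters with a positive random effect receive five observations and the rest receive one, so cluster size is informative; an ordinary lasso fitted by coordinate descent standing in for the paper's penalized likelihood; and plain averaging plus a selection-frequency rule standing in for the paper's penalized mean regression aggregation. Because each resample keeps exactly one observation per cluster, every cluster carries equal weight, which removes the intercept bias that a pooled fit inherits from the large clusters.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def lasso_cd(X, y, lam, n_iter=200, tol=1e-8):
    """Lasso with an unpenalized intercept via cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - a - X b||^2 + lam * ||b||_1.
    """
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centering profiles out the intercept
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    resid = yc.copy()
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # Correlation of column j with the partial residual (j's own fit added back).
            rho = Xc[:, j] @ resid / n + col_sq[j] * old
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            delta = beta[j] - old
            if delta != 0.0:
                resid -= Xc[:, j] * delta
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return y_mean - x_mean @ beta, beta


def simulate_ics(n_clusters, p, beta, rng):
    """Hypothetical ICS design: clusters with a positive random effect get 5 rows, others 1."""
    clusters = []
    for _ in range(n_clusters):
        b = rng.normal()
        size = 5 if b > 0 else 1
        X = rng.normal(size=(size, p))
        y = b + X @ beta + rng.normal(size=size)
        clusters.append((X, y))
    return clusters


def wcr_lasso(clusters, lam, n_resamples, rng):
    """Within-cluster resampling: keep one random row per cluster, fit, repeat."""
    intercepts, betas = [], []
    for _ in range(n_resamples):
        picks = [rng.integers(len(yi)) for _, yi in clusters]
        X = np.vstack([Xi[k] for (Xi, _), k in zip(clusters, picks)])
        y = np.array([yi[k] for (_, yi), k in zip(clusters, picks)])
        a, b = lasso_cd(X, y, lam)
        intercepts.append(a)
        betas.append(b)
    betas = np.array(betas)
    # Simplified aggregation (an assumption, not the paper's penalized mean
    # regression): average the fits and record per-variable selection frequency.
    return np.mean(intercepts), betas.mean(axis=0), (betas != 0).mean(axis=0)


rng = np.random.default_rng(0)
p = 20
beta_true = np.zeros(p)
beta_true[:2] = [3.0, -2.0]
clusters = simulate_ics(500, p, beta_true, rng)

# Naive pooled fit: every row counts equally, so large clusters dominate.
X_all = np.vstack([X for X, _ in clusters])
y_all = np.concatenate([y for _, y in clusters])
pooled_intercept, pooled_beta = lasso_cd(X_all, y_all, lam=0.1)

wcr_intercept, wcr_beta, sel_freq = wcr_lasso(clusters, lam=0.1, n_resamples=50, rng=rng)
print(f"pooled intercept {pooled_intercept:.2f}, WCR intercept {wcr_intercept:.2f}")
print("selected in >= 50% of resamples:", np.flatnonzero(sel_freq >= 0.5))
```

Under this toy design the true marginal intercept is 0, whereas pooling weights each cluster by its size and pulls the intercept toward E[n_i b_i]/E[n_i] ≈ 0.53. The penalty level `lam=0.1`, the 50 resamples, and the 50% selection threshold are illustrative choices only.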

Suggested Citation

  • Yue Ma & Xuejun Jiang, 2025. "Variable Selection for High-Dimensional Longitudinal Data via Within-Cluster Resampling," Mathematics, MDPI, vol. 13(8), pages 1-24, April.
  • Handle: RePEc:gam:jmathe:v:13:y:2025:i:8:p:1293-:d:1635229

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/13/8/1293/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/13/8/1293/
    Download Restriction: no

    References listed on IDEAS

    1. Manfred Koranda & Alexander Schleiffer & Lukas Endler & Gustav Ammerer, 2000. "Forkhead-like transcription factors recruit Ndd1 to the chromatin of G2/M-specific promoters," Nature, Nature, vol. 406(6791), pages 94-98, July.
    2. Jiahua Chen & Zehua Chen, 2008. "Extended Bayesian information criteria for model selection with large model spaces," Biometrika, Biometrika Trust, vol. 95(3), pages 759-771.
    3. Zhao, Meng & Kulasekera, K.B., 2006. "Consistent linear model selection," Statistics & Probability Letters, Elsevier, vol. 76(5), pages 520-530, March.
    4. Shi, Chengchun & Song, Rui & Chen, Zhao & Li, Runze, 2019. "Linear hypothesis testing for high dimensional generalized linear models," LSE Research Online Documents on Economics 102108, London School of Economics and Political Science, LSE Library.
    5. Lan Wang & Jianhui Zhou & Annie Qu, 2012. "Penalized Generalized Estimating Equations for High-Dimensional Longitudinal Data Analysis," Biometrics, The International Biometric Society, vol. 68(2), pages 353-360, June.
    6. Shi, Chengchun & Fan, Ailin & Song, Rui & Lu, Wenbin, 2018. "High-dimensional A-learning for optimal dynamic treatment regimes," LSE Research Online Documents on Economics 102113, London School of Economics and Political Science, LSE Library.
    7. Chung‐Wei Shen & Yi‐Hau Chen, 2018. "Model selection for semiparametric marginal mean regression accounting for within‐cluster subsampling variability and informative cluster size," Biometrics, The International Biometric Society, vol. 74(3), pages 934-943, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dora Gyori & Bernadett Frida Farkas & Lili Olga Horvath & Daniel Komaromy & Gergely Meszaros & Dora Szentivanyi & Judit Balazs, 2021. "The Association of Nonsuicidal Self-Injury with Quality of Life and Mental Disorders in Clinical Adolescents—A Network Approach," IJERPH, MDPI, vol. 18(4), pages 1-21, February.
    2. Rahul Ghosal & Arnab Maity & Timothy Clark & Stefano B. Longo, 2020. "Variable selection in functional linear concurrent regression," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 69(3), pages 565-587, June.
    3. Lian, Heng & Li, Jianbo & Tang, Xingyu, 2014. "SCAD-penalized regression in additive partially linear proportional hazards models with an ultra-high-dimensional linear part," Journal of Multivariate Analysis, Elsevier, vol. 125(C), pages 50-64.
    4. Hu, Yuao & Lian, Heng, 2013. "Variable selection in a partially linear proportional hazards model with a diverging dimensionality," Statistics & Probability Letters, Elsevier, vol. 83(1), pages 61-69.
    5. Shi, Chengchun & Li, Lexin, 2022. "Testing mediation effects using logic of Boolean matrices," LSE Research Online Documents on Economics 108881, London School of Economics and Political Science, LSE Library.
    6. Frommlet, Florian & Ruhaltinger, Felix & Twaróg, Piotr & Bogdan, Małgorzata, 2012. "Modified versions of Bayesian Information Criterion for genome-wide association studies," Computational Statistics & Data Analysis, Elsevier, vol. 56(5), pages 1038-1051.
    7. Zak-Szatkowska, Malgorzata & Bogdan, Malgorzata, 2011. "Modified versions of the Bayesian Information Criterion for sparse Generalized Linear Models," Computational Statistics & Data Analysis, Elsevier, vol. 55(11), pages 2908-2924, November.
    8. Gregory Vaughan & Robert Aseltine & Kun Chen & Jun Yan, 2017. "Stagewise generalized estimating equations with grouped variables," Biometrics, The International Biometric Society, vol. 73(4), pages 1332-1342, December.
    9. Xiaotong Shen & Wei Pan & Yunzhang Zhu & Hui Zhou, 2013. "On constrained and regularized high-dimensional regression," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 65(5), pages 807-832, October.
    10. Guo, Xu & Li, Runze & Liu, Jingyuan & Zeng, Mudong, 2023. "Statistical inference for linear mediation models with high-dimensional mediators and application to studying stock reaction to COVID-19 pandemic," Journal of Econometrics, Elsevier, vol. 235(1), pages 166-179.
    11. Shan Luo & Zehua Chen, 2014. "Sequential Lasso Cum EBIC for Feature Selection With Ultra-High Dimensional Feature Space," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1229-1240, September.
    12. Lu Tang & Ling Zhou & Peter X. K. Song, 2019. "Fusion learning algorithm to combine partially heterogeneous Cox models," Computational Statistics, Springer, vol. 34(1), pages 395-414, March.
    13. Molly C. Klanderman & Kathryn B. Newhart & Tzahi Y. Cath & Amanda S. Hering, 2020. "Fault isolation for a complex decentralized waste water treatment facility," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 69(4), pages 931-951, August.
    14. Li, Xinyi & Wang, Li & Nettleton, Dan, 2019. "Sparse model identification and learning for ultra-high-dimensional additive partially linear models," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 204-228.
    15. Jones, Benjamin A., 2018. "Forest-attacking Invasive Species and Infant Health: Evidence From the Invasive Emerald Ash Borer," Ecological Economics, Elsevier, vol. 154(C), pages 282-293.
    16. Tae-Hwy Lee & Ekaterina Seregina, 2020. "Learning from Forecast Errors: A New Approach to Forecast Combination," Working Papers 202024, University of California at Riverside, Department of Economics.
    17. Chenchen Ma & Jing Ouyang & Gongjun Xu, 2023. "Learning Latent and Hierarchical Structures in Cognitive Diagnosis Models," Psychometrika, Springer;The Psychometric Society, vol. 88(1), pages 175-207, March.
    18. Gonzalo García-Donato & María Eugenia Castellanos & Alicia Quirós, 2021. "Bayesian Variable Selection with Applications in Health Sciences," Mathematics, MDPI, vol. 9(3), pages 1-16, January.
    19. Li, Gaorong & Lian, Heng & Feng, Sanying & Zhu, Lixing, 2013. "Automatic variable selection for longitudinal generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 61(C), pages 174-186.
    20. Yinjun Chen & Hao Ming & Hu Yang, 2024. "Efficient variable selection for high-dimensional multiplicative models: a novel LPRE-based approach," Statistical Papers, Springer, vol. 65(6), pages 3713-3737, August.



    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.