Printed from https://ideas.repec.org/a/gam/jstats/v8y2025i4p99-d1772872.html

Model-Free Feature Screening Based on Data Aggregation for Ultra-High-Dimensional Longitudinal Data

Author

Listed:
  • Junfeng Chen

    (School of Mathematics, Southwest Jiaotong University, Chengdu 611756, China
    Office of Medical Information and Data, Medical Support Center, The General Hospital of Western Theater Command, PLA, Chengdu 610083, China)

  • Xiaoguang Yang

    (Office of Medical Information and Data, Medical Support Center, The General Hospital of Western Theater Command, PLA, Chengdu 610083, China)

  • Jing Dai

    (Department of Information, Medical Support Center, The General Hospital of Western Theater Command, PLA, Chengdu 610083, China)

  • Yunming Li

    (School of Mathematics, Southwest Jiaotong University, Chengdu 611756, China
    Office of Medical Information and Data, Medical Support Center, The General Hospital of Western Theater Command, PLA, Chengdu 610083, China)

Abstract

Feature screening procedures for ultra-high-dimensional longitudinal data have been widely studied, but most rely on model assumptions, and their screening performance can suffer when the model is misspecified. To address this problem, we introduce a new model-free method in which feature screening is performed by sample splitting and data aggregation. Distance correlation measures the association at each time point separately, while longitudinal correlation is modeled by a specific cumulative distribution function to achieve efficiency. We also extend the method to handle situations where the predictors are correlated. Both methods possess excellent asymptotic properties and can handle longitudinal data with unequal numbers of repeated measurements and unequal intervals between measurement time points. Compared with other model-free methods, the two new methods are relatively insensitive to within-subject correlation, and they help reduce the computational burden when applied to longitudinal data. Finally, simulated and empirical examples show that both new methods achieve better screening performance.
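
The per-time-point use of distance correlation described in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract does not give the sample-splitting scheme or the cumulative distribution function used for aggregation, so the sketch below assumes a shared grid of time points and, as a plainly labeled stand-in, simply averages the per-time-point scores. The names `centered_distances`, `distance_correlation`, and `screen`, and the record layout, are hypothetical.

```python
import math


def centered_distances(values):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[dist[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length scalar samples."""
    n = len(x)
    a, b = centered_distances(x), centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(v * v for row in a for v in row) / n**2
    dvar_y = sum(v * v for row in b for v in row) / n**2
    denom = math.sqrt(dvar_x * dvar_y)
    # A constant sample has zero distance variance; report 0 in that case.
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


def screen(records, n_predictors, keep):
    """Rank predictors by their average per-time-point distance correlation.

    records: iterable of (time, predictor_values, response) tuples, one per
    observation. Subjects may contribute different numbers of observations.
    ASSUMPTION: plain averaging replaces the paper's CDF-based aggregation.
    """
    by_time = {}
    for t, xs, y in records:
        by_time.setdefault(t, []).append((xs, y))

    scores = []
    for k in range(n_predictors):
        per_time = []
        for obs in by_time.values():
            if len(obs) < 3:  # too few observations for a meaningful score
                continue
            per_time.append(distance_correlation(
                [xs[k] for xs, _ in obs], [y for _, y in obs]))
        scores.append(sum(per_time) / len(per_time) if per_time else 0.0)

    ranked = sorted(range(n_predictors), key=lambda k: scores[k], reverse=True)
    return ranked[:keep], scores
```

Calling `screen(records, n_predictors, keep)` returns the indices of the `keep` highest-scoring predictors together with all scores. Grouping observations by time point is what lets the sketch tolerate unequal numbers of repeated measurements; handling unequal measurement intervals, as the paper does, would need more than this exact-match grouping.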

Suggested Citation

  • Junfeng Chen & Xiaoguang Yang & Jing Dai & Yunming Li, 2025. "Model-Free Feature Screening Based on Data Aggregation for Ultra-High-Dimensional Longitudinal Data," Stats, MDPI, vol. 8(4), pages 1-21, October.
  • Handle: RePEc:gam:jstats:v:8:y:2025:i:4:p:99-:d:1772872

    Download full text from publisher

    File URL: https://www.mdpi.com/2571-905X/8/4/99/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2571-905X/8/4/99/
    Download Restriction: no

    References listed on IDEAS

    1. Jun Lu & Lu Lin, 2020. "Model-free conditional screening via conditional distance correlation," Statistical Papers, Springer, vol. 61(1), pages 225-244, February.
    2. Runze Li & Wei Zhong & Liping Zhu, 2012. "Feature Screening via Distance Correlation Learning," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(499), pages 1129-1139, September.
    3. Sathish Subramanian & Sayeeda Huq & Tanya Yatsunenko & Rashidul Haque & Mustafa Mahfuz & Mohammed A. Alam & Amber Benezra & Joseph DeStefano & Martin F. Meier & Brian D. Muegge & Michael J. Barratt & , 2014. "Persistent gut microbiota immaturity in malnourished Bangladeshi children," Nature, Nature, vol. 510(7505), pages 417-421, June.
    4. Jing Zhang & Yanyan Liu & Hengjian Cui, 2021. "Model-free feature screening via distance correlation for ultrahigh dimensional survival data," Statistical Papers, Springer, vol. 62(6), pages 2711-2738, December.
    5. Xiaofeng Shao & Jingsi Zhang, 2014. "Martingale Difference Correlation and Its Use in High-Dimensional Variable Screening," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1302-1318, September.
    6. Debmalya Nandy & Francesca Chiaromonte & Runze Li, 2022. "Covariate Information Number for Feature Screening in Ultrahigh-Dimensional Supervised Problems," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 117(539), pages 1516-1529, September.
    7. Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911, November.
    8. Li-Pang Chen, 2021. "Feature screening based on distance correlation for ultrahigh-dimensional censored data with covariate measurement error," Computational Statistics, Springer, vol. 36(2), pages 857-884, June.
    9. Chenguang Dai & Buyu Lin & Xin Xing & Jun S. Liu, 2023. "False Discovery Rate Control via Data Splitting," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 118(544), pages 2503-2520, October.
    10. Wanjun Liu & Yuan Ke & Jingyuan Liu & Runze Li, 2022. "Model-Free Feature Screening and FDR Control With Knockoff Features," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 117(537), pages 428-443, January.
    11. Chien-Ming Chi & Yingying Fan & Ching-Kang Ing & Jinchi Lv, 2025. "High-Dimensional Knockoffs Inference for Time Series Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 120(551), pages 1763-1774, July.
    12. Xin Xing & Zhigen Zhao & Jun S. Liu, 2023. "Controlling False Discovery Rate Using Gaussian Mirrors," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 118(541), pages 222-241, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tian, Zhentao & Zhang, Zhongzhan, 2025. "Quantile feature screening for infinite dimensional data under FDR control," Computational Statistics & Data Analysis, Elsevier, vol. 206(C).
    2. Jiujing Wu & Hengjian Cui, 2024. "Model-free feature screening based on Hellinger distance for ultrahigh dimensional data," Statistical Papers, Springer, vol. 65(9), pages 5903-5930, December.
    3. Wei Liu & Wenbo Wu & Baoying Yang, 2026. "Feature Screening for High-Dimensional Data with Measurement Errors using Adjusted Martingale Difference Correlation," Statistical Papers, Springer, vol. 67(2), pages 1-49, April.
    4. Wang, Pei & Yin, Xiangrong & Yuan, Qingcong & Kryscio, Richard, 2021. "Feature filter for estimating central mean subspace and its sparse solution," Computational Statistics & Data Analysis, Elsevier, vol. 163(C).
    5. Jun Lu & Dan Wang & Qinqin Hu, 2022. "Interaction screening via canonical correlation," Computational Statistics, Springer, vol. 37(5), pages 2637-2670, November.
    6. Xiaochao Xia & Hao Ming, 2022. "A Flexibly Conditional Screening Approach via a Nonparametric Quantile Partial Correlation," Mathematics, MDPI, vol. 10(24), pages 1-32, December.
    7. Li, Lu & Ke, Chenlu & Yin, Xiangrong & Yu, Zhou, 2023. "Generalized martingale difference divergence: Detecting conditional mean independence with applications in variable screening," Computational Statistics & Data Analysis, Elsevier, vol. 180(C).
    8. Congran Yu & Wenwen Guo & Xinyuan Song & Hengjian Cui, 2023. "Feature screening with latent responses," Biometrics, The International Biometric Society, vol. 79(2), pages 878-890, June.
    9. Zhang, Shucong & Zhou, Yong, 2018. "Variable screening for ultrahigh dimensional heterogeneous data via conditional quantile correlations," Journal of Multivariate Analysis, Elsevier, vol. 165(C), pages 1-13.
    10. Xuewei Cheng & Gang Li & Hong Wang, 2024. "The concordance filter: an adaptive model-free feature screening procedure," Computational Statistics, Springer, vol. 39(5), pages 2413-2436, July.
    11. Chen, Xiaolin & Chen, Xiaojing & Wang, Hong, 2018. "Robust feature screening for ultra-high dimensional right censored data via distance correlation," Computational Statistics & Data Analysis, Elsevier, vol. 119(C), pages 118-138.
    12. Ke, Chenlu & Yang, Wei & Yuan, Qingcong & Li, Lu, 2023. "Partial sufficient variable screening with categorical controls," Computational Statistics & Data Analysis, Elsevier, vol. 187(C).
    13. Zhong, Wei & Wang, Jiping & Chen, Xiaolin, 2021. "Censored mean variance sure independence screening for ultrahigh dimensional survival data," Computational Statistics & Data Analysis, Elsevier, vol. 159(C).
    14. Yingli Pan & Haoyu Wang & Zhan Liu, 2025. "Model free feature screening for large scale and ultrahigh dimensional survival data," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 77(1), pages 155-190, February.
    15. Congran Yu & Hengjian Cui, 2025. "Feature screening via false discovery rate control for linear model with multivariate responses," Statistical Papers, Springer, vol. 66(2), pages 1-29, February.
    16. Sang, Yongli & Dang, Xin, 2024. "Grouped feature screening for ultrahigh-dimensional classification via Gini distance correlation," Journal of Multivariate Analysis, Elsevier, vol. 204(C).
    17. Liu, Yu & Qin, Xu & Cai, Zhibo, 2025. "A tree approach for variable selection and its random forest," Computational Statistics & Data Analysis, Elsevier, vol. 202(C).
    18. Craig, Sarah J.C. & Kenney, Ana M. & Lin, Junli & Paul, Ian M. & Birch, Leann L. & Savage, Jennifer S. & Marini, Michele E. & Chiaromonte, Francesca & Reimherr, Matthew L. & Makova, Kateryna D., 2023. "Constructing a polygenic risk score for childhood obesity using functional data analysis," Econometrics and Statistics, Elsevier, vol. 25(C), pages 66-86.
    19. Shuaishuai Chen & Jun Lu, 2023. "Quantile-Composited Feature Screening for Ultrahigh-Dimensional Data," Mathematics, MDPI, vol. 11(10), pages 1-21, May.
    20. Yuan, Panxu & Jin, Changhan & Li, Gaorong, 2024. "FDR control for linear log-contrast models with high-dimensional compositional covariates," Computational Statistics & Data Analysis, Elsevier, vol. 197(C).
