Quantile-Composited Feature Screening for Ultrahigh-Dimensional Data

Authors

  • Shuaishuai Chen (School of Mathematics, Shandong University, Jinan 250100, China)
  • Jun Lu (School of Science, National University of Defense and Technology, Changsha 410000, China)

Abstract

Ultrahigh-dimensional grouped data are frequently encountered by biostatisticians working on multi-class categorical problems. To rapidly screen out the null predictors, this paper proposes a quantile-composited feature screening procedure. The new method first transforms the continuous predictor to a Bernoulli variable, by thresholding the predictor at a certain quantile. Consequently, the independence between the response and each predictor is easy to judge, by employing the Pearson chi-square statistic. The newly proposed method has the following salient features: (1) it is robust against high-dimensional heterogeneous data; (2) it is model-free, without specifying any regression structure between the covariate and outcome variable; (3) it enjoys a low computational cost, with the computational complexity controlled at the sample size level. Under some mild conditions, the new method was shown to achieve the sure screening property without imposing any moment condition on the predictors. Numerical studies and real data analyses further confirmed the effectiveness of the new screening procedure.
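
The screening idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `quantile_chi2_screen`, the grid of quantile levels `taus`, and the choice to combine the per-quantile Pearson chi-square statistics by summing them are all assumptions made for illustration, since the abstract does not specify how the quantile levels are composited. Each predictor is thresholded at its empirical quantile to give a Bernoulli variable, which is then cross-tabulated against the class label.

```python
import numpy as np

def _chi2_part(obs, exp):
    # Sum of (observed - expected)^2 / expected over classes; cells with
    # zero expected count contribute nothing.
    return np.divide((obs - exp) ** 2, exp,
                     out=np.zeros_like(exp), where=exp > 0).sum(axis=0)

def quantile_chi2_screen(X, y, taus=(0.25, 0.5, 0.75)):
    """Return one screening score per column of X (larger = more relevant).

    Hypothetical sketch: scores are Pearson chi-square statistics of the
    (class label) x (indicator X_j <= tau-quantile) table, summed over taus.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).ravel()
    n, p = X.shape
    classes, y_idx = np.unique(y, return_inverse=True)
    onehot = np.eye(len(classes))[y_idx]           # n x K class indicators
    class_counts = onehot.sum(axis=0)              # length K
    scores = np.zeros(p)
    for tau in taus:
        thresholds = np.quantile(X, tau, axis=0)   # one threshold per predictor
        B = (X <= thresholds).astype(float)        # n x p Bernoulli variables
        obs1 = onehot.T @ B                        # K x p counts with B = 1
        obs0 = class_counts[:, None] - obs1        # K x p counts with B = 0
        col1 = B.sum(axis=0)
        col0 = n - col1
        exp1 = class_counts[:, None] * col1[None, :] / n
        exp0 = class_counts[:, None] * col0[None, :] / n
        scores += _chi2_part(obs1, exp1) + _chi2_part(obs0, exp0)
    return scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 50
    y = rng.integers(0, 3, size=n)
    X = rng.standard_normal((n, p))
    X[:, 0] += 2.0 * y                             # only predictor 0 is active
    scores = quantile_chi2_screen(X, y)
    top = np.argsort(-scores)[:5]                  # keep the 5 highest scores
    print(top)
```

Because the statistic depends on each predictor only through indicators of the form X_j <= quantile, it is unchanged by monotone transformations of the predictor and needs no moment assumptions, which is consistent with the robustness claims in the abstract. For a fixed number of classes and quantile levels, the work per predictor grows linearly with the sample size.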

Suggested Citation

  • Shuaishuai Chen & Jun Lu, 2023. "Quantile-Composited Feature Screening for Ultrahigh-Dimensional Data," Mathematics, MDPI, vol. 11(10), pages 1-21, May.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:10:p:2398-:d:1152615

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/10/2398/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/10/2398/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lu, Jun & Lin, Lu & Wang, WenWu, 2021. "Partition-based feature screening for categorical data via RKHS embeddings," Computational Statistics & Data Analysis, Elsevier, vol. 157(C).
    2. Sweata Sen & Damitri Kundu & Kiranmoy Das, 2023. "Variable selection for categorical response: a comparative study," Computational Statistics, Springer, vol. 38(2), pages 809-826, June.
    3. Lyu Ni & Fang Fang & Fangjiao Wan, 2017. "Adjusted Pearson Chi-Square feature screening for multi-classification with ultrahigh dimensional data," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 80(6), pages 805-828, November.
    4. Li, Yujie & Li, Gaorong & Lian, Heng & Tong, Tiejun, 2017. "Profile forward regression screening for ultra-high dimensional semiparametric varying coefficient partially linear models," Journal of Multivariate Analysis, Elsevier, vol. 155(C), pages 133-150.
    5. Lu, Jun & Lin, Lu, 2018. "Feature screening for multi-response varying coefficient models with ultrahigh dimensional predictors," Computational Statistics & Data Analysis, Elsevier, vol. 128(C), pages 242-254.
    6. Jun Lu & Dan Wang & Qinqin Hu, 2022. "Interaction screening via canonical correlation," Computational Statistics, Springer, vol. 37(5), pages 2637-2670, November.
    7. Zhang, Shucong & Zhou, Yong, 2018. "Variable screening for ultrahigh dimensional heterogeneous data via conditional quantile correlations," Journal of Multivariate Analysis, Elsevier, vol. 165(C), pages 1-13.
    8. Yang, Baoying & Yin, Xiangrong & Zhang, Nan, 2019. "Sufficient variable selection using independence measures for continuous response," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 480-493.
    9. Qiu, Debin & Ahn, Jeongyoun, 2020. "Grouped variable screening for ultra-high dimensional data for linear model," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    10. Ke, Chenlu & Yang, Wei & Yuan, Qingcong & Li, Lu, 2023. "Partial sufficient variable screening with categorical controls," Computational Statistics & Data Analysis, Elsevier, vol. 187(C).
    11. Zhao, Shaofei & Fu, Guifang, 2022. "Distribution-free and model-free multivariate feature screening via multivariate rank distance correlation," Journal of Multivariate Analysis, Elsevier, vol. 192(C).
    12. Li, Xingxiang & Cheng, Guosheng & Wang, Liming & Lai, Peng & Song, Fengli, 2017. "Ultrahigh dimensional feature screening via projection," Computational Statistics & Data Analysis, Elsevier, vol. 114(C), pages 88-104.
    13. Liming Wang & Xingxiang Li & Xiaoqing Wang & Peng Lai, 2022. "Unified mean-variance feature screening for ultrahigh-dimensional regression," Computational Statistics, Springer, vol. 37(4), pages 1887-1918, September.
    14. Ping Wang & Lu Lin, 2023. "Conditional characteristic feature screening for massive imbalanced data," Statistical Papers, Springer, vol. 64(3), pages 807-834, June.
    15. Fengli Song & Peng Lai & Baohua Shen, 2020. "Robust composite weighted quantile screening for ultrahigh dimensional discriminant analysis," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 83(7), pages 799-820, October.
    16. Jing Zhang & Yanyan Liu & Hengjian Cui, 2021. "Model-free feature screening via distance correlation for ultrahigh dimensional survival data," Statistical Papers, Springer, vol. 62(6), pages 2711-2738, December.
    17. Jing Zhang & Haibo Zhou & Yanyan Liu & Jianwen Cai, 2021. "Conditional screening for ultrahigh-dimensional survival data in case-cohort studies," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 27(4), pages 632-661, October.
    18. Dong, Yuexiao & Yu, Zhou & Zhu, Liping, 2020. "Model-free variable selection for conditional mean in regression," Computational Statistics & Data Analysis, Elsevier, vol. 152(C).
    19. Akira Shinkyu, 2023. "Forward Selection for Feature Screening and Structure Identification in Varying Coefficient Models," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 85(1), pages 485-511, February.
    20. Zhang, Shen & Zhao, Peixin & Li, Gaorong & Xu, Wangli, 2019. "Nonparametric independence screening for ultra-high dimensional generalized varying coefficient models with longitudinal data," Journal of Multivariate Analysis, Elsevier, vol. 171(C), pages 37-52.
