
A general framework of nonparametric feature selection in high‐dimensional data

Authors

  • Hang Yu
  • Yuanjia Wang
  • Donglin Zeng

Abstract

Nonparametric feature selection for high‐dimensional data is an important and challenging problem in statistics and machine learning. Most existing feature selection methods focus on parametric or additive models, which may suffer from model misspecification. In this paper, we propose a new framework to perform nonparametric feature selection for both regression and classification problems. Under this framework, we learn prediction functions through empirical risk minimization over a reproducing kernel Hilbert space. The space is generated by a novel tensor product kernel, which depends on a set of parameters that determine the importance of the features. Computationally, we minimize the empirical risk with a penalty to estimate the prediction function and the kernel parameters simultaneously. The solution can be obtained by iteratively solving convex optimization problems. We study the theoretical properties of the kernel feature space and prove the oracle selection property and Fisher consistency of our proposed method. Finally, we demonstrate the superior performance of our approach compared to existing methods via extensive simulation studies and applications to two real studies.
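
The following is a minimal sketch of the general idea described in the abstract, not the authors' implementation. It assumes a product of per-feature Gaussian kernels with nonnegative weights `lam` (setting `lam[j] = 0` makes the kernel ignore feature `j`), squared-error loss, and an L1 penalty on `lam`. The abstract does not specify the kernel form, the penalty, or the update rules, so all of these are illustrative assumptions, as are the function names `product_kernel`, `penalized_risk`, and `fit`. The convex step for the prediction coefficients is solved exactly; the kernel-parameter step is replaced here by a backtracking projected-gradient update rather than the convex subproblems the paper refers to.

```python
import numpy as np

def product_kernel(X, Z, lam):
    """Assumed kernel: K(x, z) = prod_j exp(-lam[j] * (x_j - z_j)^2), lam[j] >= 0."""
    diff2 = (X[:, None, :] - Z[None, :, :]) ** 2      # shape (n, m, p)
    return np.exp(-(diff2 * lam).sum(axis=2))

def penalized_risk(X, y, lam, alpha, ridge, l1):
    """Squared-error risk + RKHS norm penalty + L1 penalty on the kernel weights."""
    K = product_kernel(X, X, lam)
    resid = y - K @ alpha
    return resid @ resid / len(y) + ridge * alpha @ K @ alpha + l1 * lam.sum()

def fit(X, y, ridge=1e-2, l1=1e-2, n_iter=10):
    """Alternate between the coefficients alpha and the kernel weights lam."""
    n, p = X.shape
    lam = np.full(p, 0.5)
    diff2 = (X[:, None, :] - X[None, :, :]) ** 2      # shape (n, n, p)
    history = []
    for _ in range(n_iter):
        # Step 1 (convex in alpha for fixed lam): kernel ridge regression.
        K = product_kernel(X, X, lam)
        alpha = np.linalg.solve(K + n * ridge * np.eye(n), y)

        # Step 2 (illustrative swap-in): projected gradient step on lam.
        resid = y - K @ alpha
        # KDa[i, j] = sum_k K[i, k] * diff2[i, k, j] * alpha[k]
        KDa = np.einsum("ik,ikj,k->ij", K, diff2, alpha)
        grad = (2.0 / n) * resid @ KDa - ridge * alpha @ KDa + l1
        current = penalized_risk(X, y, lam, alpha, ridge, l1)
        step = 1.0
        while step > 1e-8:                            # backtracking line search
            cand = np.maximum(lam - step * grad, 0.0) # keep weights nonnegative
            if penalized_risk(X, y, cand, alpha, ridge, l1) < current:
                lam = cand
                break
            step *= 0.5
        history.append(penalized_risk(X, y, lam, alpha, ridge, l1))
    return lam, alpha, history

# Toy data (hypothetical): only the first of five features drives the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = np.sin(2.0 * X[:, 0]) + 0.1 * rng.normal(size=60)
lam_hat, alpha_hat, history = fit(X, y)
```

Because each step either minimizes the penalized risk exactly in `alpha` or accepts a `lam` update only when the risk decreases, the values in `history` are non-increasing. Features whose weight is driven to zero drop out of the kernel entirely, which is the mechanism by which a weighted product kernel can perform selection in this sketch.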

Suggested Citation

  • Hang Yu & Yuanjia Wang & Donglin Zeng, 2023. "A general framework of nonparametric feature selection in high‐dimensional data," Biometrics, The International Biometric Society, vol. 79(2), pages 951-963, June.
  • Handle: RePEc:bla:biomet:v:79:y:2023:i:2:p:951-963
    DOI: 10.1111/biom.13664

    References listed on IDEAS

    1. L. A. Stefanski & Yichao Wu & Kyle White, 2014. "Variable Selection in Nonparametric Classification Via Measurement Error Model Selection Likelihoods," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(506), pages 574-589, June.
    2. Fan, Jianqing & Feng, Yang & Song, Rui, 2011. "Nonparametric Independence Screening in Sparse Ultra-High-Dimensional Additive Models," Journal of the American Statistical Association, American Statistical Association, vol. 106(494), pages 544-557.
    3. Yichao Wu & Leonard A. Stefanski, 2015. "Automatic structure recovery for additive models," Biometrika, Biometrika Trust, vol. 102(2), pages 381-395.
    4. Pradeep Ravikumar & John Lafferty & Han Liu & Larry Wasserman, 2009. "Sparse additive models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(5), pages 1009-1030, November.
    5. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yoshida, Takuma, 2018. "Semiparametric method for model structure discovery in additive regression models," Econometrics and Statistics, Elsevier, vol. 5(C), pages 124-136.
    2. Doksum, Kjell A. & Jiang, Jiancheng & Sun, Bo & Wang, Shuzhen, 2017. "Nearest neighbor estimates of regression," Computational Statistics & Data Analysis, Elsevier, vol. 110(C), pages 64-74.
    3. Randy C. S. Lai & Jan Hannig & Thomas C. M. Lee, 2015. "Generalized Fiducial Inference for Ultrahigh-Dimensional Regression," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(510), pages 760-772, June.
    4. Lin, Hongmei & Lian, Heng & Liang, Hua, 2019. "Rank reduction for high-dimensional generalized additive models," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 672-684.
    5. Xia Zheng & Yaohua Rong & Ling Liu & Weihu Cheng, 2021. "A More Accurate Estimation of Semiparametric Logistic Regression," Mathematics, MDPI, vol. 9(19), pages 1-12, September.
    6. Kuang-Yao Lee & Bing Li & Hongyu Zhao, 2016. "Variable selection via additive conditional independence," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(5), pages 1037-1055, November.
    7. Umberto Amato & Anestis Antoniadis & Italia De Feis, 2016. "Additive model selection," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 25(4), pages 519-564, November.
    8. Du, Pang & Cheng, Guang & Liang, Hua, 2012. "Semiparametric regression models with additive nonparametric components and high dimensional parametric components," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 2006-2017.
    9. Peng Lai & Xi Yan & Xin Sun & Haozhe Pang & Yanqiu Zhou, 2023. "Variable selection for nonparametric quantile regression via measurement error model," Statistical Papers, Springer, vol. 64(6), pages 2207-2224, December.
    10. Shan Luo & Zehua Chen, 2014. "Sequential Lasso Cum EBIC for Feature Selection With Ultra-High Dimensional Feature Space," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1229-1240, September.
    11. Wang, Christina Dan & Chen, Zhao & Lian, Yimin & Chen, Min, 2022. "Asset selection based on high frequency Sharpe ratio," Journal of Econometrics, Elsevier, vol. 227(1), pages 168-188.
    12. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    13. Li, Xinyi & Wang, Li & Nettleton, Dan, 2019. "Sparse model identification and learning for ultra-high-dimensional additive partially linear models," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 204-228.
    14. Ding, Hui & Zhang, Jian & Zhang, Riquan, 2022. "Nonparametric variable screening for multivariate additive models," Journal of Multivariate Analysis, Elsevier, vol. 192(C).
    15. Jingyuan Liu & Runze Li & Rongling Wu, 2014. "Feature Selection for Varying Coefficient Models With Ultrahigh-Dimensional Covariates," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(505), pages 266-274, March.
    16. Jianqing Fan & Yang Feng & Jiancheng Jiang & Xin Tong, 2016. "Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(513), pages 275-287, March.
    17. Jingxuan Luo & Lili Yue & Gaorong Li, 2023. "Overview of High-Dimensional Measurement Error Regression Models," Mathematics, MDPI, vol. 11(14), pages 1-22, July.
    18. Azadkia, Mona & Chatterjee, Sourav, 2021. "A simple measure of conditional dependence," LSE Research Online Documents on Economics 125584, London School of Economics and Political Science, LSE Library.
    19. Fan, Jianqing & Feng, Yang & Xia, Lucy, 2020. "A projection-based conditional dependence measure with applications to high-dimensional undirected graphical models," Journal of Econometrics, Elsevier, vol. 218(1), pages 119-139.
    20. Li, Degui & Linton, Oliver & Lu, Zudi, 2015. "A flexible semiparametric forecasting model for time series," Journal of Econometrics, Elsevier, vol. 187(1), pages 345-357.
