IDEAS home Printed from https://ideas.repec.org/a/bla/biomet/v79y2023i2p951-963.html

A general framework of nonparametric feature selection in high‐dimensional data

Author

Listed:
  • Hang Yu
  • Yuanjia Wang
  • Donglin Zeng

Abstract

Nonparametric feature selection for high‐dimensional data is an important and challenging problem in statistics and machine learning. Most existing feature selection methods focus on parametric or additive models, which may suffer from model misspecification. In this paper, we propose a new framework for nonparametric feature selection in both regression and classification problems. Under this framework, we learn prediction functions through empirical risk minimization over a reproducing kernel Hilbert space. The space is generated by a novel tensor product kernel that depends on a set of parameters determining the importance of the features. Computationally, we minimize the penalized empirical risk to estimate the prediction function and the kernel parameters simultaneously. The solution can be obtained by iteratively solving convex optimization problems. We study the theoretical properties of the kernel feature space and prove the oracle selection property and Fisher consistency of the proposed method. Finally, we demonstrate the superior performance of our approach compared with existing methods through extensive simulation studies and applications to two real studies.
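
To make the abstract's general recipe concrete, the following is a minimal illustrative sketch of alternating minimization of a penalized empirical risk with a feature-weighted product kernel. It is not the authors' method: the abstract does not specify the kernel, the penalty, or the updates, so everything here is an assumption. In particular, the Gaussian product kernel k(x, z) = prod_j exp(-w_j (x_j - z_j)^2), the squared-error loss, the L1 penalty on the weights w, and the names `product_kernel`, `weight_gradient`, and `fit` are all hypothetical. The step for the prediction function is a convex kernel ridge problem; the step for the weights is a simple projected gradient step with backtracking, which is a simplification and is not a convex subproblem as described in the paper.

```python
import numpy as np

def product_kernel(X, Z, w):
    """Assumed kernel: k(x, z) = prod_j exp(-w_j * (x_j - z_j)^2).
    A weight w_j = 0 removes feature j from the kernel entirely."""
    diff2 = (X[:, None, :] - Z[None, :, :]) ** 2      # shape (n, m, p)
    return np.exp(-(diff2 * w).sum(axis=2))

def objective(K, y, w, alpha, ridge, lam):
    """Penalized empirical risk: squared loss + RKHS norm + L1 on weights."""
    resid = y - K @ alpha
    return resid @ resid / len(y) + ridge * alpha @ K @ alpha + lam * w.sum()

def weight_gradient(diff2, K, y, alpha, ridge, lam):
    """Gradient of the objective in w for fixed alpha, using
    dK/dw_j = -diff2[:, :, j] * K (elementwise)."""
    dK = -diff2 * K[:, :, None]                       # shape (n, n, p)
    dK_alpha = np.einsum("ijp,j->ip", dK, alpha)      # (dK_j @ alpha) per j
    resid = K @ alpha - y
    return (2.0 / len(y)) * resid @ dK_alpha + ridge * alpha @ dK_alpha + lam

def fit(X, y, ridge=0.01, lam=0.02, n_iter=30, step=1.0):
    """Alternate between alpha (convex) and w (projected gradient) updates."""
    n, p = X.shape
    w = np.full(p, 0.5)                               # arbitrary starting weights
    diff2 = (X[:, None, :] - X[None, :, :]) ** 2
    history = []
    for _ in range(n_iter):
        # Step 1: for fixed w, kernel ridge regression has a closed form.
        K = np.exp(-(diff2 * w).sum(axis=2))
        alpha = np.linalg.solve(K + n * ridge * np.eye(n), y)
        current = objective(K, y, w, alpha, ridge, lam)
        history.append(current)
        # Step 2: for fixed alpha, take a gradient step on w, project onto
        # w >= 0, and halve the step until the objective does not increase.
        grad = weight_gradient(diff2, K, y, alpha, ridge, lam)
        t = step
        for _ in range(20):
            w_try = np.maximum(w - t * grad, 0.0)
            K_try = np.exp(-(diff2 * w_try).sum(axis=2))
            if objective(K_try, y, w_try, alpha, ridge, lam) <= current:
                w = w_try
                break
            t *= 0.5
    # Refit alpha so the returned coefficients match the final weights.
    K = np.exp(-(diff2 * w).sum(axis=2))
    alpha = np.linalg.solve(K + n * ridge * np.eye(n), y)
    return w, alpha, history

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 5))
    y = np.sin(2.0 * X[:, 0]) + 0.1 * rng.normal(size=60)  # only feature 0 matters
    w_hat, alpha_hat, history = fit(X, y)
    print("estimated feature weights:", np.round(w_hat, 3))
```

In this sketch, features whose weight is driven to zero drop out of the kernel, which is one simple way a kernel parameter can encode feature importance. The tuning values `ridge`, `lam`, and `step` are arbitrary and would normally be chosen by cross-validation.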

Suggested Citation

  • Hang Yu & Yuanjia Wang & Donglin Zeng, 2023. "A general framework of nonparametric feature selection in high‐dimensional data," Biometrics, The International Biometric Society, vol. 79(2), pages 951-963, June.
  • Handle: RePEc:bla:biomet:v:79:y:2023:i:2:p:951-963
    DOI: 10.1111/biom.13664

    Download full text from publisher

    File URL: https://doi.org/10.1111/biom.13664
    Download Restriction: no


    References listed on IDEAS

    1. L. A. Stefanski & Yichao Wu & Kyle White, 2014. "Variable Selection in Nonparametric Classification Via Measurement Error Model Selection Likelihoods," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(506), pages 574-589, June.
    2. Yichao Wu & Leonard A. Stefanski, 2015. "Automatic structure recovery for additive models," Biometrika, Biometrika Trust, vol. 102(2), pages 381-395.
    3. Pradeep Ravikumar & John Lafferty & Han Liu & Larry Wasserman, 2009. "Sparse additive models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(5), pages 1009-1030, November.
    4. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yoshida, Takuma, 2018. "Semiparametric method for model structure discovery in additive regression models," Econometrics and Statistics, Elsevier, vol. 5(C), pages 124-136.
    2. Doksum, Kjell A. & Jiang, Jiancheng & Sun, Bo & Wang, Shuzhen, 2017. "Nearest neighbor estimates of regression," Computational Statistics & Data Analysis, Elsevier, vol. 110(C), pages 64-74.
    3. Jiang, Liewen & Bondell, Howard D. & Wang, Huixia Judy, 2014. "Interquantile shrinkage and variable selection in quantile regression," Computational Statistics & Data Analysis, Elsevier, vol. 69(C), pages 208-219.
    4. Bhatnagar, Sahir R. & Lu, Tianyuan & Lovato, Amanda & Olds, David L. & Kobor, Michael S. & Meaney, Michael J. & O'Donnell, Kieran & Yang, Archer Y. & Greenwood, Celia M.T., 2023. "A sparse additive model for high-dimensional interactions with an exposure variable," Computational Statistics & Data Analysis, Elsevier, vol. 179(C).
    5. Takuma Yoshida, 2019. "Two stage smoothing in additive models with missing covariates," Statistical Papers, Springer, vol. 60(6), pages 1803-1826, December.
    6. Zhu, Ying, 2015. "Sparse Linear Models and l1−Regularized 2SLS with High-Dimensional Endogenous Regressors and Instruments," MPRA Paper 81217, University Library of Munich, Germany.
    7. Xia Cui & Heng Peng & Songqiao Wen & Lixing Zhu, 2013. "Component Selection in the Additive Regression Model," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 40(3), pages 491-510, September.
    8. Randy C. S. Lai & Jan Hannig & Thomas C. M. Lee, 2015. "Generalized Fiducial Inference for Ultrahigh-Dimensional Regression," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(510), pages 760-772, June.
    9. Fabian Scheipl & Thomas Kneib & Ludwig Fahrmeir, 2013. "Penalized likelihood and Bayesian function selection in regression models," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 97(4), pages 349-385, October.
    10. Feng, Zheng-Hui & Lin, Lu & Zhu, Ruo-Qing & Zhu, Li-Xing, 2018. "Nonparametric Variable Selection and Its Application to Additive Models," IRTG 1792 Discussion Papers 2018-002, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    11. Zhu, Ying, 2013. "Sparse Linear Models and Two-Stage Estimation in High-Dimensional Settings with Possibly Many Endogenous Regressors," MPRA Paper 49846, University Library of Munich, Germany.
    12. Diego Vidaurre & Concha Bielza & Pedro Larrañaga, 2013. "A Survey of L1 Regression," International Statistical Review, International Statistical Institute, vol. 81(3), pages 361-387, December.
    13. Lin, Hongmei & Lian, Heng & Liang, Hua, 2019. "Rank reduction for high-dimensional generalized additive models," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 672-684.
    14. Xia Zheng & Yaohua Rong & Ling Liu & Weihu Cheng, 2021. "A More Accurate Estimation of Semiparametric Logistic Regression," Mathematics, MDPI, vol. 9(19), pages 1-12, September.
    15. Nardi, Y. & Rinaldo, A., 2011. "Autoregressive process modeling via the Lasso procedure," Journal of Multivariate Analysis, Elsevier, vol. 102(3), pages 528-549, March.
    16. Kyle R. White & Leonard A. Stefanski & Yichao Wu, 2017. "Variable Selection in Kernel Regression Using Measurement Error Selection Likelihoods," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(520), pages 1587-1597, October.
    17. Li Liu & Hao Wang & Yanyan Liu & Jian Huang, 2021. "Model pursuit and variable selection in the additive accelerated failure time model," Statistical Papers, Springer, vol. 62(6), pages 2627-2659, December.
    18. Yi Liu & Veronika Ročková & Yuexi Wang, 2021. "Variable selection with ABC Bayesian forests," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(3), pages 453-481, July.
    19. Kuang-Yao Lee & Bing Li & Hongyu Zhao, 2016. "Variable selection via additive conditional independence," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(5), pages 1037-1055, November.
    20. Umberto Amato & Anestis Antoniadis & Italia De Feis, 2016. "Additive model selection," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 25(4), pages 519-564, November.
