Printed from https://ideas.repec.org/a/bla/biomet/v72y2016i4p1155-1163.html

Flexible variable selection for recovering sparsity in nonadditive nonparametric models

Authors

Listed:
  • Zaili Fang
  • Inyoung Kim
  • Patrick Schaumont

Abstract

Variable selection for recovering sparsity in nonadditive, nonparametric models with high-dimensional variables is challenging, and it becomes even more difficult when unknown interaction terms among those variables must be modeled. No existing variable selection method overcomes these limitations. In this article we therefore propose a variable selection approach developed by connecting a kernel machine with the nonparametric regression model. The advantages of our approach are that it can: (i) recover the sparsity; (ii) automatically model unknown and complicated interactions; (iii) connect with several existing approaches, including the linear nonnegative garrote and multiple kernel learning; and (iv) provide flexibility for both additive and nonadditive nonparametric models. Our approach can be viewed as a nonlinear version of the nonnegative garrote method. We model the smoothing function by a Least Squares Kernel Machine (LSKM) and construct the nonnegative garrote objective function as a function of the sparse scale parameters of the kernel machine, in order to recover the sparsity of the input variables, whose relevance to the response is measured by the scale parameters. We also provide the asymptotic properties of our approach and show that sparsistency is satisfied with consistent initial kernel function coefficients under certain conditions. An efficient coordinate descent/backfitting algorithm is developed. A resampling procedure for our variable selection methodology is also proposed to improve the power.
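
As a rough illustration of the idea described in the abstract, the Python sketch below assumes a Gaussian kernel with one nonnegative scale parameter per input variable, so that a scale of zero removes that variable from the model. The function names, the kernel form, the ridge-type LSKM fit, and the L1-type penalty on the scale parameters are all assumptions made for illustration; they are not taken from the article and should not be read as the authors' algorithm.

```python
import numpy as np

def scaled_gaussian_kernel(X, Z, xi):
    """K[i, k] = exp(-sum_j xi[j] * (X[i, j] - Z[k, j])**2), with xi[j] >= 0.

    Hypothetical kernel form: xi[j] = 0 means variable j has no influence.
    """
    xi = np.asarray(xi, dtype=float)
    diff = X[:, None, :] - Z[None, :, :]            # shape (n, m, p)
    return np.exp(-np.einsum("nmp,p->nm", diff**2, xi))

def lskm_fit(X, y, xi, lam):
    """Kernel ridge (least squares kernel machine) coefficients for fixed xi."""
    K = scaled_gaussian_kernel(X, X, xi)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def garrote_objective(X, y, xi, alpha, lam_xi):
    """Illustrative penalised loss in xi with alpha held fixed (an assumption)."""
    K = scaled_gaussian_kernel(X, X, xi)
    resid = y - K @ alpha
    return 0.5 * resid @ resid + lam_xi * np.sum(xi)

# Toy data: the response depends on an interaction of x1 and x2; x3 is noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = np.sin(X[:, 0] * X[:, 1]) + 0.1 * rng.normal(size=40)

xi = np.array([1.0, 1.0, 0.0])                      # third variable switched off
alpha = lskm_fit(X, y, xi, lam=0.1)
obj = garrote_objective(X, y, xi, alpha, lam_xi=0.5)

# With xi[2] = 0, changing the third column leaves the kernel unchanged.
X_perturbed = X.copy()
X_perturbed[:, 2] = rng.normal(size=40)
K_same = np.allclose(
    scaled_gaussian_kernel(X, X, xi),
    scaled_gaussian_kernel(X_perturbed, X_perturbed, xi),
)
```

In this sketch the kernel is not additive in the inputs, which is why an interaction between the first two variables can be captured without specifying it explicitly. A full procedure would alternate between updating the kernel coefficients and the scale parameters, which is omitted here.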

Suggested Citation

  • Zaili Fang & Inyoung Kim & Patrick Schaumont, 2016. "Flexible variable selection for recovering sparsity in nonadditive nonparametric models," Biometrics, The International Biometric Society, vol. 72(4), pages 1155-1163, December.
  • Handle: RePEc:bla:biomet:v:72:y:2016:i:4:p:1155-1163
    DOI: 10.1111/biom.12518

    Download full text from publisher

    File URL: https://doi.org/10.1111/biom.12518
    Download Restriction: no


    References listed on IDEAS

    1. Arnab Maity & Xihong Lin, 2011. "Powerful Tests for Detecting a Gene Effect in the Presence of Possible Gene–Gene Interactions Using Garrote Kernel Machines," Biometrics, The International Biometric Society, vol. 67(4), pages 1271-1284, December.
    2. Radchenko, Peter & James, Gareth M., 2010. "Variable Selection Using Adaptive Nonlinear Interaction Structures in High Dimensions," Journal of the American Statistical Association, American Statistical Association, vol. 105(492), pages 1541-1553.
    3. Dawei Liu & Xihong Lin & Debashis Ghosh, 2007. "Semiparametric Regression of Multidimensional Genetic Pathway Data: Least-Squares Kernel Machines and Linear Mixed Models," Biometrics, The International Biometric Society, vol. 63(4), pages 1079-1088, December.
    4. Pradeep Ravikumar & John Lafferty & Han Liu & Larry Wasserman, 2009. "Sparse additive models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(5), pages 1009-1030, November.

    Citations

    Cited by:

    1. Lulu Cheng & Inyoung Kim & Herbert Pang, 2016. "Bayesian Semiparametric Model for Pathway-Based Analysis with Zero-Inflated Clinical Outcomes," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 21(4), pages 641-662, December.
    2. Zaili Fang & Inyoung Kim & Jeesun Jung, 2018. "Semiparametric Kernel-Based Regression for Evaluating Interaction Between Pathway Effect and Covariate," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 23(1), pages 129-152, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xia Zheng & Yaohua Rong & Ling Liu & Weihu Cheng, 2021. "A More Accurate Estimation of Semiparametric Logistic Regression," Mathematics, MDPI, vol. 9(19), pages 1-12, September.
    2. Zaili Fang & Inyoung Kim & Jeesun Jung, 2018. "Semiparametric Kernel-Based Regression for Evaluating Interaction Between Pathway Effect and Covariate," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 23(1), pages 129-152, March.
    3. Bhatnagar, Sahir R. & Lu, Tianyuan & Lovato, Amanda & Olds, David L. & Kobor, Michael S. & Meaney, Michael J. & O'Donnell, Kieran & Yang, Archer Y. & Greenwood, Celia M.T., 2023. "A sparse additive model for high-dimensional interactions with an exposure variable," Computational Statistics & Data Analysis, Elsevier, vol. 179(C).
    4. Fabian Scheipl & Thomas Kneib & Ludwig Fahrmeir, 2013. "Penalized likelihood and Bayesian function selection in regression models," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 97(4), pages 349-385, October.
    5. Diego Vidaurre & Concha Bielza & Pedro Larrañaga, 2013. "A Survey of L1 Regression," International Statistical Review, International Statistical Institute, vol. 81(3), pages 361-387, December.
    6. Giordano, Francesco & Parrella, Maria Lucia, 2016. "Bias-corrected inference for multivariate nonparametric regression: Model selection and oracle property," Journal of Multivariate Analysis, Elsevier, vol. 143(C), pages 71-93.
    7. Yi Liu & Veronika Ročková & Yuexi Wang, 2021. "Variable selection with ABC Bayesian forests," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(3), pages 453-481, July.
    8. Lulu Cheng & Inyoung Kim & Herbert Pang, 2016. "Bayesian Semiparametric Model for Pathway-Based Analysis with Zero-Inflated Clinical Outcomes," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 21(4), pages 641-662, December.
    9. Du, Pang & Cheng, Guang & Liang, Hua, 2012. "Semiparametric regression models with additive nonparametric components and high dimensional parametric components," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 2006-2017.
    10. Radchenko, Peter, 2015. "High dimensional single index models," Journal of Multivariate Analysis, Elsevier, vol. 139(C), pages 266-282.
    11. Hang Yu & Yuanjia Wang & Donglin Zeng, 2023. "A general framework of nonparametric feature selection in high‐dimensional data," Biometrics, The International Biometric Society, vol. 79(2), pages 951-963, June.
    12. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    13. Ding, Hui & Zhang, Jian & Zhang, Riquan, 2022. "Nonparametric variable screening for multivariate additive models," Journal of Multivariate Analysis, Elsevier, vol. 192(C).
    14. Arnab Maity & Xihong Lin, 2011. "Powerful Tests for Detecting a Gene Effect in the Presence of Possible Gene–Gene Interactions Using Garrote Kernel Machines," Biometrics, The International Biometric Society, vol. 67(4), pages 1271-1284, December.
    15. Luu, Tung Duy & Fadili, Jalal & Chesneau, Christophe, 2019. "PAC-Bayesian risk bounds for group-analysis sparse regression by exponential weighting," Journal of Multivariate Analysis, Elsevier, vol. 171(C), pages 209-233.
    16. Long Qu & Tobias Guennel & Scott L. Marshall, 2013. "Linear Score Tests for Variance Components in Linear Mixed Models and Applications to Genetic Association Studies," Biometrics, The International Biometric Society, vol. 69(4), pages 883-892, December.
    17. Fan, Jianqing & Feng, Yang & Xia, Lucy, 2020. "A projection-based conditional dependence measure with applications to high-dimensional undirected graphical models," Journal of Econometrics, Elsevier, vol. 218(1), pages 119-139.
    18. Shihao Gu & Bryan Kelly & Dacheng Xiu, 2020. "Empirical Asset Pricing via Machine Learning," The Review of Financial Studies, Society for Financial Studies, vol. 33(5), pages 2223-2273.
    19. Teran Hidalgo, Sebastian J. & Wu, Michael C. & Engel, Stephanie M. & Kosorok, Michael R., 2018. "Goodness-of-fit test for nonparametric regression models: Smoothing spline ANOVA models as example," Computational Statistics & Data Analysis, Elsevier, vol. 122(C), pages 135-155.
    20. Lin Zhang & Inyoung Kim, 2021. "Finite mixtures of semiparametric Bayesian survival kernel machine regressions: Application to breast cancer gene pathway subgroup analysis," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(2), pages 251-269, March.
