IDEAS home Printed from https://ideas.repec.org/a/eee/csdana/v90y2015icp15-35.html

Grouped variable importance with random forests and application to multiple functional data analysis

Author

Listed:
  • Gregorutti, Baptiste
  • Michel, Bertrand
  • Saint-Pierre, Philippe

Abstract

The selection of grouped variables using the random forest algorithm is considered. First, a new importance measure adapted for groups of variables is proposed. Theoretical insights into this criterion are given for additive regression models. Second, an original method for selecting functional variables based on the grouped variable importance measure is developed. Using a wavelet basis, it is proposed to regroup all of the wavelet coefficients for a given functional variable and use a wrapper selection algorithm with these groups. Various other groupings which take advantage of the frequency and time localization of the wavelet basis are proposed. An extensive simulation study is performed to illustrate the use of the grouped importance measure in this context. The method is applied to a real-life problem coming from aviation safety.
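
The grouped importance idea in the abstract can be illustrated with a minimal sketch. This is not the authors' estimator: the paper's definition may differ in details such as how the random forest's out-of-bag samples are used. The sketch only assumes a generic permutation scheme, in which all columns of a group are shuffled jointly on a held-out sample and the resulting increase in mean squared error is recorded. The names `grouped_permutation_importance`, `mse`, and `predict` are hypothetical, and `predict` is a hand-written stand-in for a fitted model such as a random forest.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def grouped_permutation_importance(predict, X, y, group, n_repeats=10, seed=0):
    """Average increase in MSE when the columns in `group` are permuted jointly.

    predict: callable mapping one row (list of floats) to a prediction
             (hypothetical stand-in for a fitted model).
    X, y:    evaluation sample (list of rows, list of targets).
    group:   list of column indices forming one group of variables.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    increases = []
    for _ in range(n_repeats):
        # One permutation of the row indices, shared by every column in the
        # group, so that dependence inside the group is preserved.
        perm = list(range(len(X)))
        rng.shuffle(perm)
        X_perm = []
        for i, row in enumerate(X):
            new_row = list(row)
            for j in group:
                new_row[j] = X[perm[i]][j]
            X_perm.append(new_row)
        permuted_error = mse(y, [predict(row) for row in X_perm])
        increases.append(permuted_error - baseline)
    return sum(increases) / n_repeats


# Synthetic data (an assumption for illustration): columns 0 and 1 drive the
# response, columns 2 and 3 are pure noise.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [row[0] + row[1] for row in X]


def predict(row):
    # Stand-in for a fitted model that has learned y = x0 + x1.
    return row[0] + row[1]


imp_signal = grouped_permutation_importance(predict, X, y, group=[0, 1])
imp_noise = grouped_permutation_importance(predict, X, y, group=[2, 3])
print("informative group:", imp_signal)
print("noise group:", imp_noise)
```

In the functional setting described in the abstract, a group would hold, for example, all wavelet coefficients of one functional variable, and a wrapper selection procedure would rank and discard groups according to such a score.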

Suggested Citation

  • Gregorutti, Baptiste & Michel, Bertrand & Saint-Pierre, Philippe, 2015. "Grouped variable importance with random forests and application to multiple functional data analysis," Computational Statistics & Data Analysis, Elsevier, vol. 90(C), pages 15-35.
  • Handle: RePEc:eee:csdana:v:90:y:2015:i:c:p:15-35
    DOI: 10.1016/j.csda.2015.04.002

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947315000997
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. Lukas Meier & Sara Van De Geer & Peter Bühlmann, 2008. "The group lasso for logistic regression," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(1), pages 53-71, February.
    2. Amato, U. & Antoniadis, A. & De Feis, I., 2006. "Dimension reduction in functional regression with applications," Computational Statistics & Data Analysis, Elsevier, vol. 50(9), pages 2422-2446, May.
    3. Cardot, Hervé & Ferraty, Frédéric & Sarda, Pascal, 1999. "Functional linear model," Statistics & Probability Letters, Elsevier, vol. 45(1), pages 11-22, October.
    4. Antoniadis, Anestis & Bigot, Jeremie & Sapatinas, Theofanis, 2001. "Wavelet Estimators in Nonparametric Regression: A Comparative Simulation Study," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 6(i06).
    5. Matsui, Hidetoshi, 2014. "Variable and boundary selection for functional data via multiclass logistic regression modeling," Computational Statistics & Data Analysis, Elsevier, vol. 78(C), pages 176-185.
    6. Matsui, Hidetoshi & Konishi, Sadanori, 2011. "Variable selection for functional regression models via the L1 regularization," Computational Statistics & Data Analysis, Elsevier, vol. 55(12), pages 3304-3310, December.
    7. Tomasz Górecki & Łukasz Smaga, 2015. "A comparison of tests for the one-way ANOVA problem for functional data," Computational Statistics, Springer, vol. 30(4), pages 987-1010, December.
    8. Ming Yuan & Yi Lin, 2006. "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 68(1), pages 49-67, February.

    Citations


    Cited by:

    1. Pedro Delicado, 2019. "Comments on: Data science, big data and statistics," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(2), pages 334-337, June.
    2. Simon Valentin & Maximilian Harkotte & Tzvetan Popov, 2020. "Interpreting neural decoding models using grouped model reliance," PLOS Computational Biology, Public Library of Science, vol. 16(1), pages 1-17, January.
    3. Christophe Denis & Charlotte Dion & Miguel Martinez, 2020. "Consistent procedures for multiclass classification of discrete diffusion paths," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 47(2), pages 516-554, June.
    4. Patrick J. Comer & Jon C. Hak & Marion S. Reid & Stephanie L. Auer & Keith A. Schulz & Healy H. Hamilton & Regan L. Smyth & Matthew M. Kling, 2019. "Habitat Climate Change Vulnerability Index Applied to Major Vegetation Types of the Western Interior United States," Land, MDPI, vol. 8(7), pages 1-27, July.
    5. Neska Haouij & Jean-Michel Poggi & Raja Ghozi & Sylvie Sevestre-Ghalila & Mériem Jaïdane, 2019. "Random forest-based approach for physiological functional variable selection for driver’s stress level classification," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 28(1), pages 157-185, March.
    6. Fabrizio Maturo & Rosanna Verde, 2023. "Supervised classification of curves via a combined use of functional data analysis and tree-based methods," Computational Statistics, Springer, vol. 38(1), pages 419-459, March.
    7. A. Poterie & J.-F. Dupuy & V. Monbet & L. Rouvière, 2019. "Classification tree algorithm for grouped variables," Computational Statistics, Springer, vol. 34(4), pages 1613-1648, December.
    8. T. Górecki & Ł. Smaga, 2017. "Multivariate analysis of variance for functional data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(12), pages 2172-2189, September.
    9. Epifanio, Irene, 2016. "Functional archetype and archetypoid analysis," Computational Statistics & Data Analysis, Elsevier, vol. 104(C), pages 24-34.
    10. Antoniadis, Anestis & Lambert-Lacroix, Sophie & Poggi, Jean-Michel, 2021. "Random forests for global sensitivity analysis: A selective review," Reliability Engineering and System Safety, Elsevier, vol. 206(C).
    11. Pedro Delicado & Daniel Peña, 2023. "Understanding complex predictive models with ghost variables," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 32(1), pages 107-145, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Łukasz Smaga & Hidetoshi Matsui, 2018. "A note on variable selection in functional regression via random subspace method," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 27(3), pages 455-477, August.
    2. Aneiros, Germán & Novo, Silvia & Vieu, Philippe, 2022. "Variable selection in functional regression models: A review," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    3. Matsui, Hidetoshi, 2014. "Variable and boundary selection for functional data via multiclass logistic regression modeling," Computational Statistics & Data Analysis, Elsevier, vol. 78(C), pages 176-185.
    4. Andreas Groll & Trevor Hastie & Gerhard Tutz, 2017. "Selection of effects in Cox frailty models by regularization methods," Biometrics, The International Biometric Society, vol. 73(3), pages 846-856, September.
    5. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    6. Ye, Ya-Fen & Shao, Yuan-Hai & Deng, Nai-Yang & Li, Chun-Na & Hua, Xiang-Yu, 2017. "Robust Lp-norm least squares support vector regression with feature selection," Applied Mathematics and Computation, Elsevier, vol. 305(C), pages 32-52.
    7. Mirosław Krzyśko & Łukasz Smaga, 2017. "An Application Of Functional Multivariate Regression Model To Multiclass Classification," Statistics in Transition New Series, Polish Statistical Association, vol. 18(3), pages 433-442, September.
    8. Caner, Mehmet, 2023. "Generalized linear models with structured sparsity estimators," Journal of Econometrics, Elsevier, vol. 236(2).
    9. Bilin Zeng & Xuerong Meggie Wen & Lixing Zhu, 2017. "A link-free sparse group variable selection method for single-index model," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(13), pages 2388-2400, October.
    10. Olga Klopp & Marianna Pensky, 2013. "Sparse High-dimensional Varying Coefficient Model : Non-asymptotic Minimax Study," Working Papers 2013-30, Center for Research in Economics and Statistics.
    11. Jiang, Cuixia & Xiong, Wei & Xu, Qifa & Liu, Yezheng, 2021. "Predicting default of listed companies in mainland China via U-MIDAS Logit model with group lasso penalty," Finance Research Letters, Elsevier, vol. 38(C).
    12. Osamu Komori & Shinto Eguchi & John B. Copas, 2015. "Generalized t-statistic for two-group classification," Biometrics, The International Biometric Society, vol. 71(2), pages 404-416, June.
    13. Wei, Fengrong & Zhu, Hongxiao, 2012. "Group coordinate descent algorithms for nonconvex penalized regression," Computational Statistics & Data Analysis, Elsevier, vol. 56(2), pages 316-326.
    14. Luu, Tung Duy & Fadili, Jalal & Chesneau, Christophe, 2019. "PAC-Bayesian risk bounds for group-analysis sparse regression by exponential weighting," Journal of Multivariate Analysis, Elsevier, vol. 171(C), pages 209-233.
    15. Zeng, Yaohui & Yang, Tianbao & Breheny, Patrick, 2021. "Hybrid safe–strong rules for efficient optimization in lasso-type problems," Computational Statistics & Data Analysis, Elsevier, vol. 153(C).
    16. Lee, In Gyu & Yoon, Sang Won & Won, Daehan, 2022. "A Mixed Integer Linear Programming Support Vector Machine for Cost-Effective Group Feature Selection: Branch-Cut-and-Price Approach," European Journal of Operational Research, Elsevier, vol. 299(3), pages 1055-1068.
    17. Ruidi Chen & Ioannis Ch. Paschalidis, 2022. "Robust Grouped Variable Selection Using Distributionally Robust Optimization," Journal of Optimization Theory and Applications, Springer, vol. 194(3), pages 1042-1071, September.
    18. Zanhua Yin, 2020. "Variable selection for sparse logistic regression," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 83(7), pages 821-836, October.
    19. A. Karagrigoriou & C. Koukouvinos & K. Mylona, 2010. "On the advantages of the non-concave penalized likelihood model selection method with minimum prediction errors in large-scale medical studies," Journal of Applied Statistics, Taylor & Francis Journals, vol. 37(1), pages 13-24.
    20. Liu, Jianyu & Yu, Guan & Liu, Yufeng, 2019. "Graph-based sparse linear discriminant analysis for high-dimensional classification," Journal of Multivariate Analysis, Elsevier, vol. 171(C), pages 250-269.
