Ensemble classification based on generalized additive models
Generalized additive models (GAMs) extend generalized linear models (GLMs) and have repeatedly proven able to capture nonlinear relationships between explanatory variables and a response variable across many domains. In this paper, GAMs are proposed as base classifiers for ensemble learning. Three ensemble strategies for binary classification with GAM base classifiers are proposed: (i) GAMbag, based on Bagging; (ii) GAMrsm, based on the Random Subspace Method (RSM); and (iii) GAMens, a combination of both. In an experimental validation on 12 data sets from the UCI repository, the proposed algorithms are benchmarked against a single GAM and against decision-tree-based ensemble classifiers (i.e. RSM, Bagging, Random Forest, and the recently proposed Rotation Forest). A number of conclusions can be drawn from the results. Firstly, using an ensemble of GAMs instead of a single GAM always improves prediction performance. Secondly, GAMrsm and GAMens perform comparably, and both outperform GAMbag. Finally, the value of using GAMs rather than standard decision trees as base classifiers is demonstrated: GAMbag performs comparably to ordinary Bagging, while GAMrsm and GAMens outperform RSM and Bagging and perform comparably to Random Forest and Rotation Forest. Sensitivity analyses are included for the number of member classifiers in the ensemble, the number of variables included in a random feature subspace, and the number of degrees of freedom for GAM spline estimation.
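The GAMens strategy described above, combining bootstrap resampling (Bagging) with random feature subspaces (RSM) and aggregating member predictions by majority vote, can be sketched as follows. This is a minimal illustration, not the authors' implementation: `CentroidStub` is a hypothetical stand-in for the actual GAM base learner, and the class and parameter names (`GAMensSketch`, `n_members`, `subspace_size`) are assumptions for the example.

```python
import random
from collections import Counter

class CentroidStub:
    """Toy nearest-centroid learner used here as a placeholder for a GAM
    base classifier (illustrative only; a real GAMens member would fit a
    GAM with spline terms on the selected features)."""
    def fit(self, X, y):
        pos = [x for x, t in zip(X, y) if t == 1]
        neg = [x for x, t in zip(X, y) if t == 0]
        self.pos_c = [sum(col) / len(pos) for col in zip(*pos)]
        self.neg_c = [sum(col) / len(neg) for col in zip(*neg)]
        return self

    def predict(self, X):
        def sqdist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return [1 if sqdist(x, self.pos_c) < sqdist(x, self.neg_c) else 0
                for x in X]

class GAMensSketch:
    """Combines a bootstrap sample of the training set (Bagging) with a
    random feature subspace (RSM) for each ensemble member."""
    def __init__(self, n_members=15, subspace_size=2, seed=0):
        self.n_members = n_members
        self.subspace_size = subspace_size
        self.rng = random.Random(seed)
        self.members = []  # list of (feature indices, fitted base learner)

    def fit(self, X, y):
        n, p = len(X), len(X[0])
        for _ in range(self.n_members):
            idx = [self.rng.randrange(n) for _ in range(n)]        # bootstrap rows
            feats = self.rng.sample(range(p), self.subspace_size)  # random subspace
            Xb = [[X[i][j] for j in feats] for i in idx]
            yb = [y[i] for i in idx]
            self.members.append((feats, CentroidStub().fit(Xb, yb)))
        return self

    def predict(self, X):
        # Each member votes on its own feature subspace; majority vote wins.
        votes = [m.predict([[row[j] for j in feats] for row in X])
                 for feats, m in self.members]
        return [Counter(col).most_common(1)[0][0] for col in zip(*votes)]
```

An odd `n_members` avoids ties in the binary majority vote; dropping the bootstrap step recovers a GAMrsm-style ensemble, and sampling all features recovers a GAMbag-style one.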
Date of creation: Feb 2010
Provider web page: http://research.hubrussel.be