
Taxonomy for characterizing ensemble methods in classification tasks: A review and annotated bibliography

Author

Listed:
  • Rokach, Lior

Abstract

Ensemble methodology, which builds a classification model by integrating multiple classifiers, can be used for improving prediction performance. Researchers from various disciplines such as statistics, pattern recognition, and machine learning have seriously explored the use of ensemble methodology. This paper presents an updated survey of ensemble methods in classification tasks, while introducing a new taxonomy for characterizing them. The new taxonomy, presented from the algorithm designer's point of view, is based on five dimensions: inducer, combiner, diversity, size, and members' dependency. We also propose several selection criteria, presented from the practitioner's point of view, for choosing the most suitable ensemble method.
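The "combiner" dimension of the taxonomy describes how the outputs of the base classifiers are fused into a single prediction. As a minimal illustration of the simplest such combiner, plurality (majority) voting, the sketch below combines hard label predictions; the function name and data are illustrative assumptions, not taken from the paper.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label predictions by plurality vote.

    predictions: list of lists, one inner list of predicted labels
    per base classifier, all over the same instances.
    """
    combined = []
    for labels in zip(*predictions):  # labels of all classifiers for one instance
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined

# Three hypothetical base classifiers' predictions on four instances
clf_a = [0, 1, 1, 0]
clf_b = [0, 1, 0, 0]
clf_c = [1, 1, 1, 0]

print(majority_vote([clf_a, clf_b, clf_c]))  # -> [0, 1, 1, 0]
```

Weighted voting, stacking, and other combiners surveyed in the paper generalize this scheme by replacing the plurality rule with learned or performance-weighted fusion.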

Suggested Citation

  • Rokach, Lior, 2009. "Taxonomy for characterizing ensemble methods in classification tasks: A review and annotated bibliography," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4046-4072, October.
  • Handle: RePEc:eee:csdana:v:53:y:2009:i:12:p:4046-4072

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(09)00263-1
    Download Restriction: Full text for ScienceDirect subscribers only.

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Merler, Stefano & Caprile, Bruno & Furlanello, Cesare, 2007. "Parallelizing AdaBoost by weights dynamics," Computational Statistics & Data Analysis, Elsevier, vol. 51(5), pages 2487-2498, February.
    2. Adem, Jan & Gochet, Willy, 2004. "Aggregating classifiers with mathematical programming," Computational Statistics & Data Analysis, Elsevier, vol. 47(4), pages 791-807, November.
    3. Croux, Christophe & Joossens, Kristel & Lemmens, Aurelie, 2007. "Trimmed bagging," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 362-368, September.
    4. Hothorn, Torsten & Lausen, Berthold, 2005. "Bundling classifiers by bagging trees," Computational Statistics & Data Analysis, Elsevier, vol. 49(4), pages 1068-1078, June.
    5. Archer, Kellie J. & Kimes, Ryan V., 2008. "Empirical characterization of random forest variable importance measures," Computational Statistics & Data Analysis, Elsevier, vol. 52(4), pages 2249-2260, January.
    6. Drucker, Harris, 2002. "Effect of pruning and early stopping on performance of a boosting ensemble," Computational Statistics & Data Analysis, Elsevier, vol. 38(4), pages 393-406, February.
    7. Buttrey, Samuel E. & Karo, Ciril, 2002. "Using k-nearest-neighbor classification in the leaves of a tree," Computational Statistics & Data Analysis, Elsevier, vol. 40(1), pages 27-37, July.
    8. Sexton, Joseph & Laake, Petter, 2008. "LogitBoost with errors-in-variables," Computational Statistics & Data Analysis, Elsevier, vol. 52(5), pages 2549-2559, January.
    9. Kim, Yuwon & Koo, Ja-Yong, 2005. "Inverse boosting for monotone regression functions," Computational Statistics & Data Analysis, Elsevier, vol. 49(3), pages 757-770, June.
    10. Tsao, C. Andy & Chang, Yuan-chin Ivan, 2007. "A stochastic approximation view of boosting," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 325-334, September.
    11. Christmann, Andreas & Steinwart, Ingo & Hubert, Mia, 2007. "Robust learning from bites for data mining," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 347-361, September.
    12. Denison, D. G. T. & Adams, N. M. & Holmes, C. C. & Hand, D. J., 2002. "Bayesian partition modelling," Computational Statistics & Data Analysis, Elsevier, vol. 38(4), pages 475-485, February.
    13. Gey, Servane & Poggi, Jean-Michel, 2006. "Boosting and instability for regression trees," Computational Statistics & Data Analysis, Elsevier, vol. 50(2), pages 533-550, January.
    14. Ahn, Hongshik & Moon, Hojin & Fazzari, Melissa J. & Lim, Noha & Chen, James J. & Kodell, Ralph L., 2007. "Classification by ensembles from random partitions of high-dimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 51(12), pages 6166-6179, August.
    15. Moskovitch, Robert & Elovici, Yuval & Rokach, Lior, 2008. "Detection of unknown computer worms based on behavioral classification of the host," Computational Statistics & Data Analysis, Elsevier, vol. 52(9), pages 4544-4566, May.
    16. Menahem, Eitan & Shabtai, Asaf & Rokach, Lior & Elovici, Yuval, 2009. "Improving malware detection by applying multi-inducer ensemble," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 1483-1494, February.
    17. Zhang, Chun-Xia & Zhang, Jiang-She, 2008. "A local boosting algorithm for solving classification problems," Computational Statistics & Data Analysis, Elsevier, vol. 52(4), pages 1928-1941, January.
    18. Friedman, Jerome H., 2002. "Stochastic gradient boosting," Computational Statistics & Data Analysis, Elsevier, vol. 38(4), pages 367-378, February.
    19. Rokach, Lior, 2009. "Collective-agreement-based pruning of ensembles," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 1015-1026, February.
    20. Ridgeway, Greg, 2002. "Looking for lumps: boosting and bagging for density estimation," Computational Statistics & Data Analysis, Elsevier, vol. 38(4), pages 379-392, February.
    21. Yuval Elovici & Bracha Shapira & Paul B. Kantor, 2006. "A decision theoretic approach to combining information filters: An analytical and empirical evaluation," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 57(3), pages 306-320, February.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. John Martin & Sona Taheri & Mali Abdollahian, 2024. "Optimizing Ensemble Learning to Reduce Misclassification Costs in Credit Risk Scorecards," Mathematics, MDPI, vol. 12(6), pages 1, March.
    2. Kesriklioğlu, Esma & Oktay, Erkan & Karaaslan, Abdulkerim, 2023. "Predicting total household energy expenditures using ensemble learning methods," Energy, Elsevier, vol. 276(C).
    3. Barrow, Devon K. & Crone, Sven F., 2016. "A comparison of AdaBoost algorithms for time series forecast combination," International Journal of Forecasting, Elsevier, vol. 32(4), pages 1103-1119.
    4. Marie-Hélène Roy & Denis Larocque, 2012. "Robustness of random forests for regression," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 24(4), pages 993-1006, December.
    5. Hoora Moradian & Denis Larocque & François Bellavance, 2017. "L1 splitting rules in survival forests," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 23(4), pages 671-691, October.
    6. Chun-Xia Zhang & Guan-Wei Wang & Jiang-She Zhang, 2012. "An empirical bias-variance analysis of DECORATE ensemble method at different training sample sizes," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(4), pages 829-850, September.
    7. Xudong Hu & Hongbo Mei & Han Zhang & Yuanyuan Li & Mengdi Li, 2021. "Performance evaluation of ensemble learning techniques for landslide susceptibility mapping at the Jinping county, Southwest China," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 105(2), pages 1663-1689, January.
    8. Mojirsheibani, Majid & Kong, Jiajie, 2016. "An asymptotically optimal kernel combined classifier," Statistics & Probability Letters, Elsevier, vol. 119(C), pages 91-100.
    9. Tsai, Chih-Fong & Sue, Kuen-Liang & Hu, Ya-Han & Chiu, Andy, 2021. "Combining feature selection, instance selection, and ensemble classification techniques for improved financial distress prediction," Journal of Business Research, Elsevier, vol. 130(C), pages 200-209.
    10. Sergio Davalos & Fei Leng & Ehsan H. Feroz & Zhiyan Cao, 2014. "Designing An If–Then Rules‐Based Ensemble Of Heterogeneous Bankruptcy Classifiers: A Genetic Algorithm Approach," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 21(3), pages 129-153, July.
    11. Adler, Werner & Brenning, Alexander & Potapov, Sergej & Schmid, Matthias & Lausen, Berthold, 2011. "Ensemble classification of paired data," Computational Statistics & Data Analysis, Elsevier, vol. 55(5), pages 1933-1941, May.
    12. Chun-Xia Zhang & Guan-Wei Wang & Jun-Min Liu, 2015. "RandGA: injecting randomness into parallel genetic algorithm for variable selection," Journal of Applied Statistics, Taylor & Francis Journals, vol. 42(3), pages 630-647, March.
    13. Chun-Xia Zhang & Jiang-She Zhang & Sang-Woon Kim, 2016. "PBoostGA: pseudo-boosting genetic algorithm for variable ranking and selection," Computational Statistics, Springer, vol. 31(4), pages 1237-1262, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhang, Chun-Xia & Zhang, Jiang-She & Zhang, Gai-Ying, 2009. "Using Boosting to prune Double-Bagging ensembles," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 1218-1231, February.
    2. Tsao, C. Andy & Chang, Yuan-chin Ivan, 2007. "A stochastic approximation view of boosting," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 325-334, September.
    3. De Bock, Koen W. & Coussement, Kristof & Van den Poel, Dirk, 2010. "Ensemble classification based on generalized additive models," Computational Statistics & Data Analysis, Elsevier, vol. 54(6), pages 1535-1546, June.
    4. Adler, Werner & Lausen, Berthold, 2009. "Bootstrap estimated true and false positive rates and ROC curve," Computational Statistics & Data Analysis, Elsevier, vol. 53(3), pages 718-729, January.
    5. Ollech, Daniel & Webel, Karsten, 2020. "A random forest-based approach to identifying the most informative seasonality tests," Discussion Papers 55/2020, Deutsche Bundesbank.
    6. Rokach, Lior, 2009. "Collective-agreement-based pruning of ensembles," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 1015-1026, February.
    7. Stefan Lessmann & Stefan Voß, 2010. "Customer-Centric Decision Support," Business & Information Systems Engineering: The International Journal of WIRTSCHAFTSINFORMATIK, Springer;Gesellschaft für Informatik e.V. (GI), vol. 2(2), pages 79-93, April.
    8. Chung, Dongjun & Kim, Hyunjoong, 2015. "Accurate ensemble pruning with PL-bagging," Computational Statistics & Data Analysis, Elsevier, vol. 83(C), pages 1-13.
    9. Wei-Yin Loh, 2014. "Fifty Years of Classification and Regression Trees," International Statistical Review, International Statistical Institute, vol. 82(3), pages 329-348, December.
    10. Croux, Christophe & Joossens, Kristel & Lemmens, Aurelie, 2007. "Trimmed bagging," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 362-368, September.
    11. Chun-Xia Zhang & Guan-Wei Wang & Jiang-She Zhang, 2012. "An empirical bias-variance analysis of DECORATE ensemble method at different training sample sizes," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(4), pages 829-850, September.
    12. Martinez, Waldyn & Gray, J. Brian, 2016. "Noise peeling methods to improve boosting algorithms," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 483-497.
    13. Mansoor, Umer & Jamal, Arshad & Su, Junbiao & Sze, N.N. & Chen, Anthony, 2023. "Investigating the risk factors of motorcycle crash injury severity in Pakistan: Insights and policy recommendations," Transport Policy, Elsevier, vol. 139(C), pages 21-38.
    14. Binh Thai Pham & Chongchong Qi & Lanh Si Ho & Trung Nguyen-Thoi & Nadhir Al-Ansari & Manh Duc Nguyen & Huu Duy Nguyen & Hai-Bang Ly & Hiep Van Le & Indra Prakash, 2020. "A Novel Hybrid Soft Computing Model Using Random Forest and Particle Swarm Optimization for Estimation of Undrained Shear Strength of Soil," Sustainability, MDPI, vol. 12(6), pages 1-16, March.
    15. Bissan Ghaddar & Ignacio Gómez-Casares & Julio González-Díaz & Brais González-Rodríguez & Beatriz Pateiro-López & Sofía Rodríguez-Ballesteros, 2023. "Learning for Spatial Branching: An Algorithm Selection Approach," INFORMS Journal on Computing, INFORMS, vol. 35(5), pages 1024-1043, September.
    16. Akash Malhotra, 2018. "A hybrid econometric-machine learning approach for relative importance analysis: Prioritizing food policy," Papers 1806.04517, arXiv.org, revised Aug 2020.
    17. Nahushananda Chakravarthy H G & Karthik M Seenappa & Sujay Raghavendra Naganna & Dayananda Pruthviraja, 2023. "Machine Learning Models for the Prediction of the Compressive Strength of Self-Compacting Concrete Incorporating Incinerated Bio-Medical Waste Ash," Sustainability, MDPI, vol. 15(18), pages 1-22, September.
    18. Tim Voigt & Martin Kohlhase & Oliver Nelles, 2021. "Incremental DoE and Modeling Methodology with Gaussian Process Regression: An Industrially Applicable Approach to Incorporate Expert Knowledge," Mathematics, MDPI, vol. 9(19), pages 1-26, October.
    19. Wen, Shaoting & Buyukada, Musa & Evrendilek, Fatih & Liu, Jingyong, 2020. "Uncertainty and sensitivity analyses of co-combustion/pyrolysis of textile dyeing sludge and incense sticks: Regression and machine-learning models," Renewable Energy, Elsevier, vol. 151(C), pages 463-474.
    20. Zhu, Haibin & Bai, Lu & He, Lidan & Liu, Zhi, 2023. "Forecasting realized volatility with machine learning: Panel data perspective," Journal of Empirical Finance, Elsevier, vol. 73(C), pages 251-271.

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:csdana:v:53:y:2009:i:12:p:4046-4072. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu (email available below). General contact details of provider: http://www.elsevier.com/locate/csda .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.