
An empirical bias-variance analysis of DECORATE ensemble method at different training sample sizes

Author

Listed:
  • Chun-Xia Zhang
  • Guan-Wei Wang
  • Jiang-She Zhang

Abstract

DECORATE (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples) is a classifier combination technique that constructs a set of diverse base classifiers with the help of additional, artificially generated training instances; the predictions of the base classifiers are then combined into a single prediction by the mean rule. To gain more insight into its effectiveness and advantages, this paper uses a large-scale experiment to carry out a bias-variance analysis of DECORATE, together with several other widely used ensemble methods (bagging, AdaBoost and random forest), at different training sample sizes. The experimental results support the following conclusions. For small training sets, DECORATE has a clear advantage over its rivals, and its success is attributable to the larger bias reduction it achieves compared with the other algorithms. As the amount of training data increases, AdaBoost benefits most: its bias reduction gradually becomes significant while its variance reduction remains moderate, so AdaBoost performs best on large training samples. Random forest is consistently second best, whether the training set is small or large, and it mainly decreases variance while maintaining low bias. Bagging occupies an intermediate position, since it primarily reduces variance.
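For readers unfamiliar with the algorithm, the following is a minimal sketch of the DECORATE loop in Python, assuming scikit-learn decision trees as base learners. The routine, its parameter names (n_estimators, r_art, max_iter) and the per-feature Gaussian generator for the artificial data are illustrative assumptions, not the authors' implementation.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def decorate(X, y, n_estimators=15, r_art=0.5, max_iter=50, seed=0):
        """Hedged sketch of DECORATE; assumes numeric features and that
        every trained member sees all classes (so probability columns align)."""
        rng = np.random.default_rng(seed)
        classes = np.unique(y)
        ensemble = [DecisionTreeClassifier(random_state=0).fit(X, y)]

        def proba(Xq):
            # Mean combination rule: average class probabilities over members.
            return np.mean([m.predict_proba(Xq) for m in ensemble], axis=0)

        err = np.mean(classes[proba(X).argmax(axis=1)] != y)
        n_art, trials = int(r_art * len(X)), 0
        while len(ensemble) < n_estimators and trials < max_iter:
            trials += 1
            # Artificial examples drawn from per-feature Gaussians fitted
            # to the training data (numeric attributes only in this sketch).
            X_art = rng.normal(X.mean(axis=0), X.std(axis=0) + 1e-9,
                               size=(n_art, X.shape[1]))
            # Oppositional relabeling: label each artificial point with
            # probability inversely proportional to the ensemble's vote.
            p = proba(X_art)
            inv = 1.0 / (p + 1e-9)
            inv /= inv.sum(axis=1, keepdims=True)
            y_art = np.array([rng.choice(classes, p=row) for row in inv])
            cand = DecisionTreeClassifier(random_state=0).fit(
                np.vstack([X, X_art]), np.concatenate([y, y_art]))
            ensemble.append(cand)
            new_err = np.mean(classes[proba(X).argmax(axis=1)] != y)
            if new_err > err:
                ensemble.pop()  # reject a member that raises training error
            else:
                err = new_err
        return ensemble, proba

Rejecting candidates that increase training error is what lets the oppositionally labeled artificial examples inject diversity without degrading accuracy.

The bias-variance figures in the paper come from refitting each method on training samples of a fixed size and decomposing the resulting test error. The sketch below estimates the two terms for 0-1 loss, taking the "main" prediction at each test point to be the modal class over repeated fits; this is one common estimator, and the exact decomposition used in the paper may differ. Labels are assumed integer-coded, and fit_predict is any user-supplied routine that trains a model and returns test-set predictions.

    def bias_variance(fit_predict, X_pool, y_pool, X_test, y_test,
                      n_train, n_repeats=50, seed=0):
        """Estimate 0-1 bias and variance at training-set size n_train."""
        rng = np.random.default_rng(seed)
        preds = []
        for _ in range(n_repeats):
            idx = rng.choice(len(X_pool), size=n_train, replace=False)
            preds.append(fit_predict(X_pool[idx], y_pool[idx], X_test))
        preds = np.asarray(preds)                  # (n_repeats, n_test)
        # Main prediction: the modal class over the repeated fits.
        main = np.array([np.bincount(col).argmax() for col in preds.T])
        bias = np.mean(main != y_test)     # main prediction disagrees with truth
        variance = np.mean(preds != main)  # individual fits deviate from the mode
        return bias, variance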

Suggested Citation

  • Chun-Xia Zhang & Guan-Wei Wang & Jiang-She Zhang, 2012. "An empirical bias-variance analysis of DECORATE ensemble method at different training sample sizes," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(4), pages 829-850, September.
  • Handle: RePEc:taf:japsta:v:39:y:2012:i:4:p:829-850
    DOI: 10.1080/02664763.2011.620949

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/02664763.2011.620949
    Download Restriction: Access to full text is restricted to subscribers.

    File URL: https://libkey.io/10.1080/02664763.2011.620949?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to a copy you can access through your library subscription.

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Zhang, Chun-Xia & Zhang, Jiang-She, 2008. "A local boosting algorithm for solving classification problems," Computational Statistics & Data Analysis, Elsevier, vol. 52(4), pages 1928-1941, January.
    2. Rokach, Lior, 2009. "Taxonomy for characterizing ensemble methods in classification tasks: A review and annotated bibliography," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4046-4072, October.
    3. Tsao, C. Andy & Chang, Yuan-chin Ivan, 2007. "A stochastic approximation view of boosting," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 325-334, September.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. Chun-Xia Zhang & Guan-Wei Wang & Jun-Min Liu, 2015. "RandGA: injecting randomness into parallel genetic algorithm for variable selection," Journal of Applied Statistics, Taylor & Francis Journals, vol. 42(3), pages 630-647, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Rokach, Lior, 2009. "Taxonomy for characterizing ensemble methods in classification tasks: A review and annotated bibliography," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4046-4072, October.
    2. Chun-Xia Zhang & Guan-Wei Wang & Jun-Min Liu, 2015. "RandGA: injecting randomness into parallel genetic algorithm for variable selection," Journal of Applied Statistics, Taylor & Francis Journals, vol. 42(3), pages 630-647, March.
    3. Zhang, Chun-Xia & Zhang, Jiang-She & Zhang, Gai-Ying, 2009. "Using Boosting to prune Double-Bagging ensembles," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 1218-1231, February.
    4. Chun-Xia Zhang & Jiang-She Zhang & Sang-Woon Kim, 2016. "PBoostGA: pseudo-boosting genetic algorithm for variable ranking and selection," Computational Statistics, Springer, vol. 31(4), pages 1237-1262, December.
    5. John Martin & Sona Taheri & Mali Abdollahian, 2024. "Optimizing Ensemble Learning to Reduce Misclassification Costs in Credit Risk Scorecards," Mathematics, MDPI, vol. 12(6), pages 1, March.
    6. Bernd Bischl & Julia Schiffner & Claus Weihs, 2013. "Benchmarking local classification methods," Computational Statistics, Springer, vol. 28(6), pages 2599-2619, December.
    7. Barrow, Devon K. & Crone, Sven F., 2016. "A comparison of AdaBoost algorithms for time series forecast combination," International Journal of Forecasting, Elsevier, vol. 32(4), pages 1103-1119.
    8. Hoora Moradian & Denis Larocque & François Bellavance, 2017. "L1 splitting rules in survival forests," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 23(4), pages 671-691, October.
    9. Ivan Chang, Yuan-Chin & Huang, Yufen & Huang, Yu-Pai, 2010. "Early stopping in L2Boosting," Computational Statistics & Data Analysis, Elsevier, vol. 54(10), pages 2203-2213, October.
    10. Sergio Davalos & Fei Leng & Ehsan H. Feroz & Zhiyan Cao, 2014. "Designing An If–Then Rules‐Based Ensemble Of Heterogeneous Bankruptcy Classifiers: A Genetic Algorithm Approach," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 21(3), pages 129-153, July.
    11. Jasdeep S. Banga & B. Wade Brorsen, 2019. "Profitability of alternative methods of combining the signals from technical trading systems," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 26(1), pages 32-45, January.
    12. Kesriklioğlu, Esma & Oktay, Erkan & Karaaslan, Abdulkerim, 2023. "Predicting total household energy expenditures using ensemble learning methods," Energy, Elsevier, vol. 276(C).
    13. Adler, Werner & Brenning, Alexander & Potapov, Sergej & Schmid, Matthias & Lausen, Berthold, 2011. "Ensemble classification of paired data," Computational Statistics & Data Analysis, Elsevier, vol. 55(5), pages 1933-1941, May.
    14. Mojirsheibani, Majid & Kong, Jiajie, 2016. "An asymptotically optimal kernel combined classifier," Statistics & Probability Letters, Elsevier, vol. 119(C), pages 91-100.
    15. Zhang, Mingzhu & He, Changzheng & Gu, Xin & Liatsis, Panos & Zhu, Bing, 2013. "D-GMDH: A novel inductive modelling approach in the forecasting of the industrial economy," Economic Modelling, Elsevier, vol. 30(C), pages 514-520.
    16. Marie-Hélène Roy & Denis Larocque, 2012. "Robustness of random forests for regression," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 24(4), pages 993-1006, December.
    17. Xudong Hu & Hongbo Mei & Han Zhang & Yuanyuan Li & Mengdi Li, 2021. "Performance evaluation of ensemble learning techniques for landslide susceptibility mapping at the Jinping county, Southwest China," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer; International Society for the Prevention and Mitigation of Natural Hazards, vol. 105(2), pages 1663-1689, January.
    18. Martinez, Waldyn & Gray, J. Brian, 2016. "Noise peeling methods to improve boosting algorithms," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 483-497.
    19. Tsai, Chih-Fong & Sue, Kuen-Liang & Hu, Ya-Han & Chiu, Andy, 2021. "Combining feature selection, instance selection, and ensemble classification techniques for improved financial distress prediction," Journal of Business Research, Elsevier, vol. 130(C), pages 200-209.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:taf:japsta:v:39:y:2012:i:4:p:829-850. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do so here. This allows you to link your profile to this item and to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Longhurst (email available below). General contact details of provider: http://www.tandfonline.com/CJAS20.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.