Printed from https://ideas.repec.org/a/eee/csdana/v53y2009i4p1208-1217.html

A similarity measure to assess the stability of classification trees

Author

Listed:
  • Briand, Bénédicte
  • Ducharme, Gilles R.
  • Parache, Vanessa
  • Mercat-Rommens, Catherine

Abstract

Classification trees (CART) are known to be unstable: a small perturbation in the input variables, or a fresh sample, can lead to a very different classification tree. Several approaches try to correct this instability, but at present their benefits can be assessed only qualitatively. A similarity measure that quantifies the closeness of two classification trees is introduced. Its usefulness is illustrated with synthetic data on the impact of radioactive deposition through the environment. In this context, a modified node-level stabilizing technique, referred to as the NLS-REP method, is introduced and shown to be more stable than the classical CART method.

Suggested Citation

  • Briand, Bénédicte & Ducharme, Gilles R. & Parache, Vanessa & Mercat-Rommens, Catherine, 2009. "A similarity measure to assess the stability of classification trees," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 1208-1217, February.
  • Handle: RePEc:eee:csdana:v:53:y:2009:i:4:p:1208-1217

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(08)00497-0
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. Archer, Kellie J. & Kimes, Ryan V., 2008. "Empirical characterization of random forest variable importance measures," Computational Statistics & Data Analysis, Elsevier, vol. 52(4), pages 2249-2260, January.

    Citations



    Cited by:

    1. Aniek Sies & Iven Mechelen, 2020. "C443: a Methodology to See a Forest for the Trees," Journal of Classification, Springer;The Classification Society, vol. 37(3), pages 730-753, October.
    2. Karolis Matikonis & Matthew Gobey, 2024. "Small Business Property Tax Reductions and Firm Productivity," Small Business Economics, Springer, vol. 62(1), pages 307-324, January.
    3. Piccarreta, Raffaella, 2010. "Binary trees for dissimilarity data," Computational Statistics & Data Analysis, Elsevier, vol. 54(6), pages 1516-1524, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Binh Thai Pham & Chongchong Qi & Lanh Si Ho & Trung Nguyen-Thoi & Nadhir Al-Ansari & Manh Duc Nguyen & Huu Duy Nguyen & Hai-Bang Ly & Hiep Van Le & Indra Prakash, 2020. "A Novel Hybrid Soft Computing Model Using Random Forest and Particle Swarm Optimization for Estimation of Undrained Shear Strength of Soil," Sustainability, MDPI, vol. 12(6), pages 1-16, March.
    2. Lamperti, Francesco & Roventini, Andrea & Sani, Amir, 2018. "Agent-based model calibration using machine learning surrogates," Journal of Economic Dynamics and Control, Elsevier, vol. 90(C), pages 366-389.
    3. Jung-sik Hong & Hyeongyu Yeo & Nam-Wook Cho & Taeuk Ahn, 2018. "Identification of Core Suppliers Based on E-Invoice Data Using Supervised Machine Learning," JRFM, MDPI, vol. 11(4), pages 1-13, October.
    4. Mohamed Zine & Fouzi Harrou & Mohammed Terbeche & Mohammed Bellahcene & Abdelkader Dairi & Ying Sun, 2023. "E-Learning Readiness Assessment Using Machine Learning Methods," Sustainability, MDPI, vol. 15(11), pages 1-22, June.
    5. repec:hal:spmain:info:hdl:2441/13thfd12aa8rmplfudlgvgahff is not listed on IDEAS
    6. Chen, Enhui & Stathopoulos, Amanda & Nie, Yu (Marco), 2022. "Transfer station choice in a multimodal transit system: An empirical study," Transportation Research Part A: Policy and Practice, Elsevier, vol. 165(C), pages 337-355.
    7. Yigit Aydede & Jan Ditzen, 2022. "Identifying the regional drivers of influenza-like illness in Nova Scotia with dominance analysis," Papers 2212.06684, arXiv.org.
    8. Lotfi Boudabsa & Damir Filipović, 2022. "Ensemble learning for portfolio valuation and risk management," Papers 2204.05926, arXiv.org.
    9. Lorilla, Roxanne Suzette & Poirazidis, Konstantinos & Detsis, Vassilis & Kalogirou, Stamatis & Chalkias, Christos, 2020. "Socio-ecological determinants of multiple ecosystem services on the Mediterranean landscapes of the Ionian Islands (Greece)," Ecological Modelling, Elsevier, vol. 422(C).
    10. De Bock, Koen W. & Coussement, Kristof & Van den Poel, Dirk, 2010. "Ensemble classification based on generalized additive models," Computational Statistics & Data Analysis, Elsevier, vol. 54(6), pages 1535-1546, June.
    11. Zeynep Ceylan & Abdulkadir Atalan, 2021. "Estimation of healthcare expenditure per capita of Turkey using artificial intelligence techniques with genetic algorithm‐based feature selection," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 40(2), pages 279-290, March.
    12. Ollech, Daniel & Webel, Karsten, 2020. "A random forest-based approach to identifying the most informative seasonality tests," Discussion Papers 55/2020, Deutsche Bundesbank.
    13. Ilias Thomas & Alex M. Dickens & Jussi P. Posti & Endre Czeiter & Daniel Duberg & Tim Sinioja & Matilda Kråkström & Isabel R. A. Retel Helmrich & Kevin K. W. Wang & Andrew I. R. Maas & Ewout W. Steyer, 2022. "Serum metabolome associated with severity of acute traumatic brain injury," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    14. Lu, Xuefei & Baraldi, Piero & Zio, Enrico, 2020. "A data-driven framework for identifying important components in complex systems," Reliability Engineering and System Safety, Elsevier, vol. 204(C).
    15. Mahyar Jahaninasab & Ehsan Taheran & S. Alireza Zarabadi & Mohammadreza Aghaei & Ali Rajabpour, 2023. "A Novel Approach for Reducing Feature Space Dimensionality and Developing a Universal Machine Learning Model for Coated Tubes in Cross-Flow Heat Exchangers," Energies, MDPI, vol. 16(13), pages 1-13, July.
    16. Hapfelmeier, A. & Ulm, K., 2013. "A new variable selection approach using Random Forests," Computational Statistics & Data Analysis, Elsevier, vol. 60(C), pages 50-69.
    17. Amini, Shahram & Elmore, Ryan & Öztekin, Özde & Strauss, Jack, 2021. "Can machines learn capital structure dynamics?," Journal of Corporate Finance, Elsevier, vol. 70(C).
    18. Rokach, Lior, 2009. "Taxonomy for characterizing ensemble methods in classification tasks: A review and annotated bibliography," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4046-4072, October.
    19. Gilletly, Samuel D. & Jackson, Nicole D. & Staid, Andrea, 2023. "Evaluating the impact of wildfire smoke on solar photovoltaic production," Applied Energy, Elsevier, vol. 348(C).
    20. Wei, Pengfei & Lu, Zhenzhou & Song, Jingwen, 2015. "Variable importance analysis: A comprehensive review," Reliability Engineering and System Safety, Elsevier, vol. 142(C), pages 399-432.
    21. Gérard Biau & Erwan Scornet, 2016. "A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 197-227, June.
