Printed from https://ideas.repec.org/a/bla/istatr/v82y2014i3p329-348.html

Fifty Years of Classification and Regression Trees

Author

  • Wei-Yin Loh

Abstract

Fifty years have passed since the publication of the first regression tree algorithm. New techniques have added capabilities that far surpass those of the early methods. Modern classification trees can partition the data with linear splits on subsets of variables and fit nearest neighbor, kernel density, and other models in the partitions. Regression trees can fit almost every kind of traditional statistical model, including least-squares, quantile, logistic, Poisson, and proportional hazards models, as well as models for longitudinal and multiresponse data. Greater availability and affordability of software (much of which is free) have played a significant role in helping the techniques gain acceptance and popularity in the broader scientific community. This article surveys the developments and briefly reviews the key ideas behind some of the major algorithms.
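The core idea behind the earliest regression trees, recursively partitioning the data and fitting a constant (the node mean) in each partition by minimizing the within-node sum of squares, can be sketched as below. This is a minimal illustration assuming a single numeric predictor and squared-error loss; the helper names `best_split`, `grow` and `predict` are hypothetical, the code is not taken from the article, and it omits pruning, unbiased variable selection and the richer node models the survey discusses.

```python
from typing import Dict, List, Optional, Tuple


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x: List[float], y: List[float], min_leaf: int = 1) -> Optional[Tuple[float, float]]:
    """Exhaustive search for the split 'x <= threshold' that minimizes the
    combined sum of squares of the two child nodes.
    Returns (threshold, total_sse), or None if no valid split exists."""
    min_leaf = max(1, min_leaf)  # each child must keep at least one observation
    pairs = sorted(zip(x, y))
    best: Optional[Tuple[float, float]] = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # can only cut between distinct x values
        left = [yy for _, yy in pairs[:i]]
        right = [yy for _, yy in pairs[i:]]
        total = sse(left) + sse(right)
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint cut
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


def grow(x: List[float], y: List[float], depth: int = 0,
         max_depth: int = 2, min_leaf: int = 1) -> Dict:
    """Recursively partition the data; each leaf stores the mean of y,
    giving a piecewise-constant fit."""
    split = best_split(x, y, min_leaf)
    # Stop at the depth limit, when no split exists, or when splitting
    # does not reduce the sum of squares.
    if depth >= max_depth or split is None or split[1] >= sse(y):
        return {"mean": sum(y) / len(y)}
    threshold = split[0]
    left = [(a, b) for a, b in zip(x, y) if a <= threshold]
    right = [(a, b) for a, b in zip(x, y) if a > threshold]
    return {
        "threshold": threshold,
        "left": grow([a for a, _ in left], [b for _, b in left], depth + 1, max_depth, min_leaf),
        "right": grow([a for a, _ in right], [b for _, b in right], depth + 1, max_depth, min_leaf),
    }


def predict(node: Dict, value: float) -> float:
    """Follow the splits down to a leaf and return its fitted mean."""
    while "threshold" in node:
        node = node["left"] if value <= node["threshold"] else node["right"]
    return node["mean"]
```

With `x = [1, 2, 3, 4, 5, 6]` and `y = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]`, `best_split(x, y)` chooses the threshold 3.5, separating the low and high responses; a one-split tree (`max_depth=1`) then predicts about 1.0 to the left of the cut and about 5.0 to the right.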

Suggested Citation

  • Wei-Yin Loh, 2014. "Fifty Years of Classification and Regression Trees," International Statistical Review, International Statistical Institute, vol. 82(3), pages 329-348, December.
  • Handle: RePEc:bla:istatr:v:82:y:2014:i:3:p:329-348

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1111/insr.12016
    Download Restriction: Access to full text is restricted to subscribers.


    References listed on IDEAS

    1. Lee, Paul H. & Yu, Philip L.H., 2010. "Distance-based tree models for ranking data," Computational Statistics & Data Analysis, Elsevier, vol. 54(6), pages 1672-1682, June.
    2. Ahn, Hongshik, 1996. "Log-normal regression modeling through recursive partitioning," Computational Statistics & Data Analysis, Elsevier, vol. 21(4), pages 381-398, April.
    3. Strobl, Carolin & Boulesteix, Anne-Laure & Augustin, Thomas, 2007. "Unbiased split selection for classification trees based on the Gini Index," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 483-501, September.
    4. Ciampi, Antonio, 1991. "Generalized regression trees," Computational Statistics & Data Analysis, Elsevier, vol. 12(1), pages 57-78, August.
    5. David R. Larsen & Paul L. Speckman, 2004. "Multivariate Regression Trees for Analysis of Abundance Data," Biometrics, The International Biometric Society, vol. 60(2), pages 543-549, June.
    6. Hothorn, Torsten & Lausen, Berthold, 2005. "Bundling classifiers by bagging trees," Computational Statistics & Data Analysis, Elsevier, vol. 49(4), pages 1068-1078, June.
    7. Gao, Feng & Manatunga, Amita K. & Chen, Shande, 2004. "Identification of prognostic factors with multivariate survival data," Computational Statistics & Data Analysis, Elsevier, vol. 45(4), pages 813-824, May.
    8. Buttrey, Samuel E. & Karo, Ciril, 2002. "Using k-nearest-neighbor classification in the leaves of a tree," Computational Statistics & Data Analysis, Elsevier, vol. 40(1), pages 27-37, July.
    9. Elise Dusseldorp & Jacqueline Meulman, 2004. "The regression trunk approach to discover treatment covariate interaction," Psychometrika, Springer;The Psychometric Society, vol. 69(3), pages 355-374, September.
    10. Hsiao, Wei-Cheng & Shih, Yu-Shan, 2007. "Splitting variable selection for multivariate regression trees," Statistics & Probability Letters, Elsevier, vol. 77(3), pages 265-271, February.
    11. Shih, Yu-Shan & Tsai, Hsin-Wen, 2004. "Variable selection bias in regression trees with constant fits," Computational Statistics & Data Analysis, Elsevier, vol. 45(3), pages 595-607, April.
    12. Lee, Seong Keon, 2005. "On generalized multivariate decision tree by using GEE," Computational Statistics & Data Analysis, Elsevier, vol. 49(4), pages 1105-1119, June.
    13. Gray, J. Brian & Fan, Guangzhe, 2008. "Classification tree analysis using TARGET," Computational Statistics & Data Analysis, Elsevier, vol. 52(3), pages 1362-1372, January.
    14. Loh, Wei-Yin, 1991. "Survival modeling through recursive stratification," Computational Statistics & Data Analysis, Elsevier, vol. 12(3), pages 295-313, November.
    15. Harper, Paul R., 2005. "A review and comparison of classification algorithms for medical decision making," Health Policy, Elsevier, vol. 71(3), pages 315-331, March.
    16. Fan, Juanjuan & Su, Xiao-Gang & Levine, Richard A. & Nunn, Martha E. & LeBlanc, Michael, 2006. "Trees for Correlated Survival Data by Goodness of Split, With Applications to Tooth Prognosis," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 959-967, September.
    17. Taddy, Matthew A. & Gramacy, Robert B. & Polson, Nicholas G., 2011. "Dynamic Trees for Learning and Design," Journal of the American Statistical Association, American Statistical Association, vol. 106(493), pages 109-123.
    18. Hemant Ishwaran & Eugene H. Blackstone & Claire E. Pothier & Michael S. Lauer, 2004. "Relative Risk Forests for Exercise Heart Rate Recovery as a Predictor of Mortality," Journal of the American Statistical Association, American Statistical Association, vol. 99, pages 591-600, January.
    19. Ciampi, Antonio & Thiffault, Johanne & Nakache, Jean-Pierre & Asselain, Bernard, 1986. "Stratification by stepwise regression, correspondence analysis and recursive partition: a comparison of three methods of analysis for survival data with covariates," Computational Statistics & Data Analysis, Elsevier, vol. 4(3), pages 185-204, October.
    20. Molinaro, Annette M. & Dudoit, Sandrine & van der Laan, Mark J., 2004. "Tree-based multivariate regression and density estimation with right-censored data," Journal of Multivariate Analysis, Elsevier, vol. 90(1), pages 154-177, July.
    21. Xiaogang Su & Juanjuan Fan, 2004. "Multivariate Survival Trees: A Maximum Likelihood Approach Based on Frailty Models," Biometrics, The International Biometric Society, vol. 60(1), pages 93-99, March.
    22. Choi, Yunhee & Ahn, Hongshik & Chen, James J., 2005. "Regression trees for analysis of count data with extra Poisson variation," Computational Statistics & Data Analysis, Elsevier, vol. 49(3), pages 893-915, June.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. repec:eee:intfor:v:35:y:2019:i:1:p:297-312 is not listed on IDEAS
    2. Renato Bruni & Gianpiero Bianchi, 2018. "Robustness Analysis of a Website Categorization Procedure based on Machine Learning," DIAG Technical Reports 2018-04, Department of Computer, Control and Management Engineering, Universita' degli Studi di Roma "La Sapienza".
    3. repec:gam:jeners:v:12:y:2019:i:13:p:2530-:d:244639 is not listed on IDEAS
    4. repec:eee:ejores:v:278:y:2019:i:2:p:514-532 is not listed on IDEAS
    5. repec:ksa:szemle:1775 is not listed on IDEAS
    6. repec:spr:advdac:v:13:y:2019:i:3:d:10.1007_s11634-018-0332-3 is not listed on IDEAS
    7. repec:gam:jeners:v:10:y:2017:i:5:p:607-:d:97318 is not listed on IDEAS
    8. repec:spr:compst:v:34:y:2019:i:4:d:10.1007_s00180-019-00894-y is not listed on IDEAS


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:bla:istatr:v:82:y:2014:i:3:p:329-348. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Wiley Content Delivery. General contact details of provider: http://edirc.repec.org/data/isiiinl.html.


    IDEAS is a RePEc service hosted by the Research Division of the Federal Reserve Bank of St. Louis. RePEc uses bibliographic data supplied by the respective publishers.