Classification with decision trees from a nonparametric predictive inference perspective

Authors

  • Abellán, Joaquín
  • Baker, Rebecca M.
  • Coolen, Frank P.A.
  • Crossman, Richard J.
  • Masegosa, Andrés R.

Abstract

An application of nonparametric predictive inference for multinomial data (NPI) to classification tasks is presented. The NPI model is applied to an established procedure for building classification trees using imprecise probabilities and uncertainty measures. Until now, that procedure has been used only with the imprecise Dirichlet model (IDM), which is defined through a parameter expressing previous knowledge, and its classification accuracy depends significantly on the value chosen for that parameter. A detailed study involving 40 data sets shows that the procedure using the NPI model, which has no such parameter, obtains a better trade-off between accuracy and tree size than the procedure using the IDM, whatever the choice of parameter. In a bias-variance study of the errors, it is shown that the procedure with the NPI model has a lower variance than the one with the IDM, implying a lower level of over-fitting.
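
The contrast between a parameter-dependent and a parameter-free model can be illustrated with a small sketch. The interval formulas below are not given in this abstract; they are the forms commonly quoted for the IDM (with parameter s) and for the approximate NPI model for multinomial data, and they are used here as an assumption. The function names and the example counts are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def idm_intervals(counts: List[int], s: float) -> List[Interval]:
    """Assumed IDM form: [n_j / (N + s), (n_j + s) / (N + s)] per class."""
    n = sum(counts)
    if n <= 0:
        raise ValueError("at least one observation is required")
    return [(c / (n + s), (c + s) / (n + s)) for c in counts]


def npi_intervals(counts: List[int]) -> List[Interval]:
    """Assumed NPI form: [max(0, (n_j - 1) / N), min(1, (n_j + 1) / N)]."""
    n = sum(counts)
    if n <= 0:
        raise ValueError("at least one observation is required")
    return [(max(0.0, (c - 1) / n), min(1.0, (c + 1) / n)) for c in counts]


# Hypothetical class counts at one tree node (three classes, N = 10).
counts = [6, 3, 1]

idm_s1 = idm_intervals(counts, s=1.0)  # intervals depend on the chosen s
idm_s2 = idm_intervals(counts, s=2.0)  # a larger s widens every interval
npi = npi_intervals(counts)            # no parameter to choose

for label, intervals in (("IDM s=1", idm_s1), ("IDM s=2", idm_s2), ("NPI", npi)):
    print(label, [(round(lo, 3), round(hi, 3)) for lo, hi in intervals])
```

Under these assumed formulas, every IDM interval has width s / (N + s), so the choice of s directly changes the imprecision seen by the tree-building procedure, while the NPI intervals are fixed by the counts alone.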

Suggested Citation

  • Abellán, Joaquín & Baker, Rebecca M. & Coolen, Frank P.A. & Crossman, Richard J. & Masegosa, Andrés R., 2014. "Classification with decision trees from a nonparametric predictive inference perspective," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 789-802.
  • Handle: RePEc:eee:csdana:v:71:y:2014:i:c:p:789-802
    DOI: 10.1016/j.csda.2013.02.009




