
Comparing Boosting and Bagging for Decision Trees of Rankings

Authors
  • Antonella Plaia

    (University of Palermo)

  • Simona Buscemi

    (University of Palermo)

  • Johannes Fürnkranz

    (Johannes Kepler University Linz)

  • Eneldo Loza Mencía

    (Technische Universität Darmstadt)

Abstract

Decision tree learning is among the most popular and most traditional families of machine learning algorithms. While these techniques are intuitive and interpretable, they also suffer from instability: small perturbations in the training data may result in large changes in the predictions. Ensemble methods combine the output of multiple trees, which makes the decision more reliable and stable. They have been applied primarily to numeric prediction problems and to classification tasks. In recent years, some attempts to extend ensemble methods to ordinal data have appeared in the literature, but no concrete methodology has been provided for preference data. In this paper, we extend decision trees, and subsequently ensemble methods, to ranking data. In particular, we propose a theoretical and computational definition of bagging and boosting, two of the best-known ensemble methods. In an experimental study using simulated data and real-world datasets, our results confirm that known results from classification, such as boosting outperforming bagging, carry over to the ranking case.
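
To make the idea of bagging trees on ranking data concrete, the following is a minimal Python sketch, not the authors' method. It rests on several assumptions that do not come from the abstract: a ranking is stored as a rank vector (entry k is the rank of item k, with 1 the most preferred), disagreement is measured with the unweighted Kendall distance, consensus rankings are formed with a simple Borda count, and the base learner is a depth-1 tree on one numeric feature. All function names (kendall_distance, borda_consensus, fit_stump, bagging_fit, bagging_predict) are hypothetical.

```python
import random
from itertools import combinations


def kendall_distance(r1, r2):
    """Count item pairs that two rank vectors order in opposite ways."""
    return sum(
        1
        for i, j in combinations(range(len(r1)), 2)
        if (r1[i] - r1[j]) * (r2[i] - r2[j]) < 0
    )


def borda_consensus(rankings):
    """Aggregate rank vectors: the item with the lowest rank total is ranked 1."""
    n_items = len(rankings[0])
    totals = [sum(r[k] for r in rankings) for k in range(n_items)]
    # Ties in the totals are broken by item index (an arbitrary assumption).
    order = sorted(range(n_items), key=lambda k: (totals[k], k))
    consensus = [0] * n_items
    for position, item in enumerate(order, start=1):
        consensus[item] = position
    return consensus


def fit_stump(X, Y):
    """Depth-1 tree on one numeric feature; each leaf predicts a consensus ranking."""
    best = None
    # Every observed value except the largest is a candidate threshold,
    # so both leaves are always non-empty.
    for threshold in sorted(set(X))[:-1]:
        left = [y for x, y in zip(X, Y) if x <= threshold]
        right = [y for x, y in zip(X, Y) if x > threshold]
        c_left, c_right = borda_consensus(left), borda_consensus(right)
        cost = sum(kendall_distance(y, c_left) for y in left)
        cost += sum(kendall_distance(y, c_right) for y in right)
        if best is None or cost < best[0]:
            best = (cost, threshold, c_left, c_right)
    if best is None:  # all feature values identical: no split is possible
        consensus = borda_consensus(Y)
        return lambda x: consensus
    _, threshold, c_left, c_right = best
    return lambda x: c_left if x <= threshold else c_right


def bagging_fit(X, Y, n_trees=25, seed=0):
    """Fit one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([X[i] for i in idx], [Y[i] for i in idx]))
    return trees


def bagging_predict(trees, x):
    """Combine the rankings predicted by all trees into one consensus ranking."""
    return borda_consensus([tree(x) for tree in trees])


# Toy data: low feature values prefer item 0, high values prefer item 2.
X = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
Y = [[1, 2, 3]] * 4 + [[3, 2, 1]] * 4
trees = bagging_fit(X, Y)
print(bagging_predict(trees, 0.05), bagging_predict(trees, 0.95))
```

The aggregation step is where ranking data differs from ordinary classification: instead of a majority vote over class labels, the ensemble needs a consensus ranking. Borda count is used here only because it is short to write; a weighted distance or a median-ranking search could be substituted in borda_consensus and kendall_distance without changing the bagging loop.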

Suggested Citation

  • Antonella Plaia & Simona Buscemi & Johannes Fürnkranz & Eneldo Loza Mencía, 2022. "Comparing Boosting and Bagging for Decision Trees of Rankings," Journal of Classification, Springer;The Classification Society, vol. 39(1), pages 78-99, March.
  • Handle: RePEc:spr:jclass:v:39:y:2022:i:1:d:10.1007_s00357-021-09397-2
    DOI: 10.1007/s00357-021-09397-2

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00357-021-09397-2
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Antonio D’Ambrosio & Carmela Iorio & Michele Staiano & Roberta Siciliano, 2019. "Median constrained bucket order rank aggregation," Computational Statistics, Springer, vol. 34(2), pages 787-802, June.
    2. Antonella Plaia & Simona Buscemi & Mariangela Sciandra, 2021. "Consensus among preference rankings: a new weighted correlation coefficient for linear and weak orderings," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(4), pages 1015-1037, December.
    3. Cascón, J.M. & González-Arteaga, T. & de Andrés Calle, R., 2022. "A new preference classification approach: The λ-dissensus cluster algorithm," Omega, Elsevier, vol. 111(C).
    4. Antonella Plaia & Mariangela Sciandra, 2019. "Weighted distance-based trees for ranking data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(2), pages 427-444, June.
    5. Yoo, Yeawon & Escobedo, Adolfo R. & Skolfield, J. Kyle, 2020. "A new correlation coefficient for comparing and aggregating non-strict and incomplete rankings," European Journal of Operational Research, Elsevier, vol. 285(3), pages 1025-1041.
    6. Yu-Shan Shih & Kuang-Hsun Liu, 2019. "Regression trees for detecting preference patterns from rank data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(3), pages 683-702, September.
    7. Antonio D’Ambrosio & Willem J. Heiser, 2016. "A Recursive Partitioning Method for the Prediction of Preference Rankings Based Upon Kemeny Distances," Psychometrika, Springer;The Psychometric Society, vol. 81(3), pages 774-794, September.
    8. Can, Burak & Pourpouneh, Mohsen & Storcken, Ton, 2017. "Cost of transformation: a measure on matchings," Research Memorandum 015, Maastricht University, Graduate School of Business and Economics (GSBE).
    9. Carmela Iorio & Giuseppe Pandolfo & Antonio D’Ambrosio & Roberta Siciliano, 2020. "Mining big data in tourism," Quality & Quantity: International Journal of Methodology, Springer, vol. 54(5), pages 1655-1669, December.
    10. Fuchs, Sebastian & McCord, Yann, 2019. "On the lower bound of Spearman’s footrule," Dependence Modeling, De Gruyter, vol. 7(1), pages 126-132, January.
    11. Amaya, Johanna & Arellana, Julian & Delgado-Lindeman, Maira, 2020. "Stakeholders perceptions to sustainable urban freight policies in emerging markets," Transportation Research Part A: Policy and Practice, Elsevier, vol. 132(C), pages 329-348.
    12. Matthias Studer & Gilbert Ritschard & Alexis Gabadinho & Nicolas S. Müller, 2011. "Discrepancy Analysis of State Sequences," Sociological Methods & Research, vol. 40(3), pages 471-510, August.
    13. Akbari, Sina & Escobedo, Adolfo R., 2023. "Beyond kemeny rank aggregation: A parameterizable-penalty framework for robust ranking aggregation with ties," Omega, Elsevier, vol. 119(C).
    14. Gong, Joonho & Kim, Hyunjoong, 2017. "RHSBoost: Improving classification performance in imbalance data," Computational Statistics & Data Analysis, Elsevier, vol. 111(C), pages 1-13.
    15. Eric Kamwa & Vincent Merlin, 2018. "The Likelihood of the Consistency of Collective Rankings under Preferences Aggregation with Four Alternatives using Scoring Rules: A General Formula and the Optimal Decision Rule," Working Papers hal-01757742, HAL.
    16. Matteo Brunelli & Michele Fedrizzi, 2019. "A general formulation for some inconsistency indices of pairwise comparisons," Annals of Operations Research, Springer, vol. 274(1), pages 155-169, March.
    17. João V. Ferreira & Erik Schokkaert & Benoît Tarroux, 2023. "How group deliberation affects individual distributional preferences: An experimental study," Working Papers 2301, Groupe d'Analyse et de Théorie Economique Lyon St-Étienne (GATE Lyon St-Étienne), Université de Lyon.
    18. Huremović, Kenan & Ozkes, Ali I., 2022. "Polarization in networks: Identification–alienation framework," Journal of Mathematical Economics, Elsevier, vol. 102(C).
    19. Donald, Margaret R. & Wilson, Susan R., 2017. "Comparison and visualisation of agreement for paired lists of rankings," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 16(1), pages 31-45, March.
    20. Pierpaolo D’Urso & Vincenzina Vitale, 2022. "A Kemeny Distance-Based Robust Fuzzy Clustering for Preference Data," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 600-647, November.
