Printed from https://ideas.repec.org/a/eee/jomega/v107y2022ics0305048321001523.html

Interpreting clusters via prototype optimization

Author

Listed:
  • Carrizosa, Emilio
  • Kurishchenko, Kseniia
  • Marín, Alfredo
  • Romero Morales, Dolores

Abstract

In this paper, we tackle the problem of enhancing the interpretability of the results of Cluster Analysis. Our goal is to find an explanation for each cluster such that clusters are characterized as precisely and distinctively as possible, i.e., the explanation is fulfilled by as many individuals as possible in the corresponding cluster (true positive cases) and by as few individuals as possible in the remaining clusters (false positive cases). We assume that a dissimilarity between the individuals is given, and propose distance-based explanations, namely those defined by the individuals that are close to the cluster's so-called prototype. To find the set of prototypes, we address the biobjective optimization problem that maximizes the total number of true positive cases across all clusters and minimizes the total number of false positive cases, while controlling the true positive rate as well as the false positive rate in each cluster. We develop two mathematical optimization models, inspired by classic Location Analysis problems, that differ in the way individuals are allocated to prototypes. We illustrate the explanations provided by these models and their accuracy on both real-life and simulated data.
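
The counting behind a distance-based explanation can be illustrated with a small sketch. It is not the paper's optimization models: it assumes a fixed radius, restricts candidate prototypes to members of the cluster, and collapses the two objectives into a single score (true positives minus false positives) purely for illustration. The function names, the toy data, and the radius values are all hypothetical.

```python
from typing import List, Sequence, Tuple


def explanation_counts(
    dissim: Sequence[Sequence[float]],
    labels: Sequence[int],
    prototype: int,
    radius: float,
    cluster: int,
) -> Tuple[int, int]:
    """Return (true positives, false positives) for the explanation
    'individuals whose dissimilarity to `prototype` is at most `radius`'
    when it is used to describe `cluster`."""
    tp = fp = 0
    for i, lab in enumerate(labels):
        if dissim[prototype][i] <= radius:
            if lab == cluster:
                tp += 1  # covered and in the explained cluster
            else:
                fp += 1  # covered but belongs to another cluster
    return tp, fp


def best_prototype(
    dissim: Sequence[Sequence[float]],
    labels: Sequence[int],
    radius: float,
    cluster: int,
) -> int:
    """Brute-force search over the members of `cluster`.
    Assumption: the score TP - FP stands in for the biobjective trade-off."""
    def score(p: int) -> int:
        tp, fp = explanation_counts(dissim, labels, p, radius, cluster)
        return tp - fp

    candidates = [i for i, lab in enumerate(labels) if lab == cluster]
    return max(candidates, key=score)


# Toy data (hypothetical): six individuals on a line, two clusters.
points: List[float] = [0.0, 1.0, 2.0, 5.0, 6.0, 7.0]
labels: List[int] = [0, 0, 0, 1, 1, 1]
dissim = [[abs(a - b) for b in points] for a in points]

p0 = best_prototype(dissim, labels, radius=1.5, cluster=0)
print(p0, explanation_counts(dissim, labels, p0, 1.5, 0))
```

With a larger radius the same prototype starts to cover individuals from the other cluster, which is the trade-off between true and false positives that the abstract describes.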

Suggested Citation

  • Carrizosa, Emilio & Kurishchenko, Kseniia & Marín, Alfredo & Romero Morales, Dolores, 2022. "Interpreting clusters via prototype optimization," Omega, Elsevier, vol. 107(C).
  • Handle: RePEc:eee:jomega:v:107:y:2022:i:c:s0305048321001523
    DOI: 10.1016/j.omega.2021.102543

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0305048321001523
    Download Restriction: Full text for ScienceDirect subscribers only



    Citations

    Cited by:

    1. Doumpos, Michalis & Zopounidis, Constantin & Gounopoulos, Dimitrios & Platanakis, Emmanouil & Zhang, Wenke, 2023. "Operational research and artificial intelligence methods in banking," European Journal of Operational Research, Elsevier, vol. 306(1), pages 1-16.
    2. Emilio Carrizosa & Vanesa Guerrero & Dolores Romero Morales, 2023. "On mathematical optimization for clustering categories in contingency tables," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(2), pages 407-429, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Emilio Carrizosa & Cristina Molero-Río & Dolores Romero Morales, 2021. "Mathematical optimization in classification and regression trees," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 29(1), pages 5-33, April.
    2. Benítez-Peña, Sandra & Carrizosa, Emilio & Guerrero, Vanesa & Jiménez-Gamero, M. Dolores & Martín-Barragán, Belén & Molero-Río, Cristina & Ramírez-Cobo, Pepa & Romero Morales, Dolores & Sillero-Denami, 2021. "On sparse ensemble methods: An application to short-term predictions of the evolution of COVID-19," European Journal of Operational Research, Elsevier, vol. 295(2), pages 648-663.
    3. Blanquero, Rafael & Carrizosa, Emilio & Molero-Río, Cristina & Morales, Dolores Romero, 2022. "On sparse optimal regression trees," European Journal of Operational Research, Elsevier, vol. 299(3), pages 1045-1054.
    4. Emilio Carrizosa & Vanesa Guerrero & Dolores Romero Morales, 2023. "On mathematical optimization for clustering categories in contingency tables," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(2), pages 407-429, June.
    5. Sophie-Charlotte Klose & Johannes Lederer, 2020. "A Pipeline for Variable Selection and False Discovery Rate Control With an Application in Labor Economics," Papers 2006.12296, arXiv.org, revised Jun 2020.
    6. Hoffmann, F. & Baesens, B. & Mues, C. & Van Gestel, T. & Vanthienen, J., 2007. "Inferring descriptive and approximate fuzzy rules for credit scoring using evolutionary algorithms," European Journal of Operational Research, Elsevier, vol. 177(1), pages 540-555, February.
    7. Dionissi Aliprantis & Hal Martin & Kristen Tauber, 2020. "What Determines the Success of Housing Mobility Programs?," Working Papers 20-36R, Federal Reserve Bank of Cleveland, revised 19 Oct 2022.
    8. Manuel Oviedo-de la Fuente & Carlos Cabo & Celestino Ordóñez & Javier Roca-Pardiñas, 2021. "A Distance Correlation Approach for Optimum Multiscale Selection in 3D Point Cloud Classification," Mathematics, MDPI, vol. 9(12), pages 1-19, June.
    9. Febrero-Bande, Manuel & González-Manteiga, Wenceslao & Prallon, Brenda & Saporito, Yuri F., 2023. "Functional classification of bitcoin addresses," Computational Statistics & Data Analysis, Elsevier, vol. 181(C).
    10. Kwon, He-Boong & Lee, Jooh, 2019. "Exploring the differential impact of environmental sustainability, operational efficiency, and corporate reputation on market valuation in high-tech-oriented firms," International Journal of Production Economics, Elsevier, vol. 211(C), pages 1-14.
    11. Martens, David & Baesens, Bart & Van Gestel, Tony & Vanthienen, Jan, 2007. "Comprehensible credit scoring models using rule extraction from support vector machines," European Journal of Operational Research, Elsevier, vol. 183(3), pages 1466-1476, December.
    12. Daníelsson, Jón & Macrae, Robert & Uthemann, Andreas, 2022. "Artificial intelligence and systemic risk," Journal of Banking & Finance, Elsevier, vol. 140(C).
    13. Yucheng Yang & Zhong Zheng & Weinan E, 2020. "Interpretable Neural Networks for Panel Data Analysis in Economics," Papers 2010.05311, arXiv.org, revised Nov 2020.
    14. Daniel Carter & Amelia Acker & Dan Sholler, 2021. "Investigative approaches to researching information technology companies," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 72(6), pages 655-666, June.
    15. Zhao, Shuping & Xu, Kai & Wang, Zhao & Liang, Changyong & Lu, Wenxing & Chen, Bo, 2022. "Financial distress prediction by combining sentiment tone features," Economic Modelling, Elsevier, vol. 106(C).
    16. Garbero, Alessandra & Sakos, Grayson & Cerulli, Giovanni, 2023. "Towards data-driven project design: Providing optimal treatment rules for development projects," Socio-Economic Planning Sciences, Elsevier, vol. 89(C).
    17. Maude Lavanchy & Patrick Reichert & Jayanth Narayanan & Krishna Savani, 2023. "Applicants’ Fairness Perceptions of Algorithm-Driven Hiring Procedures," Journal of Business Ethics, Springer, vol. 188(1), pages 125-150, November.
    18. Ivan A Canay & Magne Mogstad & Jack Mount, 2024. "On the Use of Outcome Tests for Detecting Bias in Decision Making," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 91(4), pages 2135-2167.
    19. Guo, Mengzhuo & Zhang, Qingpeng & Liao, Xiuwu & Chen, Frank Youhua & Zeng, Daniel Dajun, 2021. "A hybrid machine learning framework for analyzing human decision-making through learning preferences," Omega, Elsevier, vol. 101(C).
    20. M. Naresh Kumar & V. Sree Hari Rao, 2015. "A New Methodology for Estimating Internal Credit Risk and Bankruptcy Prediction under Basel II Regime," Computational Economics, Springer;Society for Computational Economics, vol. 46(1), pages 83-102, June.
