
A comparison of 71 binary similarity coefficients: The effect of base rates

Author

Listed:
  • Michael Brusco
  • J Dennis Cradit
  • Douglas Steinley

Abstract

There are many psychological applications that require collapsing the information in a two-mode (e.g., respondents-by-attributes) binary matrix into a one-mode (e.g., attributes-by-attributes) similarity matrix. This process requires the selection of a measure of similarity between binary attributes. A vast number of binary similarity coefficients have been proposed in fields such as biology, geology, and ecology. Although previous studies have reported cluster analyses of binary similarity coefficients, there has been little exploration of how cluster memberships are affected by the base rates (percentage of ones) for the binary attributes. We conducted a simulation experiment that compared two-cluster K-median partitions of 71 binary similarity coefficients based on their pairwise correlations obtained under 15 different base-rate configurations. The results reveal that some subsets of coefficients consistently group together regardless of the base rates. However, there are other subsets of coefficients that group together for some base rates, but not for others.
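The collapsing step described in the abstract can be illustrated with a small sketch. It is not the authors' code and does not reproduce their simulation: the toy matrix `X`, the helper names, and the choice of the Jaccard and simple matching coefficients (two widely used measures, picked here only for illustration) are all assumptions. For each pair of attributes it tallies the usual 2x2 counts (a = both one, b and c = mismatches, d = both zero) and derives a similarity from them; it also reports each attribute's base rate.

```python
from itertools import combinations

def pair_counts(x, y):
    """Return the 2x2 counts (a, b, c, d) for two equal-length 0/1 sequences."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)  # both present
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)  # only x present
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)  # only y present
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)  # both absent
    return a, b, c, d

def jaccard(a, b, c, d):
    """Jaccard coefficient: ignores joint absences (d)."""
    denom = a + b + c
    # Returning 0.0 when neither attribute ever occurs is a convention
    # chosen for this sketch, not something taken from the paper.
    return a / denom if denom else 0.0

def simple_matching(a, b, c, d):
    """Simple matching coefficient: counts joint absences as agreement."""
    return (a + d) / (a + b + c + d)

def one_mode(matrix, coefficient):
    """Collapse a respondents-by-attributes 0/1 matrix into an
    attributes-by-attributes similarity matrix."""
    columns = list(zip(*matrix))
    m = len(columns)
    # Self-similarity is set to 1.0 by convention in this sketch.
    sim = [[1.0] * m for _ in range(m)]
    for j, k in combinations(range(m), 2):
        s = coefficient(*pair_counts(columns[j], columns[k]))
        sim[j][k] = sim[k][j] = s
    return sim

def base_rates(matrix):
    """Proportion of ones for each attribute (column)."""
    n = len(matrix)
    return [sum(col) / n for col in zip(*matrix)]

# Hypothetical data: 6 respondents (rows) by 3 attributes (columns).
X = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
]

if __name__ == "__main__":
    print("base rates:", base_rates(X))
    print("Jaccard:", one_mode(X, jaccard))
    print("simple matching:", one_mode(X, simple_matching))
```

In this toy example the two coefficients disagree most for the second and third attributes, which never co-occur but share two joint absences: Jaccard gives 0 while simple matching gives 2/6. Disagreements of this kind depend on how often ones occur, which is the sort of base-rate sensitivity the study examines across 71 coefficients.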

Suggested Citation

  • Michael Brusco & J Dennis Cradit & Douglas Steinley, 2021. "A comparison of 71 binary similarity coefficients: The effect of base rates," PLOS ONE, Public Library of Science, vol. 16(4), pages 1-19, April.
  • Handle: RePEc:plo:pone00:0247751
    DOI: 10.1371/journal.pone.0247751

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0247751
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0247751&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Mladenovic, Nenad & Brimberg, Jack & Hansen, Pierre & Moreno-Perez, Jose A., 2007. "The p-median problem: A survey of metaheuristic approaches," European Journal of Operational Research, Elsevier, vol. 179(3), pages 927-939, June.
    2. Stephen Johnson, 1967. "Hierarchical clustering schemes," Psychometrika, Springer;The Psychometric Society, vol. 32(3), pages 241-254, September.
    3. Michael Brusco & Hans-Friedrich Köhn, 2009. "Erratum to: Exemplar-Based Clustering via Simulated Annealing," Psychometrika, Springer;The Psychometric Society, vol. 74(4), pages 755-755, December.
    4. S. L. Hakimi, 1965. "Optimum Distribution of Switching Centers in a Communication Network and Some Related Graph Theoretic Problems," Operations Research, INFORMS, vol. 13(3), pages 462-475, June.
    5. J. Gower & P. Legendre, 1986. "Metric and Euclidean properties of dissimilarity coefficients," Journal of Classification, Springer;The Classification Society, vol. 3(1), pages 5-48, March.
    6. J. Straat & L. Ark & Klaas Sijtsma, 2013. "Comparing Optimization Algorithms for Item Selection in Mokken Scale Analysis," Journal of Classification, Springer;The Classification Society, vol. 30(1), pages 75-99, April.
    7. Michael B. Teitz & Polly Bart, 1968. "Heuristic Methods for Estimating the Generalized Vertex Median of a Weighted Graph," Operations Research, INFORMS, vol. 16(5), pages 955-961, October.
    8. Michael Brusco & Hans-Friedrich Köhn, 2008. "Optimal Partitioning of a Data Set Based on the p-Median Model," Psychometrika, Springer;The Psychometric Society, vol. 73(1), pages 89-105, March.
    9. Michael Brusco & Hans-Friedrich Köhn, 2009. "Exemplar-Based Clustering via Simulated Annealing," Psychometrika, Springer;The Psychometric Society, vol. 74(3), pages 457-475, September.
    10. S. L. Hakimi, 1964. "Optimum Locations of Switching Centers and the Absolute Centers and Medians of a Graph," Operations Research, INFORMS, vol. 12(3), pages 450-459, June.
    11. van der Ark, L. Andries, 2012. "New Developments in Mokken Scale Analysis in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 48(i05).

    Citations



    Cited by:

    1. da F. Costa, Luciano, 2023. "Multiset neurons," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 609(C).
    2. Criado-Alonso, Ángeles & Aleja, David & Romance, Miguel & Criado, Regino, 2022. "Derivative of a hypergraph as a tool for linguistic pattern analysis," Chaos, Solitons & Fractals, Elsevier, vol. 163(C).
    3. Rumen Iliev & Will Bennis, 2023. "The Convergence of Positivity: Are Happy People All Alike?," Journal of Happiness Studies, Springer, vol. 24(5), pages 1643-1662, June.
    4. Costa, Luciano da F., 2022. "On similarity," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 599(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Michael Brusco & Douglas Steinley, 2015. "Affinity Propagation and Uncapacitated Facility Location Problems," Journal of Classification, Springer;The Classification Society, vol. 32(3), pages 443-480, October.
    2. Vladimir Marianov & Daniel Serra, 2009. "Median problems in networks," Economics Working Papers 1151, Department of Economics and Business, Universitat Pompeu Fabra.
    3. Antiopi Panteli & Basilis Boutsinas & Ioannis Giannikos, 2021. "On solving the multiple p-median problem based on biclustering," Operational Research, Springer, vol. 21(1), pages 775-799, March.
    4. Tao Zhuolin & Zheng Qingjing & Kong Hui, 2018. "A Modified Gravity p-Median Model for Optimizing Facility Locations," Journal of Systems Science and Information, De Gruyter, vol. 6(5), pages 421-434, October.
    5. Amir Hossein Sadeghi & Ziyuan Sun & Amirreza Sahebi-Fakhrabad & Hamid Arzani & Robert Handfield, 2023. "A Mixed-Integer Linear Formulation for a Dynamic Modified Stochastic p-Median Problem in a Competitive Supply Chain Network Design," Logistics, MDPI, vol. 7(1), pages 1-24, March.
    6. Rolland, Erik & Schilling, David A. & Current, John R., 1997. "An efficient tabu search procedure for the p-Median Problem," European Journal of Operational Research, Elsevier, vol. 96(2), pages 329-342, January.
    7. Schilling, D. A. & Rosing, K. E. & ReVelle, C. S., 2000. "Network distance characteristics that affect computational effort in p-median location problems," European Journal of Operational Research, Elsevier, vol. 127(3), pages 525-536, December.
    8. Snežana Tadić & Mladen Krstić & Željko Stević & Miloš Veljović, 2023. "Locating Collection and Delivery Points Using the p-Median Location Problem," Logistics, MDPI, vol. 7(1), pages 1-17, February.
    9. ReVelle, C. S. & Eiselt, H. A., 2005. "Location analysis: A synthesis and survey," European Journal of Operational Research, Elsevier, vol. 165(1), pages 1-19, August.
    10. Alcaraz, Javier & Landete, Mercedes & Monge, Juan F., 2012. "Design and analysis of hybrid metaheuristics for the Reliability p-Median Problem," European Journal of Operational Research, Elsevier, vol. 222(1), pages 54-64.
    11. Simon Blanchard & Daniel Aloise & Wayne DeSarbo, 2012. "The Heterogeneous P-Median Problem for Categorization Based Clustering," Psychometrika, Springer;The Psychometric Society, vol. 77(4), pages 741-762, October.
    12. Rosing, K. E. & ReVelle, C. S. & Rolland, E. & Schilling, D. A. & Current, J. R., 1998. "Heuristic concentration and Tabu search: A head to head comparison," European Journal of Operational Research, Elsevier, vol. 104(1), pages 93-99, January.
    13. H K Smith & G Laporte & P R Harper, 2009. "Locational analysis: highlights of growth to maturity," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 60(1), pages 140-148, May.
    14. Hribar, Michelle & Daskin, Mark S., 1997. "A dynamic programming heuristic for the P-median problem," European Journal of Operational Research, Elsevier, vol. 101(3), pages 499-508, September.
    15. Daniel Serra & Vladimir Marianov, 1996. "The P-median problem in a changing network: The case of Barcelona," Economics Working Papers 180, Department of Economics and Business, Universitat Pompeu Fabra.
    16. Mark S. Daskin, 2008. "What you should know about location modeling," Naval Research Logistics (NRL), John Wiley & Sons, vol. 55(4), pages 283-294, June.
    17. Michael Brusco & Hans-Friedrich Köhn, 2009. "Exemplar-Based Clustering via Simulated Annealing," Psychometrika, Springer;The Psychometric Society, vol. 74(3), pages 457-475, September.
    18. K.E. Rosing & C.S. ReVelle, 1997. "Heuristic Concentration and Tabu Search: A Nose to Nose Comparison," Tinbergen Institute Discussion Papers 97-058/3, Tinbergen Institute.
    19. Bader F. AlBdaiwi & Diptesh Ghosh & Boris Goldengorin, 2011. "Data aggregation for p-median problems," Journal of Combinatorial Optimization, Springer, vol. 21(3), pages 348-363, April.
    20. ReVelle, C.S. & Eiselt, H.A. & Daskin, M.S., 2008. "A bibliography for some fundamental problem categories in discrete location science," European Journal of Operational Research, Elsevier, vol. 184(3), pages 817-848, February.
