IDEAS home Printed from https://ideas.repec.org/a/spr/jclass/v42y2025i2d10.1007_s00357-024-09498-8.html

Two-Group k-Adic Similarity Coefficients for Binary Classifiers

Author

Listed:
  • Perišić Ana

    (University of Split; Sibenik University of Applied Sciences)

  • Vanbelle Sophie

    (Maastricht University)

Abstract

When two different sets of binary classification rules are applied to the same items, we obtain two sets of binary vectors. Examples include two groups of doctors with different levels of experience classifying patients as diseased or disease-free, or two different sets of algorithms classifying consumers as churners or non-churners. In this paper, we propose to extend the well-known Jaccard coefficient and the simple matching coefficient to quantify the similarity between two sets of binary vectors. The generalization is based on the k-adic definition of similarity within sets. We derive the large-sample variances of the new coefficients, investigate desirable properties of the established similarity coefficients, and present applications to real-world datasets.
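
For orientation, the two classical pairwise coefficients that the abstract names as starting points can be sketched as follows. This is only the standard two-vector case, not the two-group k-adic extension proposed in the paper, whose definition is not given in the abstract. The function names and the example vectors are hypothetical and chosen for illustration.

```python
from typing import Sequence, Tuple


def agreement_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d): counts of (1,1), (1,0), (0,1) and (0,0) pairs."""
    if len(x) != len(y):
        raise ValueError("both vectors must classify the same items")
    if len(x) == 0:
        raise ValueError("vectors must not be empty")
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return a, b, c, d


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Classical Jaccard coefficient a / (a + b + c); joint absences are ignored."""
    a, b, c, _ = agreement_counts(x, y)
    denom = a + b + c
    # The coefficient is undefined when neither vector contains a 1.
    return a / denom if denom else float("nan")


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Simple matching coefficient (a + d) / (a + b + c + d)."""
    a, b, c, d = agreement_counts(x, y)
    return (a + d) / (a + b + c + d)


# Hypothetical example: two raters classifying the same six items (1 = diseased).
rater_1 = [1, 1, 0, 0, 1, 0]
rater_2 = [1, 0, 0, 0, 1, 1]

print(agreement_counts(rater_1, rater_2))  # (2, 1, 1, 2)
print(jaccard(rater_1, rater_2))           # 0.5
print(simple_matching(rater_1, rater_2))   # 0.666...
```

The two coefficients differ only in how they treat joint absences (the count `d`): the Jaccard coefficient excludes them, while the simple matching coefficient counts them as agreement.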

Suggested Citation

  • Perišić Ana & Vanbelle Sophie, 2025. "Two-Group k-Adic Similarity Coefficients for Binary Classifiers," Journal of Classification, Springer;The Classification Society, vol. 42(2), pages 391-413, July.
  • Handle: RePEc:spr:jclass:v:42:y:2025:i:2:d:10.1007_s00357-024-09498-8
    DOI: 10.1007/s00357-024-09498-8

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00357-024-09498-8
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Vladimir Batagelj & Matevz Bren, 1995. "Comparing resemblance measures," Journal of Classification, Springer;The Classification Society, vol. 12(1), pages 73-90, March.
    2. F. Baulieu, 1989. "A classification of presence/absence based dissimilarity coefficients," Journal of Classification, Springer;The Classification Society, vol. 6(1), pages 233-246, December.
    3. Matthijs Warrens, 2009. "k-Adic Similarity Coefficients for Binary (Presence/Absence) Data," Journal of Classification, Springer;The Classification Society, vol. 26(2), pages 227-245, August.
    4. Matthijs Warrens, 2008. "Bounds of Resemblance Measures for Binary (Presence/Absence) Variables," Journal of Classification, Springer;The Classification Society, vol. 25(2), pages 195-208, November.
    5. Matthijs Warrens, 2008. "On the Indeterminacy of Resemblance Measures for Binary (Presence/Absence) Data," Journal of Classification, Springer;The Classification Society, vol. 25(1), pages 125-136, June.
    6. Madiha Qayyum & Etienne E. Kerre & Samina Ashraf, 2023. "A Parametric Family of Fuzzy Similarity Measures for Intuitionistic Fuzzy Sets," Mathematics, MDPI, vol. 11(14), pages 1-10, July.
    7. R. M. Fewster & S. T. Buckland, 2001. "Similarity Indices for Spatial Ecological Data," Biometrics, The International Biometric Society, vol. 57(2), pages 495-501, June.
    8. Charles F. Manski, 2015. "Communicating Uncertainty in Official Economic Statistics: An Appraisal Fifty Years after Morgenstern," Journal of Economic Literature, American Economic Association, vol. 53(3), pages 631-653, September.
    9. Michael Brusco & J Dennis Cradit & Douglas Steinley, 2021. "A comparison of 71 binary similarity coefficients: The effect of base rates," PLOS ONE, Public Library of Science, vol. 16(4), pages 1-19, April.
    10. J. Gower & P. Legendre, 1986. "Metric and Euclidean properties of dissimilarity coefficients," Journal of Classification, Springer;The Classification Society, vol. 3(1), pages 5-48, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Matthijs J. Warrens & Alexandra de Raadt, 2015. "Ordering Properties of the First Eigenvector of Certain Similarity Matrices," Journal of Mathematics, Hindawi, vol. 2015, pages 1-5, November.
    2. Matthijs Warrens, 2008. "Bounds of Resemblance Measures for Binary (Presence/Absence) Variables," Journal of Classification, Springer;The Classification Society, vol. 25(2), pages 195-208, November.
    3. Matthijs Warrens, 2008. "On the Indeterminacy of Resemblance Measures for Binary (Presence/Absence) Data," Journal of Classification, Springer;The Classification Society, vol. 25(1), pages 125-136, June.
    4. Martin G. Moehrle, 2010. "Measures for textual patent similarities: a guided way to select appropriate approaches," Scientometrics, Springer;Akadémiai Kiadó, vol. 85(1), pages 95-109, October.
    5. Matthijs Warrens, 2008. "On the Equivalence of Cohen’s Kappa and the Hubert-Arabie Adjusted Rand Index," Journal of Classification, Springer;The Classification Society, vol. 25(2), pages 177-183, November.
    6. Matthijs Warrens, 2009. "On Robinsonian dissimilarities, the consecutive ones property and latent variable models," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 3(2), pages 169-184, September.
    7. Francis Caillez & Pascale Kuntz, 1996. "A contribution to the study of the metric and Euclidean structures of dissimilarities," Psychometrika, Springer;The Psychometric Society, vol. 61(2), pages 241-253, June.
    8. Matthijs J. Warrens & Hanneke Hoef, 2022. "Understanding the Adjusted Rand Index and Other Partition Comparison Indices Based on Counting Object Pairs," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 487-509, November.
    9. Matthijs J. Warrens, 2016. "Inequalities Between Similarities for Numerical Data," Journal of Classification, Springer;The Classification Society, vol. 33(1), pages 141-148, April.
    10. Jonathon J. O’Brien & Michael T. Lawson & Devin K. Schweppe & Bahjat F. Qaqish, 2020. "Suboptimal Comparison of Partitions," Journal of Classification, Springer;The Classification Society, vol. 37(2), pages 435-461, July.
    11. Matthijs Warrens, 2009. "k-Adic Similarity Coefficients for Binary (Presence/Absence) Data," Journal of Classification, Springer;The Classification Society, vol. 26(2), pages 227-245, August.
    12. Matthijs Warrens, 2010. "A Kraemer-type Rescaling that Transforms the Odds Ratio into the Weighted Kappa Coefficient," Psychometrika, Springer;The Psychometric Society, vol. 75(2), pages 328-330, June.
    13. Isabella Morlini & Sergio Zani, 2012. "A New Class of Weighted Similarity Indices Using Polytomous Variables," Journal of Classification, Springer;The Classification Society, vol. 29(2), pages 199-226, July.
    14. Thomas C. Ford & John M. Colombi & David R. Jacques & Scott R. Graham, 2009. "On the application of classification concepts to systems engineering design and evaluation," Systems Engineering, John Wiley & Sons, vol. 12(2), pages 141-154, June.
    15. Guohuan Su & Adam Mertel & Sébastien Brosse & Justin M. Calabrese, 2023. "Species invasiveness and community invasibility of North American freshwater fish fauna revealed via trait-based analysis," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    16. Aliprantis, Dionissi & Martin, Hal & Tauber, Kristen, 2024. "What determines the success of housing mobility programs?," Journal of Housing Economics, Elsevier, vol. 65(C).
    17. Dionissi Aliprantis & Daniel R. Carroll & Eric Young, 2021. "The Racial Wealth Gap and Access to Opportunity Neighborhoods," Economic Commentary, Federal Reserve Bank of Cleveland, vol. 2021(18), pages 1-5, September.
    18. Michael Brusco & J Dennis Cradit & Douglas Steinley, 2021. "A comparison of 71 binary similarity coefficients: The effect of base rates," PLOS ONE, Public Library of Science, vol. 16(4), pages 1-19, April.
    19. repec:cdl:itsrrp:qt6cb1f85c is not listed on IDEAS
    20. Niemann, Helen & Moehrle, Martin G. & Frischkorn, Jonas, 2017. "Use of a new patent text-mining and visualization method for identifying patenting patterns over time: Concept, method and test application," Technological Forecasting and Social Change, Elsevier, vol. 115(C), pages 210-220.
    21. Michael J. Greenacre & Patrick J. F. Groenen, 2016. "Weighted Euclidean Biplots," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 442-459, October.
