
Adaptive Fuzzy String Matching: How to Merge Datasets with Only One (Messy) Identifying Field

Authors
  • Kaufman, Aaron R.
  • Klevs, Aja

Abstract

A single dataset is rarely sufficient to address a question of substantive interest. Instead, most applied data analysis combines data from multiple sources. Very rarely do two datasets contain the same identifiers with which to merge datasets; fields like name, address, and phone number may be entered incorrectly, missing, or in dissimilar formats. Combining multiple datasets absent a unique identifier that unambiguously connects entries is called the record linkage problem. While recent work has made great progress in the case where there are many possible fields on which to match, the much more uncertain case of only one identifying field remains unsolved: this fuzzy string matching problem, both its own problem and a component of standard record linkage problems, is our focus. We design and validate an algorithmic solution called Adaptive Fuzzy String Matching rooted in adaptive learning, and show that our tool identifies more matches, with higher precision, than existing solutions. Finally, we illustrate its validity and practical value through applications to matching organizations, places, and individuals.
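
To make the single-field problem concrete, the following is a minimal Python sketch of a static baseline for fuzzy string matching. It is not the authors' Adaptive Fuzzy String Matching algorithm, which the abstract describes only as rooted in adaptive learning. The function names, the normalization rule, and the 0.8 threshold are illustrative assumptions, and the similarity score comes from the standard library's difflib.SequenceMatcher rather than from any measure named in the article.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized strings (difflib ratio)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_matches(left, right, threshold=0.8):
    """Map each entry in `left` to its most similar entry in `right`.

    Entries whose best score falls below `threshold` map to None.
    The threshold is a hypothetical default, not a value from the article.
    """
    matches = {}
    for name in left:
        if not right:
            matches[name] = None
            continue
        # Pick the candidate with the highest similarity score.
        score, candidate = max((similarity(name, other), other) for other in right)
        matches[name] = candidate if score >= threshold else None
    return matches


if __name__ == "__main__":
    # Hypothetical organization names, invented for illustration only.
    left = ["Acme Corp.", "Univ. of Michigan", "Zebra Holdings"]
    right = ["ACME Corp", "University of Michigan", "Globex Inc"]
    for name, match in best_matches(left, right).items():
        print(f"{name!r} -> {match!r}")
```

A fixed similarity measure with a fixed cutoff, as in this sketch, is the kind of static rule that has to be retuned by hand for each new pair of datasets.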

Suggested Citation

  • Kaufman, Aaron R. & Klevs, Aja, 2022. "Adaptive Fuzzy String Matching: How to Merge Datasets with Only One (Messy) Identifying Field," Political Analysis, Cambridge University Press, vol. 30(4), pages 590-596, October.
  • Handle: RePEc:cup:polals:v:30:y:2022:i:4:p:590-596_8

    Download full text from publisher

    File URL: https://www.cambridge.org/core/product/identifier/S1047198721000383/type/journal_article
    File Function: link to article abstract page
    Download Restriction: no
