IDEAS home Printed from https://ideas.repec.org/p/tin/wpaper/20230055.html

Fuzzy firm name matching: Merging Amadeus firm data to PATSTAT

Author

Listed:
  • Leon Bremer

    (Vrije Universiteit Amsterdam)

Abstract

When merging firms across large databases in the absence of common identifiers, text algorithms can help. I propose a high-performance fuzzy firm name matching algorithm that uses existing computational methods and works even under hardware restrictions. The algorithm consists of four steps: (1) cleaning, (2) similarity scoring, (3) a decision rule based on supervised machine learning, and (4) group identification using community detection. The algorithm is applied to merge firms in the Amadeus Financials and Subsidiaries databases, which contain firm-level business and ownership information, with applicants in PATSTAT, a worldwide patent database. In this application, the algorithm vastly outperforms an exact string match, increasing the number of matched firms in the Amadeus Financials (Subsidiaries) database by 116% (160%). 53% (74%) of this improvement is due to cleaning, and another 41% (50%) improvement is due to similarity matching. 18.1% of all patent applications since 1950 are matched to firms in the Amadeus databases, compared to 2.6% for an exact name match.
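
The four steps named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, and every concrete choice in it is an assumption made for illustration: the `LEGAL_FORMS` token list is hypothetical, the similarity score uses the standard library's `difflib.SequenceMatcher` ratio (the abstract does not name a measure), a fixed threshold stands in for the supervised machine learning decision rule, and connected components via union-find stand in for community detection.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical legal-form tokens to strip; a realistic list would be far longer.
LEGAL_FORMS = {"inc", "ltd", "bv", "nv", "gmbh", "ag", "sa", "plc", "corp", "llc"}


def clean(name):
    """Step 1 (cleaning): lowercase, drop punctuation, remove legal-form tokens."""
    lowered = name.lower().replace(".", "")       # "N.V." becomes "nv"
    tokens = re.sub(r"[^a-z0-9 ]", " ", lowered).split()
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


def similarity(a, b):
    """Step 2 (similarity scoring): score in [0, 1]; difflib is a stand-in."""
    return SequenceMatcher(None, a, b).ratio()


def is_match(score, threshold=0.85):
    """Step 3 (decision rule): a fixed cut-off replaces the trained classifier."""
    return score >= threshold


def group(names, threshold=0.85):
    """Step 4 (group identification): connected components over accepted pairs."""
    cleaned = [clean(n) for n in names]
    parent = list(range(len(names)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All-pairs comparison is quadratic, so this only suits small inputs.
    for i, j in combinations(range(len(names)), 2):
        if is_match(similarity(cleaned[i], cleaned[j]), threshold):
            parent[find(i)] = find(j)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


if __name__ == "__main__":
    sample = ["Philips Electronics N.V.", "PHILIPS ELECTRONICS NV", "Siemens AG"]
    for members in group(sample):
        print(members)
```

In this sketch the two Philips spellings clean to the same string and end up in one group, while the Siemens entry stays on its own. The character-level cleaning also discards non-ASCII letters, and the all-pairs loop would not scale to databases of the size discussed in the abstract; both are simplifications of the sketch, not properties of the paper's method.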

Suggested Citation

  • Leon Bremer, 2023. "Fuzzy firm name matching: Merging Amadeus firm data to PATSTAT," Tinbergen Institute Discussion Papers 23-055/VIII, Tinbergen Institute.
  • Handle: RePEc:tin:wpaper:20230055

    Download full text from publisher

    File URL: https://papers.tinbergen.nl/23055.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Bryan Kelly & Dimitris Papanikolaou & Amit Seru & Matt Taddy, 2021. "Measuring Technological Innovation over the Long Run," American Economic Review: Insights, American Economic Association, vol. 3(3), pages 303-320, September.
    2. Florian Seliger & Gaétan de Rassenfosse & Jan Kozak, 2019. "Geocoding of worldwide patent data," KOF Working papers 19-458, KOF Swiss Economic Institute, ETH Zurich.
    3. Michele Pezzoni & Francesco Lissoni & Gianluca Tarasconi, 2014. "How to kill inventors: testing the Massacrator© algorithm for inventor disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(1), pages 477-504, October.
    4. Michele Peruzzi & Georg Zachmann & Reinhilde Veugelers, 2014. "Remerge- regression-based record linkage with an application to PATSTAT," Working Papers 852, Bruegel.
    5. Eugenie Dugoua & Marion Dumas & Joëlle Noailly, 2022. "Text as Data in Environmental Economics and Policy," Review of Environmental Economics and Policy, University of Chicago Press, vol. 16(2), pages 346-356.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ferrucci, Edoardo, 2020. "Migration, innovation and technological diversion: German patenting after the collapse of the Soviet Union," Research Policy, Elsevier, vol. 49(9).
    2. Deyun Yin & Kazuyuki Motohashi & Jianwei Dang, 2020. "Large-scale name disambiguation of Chinese patent inventors (1985–2016)," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(2), pages 765-790, February.
    3. Niccolò Innocenti & Francesco Capone & Luciana Lazzeretti & Sergio Petralia, 2022. "The role of inventors’ networks and variety for breakthrough inventions," Papers in Regional Science, Wiley Blackwell, vol. 101(1), pages 37-57, February.
    4. Tubiana, Matteo & Miguelez, Ernest & Moreno, Rosina, 2022. "In knowledge we trust: Learning-by-interacting and the productivity of inventors," Research Policy, Elsevier, vol. 51(1).
    5. Dieter F. Kogler & Jürgen Essletzbichler & David L. Rigby, 2017. "The evolution of specialization in the EU15 knowledge space," Journal of Economic Geography, Oxford University Press, vol. 17(2), pages 345-373.
    6. Bergé, Laurent & Carayol, Nicolas & Roux, Pascale, 2018. "How do inventor networks affect urban invention?," Regional Science and Urban Economics, Elsevier, vol. 71(C), pages 137-162.
    7. Ronald B. Davies & Dieter Franz Kogler & Ryan M. Hynes, 2020. "Patent Boxes and the Success Rate of Applications," Working Papers 202018, School of Economics, University College Dublin.
    8. Abbasiharofteh, Milad & Kogler, Dieter F. & Lengyel, Balázs, 2023. "Atypical combinations of technologies in regional co-inventor networks," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 52(10), pages 1-1.
    9. Ferrucci, Edoardo & Lissoni, Francesco, 2019. "Foreign inventors in Europe and the United States: Diversity and Patent Quality," Research Policy, Elsevier, vol. 48(9), pages 1-1.
    10. Carlo Corradini, 2019. "Location determinants of green technological entry: evidence from European regions," Small Business Economics, Springer, vol. 52(4), pages 845-858, April.
    11. Cristelli, Gabriele & Lissoni, Francesco, 2020. "Free movement of inventors: open-border policy and innovation in Switzerland," MPRA Paper 107433, University Library of Munich, Germany.
    12. Clément Gorin, 2017. "Accessibility, absorptive capacity and innovation in European urban areas," Working Papers 1722, Groupe d'Analyse et de Théorie Economique Lyon St-Étienne (GATE Lyon St-Étienne), Université de Lyon.
    13. Holger Graf, 2017. "Regional Innovator Networks - A Review and an Application with R," Jena Economics Research Papers 2017-016, Friedrich-Schiller-University Jena.
    14. Ufuk Akcigit & Santiago Caicedo & Ernest Miguelez & Stefanie Stantcheva & Valerio Sterzi, 2018. "Dancing with the Stars: Innovation Through Interactions," NBER Working Papers 24466, National Bureau of Economic Research, Inc.
    15. Carayol, Nicolas & Bergé, Laurent & Cassi, Lorenzo & Roux, Pascale, 2019. "Unintended triadic closure in social networks: The strategic formation of research collaborations between French inventors," Journal of Economic Behavior & Organization, Elsevier, vol. 163(C), pages 218-238.
    16. Francesco Capone & Luciana Lazzeretti & Niccolò Innocenti, 2021. "Innovation and diversity: the role of knowledge networks in the inventive capacity of cities," Small Business Economics, Springer, vol. 56(2), pages 773-788, February.
    17. Dorner, Matthias & Harhoff, Dietmar & Gaessler, Fabian & Hoisl, Karin & Poege, Felix, 2019. "Linked Inventor Biography Data 1980-2014 : (INV-BIO ADIAB 8014)," FDZ Datenreport. Documentation on Labour Market Data 201803_en, Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nürnberg [Institute for Employment Research, Nuremberg, Germany].
    18. Shuo Xu & Ling Li & Xin An, 2023. "Do academic inventors have diverse interests?," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(2), pages 1023-1053, February.
    19. Stefano Breschi & Francesco Lissoni & Ernest Miguelez, 2017. "Foreign-origin inventors in the USA: testing for diaspora and brain gain effects," Journal of Economic Geography, Oxford University Press, vol. 17(5), pages 1009-1038.
    20. Monica Coffano & Dominique Foray & Michele Pezzoni, 2017. "Does inventor centrality foster regional innovation? The case of the Swiss medical devices sector," Regional Studies, Taylor & Francis Journals, vol. 51(8), pages 1206-1218, August.

    More about this item

    Keywords

    Fuzzy name matching; supervised machine learning; name disambiguation; patents

    JEL classification:

    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
    • C88 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Other Computer Software
    • O34 - Economic Development, Innovation, Technological Change, and Growth - - Innovation; Research and Development; Technological Change; Intellectual Property Rights - - - Intellectual Property and Intellectual Capital

