
Fuzzy firm name matching: Merging Amadeus firm data to PATSTAT

Author

  • Leon Bremer (Vrije Universiteit Amsterdam)

Abstract

When merging firms across large databases in the absence of common identifiers, text algorithms can help. I propose a high-performance fuzzy firm name matching algorithm that uses existing computational methods and works even under hardware restrictions. The algorithm consists of four steps: (1) cleaning, (2) similarity scoring, (3) a decision rule based on supervised machine learning, and (4) group identification using community detection. The algorithm is applied to merge firms in the Amadeus Financials and Subsidiaries databases, which contain firm-level business and ownership information, with applicants in PATSTAT, a worldwide patent database. In this application, the algorithm vastly outperforms an exact string match, increasing the number of matched firms in the Amadeus Financials (Subsidiaries) database by 116% (160%). Cleaning alone raises the number of matches by 53% (74%), and similarity matching adds a further 41% (50%) on top of that. In total, 18.1% of all patent applications since 1950 are matched to firms in the Amadeus databases, compared with 2.6% for an exact name match.
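The four steps described in the abstract can be sketched as a small Python pipeline. This is a minimal illustration under stated assumptions, not the author's implementation: the legal-form list is hypothetical, `difflib.SequenceMatcher` stands in for the paper's similarity scores, a one-feature decision stump fitted to labelled pairs stands in for the supervised classifier, and connected components (found with union-find) replace community detection with a simpler grouping rule. All function names are illustrative.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical, deliberately short list of legal-form tokens removed in step (1).
LEGAL_FORMS = {"ag", "bv", "co", "corp", "gmbh", "inc", "llc", "ltd", "nv", "plc", "sa"}


def clean(name: str) -> str:
    """Step (1): normalise a raw firm name to a comparable key."""
    # Split accented letters into base letter + combining mark, then drop the marks.
    decomposed = unicodedata.normalize("NFKD", name)
    name = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lower-case, delete dots so "N.V." becomes "nv", turn other punctuation into spaces.
    name = re.sub(r"[^a-z0-9 ]", " ", name.lower().replace(".", ""))
    return " ".join(t for t in name.split() if t not in LEGAL_FORMS)


def similarity(a: str, b: str) -> float:
    """Step (2): score two cleaned names in [0, 1] (difflib stand-in)."""
    return SequenceMatcher(None, a, b).ratio()


def fit_threshold(labelled: list[tuple[float, bool]]) -> float:
    """Step (3) stand-in: pick the cut-off that best classifies labelled pairs.

    `labelled` holds (similarity score, is same firm) tuples. This one-feature
    decision stump is far simpler than a real supervised classifier.
    """
    def accuracy(t: float) -> float:
        return sum((s >= t) == y for s, y in labelled) / len(labelled)

    return max(sorted({s for s, _ in labelled}), key=accuracy)


def group(names: list[str], links: list[tuple[str, str]]) -> list[set[str]]:
    """Step (4) stand-in: connected components of the match graph via union-find."""
    parent = {n: n for n in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)
    components: dict[str, set[str]] = {}
    for n in names:
        components.setdefault(find(n), set()).add(n)
    return list(components.values())


def match_names(raw_names: list[str], threshold: float = 0.9) -> list[set[str]]:
    """Run all four steps; compares every pair of names, so it is O(n^2)."""
    keys = {raw: clean(raw) for raw in raw_names}
    links = [(a, b) for a, b in combinations(raw_names, 2)
             if similarity(keys[a], keys[b]) >= threshold]
    return group(raw_names, links)


if __name__ == "__main__":
    labelled = [(0.95, True), (0.9, True), (0.6, False), (0.4, False)]
    cutoff = fit_threshold(labelled)
    names = ["Philips N.V.", "PHILIPS NV", "Siemens AG", "Siemens", "Nokia Oyj"]
    for firm_group in match_names(names, cutoff):
        print(sorted(firm_group))
```

Comparing all pairs is quadratic in the number of names, so at the scale of Amadeus and PATSTAT a practical version would need blocking or an indexed nearest-neighbour search before scoring. Connected components can also chain unrelated firms together through a single spurious link, which is one plausible reason to prefer community detection for step (4).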
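The decomposition of the gains reported in the abstract can be checked with a few lines of arithmetic. Assuming (as an interpretation, not a statement from the paper) that the similarity-matching gain of 41% (50%) applies on top of the count after cleaning, the cleaning and similarity gains compound multiplicatively. Read instead as additive shares of the total improvement, the Subsidiaries figures would sum to 124%, which is impossible.

```python
# Relative increases in matched firms reported in the abstract.
cleaning_gain = {"Financials": 0.53, "Subsidiaries": 0.74}
similarity_gain = {"Financials": 0.41, "Subsidiaries": 0.50}


def compounded(first: float, second: float) -> float:
    """Total relative increase when gain `second` applies on top of gain `first`."""
    return (1 + first) * (1 + second) - 1


for database, gain in cleaning_gain.items():
    total = compounded(gain, similarity_gain[database])
    print(f"{database}: {total:.1%}")  # Financials: 115.7%, Subsidiaries: 161.0%
```

Both results are close to the reported 116% and 160%. The one-point gap for Subsidiaries is consistent with the inputs having been rounded to whole percentages.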

Suggested Citation

  • Leon Bremer, 2023. "Fuzzy firm name matching: Merging Amadeus firm data to PATSTAT," Tinbergen Institute Discussion Papers 23-055/VIII, Tinbergen Institute.
  • Handle: RePEc:tin:wpaper:20230055

    Download full text from publisher

    File URL: https://papers.tinbergen.nl/23055.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Florian Seliger & Gaétan de Rassenfosse & Jan Kozak, 2019. "Geocoding of worldwide patent data," KOF Working papers 19-458, KOF Swiss Economic Institute, ETH Zurich.
    2. Michele Pezzoni & Francesco Lissoni & Gianluca Tarasconi, 2014. "How to kill inventors: testing the Massacrator© algorithm for inventor disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(1), pages 477-504, October.
    3. Bryan Kelly & Dimitris Papanikolaou & Amit Seru & Matt Taddy, 2021. "Measuring Technological Innovation over the Long Run," American Economic Review: Insights, American Economic Association, vol. 3(3), pages 303-320, September.
    4. Michele Peruzzi & Georg Zachmann & Reinhilde Veugelers, 2014. "Remerge: regression-based record linkage with an application to PATSTAT," Bruegel Working Papers 852, Bruegel.
    5. Manuel Trajtenberg & Gil Shiff & Ran Melamed, 2009. "The 'Names Game': Harnessing Inventors' Patent Data for Economic Research," Annals of Economics and Statistics, GENES, issue 93-94, pages 67-77.
    6. Eugenie Dugoua & Marion Dumas & Joëlle Noailly, 2022. "Text as Data in Environmental Economics and Policy," Review of Environmental Economics and Policy, University of Chicago Press, vol. 16(2), pages 346-356.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Carayol, Nicolas & Bergé, Laurent & Cassi, Lorenzo & Roux, Pascale, 2019. "Unintended triadic closure in social networks: The strategic formation of research collaborations between French inventors," Journal of Economic Behavior & Organization, Elsevier, vol. 163(C), pages 218-238.
    2. Ferrucci, Edoardo, 2020. "Migration, innovation and technological diversion: German patenting after the collapse of the Soviet Union," Research Policy, Elsevier, vol. 49(9).
    3. Deyun Yin & Kazuyuki Motohashi & Jianwei Dang, 2020. "Large-scale name disambiguation of Chinese patent inventors (1985–2016)," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(2), pages 765-790, February.
    4. Niccolò Innocenti & Francesco Capone & Luciana Lazzeretti & Sergio Petralia, 2022. "The role of inventors’ networks and variety for breakthrough inventions," Papers in Regional Science, Wiley Blackwell, vol. 101(1), pages 37-57, February.
    5. Tubiana, Matteo & Miguelez, Ernest & Moreno, Rosina, 2022. "In knowledge we trust: Learning-by-interacting and the productivity of inventors," Research Policy, Elsevier, vol. 51(1).
    6. Dieter F. Kogler & Jürgen Essletzbichler & David L. Rigby, 2017. "The evolution of specialization in the EU15 knowledge space," Journal of Economic Geography, Oxford University Press, vol. 17(2), pages 345-373.
    7. Ronald B. Davies & Dieter Franz Kogler & Ryan M. Hynes, 2020. "Patent Boxes and the Success Rate of Applications," Working Papers 202018, School of Economics, University College Dublin.
    8. Benjamin Balsmeier & Mohamad Assaf & Tyler Chesebro & Gabe Fierro & Kevin Johnson & Scott Johnson & Guan‐Cheng Li & Sonja Lück & Doug O'Reagan & Bill Yeh & Guangzheng Zang & Lee Fleming, 2018. "Machine learning and natural language processing on the patent corpus: Data, tools, and new measures," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 27(3), pages 535-553, September.
    9. Carlo Corradini, 2019. "Location determinants of green technological entry: evidence from European regions," Small Business Economics, Springer, vol. 52(4), pages 845-858, April.
    10. Christian Rutzer & Dragan Filimonovic & Jeffrey T. Macher & Rolf Weder, 2026. "Towards Measuring Disruptive Innovation Across Countries," Papers 2603.17881, arXiv.org.
    11. Abbasiharofteh, Milad & Kogler, Dieter F. & Lengyel, Balázs, 2023. "Atypical combinations of technologies in regional co-inventor networks," Research Policy, Elsevier, vol. 52(10).
    12. DIODATO Dario, 2024. "Handbook of Economic Complexity for Policy," JRC Research Reports JRC138666, Joint Research Centre.
    13. Clément Gorin, 2017. "Accessibility, absorptive capacity and innovation in European urban areas," Working Papers 1722, Groupe d'Analyse et de Théorie Economique Lyon St-Etienne (GATE Lyon St-Etienne), Université de Lyon.
    14. Bergé, Laurent & Carayol, Nicolas & Roux, Pascale, 2018. "How do inventor networks affect urban invention?," Regional Science and Urban Economics, Elsevier, vol. 71(C), pages 137-162.
    15. Grazia Sveva Ascione & Valerio Sterzi & Andrea Vezzulli, 2025. "Terrorizer: a novel algorithm for patent assignee name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 130(12), pages 7303-7342, December.
    16. Holger Graf, 2017. "Regional Innovator Networks - A Review and an Application with R," Jena Economics Research Papers 2017-016, Friedrich-Schiller-University Jena.
    17. Francesco Capone & Luciana Lazzeretti & Niccolò Innocenti, 2021. "Innovation and diversity: the role of knowledge networks in the inventive capacity of cities," Small Business Economics, Springer, vol. 56(2), pages 773-788, February.
    18. Dorner, Matthias & Harhoff, Dietmar & Gaessler, Fabian & Hoisl, Karin & Poege, Felix, 2019. "Linked Inventor Biography Data 1980-2014 : (INV-BIO ADIAB 8014)," FDZ Datenreport. Documentation on Labour Market Data 201803_en, Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nürnberg [Institute for Employment Research, Nuremberg, Germany].
    19. Shuo Xu & Ling Li & Xin An, 2023. "Do academic inventors have diverse interests?," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(2), pages 1023-1053, February.
    20. Ferrucci, Edoardo & Lissoni, Francesco, 2019. "Foreign inventors in Europe and the United States: Diversity and Patent Quality," Research Policy, Elsevier, vol. 48(9), pages 1-1.

    More about this item


    JEL classification:

    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
    • C88 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Other Computer Software
    • O34 - Economic Development, Innovation, Technological Change, and Growth - - Innovation; Research and Development; Technological Change; Intellectual Property Rights - - - Intellectual Property and Intellectual Capital


