Printed from https://ideas.repec.org/p/ces/ifowps/_409.html

Machine Learning Based Linkage of Company Data for Economic Research: Application to the EBDC Business Panels

Author

Listed:
  • Valentin Reich

Abstract

This article presents a comprehensive approach to probabilistic linkage of German company data using Machine Learning and Natural Language Processing techniques. Here, the long-running ifo Institute surveys are linked to financial information in the Orbis database by addressing the unique challenges of company data linkage, such as corporate structures and linguistic nuances in company names. Compared to a previous linkage, the approach achieves improved match rates and is able to re-evaluate existing matches. This article contributes best practice advice for company data linkage and serves as documentation for the resulting research dataset.

Suggested Citation

  • Valentin Reich, 2024. "Machine Learning Based Linkage of Company Data for Economic Research: Application to the EBDC Business Panels," ifo Working Paper Series 409, ifo Institute - Leibniz Institute for Economic Research at the University of Munich.
  • Handle: RePEc:ces:ifowps:_409

    Download full text from publisher

    File URL: https://www.ifo.de/DocDL/wp-2024-409_reich_linkage-of-company-data.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Bruce D. Meyer & Nikolas Mittag, 2019. "Using Linked Survey and Administrative Data to Better Measure Income: Implications for Poverty, Program Effectiveness, and Holes in the Safety Net," American Economic Journal: Applied Economics, American Economic Association, vol. 11(2), pages 176-204, April.
    2. Zeno Enders & Franziska Hünnekes & Gernot Müller, 2022. "Firm Expectations and Economic Activity," Journal of the European Economic Association, European Economic Association, vol. 20(6), pages 2396-2439.
    3. Kilian Huber, 2018. "Disentangling the Effects of a Banking Crisis: Evidence from German Firms and Counties," American Economic Review, American Economic Association, vol. 108(3), pages 868-898, March.
    4. Jamie C. Moore & Peter W. F. Smith & Gabriele B. Durrant, 2018. "Correlates of record linkage and estimating risks of non‐linkage biases in business data sets," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 181(4), pages 1211-1230, October.
    5. Link, Sebastian & Peichl, Andreas & Roth, Christopher & Wohlfart, Johannes, 2023. "Information frictions among firms and households," Journal of Monetary Economics, Elsevier, vol. 135(C), pages 99-115.
    6. Abowd, John M. & Vilhuber, Lars, 2005. "The Sensitivity of Economic Statistics to Coding Errors in Personal Identifiers," Journal of Business & Economic Statistics, American Statistical Association, vol. 23, pages 133-152, April.
    7. John Cuffe & Nathan Goldschlag, 2018. "Squeezing More Out of Your Data: Business Record Linkage with Python," Working Papers 18-46, Center for Economic Studies, U.S. Census Bureau.
    8. Michele Peruzzi & Georg Zachmann & Reinhilde Veugelers, 2014. "Remerge: regression-based record linkage with an application to PATSTAT," Bruegel Working Papers 852, Bruegel.
    9. Sebastian Link, 2018. "Harmonization and Interpretation of the ifo Business Survey's Micro Data," CESifo Working Paper Series 7427, CESifo.
    10. Anna Gumpert & Henrike Steimer & Manfred Antoni, 2022. "Firm Organization with Multiple Establishments [“Organizing Offshoring: Middle Managers and Communication Costs”]," The Quarterly Journal of Economics, Oxford University Press, vol. 137(2), pages 1091-1138.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Link, Sebastian, 2020. "Harmonization of the ifo Business Survey’s Micro Data," Journal of Economics and Statistics (Jahrbuecher fuer Nationaloekonomie und Statistik), De Gruyter, vol. 240(4), pages 543-555, August.
    2. Stefan Sauer & Klaus Wohlrabe, 2024. "What Is Behind the ifo Business Climate? Evidence from a Meta-Survey," CESifo Working Paper Series 11482, CESifo.
    3. John Abowd & Martha Stinson, 2011. "Estimating Measurement Error in SIPP Annual Job Earnings: A Comparison of Census Bureau Survey and SSA Administrative Data," Working Papers 11-20, Center for Economic Studies, U.S. Census Bureau.
    4. Ma, Yongfan & Hu, Xingcun, 2024. "Shadow banking and SME investment: Evidence from China's new asset management regulations," International Review of Economics & Finance, Elsevier, vol. 93(PA), pages 332-349.
    5. Dixon, Huw D. & Grimme, Christian, 2022. "State-dependent or time-dependent pricing? New evidence from a monthly firm-level survey: 1980–2017," European Economic Review, Elsevier, vol. 150(C).
    6. Michael Berlemann & Vera Jahn & Robert Lehmann, 2018. "Auswege aus dem Dilemma der empirischen Mittelstandsforschung," ifo Schnelldienst, ifo Institute - Leibniz Institute for Economic Research at the University of Munich, vol. 71(23), pages 22-28, December.
    7. Chen, Cheng & Senga, Tatsuro & Sun, Chang & Zhang, Hongyong, 2023. "Uncertainty, imperfect information, and expectation formation over the firm’s life cycle," Journal of Monetary Economics, Elsevier, vol. 140(C), pages 60-77.
    8. Straub, Ludwig & Ulbricht, Robert, 2015. "Endogenous Uncertainty and Credit Crunches," TSE Working Papers 15-604, Toulouse School of Economics (TSE), revised Dec 2017.
    9. Bustos, Emil, 2023. "The Effect of Financial Constraints on Inventory Holdings," Working Paper Series 1463, Research Institute of Industrial Economics.
    10. Fetzer, Thiemo & Yotzov, Ivan, 2023. "(How) Do electoral surprises drive business cycles? Evidence from a new dataset," CAGE Online Working Paper Series 672, Competitive Advantage in the Global Economy (CAGE).
    11. Wehrhöfer, Nils, 2023. "Energy prices and inflation expectations: Evidence from households and firms," Discussion Papers 28/2023, Deutsche Bundesbank.
    12. Fabiano Schivardi & Enrico Sette & Guido Tabellini, 2022. "Credit Misallocation During the European Financial Crisis," The Economic Journal, Royal Economic Society, vol. 132(641), pages 391-423.
    13. Link, Sebastian, 2024. "The price and employment response of firms to the introduction of minimum wages," Journal of Public Economics, Elsevier, vol. 239(C).
    14. Berger, Allen N. & Molyneux, Phil & Wilson, John O.S., 2020. "Banks and the real economy: An assessment of the research," Journal of Corporate Finance, Elsevier, vol. 62(C).
    15. Meyer, Bruce D. & Mittag, Nikolas, 2019. "Combining Administrative and Survey Data to Improve Income Measurement," IZA Discussion Papers 12266, Institute of Labor Economics (IZA).
    16. Clément de Chaisemartin & Xavier D'Haultfœuille, 2020. "Difference-in-Differences Estimators of Intertemporal Treatment Effects," Papers 2007.04267, arXiv.org, revised Dec 2024.
    17. Breen, Casey & Goldstein, Joshua R., 2022. "Berkeley Unified Numident Mortality Database: Public Administrative Records for Individual-Level Mortality Research," SocArXiv pc294, Center for Open Science.
    18. Sebastian Doerr & Stefan Gissler & José-Luis Peydró & Hans-Joachim Voth, 2018. "From finance to fascism," Economics Working Papers 1651, Department of Economics and Business, Universitat Pompeu Fabra, revised Nov 2020.
    19. Kabiri, Ali & Malone, Vlad & Roland, Isabelle Angeline Madeleine & Spatareanu, Mariana, 2020. "Bank default risk propagation along supply chains: evidence from the UK," LSE Research Online Documents on Economics 121832, London School of Economics and Political Science, LSE Library.
    20. Alfaro, Laura & García-Santana, Manuel & Moral-Benito, Enrique, 2021. "On the direct and indirect real effects of credit supply shocks," Journal of Financial Economics, Elsevier, vol. 139(3), pages 895-921.

    More about this item

    Keywords

    record linkage; company data; Orbis; survey data

    JEL classification:

    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
    • C88 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Other Computer Software

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.