
Matching UK Business Microdata – A Study Using ONS and CBI Business Surveys

Author

Listed:
  • Michael J Mahony
  • Josh Martin

Abstract

Business data linkage is a powerful tool for unlocking insights that are often not possible using data from one source alone. However, it can be challenging and often requires a number of decisions about how the linking should be conducted. Such decisions can affect the match rates and the conclusions drawn from the linked data. To inform researchers about common pitfalls in data linkage, and some potential solutions, we provide an account of a business data linkage exercise. We link three sources: a survey of businesses conducted by the Confederation of British Industry (CBI), the FAME dataset of business financial data from Bureau van Dijk, and the Inter-Departmental Business Register (IDBR). This requires the use of business names and addresses as linking 'keys', which are subject to error and imprecision, resulting in incomplete matching. We detail a novel solution for choosing among 'multiple matches' when a propensity-score matching approach is unable to select a definitive match, which we implemented when linking the CBI data with the IDBR. We report match rates, which are around 50 per cent when linking the CBI survey with FAME and around 90 per cent when linking the CBI survey with the IDBR. We also report variation by geography, size and time period. We then use the IDBR-linked CBI data to match on data from various ONS business surveys, which typically have match rates of less than 50 per cent, and in some cases far lower. We conclude with some recommendations for researchers conducting data linkage.
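
To make the idea of linking on imprecise name and address keys concrete, the following is a minimal Python sketch. It is not the authors' method: it swaps in plain string similarity from the standard library in place of the propensity-score approach the abstract mentions, and it only flags tied candidates rather than resolving them. The record layout, the `LEGAL_SUFFIXES` list, the 0.85 threshold, the use of postcode as a blocking key, and all example records are assumptions made purely for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal-form tokens to drop before comparing names.
LEGAL_SUFFIXES = {"ltd", "limited", "plc", "llp", "co", "company"}


def normalise(name: str) -> str:
    """Lowercase, strip punctuation and drop common legal-form tokens."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] between two normalised names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def link(survey, register, threshold=0.85):
    """Map each survey id to a register id, "MULTIPLE", or None."""
    links = {}
    for sid, (sname, spostcode) in survey.items():
        # Block on postcode: only compare records sharing the same postcode.
        scored = [
            (similarity(sname, rname), rid)
            for rid, (rname, rpostcode) in register.items()
            if rpostcode == spostcode
        ]
        scored = [s for s in scored if s[0] >= threshold]
        if not scored:
            links[sid] = None  # no candidate is similar enough
            continue
        best = max(score for score, _ in scored)
        winners = [rid for score, rid in scored if score == best]
        # Tied best scores are flagged, not resolved, in this sketch.
        links[sid] = winners[0] if len(winners) == 1 else "MULTIPLE"
    return links


def match_rate(links):
    """Share of survey records linked to exactly one register record."""
    matched = sum(1 for v in links.values() if v not in (None, "MULTIPLE"))
    return matched / len(links)


# Made-up example records: id -> (business name, postcode).
survey = {
    "s1": ("Acme Widgets Ltd.", "AB1 2CD"),
    "s3": ("Unknown Trading", "ZZ9 9ZZ"),
    "s4": ("Delta Foods", "IJ5 6KL"),
}
register = {
    "r1": ("ACME WIDGETS LIMITED", "AB1 2CD"),
    "r3": ("Delta Foods Ltd", "IJ5 6KL"),
    "r4": ("Delta Foods Limited", "IJ5 6KL"),
}

links = link(survey, register)
rate = match_rate(links)
```

In this toy data one survey record links uniquely, one finds no candidate, and one ties between two register entries that normalise to the same name. The tie is the kind of 'multiple match' case that needs a further decision rule, and how such cases are counted directly changes the reported match rate.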

Suggested Citation

  • Michael J Mahony & Josh Martin, 2022. "Matching UK Business Microdata – A Study Using ONS and CBI Business Surveys," Economic Statistics Centre of Excellence (ESCoE) Technical Reports ESCOE-TR-14, Economic Statistics Centre of Excellence (ESCoE).
  • Handle: RePEc:nsr:escoet:escoe-tr-14

    Download full text from publisher

    File URL: https://escoe-website.s3.amazonaws.com/wp-content/uploads/2022/01/18094535/ESCoE-TR-14.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Kevin Lee & Michael Mahony & Paul Mizen, 2020. "The CBI Suite of Business Surveys," Economic Statistics Centre of Excellence (ESCoE) Technical Reports ESCOE-TR-08, Economic Statistics Centre of Excellence (ESCoE).
    2. Raffo, Julio & Lhuillery, Stéphane, 2009. "How to play the "Names Game": Patent retrieval comparing different heuristics," Research Policy, Elsevier, vol. 38(10), pages 1617-1627, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Nils Grashof, 2020. "Sinking or swimming in the cluster labour pool? A firm-specific analysis of the effect of specialized labour," Jena Economics Research Papers 2020-006, Friedrich-Schiller-University Jena.
    2. Battke, Benedikt & Schmidt, Tobias S. & Stollenwerk, Stephan & Hoffmann, Volker H., 2016. "Internal or external spillovers—Which kind of knowledge is more likely to flow within or across technologies," Research Policy, Elsevier, vol. 45(1), pages 27-41.
    3. Deyun Yin & Kazuyuki Motohashi & Jianwei Dang, 2020. "Large-scale name disambiguation of Chinese patent inventors (1985–2016)," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(2), pages 765-790, February.
    4. Favaro, Donata & Ninka, Eniel & Turvani, Margherita, 2012. "Productivity in innovation: the role of inventor connections and mobility," MPRA Paper 38950, University Library of Munich, Germany.
    5. Paulo Vinícius Marcondes Cordeiro & Dario Eduardo Amaral Dergint & Kazuo Hatakeyama, 2014. "Proposal Of Method For An Automatic Complementarities Search Between Companies' R&D," International Journal of Innovation and Technology Management (IJITM), World Scientific Publishing Co. Pte. Ltd., vol. 11(02), pages 1-21.
    6. Foray, D. & Raffo, J., 2014. "The emergence of an educational tool industry: Opportunities and challenges for innovation in education," Research Policy, Elsevier, vol. 43(10), pages 1707-1715.
    7. Benjamin Balsmeier & Mohamad Assaf & Tyler Chesebro & Gabe Fierro & Kevin Johnson & Scott Johnson & Guan‐Cheng Li & Sonja Lück & Doug O'Reagan & Bill Yeh & Guangzheng Zang & Lee Fleming, 2018. "Machine learning and natural language processing on the patent corpus: Data, tools, and new measures," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 27(3), pages 535-553, September.
    8. Zi‐Lin He & Tony W. Tong & Yuchen Zhang & Wenlong He, 2018. "Constructing a Chinese Patent Database of listed firms in China: Descriptions, lessons, and insights," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 27(3), pages 579-606, September.
    9. Diane Coyle & Ayantola Alayande, 2023. "Investment in the UK - Longer Term Trends," Working Papers 040, The Productivity Institute.
    10. Martin Kalthaus, 2020. "Knowledge recombination along the technology life cycle," Journal of Evolutionary Economics, Springer, vol. 30(3), pages 643-704, July.
    11. Loriane Py & Antoine Bozio & Delphine Irac, 2014. "Impact of research tax credit on R&D and innovation: evidence from the 2008 French reform," EcoMod2014 6873, EcoMod.
    12. Grashof, Nils, 2020. "Putting the watering can away Towards a targeted (problem-oriented) cluster policy framework," Papers in Innovation Studies 2020/4, Lund University, CIRCLE - Centre for Innovation Research.
    13. Martin Ganco & Rosemarie H. Ziedonis & Rajshree Agarwal, 2015. "More stars stay, but the brightest ones still leave: Job hopping in the shadow of patent enforcement," Strategic Management Journal, Wiley Blackwell, vol. 36(5), pages 659-685, May.
    14. Simeth, Markus & Lhuillery, Stephane, 2015. "How do firms develop capabilities for scientific disclosure?," Research Policy, Elsevier, vol. 44(7), pages 1283-1295.
    15. Guangyuan Hu & Stephen Carley & Li Tang, 2012. "Visualizing nanotechnology research in Canada: evidence from publication activities, 1990–2009," The Journal of Technology Transfer, Springer, vol. 37(4), pages 550-562, August.
    16. Ventura, Samuel L. & Nugent, Rebecca & Fuchs, Erica R.H., 2015. "Seeing the non-stars: (Some) sources of bias in past disambiguation approaches and a new public tool leveraging labeled records," Research Policy, Elsevier, vol. 44(9), pages 1672-1701.
    17. Josh Martin & Kyle Jones, 2023. "An Occupation and Asset-Driven Approach to Capital Utilization Adjustment in Productivity Statistics," NBER Chapters, in: Technology, Productivity, and Economic Growth, pages 285-322, National Bureau of Economic Research, Inc.
    18. William Arant & Dirk Fornahl & Nils Grashof & Kolja Hesse & Cathrin Söllner, 2019. "University-industry collaborations—The key to radical innovations? [Universität-Industrie-Kooperationen – Der Schlüssel zu radikalen Innovationen?]," Review of Regional Research: Jahrbuch für Regionalwissenschaft, Springer;Gesellschaft für Regionalforschung (GfR), vol. 39(2), pages 119-141, October.
    19. Feldman, Maryann & Kenney, Martin & Lissoni, Francesco, 2015. "The New Data Frontier," Research Policy, Elsevier, vol. 44(9), pages 1629-1632.
    20. Carayol, Nicolas & Bergé, Laurent & Cassi, Lorenzo & Roux, Pascale, 2019. "Unintended triadic closure in social networks: The strategic formation of research collaborations between French inventors," Journal of Economic Behavior & Organization, Elsevier, vol. 163(C), pages 218-238.

    More about this item

    JEL classification:

    • C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis
    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
    • C89 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Other
