Printed from https://ideas.repec.org/p/zbw/zewdip/21021.html

Disambiguation by namesake risk assessment

Author

Listed:
  • Doherr, Thorsten

Abstract

Most bibliometric databases provide only names as the handle to individuals' careers, which leads to the problem of namesakes. We introduce a universal method to assess the risk of linking documents of different individuals who share the same name, with the goal of collecting the documents into personalized clusters. A theoretical setup for the probability of drawing a namesake, which depends on the number of namesakes in the population and the size of the observed unit, replaces the need for training datasets and thereby avoids the namesake bias caused by the inherent underestimation of namesakes in training and benchmark data. A Poisson model based on a master sample of unambiguously identified individuals estimates the main component, the number of namesakes for any given name. To implement the algorithm, we reduce the complexity of the data by resolving similarity in properties. At the core of the implementation is a mechanism that returns the unit size of the intersected mutual properties linking two documents. Because this mechanism is computationally demanding, we also discuss means to optimize the procedure.
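
To make the idea of a namesake risk that depends on the number of namesakes and the size of the observed unit more concrete, the following is a minimal sketch and not the paper's actual formula. It assumes that each namesake falls into a given unit independently with probability unit size divided by population size, and that the namesake count may be treated as Poisson distributed. The function names `namesake_risk` and `poisson_namesake_risk` and all parameter names are hypothetical.

```python
import math


def namesake_risk(num_namesakes: int, unit_size: int, population_size: int) -> float:
    """Chance that at least one of `num_namesakes` other people with the same
    name belongs to a unit of `unit_size` members.

    Illustrative assumption (not taken from the paper): each namesake lands
    in the unit independently with probability unit_size / population_size.
    """
    if num_namesakes < 0:
        raise ValueError("num_namesakes must be non-negative")
    if not 0 < unit_size <= population_size:
        raise ValueError("need 0 < unit_size <= population_size")
    share = unit_size / population_size
    # Complement of "no namesake at all falls into the unit".
    return 1.0 - (1.0 - share) ** num_namesakes


def poisson_namesake_risk(expected_namesakes: float, unit_size: int,
                          population_size: int) -> float:
    """Same risk when the namesake count is Poisson with mean
    `expected_namesakes` (again an assumption made for this sketch).

    Thinning a Poisson count by `share` yields a Poisson count with mean
    expected_namesakes * share, so P(at least one) = 1 - exp(-mean).
    """
    if expected_namesakes < 0:
        raise ValueError("expected_namesakes must be non-negative")
    if not 0 < unit_size <= population_size:
        raise ValueError("need 0 < unit_size <= population_size")
    share = unit_size / population_size
    return 1.0 - math.exp(-expected_namesakes * share)
```

Under these assumptions the risk grows with both the number of namesakes and the size of the shared unit, so two documents that agree on a small unit (for example a rare property) would be safer to link than two that agree only on a large one.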

Suggested Citation

  • Doherr, Thorsten, 2021. "Disambiguation by namesake risk assessment," ZEW Discussion Papers 21-021, ZEW - Leibniz Centre for European Economic Research.
  • Handle: RePEc:zbw:zewdip:21021

    Download full text from publisher

    File URL: https://www.econstor.eu/bitstream/10419/231411/1/1750558505.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Benjamin F. Jones, 2005. "The burden of knowledge and the ‘death of the Renaissance man’: Is innovation getting harder?," Proceedings, Federal Reserve Bank of San Francisco.
    2. Benjamin F. Jones, 2009. "The Burden of Knowledge and the "Death of the Renaissance Man": Is Innovation Getting Harder?," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 76(1), pages 283-317.
    3. Doherr, Thorsten, 2017. "Inventor mobility index: A method to disambiguate inventor careers," ZEW Discussion Papers 17-018, ZEW - Leibniz Centre for European Economic Research.
    4. repec:plo:pmed00:0030249 is not listed on IDEAS

    Citations



    Cited by:

    1. Jordan Bisset & Dirk Czarnitzki & Thorsten Doherr, 2022. "High Skilled Mobility Under Uncertainty," Working Papers of Department of Management, Strategy and Innovation, Leuven 700195, KU Leuven, Faculty of Economics and Business (FEB), Department of Management, Strategy and Innovation, Leuven.
    2. Jordan Bisset & Dirk Czarnitzki & Thorsten Doherr, 2022. "Policy Uncertainty and Inventor Mobility," Working Papers of ECOOM - Centre for Research and Development Monitoring 700195, KU Leuven, Faculty of Economics and Business (FEB), ECOOM - Centre for Research and Development Monitoring.
    3. Doherr, Thorsten, 2023. "The SearchEngine: A holistic approach to matching," ZEW Discussion Papers 23-001, ZEW - Leibniz Centre for European Economic Research.
    4. Bisset, Jordan & Czarnitzki, Dirk & Doherr, Thorsten, 2024. "Inventor mobility under uncertainty," Research Policy, Elsevier, vol. 53(1).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Manuel Trajtenberg & Gil Shiff & Ran Melamed, 2009. "The "Names Game": Harnessing Inventors, Patent Data for Economic Research," Annals of Economics and Statistics, GENES, issue 93-94, pages 67-77.
    2. Grant C. Black & Paula E. Stephan, 2010. "The Economics of University Science and the Role of Foreign Graduate Students and Postdoctoral Scholars," NBER Chapters, in: American Universities in a Global Market, pages 129-161, National Bureau of Economic Research, Inc.
    3. Benjamin F. Jones, 2008. "The Knowledge Trap: Human Capital and Development Reconsidered," NBER Working Papers 14138, National Bureau of Economic Research, Inc.
    4. Ventura, Samuel L. & Nugent, Rebecca & Fuchs, Erica R.H., 2015. "Seeing the non-stars: (Some) sources of bias in past disambiguation approaches and a new public tool leveraging labeled records," Research Policy, Elsevier, vol. 44(9), pages 1672-1701.
    5. L. Rachel Ngai & Roberto M. Samaniego, 2006. "An R&D-Based Model of Multi-Sector Growth," CEP Discussion Papers dp0762, Centre for Economic Performance, LSE.
    6. Molina-Domene, Maria, 2018. "Labor specialization as a source of market frictions," LSE Research Online Documents on Economics 91703, London School of Economics and Political Science, LSE Library.
    7. Jensen, Scott & Liu, Xiaozhong & Yu, Yingying & Milojevic, Staša, 2016. "Generation of topic evolution trees from heterogeneous bibliographic networks," Journal of Informetrics, Elsevier, vol. 10(2), pages 606-621.
    8. Hiroyasu Inoue, 2018. "The community structure of business establishments and its properties: evidence from joint patent applications," Evolutionary and Institutional Economics Review, Springer, vol. 15(2), pages 465-475, December.
    9. Maria Molina-Domene, 2018. "Labor specialization as a source of market frictions," CEP Discussion Papers dp1580, Centre for Economic Performance, LSE.
    10. Laura Barbieri & Daniela Bragoli & Flavia Cortelezzi & Giovanni Marseguerra, 2015. "Public Support to Innovation Strategies," DISCE - Quaderni del Dipartimento di Scienze Economiche e Sociali dises1509, Università Cattolica del Sacro Cuore, Dipartimenti e Istituti di Scienze Economiche (DISCE).
    11. Hussinger, Katrin & Pellens, Maikel, 2019. "Guilt by association: How scientific misconduct harms prior collaborators," Research Policy, Elsevier, vol. 48(2), pages 516-530.
    12. Naudé, Wim & Nagler, Paula, 2022. "The Ossified Economy: The Case of Germany, 1870-2020," IZA Discussion Papers 15607, Institute of Labor Economics (IZA).
    13. Dam, Alje van & Frenken, Koen, 2022. "Variety, complexity and economic development," Research Policy, Elsevier, vol. 51(8).
    14. Singh, Anuraag & Triulzi, Giorgio & Magee, Christopher L., 2021. "Technological improvement rate predictions for all technologies: Use of patent data and an extended domain description," Research Policy, Elsevier, vol. 50(9).
    15. David Grosse Kathoefer & Jens Leker, 2012. "Knowledge transfer in academia: an exploratory study on the Not-Invented-Here Syndrome," The Journal of Technology Transfer, Springer, vol. 37(5), pages 658-675, October.
    16. Laurent R. Bergé, 2017. "Network proximity in the geography of research collaboration," Papers in Regional Science, Wiley Blackwell, vol. 96(4), pages 785-815, November.
    17. Balland, Pierre-Alexandre & Broekel, Tom & Diodato, Dario & Giuliani, Elisa & Hausmann, Ricardo & O'Clery, Neave & Rigby, David, 2022. "Reprint of The new paradigm of economic complexity," Research Policy, Elsevier, vol. 51(8).
    18. Deyun Yin & Kazuyuki Motohashi & Jianwei Dang, 2020. "Large-scale name disambiguation of Chinese patent inventors (1985–2016)," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(2), pages 765-790, February.
    19. Ajay Bhaskarbhatla & Luis Cabral & Deepak Hegde & Thomas (T.L.P.R.) Peeters, 2017. "Human Capital, Firm Capabilities, and Innovation," Tinbergen Institute Discussion Papers 17-115/VII, Tinbergen Institute, revised 03 Mar 2020.
    20. Michele Pezzoni & Fabiana Visentin, 2024. "Gender bias in team formation: the case of the European Science Foundation’s grants," Science and Public Policy, Oxford University Press, vol. 51(2), pages 247-260.

    More about this item


    JEL classification:

    • C18 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Methodological Issues: General
    • C36 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Instrumental Variables (IV) Estimation

