Printed from https://ideas.repec.org/p/eti/dpaper/18018.html

Inventor Name Disambiguation with Gradient Boosting Decision Tree and Inventor Mobility in China (1985-2016)

Authors

  • YIN Deyun
  • Kazuyuki MOTOHASHI

Abstract

This paper presents the first systematic disambiguation of all Chinese patent inventors in the patent database of the State Intellectual Property Office of China (SIPO) from 1985 to 2016. We provide a method for constructing high-quality training data from lists of rare names, together with evidence for the reliability of these generated labels, for settings where large-scale, representative hand-labeled data are crucial but expensive, error-prone, or even impossible to obtain. We then compare the performance of seven supervised models, namely naive Bayes, logistic regression, linear discriminant analysis (LDA), and quadratic discriminant analysis (QDA), as well as the tree-based methods random forest, AdaBoost, and gradient boosting decision trees (GBDT), and find that the gradient boosting classifier outperforms all other classifiers, achieving the highest F1-score and stable performance in solving the homonym problem that is prevalent among Chinese names. In the last step, instead of adopting the more popular hierarchical clustering method, we cluster records with density-based spatial clustering of applications with noise (DBSCAN) based on the distance matrix predicted by the GBDT classifier. Depending on the test data and the DBSCAN parameters, our algorithm yields an F1-score ranging from 93.5% to 99.3%, with splitting errors of 0.5% to 3% and lumping errors of 0.056% to 0.37%. Based on the disambiguated results, we provide an overview of Chinese inventors' regional mobility.
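The clustering step described in the abstract (DBSCAN run on a distance matrix derived from a pairwise classifier) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the distance between two records is 1 minus the predicted probability that they belong to the same inventor, the probabilities are made-up numbers rather than the output of a trained GBDT, and the function names `dbscan_precomputed` and `pairwise_scores` are hypothetical. The evaluation uses pairwise precision, recall, and F1, a common convention in disambiguation work that may not match the paper's exact definitions of F1, splitting error, and lumping error. In practice, scikit-learn's `DBSCAN(metric="precomputed")` performs the same clustering; the pure-Python version only keeps the sketch self-contained.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Tuple

NOISE = -1      # label for records that join no cluster (same convention as scikit-learn)
UNVISITED = -2  # internal marker for records not yet examined


def dbscan_precomputed(dist: Sequence[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Plain DBSCAN over a precomputed n x n distance matrix.

    A record counts as its own neighbour, so with min_samples=1 every
    record is a core point and no record is labelled as noise.
    """
    n = len(dist)
    labels = [UNVISITED] * n

    def neighbours(i: int) -> List[int]:
        return [j for j in range(n) if dist[i][j] <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
                continue
            if labels[j] != UNVISITED:
                continue  # already assigned
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_samples:  # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels


def pairwise_scores(pred: Sequence[int], truth: Sequence[Hashable]) -> Tuple[float, float, float]:
    """Pairwise precision, recall and F1 of a clustering against true identities."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred = pred[i] == pred[j] and pred[i] != NOISE  # noise records act as singletons
        same_true = truth[i] == truth[j]
        tp += int(same_pred and same_true)
        fp += int(same_pred and not same_true)
        fn += int(same_true and not same_pred)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical match probabilities for five records sharing one name;
# records 0-2 truly belong to inventor "A" and records 3-4 to inventor "B".
# Unlisted pairs default to a match probability of 0.05.
probs: Dict[Tuple[int, int], float] = {(0, 1): 0.95, (0, 2): 0.40, (1, 2): 0.90, (2, 3): 0.30, (3, 4): 0.85}
n = 5
dist = [[0.0 if i == j else 1.0 - probs.get((min(i, j), max(i, j)), 0.05) for j in range(n)]
        for i in range(n)]
truth = ["A", "A", "A", "B", "B"]

loose = dbscan_precomputed(dist, eps=0.2, min_samples=1)
strict = dbscan_precomputed(dist, eps=0.08, min_samples=1)
print(loose, pairwise_scores(loose, truth))    # [0, 0, 0, 1, 1] (1.0, 1.0, 1.0)
print(strict, pairwise_scores(strict, truth))  # [0, 0, 1, 2, 3] (1.0, 0.25, 0.4)
```

With `min_samples=1`, DBSCAN simply groups records linked by chains of pairs whose distance is within `eps`. Tightening `eps` splits true inventors apart (lower recall), while loosening it risks lumping distinct inventors together, which is why the reported scores vary with the DBSCAN parameters.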

Suggested Citation

  • YIN Deyun & Kazuyuki MOTOHASHI, 2018. "Inventor Name Disambiguation with Gradient Boosting Decision Tree and Inventor Mobility in China (1985-2016)," Discussion papers 18018, Research Institute of Economy, Trade and Industry (RIETI).
  • Handle: RePEc:eti:dpaper:18018

    Download full text from publisher

    File URL: https://www.rieti.go.jp/jp/publications/dp/18e018.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Jian Wang & Kaspars Berzins & Diana Hicks & Julia Melkers & Fang Xiao & Diogo Pinheiro, 2012. "A boosted-trees method for name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 93(2), pages 391-411, November.
    2. Brent D Fegley & Vetle I Torvik, 2013. "Has Large-Scale Named-Entity Network Analysis Been Resting on a Flawed Assumption?," PLOS ONE, Public Library of Science, vol. 8(7), pages 1-16, July.
    3. Lorenzo Cassi & Nicolas Carayol, 2009. "Who's Who in Patents. A Bayesian approach," Working Papers hal-00631750, HAL.
    4. Bronwyn H. Hall & Adam B. Jaffe & Manuel Trajtenberg, 2001. "The NBER Patent Citation Data File: Lessons, Insights and Methodological Tools," NBER Working Papers 8498, National Bureau of Economic Research, Inc.
    5. Bronwyn H. Hall & Grid Thoma & Salvatore Torrisi, 2006. "The market value of patents and R&D: Evidence from European firms," KITeS Working Papers 186, KITeS, Centre for Knowledge, Internationalization and Technology Studies, Universita' Bocconi, Milano, Italy, revised Nov 2006.
    6. Bronwyn H. Hall & Adam Jaffe & Manuel Trajtenberg, 2005. "Market Value and Patent Citations," RAND Journal of Economics, The RAND Corporation, vol. 36(1), pages 16-38, Spring.
    7. Michele Pezzoni & Francesco Lissoni & Gianluca Tarasconi, 2014. "How to kill inventors: testing the Massacrator© algorithm for inventor disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(1), pages 477-504, October.
    8. Hongqi Han & Changqing Yao & Yuan Fu & Yongsheng Yu & Yunliang Zhang & Shuo Xu, 2017. "Semantic fingerprints-based author name disambiguation in Chinese documents," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(3), pages 1879-1896, June.
    9. Jasjit Singh, 2005. "Collaborative Networks as Determinants of Knowledge Diffusion Patterns," Management Science, INFORMS, vol. 51(5), pages 756-770, May.
    10. Lee Fleming & Charles King & Adam I. Juda, 2007. "Small Worlds and Regional Innovation," Organization Science, INFORMS, vol. 18(6), pages 938-954, December.
    11. Kenta IKEUCHI & Kazuyuki MOTOHASHI & Ryuichi TAMURA & Naotoshi TSUKADA, 2017. "Measuring Science Intensity of Industry using Linked Dataset of Science, Technology and Industry," Discussion papers 17056, Research Institute of Economy, Trade and Industry (RIETI).
    12. Gupeng Zhang & Jiancheng Guan & Xielin Liu, 2014. "The impact of small world on patent productivity in China," Scientometrics, Springer;Akadémiai Kiadó, vol. 98(2), pages 945-960, February.
    13. Raffo, Julio & Lhuillery, Stéphane, 2009. "How to play the "Names Game": Patent retrieval comparing different heuristics," Research Policy, Elsevier, vol. 38(10), pages 1617-1627, December.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Valentina Di Iasio & Ernest Miguelez, 2022. "The ties that bind and transform: knowledge remittances, relatedness and the direction of technical change [Brain drain or brain bank? The impact of skilled emigration on poor-country innovation]," Journal of Economic Geography, Oxford University Press, vol. 22(2), pages 423-448.
    2. Florian Seliger & Gaétan de Rassenfosse & Jan Kozak, 2019. "Geocoding of worldwide patent data," KOF Working papers 19-458, KOF Swiss Economic Institute, ETH Zurich.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Deyun Yin & Kazuyuki Motohashi & Jianwei Dang, 2020. "Large-scale name disambiguation of Chinese patent inventors (1985–2016)," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(2), pages 765-790, February.
    2. Ventura, Samuel L. & Nugent, Rebecca & Fuchs, Erica R.H., 2015. "Seeing the non-stars: (Some) sources of bias in past disambiguation approaches and a new public tool leveraging labeled records," Research Policy, Elsevier, vol. 44(9), pages 1672-1701.
    3. Carayol, Nicolas & Bergé, Laurent & Cassi, Lorenzo & Roux, Pascale, 2019. "Unintended triadic closure in social networks: The strategic formation of research collaborations between French inventors," Journal of Economic Behavior & Organization, Elsevier, vol. 163(C), pages 218-238.
    4. Bergé, Laurent & Carayol, Nicolas & Roux, Pascale, 2018. "How do inventor networks affect urban invention?," Regional Science and Urban Economics, Elsevier, vol. 71(C), pages 137-162.
    5. Li, Guan-Cheng & Lai, Ronald & D’Amour, Alexander & Doolin, David M. & Sun, Ye & Torvik, Vetle I. & Yu, Amy Z. & Fleming, Lee, 2014. "Disambiguation and co-authorship networks of the U.S. patent inventor database (1975–2010)," Research Policy, Elsevier, vol. 43(6), pages 941-955.
    6. Harpreet Singh & David Kryscynski & Xinxin Li & Ram Gopal, 2016. "Pipes, pools, and filters: How collaboration networks affect innovative performance," Strategic Management Journal, Wiley Blackwell, vol. 37(8), pages 1649-1666, August.
    7. Markus Simeth & Michele Cincera, 2016. "Corporate Science, Innovation, and Firm Value," Management Science, INFORMS, vol. 62(7), pages 1970-1981, July.
    8. Clément Gorin, 2017. "Accessibility, absorptive capacity and innovation in European urban areas," Working Papers 1722, Groupe d'Analyse et de Théorie Economique Lyon St-Étienne (GATE Lyon St-Étienne), Université de Lyon.
    9. Massimiliano Ferrara & Roberto Mavilia & Bruno Antonio Pansera, 2017. "Extracting knowledge patterns with a social network analysis approach: an alternative methodology for assessing the impact of power inventors," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(3), pages 1593-1625, December.
    10. Stefano Breschi & Francesco Lissoni & Ernest Miguelez, 2017. "Foreign-origin inventors in the USA: testing for diaspora and brain gain effects," Journal of Economic Geography, Oxford University Press, vol. 17(5), pages 1009-1038.
    11. Wipatkrut, Pattharaporn & Su, Hsin-Ning, 2025. "Exploring the impact of technological convergence on General-Purpose Technologies: A multi-level generality perspective," Technology in Society, Elsevier, vol. 82(C).
    12. Forman, Chris & van Zeebroeck, Nicolas, 2019. "Digital technology adoption and knowledge flows within firms: Can the Internet overcome geographic and technological distance?," Research Policy, Elsevier, vol. 48(8), pages 1-1.
    13. Chongfeng Wang & Gupeng Zhang, 2019. "Examining the moderating effect of technology spillovers embedded in the intra- and inter-regional collaborative innovation networks of China," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(2), pages 561-593, May.
    14. Niccolò Innocenti & Francesco Capone & Luciana Lazzeretti & Sergio Petralia, 2022. "The role of inventors’ networks and variety for breakthrough inventions," Papers in Regional Science, Wiley Blackwell, vol. 101(1), pages 37-57, February.
    15. Tubiana, Matteo & Miguelez, Ernest & Moreno, Rosina, 2022. "In knowledge we trust: Learning-by-interacting and the productivity of inventors," Research Policy, Elsevier, vol. 51(1).
    16. Myriam Mariani & Marzia Romanelli, 2006. ""Stacking" or "Picking" Patents? The Inventors' Choice Between Quantity and Quality," LEM Papers Series 2006/06, Laboratory of Economics and Management (LEM), Sant'Anna School of Advanced Studies, Pisa, Italy.
    17. Mohd Shadab Danish & Pritam Ranjan & Ruchi Sharma, 2022. "Assessing the Impact of Patent Attributes on the Value of Discrete and Complex Innovations," Papers 2208.07222, arXiv.org.
    18. Bahar, Dany & Choudhury, Prithwiraj & Miguelez, Ernest & Signorelli, Sara, 2024. "Global Mobile Inventors," Journal of Development Economics, Elsevier, vol. 171(C).
    19. Mohd Shadab Danish & Pritam Ranjan & Ruchi Sharma, 2021. "Identification of “Valuable” Technologies via Patent Statistics in India: An Analysis Based on Renewal Information," BASE University Working Papers 13/2021, BASE University, Bengaluru, India.
    20. Dieter F. Kogler & Jürgen Essletzbichler & David L. Rigby, 2017. "The evolution of specialization in the EU15 knowledge space," Journal of Economic Geography, Oxford University Press, vol. 17(2), pages 345-373.

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eti:dpaper:18018. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. Registering allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: TANIMOTO, Toko. General contact details of provider: https://edirc.repec.org/data/rietijp.html.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.