
Inventor Name Disambiguation with Gradient Boosting Decision Tree and Inventor Mobility in China (1985-2016)

Authors
  • YIN Deyun
  • MOTOHASHI Kazuyuki

Abstract

This paper presents the first systematic disambiguation of all Chinese patent inventors in the State Intellectual Property Office of China (SIPO) patent database from 1985 to 2016. We provide a method for constructing high-quality training data from lists of rare names, together with evidence for the reliability of the generated labels, for settings where large-scale, representative hand-labeled data are crucial but expensive, error-prone, or even impossible to obtain. We then compare the performance of seven supervised models: naive Bayes, logistic regression, linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and three tree-based methods (random forest, AdaBoost, and gradient boosting decision trees (GBDT)). The gradient boosting classifier outperforms all the others, with the highest F1-score and stable performance in solving the homonym problem prevalent in Chinese names. In the last step, instead of adopting the more popular hierarchical clustering method, we cluster records with density-based spatial clustering of applications with noise (DBSCAN), using the distance matrix predicted by the GBDT classifier. Depending on the testing data and the DBSCAN parameters, our algorithm yields an F1-score of 93.5%-99.3%, with splitting error of 0.5%-3% and lumping error of 0.056%-0.37%. Based on the disambiguated results, we provide an overview of Chinese inventors' regional mobility.
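
The clustering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the match probabilities below are invented stand-ins for the output of a trained pairwise classifier, the distance is assumed to be one minus that probability, `min_samples=1` is an assumed setting that lets single-record inventors form their own cluster, and the pairwise F1 uses the definition common in the disambiguation literature, which may differ from the paper's. The function names `dbscan_precomputed` and `pairwise_f1` are hypothetical.

```python
from itertools import combinations


def dbscan_precomputed(dist, eps, min_samples):
    """Minimal DBSCAN over a precomputed distance matrix; label -1 means noise."""
    n = len(dist)
    labels = [None] * n  # None = not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]  # includes i itself
        if len(neighbors) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = [k for k in range(n) if dist[j][k] <= eps]
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point, so keep expanding
    return labels


def pairwise_f1(true_ids, labels):
    """Pairwise F1: a false positive is a lumped pair, a false negative a split pair."""
    tp = fp = fn = 0
    for a, b in combinations(range(len(labels)), 2):
        same_true = true_ids[a] == true_ids[b]
        same_pred = labels[a] == labels[b] and labels[a] != -1
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented match probabilities for five records sharing one name (assumption):
# records 0-2 belong to inventor "A", records 3-4 to inventor "B".
match_prob = {(0, 1): 0.95, (0, 2): 0.90, (1, 2): 0.92, (3, 4): 0.97}
true_ids = ["A", "A", "A", "B", "B"]
n = len(true_ids)

# Distance = 1 - predicted match probability; unlisted pairs get probability 0.05.
dist = [
    [0.0 if i == j else 1.0 - match_prob.get((min(i, j), max(i, j)), 0.05)
     for j in range(n)]
    for i in range(n)
]

labels_loose = dbscan_precomputed(dist, eps=0.3, min_samples=1)
labels_tight = dbscan_precomputed(dist, eps=0.06, min_samples=1)
f1_loose = pairwise_f1(true_ids, labels_loose)
f1_tight = pairwise_f1(true_ids, labels_tight)
print(labels_loose, f1_loose)
print(labels_tight, f1_tight)
```

With the looser `eps` the two inventors are recovered exactly; with the tighter `eps` record 2 is split off from inventor "A", which lowers recall and therefore F1. This mirrors the abstract's point that the scores vary with the DBSCAN parameters.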

Suggested Citation

  • YIN Deyun & MOTOHASHI Kazuyuki, 2018. "Inventor Name Disambiguation with Gradient Boosting Decision Tree and Inventor Mobility in China (1985-2016)," Discussion papers 18018, Research Institute of Economy, Trade and Industry (RIETI).
  • Handle: RePEc:eti:dpaper:18018

    Download full text from publisher

    File URL: https://www.rieti.go.jp/jp/publications/dp/18e018.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Jian Wang & Kaspars Berzins & Diana Hicks & Julia Melkers & Fang Xiao & Diogo Pinheiro, 2012. "A boosted-trees method for name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 93(2), pages 391-411, November.
    2. Michele Pezzoni & Francesco Lissoni & Gianluca Tarasconi, 2014. "How to kill inventors: testing the Massacrator© algorithm for inventor disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(1), pages 477-504, October.
    3. Bronwyn H. Hall & Grid Thoma & Salvatore Torrisi, 2006. "The market value of patents and R&D: Evidence from European firms," KITeS Working Papers 186, KITeS, Centre for Knowledge, Internationalization and Technology Studies, Universita' Bocconi, Milano, Italy, revised Nov 2006.
    4. Bronwyn H. Hall & Adam Jaffe & Manuel Trajtenberg, 2005. "Market Value and Patent Citations," RAND Journal of Economics, The RAND Corporation, vol. 36(1), pages 16-38, Spring.
    5. Hall, B. & Jaffe, A. & Trajtenberg, M., 2001. "The NBER Patent Citations Data File: Lessons, Insights and Methodological Tools," Papers 2001-29, Tel Aviv.
    6. Jasjit Singh, 2005. "Collaborative Networks as Determinants of Knowledge Diffusion Patterns," Management Science, INFORMS, vol. 51(5), pages 756-770, May.
    7. Raffo, Julio & Lhuillery, Stéphane, 2009. "How to play the "Names Game": Patent retrieval comparing different heuristics," Research Policy, Elsevier, vol. 38(10), pages 1617-1627, December.
    8. IKEUCHI Kenta & MOTOHASHI Kazuyuki & TAMURA Ryuichi & TSUKADA Naotoshi, 2017. "Measuring Science Intensity of Industry using Linked Dataset of Science, Technology and Industry," Discussion papers 17056, Research Institute of Economy, Trade and Industry (RIETI).
    9. Nicolas CARAYOL & Lorenzo CASSI, 2009. "Who's Who in Patents. A Bayesian approach," Cahiers du GREThA (2007-2019) 2009-07, Groupe de Recherche en Economie Théorique et Appliquée (GREThA).
    10. Brent D Fegley & Vetle I Torvik, 2013. "Has Large-Scale Named-Entity Network Analysis Been Resting on a Flawed Assumption?," PLOS ONE, Public Library of Science, vol. 8(7), pages 1-16, July.
    11. Hongqi Han & Changqing Yao & Yuan Fu & Yongsheng Yu & Yunliang Zhang & Shuo Xu, 2017. "Semantic fingerprints-based author name disambiguation in Chinese documents," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(3), pages 1879-1896, June.
    12. Lee Fleming & Charles King & Adam I. Juda, 2007. "Small Worlds and Regional Innovation," Organization Science, INFORMS, vol. 18(6), pages 938-954, December.
    13. Gupeng Zhang & Jiancheng Guan & Xielin Liu, 2014. "The impact of small world on patent productivity in China," Scientometrics, Springer;Akadémiai Kiadó, vol. 98(2), pages 945-960, February.

    Citations



    Cited by:

    1. Valentina Di Iasio & Ernest Miguelez, 2022. "The ties that bind and transform: knowledge remittances, relatedness and the direction of technical change [Brain drain or brain bank? The impact of skilled emigration on poor-country innovation]," Journal of Economic Geography, Oxford University Press, vol. 22(2), pages 423-448.
    2. Florian Seliger & Gaétan de Rassenfosse & Jan Kozak, 2019. "Geocoding of worldwide patent data," KOF Working papers 19-458, KOF Swiss Economic Institute, ETH Zurich.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Deyun Yin & Kazuyuki Motohashi & Jianwei Dang, 2020. "Large-scale name disambiguation of Chinese patent inventors (1985–2016)," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(2), pages 765-790, February.
    2. Carayol, Nicolas & Bergé, Laurent & Cassi, Lorenzo & Roux, Pascale, 2019. "Unintended triadic closure in social networks: The strategic formation of research collaborations between French inventors," Journal of Economic Behavior & Organization, Elsevier, vol. 163(C), pages 218-238.
    3. Bergé, Laurent & Carayol, Nicolas & Roux, Pascale, 2018. "How do inventor networks affect urban invention?," Regional Science and Urban Economics, Elsevier, vol. 71(C), pages 137-162.
    4. Li, Guan-Cheng & Lai, Ronald & D’Amour, Alexander & Doolin, David M. & Sun, Ye & Torvik, Vetle I. & Yu, Amy Z. & Fleming, Lee, 2014. "Disambiguation and co-authorship networks of the U.S. patent inventor database (1975–2010)," Research Policy, Elsevier, vol. 43(6), pages 941-955.
    5. Ventura, Samuel L. & Nugent, Rebecca & Fuchs, Erica R.H., 2015. "Seeing the non-stars: (Some) sources of bias in past disambiguation approaches and a new public tool leveraging labeled records," Research Policy, Elsevier, vol. 44(9), pages 1672-1701.
    6. Stefano Breschi & Francesco Lissoni & Ernest Miguelez, 2017. "Foreign-origin inventors in the USA: testing for diaspora and brain gain effects," Journal of Economic Geography, Oxford University Press, vol. 17(5), pages 1009-1038.
    7. Clément Gorin, 2017. "Accessibility, absorptive capacity and innovation in European urban areas," Working Papers 1722, Groupe d'Analyse et de Théorie Economique Lyon St-Étienne (GATE Lyon St-Étienne), Université de Lyon.
    8. Massimiliano Ferrara & Roberto Mavilia & Bruno Antonio Pansera, 2017. "Extracting knowledge patterns with a social network analysis approach: an alternative methodology for assessing the impact of power inventors," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(3), pages 1593-1625, December.
    9. Harpreet Singh & David Kryscynski & Xinxin Li & Ram Gopal, 2016. "Pipes, pools, and filters: How collaboration networks affect innovative performance," Strategic Management Journal, Wiley Blackwell, vol. 37(8), pages 1649-1666, August.
    10. Niccolò Innocenti & Francesco Capone & Luciana Lazzeretti & Sergio Petralia, 2022. "The role of inventors’ networks and variety for breakthrough inventions," Papers in Regional Science, Wiley Blackwell, vol. 101(1), pages 37-57, February.
    11. Tubiana, Matteo & Miguelez, Ernest & Moreno, Rosina, 2022. "In knowledge we trust: Learning-by-interacting and the productivity of inventors," Research Policy, Elsevier, vol. 51(1).
    12. Markus Simeth & Michele Cincera, 2016. "Corporate Science, Innovation, and Firm Value," Management Science, INFORMS, vol. 62(7), pages 1970-1981, July.
    13. Dieter F. Kogler & Jürgen Essletzbichler & David L. Rigby, 2017. "The evolution of specialization in the EU15 knowledge space," Journal of Economic Geography, Oxford University Press, vol. 17(2), pages 345-373.
    14. Benjamin Balsmeier & Mohamad Assaf & Tyler Chesebro & Gabe Fierro & Kevin Johnson & Scott Johnson & Guan‐Cheng Li & Sonja Lück & Doug O'Reagan & Bill Yeh & Guangzheng Zang & Lee Fleming, 2018. "Machine learning and natural language processing on the patent corpus: Data, tools, and new measures," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 27(3), pages 535-553, September.
    15. Francesco Capone & Luciana Lazzeretti & Niccolò Innocenti, 2021. "Innovation and diversity: the role of knowledge networks in the inventive capacity of cities," Small Business Economics, Springer, vol. 56(2), pages 773-788, February.
    16. Shuo Xu & Ling Li & Xin An, 2023. "Do academic inventors have diverse interests?," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(2), pages 1023-1053, February.
    17. Forman, Chris & van Zeebroeck, Nicolas, 2019. "Digital technology adoption and knowledge flows within firms: Can the Internet overcome geographic and technological distance?," Research Policy, Elsevier, vol. 48(8), pages 1-1.
    18. Ferrucci, Edoardo, 2020. "Migration, innovation and technological diversion: German patenting after the collapse of the Soviet Union," Research Policy, Elsevier, vol. 49(9).
    19. Chongfeng Wang & Gupeng Zhang, 2019. "Examining the moderating effect of technology spillovers embedded in the intra- and inter-regional collaborative innovation networks of China," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(2), pages 561-593, May.
    20. Stefano Breschi & Francesco Lissoni & Gianluca Tarasconi, 2014. "Inventor Data for Research on Migration and Innovation: A Survey and a Pilot," WIPO Economic Research Working Papers 17, World Intellectual Property Organization - Economics and Statistics Division.

