Printed from https://ideas.repec.org/a/spr/scient/v111y2017i3d10.1007_s11192-017-2338-6.html

Semantic fingerprints-based author name disambiguation in Chinese documents

Author

Listed:
  • Hongqi Han

    (Institute of Scientific and Technical Information of China)

  • Changqing Yao

    (Institute of Scientific and Technical Information of China)

  • Yuan Fu

    (Institute of Scientific and Technical Information of China)

  • Yongsheng Yu

    (Institute of Scientific and Technical Information of China)

  • Yunliang Zhang

    (Institute of Scientific and Technical Information of China)

  • Shuo Xu

    (Institute of Scientific and Technical Information of China)

Abstract

Author name disambiguation is an important problem in bibliometric analysis and tech mining. Many techniques have been proposed; however, most of them require a long run time or additional information. A new method based on semantic fingerprints is presented that disambiguates author names without external data. A manually annotated dataset was built to test the effectiveness of the method. Experiments were conducted separately using co-author features, institution features, and text fingerprints. We found that the first two methods had higher precision but low recall, whereas the text fingerprint method had higher recall and satisfactory precision. Based on these results, we integrated co-author features, institution features, and text fingerprints into semantic fingerprints for disambiguating author names, which achieved better performance on the F-measure.
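
The abstract does not give the algorithmic details, so the following is only a minimal sketch of how the three kinds of evidence might be combined, not the authors' implementation. It assumes a SimHash-style 64-bit text fingerprint compared by Hamming distance, an "any evidence matches" rule for merging two records that share an author name, and the standard F-measure (harmonic mean of precision and recall). The record fields (`coauthors`, `institution`, `fingerprint`), the function names, and the `max_distance` threshold are all hypothetical.

```python
import hashlib
from collections import Counter

BITS = 64  # fingerprint width; an assumption, not taken from the paper


def simhash(tokens):
    """Return a 64-bit SimHash-style fingerprint for a list of tokens."""
    weights = [0] * BITS
    for token, count in Counter(tokens).items():
        # Stable 64-bit hash of the token (first 8 bytes of its MD5 digest).
        h = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")
        for i in range(BITS):
            # Each token votes for or against bit i, weighted by its frequency.
            weights[i] += count if (h >> i) & 1 else -count
    return sum(1 << i for i in range(BITS) if weights[i] > 0)


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def same_author(rec_a, rec_b, max_distance=16):
    """Decide whether two records with the same author name match.

    Hypothetical rule: co-author overlap or an identical institution is
    treated as strong evidence; otherwise fall back to the text fingerprint.
    """
    if set(rec_a["coauthors"]) & set(rec_b["coauthors"]):
        return True
    if rec_a["institution"] and rec_a["institution"] == rec_b["institution"]:
        return True
    return hamming(rec_a["fingerprint"], rec_b["fingerprint"]) <= max_distance


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under this rule the high-precision features (co-authors, institution) decide first, and the fingerprint comparison recovers pairs they miss, which mirrors the precision and recall trade-off the abstract reports. In practice the tokens would come from a Chinese word segmenter applied to titles, abstracts, or keywords, and the threshold would be tuned on annotated data.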

Suggested Citation

  • Hongqi Han & Changqing Yao & Yuan Fu & Yongsheng Yu & Yunliang Zhang & Shuo Xu, 2017. "Semantic fingerprints-based author name disambiguation in Chinese documents," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(3), pages 1879-1896, June.
  • Handle: RePEc:spr:scient:v:111:y:2017:i:3:d:10.1007_s11192-017-2338-6
    DOI: 10.1007/s11192-017-2338-6

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-017-2338-6
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Anne-Wil Harzing, 2015. "Health warning: might contain multiple personalities—the problem of homonyms in Thomson Reuters Essential Science Indicators," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 2259-2270, December.
    2. Li Tang & John P. Walsh, 2010. "Bibliometric fingerprints: name disambiguation based on approximate structure equivalence of cognitive maps," Scientometrics, Springer;Akadémiai Kiadó, vol. 84(3), pages 763-784, September.
    3. Feriha Ibriyamova & Samuel Kogan & Galla Salganik-Shoshan & David Stolin, 2017. "Using semantic fingerprinting in finance," Applied Economics, Taylor & Francis Journals, vol. 49(28), pages 2719-2735, June.

    Citations

    Cited by:

    1. Jinseok Kim, 2019. "A fast and integrative algorithm for clustering performance evaluation in author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(2), pages 661-681, August.
    2. Wang, Zhiqi & Chen, Yue & Glänzel, Wolfgang, 2020. "Preprints as accelerator of scholarly communication: An empirical analysis in Mathematics," Journal of Informetrics, Elsevier, vol. 14(4).
    3. Li Zhang & Wei Lu & Jinqing Yang, 2023. "LAGOS‐AND: A large gold standard dataset for scholarly author name disambiguation," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(2), pages 168-185, February.
    4. YIN Deyun & MOTOHASHI Kazuyuki, 2018. "Inventor Name Disambiguation with Gradient Boosting Decision Tree and Inventor Mobility in China (1985-2016)," Discussion papers 18018, Research Institute of Economy, Trade and Industry (RIETI).
    5. Deyun Yin & Kazuyuki Motohashi & Jianwei Dang, 2020. "Large-scale name disambiguation of Chinese patent inventors (1985–2016)," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(2), pages 765-790, February.
    6. Shuo Xu & Ling Li & Xin An, 2023. "Do academic inventors have diverse interests?," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(2), pages 1023-1053, February.
    7. Xu, Shuo & Hao, Liyuan & Yang, Guancan & Lu, Kun & An, Xin, 2021. "A topic models based framework for detecting and forecasting emerging technologies," Technological Forecasting and Social Change, Elsevier, vol. 162(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Liu, Weishu, 2021. "Caveats for the use of Web of Science Core Collection in old literature retrieval and historical bibliometric analysis," Technological Forecasting and Social Change, Elsevier, vol. 172(C).
    2. Keith Head & Yao Amber Li & Asier Minondo, 2019. "Geography, Ties, and Knowledge Flows: Evidence from Citations in Mathematics," The Review of Economics and Statistics, MIT Press, vol. 101(4), pages 713-727, October.
    3. Deyun Yin & Kazuyuki Motohashi & Jianwei Dang, 2020. "Large-scale name disambiguation of Chinese patent inventors (1985–2016)," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(2), pages 765-790, February.
    4. Dangzhi Zhao & Andreas Strotmann, 2020. "Telescopic and panoramic views of library and information science research 2011–2018: a comparison of four weighting schemes for author co-citation analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(1), pages 255-270, July.
    5. Li Tang & Philip Shapira, 2011. "Regional development and interregional collaboration in the growth of nanotechnology research in China," Scientometrics, Springer;Akadémiai Kiadó, vol. 86(2), pages 299-315, February.
    6. Alison M. J. Buchan & Eva Jurczyk & Ruth Isserlin & Gary D. Bader, 2016. "Global neuroscience and mental health research: a bibliometrics case study," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(1), pages 515-531, October.
    7. Dejian Yu & Sun Meng, 2018. "An overview of biomass energy research with bibliometric indicators," Energy & Environment, vol. 29(4), pages 576-590, June.
    8. Liu, Meijun & Hu, Xiao, 2021. "Will collaborators make scientists move? A Generalized Propensity Score analysis," Journal of Informetrics, Elsevier, vol. 15(1).
    9. Omar Hernando Avila-Poveda, 2014. "Technical report: the trend of author compound names and its implications for authorship identity identification," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(1), pages 833-846, October.
    10. Sameer Kumar & Jariah Mohd. Jan, 2013. "Mapping research collaborations in the business and management field in Malaysia, 1980–2010," Scientometrics, Springer;Akadémiai Kiadó, vol. 97(3), pages 491-517, December.
    11. Abdelghani Maddi & Lesya Baudoin, 2022. "The quality of the web of science data: a longitudinal study on the completeness of authors-addresses links," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6279-6292, November.
    12. Wang, Guoyan & Hu, Guangyuan & Li, Chuanfeng & Tang, Li, 2018. "Long live the scientists: Tracking the scientific fame of great minds in physics," Journal of Informetrics, Elsevier, vol. 12(4), pages 1089-1098.
    13. Li, Guan-Cheng & Lai, Ronald & D’Amour, Alexander & Doolin, David M. & Sun, Ye & Torvik, Vetle I. & Yu, Amy Z. & Fleming, Lee, 2014. "Disambiguation and co-authorship networks of the U.S. patent inventor database (1975–2010)," Research Policy, Elsevier, vol. 43(6), pages 941-955.
    14. Agrawal, Ajay & McHale, John & Oettl, Alexander, 2017. "How stars matter: Recruiting and peer effects in evolutionary biology," Research Policy, Elsevier, vol. 46(4), pages 853-867.
    15. Jinseok Kim & Jenna Kim, 2018. "The impact of imbalanced training data on machine learning for author name disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(1), pages 511-526, October.
    16. Liu, Weishu & Hu, Guangyuan & Tang, Li, 2018. "Missing author address information in Web of Science—An explorative study," Journal of Informetrics, Elsevier, vol. 12(3), pages 985-997.
    17. Ajay Agrawal & John McHale & Alexander Oettl, 2014. "Collaboration, Stars, and the Changing Organization of Science: Evidence from Evolutionary Biology," NBER Chapters, in: The Changing Frontier: Rethinking Science and Innovation Policy, pages 75-102, National Bureau of Economic Research, Inc.
    18. Yuta Kikuchi & Ryo Nakajima, 2016. "Evaluating Professor Value-added: Evidence from Professor and Student Matching in Physics," Keio-IES Discussion Paper Series 2016-002, Institute for Economics Studies, Keio University.
    19. Gordana Budimir & Sophia Rahimeh & Sameh Tamimi & Primož Južnič, 2021. "Comparison of self-citation patterns in WoS and Scopus databases based on national scientific production in Slovenia (1996–2020)," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(3), pages 2249-2267, March.
    20. Hu, Zhigang & Tian, Wencan & Xu, Shenmeng & Zhang, Chunbo & Wang, Xianwen, 2018. "Four pitfalls in normalizing citation indicators: An investigation of ESI’s selection of highly cited papers," Journal of Informetrics, Elsevier, vol. 12(4), pages 1133-1145.
