Printed from https://ideas.repec.org/a/eee/infome/v9y2015i3p455-465.html

Identifying entities from scientific publications: A comparison of vocabulary- and model-based methods

Author

Listed:
  • Yan, Erjia
  • Zhu, Yongjun

Abstract

The objective of this study is to evaluate the performance of five entity extraction methods for the task of identifying entities from scientific publications, including two vocabulary-based methods (a keyword-based and a Wikipedia-based) and three model-based methods (conditional random fields (CRF), CRF with keyword-based dictionary, and CRF with Wikipedia-based dictionary). These methods are applied to an annotated test set of publications in computer science. Precision, recall, accuracy, area under the ROC curve, and area under the precision-recall curve are employed as the evaluative indicators. Results show that the model-based methods outperform the vocabulary-based ones, among which CRF with keyword-based dictionary has the best performance. Between the two vocabulary-based methods, the keyword-based one has a higher recall and the Wikipedia-based one has a higher precision. The findings of this study help inform the understanding of informetric research at a more granular level.
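As a rough illustration of what a vocabulary-based method and its precision/recall evaluation might look like, here is a minimal sketch. It is not the authors' implementation: the greedy longest-match strategy, the span-level exact-match scoring, the function names, and the toy vocabulary and gold annotations are all assumptions made for this example.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def extract_entities(tokens: List[str],
                     vocabulary: Set[Tuple[str, ...]],
                     max_len: int = 3) -> Set[Span]:
    """Greedy longest-match lookup of token n-grams in a vocabulary (hypothetical helper)."""
    spans: Set[Span] = set()
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in vocabulary:
                spans.add((i, i + n))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return spans


def precision_recall(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float]:
    """Span-level precision and recall with exact-match scoring (an assumption)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data, invented for illustration only.
tokens = "We train conditional random fields for entity extraction".split()
vocabulary = {("conditional", "random", "fields"), ("entity", "extraction"), ("train",)}
gold = {(2, 5), (6, 8)}  # annotated entity spans

predicted = extract_entities(tokens, vocabulary)
p, r = precision_recall(predicted, gold)
print(sorted(predicted), p, r)
```

In this toy case the overly broad vocabulary entry `("train",)` produces a false positive, which lowers precision while recall stays at 1.0. The abstract reports a similar trade-off between the keyword-based method (higher recall) and the Wikipedia-based method (higher precision).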

Suggested Citation

  • Yan, Erjia & Zhu, Yongjun, 2015. "Identifying entities from scientific publications: A comparison of vocabulary- and model-based methods," Journal of Informetrics, Elsevier, vol. 9(3), pages 455-465.
  • Handle: RePEc:eee:infome:v:9:y:2015:i:3:p:455-465
    DOI: 10.1016/j.joi.2015.04.003

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1751157715000474
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.joi.2015.04.003?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Staša Milojević & Cassidy R. Sugimoto & Erjia Yan & Ying Ding, 2011. "The cognitive structure of Library and Information Science: Analysis of article title words," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 62(10), pages 1933-1953, October.
    2. Don R. Swanson & Neil R. Smalheiser & Vetle I. Torvik, 2006. "Ranking indirect connections in literature‐based discovery: The role of medical subject headings," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 57(11), pages 1427-1439, September.
    3. Erjia Yan & Ying Ding & Elin K. Jacob, 2012. "Overlaying communities and topics: an analysis on publication networks," Scientometrics, Springer;Akadémiai Kiadó, vol. 90(2), pages 499-513, February.
    4. Erjia Yan & Ying Ding, 2012. "Scholarly network similarities: How bibliographic coupling networks, citation networks, cocitation networks, topical networks, coauthorship networks, and coword networks relate to each other," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 63(7), pages 1313-1326, July.
    5. Graeme Hirst, 1978. "Discipline impact factors: A method for determining core journal lists," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 29(4), pages 171-172, July.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Yongjun Zhu & Min Song & Erjia Yan, 2016. "Identifying Liver Cancer and Its Relations with Diseases, Drugs, and Genes: A Literature-Based Approach," PLOS ONE, Public Library of Science, vol. 11(5), pages 1-14, May.
    2. Erjia Yan & Chaojiang Wu & Min Song, 2018. "The funding factor: a cross-disciplinary examination of the association between research funding and citation impact," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(1), pages 369-384, April.
    3. Ma, Jing & Abrams, Natalie F. & Porter, Alan L. & Zhu, Donghua & Farrell, Dorothy, 2019. "Identifying translational indicators and technology opportunities for nanomedical research using tech mining: The case of gold nanostructures," Technological Forecasting and Social Change, Elsevier, vol. 146(C), pages 767-775.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yang, Siluo & Han, Ruizhen & Wolfram, Dietmar & Zhao, Yuehua, 2016. "Visualizing the intellectual structure of information science (2006–2015): Introducing author keyword coupling analysis," Journal of Informetrics, Elsevier, vol. 10(1), pages 132-150.
    2. Erjia Yan, 2014. "Topic-based Pagerank: toward a topic-level scientific evaluation," Scientometrics, Springer;Akadémiai Kiadó, vol. 100(2), pages 407-437, August.
    3. Chaoqun Ni & Cassidy R. Sugimoto & Blaise Cronin, 2013. "Visualizing and comparing four facets of scholarly communication: producers, artifacts, concepts, and gatekeepers," Scientometrics, Springer;Akadémiai Kiadó, vol. 94(3), pages 1161-1173, March.
    4. Yuen-Hsien Tseng & Ming-Yueh Tsay, 2013. "Journal clustering of library and information science for subfield delineation using the bibliometric analysis toolkit: CATAR," Scientometrics, Springer;Akadémiai Kiadó, vol. 95(2), pages 503-528, May.
    5. María Pinto & Rosaura Fernández-Pascual & David Caballero-Mariscal & Dora Sales, 2020. "Information literacy trends in higher education (2006–2019): visualizing the emerging field of mobile information literacy," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(2), pages 1479-1510, August.
    6. Guan-Can Yang & Gang Li & Chun-Ya Li & Yun-Hua Zhao & Jing Zhang & Tong Liu & Dar-Zen Chen & Mu-Hsuan Huang, 2015. "Using the comprehensive patent citation network (CPC) to evaluate patent value," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1319-1346, December.
    7. An, Lu & Yu, Chuanming & Li, Gang, 2014. "Visual topical analysis of Chinese and American Library and Information Science research institutions," Journal of Informetrics, Elsevier, vol. 8(1), pages 217-233.
    8. repec:plo:pone00:0189137 is not listed on IDEAS
    9. Ziyan Zhang & Junyan Zhang & Pushi Wang, 2024. "Measurement of disruptive innovation and its validity based on improved disruption index," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(11), pages 6477-6531, November.
    10. Ibiso Isoboye Damieibi & Isoboye Jacob Damieibi, 2023. "Impact of Networking on Resources Sharing in Rivers State University, Port Harcourt, Nigeria," International Journal of Research and Innovation in Social Science, International Journal of Research and Innovation in Social Science (IJRISS), vol. 7(5), pages 1229-1248, May.
    11. Yi Bu & Binglu Wang & Win-bin Huang & Shangkun Che & Yong Huang, 2018. "Using the appearance of citations in full text on author co-citation analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(1), pages 275-289, July.
    12. Mu-hsuan Huang & Wang-Ching Shaw & Chi-Shiou Lin, 2019. "One category, two communities: subfield differences in “Information Science and Library Science” in Journal Citation Reports," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(2), pages 1059-1079, May.
    13. Yu-Wei Chang & Mu-Hsuan Huang & Chiao-Wen Lin, 2015. "Evolution of research subjects in library and information science based on keyword, bibliographical coupling, and co-citation analyses," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 2071-2087, December.
    14. Ali Gazni & Fereshteh Didegah, 2016. "The relationship between authors’ bibliographic coupling and citation exchange: analyzing disciplinary differences," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(2), pages 609-626, May.
    15. Nykl, Michal & Campr, Michal & Ježek, Karel, 2015. "Author ranking based on personalized PageRank," Journal of Informetrics, Elsevier, vol. 9(4), pages 777-799.
    16. Carlos Olmeda-Gómez & Maria-Antonia Ovalle-Perandones & Antonio Perianes-Rodríguez, 2017. "Co-word analysis and thematic landscapes in Spanish information science literature, 1985–2014," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(1), pages 195-217, October.
    17. Guo Chen & Lu Xiao & Chang-ping Hu & Xue-qin Zhao, 2015. "Identifying the research focus of Library and Information Science institutions in China with institution-specific keywords," Scientometrics, Springer;Akadémiai Kiadó, vol. 103(2), pages 707-724, May.
    18. Daria Maltseva & Vladimir Batagelj, 2020. "iMetrics: the development of the discipline with many names," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 313-359, October.
    19. Jimi Adams & Ryan Light, 2014. "Mapping Interdisciplinary Fields: Efficiencies, Gaps and Redundancies in HIV/AIDS Research," PLOS ONE, Public Library of Science, vol. 9(12), pages 1-13, December.
    20. Yui-yip Lau & César Ducruet & Adolf K. Y. Ng & Xiaowen Fu, 2017. "Across the waves: a bibliometric analysis of container shipping research since the 1960s," Maritime Policy & Management, Taylor & Francis Journals, vol. 44(6), pages 667-684, August.
    21. Jun-Ping Qiu & Ke Dong & Hou-Qiang Yu, 2014. "Comparative study on structure and correlation among author co-occurrence networks in bibliometrics," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(2), pages 1345-1360, November.
