
Knowledge Extraction and Semantic Annotation of Text from the Encyclopedia of Life

Author

Listed:
  • Anne E Thessen
  • Cynthia Sims Parr

Abstract

Numerous digitization and ontological initiatives have focused on translating biological knowledge from narrative text to machine-readable formats. In this paper, we describe two workflows for knowledge extraction and semantic annotation of text data objects featured in an online biodiversity aggregator, the Encyclopedia of Life. One workflow tags text with DBpedia URIs based on keywords. Another workflow finds taxon names in text using GNRD for the purpose of building a species association network. Both workflows work well: the annotation workflow has an F1 Score of 0.941 and the association algorithm has an F1 Score of 0.885. Existing text annotators such as Terminizer and DBpedia Spotlight performed well, but require some optimization to be useful in the ecology and evolution domain. Important future work includes scaling up and improving accuracy through the use of distributional semantics.
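The abstract reports F1 scores for a keyword-based tagging workflow. As a rough illustration only (not the authors' implementation), the sketch below tags text using a tiny keyword table and scores the result against a gold set. The table `KEYWORD_URIS`, the functions `tag_text` and `f1_score`, the example sentence, and the gold annotations are all hypothetical names and values introduced for this example.

```python
from typing import Dict, Set, Tuple

# Hypothetical keyword -> DBpedia URI table; illustrative entries only,
# not taken from the paper.
KEYWORD_URIS: Dict[str, str] = {
    "predator": "http://dbpedia.org/resource/Predation",
    "nocturnal": "http://dbpedia.org/resource/Nocturnality",
}


def tag_text(text: str) -> Set[str]:
    """Return the URIs whose keyword occurs in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return {uri for keyword, uri in KEYWORD_URIS.items() if keyword in lowered}


def f1_score(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for a predicted set against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Assumed example input and assumed gold annotations.
predicted = tag_text("A nocturnal predator of small mammals.")
gold = {
    "http://dbpedia.org/resource/Predation",
    "http://dbpedia.org/resource/Herbivore",
}
precision, recall, f1 = f1_score(predicted, gold)
```

Here one of the two predicted tags is in the gold set and one of the two gold tags was found, so precision, recall, and F1 are all 0.5. A real evaluation would use manually annotated text objects rather than a single sentence.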

Suggested Citation

  • Anne E Thessen & Cynthia Sims Parr, 2014. "Knowledge Extraction and Semantic Annotation of Text from the Encyclopedia of Life," PLOS ONE, Public Library of Science, vol. 9(3), pages 1-10, March.
  • Handle: RePEc:plo:pone00:0089550
    DOI: 10.1371/journal.pone.0089550

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0089550
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0089550&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Hong Cui, 2012. "CharaParser for fine‐grained semantic annotation of organism morphological descriptions," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 63(4), pages 738-754, April.
    2. Christian Bizer & Tom Heath & Tim Berners-Lee, 2009. "Linked Data - The Story So Far," International Journal on Semantic Web and Information Systems (IJSWIS), IGI Global, vol. 5(3), pages 1-22, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Stahl, Florian & Schomm, Fabian & Vossen, Gottfried, 2012. "Marketplaces for data: An initial survey," ERCIS Working Papers 14, University of Münster, European Research Center for Information Systems (ERCIS).
    2. Anett HOPPE & Ana ROXIN & Christophe NICOLLE, 2015. "Ontology-based Integration of Web Navigation for Dynamic User Profiling," Informatica Economica, Academy of Economic Studies - Bucharest, Romania, vol. 19(1), pages 10-24.
    3. Kurt Sandkuhl & Hans-Georg Fill & Stijn Hoppenbrouwers & John Krogstie & Florian Matthes & Andreas Opdahl & Gerhard Schwabe & Ömer Uludag & Robert Winter, 2018. "From Expert Discipline to Common Practice: A Vision and Research Agenda for Extending the Reach of Enterprise Modeling," Business & Information Systems Engineering: The International Journal of WIRTSCHAFTSINFORMATIK, Springer;Gesellschaft für Informatik e.V. (GI), vol. 60(1), pages 69-80, February.
    4. Phillip Lord & Simon Cockell & Robert Stevens, 2012. "Three Steps to Heaven: Semantic Publishing in a Real World Workflow," Future Internet, MDPI, vol. 4(4), pages 1-12, November.
    5. Marta Sabou & Irem Onder & Adrian M. P. Brasoveanu & Arno Scharl, 2016. "Towards cross-domain data analytics in tourism: a linked data based approach," Information Technology & Tourism, Springer, vol. 16(1), pages 71-101, March.
    6. Wuhui Chen & Incheon Paik, 2013. "Improving efficiency of service discovery using Linked data-based service publication," Information Systems Frontiers, Springer, vol. 15(4), pages 613-625, September.
    7. Tianxing Wu & Guilin Qi & Cheng Li & Meng Wang, 2018. "A Survey of Techniques for Constructing Chinese Knowledge Graphs and Their Applications," Sustainability, MDPI, vol. 10(9), pages 1-26, September.
    8. Nitesh Khilwani & J. A. Harding, 2016. "Managing corporate memory on the semantic web," Journal of Intelligent Manufacturing, Springer, vol. 27(1), pages 101-118, February.
    9. Veale, Michael & Binns, Reuben, 2017. "Fairer machine learning in the real world: Mitigating discrimination without collecting sensitive data," SocArXiv ustxg, Center for Open Science.
    10. Schiavone, Francesco & Paolone, Francesco & Mancini, Daniela, 2019. "Business model innovation for urban smartization," Technological Forecasting and Social Change, Elsevier, vol. 142(C), pages 210-219.
    11. Ghadeer Ashour & Ahmed Al-Dubai & Imed Romdhani & Daniyal Alghazzawi, 2022. "Ontology-Based Linked Data to Support Decision-Making within Universities," Mathematics, MDPI, vol. 10(17), pages 1-21, September.
    12. E. G. Stephan & T. O. Elsethagen & L. K. Berg & M. C. Macduff & P. R. Paulson & W. J. Shaw & C. Sivaraman & W. P. Smith & A. Wynne, 2016. "Semantic catalog of things, services, and data to support a wind data management facility," Information Systems Frontiers, Springer, vol. 18(4), pages 679-691, August.
    13. Hossein Hassani & Xu Huang & Mansi Ghodsi, 2018. "Big Data and Causality," Annals of Data Science, Springer, vol. 5(2), pages 133-156, June.
    14. Muhammad Sajid Qureshi & Ali Daud, 2021. "Fine-grained academic rankings: mapping affiliation of the influential researchers with the top ranked HEIs," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(10), pages 8331-8361, October.
    15. Sean Kennedy & Owen Molloy & Robert Stewart & Paul Jacob & Maria Maleshkova & Frank Doheny, 2012. "A Semantically Automated Protocol Adapter for Mapping SOAP Web Services to RESTful HTTP Format to Enable the Web Infrastructure, Enhance Web Service Interoperability and Ease Web Service Migration," Future Internet, MDPI, vol. 4(2), pages 1-24, April.
    16. Simon French, 2012. "Expert Judgment, Meta-analysis, and Participatory Risk Analysis," Decision Analysis, INFORMS, vol. 9(2), pages 119-127, June.
    17. Costantino Thanos, 2017. "Research Data Reusability: Conceptual Foundations, Barriers and Enabling Technologies," Publications, MDPI, vol. 5(1), pages 1-19, January.
    18. Raymond Y. K. Lau & J. Leon Zhao & Wenping Zhang & Yi Cai & Eric W. T. Ngai, 2015. "Learning Context-Sensitive Domain Ontologies from Folksonomies: A Cognitively Motivated Method," INFORMS Journal on Computing, INFORMS, vol. 27(3), pages 561-578, August.
    19. Muhammad Ahtisham Aslam & Naif Radi Aljohani, 2017. "SPedia: A Central Hub for the Linked Open Data of Scientific Publications," International Journal on Semantic Web and Information Systems (IJSWIS), IGI Global, vol. 13(1), pages 128-147, January.
    20. Costantino Thanos, 2016. "A Vision for Open Cyber-Scholarly Infrastructures," Publications, MDPI, vol. 4(2), pages 1-18, May.
