
Bibliometric-enhanced information retrieval: a novel deep feature engineering approach for algorithm searching from full-text publications

Author

Listed:
  • Iqra Safder

    (Information Technology University)

  • Saeed-Ul Hassan

    (Information Technology University)

Abstract

Recently, tremendous advances have been observed in information retrieval systems designed to search for relevant knowledge in scientific publications. Although these techniques are quite powerful, there is still room for improvement in searching full-text publication datasets for metadata relating to algorithms: for instance, efficiency-related metrics such as precision, recall, f-measure and accuracy, and other useful metadata such as the datasets deployed and the algorithmic run-time complexity. In this study, we propose a novel deep learning-based feature engineering approach that improves search capabilities by mining algorithm-specific metadata from full-text scientific publications. Traditional term frequency-inverse document frequency (TF-IDF)-based approaches function like a ‘bag of words’ model and thus fail to capture either the text’s semantics or the word sequence. In this work, we design a semantically enriched synopsis of each full-text document by adding algorithm-specific deep metadata text lines to enhance the search mechanism of algorithm search systems. These text lines are classified by a deep learning-based bi-directional long short-term memory (LSTM) model. The bi-directional LSTM model outperformed the support vector machine by 9.46%, with a 0.81 f1-score on a dataset of 37,000 algorithm-specific deep metadata text lines that had been tagged by four human experts. Lastly, we present a case study on 21,940 full-text publications downloaded from ACL (https://aclweb.org/) to show the effectiveness of deep learning-based advanced feature engineering search compared to the conventional TF-IDF-based (Lucene) search.
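
The abstract's point that a TF-IDF 'bag of words' representation cannot capture word sequence can be illustrated with a minimal sketch. This is not the authors' code and not Lucene's scoring formula: the function name `tfidf_vectors`, the whitespace tokenizer, the smoothed IDF variant, and the three example sentences are all assumptions made for illustration only.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using a plain TF-IDF scheme.

    Assumption: whitespace tokenization and a smoothed IDF,
    log((1 + N) / (1 + df)) + 1. Other TF-IDF variants differ in detail,
    but none of them look at the position of a term.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


# Hypothetical sentences: the first two make opposite claims
# but contain exactly the same words.
docs = [
    "the svm outperformed the lstm",
    "the lstm outperformed the svm",
    "we report precision and recall",
]
vecs = tfidf_vectors(docs)

# Word order is discarded, so the two opposite claims get identical vectors.
order_is_ignored = vecs[0] == vecs[1]
print(order_is_ignored)
```

Because the two opposing sentences map to the same vector, a search system built only on such weights cannot tell them apart. A sequence model such as the bi-directional LSTM described in the abstract reads the tokens in order and so can, in principle, separate them.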

Suggested Citation

  • Iqra Safder & Saeed-Ul Hassan, 2019. "Bibliometric-enhanced information retrieval: a novel deep feature engineering approach for algorithm searching from full-text publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(1), pages 257-277, April.
  • Handle: RePEc:spr:scient:v:119:y:2019:i:1:d:10.1007_s11192-019-03025-y
    DOI: 10.1007/s11192-019-03025-y

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-019-03025-y
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11192-019-03025-y?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Citations

    Cited by:

    1. Heng Yang & Nan Wang & Lina Yang & Wei Liu & Sili Wang, 2023. "Research on the Automatic Subject-Indexing Method of Academic Papers Based on Climate Change Domain Ontology," Sustainability, MDPI, vol. 15(5), pages 1-13, February.
    2. Saeed-Ul Hassan & Naif R. Aljohani & Mudassir Shabbir & Umair Ali & Sehrish Iqbal & Raheem Sarwar & Eugenio Martínez-Cámara & Sebastián Ventura & Francisco Herrera, 2020. "Tweet Coupling: a social media methodology for clustering scientific publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(2), pages 973-991, August.
    3. Chung, Park & Sohn, So Young, 2020. "Early detection of valuable patents using a deep learning model: Case of semiconductor industry," Technological Forecasting and Social Change, Elsevier, vol. 158(C).
    4. Radosław Malik & Anna Visvizi & Orlando Troisi & Mara Grimaldi, 2022. "Smart Services in Smart Cities: Insights from Science Mapping Analysis," Sustainability, MDPI, vol. 14(11), pages 1-16, May.
    5. Sehrish Iqbal & Saeed-Ul Hassan & Naif Radi Aljohani & Salem Alelyani & Raheel Nawaz & Lutz Bornmann, 2021. "A decade of in-text citation analysis based on natural language processing and machine learning techniques: an overview of empirical studies," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6551-6599, August.
    6. Naif Radi Aljohani & Ayman Fayoumi & Saeed-Ul Hassan, 2021. "An in-text citation classification predictive model for a scholarly search system," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5509-5529, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Naif Radi Aljohani & Ayman Fayoumi & Saeed-Ul Hassan, 2021. "An in-text citation classification predictive model for a scholarly search system," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5509-5529, July.
    2. Wang, Shiyun & Mao, Jin & Lu, Kun & Cao, Yujie & Li, Gang, 2021. "Understanding interdisciplinary knowledge integration through citance analysis: A case study on eHealth," Journal of Informetrics, Elsevier, vol. 15(4).
    3. Sehrish Iqbal & Saeed-Ul Hassan & Naif Radi Aljohani & Salem Alelyani & Raheel Nawaz & Lutz Bornmann, 2021. "A decade of in-text citation analysis based on natural language processing and machine learning techniques: an overview of empirical studies," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6551-6599, August.
    4. Guillaume Cabanac & Ingo Frommholz & Philipp Mayr, 2018. "Bibliometric-enhanced information retrieval: preface," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 1225-1227, August.
    5. Helen Lee & Sarah Shea Crowne & Melanie Estarziau & Keith Kranker & Charles Michalopoulos & Anne Warren & Tod Mijanovich & Jill H. Filene & Anne Duggan & Virginia Knox, "undated". "The Effects of Home Visiting on Prenatal Health, Birth Outcomes, and Health Care Use in the First Year of Life: Final Implementation and Impact Findings from the Mother and Infant Home Visiting Progra," Mathematica Policy Research Reports a9626a8d90bf4f01811d0c9d7, Mathematica Policy Research.
    6. A. Portansky P., 2017. "About the Prospects of Megaregional Trade Agreements," The World of New Economy, Financial University under the Government of the Russian Federation, issue 3, pages 47-53.
    7. Syed Afroz Keramat & Khorshed Alam & Jeff Gow & Stuart J H Biddle, 2020. "Gender differences in the longitudinal association between obesity, and disability with workplace absenteeism in the Australian working population," PLOS ONE, Public Library of Science, vol. 15(5), pages 1-14, May.
    8. Anders Peder Højer Karlsen & Mik Wetterslev & Signe Elisa Hansen & Morten Sejer Hansen & Ole Mathiesen & Jørgen B Dahl, 2017. "Postoperative pain treatment after total knee arthroplasty: A systematic review," PLOS ONE, Public Library of Science, vol. 12(3), pages 1-53, March.
    9. Marcella Alsan & Sarah Eichmeyer, 2024. "Experimental Evidence on the Effectiveness of Nonexperts for Improving Vaccine Demand," American Economic Journal: Economic Policy, American Economic Association, vol. 16(1), pages 394-414, February.
    10. Claire Greene & Scott Schuh, 2017. "The 2016 Diary of Consumer Payment Choice," Research Data Report 17-7, Federal Reserve Bank of Boston.
    11. Yuzhuo Wang & Chengzhi Zhang & Kai Li, 2022. "A review on method entities in the academic literature: extraction, evaluation, and application," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(5), pages 2479-2520, May.
    12. Michelle Tew & Philip Clarke & Karin Thursky & Kim Dalziel, 2019. "Incorporating Future Medical Costs: Impact on Cost-Effectiveness Analysis in Cancer Patients," PharmacoEconomics, Springer, vol. 37(7), pages 931-941, July.
    13. Adam Lulek, 2019. "Information on environmental protection and annual reports of oil companies," Ekonomia i Prawo, Uniwersytet Mikolaja Kopernika, vol. 18(4), pages 475-486, December.
    14. Xin An & Xin Sun & Shuo Xu, 2022. "Important citations identification with semi-supervised classification model," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6533-6555, November.
    15. Meier, Armando N. & Levav, Jonathan & Meier, Stephan, 2020. "Early Release and Recidivism," IZA Discussion Papers 13035, Institute of Labor Economics (IZA).
    16. Saeed-Ul Hassan & Naif R. Aljohani & Mudassir Shabbir & Umair Ali & Sehrish Iqbal & Raheem Sarwar & Eugenio Martínez-Cámara & Sebastián Ventura & Francisco Herrera, 2020. "Tweet Coupling: a social media methodology for clustering scientific publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(2), pages 973-991, August.
    17. Duane Hybertson & Mimi Hailegiorghis & Kenneth Griesi & Brian Soeder & William Rouse, 2018. "Evidence‐based systems engineering," Systems Engineering, John Wiley & Sons, vol. 21(3), pages 243-258, May.
    18. Rikki Jones & Cindy Woods & Kim Usher, 2018. "Rates and features of methamphetamine‐related presentations to emergency departments: An integrative literature review," Journal of Clinical Nursing, John Wiley & Sons, vol. 27(13-14), pages 2569-2582, July.
    19. Antonio Gagliano & Francesco Nocera & Giuseppe Tina, 2020. "Performances and economic analysis of small photovoltaic–electricity energy storage system for residential applications," Energy & Environment, vol. 31(1), pages 155-175, February.
    20. Samuel W Hainsworth & Paul M Dietze & David P Wilson & Brett Sutton & Margaret E Hellard & Nick Scott, 2018. "Hepatitis C virus notification rates in Australia are highest in socioeconomically disadvantaged areas," PLOS ONE, Public Library of Science, vol. 13(6), pages 1-14, June.
