Printed from https://ideas.repec.org/a/bla/jinfst/v73y2022i11p1513-1528.html

Classifying papers into subfields using Abstracts, Titles, Keywords and KeyWords Plus through pattern detection and optimization procedures: An application in Physics

Authors

  • Gerson Pech
  • Catarina Delgado
  • Silvio Paolo Sorella

Abstract

Classifying papers by field of knowledge is critical for understanding the dynamics of scientific (sub)fields, their leading questions, and their trends. Most studies rely on the journal categories defined by popular databases such as WoS or Scopus, but some experts find that those categories may neither map the existing subfields correctly nor identify the subfield of a specific article. This study addresses the classification problem using data from each paper (Abstract, Title, Keywords, and KeyWords Plus), together with the help of experts to identify the existing subfields and the journals exclusive to each subfield. These “exclusive journals” are critical because a pattern detection procedure, using machine learning techniques from the software NVivo, extracts from them a list of the frequent terms that are specific to each subfield. With that list of terms and with the help of optimization procedures, we can identify the subfield to which each paper most likely belongs. This study can support science policy-makers and funding and research institutions (through more accurate evaluations of academic performance), help editors redefine the scopes of their journals, and help popular databases refine their categories.
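The abstract describes the pipeline only at a high level (NVivo pattern detection on exclusive journals, then optimization). The sketch below is an illustrative stand-in, not the authors' method. It assumes that plain term frequencies replace NVivo's pattern detection, that a term is "specific" to a subfield when it is among that subfield's top terms but not among any other subfield's, and that a simple argmax over term-match counts stands in for the paper's optimization procedures. The field keys (`abstract`, `title`, `keywords`, `keywords_plus`), the stop-word list and `top_n` are hypothetical choices made for illustration.

```python
import re
from collections import Counter
from typing import Dict, List, Optional, Set

# Hypothetical stop-word list; a real pipeline would need a much fuller one.
STOP_WORDS = {"the", "of", "and", "in", "on", "for", "with", "from", "by", "to", "we", "is", "are"}

# The four bibliographic fields the study draws on (key names are assumed).
FIELDS = ("abstract", "title", "keywords", "keywords_plus")


def tokenize(text: str) -> List[str]:
    """Lower-case, split on non-letters, drop stop words and tokens shorter than 3 letters."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOP_WORDS and len(t) >= 3]


def paper_tokens(paper: Dict[str, str]) -> Counter:
    """Term counts over the concatenated Abstract, Title, Keywords and KeyWords Plus."""
    return Counter(tokenize(" ".join(paper.get(f, "") for f in FIELDS)))


def specific_terms(training: Dict[str, List[Dict[str, str]]], top_n: int = 20) -> Dict[str, Set[str]]:
    """Per subfield, keep its top_n frequent terms that are not top terms of any other subfield.

    `training` maps each subfield to papers drawn from its (expert-chosen) exclusive journals.
    """
    frequent: Dict[str, Set[str]] = {}
    for subfield, papers in training.items():
        counts: Counter = Counter()
        for p in papers:
            counts.update(paper_tokens(p))
        frequent[subfield] = {t for t, _ in counts.most_common(top_n)}
    specific: Dict[str, Set[str]] = {}
    for subfield, terms in frequent.items():
        # Union of every other subfield's frequent terms; shared terms are not specific.
        others = set().union(*(v for k, v in frequent.items() if k != subfield))
        specific[subfield] = terms - others
    return specific


def classify(paper: Dict[str, str], specific: Dict[str, Set[str]]) -> Optional[str]:
    """Return the subfield whose specific terms occur most often in the paper, or None if none occur.

    A plain argmax stands in here for the paper's optimization procedures.
    """
    tokens = paper_tokens(paper)
    scores = {s: sum(tokens[t] for t in terms) for s, terms in specific.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

With toy data in which one subfield's exclusive journals use terms such as "quark" and "hadron" and another's use "superconductor" and "phonon", a new paper titled "Hadron spectra from quark models" scores 2 for the first subfield and 0 for the second, while a term that appears in both training sets (say "physics") is dropped from both lists as non-specific.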

Suggested Citation

  • Gerson Pech & Catarina Delgado & Silvio Paolo Sorella, 2022. "Classifying papers into subfields using Abstracts, Titles, Keywords and KeyWords Plus through pattern detection and optimization procedures: An application in Physics," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 73(11), pages 1513-1528, November.
  • Handle: RePEc:bla:jinfst:v:73:y:2022:i:11:p:1513-1528
    DOI: 10.1002/asi.24655

    Download full text from publisher

    File URL: https://doi.org/10.1002/asi.24655
    Download Restriction: no

    References listed on IDEAS

    1. Lutz Bornmann, 2018. "Field classification of publications in Dimensions: a first case study testing its reliability and validity," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(1), pages 637-640, October.
    2. Theresa Velden & Shiyan Yan & Carl Lagoze, 2017. "Mapping the cognitive structure of astrophysics by infomap clustering of the citation network and topic affinity analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 1033-1051, May.
    3. Bart Thijs & Lin Zhang & Wolfgang Glänzel, 2015. "Bibliographic coupling and hierarchical clustering for the validation and improvement of subject-classification schemes," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1453-1467, December.
    4. Qi Wang & Ludo Waltman, 2016. "Large-scale analysis of the accuracy of the journal classification systems of Web of Science and Scopus," Journal of Informetrics, Elsevier, vol. 10(2), pages 347-364.
    5. Fei Shu & Yue Ma & Junping Qiu & Vincent Larivière, 2020. "Classifications of science and their effects on bibliometric evaluations," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2727-2744, December.
    6. Eugene Garfield & Irving H. Sher, 1993. "KeyWords Plus™—algorithmic derivative indexing," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 44(5), pages 298-299, June.
    7. Loet Leydesdorff & Lutz Bornmann, 2016. "The operationalization of “fields” as WoS subject categories (WCs) in evaluative bibliometrics: The cases of “library and information science” and “science & technology studies”," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 67(3), pages 707-714, March.
    8. Gerardo Urrutia Sánchez & Lilian Prado & Wolfgang Bietenholz, 2018. "Theoretical high energy physics in Latin America from 1990 to 2012: a statistical study," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(1), pages 125-146, July.
    9. Christian Herzog & Brian Kierkegaard Lunn, 2018. "Response to the letter ‘Field classification of publications in Dimensions: a first case study testing its reliability and validity’," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(1), pages 641-645, October.
    10. Juan Zhang & Qi Yu & Fashan Zheng & Chao Long & Zuxun Lu & Zhiguang Duan, 2016. "Comparing keywords plus of WOS and author keywords: A case study of patient adherence research," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 67(4), pages 967-972, April.
    11. Nees Jan van Eck & Ludo Waltman, 2010. "Software survey: VOSviewer, a computer program for bibliometric mapping," Scientometrics, Springer;Akadémiai Kiadó, vol. 84(2), pages 523-538, August.
    12. Peter Sjögårde & Per Ahlgren, 2018. "Granularity of algorithmically constructed publication-level classifications of research publications: Identification of topics," Journal of Informetrics, Elsevier, vol. 12(1), pages 133-152.
    13. Peter Sjögårde & Per Ahlgren & Ludo Waltman, 2021. "Algorithmic labeling in hierarchical classifications of publications: Evaluation of bibliographic fields and term weighting approaches," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 72(7), pages 853-869, July.
    14. Wei Lu & Zhifeng Liu & Yong Huang & Yi Bu & Xin Li & Qikai Cheng, 2020. "How do authors select keywords? A preliminary study of author keyword selection behavior," Journal of Informetrics, Elsevier, vol. 14(4).
    15. Gerson Pech & Catarina Delgado, 2020. "Percentile and stochastic-based approach to the comparison of the number of citations of articles indexed in different bibliographic databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(1), pages 223-252, April.
    16. Magda Fontana & Martina Iori & Fabio Montobbio & Roberta Sinatra, 2020. "New and atypical combinations: An assessment of novelty and interdisciplinarity," Research Policy, Elsevier, vol. 49(7).
    17. Ludo Waltman & Nees Jan van Eck, 2012. "A new methodology for constructing a publication-level classification system of science," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 63(12), pages 2378-2392, December.
    18. Richard Klavans & Kevin W. Boyack, 2017. "Which Type of Citation Analysis Generates the Most Accurate Taxonomy of Scientific and Technical Knowledge?," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 68(4), pages 984-998, April.
    19. Erjia Yan, 2014. "Finding knowledge paths among scientific disciplines," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 65(11), pages 2331-2347, November.
    20. Loet Leydesdorff & Lutz Bornmann & Ping Zhou, 2016. "Construction of a pragmatic base line for journal classifications and maps based on aggregated journal-journal citation relations," Journal of Informetrics, Elsevier, vol. 10(4), pages 902-918.
    21. Alesia Zuccala & Maarten van Someren & Maurits van Bellen, 2014. "A machine-learning approach to coding book reviews as quality indicators: Toward a theory of megacitation," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 65(11), pages 2248-2260, November.
    22. Fei Shu & Charles-Antoine Julien & Lin Zhang & Junping Qiu & Jing Zhang & Vincent Larivière, 2019. "Comparing journal and paper level classifications of science," Journal of Informetrics, Elsevier, vol. 13(1), pages 202-225.
    23. Lin Zhang & Ronald Rousseau & Wolfgang Glänzel, 2016. "Diversity of references as an indicator of the interdisciplinarity of journals: Taking similarity between subject fields into account," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 67(5), pages 1257-1265, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lin Zhang & Beibei Sun & Fei Shu & Ying Huang, 2022. "Comparing paper level classifications across different methods and systems: an investigation of Nature publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(12), pages 7633-7651, December.
    2. Michael Gusenbauer, 2022. "Search where you will find most: Comparing the disciplinary coverage of 56 bibliographic databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(5), pages 2683-2745, May.
    3. Loet Leydesdorff & Lutz Bornmann & Ping Zhou, 2016. "Construction of a pragmatic base line for journal classifications and maps based on aggregated journal-journal citation relations," Journal of Informetrics, Elsevier, vol. 10(4), pages 902-918.
    4. Fei Shu & Yue Ma & Junping Qiu & Vincent Larivière, 2020. "Classifications of science and their effects on bibliometric evaluations," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2727-2744, December.
    5. Peter Sjögårde & Fereshteh Didegah, 2022. "The association between topic growth and citation impact of research publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(4), pages 1903-1921, April.
    6. Paul Donner, 2021. "Validation of the Astro dataset clustering solutions with external data," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1619-1645, February.
    7. Gabriele Sampagnaro, 2023. "Keyword occurrences and journal specialization," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(10), pages 5629-5645, October.
    8. Matthias Held & Grit Laudel & Jochen Gläser, 2021. "Challenges to the validity of topic reconstruction," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(5), pages 4511-4536, May.
    9. Federica Baccini & Lucio Barabesi & Alberto Baccini & Mahdi Khelfaoui & Yves Gingras, 2022. "Similarity network fusion for scholarly journals," Journal of Informetrics, Elsevier, vol. 16(1).
    10. Raminta Pranckutė, 2021. "Web of Science (WoS) and Scopus: The Titans of Bibliographic Information in Today’s Academic World," Publications, MDPI, vol. 9(1), pages 1-59, March.
    11. Robin Haunschild & Angela D. Daniels & Lutz Bornmann, 2022. "Scores of a specific field-normalized indicator calculated with different approaches of field-categorization: Are the scores different or similar?," Journal of Informetrics, Elsevier, vol. 16(1).
    12. Sitaram Devarakonda & Dmitriy Korobskiy & Tandy Warnow & George Chacko, 2020. "Viewing computer science through citation analysis: Salton and Bergmark Redux," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 271-287, October.
    13. Fei Shu & Charles-Antoine Julien & Lin Zhang & Junping Qiu & Jing Zhang & Vincent Larivière, 2019. "Comparing journal and paper level classifications of science," Journal of Informetrics, Elsevier, vol. 13(1), pages 202-225.
    14. Juan Pablo Bascur & Suzan Verberne & Nees Jan van Eck & Ludo Waltman, 2023. "Academic information retrieval using citation clusters: in-depth evaluation based on systematic reviews," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(5), pages 2895-2921, May.
    15. Robin Haunschild & Hermann Schier & Werner Marx & Lutz Bornmann, 2018. "Algorithmically generated subject categories based on citation relations: An empirical micro study using papers on overall water splitting," Journal of Informetrics, Elsevier, vol. 12(2), pages 436-447.
    16. Xian Li & Ronald Rousseau & Liming Liang & Fangjie Xi & Yushuang Lü & Yifan Yuan & Xiaojun Hu, 2022. "Is low interdisciplinarity of references an unexpected characteristic of Nobel Prize winning research?," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(4), pages 2105-2122, April.
    17. Juan Miguel Campanario, 2018. "Are leaders really leading? Journals that are first in Web of Science subject categories in the context of their groups," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(1), pages 111-130, April.
    18. Ying Huang & Wolfgang Glänzel & Lin Zhang, 2021. "Tracing the development of mapping knowledge domains," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 6201-6224, July.
    19. Chiara Carusi & Giuseppe Bianchi, 2019. "Scientific community detection via bipartite scholar/journal graph co-clustering," Journal of Informetrics, Elsevier, vol. 13(1), pages 354-386.
    20. Roberto Camerani & Daniele Rotolo & Nicola Grassano, 2018. "Do Firms Publish? A Multi-Sectoral Analysis," SPRU Working Paper Series 2018-21, SPRU - Science Policy Research Unit, University of Sussex Business School.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:bla:jinfst:v:73:y:2022:i:11:p:1513-1528. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help by reporting it through the CitEc form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Wiley Content Delivery. General contact details of provider: http://www.asis.org.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.