Detecting the knowledge structure of bioinformatics by mining full-text collections

Authors
  • Min Song

    (Yonsei University)

  • Su Yeon Kim

    (Yonsei University)

Abstract

Bioinformatics is a fast-growing, diverse research field that has recently gained much public attention. Although there have been several attempts to understand the field through bibliometric analysis, the approach proposed in this paper is the first to apply text mining techniques to a large set of full-text articles to detect the knowledge structure of the field. To this end, we use PubMed Central full-text articles for bibliometric analysis instead of relying on citation data provided by Web of Science. In particular, we develop text mining routines that build a custom citation database by mining the full text. We present several interesting findings. First, the majority of papers published in bioinformatics are rarely cited by others (63 % of papers received fewer than two citations). Second, the number of publications increases linearly and consistently, with 2003 as the turning point in publication growth. Third, most bioinformatics research is driven by US-based institutions, followed by European institutions. Fourth, the results of topic modeling and word co-occurrence analysis reveal that major topics focus more on the biological aspects of bioinformatics than on its computational aspects. However, the top 10 articles ranked by PageRank are more closely related to computational aspects. Fifth, visualization of author co-citation analysis indicates that researchers in molecular biology or genomics play a key role in connecting the sub-disciplines of bioinformatics.
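The abstract names several standard bibliometric computations without describing how they were implemented. The following is a minimal Python sketch, not the authors' code, assuming the full-text mining step has already produced a mapping from each citing paper to the papers it cites, plus a first author for each paper. All paper IDs, author names and helper names (`citation_counts`, `share_below`, `pagerank`, `author_cocitation`) are hypothetical. It shows (1) the share of papers cited fewer than two times, (2) a plain power-iteration PageRank over the citation graph (the paper's exact PageRank setup is not given here; the damping factor 0.85 is an assumption), and (3) first-author co-citation counts in the spirit of White & Griffith (1981).

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: citing paper -> papers it cites, as a full-text
# reference parser might produce. Not data from the study.
CITATIONS = {
    "P1": ["P2", "P3"],
    "P2": ["P3"],
    "P3": [],
    "P4": ["P2", "P3"],
    "P5": ["P1", "P3"],
}

# Hypothetical first author of each paper (first-author convention assumed).
FIRST_AUTHOR = {"P1": "Lee", "P2": "Kim", "P3": "Smith", "P4": "Chen", "P5": "Kim"}


def citation_counts(citations):
    """Count how many times each paper is cited within the collection."""
    counts = Counter({paper: 0 for paper in citations})
    for refs in citations.values():
        counts.update(refs)
    return counts


def share_below(counts, threshold=2):
    """Fraction of papers cited fewer than `threshold` times."""
    return sum(1 for c in counts.values() if c < threshold) / len(counts)


def pagerank(citations, damping=0.85, iterations=100):
    """Power-iteration PageRank; each citation passes rank to the cited paper."""
    nodes = set(citations) | {r for refs in citations.values() for r in refs}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Papers with no outgoing references spread their rank evenly.
        dangling = sum(rank[v] for v in nodes if not citations.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, refs in citations.items():
            for dst in refs:
                new[dst] += damping * rank[src] / len(refs)
        rank = new
    return rank


def author_cocitation(citations, first_author):
    """Count how often two first authors are cited together by one paper."""
    pairs = Counter()
    for refs in citations.values():
        cited = sorted({first_author[r] for r in refs if r in first_author})
        pairs.update(combinations(cited, 2))
    return pairs


counts = citation_counts(CITATIONS)
print(share_below(counts))
ranks = pagerank(CITATIONS)
print(max(ranks, key=ranks.get))
print(author_cocitation(CITATIONS, FIRST_AUTHOR))
```

On this toy data, three of the five papers (P1, P4, P5) are cited fewer than two times, so the share is 0.6, and P3, which every other paper cites, receives the highest PageRank. A real study would also need reference disambiguation and deduplication before any of these counts are meaningful.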

Suggested Citation

  • Min Song & Su Yeon Kim, 2013. "Detecting the knowledge structure of bioinformatics by mining full-text collections," Scientometrics, Springer;Akadémiai Kiadó, vol. 96(1), pages 183-201, July.
  • Handle: RePEc:spr:scient:v:96:y:2013:i:1:d:10.1007_s11192-012-0900-9
    DOI: 10.1007/s11192-012-0900-9

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-012-0900-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Pedro Albarrán & Javier Ruiz‐Castillo, 2011. "References made and citations received by scientific articles," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 62(1), pages 40-49, January.
    2. Michael J. Stringer & Marta Sales-Pardo & Luís A. Nunes Amaral, 2010. "Statistical validation of a global model for the distribution of the ultimate number of citations accrued by papers published in a scientific journal," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 61(7), pages 1377-1385, July.
    3. Anthony F.J. van Raan, 2006. "Statistical properties of bibliometric indicators: Research group indicator distributions and correlations," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 57(3), pages 408-430, February.
    4. Senator Jeong & Sungin Lee & Hong‐Gee Kim, 2009. "Are you an invited speaker? A bibliometric analysis of elite groups for scholarly events in bioinformatics," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 60(6), pages 1118-1131, June.
    5. Chaomei Chen & Fidelia Ibekwe-SanJuan & Jianhua Hou, 2010. "The structure and dynamics of cocitation clusters: A multiple-perspective cocitation analysis," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 61(7), pages 1386-1409, July.
    6. Per O. Seglen, 1992. "The skewness of science," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 43(9), pages 628-638, October.
    7. Wolfgang Glänzel & Frizo Janssens & Bart Thijs, 2009. "A comparative analysis of publication activity and citation impact based on the core literature in bioinformatics," Scientometrics, Springer;Akadémiai Kiadó, vol. 79(1), pages 109-129, April.
    8. Swapan Kumar Patra & Saroj Mishra, 2006. "Bibliometric study of bioinformatics literature," Scientometrics, Springer;Akadémiai Kiadó, vol. 67(3), pages 477-489, June.
    9. Howard D. White & Belver C. Griffith, 1981. "Author cocitation: A literature measure of intellectual structure," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 32(3), pages 163-171, May.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Hakyeon Lee & Hanbin Seo & Youngjung Geum, 2018. "Uncovering the Topic Landscape of Product-Service System Research: from Sustainability to Value Creation," Sustainability, MDPI, vol. 10(4), pages 1-15, March.
    2. Ai-Yuan Liu & Shi-Ying Li & Yu-Qing Guo, 2014. "Characteristics of research on bioinformatics in China assessed with Science Citation Index Expanded," Scientometrics, Springer;Akadémiai Kiadó, vol. 99(2), pages 371-391, May.
    3. Jun-Ping Qiu & Ke Dong & Hou-Qiang Yu, 2014. "Comparative study on structure and correlation among author co-occurrence networks in bibliometrics," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(2), pages 1345-1360, November.
    4. Iñaki Bildosola & Gaizka Garechana & Enara Zarrabeitia & Ernesto Cilleruelo, 2020. "Characterization of strategic emerging technologies: the case of big data," Central European Journal of Operations Research, Springer;Slovak Society for Operations Research;Hungarian Operational Research Society;Czech Society for Operations Research;Österr. Gesellschaft für Operations Research (ÖGOR);Slovenian Society Informatika - Section for Operational Research;Croatian Operational Research Society, vol. 28(1), pages 45-60, March.
    5. Ji Yeon Lee & Richa Kumari & Jae Yun Jeong & Tae-Hyun Kim & Byeong-Hee Lee, 2020. "Knowledge Discovering on Graphene Green Technology by Text Mining in National R&D Projects in South Korea," Sustainability, MDPI, vol. 12(23), pages 1-16, November.
    6. Qikai Cheng & Jiamin Wang & Wei Lu & Yong Huang & Yi Bu, 2020. "Keyword-citation-keyword network: a new perspective of discipline knowledge structure analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(3), pages 1923-1943, September.
    7. Zhichao Ba & Yujie Cao & Jin Mao & Gang Li, 2019. "A hierarchical approach to analyzing knowledge integration between two fields—a case study on medical informatics and computer science," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(3), pages 1455-1486, June.
    8. Yongjun Zhu & Min Song & Erjia Yan, 2016. "Identifying Liver Cancer and Its Relations with Diseases, Drugs, and Genes: A Literature-Based Approach," PLOS ONE, Public Library of Science, vol. 11(5), pages 1-14, May.
    9. Sepideh Fahimifar & Khadijeh Mousavi & Fatemeh Mozaffari & Marcel Ausloos, 2023. "Identification of the most important external features of highly cited scholarly papers through 3 (i.e., Ridge, Lasso, and Boruta) feature selection data mining methods," Quality & Quantity: International Journal of Methodology, Springer, vol. 57(4), pages 3685-3712, August.
    10. Zhu, Lin & Cunningham, Scott W., 2022. "Unveiling the knowledge structure of technological forecasting and social change (1969–2020) through an NMF-based hierarchical topic model," Technological Forecasting and Social Change, Elsevier, vol. 174(C).
    11. Mao, Jin & Liang, Zhentao & Cao, Yujie & Li, Gang, 2020. "Quantifying cross-disciplinary knowledge flow from the perspective of content: Introducing an approach based on knowledge memes," Journal of Informetrics, Elsevier, vol. 14(4).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Pedro Albarrán & Juan A. Crespo & Ignacio Ortuño & Javier Ruiz-Castillo, 2011. "The skewness of science in 219 sub-fields and a number of aggregates," Scientometrics, Springer;Akadémiai Kiadó, vol. 88(2), pages 385-397, August.
    2. Ruiz-Castillo, Javier & Costas, Rodrigo, 2018. "Individual and field citation distributions in 29 broad scientific fields," Journal of Informetrics, Elsevier, vol. 12(3), pages 868-892.
    3. Zhihui Zhang & Ying Cheng & Nian Cai Liu, 2015. "Improving the normalization effect of mean-based method from the perspective of optimization: optimization-based linear methods and their performance," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(1), pages 587-607, January.
    4. Vîiu, Gabriel-Alexandru, 2018. "The lognormal distribution explains the remarkable pattern documented by characteristic scores and scales in scientometrics," Journal of Informetrics, Elsevier, vol. 12(2), pages 401-415.
    5. Jianhua Hou & Xiucai Yang & Chaomei Chen, 2018. "Emerging trends and new developments in information science: a document co-citation analysis (2009–2016)," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(2), pages 869-892, May.
    6. Lafond, Francois, 2012. "Learning and the structure of citation networks," MERIT Working Papers 2012-071, United Nations University - Maastricht Economic and Social Research Institute on Innovation and Technology (MERIT).
    7. Javier Ruiz-Castillo, 2013. "The role of statistics in establishing the similarity of citation distributions in a static and a dynamic context," Scientometrics, Springer;Akadémiai Kiadó, vol. 96(1), pages 173-181, July.
    8. Wang Guizhou & Zhang Si & Yu Tao & Ning Yu, 2021. "A Systematic Overview of Blockchain Research," Journal of Systems Science and Information, De Gruyter, vol. 9(3), pages 205-238, June.
    9. Jianhua Hou, 2017. "Exploration into the evolution and historical roots of citation analysis by referenced publication year spectroscopy," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(3), pages 1437-1452, March.
    10. Ying Huang & Wolfgang Glänzel & Lin Zhang, 2021. "Tracing the development of mapping knowledge domains," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 6201-6224, July.
    11. Ruiz-Castillo, Javier & Waltman, Ludo, 2015. "Field-normalized citation impact indicators using algorithmically constructed classification systems of science," Journal of Informetrics, Elsevier, vol. 9(1), pages 102-117.
    12. S. R. Goldberg & H. Anthony & T. S. Evans, 2015. "Modelling citation networks," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1577-1604, December.
    13. Bornmann, Lutz & Leydesdorff, Loet, 2017. "Skewness of citation impact data and covariates of citation distributions: A large-scale empirical analysis based on Web of Science data," Journal of Informetrics, Elsevier, vol. 11(1), pages 164-175.
    14. Neus Herranz & Javier Ruiz-Castillo, 2013. "The end of the “European Paradox”," Scientometrics, Springer;Akadémiai Kiadó, vol. 95(1), pages 453-464, April.
    15. Boyack, Kevin W. & Klavans, Richard, 2014. "Including cited non-source items in a large-scale map of science: What difference does it make?," Journal of Informetrics, Elsevier, vol. 8(3), pages 569-580.
    16. Andrea Bonaccorsi & Cinzia Daraio & Stefano Fantoni & Viola Folli & Marco Leonetti & Giancarlo Ruocco, 2017. "Do social sciences and humanities behave like life and hard sciences?," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(1), pages 607-653, July.
    17. Ruiz-Castillo, Javier & Costas, Rodrigo, 2014. "The skewness of scientific productivity," Journal of Informetrics, Elsevier, vol. 8(4), pages 917-934.
    18. Gordon Rogers & Martin Szomszor & Jonathan Adams, 2020. "Sample size in bibliometric analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 777-794, October.
    19. Herranz, Neus & Ruiz-Castillo, Javier, 2012. "Sub-field normalization in the multiplicative case: Average-based citation indicators," Journal of Informetrics, Elsevier, vol. 6(4), pages 543-556.
    20. Antonio Perianes-Rodriguez & Javier Ruiz-Castillo, 2016. "University citation distributions," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 67(11), pages 2790-2804, November.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.