
Discovering discoveries: Identifying biomedical discoveries using citation contexts

Author

Listed:
  • Small, Henry
  • Tseng, Hung
  • Patek, Mike

Abstract

A procedure for identifying discoveries in the biomedical sciences is described that makes use of citation context information, or more precisely citing sentences, drawn from the PubMed Central database. The procedure focuses on use of specific terms in the citing sentences and the joint appearance of cited references. After a manual screening process to remove non-discoveries, a list of over 100 discoveries and their associated articles is compiled and characterized by subject matter and by type of discovery. The phenomenon of multiple discovery is shown to play an important role. The onset and timing of recognition of the articles are studied by comparing the number of citing sentences with and without discovery terms, and show both early onset and delays in recognition. A comparative analysis of the vocabularies of the discovery and non-discovery sentences reveals the types of words and concepts that scientists associate with discoveries. A machine learning application is used to efficiently extend the list. Implications of the findings for understanding the nature and justification of scientific discoveries are discussed.

Suggested Citation

  • Small, Henry & Tseng, Hung & Patek, Mike, 2017. "Discovering discoveries: Identifying biomedical discoveries using citation contexts," Journal of Informetrics, Elsevier, vol. 11(1), pages 46-62.
  • Handle: RePEc:eee:infome:v:11:y:2017:i:1:p:46-62
    DOI: 10.1016/j.joi.2016.11.001

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1751157716302255
    Download Restriction: Full text for ScienceDirect subscribers only



    Citations



    Cited by:

    1. Lutz Bornmann & Robin Haunschild & Sven E. Hug, 2018. "Visualizing the context of citations referencing papers published by Eugene Garfield: a new type of keyword co-occurrence analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(2), pages 427-437, February.
    2. Iman Tahamtan & Lutz Bornmann, 2019. "What do citation counts measure? An updated review of studies on citations in scientific documents published between 2006 and 2018," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(3), pages 1635-1684, December.
    3. Xu, Jianguo & Guo, Lixiang & Jiang, Jiang & Ge, Bingfeng & Li, Mengjun, 2019. "A deep learning methodology for automatic extraction and discovery of technical intelligence," Technological Forecasting and Social Change, Elsevier, vol. 146(C), pages 339-351.
    4. Peter Kokol & Helena Blažun Vošner & Jernej Završnik, 2020. "Do simultaneous inventions sleep? A case study on nursing sleeping papers," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2827-2832, December.
    5. Chaker Jebari & Enrique Herrera-Viedma & Manuel Jesus Cobo, 2021. "The use of citation context to detect the evolution of research topics: a large-scale analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(4), pages 2971-2989, April.
    6. Chao Lu & Ying Ding & Chengzhi Zhang, 2017. "Understanding the impact change of a highly cited article: a content-based citation analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(2), pages 927-945, August.
    7. Lv, Yanhua & Ding, Ying & Song, Min & Duan, Zhiguang, 2018. "Topology-driven trend analysis for drug discovery," Journal of Informetrics, Elsevier, vol. 12(3), pages 893-905.
    8. Small, Henry, 2018. "Characterizing highly cited method and non-method papers using citation contexts: The role of uncertainty," Journal of Informetrics, Elsevier, vol. 12(2), pages 461-480.
    9. Lutz Bornmann & K. Brad Wray & Robin Haunschild, 2020. "Citation concept analysis (CCA): a new form of citation analysis revealing the usefulness of concepts for other researchers illustrated by exemplary case studies including classic books by Thomas S. K," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(2), pages 1051-1074, February.
    10. Min, Chao & Bu, Yi & Sun, Jianjun, 2021. "Predicting scientific breakthroughs based on knowledge structure variations," Technological Forecasting and Social Change, Elsevier, vol. 164(C).
    11. Coccia, Mario, 2022. "Probability of discoveries between research fields to explain scientific and technological change," Technology in Society, Elsevier, vol. 68(C).
    12. Amber Geurts & Ralph Gutknecht & Philine Warnke & Arjen Goetheer & Elna Schirrmeister & Babette Bakker & Svetlana Meissner, 2022. "New perspectives for data‐supported foresight: The hybrid AI‐expert approach," Futures & Foresight Science, John Wiley & Sons, vol. 4(1), March.
    13. Du, Jian & Li, Peixin & Haunschild, Robin & Sun, Yinan & Tang, Xiaoli, 2020. "Paper-patent citation linkages as early signs for predicting delayed recognized knowledge: Macro and micro evidence," Journal of Informetrics, Elsevier, vol. 14(2).
    14. József Popp & Péter Balogh & Judit Oláh & Sebastian Kot & Mónika Harangi Rákos & Péter Lengyel, 2018. "Social Network Analysis of Scientific Articles Published by Food Policy," Sustainability, MDPI, vol. 10(3), pages 1-20, February.
    15. Henry Small & Kevin W. Boyack & Richard Klavans, 2019. "Citations and certainty: a new interpretation of citation counts," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(3), pages 1079-1092, March.
    16. Zhu, Wanying & Jin, Ching & Ma, Yifang & Xu, Cong, 2023. "Earlier recognition of scientific excellence enhances future achievements and promotes persistence," Journal of Informetrics, Elsevier, vol. 17(2).
    17. Xue Wang & Xuemei Yang & Jian Du & Xuwen Wang & Jiao Li & Xiaoli Tang, 2021. "A deep learning approach for identifying biomedical breakthrough discoveries using context analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5531-5549, July.
    18. Lutz Bornmann & Adam Y. Ye & Fred Y. Ye, 2017. "Sequence analysis of annually normalized citation counts: an empirical analysis based on the characteristic scores and scales (CSS) method," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(3), pages 1665-1680, December.
    19. Heng Huang & Donghua Zhu & Xuefeng Wang, 2022. "Evaluating scientific impact of publications: combining citation polarity and purpose," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(9), pages 5257-5281, September.
    20. Tahamtan, Iman & Bornmann, Lutz, 2018. "Core elements in the process of citing publications: Conceptual overview of the literature," Journal of Informetrics, Elsevier, vol. 12(1), pages 203-216.
    21. Shi, Xuanyu & Du, Jian, 2022. "Distinguishing transformative from incremental clinical evidence: A classifier of clinical research using textual features from abstracts and citing sentences," Journal of Informetrics, Elsevier, vol. 16(2).
    22. Sehrish Iqbal & Saeed-Ul Hassan & Naif Radi Aljohani & Salem Alelyani & Raheel Nawaz & Lutz Bornmann, 2021. "A decade of in-text citation analysis based on natural language processing and machine learning techniques: an overview of empirical studies," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6551-6599, August.
    23. Chuanyi Wang & Fei Guo & Qing Wu, 2021. "The influence of academic advisors on academic network of Physics doctoral students: empirical evidence based on scientometrics analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(6), pages 4899-4925, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Takahiro Kawamura & Katsutaro Watanabe & Naoya Matsumoto & Shusaku Egami & Mari Jibu, 2018. "Funding map using paragraph embedding based on semantic diversity," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 941-958, August.
    2. Rey-Long Liu, 2017. "A new bibliographic coupling measure with descriptive capability," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(2), pages 915-935, February.
    3. Dangzhi Zhao & Andreas Strotmann, 2020. "Telescopic and panoramic views of library and information science research 2011–2018: a comparison of four weighting schemes for author co-citation analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(1), pages 255-270, July.
    4. Kim, Ha Jin & Jeong, Yoo Kyung & Song, Min, 2016. "Content- and proximity-based author co-citation analysis using citation sentences," Journal of Informetrics, Elsevier, vol. 10(4), pages 954-966.
    5. Michel Zitt, 2015. "Meso-level retrieval: IR-bibliometrics interplay and hybrid citation-words methods in scientific fields delineation," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2223-2245, March.
    6. Kun Sun & Haitao Liu & Wenxin Xiong, 2021. "The evolutionary pattern of language in scientific writings: A case study of Philosophical Transactions of Royal Society (1665–1869)," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1695-1724, February.
    7. Shenghui Wang & Rob Koopman, 2017. "Clustering articles based on semantic similarity," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 1017-1031, May.
    8. Ali Hortaçsu & Jakub Kastl & Allen Zhang, 2018. "Bid Shading and Bidder Surplus in the US Treasury Auction System," American Economic Review, American Economic Association, vol. 108(1), pages 147-169, January.
    9. Kai Li & Jason Rollins & Erjia Yan, 2018. "Web of Science use in published research and review papers 1997–2017: a selective, dynamic, cross-domain, content-based analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(1), pages 1-20, April.
    10. Mingyang Wang & Jiaqi Zhang & Guangsheng Chen & Kah-Hin Chai, 2019. "Examining the influence of open access on journals’ citation obsolescence by modeling the actual citation process," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(3), pages 1621-1641, June.
    11. Shutian Ma & Jin Xu & Chengzhi Zhang, 2018. "Automatic identification of cited text spans: a multi-classifier approach over imbalanced dataset," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 1303-1330, August.
    12. Tahamtan, Iman & Bornmann, Lutz, 2018. "Core elements in the process of citing publications: Conceptual overview of the literature," Journal of Informetrics, Elsevier, vol. 12(1), pages 203-216.
    13. Yan, Li & Cao, Huiying & Gao, Chao & Wang, Zhen & Li, Xuelong, 2023. "Mining of book-loan behavior based on coupling relationship analysis," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 613(C).
    14. Clara Boothby & Staša Milojević, 2021. "An exploratory full-text analysis of Science Careers in a changing academic job market," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(5), pages 4055-4071, May.
    15. Mengyu Yu & Mazie Krehbiel & Samantha Thompson & Tatjana Miljkovic, 2020. "An exploration of gender gap using advanced data science tools: actuarial research community," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(2), pages 767-789, May.
    16. Al Garni, Hassan Z. & Awasthi, Anjali, 2017. "Solar PV power plant site selection using a GIS-AHP based approach with application in Saudi Arabia," Applied Energy, Elsevier, vol. 206(C), pages 1225-1240.
    17. Confente, Ilenia & Scarpi, Daniele & Russo, Ivan, 2020. "Marketing a new generation of bio-plastics products for a circular economy: The role of green self-identity, self-congruity, and perceived value," Journal of Business Research, Elsevier, vol. 112(C), pages 431-439.
    18. Bikun Chen & Dannan Deng & Zhouyan Zhong & Chengzhi Zhang, 2020. "Exploring linguistic characteristics of highly browsed and downloaded academic articles," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(3), pages 1769-1790, March.
    19. Newbery, David, 2018. "Shifting demand and supply over time and space to manage intermittent generation: The economics of electrical storage," Energy Policy, Elsevier, vol. 113(C), pages 711-720.
    20. Yun, Jinhyuk, 2022. "Generalization of bibliographic coupling and co-citation using the node split network," Journal of Informetrics, Elsevier, vol. 16(2).
