Printed from https://ideas.repec.org/a/spr/scient/v107y2016i2d10.1007_s11192-016-1861-1.html

Modeling time-dependent and -independent indicators to facilitate identification of breakthrough research papers

Authors
  • Holly N. Wolcott

    (Thomson Reuters)

  • Matthew J. Fouch

    (Thomson Reuters)

  • Elizabeth R. Hsu

    (National Cancer Institute)

  • Leo G. DiJoseph

    (Thomson Reuters)

  • Catherine A. Bernaciak

    (Thomson Reuters)

  • James G. Corrigan

    (National Cancer Institute)

  • Duane E. Williams

    (ÜberResearch)

Abstract

Research funding organizations invest substantial resources to monitor mission-relevant research findings to identify and support promising new lines of inquiry. To that end, we have been pursuing the development of tools to identify research publications that have a strong likelihood of driving new avenues of research. This paper describes our work towards incorporating multiple time-dependent and -independent features of publications into a model to identify candidate breakthrough papers as early as possible following publication. We used multiple random forest models to assess the ability of indicators to reliably distinguish a gold standard set of breakthrough publications as identified by subject matter experts from among a comparison group of similar Thomson Reuters Web of Science™ publications. These indicators were then tested for their predictive value in random forest models. Model parameter optimization and variable selection were used to construct a final model based on indicators that can be measured within 6 months post-publication; the final model had an estimated true positive rate of 0.77 and false positive rate of 0.01.
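The abstract reports the final model's performance as a true positive rate (TPR) and a false positive rate (FPR). The Python below is a minimal illustrative sketch, not the authors' code. It shows how these two rates are conventionally computed from binary labels (1 = breakthrough, 0 = comparison paper). It also includes a hypothetical filter that keeps only the indicators measurable within a 6-month window. The indicator names and delays in `INDICATOR_DELAY_MONTHS` are invented for illustration, because the paper's actual indicator list is not given on this page.

```python
from dataclasses import dataclass
from typing import List, Mapping, Sequence


@dataclass(frozen=True)
class Rates:
    tpr: float  # true positive rate: flagged breakthroughs / all breakthroughs
    fpr: float  # false positive rate: flagged comparison papers / all comparison papers


def true_false_positive_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Rates:
    """Compute TPR and FPR for binary labels (1 = breakthrough, 0 = comparison)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1:
            if predicted == 1:
                tp += 1  # breakthrough correctly flagged
            else:
                fn += 1  # breakthrough missed
        elif predicted == 1:
            fp += 1  # comparison paper wrongly flagged
        else:
            tn += 1  # comparison paper correctly passed over
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one positive and one negative example")
    return Rates(tpr=tp / (tp + fn), fpr=fp / (fp + tn))


# Hypothetical indicators and the number of months after publication at which
# each becomes measurable. Names and delays are illustrative only.
INDICATOR_DELAY_MONTHS = {
    "journal_impact": 0,
    "author_count": 0,
    "citations_6mo": 6,
    "citations_24mo": 24,
}


def early_indicators(delays: Mapping[str, int], horizon_months: int = 6) -> List[str]:
    """Return, sorted by name, the indicators measurable within horizon_months."""
    return sorted(name for name, delay in delays.items() if delay <= horizon_months)
```

With toy labels `[1, 1, 1, 1, 0, 0, 0, 0, 0, 0]` and predictions `[1, 1, 1, 0, 0, 0, 0, 0, 0, 1]`, the function gives TPR = 3/4 = 0.75 and FPR = 1/6 ≈ 0.17. Read the same way, the reported TPR of 0.77 means roughly three in four gold-standard papers were flagged, and the FPR of 0.01 means about one in a hundred comparison papers was flagged in error.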

Suggested Citation

  • Holly N. Wolcott & Matthew J. Fouch & Elizabeth R. Hsu & Leo G. DiJoseph & Catherine A. Bernaciak & James G. Corrigan & Duane E. Williams, 2016. "Modeling time-dependent and -independent indicators to facilitate identification of breakthrough research papers," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(2), pages 807-817, May.
  • Handle: RePEc:spr:scient:v:107:y:2016:i:2:d:10.1007_s11192-016-1861-1
    DOI: 10.1007/s11192-016-1861-1

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-016-1861-1
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Chaomei Chen, 2006. "CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 57(3), pages 359-377, February.
    2. Chaomei Chen, 2012. "Predictive effects of structural variation on citation counts," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 63(3), pages 431-449, March.
    3. Henry Small, 2006. "Tracking and predicting growth areas in science," Scientometrics, Springer;Akadémiai Kiadó, vol. 68(3), pages 595-610, September.
    4. Cody Dunne & Ben Shneiderman & Robert Gove & Judith Klavans & Bonnie Dorr, 2012. "Rapid understanding of scientific paper collections: Integrating statistics, text analytics, and visualization," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 63(12), pages 2351-2369, December.
    5. Ponomarev, Ilya V. & Williams, Duane E. & Hackett, Charles J. & Schnell, Joshua D. & Haak, Laurel L., 2014. "Predicting highly cited papers: A Method for Early Detection of Candidate Breakthroughs," Technological Forecasting and Social Change, Elsevier, vol. 81(C), pages 49-55.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Cristian Mejia & Yuya Kajikawa, 2018. "Using acknowledgement data to characterize funding organizations by the types of research sponsored: the case of robotics research," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(3), pages 883-904, March.
    2. Min, Chao & Bu, Yi & Sun, Jianjun, 2021. "Predicting scientific breakthroughs based on knowledge structure variations," Technological Forecasting and Social Change, Elsevier, vol. 164(C).
    3. Yang, Guancan & Lu, Guoxuan & Xu, Shuo & Chen, Liang & Wen, Yuxin, 2023. "Which type of dynamic indicators should be preferred to predict patent commercial potential?," Technological Forecasting and Social Change, Elsevier, vol. 193(C).
    4. Xue Wang & Xuemei Yang & Jian Du & Xuwen Wang & Jiao Li & Xiaoli Tang, 2021. "A deep learning approach for identifying biomedical breakthrough discoveries using context analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5531-5549, July.
    5. Li, Xin & Wen, Yang & Jiang, Jiaojiao & Daim, Tugrul & Huang, Lucheng, 2022. "Identifying potential breakthrough research: A machine learning method using scientific papers and Twitter data," Technological Forecasting and Social Change, Elsevier, vol. 184(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zehra Taşkın, 2021. "Forecasting the future of library and information science and its sub-fields," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1527-1551, February.
    2. Jingwei Han & Zhixiong Tan & Maozhi Chen & Liang Zhao & Ling Yang & Siying Chen, 2022. "Carbon Footprint Research Based on Input–Output Model—A Global Scientometric Visualization Analysis," IJERPH, MDPI, vol. 19(18), pages 1-23, September.
    3. Jianwei Qian & Huawen Shen & Rob Law, 2018. "Research in Sustainable Tourism: A Longitudinal Study of Articles between 2008 and 2017," Sustainability, MDPI, vol. 10(3), pages 1-13, February.
    4. Min, Chao & Bu, Yi & Sun, Jianjun, 2021. "Predicting scientific breakthroughs based on knowledge structure variations," Technological Forecasting and Social Change, Elsevier, vol. 164(C).
    5. Hou, Jianhua & Wang, Dongyi & Li, Jing, 2022. "A new method for measuring the originality of academic articles based on knowledge units in semantic networks," Journal of Informetrics, Elsevier, vol. 16(3).
    6. Wenbing Luo & Ziyan Tian & Shihu Zhong & Qinke Lyu & Mingjun Deng, 2022. "Global Evolution of Research on Sustainable Finance from 2000 to 2021: A Bibliometric Analysis on WoS Database," Sustainability, MDPI, vol. 14(15), pages 1-23, August.
    7. Liang Zhou & Lin Zhang & Ying Zhao & Ruoshu Zheng & Kaiwen Song, 2021. "A scientometric review of blockchain research," Information Systems and e-Business Management, Springer, vol. 19(3), pages 757-787, September.
    8. Qing Ping & Chaomei Chen, 2018. "LitStoryTeller+: an interactive system for multi-level scientific paper visual storytelling with a supportive text mining toolbox," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(3), pages 1887-1944, September.
    9. Chaomei Chen & Zhigang Hu & Jared Milbank & Timothy Schultz, 2013. "A visual analytic study of retracted articles in scientific literature," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 64(2), pages 234-253, February.
    10. Payam Hanafizadeh & Seyedali Marjaie, 2020. "Trends and turning points of banking: a timespan view," Review of Managerial Science, Springer, vol. 14(6), pages 1183-1219, December.
    11. Lijun Li, 2023. "Big data visualisation in regional comprehensive economic partnership: a systematic review," Palgrave Communications, Palgrave Macmillan, vol. 10(1), pages 1-10, December.
    12. Xue Wang & Xuemei Yang & Jian Du & Xuwen Wang & Jiao Li & Xiaoli Tang, 2021. "A deep learning approach for identifying biomedical breakthrough discoveries using context analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5531-5549, July.
    13. Carlos Olmeda-Gómez & Carlos Romá-Mateo & Maria-Antonia Ovalle-Perandones, 2019. "Overview of trends in global epigenetic research (2009–2017)," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(3), pages 1545-1574, June.
    14. Gisleine Carmo & Luiz Flávio Felizardo & Valderí Castro Alcântara & Cristiane Aparecida Silva & José Willer Prado, 2023. "The impact of Jürgen Habermas’s scientific production: a scientometric review," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(3), pages 1853-1875, March.
    15. Pin Li & Guoli Yang & Chuanqi Wang, 2019. "Visual topical analysis of library and information science," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(3), pages 1753-1791, December.
    16. Juan Tang & Yudi Fang & Ziyan Tian & Yinghua Gong & Liang Yuan, 2022. "Ecosystem Services Research in Green Sustainable Science and Technology Field: Trends, Issues, and Future Directions," Sustainability, MDPI, vol. 15(1), pages 1-22, December.
    17. Ajiferuke, Isola & Famoye, Felix, 2015. "Modelling count response variables in informetric studies: Comparison among count, linear, and lognormal regression models," Journal of Informetrics, Elsevier, vol. 9(3), pages 499-513.
    18. Yi-Ming Wei & Jin-Wei Wang & Tianqi Chen & Bi-Ying Yu & Hua Liao, 2018. "Frontiers of Low-Carbon Technologies: Results from Bibliographic Coupling with Sliding Window," CEEP-BIT Working Papers 116, Center for Energy and Environmental Policy Research (CEEP), Beijing Institute of Technology.
    19. Xuefeng Wang & Shuo Zhang & Yuqin Liu, 2022. "ITGInsight–discovering and visualizing research fronts in the scientific literature," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6509-6531, November.
    20. Ludo Waltman & Nees Jan van Eck, 2012. "A new methodology for constructing a publication-level classification system of science," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 63(12), pages 2378-2392, December.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.