IDEAS home Printed from https://ideas.repec.org/p/arx/papers/2402.05024.html
   My bibliography  Save this paper

Does the Use of Unusual Combinations of Datasets Contribute to Greater Scientific Impact?

Author

Listed:
  • Yulin Yu
  • Daniel M. Romero

Abstract

Scientific datasets play a crucial role in contemporary data-driven research, as they allow for the progress of science by facilitating the discovery of new patterns and phenomena. This mounting demand for empirical research raises important questions on how strategic data utilization in research projects can stimulate scientific advancement. In this study, we examine the hypothesis inspired by the recombination theory, which suggests that innovative combinations of existing knowledge, including the use of unusual combinations of datasets, can lead to high-impact discoveries. We investigate the scientific outcomes of such atypical data combinations in more than 30,000 publications that leverage over 6,000 datasets curated within one of the largest social science databases, ICPSR. This study offers four important insights. First, combining datasets, particularly those infrequently paired, significantly contributes to both scientific and broader impacts (e.g., dissemination to the general public). Second, the combination of datasets with atypically combined topics has the opposite effect -- the use of such data is associated with fewer citations. Third, younger and less experienced research teams tend to use atypical combinations of datasets in research at a higher frequency than their older and more experienced counterparts. Lastly, despite the benefits of data combination, papers that amalgamate data remain infrequent. This finding suggests that the unconventional combination of datasets is an under-utilized but powerful strategy correlated with the scientific and broader impact of scientific discoveries.

Suggested Citation

  • Yulin Yu & Daniel M. Romero, 2024. "Does the Use of Unusual Combinations of Datasets Contribute to Greater Scientific Impact?," Papers 2402.05024, arXiv.org, revised Feb 2024.
  • Handle: RePEc:arx:papers:2402.05024
    as

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2402.05024
    File Function: Latest version
    Download Restriction: no
    ---><---

    References listed on IDEAS

    as
    1. Benjamin F. Jones, 2009. "The Burden of Knowledge and the "Death of the Renaissance Man": Is Innovation Getting Harder?," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 76(1), pages 283-317.
    2. Lingfei Wu & Dashun Wang & James A. Evans, 2019. "Large teams develop and small teams disrupt science and technology," Nature, Nature, vol. 566(7744), pages 378-382, February.
    3. Wang, Jian & Veugelers, Reinhilde & Stephan, Paula, 2017. "Bias against novelty in science: A cautionary tale for users of bibliometric indicators," Research Policy, Elsevier, vol. 46(8), pages 1416-1436.
    4. Sotaro Shibayama & Deyun Yin & Kuniko Matsumoto, 2021. "Measuring novelty in science with word embedding," PLOS ONE, Public Library of Science, vol. 16(7), pages 1-16, July.
    5. Erin Leahey & Jina Lee & Russell J. Funk, 2023. "What Types of Novelty Are Most Disruptive?," American Sociological Review, , vol. 88(3), pages 562-597, June.
    6. Jelte M Wicherts & Marjan Bakker & Dylan Molenaar, 2011. "Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results," PLOS ONE, Public Library of Science, vol. 6(11), pages 1-7, November.
    7. Bornmann, Lutz & Tekles, Alexander & Zhang, Helena H. & Ye, Fred Y., 2019. "Do we measure novelty when we analyze unusual combinations of cited references? A validation study of bibliometric novelty indicators based on F1000Prime data," Journal of Informetrics, Elsevier, vol. 13(4).
    8. Benedikt Fecher & Sascha Friesike & Marcel Hebing, 2015. "What Drives Academic Data Sharing?," PLOS ONE, Public Library of Science, vol. 10(2), pages 1-25, February.
    9. Andreoli-Versbach, Patrick & Mueller-Langer, Frank, 2014. "Open access to data: An ideal professed but not practised," Research Policy, Elsevier, vol. 43(9), pages 1621-1633.
    10. Alan L. Porter & Ismael Rafols, 2009. "Is science becoming more interdisciplinary? Measuring and mapping six research fields over time," Scientometrics, Springer;Akadémiai Kiadó, vol. 81(3), pages 719-745, December.
    11. Lee, You-Na & Walsh, John P. & Wang, Jian, 2015. "Creativity in scientific teams: Unpacking novelty and impact," Research Policy, Elsevier, vol. 44(3), pages 684-697.
    12. Virginia Gewin, 2016. "Data sharing: An open mind on open data," Nature, Nature, vol. 529(7584), pages 117-119, January.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dongqing Lyu & Kaile Gong & Xuanmin Ruan & Ying Cheng & Jiang Li, 2021. "Does research collaboration influence the “disruption” of articles? Evidence from neurosciences," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(1), pages 287-303, January.
    2. Haeussler, Carolin & Sauermann, Henry, 2020. "Division of labor in collaborative knowledge production: The role of team size and interdisciplinarity," Research Policy, Elsevier, vol. 49(6).
    3. Andrea Bonaccorsi & Nicola Melluso & Francesco Alessandro Massucci, 2022. "Exploring the antecedents of interdisciplinarity at the European Research Council: a topic modeling approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(12), pages 6961-6991, December.
    4. Libo Sheng & Dongqing Lyu & Xuanmin Ruan & Hongquan Shen & Ying Cheng, 2023. "The association between prior knowledge and the disruption of an article," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(8), pages 4731-4751, August.
    5. KWON Seokbeom & MOTOHASHI Kazuyuki, 2020. "Incentive or Disincentive for Disclosure of Research Data? A Large-Scale Empirical Analysis and Implications for Open Science Policy," Discussion papers 20058, Research Institute of Economy, Trade and Industry (RIETI).
    6. Fontana, Magda & Iori, Martina & Montobbio, Fabio & Sinatra, Roberta, 2020. "New and atypical combinations: An assessment of novelty and interdisciplinarity," Research Policy, Elsevier, vol. 49(7).
    7. Guoqiang Liang & Ying Lou & Haiyan Hou, 2022. "Revisiting the disruptive index: evidence from the Nobel Prize-winning articles," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(10), pages 5721-5730, October.
    8. Stefano Bianchini & Moritz Müller & Pierre Pelletier, 2022. "Artificial intelligence in science: An emerging general method of invention," Post-Print hal-03958025, HAL.
    9. Banal-Estañol, Albert & Macho-Stadler, Inés & Pérez-Castrillo, David, 2019. "Evaluation in research funding agencies: Are structurally diverse teams biased against?," Research Policy, Elsevier, vol. 48(7), pages 1823-1840.
    10. Sam Arts & Nicola Melluso & Reinhilde Veugelers, 2023. "Beyond Citations: Measuring Novel Scientific Ideas and their Impact in Publication Text," Papers 2309.16437, arXiv.org, revised Nov 2023.
    11. Zhentao Liang & Jin Mao & Gang Li, 2023. "Bias against scientific novelty: A prepublication perspective," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(1), pages 99-114, January.
    12. Gold, E. Richard, 2021. "The fall of the innovation empire and its possible rise through open science," Research Policy, Elsevier, vol. 50(5).
    13. Zhu, Nibing & Liu, Chang & Yang, Zhilin, 2021. "Team Size, Research Variety, and Research Performance: Do Coauthors’ Coauthors Matter?," Journal of Informetrics, Elsevier, vol. 15(4).
    14. Nicolas Carayol, 2016. "The Right Job and the Job Right: Novelty, Impact and Journal Stratification in Science," Post-Print hal-02274661, HAL.
    15. Fontana, Magda & Iori, Martina & Leone Sciabolazza, Valerio & Souza, Daniel, 2022. "The interdisciplinarity dilemma: Public versus private interests," Research Policy, Elsevier, vol. 51(7).
    16. Shiyun Wang & Yaxue Ma & Jin Mao & Yun Bai & Zhentao Liang & Gang Li, 2023. "Quantifying scientific breakthroughs by a novel disruption indicator based on knowledge entities," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(2), pages 150-167, February.
    17. Pierre Pelletier & Kevin Wirtz, 2023. "Sails and Anchors: The Complementarity of Exploratory and Exploitative Scientists in Knowledge Creation," Papers 2312.10476, arXiv.org.
    18. Liu, Meijun & Jaiswal, Ajay & Bu, Yi & Min, Chao & Yang, Sijie & Liu, Zhibo & Acuña, Daniel & Ding, Ying, 2022. "Team formation and team impact: The balance between team freshness and repeat collaboration," Journal of Informetrics, Elsevier, vol. 16(4).
    19. Christian Catalini & Christian Fons-Rosen & Patrick Gaulé, 2020. "How Do Travel Costs Shape Collaboration?," Management Science, INFORMS, vol. 66(8), pages 3340-3360, August.
    20. Hou, Jianhua & Wang, Dongyi & Li, Jing, 2022. "A new method for measuring the originality of academic articles based on knowledge units in semantic networks," Journal of Informetrics, Elsevier, vol. 16(3).

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2402.05024. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: arXiv administrators (email available below). General contact details of provider: http://arxiv.org/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.