Printed from https://ideas.repec.org/a/eee/infome/v15y2021i3s1751157721000134.html

A sensitivity analysis of factors influential to the popularity of shared data in data repositories

Author

Listed:
  • Xie, Qing
  • Wang, Jiamin
  • Kim, Giyeong
  • Lee, Soobin
  • Song, Min

Abstract

With their rapid development, data repositories usually provide abundant metadata (including data types, keywords, downloads, stars, forks, and citations) along with the data content. These rich metadata are a valuable resource for studying the factors that facilitate data sharing. However, few previous studies have examined which metadata are correlated with the popularity of data. This study addresses this gap by extracting the major factors for each dataset from a well-known data repository, the UCI Machine Learning Repository, and a popular open-source software repository, GitHub. We trained a neural network model and measured the influence of these features on quantified popularity metrics using the weight product of connecting neurons. We grouped the UCI factors into two categories (intrinsic and extrinsic) and the GitHub factors into three categories (intrinsic, extrinsic, and web-related) to analyze their influence on popularity at each level. The quantified influence was used to predict the popularity of the data or software. For the UCI repository, we also conducted a statistical analysis of the relationship between these factors and popularity across five domains (life sciences, physical sciences, computer science/engineering, social sciences, and others). The findings contribute to understanding the factors that affect the popularity of open datasets and software, and provide guidance on data sharing, reuse, and organization.
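The abstract describes measuring feature influence through the "weight product of connecting neurons" but gives no formula. As a rough illustration only, the sketch below follows one common reading, often called the connection-weight approach: for a network with a single hidden layer, each input's influence is the sum over hidden units of (input-to-hidden weight) times (hidden-to-output weight). The single hidden layer, the function names, the feature names, and the weight values are all assumptions made for this example; they are not taken from the paper.

```python
def connection_weight_influence(w_input_hidden, w_hidden_output):
    """Signed influence of each input on the output.

    w_input_hidden: one row per input feature, one column per hidden unit.
    w_hidden_output: one weight per hidden unit (single output neuron).
    Assumes a single hidden layer; this is an illustrative reading,
    not the paper's confirmed procedure.
    """
    influences = []
    for row in w_input_hidden:
        if len(row) != len(w_hidden_output):
            raise ValueError("each input row needs one weight per hidden unit")
        # Multiply the weights along every input -> hidden -> output path, then sum.
        influences.append(sum(w_ih * w_ho for w_ih, w_ho in zip(row, w_hidden_output)))
    return influences


def relative_influence(influences):
    """Share of total absolute influence per feature (sums to 1 unless all are zero)."""
    total = sum(abs(v) for v in influences)
    if total == 0:
        return [0.0 for _ in influences]
    return [abs(v) / total for v in influences]


# Hypothetical feature names and made-up weights, for illustration only.
feature_names = ["num_instances", "num_attributes", "has_citation"]
w_input_hidden = [
    [0.5, -1.0],  # num_instances -> hidden units 1 and 2
    [2.0, 1.0],   # num_attributes -> hidden units 1 and 2
    [0.0, 0.5],   # has_citation -> hidden units 1 and 2
]
w_hidden_output = [1.0, 2.0]  # hidden units 1 and 2 -> popularity output

influence = connection_weight_influence(w_input_hidden, w_hidden_output)
shares = relative_influence(influence)
for name, value, share in zip(feature_names, influence, shares):
    print(f"{name}: influence={value:+.2f}, share={share:.2f}")
```

The sign of each value indicates the direction of the association in the trained model, while the normalized shares make features comparable within a category. With a trained model, the two weight structures would come from the fitted network rather than being written by hand.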

Suggested Citation

  • Xie, Qing & Wang, Jiamin & Kim, Giyeong & Lee, Soobin & Song, Min, 2021. "A sensitivity analysis of factors influential to the popularity of shared data in data repositories," Journal of Informetrics, Elsevier, vol. 15(3).
  • Handle: RePEc:eee:infome:v:15:y:2021:i:3:s1751157721000134
    DOI: 10.1016/j.joi.2021.101142

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1751157721000134
    Download Restriction: Full text for ScienceDirect subscribers only



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Iman Tahamtan & Askar Safipour Afshar & Khadijeh Ahamdzadeh, 2016. "Factors affecting number of citations: a comprehensive review of the literature," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(3), pages 1195-1225, June.
    2. Mingyang Wang & Zhenyu Wang & Guangsheng Chen, 2019. "Which can better predict the future success of articles? Bibliometric indices or alternative metrics," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(3), pages 1575-1595, June.
    3. Wanjun Xia & Tianrui Li & Chongshou Li, 2023. "A review of scientific impact prediction: tasks, features and methods," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(1), pages 543-585, January.
    4. Bornmann, Lutz, 2014. "Do altmetrics point to the broader impact of research? An overview of benefits and disadvantages of altmetrics," Journal of Informetrics, Elsevier, vol. 8(4), pages 895-903.
    5. Marco Gatti, 2018. "The Impact of Management Accounting Research: An Analysis of the Past and a Look at the Future," International Journal of Business and Management, Canadian Center of Science and Education, vol. 13(5), pages 1-47, March.
    6. Barbara McGillivray & Paola Marongiu & Nilo Pedrazzini & Marton Ribary & Mandy Wigdorowitz & Eleonora Zordan, 2022. "Deep Impact: A Study on the Impact of Data Papers and Datasets in the Humanities and Social Sciences," Publications, MDPI, vol. 10(4), pages 1-40, October.
    7. Hou, Jianhua & Wang, Dongyi & Li, Jing, 2022. "A new method for measuring the originality of academic articles based on knowledge units in semantic networks," Journal of Informetrics, Elsevier, vol. 16(3).
    8. Thelwall, Mike & Wilson, Paul, 2014. "Regression for citation data: An evaluation of different methods," Journal of Informetrics, Elsevier, vol. 8(4), pages 963-971.
    9. Jianhua Hou & Bili Zheng & Yang Zhang & Chaomei Chen, 2021. "How do Price medalists’ scholarly impact change before and after their awards?," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5945-5981, July.
    10. Zhang, Xinyuan & Xie, Qing & Song, Min, 2021. "Measuring the impact of novelty, bibliometric, and academic-network factors on citation count using a neural network," Journal of Informetrics, Elsevier, vol. 15(2).
    11. Mingyang Wang & Shi Li & Guangsheng Chen, 2017. "Detecting latent referential articles based on their vitality performance in the latest 2 years," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(3), pages 1557-1571, September.
    12. Ajiferuke, Isola & Famoye, Felix, 2015. "Modelling count response variables in informetric studies: Comparison among count, linear, and lognormal regression models," Journal of Informetrics, Elsevier, vol. 9(3), pages 499-513.
    13. Ying Guo & Xiantao Xiao, 2022. "Author-level altmetrics for the evaluation of Chinese scholars," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(2), pages 973-990, February.
    14. Jianhua Hou & Xiucai Yang & Yang Zhang, 2023. "The effect of social media knowledge cascade: an analysis of scientific papers diffusion," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(9), pages 5169-5195, September.
    15. Isidro F. Aguillo, 2020. "Altmetrics of the Open Access Institutional Repositories: a webometrics approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(3), pages 1181-1192, June.
    16. Lanu Kim & Jason H. Portenoy & Jevin D. West & Katherine W. Stovel, 2020. "Scientific journals still matter in the era of academic search engines and preprint archives," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 71(10), pages 1218-1226, October.
    17. Chieh Liu & Mu-Hsuan Huang, 2022. "Exploring the relationships between altmetric counts and citations of papers in different academic fields based on co-occurrence analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(8), pages 4939-4958, August.
    18. Daniel Torres-Salinas & Nicolás Robinson-Garcia & Juan Gorraiz, 2017. "Filling the citation gap: measuring the multidimensional impact of the academic book at institutional level with PlumX," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(3), pages 1371-1384, December.
    19. Shahzad, Murtuza & Alhoori, Hamed & Freedman, Reva & Rahman, Shaikh Abdul, 2022. "Quantifying the online long-term interest in research," Journal of Informetrics, Elsevier, vol. 16(2).
    20. Wesley Mendes-Da-Silva, 2018. "The Promotion of Transparency and the Impact of Research on Business," RAC - Revista de Administração Contemporânea (Journal of Contemporary Administration), ANPAD - Associação Nacional de Pós-Graduação e Pesquisa em Administração, vol. 22(4), pages 639-649.
