Printed from https://ideas.repec.org/a/eee/infome/v15y2021i3s1751157721000134.html

A sensitivity analysis of factors influential to the popularity of shared data in data repositories

Author

Listed:
  • Xie, Qing
  • Wang, Jiamin
  • Kim, Giyeong
  • Lee, Soobin
  • Song, Min

Abstract

As data repositories have developed rapidly, they have come to provide abundant metadata (including data types, keywords, downloads, stars, forks, and citations) along with the data content. These rich metadata are valuable resources for studying the factors that facilitate data sharing. However, few previous studies have examined which metadata are correlated with the popularity of data. This study addresses this gap by extracting the major factors for each dataset from a well-known data repository, the UCI Machine Learning Repository, and a popular open-source software repository, GitHub. We trained a neural network model and measured the influence of these features on quantified popularity metrics using the weight product of connecting neurons. We grouped the UCI factors into two categories (intrinsic and extrinsic) and the GitHub factors into three categories (intrinsic, extrinsic, and web-related) to analyze their influence on popularity at each level. The quantified influence was used to predict the popularity of the data or software. For the UCI repository, we also conducted a statistical analysis of the relationship between these factors and popularity across five domains (life sciences, physical sciences, computer science/engineering, social sciences, and others). This study’s findings contribute to understanding the factors that affect the popularity of open datasets and software, and can provide guidance on data sharing, reuse, and organization.
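
The abstract does not give the exact formula behind "the weight product of connecting neurons". As a minimal sketch, assuming a single-hidden-layer network and the common connection-weights approach (for each input feature, sum over hidden units the product of the input-to-hidden weight and the hidden-to-output weight), the idea could look like the following. The function name, the feature names, and the weight values are all hypothetical and are not taken from the paper.

```python
import numpy as np


def connection_weight_importance(w_in_hidden, w_hidden_out):
    """Score each input feature by summing, over hidden units, the product
    of its input->hidden weight and that unit's hidden->output weight.

    w_in_hidden:  array of shape (n_features, n_hidden)
    w_hidden_out: array of shape (n_hidden,)
    Returns an array of shape (n_features,) with signed scores.
    """
    w_in_hidden = np.asarray(w_in_hidden, dtype=float)
    w_hidden_out = np.asarray(w_hidden_out, dtype=float)
    return w_in_hidden @ w_hidden_out


# Hypothetical feature names and trained weights, for illustration only.
feature_names = ["n_keywords", "n_instances", "has_citation"]
W1 = np.array([
    [0.5, -1.0],   # n_keywords   -> hidden units 0, 1
    [2.0,  0.0],   # n_instances  -> hidden units 0, 1
    [-0.5, 0.5],   # has_citation -> hidden units 0, 1
])
w2 = np.array([1.0, -2.0])  # hidden units 0, 1 -> popularity output

scores = connection_weight_importance(W1, w2)

# Rank features by the magnitude of their score (largest influence first).
ranking = np.argsort(-np.abs(scores))
for idx in ranking:
    print(f"{feature_names[idx]}: {scores[idx]:+.2f}")
```

The sign of each score indicates the direction of the feature's estimated influence on the output and the magnitude its strength. Scores of this kind are only comparable across features when the inputs were scaled consistently before training.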

Suggested Citation

  • Xie, Qing & Wang, Jiamin & Kim, Giyeong & Lee, Soobin & Song, Min, 2021. "A sensitivity analysis of factors influential to the popularity of shared data in data repositories," Journal of Informetrics, Elsevier, vol. 15(3).
  • Handle: RePEc:eee:infome:v:15:y:2021:i:3:s1751157721000134
    DOI: 10.1016/j.joi.2021.101142

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1751157721000134
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.joi.2021.101142?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Citations



    Cited by:

    1. Heo, Go Eun & Ko, Young Soo & Xie, Qing & Song, Min, 2023. "High acknowledgement index: Characterizing research supporters with factors of acknowledgement affecting paper citation counts," Journal of Informetrics, Elsevier, vol. 17(4).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Iman Tahamtan & Askar Safipour Afshar & Khadijeh Ahamdzadeh, 2016. "Factors affecting number of citations: a comprehensive review of the literature," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(3), pages 1195-1225, June.
    2. Mingyang Wang & Zhenyu Wang & Guangsheng Chen, 2019. "Which can better predict the future success of articles? Bibliometric indices or alternative metrics," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(3), pages 1575-1595, June.
    3. Wanjun Xia & Tianrui Li & Chongshou Li, 2023. "A review of scientific impact prediction: tasks, features and methods," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(1), pages 543-585, January.
    4. Ruonan Cai & Wencan Tian & Rundong Luo & Zhichao Fang & Zhigang Hu, 2025. "Do articles with multiple corresponding authorships have a citation advantage? A double machine learning analysis approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 130(5), pages 2523-2550, May.
    5. Jianhua Hou & Bili Zheng & Yang Zhang & Chaomei Chen, 2021. "How do Price medalists’ scholarly impact change before and after their awards?," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5945-5981, July.
    6. Zhang, Xinyuan & Xie, Qing & Song, Min, 2021. "Measuring the impact of novelty, bibliometric, and academic-network factors on citation count using a neural network," Journal of Informetrics, Elsevier, vol. 15(2).
    7. Bornmann, Lutz, 2014. "Do altmetrics point to the broader impact of research? An overview of benefits and disadvantages of altmetrics," Journal of Informetrics, Elsevier, vol. 8(4), pages 895-903.
    8. Marco Gatti, 2018. "The Impact of Management Accounting Research: An Analysis of the Past and a Look at the Future," International Journal of Business and Management, Canadian Center of Science and Education, vol. 13(5), pages 1-47, March.
    9. Rafał Zbonikowski, 2025. "What influences the number of citations of scientific articles? Study on colloid and interface science," Scientometrics, Springer;Akadémiai Kiadó, vol. 130(5), pages 2577-2593, May.
    10. Barbara McGillivray & Paola Marongiu & Nilo Pedrazzini & Marton Ribary & Mandy Wigdorowitz & Eleonora Zordan, 2022. "Deep Impact: A Study on the Impact of Data Papers and Datasets in the Humanities and Social Sciences," Publications, MDPI, vol. 10(4), pages 1-40, October.
    11. Mingyang Wang & Shi Li & Guangsheng Chen, 2017. "Detecting latent referential articles based on their vitality performance in the latest 2 years," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(3), pages 1557-1571, September.
    12. Hou, Jianhua & Wang, Dongyi & Li, Jing, 2022. "A new method for measuring the originality of academic articles based on knowledge units in semantic networks," Journal of Informetrics, Elsevier, vol. 16(3).
    13. Thelwall, Mike & Wilson, Paul, 2014. "Regression for citation data: An evaluation of different methods," Journal of Informetrics, Elsevier, vol. 8(4), pages 963-971.
    14. Liwei Zhang & Liang Ma, 2023. "Is open science a double-edged sword?: data sharing and the changing citation pattern of Chinese economics articles," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(5), pages 2803-2818, May.
    15. Daniela Filippo & Pablo Sastrón-Toledo, 2023. "Influence of research on open science in the public policy sphere," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(3), pages 1995-2017, March.
    16. Bornmann, Lutz, 2014. "Validity of altmetrics data for measuring societal impact: A study using data from Altmetric and F1000Prime," Journal of Informetrics, Elsevier, vol. 8(4), pages 935-950.
    17. Ajiferuke, Isola & Famoye, Felix, 2015. "Modelling count response variables in informetric studies: Comparison among count, linear, and lognormal regression models," Journal of Informetrics, Elsevier, vol. 9(3), pages 499-513.
    18. Copiello, Sergio, 2019. "Peer and neighborhood effects: Citation analysis using a spatial autoregressive model and pseudo-spatial data," Journal of Informetrics, Elsevier, vol. 13(1), pages 238-254.
    19. Ying Guo & Xiantao Xiao, 2022. "Author-level altmetrics for the evaluation of Chinese scholars," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(2), pages 973-990, February.
    20. Zhichao Fang & Rodrigo Costas & Wencan Tian & Xianwen Wang & Paul Wouters, 2020. "An extensive analysis of the presence of altmetric data for Web of Science publications across subject fields and research topics," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(3), pages 2519-2549, September.
