Clickstream Data Yields High-Resolution Maps of Science

Authors
  • Johan Bollen
  • Herbert Van de Sompel
  • Aric Hagberg
  • Luis Bettencourt
  • Ryan Chute
  • Marko A Rodriguez
  • Lyudmila Balakireva

Abstract

Background: Intricate maps of science have been created from citation data to visualize the structure of scientific activity. However, most scientific publications are now accessed online. Scholarly web portals record detailed log data at a scale that exceeds the number of all existing citations combined. Such log data is recorded immediately upon publication and keeps track of the sequences of user requests (clickstreams) that are issued by a variety of users across many different domains. Given these advantages of log datasets over citation data, we investigate whether they can produce high-resolution, more current maps of science.

Methodology: Over the course of 2007 and 2008, we collected nearly 1 billion user interactions recorded by the scholarly web portals of some of the most significant publishers, aggregators and institutional consortia. The resulting reference data set covers a significant part of worldwide use of scholarly web portals in 2006 and provides balanced coverage of the humanities, social sciences, and natural sciences. A journal clickstream model, i.e. a first-order Markov chain, was extracted from the sequences of user interactions in the logs. The clickstream model was validated by comparing it to the Getty Research Institute's Art & Architecture Thesaurus. The resulting model was visualized as a journal network that outlines the relationships between various scientific domains and clarifies the connection of the social sciences and humanities to the natural sciences.

Conclusions: Maps of science resulting from large-scale clickstream data provide a detailed, contemporary view of scientific activity and correct the underrepresentation of the social sciences and humanities that is commonly found in citation data.
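The abstract describes the journal clickstream model as a first-order Markov chain extracted from sequences of user requests. The Python sketch below shows one plausible way to estimate such a model: consecutive journal requests within a session are counted as transitions, and each journal's outgoing counts are normalized into probabilities. It is not the authors' pipeline. The session input format, the function name `journal_transition_probabilities`, and the choice to drop repeated requests within the same journal are assumptions made for illustration.

```python
from collections import defaultdict


def journal_transition_probabilities(sessions, drop_self_loops=True):
    """Estimate first-order Markov transition probabilities P(next | current).

    `sessions` is a hypothetical input format: an iterable of sessions, each a
    list of journal identifiers in the order the user requested them.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        # Each consecutive pair of requests (src -> dst) is one observed transition.
        for src, dst in zip(session, session[1:]):
            # Assumption: repeated requests within the same journal are ignored.
            if drop_self_loops and src == dst:
                continue
            counts[src][dst] += 1

    # Normalize each row so the outgoing probabilities of a journal sum to 1.
    probabilities = {}
    for src, row in counts.items():
        total = sum(row.values())
        probabilities[src] = {dst: n / total for dst, n in row.items()}
    return probabilities


if __name__ == "__main__":
    # Toy, made-up sessions over hypothetical journal IDs.
    sessions = [["J1", "J2", "J3"], ["J1", "J2"], ["J1", "J3"], ["J2", "J2", "J3"]]
    model = journal_transition_probabilities(sessions)
    for src in sorted(model):
        print(src, model[src])
```

On the toy sessions, J1 moves to J2 with probability 2/3 and to J3 with probability 1/3, J2 always moves to J3, and J3 gets no row because it is never followed by another journal. A journal network like the one described in the abstract could then be drawn by treating these probabilities as weighted directed edges.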

Suggested Citation

  • Johan Bollen & Herbert Van de Sompel & Aric Hagberg & Luis Bettencourt & Ryan Chute & Marko A Rodriguez & Lyudmila Balakireva, 2009. "Clickstream Data Yields High-Resolution Maps of Science," PLOS ONE, Public Library of Science, vol. 4(3), pages 1-11, March.
  • Handle: RePEc:plo:pone00:0004803
  • DOI: 10.1371/journal.pone.0004803

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0004803
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0004803&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0004803?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey redirects you to a copy you can access through your library subscription.

    References listed on IDEAS

    1. Leo Egghe & Ronald Rousseau, 2000. "The influence of publication delays on the observed aging distribution of scientific literature," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 51(2), pages 158-165.
    2. Kevin W. Boyack & Brian N. Wylie & George S. Davidson, 2002. "Domain visualization using VxInsight® for science and technology management," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 53(9), pages 764-774.
    3. Tim Brody & Stevan Harnad & Leslie Carr, 2006. "Earlier Web usage statistics as predictors of later citation impact," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 57(8), pages 1060-1072, June.
    4. Luís M. A. Bettencourt & David I. Kaiser & Jasleen Kaur & Carlos Castillo-Chávez & David E. Wojick, 2008. "Population modeling of the emergence and development of scientific fields," Scientometrics, Springer;Akadémiai Kiadó, vol. 75(3), pages 495-518, June.
    5. Johan Bollen & Herbert van de Sompel, 2006. "Mapping the structure of science through usage," Scientometrics, Springer;Akadémiai Kiadó, vol. 69(2), pages 227-258, November.
    6. Kevin W. Boyack & Richard Klavans & Katy Börner, 2005. "Mapping the backbone of science," Scientometrics, Springer;Akadémiai Kiadó, vol. 64(3), pages 351-374, August.
    7. Philip M. Davis & Jason S. Price, 2006. "eJournal interface can influence usage statistics: Implications for libraries, publishers, and Project COUNTER," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 57(9), pages 1243-1248, July.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Johan Bollen & Herbert Van de Sompel & Aric Hagberg & Ryan Chute, 2009. "A Principal Component Analysis of 39 Scientific Impact Measures," PLOS ONE, Public Library of Science, vol. 4(6), pages 1-11, June.
    2. Nees Jan van Eck & Ludo Waltman, 2010. "Software survey: VOSviewer, a computer program for bibliometric mapping," Scientometrics, Springer;Akadémiai Kiadó, vol. 84(2), pages 523-538, August.
    3. Bollen, Johan & Fox, Geoffrey & Singhal, Prashant Raj, 2011. "How and where the TeraGrid supercomputing infrastructure benefits science," Journal of Informetrics, Elsevier, vol. 5(1), pages 114-121.
    4. Andrea Bonaccorsi & Filippo Chiarello & Gualtiero Fantoni, 2021. "Impact for whom? Mapping the users of public research with lexicon-based text mining," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1745-1774, February.
    5. John Hudson, 2017. "Identifying economics’ place amongst academic disciplines: a science or a social science?," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(2), pages 735-750, November.
    6. Xiaolin Shi & Lada A Adamic & Belle L Tseng & Gavin S Clarkson, 2009. "The Impact of Boundary Spanning Scholarly Publications and Patents," PLOS ONE, Public Library of Science, vol. 4(8), pages 1-7, August.
    7. Dietmar Wolfram, 2015. "The symbiotic relationship between information retrieval and informetrics," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2201-2214, March.
    8. Boyack, Kevin W. & Klavans, Richard, 2014. "Including cited non-source items in a large-scale map of science: What difference does it make?," Journal of Informetrics, Elsevier, vol. 8(3), pages 569-580.
    9. R. Basurto-Flores & L. Guzmán-Vargas & S. Velasco & A. Medina & A. Calvo Hernandez, 2018. "On entropy research analysis: cross-disciplinary knowledge transfer," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(1), pages 123-139, October.
    10. Ismael Rafols & Alan Porter & Loet Leydesdorff, 2009. "Overlay Maps of Science: a New Tool for Research Policy," SPRU Working Paper Series 179, SPRU - Science Policy Research Unit, University of Sussex Business School.
    11. Cameron Neylon & Shirley Wu, 2009. "Article-Level Metrics and the Evolution of Scientific Impact," PLOS Biology, Public Library of Science, vol. 7(11), pages 1-6, November.
    12. Paul Donner, 2021. "Validation of the Astro dataset clustering solutions with external data," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1619-1645, February.
    13. Silva, F.N. & Viana, M.P. & Travençolo, B.A.N. & Costa, L. da F., 2011. "Investigating relationships within and between category networks in Wikipedia," Journal of Informetrics, Elsevier, vol. 5(3), pages 431-438.
    14. Kraker, Peter & Schlögl, Christian & Jack, Kris & Lindstaedt, Stefanie, 2015. "Visualization of co-readership patterns from an online reference management system," Journal of Informetrics, Elsevier, vol. 9(1), pages 169-182.
    15. Mingers, John & Leydesdorff, Loet, 2015. "A review of theory and practice in scientometrics," European Journal of Operational Research, Elsevier, vol. 246(1), pages 1-19.
    16. Leydesdorff, Loet & Rafols, Ismael, 2011. "Indicators of the interdisciplinarity of journals: Diversity, centrality, and citations," Journal of Informetrics, Elsevier, vol. 5(1), pages 87-100.
    17. Goldman, Alyssa W., 2014. "Conceptualizing the interdisciplinary diffusion and evolution of emerging fields: The case of systems biology," Journal of Informetrics, Elsevier, vol. 8(1), pages 43-58.
    18. Wenyuan Liu & Andrea Nanetti & Siew Ann Cheong, 2017. "Knowledge evolution in physics research: An analysis of bibliographic coupling networks," PLOS ONE, Public Library of Science, vol. 12(9), pages 1-19, September.
    19. Koon-Kiu Yan & Mark Gerstein, 2011. "The Spread of Scientific Information: Insights from the Web Usage Statistics in PLoS Article-Level Metrics," PLOS ONE, Public Library of Science, vol. 6(5), pages 1-7, May.
    20. Andrew Kirby, 2015. "The Challenges of Journal Startup in the Digital Era," Publications, MDPI, vol. 3(4), pages 1-13, September.
    21. Ana Teresa Santos & Sandro Mendonça, 2022. "Do papers (really) match journals’ “aims and scope”? A computational assessment of innovation studies," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(12), pages 7449-7470, December.
    22. Miguel R. Guevara & Dominik Hartmann & Manuel Aristarán & Marcelo Mendoza & César A. Hidalgo, 2016. "The research space: using career paths to predict the evolution of the research output of individuals, institutions, and nations," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(3), pages 1695-1709, December.
    23. Xin Shuai & Alberto Pepe & Johan Bollen, 2012. "How the Scientific Community Reacts to Newly Submitted Preprints: Article Downloads, Twitter Mentions, and Citations," PLOS ONE, Public Library of Science, vol. 7(11), pages 1-8, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bar-Ilan, Judit, 2008. "Informetrics at the beginning of the 21st century—A review," Journal of Informetrics, Elsevier, vol. 2(1), pages 1-52.
    2. Kraker, Peter & Schlögl, Christian & Jack, Kris & Lindstaedt, Stefanie, 2015. "Visualization of co-readership patterns from an online reference management system," Journal of Informetrics, Elsevier, vol. 9(1), pages 169-182.
    3. John McLevey & Alexander V. Graham & Reid McIlroy-Young & Pierson Browne & Kathryn S. Plaisance, 2018. "Interdisciplinarity and insularity in the diffusion of knowledge: an analysis of disciplinary boundaries between philosophy of science and the sciences," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(1), pages 331-349, October.
    4. Jimi Adams & Ryan Light, 2014. "Mapping Interdisciplinary Fields: Efficiencies, Gaps and Redundancies in HIV/AIDS Research," PLOS ONE, Public Library of Science, vol. 9(12), pages 1-13, December.
    5. Xianwen Wang & Zhichao Fang & Xiaoling Sun, 2016. "Usage patterns of scholarly articles on Web of Science: a study on Web of Science usage count," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(2), pages 917-926, November.
    6. Bettencourt, Luís M.A. & Kaiser, David I. & Kaur, Jasleen, 2009. "Scientific discovery and topological transitions in collaboration networks," Journal of Informetrics, Elsevier, vol. 3(3), pages 210-221.
    7. Xianwen Wang & Wenli Mao & Shenmeng Xu & Chunbo Zhang, 2014. "Usage history of scientific literature: Nature metrics and metrics of Nature publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 98(3), pages 1923-1933, March.
    8. Wood-Doughty, Alex & Bergstrom, Ted & Steigerwald, Douglas, 2017. "Do download reports reliably measure journal usage? Trusting the fox to count your hens?," University of California at Santa Barbara, Economics Working Paper Series qt1f221007, Department of Economics, UC Santa Barbara.
    9. Ryan Light & jimi adams, 2016. "Knowledge in motion: the evolution of HIV/AIDS research," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(3), pages 1227-1248, June.
    10. Bikun Chen, 2018. "Usage pattern comparison of the same scholarly articles between Web of Science (WoS) and Springer," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(1), pages 519-537, April.
    11. Ehsan Mohammadi, 2012. "Knowledge mapping of the Iranian nanoscience and technology: a text mining approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 92(3), pages 593-608, September.
    12. Kiss, Istvan Z. & Broom, Mark & Craze, Paul G. & Rafols, Ismael, 2010. "Can epidemic models describe the diffusion of topics across disciplines?," Journal of Informetrics, Elsevier, vol. 4(1), pages 74-82.
    13. Stanley D. Brunn, 2014. "Cyberspace Knowledge Gaps and Boundaries in Sustainability Science: Topics, Regions, Editorial Teams and Journals," Sustainability, MDPI, vol. 6(10), pages 1-28, September.
    14. Balland, Pierre-Alexandre & Boschma, Ron, 2022. "Do scientific capabilities in specific domains matter for technological diversification in European regions?," Research Policy, Elsevier, vol. 51(10).
    15. Takahiro Kawamura & Katsutaro Watanabe & Naoya Matsumoto & Shusaku Egami & Mari Jibu, 2018. "Funding map using paragraph embedding based on semantic diversity," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 941-958, August.
    16. Andreas Bjurström & Merritt Polk, 2011. "Climate change and interdisciplinarity: a co-citation analysis of IPCC Third Assessment Report," Scientometrics, Springer;Akadémiai Kiadó, vol. 87(3), pages 525-550, June.
    17. Citron, Daniel T. & Way, Samuel F., 2018. "Network assembly of scientific communities of varying size and specificity," Journal of Informetrics, Elsevier, vol. 12(1), pages 181-190.
    18. Xuefeng Wang & Huichao Ren & Yun Chen & Yuqin Liu & Yali Qiao & Ying Huang, 2019. "Measuring patent similarity with SAO semantic analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(1), pages 1-23, October.
    19. Barbara McGillivray & Mathias Astell, 2019. "The relationship between usage and citations in an open access mega-journal," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(2), pages 817-838, November.
    20. Giovanni Abramo & Ciriaco Andrea D'Angelo & Flavia Costa, 2012. "Identifying interdisciplinarity through the disciplinary classification of coauthors of scientific publications," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 63(11), pages 2206-2222, November.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.