IDEAS home Printed from https://ideas.repec.org/a/inm/orisre/v24y2013i3p802-821.html

Status Locality on the Web: Implications for Building Focused Collections

Authors
  • Gautam Pant

    (Department of Management Sciences, The University of Iowa, Iowa City, Iowa 52242)

  • Padmini Srinivasan

    (Department of Computer Science, The University of Iowa, Iowa City, Iowa 52242)

Abstract

Topical locality on the Web is the notion that pages tend to link to other topically similar pages and that such similarity decays rapidly with link distance. This supports meaningful Web browsing and searching by information consumers. It also allows topical Web crawlers, programs that fetch pages by following hyperlinks, to harvest topical subsets of the Web for applications such as those in vertical search and business intelligence. We show that the Web exhibits another property that we call “status locality.” It is based on the notion that pages tend to link to other pages of similar status (importance) and that this status similarity also decays rapidly with link distance. Analogous to topical locality, status locality may also be exploited by Web crawlers. Collections built by such crawlers include pages that are both topically relevant and important. This capability is crucial because of the large number of Web pages addressing even niche topics. The challenge in exploiting status locality while crawling is that page importance (or status) is typically recognized through global measures computed by processing link data from billions of pages. In contrast, topical Web crawlers depend on local information based on previously downloaded pages. We solve this problem by using previously developed methods that utilize local characteristics of pages to estimate their global status. This leads to the design of new crawlers, specifically utility-biased crawlers guided by a Cobb-Douglas utility function. Our crawler experiments show that status and topicality of Web collections present a trade-off. An adaptive version of our utility-biased crawler dynamically modifies the output elasticities of topicality and status to create Web collections that maintain high average topicality. This can be done while simultaneously achieving significantly higher average status than several benchmarks, including a state-of-the-art topical crawler.
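The abstract describes crawlers that rank unvisited URLs by a Cobb-Douglas utility of topicality T and status S, U = T^α · S^β, where α and β are the output elasticities. The sketch below is a hedged illustration of that idea only, not the authors' implementation. It assumes topicality and status estimates already exist as scores in [0, 1] (the paper estimates status from local page characteristics; those estimators are not reproduced here). The class name `UtilityBiasedFrontier`, the `alpha`/`beta` parameters and the `adapt` rule are hypothetical stand-ins for the adaptive behaviour the abstract mentions.

```python
import heapq
from dataclasses import dataclass, field


def cobb_douglas_utility(topicality: float, status: float,
                         alpha: float, beta: float) -> float:
    """Cobb-Douglas utility U = T**alpha * S**beta.

    Assumption: both scores are already scaled to [0, 1]; alpha and beta
    act as the output elasticities of topicality and status.
    """
    return (topicality ** alpha) * (status ** beta)


@dataclass(order=True)
class FrontierEntry:
    neg_utility: float                 # heapq is a min-heap, so store -U
    url: str = field(compare=False)    # URL does not affect ordering


class UtilityBiasedFrontier:
    """Hypothetical crawl frontier that pops the URL with the highest utility."""

    def __init__(self, alpha: float = 0.5, beta: float = 0.5):
        self.alpha = alpha
        self.beta = beta
        self._heap: list[FrontierEntry] = []

    def push(self, url: str, topicality: float, status: float) -> None:
        # Score the URL with the elasticities in force at push time.
        u = cobb_douglas_utility(topicality, status, self.alpha, self.beta)
        heapq.heappush(self._heap, FrontierEntry(-u, url))

    def pop(self) -> str:
        # Return the queued URL with the largest utility.
        return heapq.heappop(self._heap).url

    def adapt(self, recent_topicality: list[float], target: float,
              step: float = 0.05) -> None:
        """Illustrative adaptation rule (an assumption, not the paper's):
        shift weight toward topicality when the recent average falls below
        the target, toward status otherwise, keeping alpha + beta = 1."""
        avg = sum(recent_topicality) / len(recent_topicality)
        if avg < target:
            self.alpha = min(1.0, self.alpha + step)
        else:
            self.alpha = max(0.0, self.alpha - step)
        self.beta = 1.0 - self.alpha


if __name__ == "__main__":
    # Balanced elasticities favour the page that is moderately good on both.
    balanced = UtilityBiasedFrontier(alpha=0.5, beta=0.5)
    balanced.push("http://a.example/", topicality=0.9, status=0.1)
    balanced.push("http://b.example/", topicality=0.6, status=0.6)
    print(balanced.pop())

    # Weighting topicality heavily favours the highly topical page instead.
    topical = UtilityBiasedFrontier(alpha=0.9, beta=0.1)
    topical.push("http://a.example/", topicality=0.9, status=0.1)
    topical.push("http://b.example/", topicality=0.6, status=0.6)
    print(topical.pop())
```

The two runs show the topicality/status trade-off the abstract reports: with α = β = 0.5 the page scoring 0.6 on both measures (U = 0.6) beats the page scoring 0.9/0.1 (U = 0.3), while with α = 0.9, β = 0.1 the order flips. In this sketch `adapt` only affects URLs pushed afterwards; a fuller crawler might re-score the queued frontier after each change in elasticities.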

Suggested Citation

  • Gautam Pant & Padmini Srinivasan, 2013. "Status Locality on the Web: Implications for Building Focused Collections," Information Systems Research, INFORMS, vol. 24(3), pages 802-821, September.
  • Handle: RePEc:inm:orisre:v:24:y:2013:i:3:p:802-821
    DOI: 10.1287/isre.1120.0457

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/isre.1120.0457
    Download Restriction: no

    File URL: https://libkey.io/10.1287/isre.1120.0457?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Réka Albert & Hawoong Jeong & Albert-László Barabási, 1999. "Diameter of the World-Wide Web," Nature, Nature, vol. 401(6749), pages 130-131, September.
    2. Gautam Pant & Padmini Srinivasan, 2010. "Predicting Web Page Status," Information Systems Research, INFORMS, vol. 21(2), pages 345-364, June.
    3. Filippo Menczer, 2004. "Lexical and semantic clustering by Web links," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 55(14), pages 1261-1269, December.

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. Cobeña, Mar & Gallego, Ángeles & Casanueva, Cristóbal, 2019. "Diversity in airline alliance portfolio configuration," Journal of Air Transport Management, Elsevier, vol. 75(C), pages 16-26.
    2. repec:osf:osfxxx:yjbd7_v1 is not listed on IDEAS
    3. Yuanyang Liu & Gautam Pant & Olivia R. L. Sheng, 2020. "Predicting Labor Market Competition: Leveraging Interfirm Network and Employee Skills," Information Systems Research, INFORMS, vol. 31(4), pages 1443-1466, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mohd-Zaid, Fairul & Kabban, Christine M. Schubert & Deckro, Richard F. & White, Edward D., 2017. "Parameter specification for the degree distribution of simulated Barabási–Albert graphs," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 465(C), pages 141-152.
    2. He, He & Yang, Bo & Hu, Xiaoming, 2016. "Exploring community structure in networks by consensus dynamics," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 450(C), pages 342-353.
    3. Zheng, Mingbo & Zhang, Xinyu, 2025. "Digitalization and renewable energy development: Analysis based on cross-country panel data," Energy, Elsevier, vol. 319(C).
    4. Elias Carroni & Paolo Pin & Simone Righi, 2020. "Bring a Friend! Privately or Publicly?," Management Science, INFORMS, vol. 66(5), pages 2269-2290, May.
    5. Duan, Shuyu & Wen, Tao & Jiang, Wen, 2019. "A new information dimension of complex network based on Rényi entropy," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 516(C), pages 529-542.
    6. Baek, Seung Ki & Kim, Tae Young & Kim, Beom Jun, 2008. "Testing a priority-based queue model with Linux command histories," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 387(14), pages 3660-3668.
    7. Freddy Hernán Cepeda López, 2008. "La topología de redes como herramienta de Seguimiento en el sistema de Pagos de Alto Valor en Colombia," Borradores de Economia 4676, Banco de la Republica.
    8. Chung-Yuan Huang & Chuen-Tsai Sun & Hsun-Cheng Lin, 2005. "Influence of Local Information on Social Simulations in Small-World Network Models," Journal of Artificial Societies and Social Simulation, Journal of Artificial Societies and Social Simulation, vol. 8(4), pages 1-8.
    9. Xue Guo & Hu Zhang & Tianhai Tian, 2019. "Multi-Likelihood Methods for Developing Stock Relationship Networks Using Financial Big Data," Papers 1906.08088, arXiv.org.
    10. Chang, Chia-ling & Chen, Shu-heng, 2011. "Interactions in DSGE models: The Boltzmann-Gibbs machine and social networks approach," Economics Discussion Papers 2011-25, Kiel Institute for the World Economy (IfW Kiel).
    11. Lin, Yi & Zhang, Jianwei & Yang, Bo & Liu, Hong & Zhao, Liping, 2019. "An optimal routing strategy for transport networks with minimal transmission cost and high network capacity," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 521(C), pages 551-561.
    12. Stefano Breschi & Lucia Cusmano, 2002. "Unveiling the Texture of a European Research Area: Emergence of Oligarchic Networks under EU Framework Programmes," KITeS Working Papers 130, KITeS, Centre for Knowledge, Internationalization and Technology Studies, Universita' Bocconi, Milano, Italy, revised Jul 2002.
    13. He, Xuan & Zhao, Hai & Cai, Wei & Li, Guang-Guang & Pei, Fan-Dong, 2015. "Analyzing the structure of earthquake network by k-core decomposition," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 421(C), pages 34-43.
    14. Huang, Huilin, 2009. "The degree sequences of an asymmetrical growing network," Statistics & Probability Letters, Elsevier, vol. 79(4), pages 420-425, February.
    15. Gianluca Carnabuci, 2013. "The distribution of technological progress," Empirical Economics, Springer, vol. 44(3), pages 1143-1154, June.
    16. Zhengzheng Pan, 2012. "Opinions and Networks: How Do They Effect Each Other," Computational Economics, Springer;Society for Computational Economics, vol. 39(2), pages 157-171, February.
    17. Laurienti, Paul J. & Joyce, Karen E. & Telesford, Qawi K. & Burdette, Jonathan H. & Hayasaka, Satoru, 2011. "Universal fractal scaling of self-organized networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 390(20), pages 3608-3613.
    18. Chen, Qinghua & Shi, Dinghua, 2004. "The modeling of scale-free networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 335(1), pages 240-248.
    19. Feng Xie & David Levinson, 2009. "Modeling the Growth of Transportation Networks: A Comprehensive Review," Networks and Spatial Economics, Springer, vol. 9(3), pages 291-307, September.
    20. Srayan Datta & Eytan Adar, 2018. "A generative model for scientific concept hierarchies," PLOS ONE, Public Library of Science, vol. 13(2), pages 1-19, February.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:orisre:v:24:y:2013:i:3:p:802-821. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact Chris Asher. General contact details of provider: https://edirc.repec.org/data/inforea.html.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.