
Status Locality on the Web: Implications for Building Focused Collections

Author

Listed:
  • Gautam Pant

    (Department of Management Sciences, The University of Iowa, Iowa City, Iowa 52242)

  • Padmini Srinivasan

    (Department of Computer Science, The University of Iowa, Iowa City, Iowa 52242)

Abstract

Topical locality on the Web is the notion that pages tend to link to other topically similar pages and that such similarity decays rapidly with link distance. This supports meaningful Web browsing and searching by information consumers. It also allows topical Web crawlers, programs that fetch pages by following hyperlinks, to harvest topical subsets of the Web for applications such as those in vertical search and business intelligence. We show that the Web exhibits another property that we call “status locality.” It is based on the notion that pages tend to link to other pages of similar status (importance) and that this status similarity also decays rapidly with link distance. Analogous to topical locality, status locality may also be exploited by Web crawlers. Collections built by such crawlers include pages that are both topically relevant and important. This capability is crucial because of the large number of Web pages addressing even niche topics. The challenge in exploiting status locality while crawling is that page importance (or status) is typically recognized through global measures computed by processing link data from billions of pages. In contrast, topical Web crawlers depend on local information based on previously downloaded pages. We solve this problem by using previously developed methods that use local characteristics of pages to estimate their global status. This leads to the design of new crawlers, specifically utility-biased crawlers guided by a Cobb-Douglas utility function. Our crawler experiments show that status and topicality of Web collections present a trade-off. An adaptive version of our utility-biased crawler dynamically modifies the output elasticities of topicality and status to create Web collections that maintain high average topicality while simultaneously achieving significantly higher average status than several benchmarks, including a state-of-the-art topical crawler.
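
The abstract does not give the crawler's formulas, so the following is only a minimal sketch of how a Cobb-Douglas utility could rank candidate pages, assuming each page has an estimated topicality and status score in [0, 1]. The function names, the example scores, and the adaptation rule are hypothetical illustrations, not the authors' method.

```python
import heapq


def cobb_douglas_utility(topicality, status, alpha, beta):
    """Cobb-Douglas utility U = topicality**alpha * status**beta.

    alpha and beta play the role of the output elasticities of
    topicality and status. Scores are assumed to lie in [0, 1].
    """
    return (topicality ** alpha) * (status ** beta)


def rank_frontier(candidates, alpha, beta):
    """Return candidate URLs ordered from highest to lowest utility.

    candidates: iterable of (url, topicality, status) tuples with
    hypothetical, locally estimated scores.
    """
    # heapq is a min-heap, so negate the utility to pop the best page first.
    heap = [(-cobb_douglas_utility(t, s, alpha, beta), url)
            for url, t, s in candidates]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


def adapt_elasticities(alpha, avg_topicality, target, step=0.1):
    """Hypothetical adaptation rule (an assumption, not from the paper).

    Shift weight toward topicality when the collection's average
    topicality falls below the target, and toward status otherwise.
    Returns (alpha, beta) with alpha + beta = 1.
    """
    if avg_topicality < target:
        alpha = min(1.0, alpha + step)
    else:
        alpha = max(0.0, alpha - step)
    return alpha, 1.0 - alpha


if __name__ == "__main__":
    pages = [("a", 0.9, 0.1), ("b", 0.5, 0.5), ("c", 0.1, 0.9)]
    print(rank_frontier(pages, alpha=1.0, beta=0.0))  # topicality only
    print(rank_frontier(pages, alpha=0.0, beta=1.0))  # status only
```

Setting one elasticity to zero reduces the ranking to a purely topical or purely status-driven crawl, which is one way to picture the trade-off the abstract describes.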

Suggested Citation

  • Gautam Pant & Padmini Srinivasan, 2013. "Status Locality on the Web: Implications for Building Focused Collections," Information Systems Research, INFORMS, vol. 24(3), pages 802-821, September.
  • Handle: RePEc:inm:orisre:v:24:y:2013:i:3:p:802-821
    DOI: 10.1287/isre.1120.0457

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/isre.1120.0457
    Download Restriction: no


    References listed on IDEAS

    1. Filippo Menczer, 2004. "Lexical and semantic clustering by Web links," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 55(14), pages 1261-1269, December.
    2. Réka Albert & Hawoong Jeong & Albert-László Barabási, 1999. "Diameter of the World-Wide Web," Nature, Nature, vol. 401(6749), pages 130-131, September.
    3. Gautam Pant & Padmini Srinivasan, 2010. "Predicting Web Page Status," Information Systems Research, INFORMS, vol. 21(2), pages 345-364, June.

    Citations


    Cited by:

    1. Yuanyang Liu & Gautam Pant & Olivia R. L. Sheng, 2020. "Predicting Labor Market Competition: Leveraging Interfirm Network and Employee Skills," Information Systems Research, INFORMS, vol. 31(4), pages 1443-1466, December.
    2. Cobeña, Mar & Gallego, Ángeles & Casanueva, Cristóbal, 2019. "Diversity in airline alliance portfolio configuration," Journal of Air Transport Management, Elsevier, vol. 75(C), pages 16-26.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mohd-Zaid, Fairul & Kabban, Christine M. Schubert & Deckro, Richard F. & White, Edward D., 2017. "Parameter specification for the degree distribution of simulated Barabási–Albert graphs," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 465(C), pages 141-152.
    2. Chen, Shu-Heng & Chang, Chia-Ling & Wen, Ming-Chang, 2014. "Social networks and macroeconomic stability," Economics - The Open-Access, Open-Assessment E-Journal (2007-2020), Kiel Institute for the World Economy (IfW Kiel), vol. 8, pages 1-40.
    3. Zhang, Wen-Yao & Wei, Zong-Wen & Wang, Bing-Hong & Han, Xiao-Pu, 2016. "Measuring mixing patterns in complex networks by Spearman rank correlation coefficient," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 451(C), pages 440-450.
    4. Pi, Xiaochen & Tang, Longkun & Chen, Xiangzhong, 2021. "A directed weighted scale-free network model with an adaptive evolution mechanism," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 572(C).
    5. He, He & Yang, Bo & Hu, Xiaoming, 2016. "Exploring community structure in networks by consensus dynamics," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 450(C), pages 342-353.
    6. Long Ma & Xiao Han & Zhesi Shen & Wen-Xu Wang & Zengru Di, 2015. "Efficient Reconstruction of Heterogeneous Networks from Time Series via Compressed Sensing," PLOS ONE, Public Library of Science, vol. 10(11), pages 1-12, November.
    7. Blagus, Neli & Šubelj, Lovro & Bajec, Marko, 2012. "Self-similar scaling of density in complex real-world networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 391(8), pages 2794-2802.
    8. Elias Carroni & Paolo Pin & Simone Righi, 2020. "Bring a Friend! Privately or Publicly?," Management Science, INFORMS, vol. 66(5), pages 2269-2290, May.
    9. Biggiero, Lucio & Angelini, Pier Paolo, 2015. "Hunting scale-free properties in R&D collaboration networks: Self-organization, power-law and policy issues in the European aerospace research area," Technological Forecasting and Social Change, Elsevier, vol. 94(C), pages 21-43.
    10. Duan, Shuyu & Wen, Tao & Jiang, Wen, 2019. "A new information dimension of complex network based on Rényi entropy," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 516(C), pages 529-542.
    11. Dávid Csercsik & Sándor Imre, 2017. "Cooperation and coalitional stability in decentralized wireless networks," Telecommunication Systems: Modelling, Analysis, Design and Management, Springer, vol. 64(4), pages 571-584, April.
    12. Baek, Seung Ki & Kim, Tae Young & Kim, Beom Jun, 2008. "Testing a priority-based queue model with Linux command histories," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 387(14), pages 3660-3668.
    13. Jing Yang & Yingwu Chen, 2011. "Fast Computing Betweenness Centrality with Virtual Nodes on Large Sparse Networks," PLOS ONE, Public Library of Science, vol. 6(7), pages 1-5, July.
    14. Freddy Hernán Cepeda López, 2008. "La topología de redes como herramienta de seguimiento en el Sistema de Pagos de Alto Valor en Colombia," Borradores de Economia 513, Banco de la Republica de Colombia.
    15. Chung-Yuan Huang & Chuen-Tsai Sun & Hsun-Cheng Lin, 2005. "Influence of Local Information on Social Simulations in Small-World Network Models," Journal of Artificial Societies and Social Simulation, Journal of Artificial Societies and Social Simulation, vol. 8(4), pages 1-8.
    16. Xiang, Wang, 2023. "Strong ties or structural holes? A distance distribution perspective," Economics Letters, Elsevier, vol. 229(C).
    17. Xue Guo & Hu Zhang & Tianhai Tian, 2019. "Multi-Likelihood Methods for Developing Stock Relationship Networks Using Financial Big Data," Papers 1906.08088, arXiv.org.
    18. Chang, Chia-ling & Chen, Shu-heng, 2011. "Interactions in DSGE models: The Boltzmann-Gibbs machine and social networks approach," Economics Discussion Papers 2011-25, Kiel Institute for the World Economy (IfW Kiel).
    19. Lin, Yi & Zhang, Jianwei & Yang, Bo & Liu, Hong & Zhao, Liping, 2019. "An optimal routing strategy for transport networks with minimal transmission cost and high network capacity," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 521(C), pages 551-561.
    20. Stefano Breschi & Lucia Cusmano, 2002. "Unveiling the Texture of a European Research Area: Emergence of Oligarchic Networks under EU Framework Programmes," KITeS Working Papers 130, KITeS, Centre for Knowledge, Internationalization and Technology Studies, Universita' Bocconi, Milano, Italy, revised Jul 2002.
