
Multiple factor hierarchical clustering algorithm for large scale web page and search engine clickstream data

Author

Listed:
  • Gang Kou
  • Chunwei Lou

Abstract

The development of the World Wide Web and advances in digital data collection and storage technologies over the last two decades allow companies and organizations to store and share huge numbers of electronic documents. Organizing, analyzing, and presenting these documents manually is hard and inefficient. Search engines help users find relevant information by presenting a list of web pages in response to queries. Helping users efficiently find the most relevant web pages in vast text collections remains a major challenge. The purpose of this study is to propose a hierarchical clustering method that combines multiple factors to identify clusters of web pages that can satisfy users’ information needs. The clusters are primarily envisioned for search and navigation, and potentially for some form of visualization as well. An experiment on clickstream data from a professional search engine was conducted, and the results show that the clustering method is effective and efficient in terms of both objective and subjective measures. Copyright Springer Science+Business Media, LLC 2012
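
The abstract does not say which factors are combined or how clusters are merged, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes two hypothetical factors per page (a set of content terms and a set of queries that led to a click), assumed equal weights, Jaccard distance for each factor, and average-linkage agglomerative merging with an assumed stopping threshold.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def combined_distance(p, q, weights):
    """Weighted sum of per-factor Jaccard distances.

    The factor names and weights are assumptions made for this sketch.
    """
    return sum(w * jaccard_distance(p[f], q[f]) for f, w in weights.items())


def agglomerate(pages, weights, threshold):
    """Average-linkage agglomerative clustering over the combined distance.

    Merging stops once the closest pair of clusters is farther apart
    than `threshold` (a hypothetical tuning parameter).
    """
    clusters = [[name] for name in pages]

    def linkage(c1, c2):
        dists = [combined_distance(pages[a], pages[b], weights)
                 for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Toy data, invented for illustration only.
pages = {
    "a": {"terms": {"python", "clustering"}, "queries": {"q1", "q2"}},
    "b": {"terms": {"python", "clustering", "data"}, "queries": {"q1"}},
    "c": {"terms": {"recipes", "pasta"}, "queries": {"q3"}},
    "d": {"terms": {"pasta", "sauce"}, "queries": {"q3", "q4"}},
}
weights = {"terms": 0.5, "queries": 0.5}
clusters = agglomerate(pages, weights, threshold=0.7)
print(clusters)
```

On this toy data the pages that share both terms and clicked queries end up together, while unrelated pages stay in separate clusters. Changing the weights shifts how much content similarity counts relative to click behaviour.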

Suggested Citation

  • Gang Kou & Chunwei Lou, 2012. "Multiple factor hierarchical clustering algorithm for large scale web page and search engine clickstream data," Annals of Operations Research, Springer, vol. 197(1), pages 123-134, August.
  • Handle: RePEc:spr:annopr:v:197:y:2012:i:1:p:123-134:10.1007/s10479-010-0704-3
    DOI: 10.1007/s10479-010-0704-3

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1007/s10479-010-0704-3
    Download Restriction: Access to full text is restricted to subscribers.

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Sang-Sung Park & Kwang-Kyu Seo & Dong-Sik Jang, 2007. "Fuzzy Art-Based Image Clustering Method For Content-Based Image Retrieval," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 6(02), pages 213-233.
    2. Yong Shi, 2009. "Current Research Trend: Information Technology And Decision Making In 2008," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 8(01), pages 1-5.
    3. Wen Zhang & Taketoshi Yoshida & Xijin Tang, 2009. "Distribution Of Multi-Words In Chinese And English Documents," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 8(02), pages 249-265.
    4. Jongwon Lee & Heeseok Lee, 2008. "Strategic Agent Based Web System Development Methodology," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 7(02), pages 309-337.
    5. Yong Shi & Yi Peng & Gang Kou & Zhengxin Chen, 2005. "Classifying Credit Card Accounts For Business Intelligence And Decision Making: A Multiple-Criteria Quadratic Programming Approach," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 4(04), pages 581-599.
    6. Yi Peng & Gang Kou & Yong Shi & Zhengxin Chen, 2008. "A Descriptive Framework For The Field Of Data Mining And Knowledge Discovery," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 7(04), pages 639-682.
    7. Gang Kou & Yi Peng & Yong Shi & Morgan Wise & Weixuan Xu, 2005. "Discovering Credit Cardholders’ Behavior by Multiple Criteria Linear Programming," Annals of Operations Research, Springer, vol. 135(1), pages 261-274, March.
    8. Sergio Alejandro Gómez & Carlos Iván Chesñevar & Guillermo Ricardo Simari, 2008. "Defeasible Reasoning In Web-Based Forms Through Argumentation," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 7(01), pages 71-101.
    9. Qingyu Zhang & Richard S. Segall, 2008. "Web Mining: A Survey Of Current Research, Techniques, And Software," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 7(04), pages 683-720.
    10. Soongoo Hong & Pairin Katerattanakul & Seok Jeong Joo, 2008. "Evaluating Government Website Accessibility: A Comparative Study," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 7(03), pages 491-515.
    11. Raid Al-Aomar & Fikri Dweiri, 2008. "A Customer-Oriented Decision Agent For Product Selection In Web-Based Services," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 7(01), pages 35-52.
    12. Jia Hu & Ning Zhong, 2008. "Web Farming With Clickstream," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 7(02), pages 291-308.

    Citations

    Cited by:

    1. Idil Yavuz & Orrin Cooper, 2017. "A dynamic clustering method to improve the coherency of an ANP Supermatrix," Annals of Operations Research, Springer, vol. 254(1), pages 507-531, July.
    2. Yi Peng, 2015. "Regional earthquake vulnerability assessment using a combination of MCDM methods," Annals of Operations Research, Springer, vol. 234(1), pages 95-110, November.
    3. Sheng, Jie & Amankwah-Amoah, Joseph & Wang, Xiaojun, 2019. "Technology in the 21st century: New challenges and opportunities," Technological Forecasting and Social Change, Elsevier, vol. 143(C), pages 321-335.
    4. R. Sujatha & T. M. Rajalaxmi, 2016. "Hierarchical Fuzzy Hidden Markov Chain for Web Applications," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 15(01), pages 83-118, January.
    5. John A. Aloysius & Hartmut Hoehle & Soheil Goodarzi & Viswanath Venkatesh, 2018. "Big data initiatives in retail environments: Linking service process perceptions to shopping outcomes," Annals of Operations Research, Springer, vol. 270(1), pages 25-51, November.
    6. Sheng, Jie & Amankwah-Amoah, Joseph & Wang, Xiaojun, 2017. "A multidisciplinary perspective of big data in management research," International Journal of Production Economics, Elsevier, vol. 191(C), pages 97-112.
    7. Borja Ena & Alberto Gomez & Borja Ponte & Paolo Priore & Diego Diaz, 2022. "Homogeneous grouping of non-prime steel products for online auctions: a case study," Annals of Operations Research, Springer, vol. 315(1), pages 591-621, August.
    8. Roman Vavrek, 2019. "Evaluation of the Impact of Selected Weighting Methods on the Results of the TOPSIS Technique," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 18(06), pages 1821-1843, November.
    9. Ziyun Deng & Tingqin He, 2018. "A Method for Filtering Pages by Similarity Degree based on Dynamic Programming," Future Internet, MDPI, vol. 10(12), pages 1-12, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wikil Kwak & Yong Shi & Gang Kou, 2012. "Bankruptcy prediction for Korean firms after the 1997 financial crisis: using a multiple criteria linear programming data mining approach," Review of Quantitative Finance and Accounting, Springer, vol. 38(4), pages 441-453, May.
    2. Andrea Ko & Saira Gillani, 2020. "A Research Review and Taxonomy Development for Decision Support and Business Analytics Using Semantic Text Mining," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 19(01), pages 97-126, January.
    3. Yi Peng, 2015. "Regional earthquake vulnerability assessment using a combination of MCDM methods," Annals of Operations Research, Springer, vol. 234(1), pages 95-110, November.
    4. Filip Rubacek & Irena Jindrichovska & Zuzana Horvathova & Josef Abrham, 2020. "Accessibility of Websites of the European National Tourism Boards," International Journal of Economics & Business Administration (IJEBA), International Journal of Economics & Business Administration (IJEBA), vol. 0(2), pages 114-125.
    5. Chun-Hao Chen & Tzung-Pei Hong & Yeong-Chyi Lee & Vincent S. Tseng, 2015. "Finding Active Membership Functions for Genetic-Fuzzy Data Mining," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 14(06), pages 1215-1242, November.
    6. Yen-Hao Hsieh & Soe-Tsyr Yuan, 2016. "Can Customer Expectations be Measured in Real Time?," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 15(01), pages 119-149, January.
    7. Daji Ergu & Gang Kou, 2012. "Questionnaire design improvement and missing item scores estimation for rapid and efficient decision making," Annals of Operations Research, Springer, vol. 197(1), pages 5-23, August.
    8. Roman Vavrek, 2019. "Evaluation of the Impact of Selected Weighting Methods on the Results of the TOPSIS Technique," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 18(06), pages 1821-1843, November.
    9. Ginger Saltos & Mihaela Cocea, 2017. "An Exploration of Crime Prediction Using Data Mining on Open Data," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 16(05), pages 1155-1181, September.
    10. Philippe Baecke & Dirk Van Den Poel, 2010. "Improving Purchasing Behavior Predictions By Data Augmentation With Situational Variables," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 9(06), pages 853-872.
    11. Lean Yu & Zebin Yang & Ling Tang, 2016. "A novel multistage deep belief network based extreme learning machine ensemble learning paradigm for credit risk assessment," Flexible Services and Manufacturing Journal, Springer, vol. 28(4), pages 576-592, December.
    12. P. D. Mahendhiran & S. Kannimuthu, 2018. "Deep Learning Techniques for Polarity Classification in Multimodal Sentiment Analysis," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 17(03), pages 883-910, May.
    13. Jingguo Wang & Raj Sharman & Stanley Zionts, 2012. "Functionality defense through diversity: a design framework to multitier systems," Annals of Operations Research, Springer, vol. 197(1), pages 25-45, August.
    14. M. Ballings & D. Van Den Poel, 2012. "The Relevant Length of Customer Event History for Churn Prediction: How long is long enough?," Working Papers of Faculty of Economics and Business Administration, Ghent University, Belgium 12/804, Ghent University, Faculty of Economics and Business Administration.
    15. Giyasettin Ozcan, 2018. "Unsupervised Learning from Multi-Dimensional Data: A Fast Clustering Algorithm Utilizing Canopies and Statistical Information," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 17(03), pages 841-856, May.
    16. Yugang Yu & Chengbin Chu & Haoxun Chen & Feng Chu, 2012. "Large scale stochastic inventory routing problems with split delivery and service level constraints," Annals of Operations Research, Springer, vol. 197(1), pages 135-158, August.
    17. Ergu, Daji & Kou, Gang & Peng, Yi & Shi, Yong, 2011. "A simple method to improve the consistency ratio of the pair-wise comparison matrix in ANP," European Journal of Operational Research, Elsevier, vol. 213(1), pages 246-259, August.
    18. Lean Yu & Shouyang Wang & Fenghua Wen & Kin Lai, 2012. "Genetic algorithm-based multi-criteria project portfolio selection," Annals of Operations Research, Springer, vol. 197(1), pages 71-86, August.
    19. M. Ballings & D. Van Den Poel & E. Verhagen, 2013. "Evaluating the Added Value of Pictorial Data for Customer Churn Prediction," Working Papers of Faculty of Economics and Business Administration, Ghent University, Belgium 13/869, Ghent University, Faculty of Economics and Business Administration.
    20. Amroush, Fadi, 2009. "استخدام تقنيات الذكاء الصنعي لاختيار أمثل نظام إداة علاقات مع الزبائن ملائم لاحتياجات شركة ما [Using Artificial intelligence to select the optimal E-CRM Based business needs]," MPRA Paper 28014, University Library of Munich, Germany.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.