
A Markov Prediction Model Based on Page Hierarchical Clustering

Author

Listed:
  • Yao Yao
  • Lei Shi
  • Zhanhong Wang

Abstract

The Markov prediction model is the basis of Web prefetching and personalized recommendation. It can be used to extract the implicit link hierarchy of a website. A visualized site structure not only helps users understand the relationships between the pages they have visited, but also suggests where they can go next. However, the large number of Web objects leads to data redundancy and an oversized model. How to mine and improve the link structure of a website has therefore become a central problem, and solving it benefits prefetching. This paper presents an improved method that simplifies the topology of a website and extracts a conceptual link hierarchy that makes the site's organization clear and legible. First, a Markov tree is constructed, because it is a more capable mechanism for representing past activity in a form usable for prediction. The Markov chain model is then defined as a three-tuple (A, S, P), where A is the collection of operations, S is the state space consisting of all the states in a link structure, and P is the one-step transition probability matrix. The transition probability matrix is calculated from the Markov tree. Second, an algorithm is given to extract the hierarchical tree from this matrix, which yields the website link hierarchy (WLH). A WLH contains only trunk links, where a trunk link is a hyperlink from a page on a higher conceptual level to a page on the adjacent lower conceptual level. As the level number increases, each level holds more and more pages, which can blur the structure of the website. To tackle this problem, a clustering algorithm is proposed that groups conceptually related pages on the same level based on their in-link and out-link similarities, which are measured by weighted Euclidean distance. After the pages in the WLH have been clustered, the WLC can be constructed. Finally, the simplified model is used for Web page prediction.
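The two computational ingredients named above, a one-step transition probability matrix and a weighted Euclidean distance, can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it estimates P directly from transition counts in click sessions rather than from a Markov tree, it does not model the operation set A, and the function names, the session format, and the weight vector are all assumptions made for illustration.

```python
from collections import defaultdict
import math


def transition_matrix(sessions):
    """Estimate one-step transition probabilities from click sessions.

    Assumption: each session is a list of page identifiers in visit order.
    Counts are normalised per source page, so every row sums to 1.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        # Pair each page with the page visited immediately after it.
        for src, dst in zip(session, session[1:]):
            counts[src][dst] += 1
    matrix = {}
    for src, row in counts.items():
        total = sum(row.values())
        matrix[src] = {dst: n / total for dst, n in row.items()}
    return matrix


def predict_next(matrix, page):
    """Return the most probable next page, or None if the page is unseen."""
    row = matrix.get(page)
    if not row:
        return None
    return max(row, key=row.get)


def weighted_euclidean(u, v, weights):
    """Weighted Euclidean distance between two equal-length link vectors.

    Assumption: u and v are numeric in-link or out-link feature vectors,
    and weights holds one non-negative weight per component.
    """
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(u, v, weights)))
```

Calling `transition_matrix` on a few hypothetical sessions such as `[["home", "news", "sports"], ["home", "news", "tech"], ["home", "about"]]` gives a nested dictionary that `predict_next` can query. A nested dictionary is used instead of a dense matrix because most page pairs on a real site are never visited consecutively.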
Three measures, namely precision, recall, and PRS, are employed to evaluate performance in the experiments. Experiments on two real Web log data sets demonstrate the efficiency of the proposed method, which achieves good overall performance and clustering quality while maintaining relatively high prediction accuracy and recall.
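The abstract does not define its measures, so the following Python sketch uses definitions that are common in the prefetching literature and are assumed here: precision is the share of issued predictions that were correct, and recall is the share of all requests that were correctly predicted. PRS is not defined in the abstract and is left out. The function name and the input format are hypothetical.

```python
def precision_recall(predictions, actual):
    """Compute precision and recall for next-page predictions.

    Assumptions: predictions[i] is the predicted page for request i, or None
    when the model made no prediction; actual[i] is the page really requested.
    """
    if len(predictions) != len(actual):
        raise ValueError("predictions and actual must have the same length")
    # Number of requests for which a prediction was issued at all.
    made = sum(p is not None for p in predictions)
    # Number of issued predictions that matched the real request.
    hits = sum(p is not None and p == a for p, a in zip(predictions, actual))
    precision = hits / made if made else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall
```

Under these definitions, a model that declines to predict on uncertain requests can raise its precision at the cost of recall, which is why the two are reported together.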

Suggested Citation

  • Yao Yao & Lei Shi & Zhanhong Wang, 2009. "A Markov Prediction Model Based on Page Hierarchical Clustering," International Journal of Distributed Sensor Networks, vol. 5(1), pages 89-89, January.
  • Handle: RePEc:sae:intdis:v:5:y:2009:i:1:p:89-89
    DOI: 10.1080/15501320802575062

    Download full text from publisher

    File URL: https://journals.sagepub.com/doi/10.1080/15501320802575062
    Download Restriction: no

