IDEAS home Printed from https://ideas.repec.org/a/spr/jcomop/v33y2017i1d10.1007_s10878-015-9967-6.html
   My bibliography  Save this article

Minimized-cost cube query on heterogeneous information networks

Author

Listed:
  • Dan Yin

    (Harbin Institute of Technology)

  • Hong Gao

    (Harbin Institute of Technology)

  • Zhaonian Zou

    (Harbin Institute of Technology)

  • Jianzhong Li

    (Harbin Institute of Technology)

Abstract

Data cube is the foundation of on-line analytical processing (OLAP), which can provide users with data views from different perspectives and granularities. Heterogeneous information networks consist of multiple types of nodes and edges which represent different semantic relations. With the rapid development of social networks and knowledge graphs, heterogeneous information networks have become increasingly popular. In heterogeneous information networks, cube is the set of aggregate graphs and cube query is required for supporting OLAP. The existing research mainly studies aggregate graph query on homogeneous networks, but only considers the attributes of nodes. To overcome these challenges, this paper investigates cube query problem on heterogeneous information networks. (1) A novel cube model for heterogeneous information networks is proposed, which captures both the attribute and structure semantics. (2) Because the total number of aggregate graphs is huge, computing and storing them cost plenty of time and storage. The problem of partial cube materialization on heterogeneous information networks is investigated. Given a fixed size of memory space, select a subset of aggregate graphs in cube, to minimize the computing cost of the whole cube. This optimization problem is proved to be NP-complete and there is no $$n^{1-{\epsilon }}$$ n 1 - ϵ approximation algorithm unless P $$=$$ = NP. (3) A greedy algorithm is proposed for partial cube materialization based on two interesting dependencies between aggregate graphs, attribute dependence and path dependence. (4) Experiments on real world data sets show the cube definition is meaningful, and the partial cube materialization algorithm is efficient.

Suggested Citation

  • Dan Yin & Hong Gao & Zhaonian Zou & Jianzhong Li, 2017. "Minimized-cost cube query on heterogeneous information networks," Journal of Combinatorial Optimization, Springer, vol. 33(1), pages 339-364, January.
  • Handle: RePEc:spr:jcomop:v:33:y:2017:i:1:d:10.1007_s10878-015-9967-6
    DOI: 10.1007/s10878-015-9967-6
    as

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10878-015-9967-6
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s10878-015-9967-6?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    As the access to this document is restricted, you may want to search for a different version of it.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:jcomop:v:33:y:2017:i:1:d:10.1007_s10878-015-9967-6. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.springer.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.