
Combining Graph Exploration and Fragmentation for Scalable RDF Query Processing

Authors
  • Abdallah Khelil

    (LIAS/ISAE-ENSMA; LAPECI/Université Oran 1)

  • Amin Mesmoudi

    (LIAS/University of Poitiers)

  • Jorge Galicia

    (LIAS/ISAE-ENSMA)

  • Ladjel Bellatreche

    (LIAS/ISAE-ENSMA)

  • Mohand-Saïd Hacid

    (LIRIS/University of Lyon)

  • Emmanuel Coquery

    (LIRIS/University of Lyon)

Abstract

The flexibility offered by the Resource Description Framework (RDF) has made it a very popular standard for representing data with an undefined or variable schema using the concept of triples. Its success has produced many large-scale multidisciplinary datasets, which have prompted the development of efficient RDF processing systems. Current approaches fall into two groups: the first adopts the relational model and stores the triples in tables, while the second builds data structures that model RDF data as a graph. Approaches in the first group scale more easily because they apply optimization strategies from the relational model, such as indexing and fragmentation. However, they suffer significant overhead on the complex queries common in existing applications (e.g., compound SPARQL graph patterns involving filters). On the other hand, graph-based systems, which use more complex data structures, fail to manage main memory efficiently and do not scale on hardware with limited resources. In this paper, we propose a novel approach to evaluating queries (Basic Graph Patterns, Wildcards, Aggregations and Sorting) on RDF data that combines RDF graph exploration with physical fragmentation of triples. We describe our graph-based storage and query evaluation models. Then, we detail the architecture of our system and explain at length the strategy, based on the Volcano execution model, used to manage main memory at query runtime. We conducted extensive experiments on synthetic and real datasets to evaluate the efficiency of our proposal, comparing our performance with a relational-based (Virtuoso), a graph-based (gStore) and an intensive-indexing (RDF-3X) approach. According to our evaluation, our system offers the best compromise between efficient query processing and scalability.
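
The abstract says runtime memory management is based on the Volcano execution model but gives no operator-level detail. As a rough illustration of that general model only (not the authors' implementation), the Python sketch below chains tuple-at-a-time `open()`/`next()`/`close()` iterators, so intermediate join results are never materialized: each operator pulls one binding at a time from its child. The predicate-based fragment layout, the operator names (`FragmentScan`, `Filter`, `NestedLoopJoin`, `run`) and the toy data are all hypothetical.

```python
from typing import Callable, Dict, Iterator, List, Optional, Protocol, Tuple

Binding = Dict[str, str]          # SPARQL-style variable -> RDF term
Fragment = List[Tuple[str, str]]  # hypothetical: (subject, object) pairs for one predicate


class Operator(Protocol):
    """Volcano iterator interface: open() / next() / close()."""
    def open(self) -> None: ...
    def next(self) -> Optional[Binding]: ...
    def close(self) -> None: ...


class FragmentScan:
    """Hypothetical scan over one predicate fragment, binding two variables."""
    def __init__(self, fragments: Dict[str, Fragment], predicate: str,
                 s_var: str, o_var: str) -> None:
        self.fragments, self.predicate = fragments, predicate
        self.s_var, self.o_var = s_var, o_var
        self._it: Optional[Iterator[Tuple[str, str]]] = None
    def open(self) -> None:
        # Only the fragment for this predicate is read; other fragments are untouched.
        self._it = iter(self.fragments.get(self.predicate, []))
    def next(self) -> Optional[Binding]:
        pair = next(self._it, None) if self._it is not None else None
        if pair is None:
            return None
        return {self.s_var: pair[0], self.o_var: pair[1]}
    def close(self) -> None:
        self._it = None


class Filter:
    """Passes through only the bindings that satisfy a condition (like FILTER)."""
    def __init__(self, child: Operator, cond: Callable[[Binding], bool]) -> None:
        self.child, self.cond = child, cond
    def open(self) -> None:
        self.child.open()
    def next(self) -> Optional[Binding]:
        while (row := self.child.next()) is not None:
            if self.cond(row):
                return row
        return None
    def close(self) -> None:
        self.child.close()


class NestedLoopJoin:
    """Joins on shared variables; the inner input is rescanned per outer binding."""
    def __init__(self, outer: Operator, inner: Operator) -> None:
        self.outer, self.inner = outer, inner
        self._left: Optional[Binding] = None
    def open(self) -> None:
        self.outer.open()
        self._left = self.outer.next()
        if self._left is not None:
            self.inner.open()
    def next(self) -> Optional[Binding]:
        while self._left is not None:
            right = self.inner.next()
            if right is None:  # inner exhausted: advance outer, restart inner
                self.inner.close()
                self._left = self.outer.next()
                if self._left is not None:
                    self.inner.open()
                continue
            if all(self._left[k] == v for k, v in right.items() if k in self._left):
                return {**self._left, **right}
        return None
    def close(self) -> None:
        self.outer.close()
        self.inner.close()  # safe for these operators even if already closed


def run(plan: Operator) -> List[Binding]:
    """Pulls results one at a time from the root operator."""
    plan.open()
    rows = []
    while (row := plan.next()) is not None:
        rows.append(row)
    plan.close()
    return rows


# Toy data (illustrative only): one fragment per predicate.
fragments = {"knows": [("alice", "bob"), ("bob", "carol")],
             "age": [("alice", "30"), ("bob", "25"), ("carol", "41")]}
# ?x knows ?y . ?y age ?a . FILTER(?a > 28)
plan = Filter(NestedLoopJoin(FragmentScan(fragments, "knows", "?x", "?y"),
                             FragmentScan(fragments, "age", "?y", "?a")),
              lambda b: int(b["?a"]) > 28)
print(run(plan))  # [{'?x': 'bob', '?y': 'carol', '?a': '41'}]
```

Rescanning the inner input for every outer binding keeps memory use flat at the cost of repeated reads; the abstract does not say how the authors' system avoids such rescans, so this join strategy is only a placeholder for illustration.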

Suggested Citation

  • Abdallah Khelil & Amin Mesmoudi & Jorge Galicia & Ladjel Bellatreche & Mohand-Saïd Hacid & Emmanuel Coquery, 2021. "Combining Graph Exploration and Fragmentation for Scalable RDF Query Processing," Information Systems Frontiers, Springer, vol. 23(1), pages 165-183, February.
  • Handle: RePEc:spr:infosf:v:23:y:2021:i:1:d:10.1007_s10796-020-09998-z
    DOI: 10.1007/s10796-020-09998-z

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10796-020-09998-z
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Spiros Mouzakitis & Dimitris Papaspyros & Michael Petychakis & Sotiris Koussouris & Anastasios Zafeiropoulos & Eleni Fotopoulou & Lena Farid & Fabrizio Orlandi & Judie Attard & John Psarras, 2017. "Challenges and opportunities in renovating public sector information by enabling linked data and analytics," Information Systems Frontiers, Springer, vol. 19(2), pages 321-336, April.
    2. Damaris Fuentes-Lorenzo & Jorge Morato & Juan Miguel Gómez, 2009. "Knowledge management in biomedical libraries: A semantic web approach," Information Systems Frontiers, Springer, vol. 11(4), pages 471-480, September.
    3. E. G. Stephan & T. O. Elsethagen & L. K. Berg & M. C. Macduff & P. R. Paulson & W. J. Shaw & C. Sivaraman & W. P. Smith & A. Wynne, 2016. "Semantic catalog of things, services, and data to support a wind data management facility," Information Systems Frontiers, Springer, vol. 18(4), pages 679-691, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Claudia Diamantini & Paolo Lo Giudice & Domenico Potena & Emanuele Storti & Domenico Ursino, 2021. "An Approach to Extracting Topic-guided Views from the Sources of a Data Lake," Information Systems Frontiers, Springer, vol. 23(1), pages 243-262, February.
    2. Marijn Janssen & David Konopnicki & Jane L. Snowdon & Adegboyega Ojo, 2017. "Driving public sector innovation using big and open linked data (BOLD)," Information Systems Frontiers, Springer, vol. 19(2), pages 189-195, April.
    3. Quan Z. Sheng & Xue Li & Anne H.H. Ngu & Yongrui Qin & Dong Xie, 2016. "Guest editorial: web of things," Information Systems Frontiers, Springer, vol. 18(4), pages 639-643, August.
    4. Daniel R. Harris, 2016. "Foundations of reusable and interoperable facet models using category theory," Information Systems Frontiers, Springer, vol. 18(5), pages 953-965, October.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.