IDEAS home Printed from https://ideas.repec.org/a/scn/025686/16084594.html
   My bibliography  Save this article

SQL query optimization for highly normalized Big data

Author

Listed:
  • GOLOV N.I.

    (National Research University Higher School of Economics)

  • RONNBACK L.

    (Stocholm University)

Abstract

This paper describes an approachforfast ad-hoc analysis of Big Data inside a relational data model. The approach strives to achieve maximal utilization of highly normalized temporary tables through the merge join algorithm. It is designed for the Anchor modeling technique, which requires a very high level of table normalization. Anchor modeling is a novel data warehouse modeling technique, designed for classical databases and adapted by the authors of the article for Big Data environment and a massively parallel processing (MPP) database. Anchor modeling provides flexibility and high speed of data loading, where the presented approach adds support for fast ad-hoc analysis of Big Data sets (tens of terabytes). Different approaches to query plan optimization are described and estimated, for row-based and column-based databases. Theoretical estimations and results of real data experiments carried out in a column-based MPP environment (HP Vertica) are presented and compared. The results show that the approach is particularly favorable when the available RAM resources are scarce, so that a switch is made from pure in-memory processing to spilling over from hard disk, while executing ad-hoc queries. Scaling is also investigated by running the same analysis on different numbers of nodes in the MPP cluster. Configurations of five, ten and twelve nodes were tested, using click stream data of Avito, the biggest classified site in Russia.

Suggested Citation

  • Golov N.I. & Ronnback L., 2015. "SQL query optimization for highly normalized Big data," Бизнес-информатика, CyberLeninka;Федеральное государственное автономное образовательное учреждение высшего образования «Национальный исследовательский университет «Высшая школа экономики», issue 3 (33), pages 7-14.
  • Handle: RePEc:scn:025686:16084594
    as

    Download full text from publisher

    File URL: http://cyberleninka.ru/article/n/sql-query-optimization-for-highly-normalized-big-data
    Download Restriction: no
    ---><---

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Sowa, Konrad & Przegalinska, Aleksandra & Ciechanowski, Leon, 2021. "Cobots in knowledge work," Journal of Business Research, Elsevier, vol. 125(C), pages 135-142.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:scn:025686:16084594. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: CyberLeninka (email available below). General contact details of provider: http://cyberleninka.ru/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.