IDEAS home Printed from https://ideas.repec.org/a/gam/jdataj/v9y2024i8p99-d1449846.html
   My bibliography  Save this article

A Performance Analysis of Hybrid and Columnar Cloud Databases for Efficient Schema Design in Distributed Data Warehouse as a Service

Author

Listed:
  • Fred Eduardo Revoredo Rabelo Ferreira

    (Center of Informatics, Federal University of Pernambuco (UFPE), Recife 50670-901, PE, Brazil)

  • Robson do Nascimento Fidalgo

    (Center of Informatics, Federal University of Pernambuco (UFPE), Recife 50670-901, PE, Brazil)

Abstract

A Data Warehouse (DW) is a centralized database that stores large volumes of historical data for analysis and reporting. In a world where enterprise data grows exponentially, new architectures are being investigated to overcome the deficiencies of traditional Database Management Systems (DBMSs), driving a shift towards more modern, cloud-based solutions that provide resources such as distributed processing, columnar storage, and horizontal scalability without the overhead of physical hardware management, i.e., a Database as a Service (DBaaS). Choosing the appropriate class of DBMS is a critical decision for organizations, and there are important differences that impact data volume and query performance (e.g., architecture, data models, and storage) to support analytics in a distributed cloud environment efficiently. In this sense, we carry out an experimental evaluation to analyze the performance of several DBaaS and the impact of data modeling, specifically the usage of a partially normalized Star Schema and a fully denormalized Flat Table Schema, to further comprehend their behavior in different configurations and designs in terms of data schema, storage form, memory availability, and cluster size. The analysis is done in two volumes of data generated by a well-established benchmark, comparing the performance of the DW in terms of average execution time, memory usage, data volume, and loading time. Our results provide guidelines for efficient DW design, showing, for example, that the denormalization of the schema does not guarantee improved performance, as solutions performed differently depending on its architecture. We also show that a Hybrid Processing (HTAP) NewSQL solution can outperform solutions that support only Online Analytical Processing (OLAP) in terms of overall execution time, but that the performance of each query is deeply influenced by its selectivity and by the number of join functions.

Suggested Citation

  • Fred Eduardo Revoredo Rabelo Ferreira & Robson do Nascimento Fidalgo, 2024. "A Performance Analysis of Hybrid and Columnar Cloud Databases for Efficient Schema Design in Distributed Data Warehouse as a Service," Data, MDPI, vol. 9(8), pages 1-24, August.
  • Handle: RePEc:gam:jdataj:v:9:y:2024:i:8:p:99-:d:1449846
    as

    Download full text from publisher

    File URL: https://www.mdpi.com/2306-5729/9/8/99/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2306-5729/9/8/99/
    Download Restriction: no
    ---><---

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jdataj:v:9:y:2024:i:8:p:99-:d:1449846. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: MDPI Indexing Manager (email available below). General contact details of provider: https://www.mdpi.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.