IDEAS home Printed from https://ideas.repec.org/a/spr/infosf/v23y2021i1d10.1007_s10796-020-09995-2.html

Cache-Based Multi-Query Optimization for Data-Intensive Scalable Computing Frameworks

Author

Listed:
  • Pietro Michiardi

    (Eurecom)

  • Damiano Carra

    (University of Verona)

  • Sara Migliorini

    (University of Verona)

Abstract

In modern large-scale distributed systems, analytics jobs submitted by various users often share similar work, for example scanning and processing the same subset of data. Instead of optimizing jobs independently, which may result in redundant and wasteful processing, multi-query optimization techniques can be employed to save a considerable amount of cluster resources. In this work, we introduce a novel method combining in-memory cache primitives and multi-query optimization, to improve the efficiency of data-intensive, scalable computing frameworks. By careful selection and exploitation of common (sub)expressions, while satisfying memory constraints, our method transforms a batch of queries into a new, more efficient one which avoids unnecessary recomputations. To find feasible and efficient execution plans, our method uses a cost-based optimization formulation akin to the multiple-choice knapsack problem. Extensive experiments on a prototype implementation of our system show significant benefits of worksharing for both TPC-DS workloads and detailed micro-benchmarks.
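To make the knapsack analogy in the abstract concrete, the following is a minimal sketch of a multiple-choice knapsack selection, not the authors' actual formulation or implementation. It assumes that each shared (sub)expression has a few mutually exclusive caching options, each with a memory cost and an estimated saving, and that at most one option per (sub)expression may be chosen without exceeding a memory budget. The function name `select_cache_plan`, the option tuples, and all numbers are hypothetical.

```python
from typing import List, Tuple

# One caching option: (memory_cost, estimated_saving). Values are hypothetical.
Option = Tuple[int, float]


def select_cache_plan(groups: List[List[Option]], budget: int) -> Tuple[float, List[int]]:
    """Choose at most one option per group, maximizing total saving within budget.

    Returns (total_saving, choices), where choices[g] is the index of the option
    picked in group g, or -1 if nothing from that group is cached.
    """
    n = len(groups)
    # best[g][m]: maximum saving using the first g groups and at most m memory units.
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    # pick[g][m]: option index chosen for group g at capacity m (-1 means skip).
    pick = [[-1] * (budget + 1) for _ in range(n)]

    for g, options in enumerate(groups):
        for m in range(budget + 1):
            best[g + 1][m] = best[g][m]  # default: cache nothing from this group
            for i, (cost, saving) in enumerate(options):
                if cost <= m and best[g][m - cost] + saving > best[g + 1][m]:
                    best[g + 1][m] = best[g][m - cost] + saving
                    pick[g][m] = i

    # Walk the table backwards to recover which option was chosen per group.
    choices = [-1] * n
    m = budget
    for g in range(n - 1, -1, -1):
        i = pick[g][m]
        choices[g] = i
        if i >= 0:
            m -= groups[g][i][0]
    return best[n][budget], choices


# Three shared sub-expressions, each with alternative ways to cache it.
groups = [
    [(4, 10.0), (2, 6.0)],
    [(3, 7.0)],
    [(5, 12.0), (1, 2.0)],
]
total_saving, choices = select_cache_plan(groups, budget=6)
print(total_saving, choices)
```

The classical multiple-choice knapsack problem requires exactly one item per class; allowing a group to be skipped, as here, is equivalent to adding a zero-cost, zero-saving option to every class. The dynamic program runs in time proportional to the budget times the total number of options, so it suits small integer budgets only; a real optimizer would need its own cost model and solver.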

Suggested Citation

  • Pietro Michiardi & Damiano Carra & Sara Migliorini, 2021. "Cache-Based Multi-Query Optimization for Data-Intensive Scalable Computing Frameworks," Information Systems Frontiers, Springer, vol. 23(1), pages 35-51, February.
  • Handle: RePEc:spr:infosf:v:23:y:2021:i:1:d:10.1007_s10796-020-09995-2
    DOI: 10.1007/s10796-020-09995-2

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10796-020-09995-2
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Chao Zhu & Qiang Zhu & Calisto Zuzarte & Wenbin Ma, 2016. "Optimization of generic progressive queries based on dependency analysis and materialized views," Information Systems Frontiers, Springer, vol. 18(1), pages 205-231, February.
    2. Prabhakant Sinha & Andris A. Zoltners, 1979. "The Multiple-Choice Knapsack Problem," Operations Research, INFORMS, vol. 27(3), pages 503-515, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhong, Tao & Young, Rhonda, 2010. "Multiple Choice Knapsack Problem: Example of planning choice in transportation," Evaluation and Program Planning, Elsevier, vol. 33(2), pages 128-137, May.
    2. Drexl, Andreas & Haase, Knut, 1996. "Fast approximation methods for sales force deployment," Manuskripte aus den Instituten für Betriebswirtschaftslehre der Universität Kiel 411, Christian-Albrechts-Universität zu Kiel, Institut für Betriebswirtschaftslehre.
    3. Mohammadivojdan, Roshanak & Geunes, Joseph, 2018. "The newsvendor problem with capacitated suppliers and quantity discounts," European Journal of Operational Research, Elsevier, vol. 271(1), pages 109-119.
    4. Degraeve, Z. & Jans, R.F., 2003. "Improved Lower Bounds For The Capacitated Lot Sizing Problem With Set Up Times," ERIM Report Series Research in Management ERS-2003-026-LIS, Erasmus Research Institute of Management (ERIM), ERIM is the joint research institute of the Rotterdam School of Management, Erasmus University and the Erasmus School of Economics (ESE) at Erasmus University Rotterdam.
    5. George Kozanidis, 2009. "Solving the linear multiple choice knapsack problem with two objectives: profit and equity," Computational Optimization and Applications, Springer, vol. 43(2), pages 261-294, June.
    6. Melachrinoudis, Emanuel & Kozanidis, George, 2002. "A mixed integer knapsack model for allocating funds to highway safety improvements," Transportation Research Part A: Policy and Practice, Elsevier, vol. 36(9), pages 789-803, November.
    7. Andris A. Zoltners & Prabhakant Sinha, 2005. "The 2004 ISMS Practice Prize Winner—Sales Territory Design: Thirty Years of Modeling and Implementation," Marketing Science, INFORMS, vol. 24(3), pages 313-331, September.
    8. Yuji Nakagawa & Ross J. W. James & César Rego & Chanaka Edirisinghe, 2014. "Entropy-Based Optimization of Nonlinear Separable Discrete Decision Models," Management Science, INFORMS, vol. 60(3), pages 695-707, March.
    9. Francis, Peter & Zhang, Guangming & Smilowitz, Karen, 2007. "Improved modeling and solution methods for the multi-resource routing problem," European Journal of Operational Research, Elsevier, vol. 180(3), pages 1045-1059, August.
    10. Morton, Alec, 2014. "Aversion to health inequalities in healthcare prioritisation: A multicriteria optimisation perspective," Journal of Health Economics, Elsevier, vol. 36(C), pages 164-173.
    11. Tue R. L. Christensen & Kim Allan Andersen & Andreas Klose, 2013. "Solving the Single-Sink, Fixed-Charge, Multiple-Choice Transportation Problem by Dynamic Programming," Transportation Science, INFORMS, vol. 47(3), pages 428-438, August.
    12. Bagchi, Ansuman & Bhattacharyya, Nalinaksha & Chakravarti, Nilotpal, 1996. "LP relaxation of the two dimensional knapsack problem with box and GUB constraints," European Journal of Operational Research, Elsevier, vol. 89(3), pages 609-617, March.
    13. Michael Stiglmayr & José Figueira & Kathrin Klamroth, 2014. "On the multicriteria allocation problem," Annals of Operations Research, Springer, vol. 222(1), pages 535-549, November.
    14. Wilbaut, Christophe & Todosijevic, Raca & Hanafi, Saïd & Fréville, Arnaud, 2023. "Heuristic and exact reduction procedures to solve the discounted 0–1 knapsack problem," European Journal of Operational Research, Elsevier, vol. 304(3), pages 901-911.
    15. Edward Y H Lin & Chung-Min Wu, 2004. "The multiple-choice multi-period knapsack problem," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 55(2), pages 187-197, February.
    16. Andonov, R. & Poirriez, V. & Rajopadhye, S., 2000. "Unbounded knapsack problem: Dynamic programming revisited," European Journal of Operational Research, Elsevier, vol. 123(2), pages 394-407, June.
    17. Dauzère-Pérès, Stéphane & Hassoun, Michael, 2020. "On the importance of variability when managing metrology capacity," European Journal of Operational Research, Elsevier, vol. 282(1), pages 267-276.
    18. Drexl, Andreas & Jørnsten, Kurt, 2007. "Pricing the multiple-choice nested knapsack problem," Manuskripte aus den Instituten für Betriebswirtschaftslehre der Universität Kiel 626, Christian-Albrechts-Universität zu Kiel, Institut für Betriebswirtschaftslehre.
    19. Johnston, Robert E. & Khan, Lutfar R., 1995. "Bounds for nested knapsack problems," European Journal of Operational Research, Elsevier, vol. 81(1), pages 154-165, February.
    20. Isada, Yuriko & James, Ross J. W. & Nakagawa, Yuji, 2005. "An approach for solving nonlinear multi-objective separable discrete optimization problem with one constraint," European Journal of Operational Research, Elsevier, vol. 162(2), pages 503-513, April.
