IDEAS home Printed from https://ideas.repec.org/a/inm/orijoc/v34y2022i1p39-54.html
   My bibliography  Save this article

Self-Learning Threshold-Based Load Balancing

Author

Listed:
  • Diego Goldsztajn

    (Eindhoven University of Technology, Eindhoven 5612 AZ, Netherlands)

  • Sem C. Borst

    (Eindhoven University of Technology, Eindhoven 5612 AZ, Netherlands)

  • Johan S. H. van Leeuwaarden

    (Tilburg University, Tilburg 5037 AB, Netherlands)

  • Debankur Mukherjee

    (Georgia Institute of Technology, Atlanta, Georgia 30332)

  • Philip A. Whiting

    (Macquarie University, Macquarie Park, New South Wales 2109, Australia)

Abstract

We consider a large-scale service system where incoming tasks have to be instantaneously dispatched to one out of many parallel server pools. The user-perceived performance degrades with the number of concurrent tasks and the dispatcher aims at maximizing the overall quality of service by balancing the load through a simple threshold policy. We demonstrate that such a policy is optimal on the fluid and diffusion scales, while only involving a small communication overhead, which is crucial for large-scale deployments. In order to set the threshold optimally, it is important, however, to learn the load of the system, which may be unknown. For that purpose, we design a control rule for tuning the threshold in an online manner. We derive conditions that guarantee that this adaptive threshold settles at the optimal value, along with estimates for the time until this happens. In addition, we provide numerical experiments that support the theoretical results and further indicate that our policy copes effectively with time-varying demand patterns. Summary of Contribution: Data centers and cloud computing platforms are the digital factories of the world, and managing resources and workloads in these systems involves operations research challenges of an unprecedented scale. Due to the massive size, complex dynamics, and wide range of time scales, the design and implementation of optimal resource-allocation strategies is prohibitively demanding from a computation and communication perspective. These resource-allocation strategies are essential for certain interactive applications, for which the available computing resources need to be distributed optimally among users in order to provide the best overall experienced performance. This is the subject of the present article, which considers the problem of distributing tasks among the various server pools of a large-scale service system, with the objective of optimizing the overall quality of service provided to users. A solution to this load-balancing problem cannot rely on maintaining complete state information at the gateway of the system, since this is computationally unfeasible, due to the magnitude and complexity of modern data centers and cloud computing platforms. Therefore, we examine a computationally light load-balancing algorithm that is yet asymptotically optimal in a regime where the size of the system approaches infinity. The analysis is based on a Markovian stochastic model, which is studied through fluid and diffusion limits in the aforementioned large-scale regime. The article analyzes the load-balancing algorithm theoretically and provides numerical experiments that support and extend the theoretical results.

Suggested Citation

  • Diego Goldsztajn & Sem C. Borst & Johan S. H. van Leeuwaarden & Debankur Mukherjee & Philip A. Whiting, 2022. "Self-Learning Threshold-Based Load Balancing," INFORMS Journal on Computing, INFORMS, vol. 34(1), pages 39-54, January.
  • Handle: RePEc:inm:orijoc:v:34:y:2022:i:1:p:39-54
    DOI: 10.1287/ijoc.2021.1100
    as

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/ijoc.2021.1100
    Download Restriction: no

    File URL: https://libkey.io/10.1287/ijoc.2021.1100?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    References listed on IDEAS

    as
    1. Defraeye, Mieke & Van Nieuwenhuyse, Inneke, 2016. "Staffing and scheduling under nonstationary demand for service: A literature review," Omega, Elsevier, vol. 58(C), pages 4-25.
    2. Debankur Mukherjee & Sem C. Borst & Johan S. H. van Leeuwaarden & Philip A. Whiting, 2020. "Asymptotic Optimality of Power-of- d Load Balancing in Large-Scale Systems," Mathematics of Operations Research, INFORMS, vol. 45(4), pages 1535-1571, November.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Pei, Zhi & Dai, Xu & Yuan, Yilun & Du, Rui & Liu, Changchun, 2021. "Managing price and fleet size for courier service with shared drones," Omega, Elsevier, vol. 104(C).
    2. Shone, Rob & Glazebrook, Kevin & Zografos, Konstantinos G., 2019. "Resource allocation in congested queueing systems with time-varying demand: An application to airport operations," European Journal of Operational Research, Elsevier, vol. 276(2), pages 566-581.
    3. Alireza Pooya & Morteza Pakdaman, 2021. "A new continuous time optimal control model for manpower planning with promotion from inside the system," Operational Research, Springer, vol. 21(1), pages 349-364, March.
    4. Niyirora, Jerome & Zhuang, Jun, 2017. "Fluid approximations and control of queues in emergency departments," European Journal of Operational Research, Elsevier, vol. 261(3), pages 1110-1124.
    5. Mustafa Y. Sir & David Nestler & Thomas Hellmich & Devashish Das & Michael J. Laughlin & Michon C. Dohlman & Kalyan Pasupathy, 2017. "Optimization of Multidisciplinary Staffing Improves Patient Experiences at the Mayo Clinic," Interfaces, INFORMS, vol. 47(5), pages 425-441, October.
    6. Yin, Yunqiang & Cheng, Shuenn-Ren & Cheng, T.C.E. & Wang, Du-Juan & Wu, Chin-Chia, 2016. "Just-in-time scheduling with two competing agents on unrelated parallel machines," Omega, Elsevier, vol. 63(C), pages 41-47.
    7. Smirnov, Dmitry & Huchzermeier, Arnd, 2020. "Analytics for labor planning in systems with load-dependent service times," European Journal of Operational Research, Elsevier, vol. 287(2), pages 668-681.
    8. Erhard, Melanie & Schoenfelder, Jan & Fügener, Andreas & Brunner, Jens O., 2018. "State of the art in physician scheduling," European Journal of Operational Research, Elsevier, vol. 265(1), pages 1-18.
    9. Cho, David D. & Stauffer, Jon M., 2022. "Tele-medicine question response service: Analysis of benefits and costs," Omega, Elsevier, vol. 111(C).
    10. Ward Whitt & Jingtong Zhao, 2017. "Many‐server loss models with non‐poisson time‐varying arrivals," Naval Research Logistics (NRL), John Wiley & Sons, vol. 64(3), pages 177-202, April.
    11. Andersen, Anders Reenberg & Nielsen, Bo Friis & Reinhardt, Line Blander & Stidsen, Thomas Riis, 2019. "Staff optimization for time-dependent acute patient flow," European Journal of Operational Research, Elsevier, vol. 272(1), pages 94-105.
    12. Yang, Muer & Wang, Xinfang (Jocelyn) & Xu, Nuo, 2015. "A robust voting machine allocation model to reduce extreme waiting," Omega, Elsevier, vol. 57(PB), pages 230-237.
    13. Farzad Zaerpour & Marco Bijvank & Huiyin Ouyang & Zhankun Sun, 2022. "Scheduling of Physicians with Time‐Varying Productivity Levels in Emergency Departments," Production and Operations Management, Production and Operations Management Society, vol. 31(2), pages 645-667, February.
    14. N. Llort & A. Lusa & C. Martínez-Costa & M. Mateo, 2019. "A decision support system and a mathematical model for strategic workforce planning in consultancies," Flexible Services and Manufacturing Journal, Springer, vol. 31(2), pages 497-523, June.
    15. Ma, Xiaoyu & Deng, Tianhu & Xue, Mengying & Shen, Zuo-Jun Max & Lan, Boxiong, 2017. "Optimal dynamic pricing of mobile data plans in wireless communications," Omega, Elsevier, vol. 66(PA), pages 91-105.
    16. Pieter Smet & Annelies Lejon & Greet Vanden Berghe, 2021. "Demand smoothing in shift design," Flexible Services and Manufacturing Journal, Springer, vol. 33(2), pages 457-484, June.
    17. Schwarz, Justus Arne & Selinka, Gregor & Stolletz, Raik, 2016. "Performance analysis of time-dependent queueing systems: Survey and classification," Omega, Elsevier, vol. 63(C), pages 170-189.
    18. Eugene Furman & Adam Diamant & Murat Kristal, 2021. "Customer Acquisition and Retention: A Fluid Approach for Staffing," Production and Operations Management, Production and Operations Management Society, vol. 30(11), pages 4236-4257, November.
    19. Notz, Pascal M. & Wolf, Peter K. & Pibernik, Richard, 2023. "Prescriptive analytics for a multi-shift staffing problem," European Journal of Operational Research, Elsevier, vol. 305(2), pages 887-901.
    20. Alemayehu, Fikru K. & Kumbhakar, Subal C. & Landazuri Tveteraas, Sigbjørn, 2022. "Estimation of staff use efficiency: Evidence from the hospitality industry," Technological Forecasting and Social Change, Elsevier, vol. 178(C).

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:orijoc:v:34:y:2022:i:1:p:39-54. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher (email available below). General contact details of provider: https://edirc.repec.org/data/inforea.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.