Printed from https://ideas.repec.org/a/eee/appene/v398y2025ics0306261925011341.html

Building demand response control through constrained reinforcement learning with linear policies

Author

Listed:
  • Sanchez, Jerson
  • Cai, Jie

Abstract

Recent advancements in model-free control strategies, particularly reinforcement learning (RL), have enabled more practical and scalable solutions for controlling building energy systems. These strategies rely solely on data, eliminating the need for complex models of building dynamics in control decision making; developing such models is expensive and requires significant engineering effort. Conventional unconstrained RL controllers typically manage indoor comfort by incorporating a penalty for comfort violations into the reward function. This penalty-function approach makes control performance highly sensitive to the penalty factor setting: a low comfort penalty factor can result in significant violations of comfort constraints, while a high penalty factor tends to degrade economic performance. To address this issue, the present study proposes a constrained RL-based control strategy for building demand response that explicitly learns a constraint value function from operation data. The study considers both linear mappings and deep neural networks for value and policy function approximation, and evaluates their training stability and control performance in terms of economic return and constraint satisfaction. Simulation tests of the proposed strategy, as well as of baseline model predictive controllers (MPC) and unconstrained RL strategies, demonstrate that the constrained RL approach can achieve utility cost savings of up to 16.1 %, comparable to those of the MPC baselines, while minimizing constraint violations. In contrast, the unconstrained RL controllers lead to either high utility costs or significant constraint violations, depending on the penalty factor setting. The constrained RL strategy with linear policy and value functions shows more stable training and offers 4 % additional cost savings with reduced constraint violations compared to constrained RL controllers with neural networks.
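
For illustration, below is a minimal sketch of the kind of constrained RL loop the abstract describes: a linear Gaussian policy, separate linear value functions for the utility cost and the comfort constraint, and a Lagrangian primal-dual update that trades the two off without a hand-tuned penalty factor. The one-zone thermal model, tariff, feature map, hyperparameters, and the primal-dual formulation itself are illustrative assumptions for this sketch, not the paper's actual algorithm or test setup.

    import numpy as np

    rng = np.random.default_rng(0)

    def step(temp, action, price):
        # Hypothetical one-zone thermal dynamics; action is cooling power in [0, 1].
        temp_next = 0.9 * temp + 0.1 * 30.0 - 3.0 * action + rng.normal(0.0, 0.1)
        energy_cost = price * action                      # economic objective
        comfort_violation = max(0.0, temp_next - 24.0)    # constraint signal
        return temp_next, energy_cost, comfort_violation

    def features(temp, price):
        # Simple state features shared by the policy and both value functions.
        return np.array([1.0, (temp - 22.0) / 5.0, price])

    theta = np.zeros(3)    # linear mean of a Gaussian policy
    w_cost = np.zeros(3)   # linear value function for the utility cost
    w_cons = np.zeros(3)   # linear value function for the comfort constraint
    lam = 0.0              # Lagrange multiplier (dual variable)
    sigma, gamma, budget = 0.2, 0.95, 0.05

    for episode in range(2000):
        temp, disc = 26.0, 1.0
        ret_cost, ret_cons, grad = 0.0, 0.0, np.zeros(3)
        for t in range(48):                        # 48 half-hour control steps
            price = 0.3 if 16 <= t < 36 else 0.1   # assumed on-peak/off-peak tariff
            phi = features(temp, price)
            mean = float(phi @ theta)
            action = float(np.clip(mean + sigma * rng.normal(), 0.0, 1.0))
            temp, cost, viol = step(temp, action, price)
            ret_cost += disc * cost
            ret_cons += disc * viol
            # Gaussian log-likelihood gradient (clipping ignored for simplicity).
            grad += disc * (action - mean) / sigma**2 * phi
            disc *= gamma
        phi0 = features(26.0, 0.1)
        # Primal update: descend the Lagrangian (cost + lam * constraint) with a
        # value-function baseline; REINFORCE-style Monte-Carlo estimate.
        baseline = float(phi0 @ (w_cost + lam * w_cons))
        theta -= 1e-3 * grad * (ret_cost + lam * ret_cons - baseline)
        # Critic updates: regress each value function toward its episode return.
        w_cost += 1e-3 * (ret_cost - phi0 @ w_cost) * phi0
        w_cons += 1e-3 * (ret_cons - phi0 @ w_cons) * phi0
        # Dual update: raise lam while the constraint return exceeds its budget.
        lam = max(0.0, lam + 1e-2 * (ret_cons - budget))

    print("policy weights:", theta, "multiplier:", round(lam, 3))

The point of the separately learned constraint value is that the multiplier lam adapts from data until the comfort-violation budget is met, rather than being fixed a priori like the penalty factor of an unconstrained RL controller.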

Suggested Citation

  • Sanchez, Jerson & Cai, Jie, 2025. "Building demand response control through constrained reinforcement learning with linear policies," Applied Energy, Elsevier, vol. 398(C).
  • Handle: RePEc:eee:appene:v:398:y:2025:i:c:s0306261925011341
    DOI: 10.1016/j.apenergy.2025.126404

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0306261925011341
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.apenergy.2025.126404?utm_source=ideas
    LibKey link: If access is restricted and your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item.

    As the access to this document is restricted, you may want to search for a different version of it.


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Blad, C. & Bøgh, S. & Kallesøe, C. & Raftery, Paul, 2023. "A laboratory test of an Offline-trained Multi-Agent Reinforcement Learning Algorithm for Heating Systems," Applied Energy, Elsevier, vol. 337(C).
    2. Panagiotis Michailidis & Iakovos Michailidis & Dimitrios Vamvakas & Elias Kosmatopoulos, 2023. "Model-Free HVAC Control in Buildings: A Review," Energies, MDPI, vol. 16(20), pages 1-45, October.
    3. Li, Yanjie & Yin, Baoqun & Xi, Hongsheng, 2011. "Finding optimal memoryless policies of POMDPs under the expected average reward criterion," European Journal of Operational Research, Elsevier, vol. 211(3), pages 556-567, June.
    4. Yanling Chang & Alan Erera & Chelsea White, 2015. "Value of information for a leader–follower partially observed Markov game," Annals of Operations Research, Springer, vol. 235(1), pages 129-153, December.
    5. Zeyue Sun & Mohsen Eskandari & Chaoran Zheng & Ming Li, 2022. "Handling Computation Hardness and Time Complexity Issue of Battery Energy Storage Scheduling in Microgrids by Deep Reinforcement Learning," Energies, MDPI, vol. 16(1), pages 1-20, December.
    6. Zhang, Bin & Hu, Weihao & Ghias, Amer M.Y.M. & Xu, Xiao & Chen, Zhe, 2022. "Multi-agent deep reinforcement learning-based coordination control for grid-aware multi-buildings," Applied Energy, Elsevier, vol. 328(C).
    7. M. Usman Saleem & Mustafa Shakir & M. Rehan Usman & M. Hamza Tahir Bajwa & Noman Shabbir & Payam Shams Ghahfarokhi & Kamran Daniel, 2023. "Integrating Smart Energy Management System with Internet of Things and Cloud Computing for Efficient Demand Side Management in Smart Grids," Energies, MDPI, vol. 16(12), pages 1-21, June.
    8. Chiel van Oosterom & Lisa M. Maillart & Jeffrey P. Kharoufeh, 2017. "Optimal maintenance policies for a safety‐critical system and its deteriorating sensor," Naval Research Logistics (NRL), John Wiley & Sons, vol. 64(5), pages 399-417, August.
    9. Malek Ebadi & Raha Akhavan-Tabatabaei, 2021. "Personalized Cotesting Policies for Cervical Cancer Screening: A POMDP Approach," Mathematics, MDPI, vol. 9(6), pages 1-20, March.
    10. N. Bora Keskin & John R. Birge, 2019. "Dynamic Selling Mechanisms for Product Differentiation and Learning," Operations Research, INFORMS, vol. 67(4), pages 1069-1089, July.
    11. Junbo Son & Yeongin Kim & Shiyu Zhou, 2022. "Alerting patients via health information system considering trust-dependent patient adherence," Information Technology and Management, Springer, vol. 23(4), pages 245-269, December.
    12. Jonghoon Ahn, 2020. "Improvement of the Performance Balance between Thermal Comfort and Energy Use for a Building Space in the Mid-Spring Season," Sustainability, MDPI, vol. 12(22), pages 1-14, November.
    13. Guo, Yuxiang & Qu, Shengli & Wang, Chuang & Xing, Ziwen & Duan, Kaiwen, 2024. "Optimal dynamic thermal management for data center via soft actor-critic algorithm with dynamic control interval and combined-value state space," Applied Energy, Elsevier, vol. 373(C).
    14. Hao Zhang, 2010. "Partially Observable Markov Decision Processes: A Geometric Technique and Analysis," Operations Research, INFORMS, vol. 58(1), pages 214-228, February.
    15. Wang, Qiaochu & Ding, Yan & Kong, Xiangfei & Tian, Zhe & Xu, Linrui & He, Qing, 2022. "Load pattern recognition based optimization method for energy flexibility in office buildings," Energy, Elsevier, vol. 254(PC).
    16. Chernonog, Tatyana & Avinadav, Tal & Ben-Zvi, Tal, 2016. "A two-state partially observable Markov decision process with three actions," European Journal of Operational Research, Elsevier, vol. 254(3), pages 957-967.
    17. Martin Mundhenk, 2000. "The Complexity of Optimal Small Policies," Mathematics of Operations Research, INFORMS, vol. 25(1), pages 118-129, February.
    18. Hernandez-Matheus, Alejandro & Löschenbrand, Markus & Berg, Kjersti & Fuchs, Ida & Aragüés-Peñalba, Mònica & Bullich-Massagué, Eduard & Sumper, Andreas, 2022. "A systematic review of machine learning techniques related to local energy communities," Renewable and Sustainable Energy Reviews, Elsevier, vol. 170(C).
    19. Zhao, Liyuan & Yang, Ting & Li, Wei & Zomaya, Albert Y., 2022. "Deep reinforcement learning-based joint load scheduling for household multi-energy system," Applied Energy, Elsevier, vol. 324(C).
    20. Zheng, Lingwei & Wu, Hao & Guo, Siqi & Sun, Xinyu, 2023. "Real-time dispatch of an integrated energy system based on multi-stage reinforcement learning with an improved action-choosing strategy," Energy, Elsevier, vol. 277(C).


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:appene:v:398:y:2025:i:c:s0306261925011341. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do so here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu (email available below). General contact details of provider: http://www.elsevier.com/wps/find/journaldescription.cws_home/405891/description#description .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.