Printed from https://ideas.repec.org/a/eee/appene/v398y2025ics0306261925011341.html

Building demand response control through constrained reinforcement learning with linear policies

Author

Listed:
  • Sanchez, Jerson
  • Cai, Jie

Abstract

Recent advances in model-free control strategies, particularly reinforcement learning (RL), have enabled more practical and scalable solutions for controlling building energy systems. These strategies rely solely on data, so control decisions do not require complex models of building dynamics, which are expensive to develop and demand significant engineering effort. Conventional unconstrained RL controllers typically manage indoor comfort by incorporating a penalty for comfort violations into the reward function. This penalty-function approach makes control performance very sensitive to the penalty factor: a low comfort penalty factor can result in significant violations of comfort constraints, while a high penalty factor tends to degrade economic performance. To address this issue, this study presents a constrained RL-based control strategy for building demand response that explicitly learns a constraint value function from operation data. Both linear mappings and deep neural networks are considered for value and policy function approximation, and their training stability and control performance are evaluated in terms of economic return and constraint satisfaction. Simulation tests of the proposed strategy, alongside baseline model predictive controllers (MPC) and unconstrained RL strategies, demonstrate that the constrained RL approach could achieve utility cost savings of up to 16.1%, comparable to those achieved with the MPC baselines, while minimizing constraint violations. In contrast, the unconstrained RL controllers lead to either high utility costs or significant constraint violations, depending on the penalty factor setting. The constrained RL strategy with linear policy and value functions shows more stable training and offers 4% additional cost savings with reduced constraint violations compared to constrained RL controllers with neural networks.
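
The contrast drawn in the abstract between a fixed comfort penalty and an explicitly learned constraint value function can be illustrated with a minimal Python sketch. It is a generic illustration, not the authors' algorithm: the abstract does not state the update rules, so the TD(0) step, the Lagrange-multiplier update, and every function name, parameter, and default value below are assumptions.

```python
import numpy as np


def penalized_reward(energy_cost, comfort_violation, penalty_factor):
    """Unconstrained RL: fold comfort violations into the reward with a fixed weight."""
    return -(energy_cost + penalty_factor * comfort_violation)


def td0_update(weights, features, next_features, constraint_cost,
               discount=0.9, step_size=0.5):
    """One TD(0) step for a linear constraint value function V_c(s) = w . phi(s).

    Assumed update rule for illustration; not taken from the paper.
    """
    td_error = (constraint_cost
                + discount * np.dot(weights, next_features)
                - np.dot(weights, features))
    return weights + step_size * td_error * features


def update_multiplier(multiplier, expected_violation, limit, step_size):
    """Projected dual ascent: the multiplier grows while the constraint is violated
    and is clipped at zero otherwise (assumed Lagrangian formulation)."""
    return max(0.0, multiplier + step_size * (expected_violation - limit))
```

In the penalty formulation, `penalty_factor` has to be tuned by hand, which is the sensitivity the abstract describes. In a Lagrangian-style constrained formulation such as the one sketched here, the multiplier adapts to the estimated constraint cost instead.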

Suggested Citation

  • Sanchez, Jerson & Cai, Jie, 2025. "Building demand response control through constrained reinforcement learning with linear policies," Applied Energy, Elsevier, vol. 398(C).
  • Handle: RePEc:eee:appene:v:398:y:2025:i:c:s0306261925011341
    DOI: 10.1016/j.apenergy.2025.126404

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0306261925011341
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.apenergy.2025.126404?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Kuldeep Kurte & Jeffrey Munk & Olivera Kotevska & Kadir Amasyali & Robert Smith & Evan McKee & Yan Du & Borui Cui & Teja Kuruganti & Helia Zandi, 2020. "Evaluating the Adaptability of Reinforcement Learning Based HVAC Control for Residential Houses," Sustainability, MDPI, vol. 12(18), pages 1-38, September.
    2. Du, Yan & Zandi, Helia & Kotevska, Olivera & Kurte, Kuldeep & Munk, Jeffery & Amasyali, Kadir & Mckee, Evan & Li, Fangxing, 2021. "Intelligent multi-zone residential HVAC control strategy based on deep reinforcement learning," Applied Energy, Elsevier, vol. 281(C).
    3. Richard D. Smallwood & Edward J. Sondik, 1973. "The Optimal Control of Partially Observable Markov Processes over a Finite Horizon," Operations Research, INFORMS, vol. 21(5), pages 1071-1088, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Blad, C. & Bøgh, S. & Kallesøe, C. & Raftery, Paul, 2023. "A laboratory test of an Offline-trained Multi-Agent Reinforcement Learning Algorithm for Heating Systems," Applied Energy, Elsevier, vol. 337(C).
    2. Panagiotis Michailidis & Iakovos Michailidis & Dimitrios Vamvakas & Elias Kosmatopoulos, 2023. "Model-Free HVAC Control in Buildings: A Review," Energies, MDPI, vol. 16(20), pages 1-45, October.
    3. Razzano, Giuseppe & Brandi, Silvio & Piscitelli, Marco Savino & Capozzoli, Alfonso, 2025. "Rule extraction from deep reinforcement learning controller and comparative analysis with ASHRAE control sequences for the optimal management of Heating, Ventilation, and Air Conditioning (HVAC) systems in multizone buildings," Applied Energy, Elsevier, vol. 381(C).
    4. Nan Zhang & Sen Tian & Le Li & Zhongbin Wang & Jun Zhang, 2023. "Maintenance analysis of a partial observable K-out-of-N system with load sharing units," Journal of Risk and Reliability, vol. 237(4), pages 703-713, August.
    5. Williams, Byron K., 2009. "Markov decision processes in natural resources management: Observability and uncertainty," Ecological Modelling, Elsevier, vol. 220(6), pages 830-840.
    6. Li, Yanjie & Yin, Baoqun & Xi, Hongsheng, 2011. "Finding optimal memoryless policies of POMDPs under the expected average reward criterion," European Journal of Operational Research, Elsevier, vol. 211(3), pages 556-567, June.
    7. Omar Al-Ani & Sanjoy Das, 2022. "Reinforcement Learning: Theory and Applications in HEMS," Energies, MDPI, vol. 15(17), pages 1-37, September.
    8. Yanling Chang & Alan Erera & Chelsea White, 2015. "Value of information for a leader–follower partially observed Markov game," Annals of Operations Research, Springer, vol. 235(1), pages 129-153, December.
    9. Zeyue Sun & Mohsen Eskandari & Chaoran Zheng & Ming Li, 2022. "Handling Computation Hardness and Time Complexity Issue of Battery Energy Storage Scheduling in Microgrids by Deep Reinforcement Learning," Energies, MDPI, vol. 16(1), pages 1-20, December.
    10. Zhang, Bin & Hu, Weihao & Ghias, Amer M.Y.M. & Xu, Xiao & Chen, Zhe, 2022. "Multi-agent deep reinforcement learning-based coordination control for grid-aware multi-buildings," Applied Energy, Elsevier, vol. 328(C).
    11. M. Usman Saleem & Mustafa Shakir & M. Rehan Usman & M. Hamza Tahir Bajwa & Noman Shabbir & Payam Shams Ghahfarokhi & Kamran Daniel, 2023. "Integrating Smart Energy Management System with Internet of Things and Cloud Computing for Efficient Demand Side Management in Smart Grids," Energies, MDPI, vol. 16(12), pages 1-21, June.
    12. Seites-Rundlett, William & Bashar, Mohammad Z. & Torres-Machi, Cristina & Corotis, Ross B., 2022. "Combined evidence model to enhance pavement condition prediction from highly uncertain sensor data," Reliability Engineering and System Safety, Elsevier, vol. 217(C).
    13. Charalampos Rafail Lazaridis & Iakovos Michailidis & Georgios Karatzinis & Panagiotis Michailidis & Elias Kosmatopoulos, 2024. "Evaluating Reinforcement Learning Algorithms in Residential Energy Saving and Comfort Management," Energies, MDPI, vol. 17(3), pages 1-33, January.
    14. Chiel van Oosterom & Lisa M. Maillart & Jeffrey P. Kharoufeh, 2017. "Optimal maintenance policies for a safety‐critical system and its deteriorating sensor," Naval Research Logistics (NRL), John Wiley & Sons, vol. 64(5), pages 399-417, August.
    15. Kirk A. Yost & Alan R. Washburn, 2000. "The LP/POMDP marriage: Optimization with imperfect information," Naval Research Logistics (NRL), John Wiley & Sons, vol. 47(8), pages 607-619, December.
    16. Bei Zhao & Siwen Zheng & Jianhui Zhang, 2020. "Optimal policy for composite sensing with crowdsourcing," International Journal of Distributed Sensor Networks, vol. 16(5), pages 15501477209, May.
    17. Bian, Yuexin & Schmidt, Oliver & Shi, Yuanyuan, 2026. "Operator learning for energy-efficient building ventilation control with computational fluid dynamics simulation of a real-world classroom," Applied Energy, Elsevier, vol. 404(C).
    18. Malek Ebadi & Raha Akhavan-Tabatabaei, 2021. "Personalized Cotesting Policies for Cervical Cancer Screening: A POMDP Approach," Mathematics, MDPI, vol. 9(6), pages 1-20, March.
    19. Zong-Zhi Lin & James C. Bean & Chelsea C. White, 2004. "A Hybrid Genetic/Optimization Algorithm for Finite-Horizon, Partially Observed Markov Decision Processes," INFORMS Journal on Computing, INFORMS, vol. 16(1), pages 27-38, February.
    20. N. Bora Keskin & John R. Birge, 2019. "Dynamic Selling Mechanisms for Product Differentiation and Learning," Operations Research, INFORMS, vol. 67(4), pages 1069-1089, July.

