Authors listed:
- Abdullah Alshammari
(Department of Computer Science and Engineering, College of Computer Science and Engineering, University of Hafr Al Batin, Hafar Al-Batin 39524, Saudi Arabia)
- Ammar Ahmed E. Elhadi
(Department of Computer Science and Engineering, College of Computer Science and Engineering, University of Hafr Al Batin, Hafar Al-Batin 39524, Saudi Arabia)
- Ashraf Osman Ibrahim
(Department of Computing, Universiti Teknologi PETRONAS, Seri Iskandar 32610, Malaysia)
Abstract
Heating, ventilation, and air-conditioning (HVAC) systems dominate energy consumption in hot-climate buildings, where maintaining occupant comfort under extreme outdoor conditions remains a critical challenge, particularly under emerging time-of-use (TOU) electricity pricing schemes. While deep reinforcement learning (DRL) has shown promise for adaptive HVAC control, existing approaches often suffer from comfort violations, myopic decision making, and limited robustness to uncertainty. This paper proposes a comfort-first hybrid control framework that integrates Soft Actor–Critic (SAC) with a Cross-Entropy Method (CEM) refinement layer, referred to as SACEM. The framework combines data-efficient off-policy learning with short-horizon predictive optimization and safety-aware action projection to explicitly prioritize thermal comfort while minimizing energy use, operating cost, and peak demand. The control problem is formulated as a Markov Decision Process using a simplified thermal model representative of commercial buildings in hot desert climates. The proposed approach is evaluated through extensive simulation using Saudi Arabian summer weather conditions, realistic occupancy patterns, and a three-tier TOU electricity tariff. Performance is assessed against state-of-the-art baselines, including PPO, TD3, and standard SAC, using comfort, energy, cost, and peak demand metrics, complemented by ablation and disturbance-based stress tests. Results show that SACEM achieves a comfort score of 95.8%, while reducing energy consumption and operating cost by approximately 21% relative to the strongest baseline. The findings demonstrate that integrating comfort-dominant reward design with decision-time look-ahead yields robust, economically viable HVAC control suitable for deployment in hot-climate smart buildings.
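The abstract describes SACEM as an SAC policy whose proposed action is refined at decision time by a short-horizon Cross-Entropy Method search over a thermal model, using a comfort-dominant cost, a three-tier TOU tariff and safety-aware action projection. The following is a minimal, hypothetical sketch of that look-ahead step in plain Python. The thermal coefficients, comfort band, tariff hours and rates, cost weights, horizon and CEM settings are all illustrative assumptions rather than values from the paper. The clip-based `project` stands in for the paper's safety-aware projection, and the SAC policy is represented only by the single action (`policy_action`) it proposes.

```python
import random
import statistics

# --- Illustrative assumptions only (not the paper's model or parameters) ---
U_MAX = 10.0                            # max cooling power in kW (assumed)
DT_H = 0.25                             # 15-minute control step in hours (assumed)
COMFORT_LOW, COMFORT_HIGH = 22.0, 26.0  # comfort band in deg C (assumed)
W_COMFORT = 100.0                       # large weight so comfort dominates cost (assumed)


def tou_price(hour):
    """Hypothetical three-tier TOU tariff, price per kWh."""
    h = hour % 24
    if 12 <= h < 17:
        return 0.30  # peak
    if 8 <= h < 12 or 17 <= h < 22:
        return 0.18  # mid-peak
    return 0.08      # off-peak


def thermal_step(t_in, t_out, u, occupied):
    """Assumed first-order thermal model: outdoor coupling, internal gains
    while occupied, and cooling proportional to power u."""
    gains = 0.3 if occupied else 0.0
    return t_in + 0.05 * (t_out - t_in) + gains - 0.2 * u


def project(u):
    """Stand-in for safety-aware projection: clip to actuator limits."""
    return min(max(u, 0.0), U_MAX)


def step_cost(t_in, u, hour):
    """Comfort-first stage cost: squared band violation plus TOU energy cost."""
    violation = max(0.0, COMFORT_LOW - t_in, t_in - COMFORT_HIGH)
    return W_COMFORT * violation ** 2 + tou_price(hour) * u * DT_H


def rollout_cost(actions, t_in, t_out_forecast, hour, occupied):
    """Simulate an action sequence through the model and sum stage costs."""
    total = 0.0
    for k, u in enumerate(actions):
        u = project(u)
        t_in = thermal_step(t_in, t_out_forecast[k], u, occupied[k])
        total += step_cost(t_in, u, hour + k * DT_H)
    return total


def cem_refine(policy_action, t_in, t_out_forecast, hour, occupied,
               horizon=8, samples=64, elites=8, iters=5, seed=0):
    """CEM refinement: start the sampling mean at the policy's action, sample
    action sequences, keep the lowest-cost elites, refit mean and std, and
    return the (projected) first action of the final mean sequence."""
    rng = random.Random(seed)
    mean = [policy_action] * horizon
    std = [2.0] * horizon
    for _ in range(iters):
        candidates = [[project(rng.gauss(m, s)) for m, s in zip(mean, std)]
                      for _ in range(samples)]
        candidates.sort(key=lambda seq: rollout_cost(
            seq, t_in, t_out_forecast, hour, occupied))
        elite = candidates[:elites]
        mean = [statistics.fmean(col) for col in zip(*elite)]
        std = [max(statistics.pstdev(col), 1e-3) for col in zip(*elite)]
    return project(mean[0])


if __name__ == "__main__":
    # Hypothetical hot afternoon: room slightly above the band, policy proposes 3 kW.
    forecast = [45.0] * 8
    occ = [True] * 8
    print(round(cem_refine(3.0, 26.5, forecast, 14.0, occ), 2))
```

Because the comfort weight is much larger than any tariff price, the search first pushes the room back into the band and only then trades off energy cost across TOU tiers; seeding the CEM mean with the policy's action is one simple way to let the learned policy warm-start the look-ahead.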
Suggested Citation
Abdullah Alshammari & Ammar Ahmed E. Elhadi & Ashraf Osman Ibrahim, 2026.
"Environmentally Sustainable HVAC Management in Smart Buildings Using a Reinforcement Learning Framework SACEM,"
Sustainability, MDPI, vol. 18(2), pages 1-33, January.
Handle:
RePEc:gam:jsusta:v:18:y:2026:i:2:p:1036-:d:1844428