Abstract
Industrial electricity bills are typically composed of two major components: the energy charge, which is based on the total accumulated energy consumption over a billing period (e.g., one month), and the demand charge, which depends on the highest peak power observed during the same period. Consequently, the joint optimization of energy costs (through energy arbitrage) and demand charges (through peak shaving) is crucial for effective cost management in industrial PV-battery load systems. However, this task remains fundamentally challenging due to the volatility of renewable generation and load, the complex temporal dependencies introduced by peak demand charges, and the competing objectives between immediate cost savings and long-term peak reduction—rendering existing model-based and data-driven energy management approaches inadequate for real-world applications. To tackle these challenges, this paper formulates the problem as a soft Markov Decision Process (MDP) and proposes a novel Offline Inverse Reinforcement Learning (OIRL) framework based on a dual reward-policy iterative optimization mechanism. Our approach introduces an innovative synthesis of contrastive reward learning—leveraging both expert demonstrations and on-policy trajectory rollouts—with conservative soft Q-learning optimization. This architecture enables accurate reconstruction of implicit reward structures through comparative analysis of expert and agent behaviors, while ensuring stable policy improvement via regularized value function updates with pessimistic value initialization. Extensive experiments using real-world data from our industrial partner in China demonstrate that OIRL achieves substantial energy arbitrage and peak shaving improvement compared to state-of-the-art reinforcement learning baselines in energy management. 
Furthermore, the framework maintains robust performance across diverse operating conditions, establishing a new paradigm for intelligent control of industrial PV-battery load systems.
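The two bill components described at the start of the abstract can be illustrated with a small calculation. This is only a sketch under stated assumptions, not the paper's formulation: the function name `electricity_bill`, the time-varying price, the flat demand rate, the hourly intervals, and all numbers are hypothetical, and battery losses and grid export are ignored.

```python
from typing import Sequence, Tuple


def electricity_bill(
    grid_kw: Sequence[float],
    price_per_kwh: Sequence[float],
    demand_rate_per_kw: float,
    interval_hours: float = 1.0,
) -> Tuple[float, float]:
    """Return (energy_charge, demand_charge) for one billing period.

    Hypothetical helper: assumes grid_kw holds the non-negative average
    grid import power in each metering interval.
    """
    if len(grid_kw) != len(price_per_kwh):
        raise ValueError("grid_kw and price_per_kwh must have equal length")
    # Energy charge: accumulated consumption, priced interval by interval.
    energy_charge = sum(
        p * kw * interval_hours for p, kw in zip(price_per_kwh, grid_kw)
    )
    # Demand charge: depends only on the single highest peak in the period.
    demand_charge = demand_rate_per_kw * max(grid_kw)
    return energy_charge, demand_charge


# Illustrative numbers only (not taken from the paper).
prices = [0.5, 1.0, 0.5, 0.5]              # currency units per kWh
baseline = [100.0, 200.0, 100.0, 100.0]    # kW drawn from the grid, no battery
# A battery discharges 50 kW in the expensive peak hour and recharges
# 50 kW in the next, cheaper hour (lossless, so total energy is unchanged).
with_battery = [100.0, 150.0, 150.0, 100.0]

base_energy, base_demand = electricity_bill(baseline, prices, 10.0)
batt_energy, batt_demand = electricity_bill(with_battery, prices, 10.0)
```

In this toy case, shifting 50 kWh out of the expensive hour lowers the energy charge (arbitrage), and the lower maximum lowers the demand charge (peak shaving). Because the demand charge depends on the maximum over the whole period, a single high interval anywhere in the month would undo the peak-shaving saving, which is the temporal dependency the abstract refers to.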
Suggested Citation
Hu, Yulong & Li, Sen, 2026.
"Offline inverse reinforcement learning for joint optimization of energy costs and demand charge in industrial PV-battery load systems,"
Applied Energy, Elsevier, vol. 408(C).
Handle:
RePEc:eee:appene:v:408:y:2026:i:c:s0306261926000681
DOI: 10.1016/j.apenergy.2026.127416