Offline inverse reinforcement learning for joint optimization of energy costs and demand charge in industrial PV-battery load systems

Author

Hu, Yulong
Li, Sen

Abstract

Industrial electricity bills are typically composed of two major components: the energy charge, which is based on the total accumulated energy consumption over a billing period (e.g., one month), and the demand charge, which depends on the highest peak power observed during the same period. Consequently, the joint optimization of energy costs (through energy arbitrage) and demand charges (through peak shaving) is crucial for effective cost management in industrial PV-battery load systems. However, this task remains fundamentally challenging due to the volatility of renewable generation and load, the complex temporal dependencies introduced by peak demand charges, and the competing objectives between immediate cost savings and long-term peak reduction, which render existing model-based and data-driven energy management approaches inadequate for real-world applications. To tackle these challenges, this paper formulates the problem as a soft Markov Decision Process (MDP) and proposes a novel Offline Inverse Reinforcement Learning (OIRL) framework based on a dual reward-policy iterative optimization mechanism. Our approach introduces an innovative synthesis of contrastive reward learning (leveraging both expert demonstrations and on-policy trajectory rollouts) with conservative soft Q-learning optimization. This architecture enables accurate reconstruction of implicit reward structures through comparative analysis of expert and agent behaviors, while ensuring stable policy improvement via regularized value function updates with pessimistic value initialization. Extensive experiments using real-world data from our industrial partner in China demonstrate that OIRL achieves substantial energy arbitrage and peak shaving improvement compared to state-of-the-art reinforcement learning baselines in energy management. Furthermore, the framework maintains robust performance across diverse operating conditions, establishing a new paradigm for intelligent control of industrial PV-battery load systems.
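The two-part bill described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function name `monthly_bill`, the hourly interval, the prices, and the demand rate are all hypothetical, and grid import is assumed to be non-negative. The sketch only shows why energy arbitrage and peak shaving have to be considered together: the energy charge sums over every interval, while the demand charge depends on a single maximum over the whole billing period.

```python
from typing import Sequence, Tuple


def monthly_bill(
    grid_import_kw: Sequence[float],
    energy_price_per_kwh: Sequence[float],
    demand_rate_per_kw: float,
    interval_hours: float = 1.0,
) -> Tuple[float, float]:
    """Return (energy_charge, demand_charge) for one billing period.

    Illustrative assumptions (not from the paper): grid import is
    non-negative, and the demand charge is a flat rate applied to the
    single highest interval power in the period.
    """
    if len(grid_import_kw) != len(energy_price_per_kwh):
        raise ValueError("power and price series must have equal length")
    if not grid_import_kw:
        raise ValueError("billing period must contain at least one interval")

    # Energy charge: accumulated consumption, priced interval by interval.
    energy_charge = sum(
        power * price * interval_hours
        for power, price in zip(grid_import_kw, energy_price_per_kwh)
    )
    # Demand charge: driven only by the highest peak power in the period.
    demand_charge = demand_rate_per_kw * max(grid_import_kw)
    return energy_charge, demand_charge


# Hypothetical four-hour "billing period" with a time-of-use price.
prices = [0.5, 0.5, 1.0, 1.0]
baseline = [100, 200, 400, 200]   # no battery action
shifted = [200, 200, 300, 200]    # charge 100 kW early, discharge at the peak

print(monthly_bill(baseline, prices, demand_rate_per_kw=10.0))  # (750.0, 4000.0)
print(monthly_bill(shifted, prices, demand_rate_per_kw=10.0))   # (700.0, 3000.0)
```

In this toy case, moving 100 kWh from the peak hour to a cheap hour lowers both components, but the demand charge dominates the saving. In general the two objectives can conflict, which is the coupling the abstract highlights.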

Suggested Citation

Hu, Yulong & Li, Sen, 2026. "Offline inverse reinforcement learning for joint optimization of energy costs and demand charge in industrial PV-battery load systems," Applied Energy, Elsevier, vol. 408(C).

Handle: RePEc:eee:appene:v:408:y:2026:i:c:s0306261926000681
DOI: 10.1016/j.apenergy.2026.127416
