Authors
Listed:
- V. Joshi Kumar
- Vinodh Kumar Elumalai
Abstract
This paper puts forward a novel deep reinforcement learning control framework to realise continuous action control for a partially observable system. One of the central problems in continuous action control is finding an optimal policy that lets the agent achieve the control goals without violating the constraints. Although reinforcement learning (RL) techniques are widely applied to address the optimisation problem in continuous action spaces, the critical limitation of existing methods is that they rely only on a one-step state transition and fail to capitalise on the information available in the sequence of previous states. Consequently, learning an optimal policy for a continuous action space through current techniques may not be effective. Hence, this study attempts to solve the optimisation problem by integrating a convolutional neural network into a deep reinforcement learning (DRL) framework and realising an optimal policy through an inverse n-step temporal difference learning method. Moreover, we formulate a novel convolutional deep deterministic policy gradient (CDDPG) algorithm and present its convergence analysis through the Bellman contraction operator. One of the key benefits of the proposed approach is that it improves the performance of the RL agent by not only utilising information from a one-step transition but also extracting the hidden information from previous state sequences. The efficacy of the proposed scheme is experimentally validated on a rotary flexible link (RFL) system for tracking control and vibration suppression problems. The experimental validation highlights that CDDPG can offer better tracking and vibration suppression than both the conventional DDPG and the state-of-the-art proximal policy optimisation (PPO) techniques.
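The abstract describes two ingredients of the proposed CDDPG: feeding a sequence of previous states (rather than a single state) to the critic/actor networks, and using n-step temporal difference targets instead of one-step bootstrapping. The paper itself is paywalled, so the following is only a minimal NumPy sketch of those two generic ideas under stated assumptions, not the authors' implementation; the names `StateHistory` and `n_step_td_target` are hypothetical, and the sketch shows a standard n-step TD target rather than the paper's "inverse" n-step variant, whose details are not given in the abstract.

```python
import numpy as np
from collections import deque


class StateHistory:
    """Rolling window of the last `length` states, stacked as a 2-D array
    suitable as input to a 1-D convolutional network (channels = state dims)."""

    def __init__(self, length, state_dim):
        # Pad with zeros so the window is full from the first step.
        self.buf = deque([np.zeros(state_dim)] * length, maxlen=length)

    def push(self, state):
        self.buf.append(np.asarray(state, dtype=float))

    def as_input(self):
        # Shape (length, state_dim): one row per past time step.
        return np.stack(self.buf)


def n_step_td_target(rewards, bootstrap_q, gamma=0.99):
    """n-step TD target for a critic update:
    sum_{k=0}^{n-1} gamma^k * r_{t+k}  +  gamma^n * Q(s_{t+n}, mu(s_{t+n})),
    where `bootstrap_q` is the target critic's value at the n-th next state."""
    target = bootstrap_q
    for r in reversed(rewards):  # fold the discounted sum from the back
        target = r + gamma * target
    return target
```

In a DDPG-style loop, `StateHistory.as_input()` would replace the raw state as the network input, and `n_step_td_target` would replace the usual one-step target `r + gamma * Q'` in the critic loss; everything else (replay buffer, target networks, deterministic policy gradient) stays as in standard DDPG.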
Suggested Citation
V. Joshi Kumar & Vinodh Kumar Elumalai, 2025.
"A deep reinforcement learning control framework for a partially observable system: experimental validation on a rotary flexible link system,"
International Journal of Systems Science, Taylor & Francis Journals, vol. 56(14), pages 3332-3356, October.
Handle:
RePEc:taf:tsysxx:v:56:y:2025:i:14:p:3332-3356
DOI: 10.1080/00207721.2025.2468870
Download full text from publisher
As access to this document is restricted, you may want to look for a different version of it.
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:taf:tsysxx:v:56:y:2025:i:14:p:3332-3356. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.
We have no bibliographic references for this item. You can help add them by using this form.
If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Longhurst (email available below). General contact details of provider: http://www.tandfonline.com/TSYS20 .
Please note that corrections may take a couple of weeks to filter through the various RePEc services.