Authors
Listed:
- Amine Bennouna
(Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139)
- Dessislava Pachamanova
(Mathematics, Analytics, Science and Technology Division, Babson College, Wellesley, Massachusetts 02457)
- Georgia Perakis
(Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139)
- Omar Skali Lami
(Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139)
Abstract
This paper proposes a framework for learning the most concise Markov decision process (MDP) model of a continuous state-space dynamic system from observed transition data. This setting is encountered in numerous important applications, such as patient treatment, online advertising, recommender systems, and estimation of treatment effects in econometrics. Most existing methods in offline reinforcement learning construct functional approximations of the value function or of the transition and reward functions, requiring complex and often uninterpretable function approximators. Our approach instead relies on partitioning the system’s observation space into regions constituting the states of a finite MDP that represents the system. We discuss the theoretically minimal MDP representation that preserves the values and, therefore, the optimal policy of the dynamic system—in a sense, the optimal discretization. We formally define the problem of learning such a concise representation from transition data without exploration. Learning such a representation allows for enhanced tractability and, importantly, provides interpretability. To solve this problem, we introduce an in-sample property on partitions of the observation space that we name coherence, and we show that if the class of possible partitions has finite Vapnik-Chervonenkis dimension, any partition coherent with the transition data converges to the minimal representation of the system, with provable finite-sample probably approximately correct convergence guarantees. This insight motivates our minimal representation learning algorithm, which constructs from transition data an MDP representation that approximates the minimal representation of the system. We illustrate the effectiveness of the proposed framework through numerical experiments in both deterministic and stochastic environments as well as with real data.
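The core construction the abstract describes—mapping continuous observations to the states of a finite MDP via a partition and estimating transitions and rewards from data—can be sketched as follows. This is an illustrative sketch only, not the paper's algorithm: the function names and the simple two-region partition of the unit interval are hypothetical.

```python
import numpy as np

def build_finite_mdp(transitions, partition, n_states, n_actions):
    """Aggregate continuous transitions (s, a, r, s') into a finite MDP.

    `partition` maps a continuous observation to a discrete state index.
    Returns empirical transition probabilities P[s, a, s'] and mean
    rewards R[s, a] estimated from the observed data.
    """
    counts = np.zeros((n_states, n_actions, n_states))
    reward_sum = np.zeros((n_states, n_actions))
    for s, a, r, s_next in transitions:
        i, j = partition(s), partition(s_next)
        counts[i, a, j] += 1
        reward_sum[i, a] += r
    visits = counts.sum(axis=2)            # times (state, action) was observed
    denom = np.maximum(visits, 1)          # avoid division by zero for unvisited pairs
    P = counts / denom[:, :, None]         # empirical transition kernel
    R = reward_sum / denom                 # empirical mean reward
    return P, R

# Hypothetical partition: split the unit-interval observation space in two.
partition = lambda x: 0 if x < 0.5 else 1

# Toy transition data: tuples (observation, action, reward, next observation).
data = [(0.1, 0, 1.0, 0.7), (0.2, 0, 1.0, 0.8), (0.9, 1, 0.0, 0.3)]
P, R = build_finite_mdp(data, partition, n_states=2, n_actions=2)
```

In the paper's terms, the quality of such an aggregation depends on the chosen partition; the coherence property characterizes partitions under which this finite MDP converges to the minimal representation.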
Suggested Citation
Amine Bennouna & Dessislava Pachamanova & Georgia Perakis & Omar Skali Lami, 2025.
"Learning the Minimal Representation of a Continuous State-Space Markov Decision Process from Transition Data,"
Management Science, INFORMS, vol. 71(6), pages 5162-5184, June.
Handle:
RePEc:inm:ormnsc:v:71:y:2025:i:6:p:5162-5184
DOI: 10.1287/mnsc.2022.01652