IDEAS home Printed from https://ideas.repec.org/a/bla/popmgt/v31y2022i9p3491-3504.html
   My bibliography  Save this article

Sublinear regret for learning POMDPs

Author

Listed:
  • Yi Xiong
  • Ningyuan Chen
  • Xuefeng Gao
  • Xiang Zhou

Abstract

We study the model‐based undiscounted reinforcement learning for partially observable Markov decision processes (POMDPs). The oracle we consider is the optimal policy of the POMDP with a known environment in terms of the average reward over an infinite horizon. We propose a learning algorithm for this problem, building on spectral method‐of‐moments estimations for hidden Markov models, the belief error control in POMDPs and upper confidence bound methods for online learning. We establish a regret bound of O(T2/3logT)$O(T^{2/3}\sqrt {\log T})$ for the proposed learning algorithm where T is the learning horizon. This is, to the best of our knowledge, the first algorithm achieving sublinear regret with respect to our oracle for learning general POMDPs.

Suggested Citation

  • Yi Xiong & Ningyuan Chen & Xuefeng Gao & Xiang Zhou, 2022. "Sublinear regret for learning POMDPs," Production and Operations Management, Production and Operations Management Society, vol. 31(9), pages 3491-3504, September.
  • Handle: RePEc:bla:popmgt:v:31:y:2022:i:9:p:3491-3504
    DOI: 10.1111/poms.13778
    as

    Download full text from publisher

    File URL: https://doi.org/10.1111/poms.13778
    Download Restriction: no

    File URL: https://libkey.io/10.1111/poms.13778?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    References listed on IDEAS

    as
    1. Weidong Chen & Cong Shi & Izak Duenyas, 2020. "Optimal Learning Algorithms for Stochastic Inventory Systems with Random Capacities," Production and Operations Management, Production and Operations Management Society, vol. 29(7), pages 1624-1649, July.
    2. Naci Saldi & Serdar Yüksel & Tamás Linder, 2017. "On the Asymptotic Optimality of Finite Approximations to Markov Decision Processes with Borel Spaces," Mathematics of Operations Research, INFORMS, vol. 42(4), pages 945-978, November.
    3. Boxiao Chen & Xiuli Chao & Hyun-Soo Ahn, 2019. "Coordinating Pricing and Inventory Replenishment with Nonparametric Demand Learning," Operations Research, INFORMS, vol. 67(4), pages 1035-1052, July.
    4. Paat Rusmevichientong & John N. Tsitsiklis, 2010. "Linearly Parameterized Bandits," Mathematics of Operations Research, INFORMS, vol. 35(2), pages 395-411, May.
    5. K. Hinderer, 2005. "Lipschitz Continuity of Value Functions in Markovian Decision Processes," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 62(1), pages 3-22, September.
    6. Matthew Stephens, 2000. "Dealing with label switching in mixture models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 62(4), pages 795-809.
    7. Mila Nambiar & David Simchi‐Levi & He Wang, 2021. "Dynamic Inventory Allocation with Demand Learning for Seasonal Goods," Production and Operations Management, Production and Operations Management Society, vol. 30(3), pages 750-765, March.
    8. Huizhen Yu & Dimitri P. Bertsekas, 2008. "On Near Optimality of the Set of Finite-State Controllers for Average Cost POMDP," Mathematics of Operations Research, INFORMS, vol. 33(1), pages 1-11, February.
    9. N. Bora Keskin & Assaf Zeevi, 2017. "Chasing Demand: Learning and Earning in a Changing Environment," Mathematics of Operations Research, INFORMS, vol. 42(2), pages 277-307, May.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Qi Feng & J. George Shanthikumar, 2022. "Developing operations management data analytics," Production and Operations Management, Production and Operations Management Society, vol. 31(12), pages 4544-4557, December.
    2. Gel, Esma S. & Salman, F. Sibel, 2022. "Dynamic ordering decisions with approximate learning of supply yield uncertainty," International Journal of Production Economics, Elsevier, vol. 243(C).
    3. Lin An & Andrew A. Li & Benjamin Moseley & R. Ravi, 2023. "The Nonstationary Newsvendor with (and without) Predictions," Papers 2305.07993, arXiv.org, revised Oct 2023.
    4. Zikun Ye & Dennis J. Zhang & Heng Zhang & Renyu Zhang & Xin Chen & Zhiwei Xu, 2023. "Cold Start to Improve Market Thickness on Online Advertising Platforms: Data-Driven Algorithms and Field Experiments," Management Science, INFORMS, vol. 69(7), pages 3838-3860, July.
    5. N. Bora Keskin & Yuexing Li & Jing-Sheng Song, 2022. "Data-Driven Dynamic Pricing and Ordering with Perishable Inventory in a Changing Environment," Management Science, INFORMS, vol. 68(3), pages 1938-1958, March.
    6. Yining Wang & Xi Chen & Xiangyu Chang & Dongdong Ge, 2021. "Uncertainty Quantification for Demand Prediction in Contextual Dynamic Pricing," Production and Operations Management, Production and Operations Management Society, vol. 30(6), pages 1703-1717, June.
    7. Xiangyu Gao & Huanan Zhang, 2022. "An efficient learning framework for multiproduct inventory systems with customer choices," Production and Operations Management, Production and Operations Management Society, vol. 31(6), pages 2492-2516, June.
    8. Hao Yuan & Qi Luo & Cong Shi, 2021. "Marrying Stochastic Gradient Descent with Bandits: Learning Algorithms for Inventory Systems with Fixed Costs," Management Science, INFORMS, vol. 67(10), pages 6089-6115, October.
    9. Wan-Lun Wang, 2019. "Mixture of multivariate t nonlinear mixed models for multiple longitudinal data with heterogeneity and missing values," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(1), pages 196-222, March.
    10. Mark S. Handcock & Adrian E. Raftery & Jeremy M. Tantrum, 2007. "Model‐based clustering for social networks," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 170(2), pages 301-354, March.
    11. Arman Oganisian & Nandita Mitra & Jason A. Roy, 2021. "A Bayesian nonparametric model for zero‐inflated outcomes: Prediction, clustering, and causal estimation," Biometrics, The International Biometric Society, vol. 77(1), pages 125-135, March.
    12. Xiao, Baichun & Yang, Wei, 2021. "A Bayesian learning model for estimating unknown demand parameter in revenue management," European Journal of Operational Research, Elsevier, vol. 293(1), pages 248-262.
    13. Yao, Weixin & Wei, Yan & Yu, Chun, 2014. "Robust mixture regression using the t-distribution," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 116-127.
    14. Rufo, M.J. & Pérez, C.J. & Martín, J., 2009. "Local parametric sensitivity for mixture models of lifetime distributions," Reliability Engineering and System Safety, Elsevier, vol. 94(7), pages 1238-1244.
    15. Jeong Eun Lee & Christian Robert, 2013. "Imortance Sampling Schemes for Evidence Approximation in Mixture Models," Working Papers 2013-42, Center for Research in Economics and Statistics.
    16. David Simchi-Levi & Rui Sun & Huanan Zhang, 2022. "Online Learning and Optimization for Revenue Management Problems with Add-on Discounts," Management Science, INFORMS, vol. 68(10), pages 7402-7421, October.
    17. Aßmann, Christian & Boysen-Hogrefe, Jens & Pape, Markus, 2012. "The directional identification problem in Bayesian factor analysis: An ex-post approach," Kiel Working Papers 1799, Kiel Institute for the World Economy (IfW Kiel).
    18. Hamsa Bastani & David Simchi-Levi & Ruihao Zhu, 2022. "Meta Dynamic Pricing: Transfer Learning Across Experiments," Management Science, INFORMS, vol. 68(3), pages 1865-1881, March.
    19. Jayakumar Subramanian & Amit Sinha & Aditya Mahajan, 2023. "Robustness and Sample Complexity of Model-Based MARL for General-Sum Markov Games," Dynamic Games and Applications, Springer, vol. 13(1), pages 56-88, March.
    20. Thomas Loots & Arnoud V. den Boer, 2023. "Data‐driven collusion and competition in a pricing duopoly with multinomial logit demand," Production and Operations Management, Production and Operations Management Society, vol. 32(4), pages 1169-1186, April.

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:bla:popmgt:v:31:y:2022:i:9:p:3491-3504. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Wiley Content Delivery (email available below). General contact details of provider: http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1937-5956 .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.