Printed from https://ideas.repec.org/a/inm/ormnsc/v68y2022i8p5924-5957.html

Analytical Solution to a Discrete-Time Model for Dynamic Learning and Decision Making

Author

Listed:
  • Hao Zhang

    (Sauder School of Business, University of British Columbia, Vancouver, British Columbia V6T 1Z2, Canada)

Abstract

Problems concerning dynamic learning and decision making are difficult to solve analytically. We study an infinite-horizon discrete-time model with a constant unknown state that may take two possible values. As a special partially observable Markov decision process (POMDP), this model unifies several types of learning-and-doing problems such as sequential hypothesis testing, dynamic pricing with demand learning, and multiarmed bandits. We adopt a relatively new solution framework from the POMDP literature based on the backward construction of the efficient frontier(s) of continuation-value vectors. This framework accommodates different optimality criteria simultaneously. In the infinite-horizon setting, with the aid of a set of signal quality indices, the extreme points on the efficient frontier can be linked through a set of difference equations and solved analytically. The solution carries structural properties analogous to those obtained under continuous-time models, and it provides a useful tool for making new discoveries through discrete-time models.
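To make the learning component concrete, the following is a minimal sketch of Bayesian belief updating when the unknown state is constant and takes one of two values. It is not the paper's solution method (the efficient-frontier construction and signal quality indices are not reproduced here). The state labels A and B, the Bernoulli signal model, and the names `update_belief`, `belief_path`, `q_a`, and `q_b` are all assumptions made for illustration.

```python
from typing import List, Sequence


def update_belief(prior: float, lik_a: float, lik_b: float) -> float:
    """Posterior probability of state A after one signal (Bayes' rule).

    prior : current probability that the state is A
    lik_a : probability of the observed signal if the state is A
    lik_b : probability of the observed signal if the state is B
    """
    numerator = prior * lik_a
    denominator = numerator + (1.0 - prior) * lik_b
    if denominator == 0.0:
        raise ValueError("signal has zero probability under both states")
    return numerator / denominator


def belief_path(prior: float, signals: Sequence[int],
                q_a: float, q_b: float) -> List[float]:
    """Beliefs after each 0/1 signal (assumed Bernoulli signal model).

    q_a, q_b : hypothetical success probabilities under states A and B
    """
    path = [prior]
    for s in signals:
        lik_a = q_a if s == 1 else 1.0 - q_a
        lik_b = q_b if s == 1 else 1.0 - q_b
        path.append(update_belief(path[-1], lik_a, lik_b))
    return path


if __name__ == "__main__":
    # Two successes followed by one failure, starting from an even prior.
    for t, p in enumerate(belief_path(0.5, [1, 1, 0], q_a=0.8, q_b=0.4)):
        print(f"period {t}: P(state = A) = {p:.4f}")
```

Because the state never changes, the belief is the only quantity carried from one period to the next. In a POMDP formulation of this kind, it plays the role of the information state on which a decision rule would be defined.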

Suggested Citation

  • Hao Zhang, 2022. "Analytical Solution to a Discrete-Time Model for Dynamic Learning and Decision Making," Management Science, INFORMS, vol. 68(8), pages 5924-5957, August.
  • Handle: RePEc:inm:ormnsc:v:68:y:2022:i:8:p:5924-5957
    DOI: 10.1287/mnsc.2021.4194

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/mnsc.2021.4194
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Arnoud V. den Boer & Bert Zwart, 2014. "Simultaneously Learning and Optimizing Using Controlled Variance Pricing," Management Science, INFORMS, vol. 60(3), pages 770-783, March.
    2. Philipp Afèche & Barış Ata, 2013. "Bayesian Dynamic Pricing in Queueing Systems with Unknown Delay Cost Characteristics," Manufacturing & Service Operations Management, INFORMS, vol. 15(2), pages 292-304, May.
    3. J. Michael Harrison & Nur Sunar, 2015. "Investment Timing with Incomplete Information and Multiple Means of Learning," Operations Research, INFORMS, vol. 63(2), pages 442-457, April.
    4. Victor F. Araman & René A. Caldentey, 2022. "Diffusion Approximations for a Class of Sequential Experimentation Problems," Management Science, INFORMS, vol. 68(8), pages 5958-5979, August.
    5. J. Michael Harrison & N. Bora Keskin & Assaf Zeevi, 2012. "Bayesian Dynamic Pricing Policies: Learning and Earning Under a Binary Prior Distribution," Management Science, INFORMS, vol. 58(3), pages 570-586, March.
    6. N. Bora Keskin & Assaf Zeevi, 2017. "Chasing Demand: Learning and Earning in a Changing Environment," Mathematics of Operations Research, INFORMS, vol. 42(2), pages 277-307, May.
    7. Jue Wang, 2021. "Optimal Bayesian Demand Learning over Short Horizons," Production and Operations Management, Production and Operations Management Society, vol. 30(4), pages 1154-1177, April.
    8. Arnoud V. den Boer, 2014. "Dynamic Pricing with Multiple Products and Partially Specified Demand Distribution," Mathematics of Operations Research, INFORMS, vol. 39(3), pages 863-888, August.
    9. Denis Sauré & Assaf Zeevi, 2013. "Optimal Dynamic Assortment Planning with Demand Learning," Manufacturing & Service Operations Management, INFORMS, vol. 15(3), pages 387-404, July.
    10. Xiao, Baichun & Yang, Wei, 2021. "A Bayesian learning model for estimating unknown demand parameter in revenue management," European Journal of Operational Research, Elsevier, vol. 293(1), pages 248-262.
    11. Hamsa Bastani & David Simchi-Levi & Ruihao Zhu, 2022. "Meta Dynamic Pricing: Transfer Learning Across Experiments," Management Science, INFORMS, vol. 68(3), pages 1865-1881, March.
    12. Qi (George) Chen & Stefanus Jasin & Izak Duenyas, 2021. "Technical Note—Joint Learning and Optimization of Multi-Product Pricing with Finite Resource Capacity and Unknown Demand Parameters," Operations Research, INFORMS, vol. 69(2), pages 560-573, March.
    13. Boxiao Chen & Xiuli Chao & Cong Shi, 2021. "Nonparametric Learning Algorithms for Joint Pricing and Inventory Control with Lost Sales and Censored Demand," Mathematics of Operations Research, INFORMS, vol. 46(2), pages 726-756, May.
    14. Athanassios N. Avramidis & Arnoud V. Boer, 2021. "Dynamic pricing with finite price sets: a non-parametric approach," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 94(1), pages 1-34, August.
    15. Bergemann, Dirk & Valimaki, Juuso, 1996. "Learning and Strategic Pricing," Econometrica, Econometric Society, vol. 64(5), pages 1125-1149, September.
    16. Heski Bar-Isaac, 2001. "Self-Confidence and Survival," FMG Discussion Papers dp395, Financial Markets Group.
    17. Huashuai Qu & Ilya O. Ryzhov & Michael C. Fu & Eric Bergerson & Megan Kurka & Ludek Kopacek, 2020. "Learning Demand Curves in B2B Pricing: A New Framework and Case Study," Production and Operations Management, Production and Operations Management Society, vol. 29(5), pages 1287-1306, May.
    18. N. Bora Keskin & John R. Birge, 2019. "Dynamic Selling Mechanisms for Product Differentiation and Learning," Operations Research, INFORMS, vol. 67(4), pages 1069-1089, July.
    19. Athanassios N. Avramidis, 2020. "A pricing problem with unknown arrival rate and price sensitivity," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 92(1), pages 77-106, August.
    20. Sebastian Sund & Lars H. Sendstad & Jacco J. J. Thijssen, 2022. "Kalman filter approach to real options with active learning," Computational Management Science, Springer, vol. 19(3), pages 457-490, July.
