Analytical Solution to a Discrete-Time Model for Dynamic Learning and Decision Making

Author

Hao Zhang
(Sauder School of Business, University of British Columbia, Vancouver, British Columbia V6T 1Z2, Canada)

Abstract

Problems concerning dynamic learning and decision making are difficult to solve analytically. We study an infinite-horizon discrete-time model with a constant unknown state that may take two possible values. As a special partially observable Markov decision process (POMDP), this model unifies several types of learning-and-doing problems such as sequential hypothesis testing, dynamic pricing with demand learning, and multiarmed bandits. We adopt a relatively new solution framework from the POMDP literature based on the backward construction of the efficient frontier(s) of continuation-value vectors. This framework accommodates different optimality criteria simultaneously. In the infinite-horizon setting, with the aid of a set of signal quality indices, the extreme points on the efficient frontier can be linked through a set of difference equations and solved analytically. The solution carries structural properties analogous to those obtained under continuous-time models, and it provides a useful tool for making new discoveries through discrete-time models.
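The sketch below illustrates the ingredients the abstract names. It is not the paper's own construction. It assumes that the two values of the unknown state are labeled H and L and that a belief is the probability p that the state is H. It shows two things: a Bayes update of p after a signal, and the evaluation of a belief against a finite set of continuation-value vectors after pruning every vector that is weakly dominated in both coordinates. The function names, the two-component vector representation, and the numbers are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: (value if state is H, value if state is L)
Vector = Tuple[float, float]


def bayes_update(p: float, lik_h: float, lik_l: float) -> float:
    """Posterior P(state = H) after a signal with likelihood lik_h under H
    and lik_l under L, starting from prior P(state = H) = p."""
    num = p * lik_h
    den = num + (1.0 - p) * lik_l
    if den == 0.0:
        raise ValueError("signal has zero probability under the current belief")
    return num / den


def efficient_frontier(vectors: Iterable[Vector]) -> List[Vector]:
    """Drop duplicates and any vector weakly dominated in both coordinates
    by a different vector; such vectors can never attain the maximum."""
    unique = set(vectors)
    return sorted(
        v for v in unique
        if not any(w != v and w[0] >= v[0] and w[1] >= v[1] for w in unique)
    )


def value(p: float, vectors: Iterable[Vector]) -> float:
    """Continuation value of belief p: the upper envelope of the linear
    functions p * v_h + (1 - p) * v_l over the given vectors."""
    return max(p * vh + (1.0 - p) * vl for vh, vl in vectors)


if __name__ == "__main__":
    candidates = [(1.0, 0.0), (0.0, 1.0), (0.4, 0.4), (0.6, 0.6), (0.5, 0.5)]
    frontier = efficient_frontier(candidates)
    print(frontier)  # [(0.0, 1.0), (0.6, 0.6), (1.0, 0.0)]
    p = bayes_update(0.5, lik_h=0.8, lik_l=0.2)
    print(round(p, 6), round(value(p, frontier), 6))  # 0.8 0.8
```

Each vector contributes a linear function of p, so the resulting value is a convex, piecewise-linear function of the belief. Pruning dominated vectors leaves that function unchanged, because a vector that is at least as large in both coordinates gives at least as large a value at every p in [0, 1].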

Suggested Citation

Hao Zhang, 2022. "Analytical Solution to a Discrete-Time Model for Dynamic Learning and Decision Making," Management Science, INFORMS, vol. 68(8), pages 5924-5957, August.

Handle: RePEc:inm:ormnsc:v:68:y:2022:i:8:p:5924-5957
DOI: 10.1287/mnsc.2021.4194

References listed on IDEAS

Godfrey Keller & Sven Rady, 1999. "Optimal Experimentation in a Changing Environment," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 66(3), pages 475-507.
- Godfrey Keller & Sven Rady, 1997. "Optimal Experimentation in a Changing Environment," STICERD - Theoretical Economics Paper Series 333, Suntory and Toyota International Centres for Economics and Related Disciplines, LSE.
- Godfrey Keller & Sven Rady, 1998. "Optimal Experimentation in a Changing Environment," Game Theory and Information 9801001, University Library of Munich, Germany.
Philippe Aghion & Patrick Bolton & Christopher Harris & Bruno Jullien, 1991. "Optimal Learning by Experimentation," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 58(4), pages 621-654.
- Aghion, P. & Bolton, P. & Harris, C. & Jullien, B., 1990. "Optimal Learning By Experimentation," DELTA Working Papers 90-10, DELTA (Ecole normale supérieure).
- Aghion Philippe & Bolton, Patrick & Harris Christopher & Jullien Bruno, 1991. "Optimal learning by experimentation," CEPREMAP Working Papers (Couverture Orange) 9104, CEPREMAP.
Turgay Ayer & Oguzhan Alagoz & Natasha K. Stout, 2012. "OR Forum---A POMDP Approach to Personalize Mammography Screening Decisions," Operations Research, INFORMS, vol. 60(5), pages 1019-1034, October.
J. Michael Harrison & Nur Sunar, 2015. "Investment Timing with Incomplete Information and Multiple Means of Learning," Operations Research, INFORMS, vol. 63(2), pages 442-457, April.
Vivek F. Farias & Benjamin Van Roy, 2010. "Dynamic Pricing with a Prior on Market Response," Operations Research, INFORMS, vol. 58(1), pages 16-29, February.
Saed Alizamir & Francis de Véricourt & Peng Sun, 2013. "Diagnostic Accuracy Under Congestion," Management Science, INFORMS, vol. 59(1), pages 157-171, December.
Patrick Bolton & Christopher Harris, 1999. "Strategic Experimentation," Econometrica, Econometric Society, vol. 67(2), pages 349-374, March.
Hao Zhang, 2010. "Partially Observable Markov Decision Processes: A Geometric Technique and Analysis," Operations Research, INFORMS, vol. 58(1), pages 214-228, February.
Rothschild, Michael, 1974. "A two-armed bandit theory of market pricing," Journal of Economic Theory, Elsevier, vol. 9(2), pages 185-202, October.
Giuseppe Moscarini & Lones Smith, 2001. "The Optimal Level of Experimentation," Econometrica, Econometric Society, vol. 69(6), pages 1629-1644, November.
Richard D. Smallwood & Edward J. Sondik, 1973. "The Optimal Control of Partially Observable Markov Processes over a Finite Horizon," Operations Research, INFORMS, vol. 21(5), pages 1071-1088, October.
Diana M. Negoescu & Kostas Bimpikis & Margaret L. Brandeau & Dan A. Iancu, 2018. "Dynamic Learning of Patient Response Types: An Application to Treating Chronic Diseases," Management Science, INFORMS, vol. 64(8), pages 3469-3488, August.
Jingyu Zhang & Brian T. Denton & Hari Balasubramanian & Nilay D. Shah & Brant A. Inman, 2012. "Optimization of Prostate Biopsy Referral Decisions," Manufacturing & Service Operations Management, INFORMS, vol. 14(4), pages 529-547, October.
Felipe Caro & Jérémie Gallien, 2007. "Dynamic Assortment with Demand Learning for Seasonal Consumer Goods," Management Science, INFORMS, vol. 53(2), pages 276-292, February.
Victor F. Araman & René Caldentey, 2009. "Dynamic Pricing for Nonperishable Products with Demand Learning," Operations Research, INFORMS, vol. 57(5), pages 1169-1188, October.
Dimitris Bertsimas & Adam J. Mersereau, 2007. "A Learning Approach for Interactive Marketing to a Customer Segment," Operations Research, INFORMS, vol. 55(6), pages 1120-1135, December.
Edward J. Sondik, 1978. "The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs," Operations Research, INFORMS, vol. 26(2), pages 282-304, April.
McLennan, Andrew, 1984. "Price dispersion and incomplete learning in the long run," Journal of Economic Dynamics and Control, Elsevier, vol. 7(3), pages 331-347, September.
Josef Broder & Paat Rusmevichientong, 2012. "Dynamic Pricing Under a General Parametric Choice Model," Operations Research, INFORMS, vol. 60(4), pages 965-980, August.
Omar Besbes & Assaf Zeevi, 2009. "Dynamic Pricing Without Knowing the Demand Function: Risk Bounds and Near-Optimal Algorithms," Operations Research, INFORMS, vol. 57(6), pages 1407-1420, December.
H. Dharma Kwon & Steven A. Lippman, 2011. "Acquisition of Project-Specific Assets with Bayesian Updating," Operations Research, INFORMS, vol. 59(5), pages 1119-1130, October.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Arnoud V. den Boer & Bert Zwart, 2014. "Simultaneously Learning and Optimizing Using Controlled Variance Pricing," Management Science, INFORMS, vol. 60(3), pages 770-783, March.
Philipp Afèche & Barış Ata, 2013. "Bayesian Dynamic Pricing in Queueing Systems with Unknown Delay Cost Characteristics," Manufacturing & Service Operations Management, INFORMS, vol. 15(2), pages 292-304, May.
J. Michael Harrison & Nur Sunar, 2015. "Investment Timing with Incomplete Information and Multiple Means of Learning," Operations Research, INFORMS, vol. 63(2), pages 442-457, April.
Victor F. Araman & René A. Caldentey, 2022. "Diffusion Approximations for a Class of Sequential Experimentation Problems," Management Science, INFORMS, vol. 68(8), pages 5958-5979, August.
J. Michael Harrison & N. Bora Keskin & Assaf Zeevi, 2012. "Bayesian Dynamic Pricing Policies: Learning and Earning Under a Binary Prior Distribution," Management Science, INFORMS, vol. 58(3), pages 570-586, March.
N. Bora Keskin & Assaf Zeevi, 2017. "Chasing Demand: Learning and Earning in a Changing Environment," Mathematics of Operations Research, INFORMS, vol. 42(2), pages 277-307, May.
Jue Wang, 2021. "Optimal Bayesian Demand Learning over Short Horizons," Production and Operations Management, Production and Operations Management Society, vol. 30(4), pages 1154-1177, April.
Arnoud V. den Boer, 2014. "Dynamic Pricing with Multiple Products and Partially Specified Demand Distribution," Mathematics of Operations Research, INFORMS, vol. 39(3), pages 863-888, August.
Denis Sauré & Assaf Zeevi, 2013. "Optimal Dynamic Assortment Planning with Demand Learning," Manufacturing & Service Operations Management, INFORMS, vol. 15(3), pages 387-404, July.
Xiao, Baichun & Yang, Wei, 2021. "A Bayesian learning model for estimating unknown demand parameter in revenue management," European Journal of Operational Research, Elsevier, vol. 293(1), pages 248-262.
Hamsa Bastani & David Simchi-Levi & Ruihao Zhu, 2022. "Meta Dynamic Pricing: Transfer Learning Across Experiments," Management Science, INFORMS, vol. 68(3), pages 1865-1881, March.
Qi (George) Chen & Stefanus Jasin & Izak Duenyas, 2021. "Technical Note—Joint Learning and Optimization of Multi-Product Pricing with Finite Resource Capacity and Unknown Demand Parameters," Operations Research, INFORMS, vol. 69(2), pages 560-573, March.
Boxiao Chen & Xiuli Chao & Cong Shi, 2021. "Nonparametric Learning Algorithms for Joint Pricing and Inventory Control with Lost Sales and Censored Demand," Mathematics of Operations Research, INFORMS, vol. 46(2), pages 726-756, May.
Athanassios N. Avramidis & Arnoud V. den Boer, 2021. "Dynamic pricing with finite price sets: a non-parametric approach," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 94(1), pages 1-34, August.
Bergemann, Dirk & Valimaki, Juuso, 1996. "Learning and Strategic Pricing," Econometrica, Econometric Society, vol. 64(5), pages 1125-1149, September.
- Dirk Bergemann & Juuso Valimaki, 1996. "Learning and Strategic Pricing," Cowles Foundation Discussion Papers 1113, Cowles Foundation for Research in Economics, Yale University.
Heski Bar-Isaac, 2001. "Self-Confidence and Survival," FMG Discussion Papers dp395, Financial Markets Group.
- Heski Bar-Isaac, 2001. "Self-Confidence and Survival," STICERD - Theoretical Economics Paper Series 428, Suntory and Toyota International Centres for Economics and Related Disciplines, LSE.
- Bar-Isaac, Heski, 2001. "Self-confidence and survival," LSE Research Online Documents on Economics 19329, London School of Economics and Political Science, LSE Library.
Huashuai Qu & Ilya O. Ryzhov & Michael C. Fu & Eric Bergerson & Megan Kurka & Ludek Kopacek, 2020. "Learning Demand Curves in B2B Pricing: A New Framework and Case Study," Production and Operations Management, Production and Operations Management Society, vol. 29(5), pages 1287-1306, May.
N. Bora Keskin & John R. Birge, 2019. "Dynamic Selling Mechanisms for Product Differentiation and Learning," Operations Research, INFORMS, vol. 67(4), pages 1069-1089, July.
Athanassios N. Avramidis, 2020. "A pricing problem with unknown arrival rate and price sensitivity," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 92(1), pages 77-106, August.
Sebastian Sund & Lars H. Sendstad & Jacco J. J. Thijssen, 2022. "Kalman filter approach to real options with active learning," Computational Management Science, Springer, vol. 19(3), pages 457-490, July.

Keywords

learning and doing; sequential hypothesis testing; dynamic pricing with demand learning; multiarmed bandits; partially observable Markov decision processes
