
Markov Decision Problems Where Means Bound Variances

Author

Listed:
  • Alessandro Arlotto

    (The Fuqua School of Business, Duke University, Durham, North Carolina, 27708)

  • Noah Gans

    (Operations and Information Management Department, The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania, 19104)

  • J. Michael Steele

    (Statistics Department, The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania, 19104)

Abstract

We identify a rich class of finite-horizon Markov decision problems (MDPs) for which the variance of the optimal total reward can be bounded by a simple linear function of its expected value. The class is characterized by three natural properties: reward nonnegativity and boundedness, existence of a do-nothing action, and optimal action monotonicity. These properties are commonly present and typically easy to check. Implications of the class properties and of the variance bound are illustrated by examples of MDPs from operations research, operations management, financial engineering, and combinatorial optimization.
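To make the "simple linear function" concrete, the following is a minimal sketch of the form such a bound takes, assuming one-period rewards lie in a bounded interval [0, K] (the boundedness property named above); the notation R_n(pi*) for the optimal total reward over an n-period horizon is illustrative and not taken from the abstract:

\[
\operatorname{Var}\!\left[ R_n(\pi^*) \right] \;\le\; K \, \mathbb{E}\!\left[ R_n(\pi^*) \right].
\]

Under a bound of this form, the coefficient of variation of the optimal total reward, \(\sqrt{\operatorname{Var}[R_n(\pi^*)]}/\mathbb{E}[R_n(\pi^*)] \le \sqrt{K/\mathbb{E}[R_n(\pi^*)]}\), vanishes as the mean grows, which is one sense in which means bound variances.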

Suggested Citation

  • Alessandro Arlotto & Noah Gans & J. Michael Steele, 2014. "Markov Decision Problems Where Means Bound Variances," Operations Research, INFORMS, vol. 62(4), pages 864-875, August.
  • Handle: RePEc:inm:oropre:v:62:y:2014:i:4:p:864-875
    DOI: 10.1287/opre.2014.1281

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/opre.2014.1281
    Download Restriction: no


    References listed on IDEAS

    1. Bruss, F. Thomas & Delbaen, Freddy, 2004. "A central limit theorem for the optimal selection process for monotone subsequences of maximum expected length," Stochastic Processes and their Applications, Elsevier, vol. 114(2), pages 287-311, December.
    2. Ying Huang & L. C. M. Kallenberg, 1994. "On Finding Optimal Policies for Markov Decision Chains: A Unifying Framework for Mean-Variance-Tradeoffs," Mathematics of Operations Research, INFORMS, vol. 19(2), pages 434-448, May.
    3. Carri W. Chan & Vivek F. Farias, 2009. "Stochastic Depletion Problems: Effective Myopic Policies for a Class of Dynamic Optimization Problems," Mathematics of Operations Research, INFORMS, vol. 34(2), pages 333-350, May.
    4. Mannor, Shie & Tsitsiklis, John N., 2013. "Algorithmic aspects of mean–variance optimization in Markov decision processes," European Journal of Operational Research, Elsevier, vol. 231(3), pages 645-653.
    5. Kun-Jen Chung, 1994. "Mean-Variance Tradeoffs in an Undiscounted MDP: The Unichain Case," Operations Research, INFORMS, vol. 42(1), pages 184-188, February.
    6. Kawai, Hajime, 1987. "A variance minimization problem for a Markov decision process," European Journal of Operational Research, Elsevier, vol. 31(1), pages 140-145, July.
    7. David B. Brown & James E. Smith & Peng Sun, 2010. "Information Relaxations and Duality in Stochastic Dynamic Programs," Operations Research, INFORMS, vol. 58(4-part-1), pages 785-801, August.
    8. Melike Baykal-Gürsoy & Keith W. Ross, 1992. "Variability Sensitive Markov Decision Processes," Mathematics of Operations Research, INFORMS, vol. 17(3), pages 558-571, August.
    9. Kalyan Talluri & Garrett van Ryzin, 1998. "An Analysis of Bid-Price Controls for Network Revenue Management," Management Science, INFORMS, vol. 44(11-Part-1), pages 1577-1593, November.
    10. Matthew J. Sobel, 1994. "Mean-Variance Tradeoffs in an Undiscounted MDP," Operations Research, INFORMS, vol. 42(1), pages 175-183, February.
    11. Bruss, F. Thomas & Delbaen, Freddy, 2001. "Optimal rules for the sequential selection of monotone subsequences of maximum expected length," Stochastic Processes and their Applications, Elsevier, vol. 96(2), pages 313-342, December.
    12. James, Barry & James, Kang & Qi, Yongcheng, 2008. "Limit theorems for correlated Bernoulli random variables," Statistics & Probability Letters, Elsevier, vol. 78(15), pages 2339-2345, October.
    13. C. Barz & K. Waldmann, 2007. "Risk-sensitive capacity control in revenue management," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 65(3), pages 565-579, June.
    14. Jerzy A. Filar & L. C. M. Kallenberg & Huey-Miin Lee, 1989. "Variance-Penalized Markov Decision Processes," Mathematics of Operations Research, INFORMS, vol. 14(1), pages 147-161, February.
    15. Gregory P. Prastacos, 1983. "Optimal Sequential Investment Decisions Under Conditions of Uncertainty," Management Science, INFORMS, vol. 29(1), pages 118-134, January.
    16. Jason D. Papastavrou & Srikanth Rajagopalan & Anton J. Kleywegt, 1996. "The Dynamic and Stochastic Knapsack Problem with Deadlines," Management Science, INFORMS, vol. 42(12), pages 1706-1718, December.
    17. C. Derman & G. J. Lieberman & S. M. Ross, 1975. "A Stochastic Sequential Allocation Model," Operations Research, INFORMS, vol. 23(6), pages 1120-1130, December.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Alessandro Arlotto & J. Michael Steele, 2018. "A Central Limit Theorem for Costs in Bulinskaya’s Inventory Management Problem When Deliveries Face Delays," Methodology and Computing in Applied Probability, Springer, vol. 20(3), pages 839-854, September.
    2. Ilya O. Ryzhov & Martijn R. K. Mes & Warren B. Powell & Gerald van den Berg, 2019. "Bayesian Exploration for Approximate Dynamic Programming," Operations Research, INFORMS, vol. 67(1), pages 198-214, January.
    3. Arlotto, Alessandro & Nguyen, Vinh V. & Steele, J. Michael, 2015. "Optimal online selection of a monotone subsequence: a central limit theorem," Stochastic Processes and their Applications, Elsevier, vol. 125(9), pages 3596-3622.
    4. Alessandro Arlotto & J. Michael Steele, 2016. "A Central Limit Theorem for Temporally Nonhomogenous Markov Chains with Applications to Dynamic Programming," Mathematics of Operations Research, INFORMS, vol. 41(4), pages 1448-1468, November.
    5. Jingnan Fan & Andrzej Ruszczynski, 2014. "Process-Based Risk Measures and Risk-Averse Control of Discrete-Time Systems," Papers 1411.2675, arXiv.org, revised Nov 2016.
    6. Jingnan Fan & Andrzej Ruszczyński, 2018. "Risk measurement and risk-averse control of partially observable discrete-time Markov systems," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 88(2), pages 161-184, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Li Xia, 2020. "Risk‐Sensitive Markov Decision Processes with Combined Metrics of Mean and Variance," Production and Operations Management, Production and Operations Management Society, vol. 29(12), pages 2808-2827, December.
    2. Karel Sladký, 2005. "On mean reward variance in semi-Markov processes," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 62(3), pages 387-397, December.
    3. Santiago R. Balseiro & David B. Brown, 2019. "Approximations to Stochastic Dynamic Programs via Information Relaxation Duality," Operations Research, INFORMS, vol. 67(2), pages 577-597, March.
    4. Ma, Shuai & Ma, Xiaoteng & Xia, Li, 2023. "A unified algorithm framework for mean-variance optimization in discounted Markov decision processes," European Journal of Operational Research, Elsevier, vol. 311(3), pages 1057-1067.
    5. Dmitry Krass & O. J. Vrieze, 2002. "Achieving Target State-Action Frequencies in Multichain Average-Reward Markov Decision Processes," Mathematics of Operations Research, INFORMS, vol. 27(3), pages 545-566, August.
    6. Karel Sladký, 2013. "Risk-Sensitive and Mean Variance Optimality in Markov Decision Processes," Czech Economic Review, Charles University Prague, Faculty of Social Sciences, Institute of Economic Studies, vol. 7(3), pages 146-161, November.
    7. Xuhan Tian & Junmin (Jim) Shi & Xiangtong Qi, 2022. "Stochastic Sequential Allocations for Creative Crowdsourcing," Production and Operations Management, Production and Operations Management Society, vol. 31(2), pages 697-714, February.
    8. Alexander G. Nikolaev & Sheldon H. Jacobson, 2010. "Technical Note: Stochastic Sequential Decision-Making with a Random Number of Jobs," Operations Research, INFORMS, vol. 58(4-part-1), pages 1023-1027, August.
    9. Yuanzheng Ma & Tong Wang & Huan Zheng, 2023. "On fairness and efficiency in nonprofit operations: Dynamic resource allocations," Production and Operations Management, Production and Operations Management Society, vol. 32(6), pages 1778-1792, June.
    10. Yuhang Ma & Paat Rusmevichientong & Mika Sumida & Huseyin Topaloglu, 2020. "An Approximation Algorithm for Network Revenue Management Under Nonstationary Arrivals," Operations Research, INFORMS, vol. 68(3), pages 834-855, May.
    11. Anton J. Kleywegt & Jason D. Papastavrou, 2001. "The Dynamic and Stochastic Knapsack Problem with Random Sized Items," Operations Research, INFORMS, vol. 49(1), pages 26-41, February.
    12. Jingnan Fan & Andrzej Ruszczyński, 2018. "Risk measurement and risk-averse control of partially observable discrete-time Markov systems," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 88(2), pages 161-184, October.
    13. Gnedin, Alexander & Seksenbayev, Amirlan, 2021. "Diffusion approximations in the online increasing subsequence problem," Stochastic Processes and their Applications, Elsevier, vol. 139(C), pages 298-320.
    14. Pak, K. & Dekker, R., 2004. "Cargo Revenue Management: Bid-Prices for a 0-1 Multi Knapsack Problem," ERIM Report Series Research in Management ERS-2004-055-LIS, Erasmus Research Institute of Management (ERIM), ERIM is the joint research institute of the Rotterdam School of Management, Erasmus University and the Erasmus School of Economics (ESE) at Erasmus University Rotterdam.
    15. Grace Y. Lin & Yingdong Lu & David D. Yao, 2008. "The Stochastic Knapsack Revisited: Switch-Over Policies and Dynamic Pricing," Operations Research, INFORMS, vol. 56(4), pages 945-957, August.
    16. Jingnan Fan & Andrzej Ruszczynski, 2014. "Process-Based Risk Measures and Risk-Averse Control of Discrete-Time Systems," Papers 1411.2675, arXiv.org, revised Nov 2016.
    17. Michael Jong Kim, 2016. "Robust Control of Partially Observable Failing Systems," Operations Research, INFORMS, vol. 64(4), pages 999-1014, August.
    18. Chiang, David Ming-Huang & Wu, Andy Wei-Di, 2011. "Discrete-order admission ATP model with joint effect of margin and order size in a MTO environment," International Journal of Production Economics, Elsevier, vol. 133(2), pages 761-775, October.
    19. Arlotto, Alessandro & Nguyen, Vinh V. & Steele, J. Michael, 2015. "Optimal online selection of a monotone subsequence: a central limit theorem," Stochastic Processes and their Applications, Elsevier, vol. 125(9), pages 3596-3622.
    20. Huanan Zhang & Cong Shi & Chao Qin & Cheng Hua, 2016. "Stochastic regret minimization for revenue management problems with nonstationary demands," Naval Research Logistics (NRL), John Wiley & Sons, vol. 63(6), pages 433-448, September.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.