IDEAS home Printed from https://ideas.repec.org/a/inm/ormoor/v45y2020i3p966-992.html
   My bibliography  Save this article

Variance Regularization in Sequential Bayesian Optimization

Author

Listed:
  • Michael Jong Kim

    (Sauder School of Business, University of British Columbia, Vancouver, British Columbia V6T 1Z2, Canada)

Abstract

Sequential Bayesian optimization constitutes an important and broad class of problems where model parameters are not known a priori but need to be learned over time using Bayesian updating. It is known that the solution to these problems can in principle be obtained by solving the Bayesian dynamic programming (BDP) equation. Although the BDP equation can be solved in certain special cases (for example, when posteriors have low-dimensional representations), solving this equation in general is computationally intractable and remains an open problem. A second unresolved issue with the BDP equation lies in its (rather generic) interpretation. Beyond the standard narrative of balancing immediate versus future costs—an interpretation common to all dynamic programs with or without learning—the BDP equation does not provide much insight into the underlying mechanism by which sequential Bayesian optimization trades off between learning (exploration) and optimization (exploitation), the distinguishing feature of this problem class. The goal of this paper is to develop good approximations (with error bounds) to the BDP equation that help address the issues of computation and interpretation. To this end, we show how the BDP equation can be represented as a tractable single-stage optimization problem that trades off between a myopic term and a “variance regularization” term that measures the total solution variability over the remaining planning horizon. Intuitively, the myopic term can be regarded as a pure exploitation objective that ignores the impact of future learning, whereas the variance regularization term captures a pure exploration objective that only puts value on solutions that resolve statistical uncertainty. We develop quantitative error bounds for this representation and prove that the error tends to zero like o(n -1 ) almost surely in the number of stages n , which as a corollary, establishes strong consistency of the approximate solution.

Suggested Citation

  • Michael Jong Kim, 2020. "Variance Regularization in Sequential Bayesian Optimization," Mathematics of Operations Research, INFORMS, vol. 45(3), pages 966-992, August.
  • Handle: RePEc:inm:ormoor:v:45:y:2020:i:3:p:966-992
    DOI: 10.1287/moor.2019.1019
    as

    Download full text from publisher

    File URL: https://doi.org/10.1287/moor.2019.1019
    Download Restriction: no

    File URL: https://libkey.io/10.1287/moor.2019.1019?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    References listed on IDEAS

    as
    1. N. Bora Keskin & John R. Birge, 2019. "Dynamic Selling Mechanisms for Product Differentiation and Learning," Operations Research, INFORMS, vol. 67(4), pages 1069-1089, July.
    2. Brezzi, Monica & Lai, Tze Leung, 2002. "Optimal learning and experimentation in bandit problems," Journal of Economic Dynamics and Control, Elsevier, vol. 27(1), pages 87-108, November.
    3. Michael Jong Kim & Viliam Makis, 2013. "Joint Optimization of Sampling and Control of Partially Observable Failing Systems," Operations Research, INFORMS, vol. 61(3), pages 777-790, June.
    4. Keppo, Jussi & Moscarini, Giuseppe & Smith, Lones, 2008. "The demand for information: More heat than light," Journal of Economic Theory, Elsevier, vol. 138(1), pages 21-50, January.
    5. Dai, Min & Huang, Shan & Keppo, Jussi, 2019. "Opaque bank assets and optimal equity capital," Journal of Economic Dynamics and Control, Elsevier, vol. 100(C), pages 369-394.
    6. Michael Jong Kim & Andrew E.B. Lim, 2016. "Robust Multiarmed Bandit Problems," Management Science, INFORMS, vol. 62(1), pages 264-285, January.
    7. Gah-Yi Ban & Noureddine El Karoui & Andrew E. B. Lim, 2018. "Machine Learning and Portfolio Optimization," Management Science, INFORMS, vol. 64(3), pages 1136-1154, March.
    8. Henry Lam, 2016. "Robust Sensitivity Analysis for Stochastic Systems," Mathematics of Operations Research, INFORMS, vol. 41(4), pages 1248-1275, November.
    9. J. Michael Harrison & N. Bora Keskin & Assaf Zeevi, 2012. "Bayesian Dynamic Pricing Policies: Learning and Earning Under a Binary Prior Distribution," Management Science, INFORMS, vol. 58(3), pages 570-586, March.
    10. Savaş Dayanik & Ülkü Gürler, 2002. "An Adaptive Bayesian Replacement Policy with Minimal Repair," Operations Research, INFORMS, vol. 50(3), pages 552-558, June.
    11. Ilya O. Ryzhov & Warren B. Powell & Peter I. Frazier, 2012. "The Knowledge Gradient Algorithm for a General Class of Online Learning Problems," Operations Research, INFORMS, vol. 60(1), pages 180-195, February.
    12. Daniel Russo & Benjamin Van Roy, 2014. "Learning to Optimize via Posterior Sampling," Mathematics of Operations Research, INFORMS, vol. 39(4), pages 1221-1243, November.
    13. Jussi Keppo & Lones Smith & Dmitry Davydov, 2008. "Optimal Electoral Timing: Exercise Wisely and You May Live Longer -super-1," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 75(2), pages 597-628.
    14. Jun-Ya Gotoh & Keita Shinozaki & Akiko Takeda, 2013. "Robust portfolio techniques for mitigating the fragility of CVaR minimization and generalization to coherent risk measures," Quantitative Finance, Taylor & Francis Journals, vol. 13(10), pages 1621-1635, October.
    15. Aditya Jain & Nils Rudi & Tong Wang, 2015. "Demand Estimation and Ordering Under Censoring: Stock-Out Timing Is (Almost) All You Need," Operations Research, INFORMS, vol. 63(1), pages 134-150, February.
    16. Michael Jong Kim, 2016. "Robust Control of Partially Observable Failing Systems," Operations Research, INFORMS, vol. 64(4), pages 999-1014, August.
    17. Alain Bensoussan & Jussi Keppo & Suresh P. Sethi, 2009. "Optimal Consumption And Portfolio Decisions With Partially Observed Real Prices," Mathematical Finance, Wiley Blackwell, vol. 19(2), pages 215-236, April.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xiao, Baichun & Yang, Wei, 2021. "A Bayesian learning model for estimating unknown demand parameter in revenue management," European Journal of Operational Research, Elsevier, vol. 293(1), pages 248-262.
    2. Victor F. Araman & René A. Caldentey, 2022. "Diffusion Approximations for a Class of Sequential Experimentation Problems," Management Science, INFORMS, vol. 68(8), pages 5958-5979, August.
    3. Vishal Gupta, 2019. "Near-Optimal Bayesian Ambiguity Sets for Distributionally Robust Optimization," Management Science, INFORMS, vol. 65(9), pages 4242-4260, September.
    4. Ying Zhong & L. Jeff Hong & Guangwu Liu, 2021. "Earning and Learning with Varying Cost," Production and Operations Management, Production and Operations Management Society, vol. 30(8), pages 2379-2394, August.
    5. Ilya O. Ryzhov & Martijn R. K. Mes & Warren B. Powell & Gerald van den Berg, 2019. "Bayesian Exploration for Approximate Dynamic Programming," Operations Research, INFORMS, vol. 67(1), pages 198-214, January.
    6. Ilya O. Ryzhov, 2016. "On the Convergence Rates of Expected Improvement Methods," Operations Research, INFORMS, vol. 64(6), pages 1515-1528, December.
    7. Michael Jong Kim, 2016. "Robust Control of Partially Observable Failing Systems," Operations Research, INFORMS, vol. 64(4), pages 999-1014, August.
    8. Zhu, Zhicheng & Xiang, Yisha & Zhao, Ming & Shi, Yue, 2023. "Data-driven remanufacturing planning with parameter uncertainty," European Journal of Operational Research, Elsevier, vol. 309(1), pages 102-116.
    9. Philipp Afèche & Barış Ata, 2013. "Bayesian Dynamic Pricing in Queueing Systems with Unknown Delay Cost Characteristics," Manufacturing & Service Operations Management, INFORMS, vol. 15(2), pages 292-304, May.
    10. Dai, Min & Huang, Shan & Keppo, Jussi, 2019. "Opaque bank assets and optimal equity capital," Journal of Economic Dynamics and Control, Elsevier, vol. 100(C), pages 369-394.
    11. Eric M. Schwartz & Eric T. Bradlow & Peter S. Fader, 2017. "Customer Acquisition via Display Advertising Using Multi-Armed Bandit Experiments," Marketing Science, INFORMS, vol. 36(4), pages 500-522, July.
    12. Noah Gans & George Knox & Rachel Croson, 2007. "Simple Models of Discrete Choice and Their Performance in Bandit Experiments," Manufacturing & Service Operations Management, INFORMS, vol. 9(4), pages 383-408, December.
    13. Abid, Hofa, 2021. "Economic Growth and Business to-Business Marketing," OSF Preprints btqsc, Center for Open Science.
    14. Mamabolo R. M. & Beichelt F. E., 2004. "Maintenance Policies with Minimal Repair," Stochastics and Quality Control, De Gruyter, vol. 19(2), pages 143-166, January.
    15. Jun-ya Gotoh & Michael Jong Kim & Andrew E. B. Lim, 2020. "Worst-case sensitivity," Papers 2010.10794, arXiv.org.
    16. Andrew Papanicolaou, 2018. "Backward SDEs for Control with Partial Information," Papers 1807.08222, arXiv.org.
    17. Alessandra Fogli & Laura Veldkamp, 2007. "Nature or nurture? learning and female labor force dynamics," Staff Report 386, Federal Reserve Bank of Minneapolis.
    18. William L. Cooper & Tito Homem-de-Mello & Anton J. Kleywegt, 2015. "Learning and Pricing with Models That Do Not Explicitly Incorporate Competition," Operations Research, INFORMS, vol. 63(1), pages 86-103, February.
    19. Azizi, Fariba & Salari, Nooshin, 2023. "A novel condition-based maintenance framework for parallel manufacturing systems based on bivariate birth/birth–death processes," Reliability Engineering and System Safety, Elsevier, vol. 229(C).
    20. N. Bora Keskin & Assaf Zeevi, 2014. "Dynamic Pricing with an Unknown Demand Model: Asymptotically Optimal Semi-Myopic Policies," Operations Research, INFORMS, vol. 62(5), pages 1142-1167, October.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:ormoor:v:45:y:2020:i:3:p:966-992. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher (email available below). General contact details of provider: https://edirc.repec.org/data/inforea.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.