
Data Pooling in Stochastic Optimization

Authors
  • Vishal Gupta

    (Data Science and Operations, USC Marshall School of Business, Los Angeles, California 90089)

  • Nathan Kallus

    (School of Operations Research and Information Engineering and Cornell Tech, Cornell University, New York, New York 10044)

Abstract

Managing large-scale systems often involves simultaneously solving thousands of unrelated stochastic optimization problems, each with limited data. Intuition suggests that one can decouple these unrelated problems and solve them separately without loss of generality. We propose a novel data-pooling algorithm called Shrunken-SAA that disproves this intuition. In particular, we prove that combining data across problems can outperform decoupling, even when there is no a priori structure linking the problems and data are drawn independently. Our approach does not require strong distributional assumptions and applies to constrained, possibly nonconvex, nonsmooth optimization problems such as vehicle-routing, economic lot-sizing, or facility location. We compare and contrast our results to a similar phenomenon in statistics (Stein’s phenomenon), highlighting unique features that arise in the optimization setting that are not present in estimation. We further prove that, as the number of problems grows large, Shrunken-SAA learns whether pooling can improve upon decoupling and the optimal amount to pool, even if the average amount of data per problem is fixed and bounded. Importantly, we highlight a simple intuition based on stability that explains when and why data pooling offers a benefit, elucidating this perhaps surprising phenomenon. This intuition further suggests that data pooling offers the greatest benefit when there are many problems, each of which has a small amount of relevant data. Finally, we demonstrate the practical benefits of data pooling using real data from a chain of retail drug stores in the context of inventory management.
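The abstract describes Shrunken-SAA only at a high level. The following is a minimal sketch of the general idea, not the authors' implementation. It rests on several assumptions. Each problem is a newsvendor with demand on a finite support. Each problem's empirical distribution is shrunk toward a shared anchor via (N_k * p_k + alpha * p_0) / (N_k + alpha). The pooling amount alpha is picked by leave-one-out cross-validation summed over all problems. Setting alpha = 0 recovers decoupled SAA (sample average approximation). The grand-mean anchor, the alpha grid, the helper names, and the synthetic Poisson demands are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical sketch of data pooling via shrinkage for many small newsvendor
# problems; the anchor choice and alpha grid are illustrative assumptions.


def shrunk_distribution(counts, anchor, alpha):
    """Blend one problem's empirical counts with a shared anchor distribution.

    Equals (N * p_hat + alpha * anchor) / (N + alpha) with N = counts.sum();
    alpha = 0 gives the problem's own empirical distribution (decoupled SAA).
    """
    n = counts.sum()
    if n + alpha == 0:  # no data left and no pooling: fall back to the anchor
        return anchor.copy()
    return (counts + alpha * anchor) / (n + alpha)


def newsvendor_order(support, probs, critical_ratio):
    """Smallest support point whose CDF reaches the critical ratio b / (b + h)."""
    cdf = np.cumsum(probs)
    idx = int(np.searchsorted(cdf, critical_ratio - 1e-12))  # tolerance for rounding
    return support[min(idx, len(support) - 1)]


def newsvendor_cost(order, demand, b, h):
    """Underage cost b per unit short plus overage cost h per unit left over."""
    return b * max(demand - order, 0) + h * max(order - demand, 0)


def loo_cost(counts_matrix, support, anchor, alpha, b, h):
    """Total leave-one-out cost of the shrunk policy, summed over all problems."""
    ratio = b / (b + h)
    total = 0.0
    for counts in counts_matrix:
        for j, c in enumerate(counts):
            if c == 0:
                continue
            reduced = counts.copy()
            reduced[j] -= 1  # hold out one observation at support[j]
            probs = shrunk_distribution(reduced, anchor, alpha)
            order = newsvendor_order(support, probs, ratio)
            # every one of the c observations at support[j] yields the same held-out cost
            total += c * newsvendor_cost(order, support[j], b, h)
    return total


def shrunken_saa(counts_matrix, support, b, h, alphas):
    """Pick alpha by leave-one-out cross-validation, then solve every problem."""
    counts_matrix = np.asarray(counts_matrix, dtype=float)
    # Assumed anchor: grand mean of the per-problem empirical distributions.
    anchor = (counts_matrix / counts_matrix.sum(axis=1, keepdims=True)).mean(axis=0)
    costs = [loo_cost(counts_matrix, support, anchor, a, b, h) for a in alphas]
    best = alphas[int(np.argmin(costs))]
    ratio = b / (b + h)
    orders = [newsvendor_order(support, shrunk_distribution(c, anchor, best), ratio)
              for c in counts_matrix]
    return best, orders


# Usage on synthetic data: 200 problems with only 5 demand observations each.
rng = np.random.default_rng(0)
support = np.arange(10)
K, n_per = 200, 5
means = rng.uniform(2, 7, size=K)
counts = np.zeros((K, len(support)))
for k in range(K):
    draws = np.clip(rng.poisson(means[k], size=n_per), 0, 9)
    counts[k] = np.bincount(draws, minlength=len(support))
alpha, orders = shrunken_saa(counts, support, b=3.0, h=1.0, alphas=[0, 1, 2, 5, 10, 20, 50])
```

A larger alpha makes each problem borrow more from the pooled anchor. Because alpha is selected from held-out observations, the sketch can also return alpha = 0 when pooling does not help on this data.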

Suggested Citation

  • Vishal Gupta & Nathan Kallus, 2022. "Data Pooling in Stochastic Optimization," Management Science, INFORMS, vol. 68(3), pages 1595-1615, March.
  • Handle: RePEc:inm:ormnsc:v:68:y:2022:i:3:p:1595-1615
    DOI: 10.1287/mnsc.2020.3933

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/mnsc.2020.3933
    Download Restriction: no


    References listed on IDEAS

    1. DeMiguel, Victor & Martin-Utrera, Alberto & Nogales, Francisco J., 2013. "Size matters: Optimal calibration of shrinkage estimators for portfolio selection," Journal of Banking & Finance, Elsevier, vol. 37(8), pages 3018-3034.
    2. Deheuvels, Paul & Pfeifer, Dietmar, 1988. "Poisson approximations of multinomial distributions and point processes," Journal of Multivariate Analysis, Elsevier, vol. 25(1), pages 65-89, April.
    3. Vishal Gupta & Paat Rusmevichientong, 2021. "Small-Data, Large-Scale Linear Optimization with Uncertain Objectives," Management Science, INFORMS, vol. 67(1), pages 220-241, January.
    4. Retsef Levi & Georgia Perakis & Joline Uichanco, 2015. "The Data-Driven Newsvendor Problem: New Bounds and Insights," Operations Research, INFORMS, vol. 63(6), pages 1294-1306, December.
    5. Jorion, Philippe, 1986. "Bayes-Stein Estimation for Portfolio Analysis," Journal of Financial and Quantitative Analysis, Cambridge University Press, vol. 21(3), pages 279-292, September.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Dursun, İpek & Akçay, Alp & van Houtum, Geert-Jan, 2022. "Data pooling for multiple single-component systems under population heterogeneity," International Journal of Production Economics, Elsevier, vol. 250(C).
    2. Qi Feng & J. George Shanthikumar, 2022. "Developing operations management data analytics," Production and Operations Management, Production and Operations Management Society, vol. 31(12), pages 4544-4557, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. John D. Lamb & Kai-Hong Tee, 2024. "Using stochastic frontier analysis instead of data envelopment analysis in modelling investment performance," Annals of Operations Research, Springer, vol. 332(1), pages 891-907, January.
    2. Hwang, Inchang & Xu, Simon & In, Francis, 2018. "Naive versus optimal diversification: Tail risk and performance," European Journal of Operational Research, Elsevier, vol. 265(1), pages 372-388.
    3. Kircher, Felix & Rösch, Daniel, 2021. "A shrinkage approach for Sharpe ratio optimal portfolios with estimation risks," Journal of Banking & Finance, Elsevier, vol. 133(C).
    4. Abadir, Karim M. & Distaso, Walter & Žikeš, Filip, 2014. "Design-free estimation of variance matrices," Journal of Econometrics, Elsevier, vol. 181(2), pages 165-180.
    5. Michele Costola & Bertrand Maillet & Zhining Yuan & Xiang Zhang, 2024. "Mean–variance efficient large portfolios: a simple machine learning heuristic technique based on the two-fund separation theorem," Annals of Operations Research, Springer, vol. 334(1), pages 133-155, March.
    6. Platanakis, Emmanouil & Sutcliffe, Charles & Ye, Xiaoxia, 2021. "Horses for courses: Mean-variance for asset allocation and 1/N for stock selection," European Journal of Operational Research, Elsevier, vol. 288(1), pages 302-317.
    7. Marco Neffelli, 2018. "Target Matrix Estimators in Risk-Based Portfolios," Risks, MDPI, vol. 6(4), pages 1-20, November.
    8. Ortiz, Roberto & Contreras, Mauricio & Mellado, Cristhian, 2023. "Regression, multicollinearity and Markowitz," Finance Research Letters, Elsevier, vol. 58(PC).
    9. Chakrabarti, Deepayan, 2021. "Parameter-free robust optimization for the maximum-Sharpe portfolio problem," European Journal of Operational Research, Elsevier, vol. 293(1), pages 388-399.
    10. Sangwon Suh, 2018. "Portfolio Selection using New Factors based on Firm Characteristics," Journal of Economic Development, Chung-Ang University, Department of Economics, vol. 43(1), pages 77-99, March.
    11. Candelon, B. & Hurlin, C. & Tokpavi, S., 2012. "Sampling error and double shrinkage estimation of minimum variance portfolios," Journal of Empirical Finance, Elsevier, vol. 19(4), pages 511-527.
    12. Serrano, Breno & Minner, Stefan & Schiffer, Maximilian & Vidal, Thibaut, 2024. "Bilevel optimization for feature selection in the data-driven newsvendor problem," European Journal of Operational Research, Elsevier, vol. 315(2), pages 703-714.
    13. Jacobs, Heiko & Müller, Sebastian & Weber, Martin, 2014. "How should individual investors diversify? An empirical evaluation of alternative asset allocation policies," Journal of Financial Markets, Elsevier, vol. 19(C), pages 62-85.
    14. Hsiao-Fen Hsiao & Jiang-Chuan Huang & Zheng-Wei Lin, 2020. "Portfolio construction using bootstrapping neural networks: evidence from global stock market," Review of Derivatives Research, Springer, vol. 23(3), pages 227-247, October.
    15. Wang, Christina Dan & Chen, Zhao & Lian, Yimin & Chen, Min, 2022. "Asset selection based on high frequency Sharpe ratio," Journal of Econometrics, Elsevier, vol. 227(1), pages 168-188.
    16. Kruopis, Julius & Čekanavičius, Vydas, 2014. "Compound Poisson approximations for symmetric vectors," Journal of Multivariate Analysis, Elsevier, vol. 123(C), pages 30-42.
    17. Allen, D.E. & McAleer, M.J. & Powell, R.J. & Singh, A.K., 2015. "Down-side Risk Metrics as Portfolio Diversification Strategies across the GFC," Econometric Institute Research Papers EI2015-32, Erasmus University Rotterdam, Erasmus School of Economics (ESE), Econometric Institute.
    18. Tu, Jun & Zhou, Guofu, 2011. "Markowitz meets Talmud: A combination of sophisticated and naive diversification strategies," Journal of Financial Economics, Elsevier, vol. 99(1), pages 204-215, January.
    19. Cong Shi & Weidong Chen & Izak Duenyas, 2016. "Technical Note—Nonparametric Data-Driven Algorithms for Multiproduct Inventory Systems with Censored Demand," Operations Research, INFORMS, vol. 64(2), pages 362-370, April.
    20. Mainik, Georg & Mitov, Georgi & Rüschendorf, Ludger, 2015. "Portfolio optimization for heavy-tailed assets: Extreme Risk Index vs. Markowitz," Journal of Empirical Finance, Elsevier, vol. 32(C), pages 115-134.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.