Printed from https://ideas.repec.org/a/inm/ormnsc/v68y2022i3p1595-1615.html

Data Pooling in Stochastic Optimization

Authors
  • Vishal Gupta

    (Data Science and Operations, USC Marshall School of Business, Los Angeles, California 90089)

  • Nathan Kallus

    (School of Operations Research and Information Engineering and Cornell Tech, Cornell University, New York, New York 10044)

Abstract

Managing large-scale systems often involves simultaneously solving thousands of unrelated stochastic optimization problems, each with limited data. Intuition suggests that one can decouple these unrelated problems and solve them separately without loss of generality. We propose a novel data-pooling algorithm called Shrunken-SAA that disproves this intuition. In particular, we prove that combining data across problems can outperform decoupling, even when there is no a priori structure linking the problems and data are drawn independently. Our approach does not require strong distributional assumptions and applies to constrained, possibly nonconvex, nonsmooth optimization problems such as vehicle routing, economic lot-sizing, or facility location. We compare and contrast our results with a similar phenomenon in statistics (Stein’s phenomenon), highlighting unique features that arise in the optimization setting but are not present in estimation. We further prove that, as the number of problems grows large, Shrunken-SAA learns whether pooling can improve upon decoupling and the optimal amount to pool, even if the average amount of data per problem is fixed and bounded. Importantly, we identify a simple intuition based on stability that explains when and why data pooling offers a benefit, elucidating this perhaps surprising phenomenon. This intuition further suggests that data pooling offers the greatest benefit when there are many problems, each of which has a small amount of relevant data. Finally, we demonstrate the practical benefits of data pooling in the context of inventory management, using real data from a chain of retail drug stores.
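The abstract does not spell out how Shrunken-SAA works, so the following is only a minimal sketch of the general data-pooling idea under stated assumptions, not the paper's exact procedure. It assumes each problem is a newsvendor (matching the inventory application above) with demand on a shared discrete support; each problem's empirical demand counts are blended with a pooled "anchor" distribution, placing weight α/(N_k + α) on the anchor, where N_k is that problem's sample size; and the pooling amount α is picked from a grid by plain leave-one-out cross-validation. The anchor (average of the per-problem empirical distributions), the blending rule, the cross-validation scheme, and all function names (`pooled_anchor`, `shrunken_probs`, `loo_cost`, `shrunken_saa`) are hypothetical choices for illustration. For simplicity the anchor is computed once from all data, so it slightly leaks held-out points.

```python
import numpy as np


def newsvendor_cost(x, demand, h, b):
    """Holding cost h per unit of leftover stock, backorder cost b per unit of unmet demand."""
    return h * max(x - demand, 0.0) + b * max(demand - x, 0.0)


def newsvendor_decision(probs, support, critical_ratio):
    """Smallest support point whose cumulative probability reaches b / (b + h)."""
    cdf = np.cumsum(probs)
    idx = int(np.searchsorted(cdf, critical_ratio - 1e-12))  # tolerance for rounding
    return support[min(idx, len(support) - 1)]


def pooled_anchor(counts):
    """Assumed anchor: the average of the per-problem empirical distributions."""
    return (counts / counts.sum(axis=1, keepdims=True)).mean(axis=0)


def shrunken_probs(counts, anchor, alpha):
    """Blend one problem's counts with alpha pseudo-observations drawn from the anchor.

    Equivalent to weight alpha / (N_k + alpha) on the anchor; alpha = 0 is plain SAA.
    """
    total = counts.sum() + alpha
    if total == 0:
        return anchor.copy()  # no data and no pooling: fall back to the anchor (assumption)
    return (counts + alpha * anchor) / total


def loo_cost(counts, support, anchor, alpha, h, b):
    """Average cost of predicting each held-out demand from the remaining data."""
    cr = b / (b + h)
    total, n = 0.0, 0.0
    for m in counts:  # one row of demand counts per problem
        for i in np.nonzero(m)[0]:
            held_out = m.copy()
            held_out[i] -= 1  # drop one observation of demand level support[i]
            x = newsvendor_decision(shrunken_probs(held_out, anchor, alpha), support, cr)
            total += m[i] * newsvendor_cost(x, support[i], h, b)  # m[i] identical copies
            n += m[i]
    return total / n


def shrunken_saa(counts, support, h, b, alphas):
    """Pick the pooling amount alpha by leave-one-out, then solve every problem with it."""
    counts = np.asarray(counts, dtype=float)
    anchor = pooled_anchor(counts)
    best = min(alphas, key=lambda a: loo_cost(counts, support, anchor, a, h, b))
    cr = b / (b + h)
    orders = [newsvendor_decision(shrunken_probs(m, anchor, best), support, cr) for m in counts]
    return best, orders


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    support = np.arange(10)  # hypothetical demand levels 0..9
    means = rng.uniform(2, 7, size=50)  # 50 hypothetical stores, each with its own mean demand
    demand = [np.clip(rng.poisson(mu, size=5), 0, 9) for mu in means]  # 5 observations each
    counts = np.array([np.bincount(d, minlength=10) for d in demand])
    alpha, orders = shrunken_saa(counts, support, h=1.0, b=3.0, alphas=[0, 1, 2, 5, 10, 20, 50])
    print("chosen alpha:", alpha)
    print("first five order quantities:", orders[:5])
```

Passing `alphas=[0]` forces fully decoupled SAA, so comparing the leave-one-out cost at α = 0 with the minimum over the grid is one rough way to check, on a given dataset, whether pooling helps. The asymptotic guarantee stated in the abstract concerns the paper's own procedure, not this simplified sketch.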

Suggested Citation

  • Vishal Gupta & Nathan Kallus, 2022. "Data Pooling in Stochastic Optimization," Management Science, INFORMS, vol. 68(3), pages 1595-1615, March.
  • Handle: RePEc:inm:ormnsc:v:68:y:2022:i:3:p:1595-1615
    DOI: 10.1287/mnsc.2020.3933

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/mnsc.2020.3933
    Download Restriction: no

    File URL: https://libkey.io/10.1287/mnsc.2020.3933?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. DeMiguel, Victor & Martin-Utrera, Alberto & Nogales, Francisco J., 2013. "Size matters: Optimal calibration of shrinkage estimators for portfolio selection," Journal of Banking & Finance, Elsevier, vol. 37(8), pages 3018-3034.
    2. Jorion, Philippe, 1986. "Bayes-Stein Estimation for Portfolio Analysis," Journal of Financial and Quantitative Analysis, Cambridge University Press, vol. 21(3), pages 279-292, September.
    3. Retsef Levi & Georgia Perakis & Joline Uichanco, 2015. "The Data-Driven Newsvendor Problem: New Bounds and Insights," Operations Research, INFORMS, vol. 63(6), pages 1294-1306, December.
    4. Deheuvels, Paul & Pfeifer, Dietmar, 1988. "Poisson approximations of multinomial distributions and point processes," Journal of Multivariate Analysis, Elsevier, vol. 25(1), pages 65-89, April.
    5. Vishal Gupta & Paat Rusmevichientong, 2021. "Small-Data, Large-Scale Linear Optimization with Uncertain Objectives," Management Science, INFORMS, vol. 67(1), pages 220-241, January.

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. Qi Feng & J. George Shanthikumar, 2022. "Developing operations management data analytics," Production and Operations Management, Production and Operations Management Society, vol. 31(12), pages 4544-4557, December.
    2. Dursun, İpek & Akçay, Alp & van Houtum, Geert-Jan, 2022. "Data pooling for multiple single-component systems under population heterogeneity," International Journal of Production Economics, Elsevier, vol. 250(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Marco Neffelli, 2018. "Target Matrix Estimators in Risk-Based Portfolios," Risks, MDPI, vol. 6(4), pages 1-20, November.
    2. Hwang, Inchang & Xu, Simon & In, Francis, 2018. "Naive versus optimal diversification: Tail risk and performance," European Journal of Operational Research, Elsevier, vol. 265(1), pages 372-388.
    3. Michele Costola & Bertrand Maillet & Zhining Yuan & Xiang Zhang, 2024. "Mean–variance efficient large portfolios: a simple machine learning heuristic technique based on the two-fund separation theorem," Annals of Operations Research, Springer, vol. 334(1), pages 133-155, March.
    4. Platanakis, Emmanouil & Sutcliffe, Charles & Ye, Xiaoxia, 2021. "Horses for courses: Mean-variance for asset allocation and 1/N for stock selection," European Journal of Operational Research, Elsevier, vol. 288(1), pages 302-317.
    5. Kircher, Felix & Rösch, Daniel, 2021. "A shrinkage approach for Sharpe ratio optimal portfolios with estimation risks," Journal of Banking & Finance, Elsevier, vol. 133(C).
    6. Ortiz, Roberto & Contreras, Mauricio & Mellado, Cristhian, 2023. "Regression, multicollinearity and Markowitz," Finance Research Letters, Elsevier, vol. 58(PC).
    7. John D. Lamb & Kai-Hong Tee, 2024. "Using stochastic frontier analysis instead of data envelopment analysis in modelling investment performance," Annals of Operations Research, Springer, vol. 332(1), pages 891-907, January.
    8. Chakrabarti, Deepayan, 2021. "Parameter-free robust optimization for the maximum-Sharpe portfolio problem," European Journal of Operational Research, Elsevier, vol. 293(1), pages 388-399.
    9. Sangwon Suh, 2018. "Portfolio Selection using New Factors based on Firm Characteristics," Journal of Economic Development, Chung-Ang University, Department of Economics, vol. 43(1), pages 77-99, March.
    10. Abadir, Karim M. & Distaso, Walter & Žikeš, Filip, 2014. "Design-free estimation of variance matrices," Journal of Econometrics, Elsevier, vol. 181(2), pages 165-180.
    11. Jung, Seung Hwan & Yang, Yunsi, 2023. "On the value of operational flexibility in the trailer shipment and assignment problem: Data-driven approaches and reinforcement learning," International Journal of Production Economics, Elsevier, vol. 264(C).
    12. Cabana Garceran del Vall, Elisa & Laniado Rodas, Henry & Lillo Rodríguez, Rosa Elvira, 2017. "Multivariate outlier detection based on a robust Mahalanobis distance with shrinkage estimators," DES - Working Papers. Statistics and Econometrics. WS 24613, Universidad Carlos III de Madrid. Departamento de Estadística.
    13. Janssen, Larissa & Claus, Thorsten & Sauer, Jürgen, 2016. "Literature review of deteriorating inventory models by key topics from 2012 to 2015," International Journal of Production Economics, Elsevier, vol. 182(C), pages 86-112.
    14. Bodnar, Olha & Bodnar, Taras & Niklasson, Vilhelm, 2024. "Constructing Bayesian tangency portfolios under short-selling restrictions," Finance Research Letters, Elsevier, vol. 62(PA).
    15. Gambacciani, Marco & Paolella, Marc S., 2017. "Robust normal mixtures for financial portfolio allocation," Econometrics and Statistics, Elsevier, vol. 3(C), pages 91-111.
    16. Ding, Wenliang & Shu, Lianjie & Gu, Xinhua, 2023. "A robust Glasso approach to portfolio selection in high dimensions," Journal of Empirical Finance, Elsevier, vol. 70(C), pages 22-37.
    17. Candelon, B. & Hurlin, C. & Tokpavi, S., 2012. "Sampling error and double shrinkage estimation of minimum variance portfolios," Journal of Empirical Finance, Elsevier, vol. 19(4), pages 511-527.
    18. Mishra, Anil V., 2016. "Foreign bias in Australian-domiciled mutual fund holdings," Pacific-Basin Finance Journal, Elsevier, vol. 39(C), pages 101-123.
    19. Andrew F. Siegel & Artemiza Woodgate, 2007. "Performance of Portfolios Optimized with Estimation Error," Management Science, INFORMS, vol. 53(6), pages 1005-1015, June.
    20. Afonso, António & Gomes, Pedro & Taamouti, Abderrahim, 2014. "Sovereign credit ratings, market volatility, and financial gains," Computational Statistics & Data Analysis, Elsevier, vol. 76(C), pages 20-33.

    More about this item



    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:ormnsc:v:68:y:2022:i:3:p:1595-1615. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher (email available below). General contact details of provider: https://edirc.repec.org/data/inforea.html.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.