Printed from https://ideas.repec.org/a/inm/ormnsc/v67y2021i8p4756-4771.html

Shrinking the Upper Confidence Bound: A Dynamic Product Selection Problem for Urban Warehouses

Author

Listed:
  • Rong Jin

    (Alibaba Group US, San Mateo, California 94402)

  • David Simchi-Levi

    (Institute for Data, Systems, and Society, Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139; Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139)

  • Li Wang

    (Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139)

  • Xinshang Wang

    (Alibaba Group US, San Mateo, California 94402; Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai 200240, China)

  • Sen Yang

    (Alibaba Group US, San Mateo, California 94402)

Abstract

The recent rising popularity of ultrafast delivery services on retail platforms fuels the increasing use of urban warehouses, whose proximity to customers makes fast deliveries viable. The space limit in urban warehouses poses a problem for such online retailers: the number of stock keeping units (SKUs) they carry is no longer “the more, the better,” yet it can still be significantly large, reaching hundreds or thousands in a product category. In this paper, we study algorithms for dynamically selecting a large number of products (i.e., SKUs) with top customer purchase probabilities on the fly, from an ocean of potential products to offer on retailers’ ultrafast delivery platforms. We distill the product selection problem into a semibandit model with linear generalization. There are in total N arms corresponding to N products, each with a feature vector of dimension d. The player pulls K arms in each period and observes the bandit feedback from each of the pulled arms. We focus on the setting where K is much greater than the number of total time periods T or the dimension of product features d. We first analyze a standard Upper Confidence Bound (UCB) algorithm and show its regret bound can be expressed as the sum of a T-independent part and a T-dependent part, which we refer to as “fixed cost” and “variable cost,” respectively. To reduce the fixed cost for large K values, we propose a novel online learning algorithm, which iteratively shrinks the upper confidence bounds within each period, and show its fixed cost is reduced by a factor of d. Moreover, we test the algorithms on an industrial data set from Alibaba Group. Experimental results show that our new algorithm reduces the total regret of the standard UCB algorithm by at least 10%.
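
The semibandit setting in the abstract (N arms with d-dimensional features, K arms pulled per period, feedback observed for each pulled arm) can be illustrated with a small simulation. The sketch below is not the authors' implementation. It assumes a ridge-regression estimate with a LinUCB-style bonus, synthetic Gaussian features, and Gaussian reward noise rather than purchase outcomes. The names `ucb_scores`, `select_arms`, `run`, `alpha`, and `lam` are hypothetical. The `shrink=True` branch is only one possible reading of "iteratively shrinks the upper confidence bounds within each period" (folding each chosen arm's features into the design matrix before picking the next arm) and should not be taken as the paper's algorithm.

```python
import numpy as np

def ucb_scores(X, theta_hat, A_inv, alpha):
    # Estimated mean plus exploration bonus alpha * sqrt(x^T A^{-1} x) per arm.
    widths = np.sqrt(np.einsum("ij,jk,ik->i", X, A_inv, X))
    return X @ theta_hat + alpha * widths

def select_arms(X, theta_hat, A, K, alpha, shrink=False):
    """Pick K distinct arms by UCB score (assumes K <= number of arms).

    shrink=False: one-shot top-K using the start-of-period confidence widths.
    shrink=True:  ASSUMPTION, an illustrative guess: pick arms one at a time
                  and add each chosen feature vector to a working copy of A,
                  so the widths of the remaining arms shrink within the period.
    """
    if not shrink:
        scores = ucb_scores(X, theta_hat, np.linalg.inv(A), alpha)
        return np.argsort(-scores)[:K]
    A_work = A.copy()
    available = np.ones(len(X), dtype=bool)
    chosen = []
    for _ in range(K):
        scores = ucb_scores(X, theta_hat, np.linalg.inv(A_work), alpha)
        scores[~available] = -np.inf          # never pick the same arm twice
        i = int(np.argmax(scores))
        chosen.append(i)
        available[i] = False
        A_work += np.outer(X[i], X[i])        # widths depend on features only
    return np.array(chosen)

def run(N=200, d=5, K=20, T=30, alpha=1.0, lam=1.0, shrink=False, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(N, d)) / np.sqrt(d)  # synthetic features (assumption)
    theta = rng.normal(size=d)                # unknown parameter to be learned
    means = X @ theta
    best = np.sort(means)[-K:].sum()          # value of the best K-subset
    A = lam * np.eye(d)                       # regularized design matrix
    b = np.zeros(d)
    regret = 0.0
    for _ in range(T):
        theta_hat = np.linalg.solve(A, b)     # ridge estimate
        S = select_arms(X, theta_hat, A, K, alpha, shrink)
        rewards = means[S] + 0.1 * rng.normal(size=K)  # semibandit feedback
        A += X[S].T @ X[S]
        b += X[S].T @ rewards
        regret += best - means[S].sum()
    return regret

if __name__ == "__main__":
    print("one-shot top-K regret:", run(shrink=False))
    print("within-period shrink regret:", run(shrink=True))
```

Both variants share the same estimator and differ only in how the K arms are chosen within a period, so any gap between the two printed values comes from the selection rule on this synthetic instance and says nothing about the regret results reported in the paper.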

Suggested Citation

  • Rong Jin & David Simchi-Levi & Li Wang & Xinshang Wang & Sen Yang, 2021. "Shrinking the Upper Confidence Bound: A Dynamic Product Selection Problem for Urban Warehouses," Management Science, INFORMS, vol. 67(8), pages 4756-4771, August.
  • Handle: RePEc:inm:ormnsc:v:67:y:2021:i:8:p:4756-4771
    DOI: 10.1287/mnsc.2020.3773

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/mnsc.2020.3773
    Download Restriction: no


    References listed on IDEAS

    1. Hamsa Bastani & Mohsen Bayati, 2020. "Online Decision Making with High-Dimensional Covariates," Operations Research, INFORMS, vol. 68(1), pages 276-294, January.
    2. Jean-Yves Audibert & Sébastien Bubeck & Gábor Lugosi, 2014. "Regret in Online Combinatorial Optimization," Mathematics of Operations Research, INFORMS, vol. 39(1), pages 31-45, February.
    3. Paat Rusmevichientong & John N. Tsitsiklis, 2010. "Linearly Parameterized Bandits," Mathematics of Operations Research, INFORMS, vol. 35(2), pages 395-411, May.
    4. Daniel Russo & Benjamin Van Roy, 2014. "Learning to Optimize via Posterior Sampling," Mathematics of Operations Research, INFORMS, vol. 39(4), pages 1221-1243, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. David Simchi-Levi & Rui Sun & Huanan Zhang, 2022. "Online Learning and Optimization for Revenue Management Problems with Add-on Discounts," Management Science, INFORMS, vol. 68(10), pages 7402-7421, October.
    2. Hamsa Bastani & David Simchi-Levi & Ruihao Zhu, 2022. "Meta Dynamic Pricing: Transfer Learning Across Experiments," Management Science, INFORMS, vol. 68(3), pages 1865-1881, March.
    3. Yining Wang & Boxiao Chen & David Simchi-Levi, 2021. "Multimodal Dynamic Pricing," Management Science, INFORMS, vol. 67(10), pages 6136-6152, October.
    4. Masahiro Kato & Shinji Ito, 2023. "Best-of-Both-Worlds Linear Contextual Bandits," Papers 2312.16489, arXiv.org.
    5. Ying Zhong & L. Jeff Hong & Guangwu Liu, 2021. "Earning and Learning with Varying Cost," Production and Operations Management, Production and Operations Management Society, vol. 30(8), pages 2379-2394, August.
    6. Ruohan Zhan & Zhimei Ren & Susan Athey & Zhengyuan Zhou, 2021. "Policy Learning with Adaptively Collected Data," Papers 2105.02344, arXiv.org, revised Nov 2022.
    7. Bin Han & Ilya O. Ryzhov & Boris Defourny, 2016. "Optimal Learning in Linear Regression with Combinatorial Feature Selection," INFORMS Journal on Computing, INFORMS, vol. 28(4), pages 721-735, November.
    8. Daniel Russo & Benjamin Van Roy, 2018. "Learning to Optimize via Information-Directed Sampling," Operations Research, INFORMS, vol. 66(1), pages 230-252, January.
    9. Eric M. Schwartz & Eric T. Bradlow & Peter S. Fader, 2017. "Customer Acquisition via Display Advertising Using Multi-Armed Bandit Experiments," Marketing Science, INFORMS, vol. 36(4), pages 500-522, July.
    10. Wang Chi Cheung & David Simchi-Levi & Ruihao Zhu, 2022. "Hedging the Drift: Learning to Optimize Under Nonstationarity," Management Science, INFORMS, vol. 68(3), pages 1696-1713, March.
    11. Zhengyuan Zhou & Susan Athey & Stefan Wager, 2023. "Offline Multi-Action Policy Learning: Generalization and Optimization," Operations Research, INFORMS, vol. 71(1), pages 148-183, January.
    12. Mark Egan & Tomas Philipson, 2016. "Health Care Adherence and Personalized Medicine," Working Papers 2016-H01, Becker Friedman Institute for Research In Economics.
    13. Rad Niazadeh & Negin Golrezaei & Joshua Wang & Fransisca Susan & Ashwinkumar Badanidiyuru, 2023. "Online Learning via Offline Greedy Algorithms: Applications in Market Design and Optimization," Management Science, INFORMS, vol. 69(7), pages 3797-3817, July.
    14. Xiangyu Gao & Stefanus Jasin & Sajjad Najafi & Huanan Zhang, 2022. "Joint Learning and Optimization for Multi-Product Pricing (and Ranking) Under a General Cascade Click Model," Management Science, INFORMS, vol. 68(10), pages 7362-7382, October.
    15. Agrawal, Priyank & Tulabandhula, Theja & Avadhanula, Vashist, 2023. "A tractable online learning algorithm for the multinomial logit contextual bandit," European Journal of Operational Research, Elsevier, vol. 310(2), pages 737-750.
    16. Mengying Zhu & Xiaolin Zheng & Yan Wang & Yuyuan Li & Qianqiao Liang, 2019. "Adaptive Portfolio by Solving Multi-armed Bandit via Thompson Sampling," Papers 1911.05309, arXiv.org, revised Nov 2019.
    17. Haihui Shen & L. Jeff Hong & Xiaowei Zhang, 2021. "Ranking and Selection with Covariates for Personalized Decision Making," INFORMS Journal on Computing, INFORMS, vol. 33(4), pages 1500-1519, October.
    18. Mark Egan & Tomas J. Philipson, 2014. "Health Care Adherence and Personalized Medicine," NBER Working Papers 20330, National Bureau of Economic Research, Inc.
    19. T. Law & J. Shawe-Taylor, 2017. "Practical Bayesian support vector regression for financial time series prediction and market condition change detection," Quantitative Finance, Taylor & Francis Journals, vol. 17(9), pages 1403-1416, September.
    20. Kimia Keshanian & Daniel Zantedeschi & Kaushik Dutta, 2022. "Features Selection as a Nash-Bargaining Solution: Applications in Online Advertising and Information Systems," INFORMS Journal on Computing, INFORMS, vol. 34(5), pages 2485-2501, September.
