A tractable online learning algorithm for the multinomial logit contextual bandit

Author

Listed:

Agrawal, Priyank
Tulabandhula, Theja
Avadhanula, Vashist

Abstract

In this paper, we consider the contextual variant of the MNL-Bandit problem. More specifically, we consider a dynamic set optimization problem in which a decision-maker offers a subset (assortment) of products to a consumer and observes the response in every round. Consumers purchase products to maximize their utility. We assume that a set of attributes describes the products and that the mean utility of a product is linear in the values of these attributes. We model consumer choice behavior using the widely used Multinomial Logit (MNL) model and consider the decision-maker's problem of dynamically learning the model parameters while optimizing cumulative revenue over the selling horizon T. Though this problem has recently attracted considerable attention, many existing methods involve solving an intractable non-convex optimization problem, and their theoretical performance guarantees depend on a problem-dependent parameter that could be prohibitively large. In particular, current algorithms for this problem have regret bounded by O(√(κdT)), where κ is a problem-dependent constant that may have an exponential dependency on the number of attributes, d. In this paper, we propose an optimistic algorithm and show that its regret is bounded by O(√(dT) + κ), a significant improvement over existing methods. Further, we propose a convex relaxation of the optimization step, which allows for tractable decision-making while retaining the favorable regret guarantee. Through numerical experiments, we also demonstrate that our algorithm performs robustly across varying values of κ.
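
The choice model described in the abstract can be illustrated with a minimal sketch. It assumes the common MNL-bandit convention of a no-purchase option with utility fixed at 0, which the abstract does not state explicitly; the function names, the `revenues` vector, and the array shapes are hypothetical and are not taken from the paper.

```python
import numpy as np

def mnl_choice_probabilities(features, theta):
    """Purchase probabilities for an offered assortment under an MNL model.

    features: (n_items, d) attribute matrix of the offered products.
    theta:    (d,) parameter vector; mean utility is linear in the attributes.
    Returns (item_probs, no_purchase_prob).
    Assumption: a no-purchase option with utility 0 is always available.
    """
    utilities = features @ theta
    # Subtract a common shift before exponentiating for numerical stability.
    shift = max(float(utilities.max()), 0.0)
    weights = np.exp(utilities - shift)
    outside = np.exp(-shift)  # weight of the no-purchase option
    denom = outside + weights.sum()
    return weights / denom, outside / denom

def expected_revenue(features, theta, revenues):
    """Expected one-round revenue of offering this assortment."""
    item_probs, _ = mnl_choice_probabilities(features, theta)
    return float(item_probs @ revenues)
```

With all utilities equal to 0, three offered products and the no-purchase option each receive probability 1/4. In the learning problem, theta is unknown and must be estimated from the observed choices while assortments are being selected.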

Suggested Citation

Agrawal, Priyank & Tulabandhula, Theja & Avadhanula, Vashist, 2023. "A tractable online learning algorithm for the multinomial logit contextual bandit," European Journal of Operational Research, Elsevier, vol. 310(2), pages 737-750.

Handle: RePEc:eee:ejores:v:310:y:2023:i:2:p:737-750
DOI: 10.1016/j.ejor.2023.02.036

References listed on IDEAS

Wang, Xiaolin & Zhao, Xiujie & Liu, Bin, 2020. "Design and pricing of extended warranty menus based on the multinomial logit choice model," European Journal of Operational Research, Elsevier, vol. 287(1), pages 237-250.
Flores, Alvaro & Berbeglia, Gerardo & Van Hentenryck, Pascal, 2019. "Assortment optimization under the Sequential Multinomial Logit Model," European Journal of Operational Research, Elsevier, vol. 273(3), pages 1052-1064.
Alfandari, Laurent & Hassanzadeh, Alborz & Ljubić, Ivana, 2021. "An exact method for assortment optimization under the nested logit model," European Journal of Operational Research, Elsevier, vol. 291(3), pages 830-845.
Grant, James A. & Szechtman, Roberto, 2021. "Filtered poisson process bandit on a continuum," European Journal of Operational Research, Elsevier, vol. 295(2), pages 575-586.
Denis Sauré & Assaf Zeevi, 2013. "Optimal Dynamic Assortment Planning with Demand Learning," Manufacturing & Service Operations Management, INFORMS, vol. 15(3), pages 387-404, July.
Xu, Jianyu & Chen, Lujie & Tang, Ou, 2021. "An online algorithm for the risk-aware restless bandit," European Journal of Operational Research, Elsevier, vol. 290(2), pages 622-639.
Laurent Alfandari & Alborz Hassanzadeh & Ivana Ljubić, 2021. "An Exact Method for Assortment Optimization under the Nested Logit Model," Working Papers hal-02463159, HAL.
A. Gürhan Kök & Marshall L. Fisher, 2007. "Demand Estimation and Assortment Optimization Under Substitution: Methodology and Application," Operations Research, INFORMS, vol. 55(6), pages 1001-1021, December.
Paat Rusmevichientong & Zuo-Jun Max Shen & David B. Shmoys, 2010. "Dynamic Assortment Optimization with a Multinomial Logit Choice Model and Capacity Constraint," Operations Research, INFORMS, vol. 58(6), pages 1666-1680, December.
Paat Rusmevichientong & John N. Tsitsiklis, 2010. "Linearly Parameterized Bandits," Mathematics of Operations Research, INFORMS, vol. 35(2), pages 395-411, May.
Shipra Agrawal & Vashist Avadhanula & Vineet Goyal & Assaf Zeevi, 2019. "MNL-Bandit: A Dynamic Learning Approach to Assortment Selection," Operations Research, INFORMS, vol. 67(5), pages 1453-1485, September.
Timonina-Farkas, Anna & Katsifou, Argyro & Seifert, Ralf W., 2020. "Product assortment and space allocation strategies to attract loyal and non-loyal customers," European Journal of Operational Research, Elsevier, vol. 285(3), pages 1058-1076.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Mehrani, Saharnaz & Sefair, Jorge A., 2022. "Robust assortment optimization under sequential product unavailability," European Journal of Operational Research, Elsevier, vol. 303(3), pages 1027-1043.
Arhami, Omid & Aslani, Shirin & Talebian, Masoud, 2024. "Dynamic assortment planning and capacity allocation with logit substitution," Journal of Retailing and Consumer Services, Elsevier, vol. 76(C).
Zhang, Le & Azadeh, Shadi Sharif & Jiang, Hai, 2025. "Exact and heuristic algorithms for cardinality-constrained assortment optimization problem under the cross-nested logit model," European Journal of Operational Research, Elsevier, vol. 324(1), pages 183-199.
Mou, Shandong & Robb, David J. & DeHoratius, Nicole, 2018. "Retail store operations: Literature review and research directions," European Journal of Operational Research, Elsevier, vol. 265(2), pages 399-422.
Shipra Agrawal & Vashist Avadhanula & Vineet Goyal & Assaf Zeevi, 2019. "MNL-Bandit: A Dynamic Learning Approach to Assortment Selection," Operations Research, INFORMS, vol. 67(5), pages 1453-1485, September.
Fernando Bernstein & A. Gürhan Kök & Lei Xie, 2015. "Dynamic Assortment Customization with Limited Inventories," Manufacturing & Service Operations Management, INFORMS, vol. 17(4), pages 538-553, October.
Yining Wang & Xi Chen & Xiangyu Chang & Dongdong Ge, 2021. "Uncertainty Quantification for Demand Prediction in Contextual Dynamic Pricing," Production and Operations Management, Production and Operations Management Society, vol. 30(6), pages 1703-1717, June.
Xi Chen & Yining Wang & Yuan Zhou, 2018. "Dynamic Assortment Optimization with Changing Contextual Information," Papers 1810.13069, arXiv.org, revised Jan 2019.
Page, Kenneth & Pérez, Juan & Telha, Claudio & García-Echalar, Andrés & López-Ospina, Héctor, 2021. "Optimal bundle composition in competition for continuous attributes," European Journal of Operational Research, Elsevier, vol. 293(3), pages 1168-1187.
Qiu, Jiaqing & Li, Xiangyong & Duan, Yongrui & Chen, Mengxi & Tian, Peng, 2020. "Dynamic assortment in the presence of brand heterogeneity," Journal of Retailing and Consumer Services, Elsevier, vol. 56(C).
Julia Heger & Robert Klein, 2024. "Assortment optimization: a systematic literature review," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 46(4), pages 1099-1161, December.
Shaoning Han & Andrés Gómez & Oleg A. Prokopyev, 2022. "Fractional 0–1 programming and submodularity," Journal of Global Optimization, Springer, vol. 84(1), pages 77-93, September.
Hense, Jonas & Hübner, Alexander, 2022. "Assortment optimization in omni-channel retailing," European Journal of Operational Research, Elsevier, vol. 301(1), pages 124-140.
Boxiao Chen & Xiuli Chao, 2020. "Dynamic Inventory Control with Stockout Substitution and Demand Learning," Management Science, INFORMS, vol. 66(11), pages 5108-5127, November.
Çömez-Dolgan, Nagihan & Fescioglu-Unver, Nilgun & Cephe, Ecem & Şen, Alper, 2021. "Capacitated strategic assortment planning under explicit demand substitution," European Journal of Operational Research, Elsevier, vol. 294(3), pages 1120-1138.
Çömez-Dolgan, Nagihan & Dağ, Hilal & Fescioglu-Unver, Nilgun & Şen, Alper, 2023. "Multi-plant manufacturing assortment planning in the presence of transshipments," European Journal of Operational Research, Elsevier, vol. 310(3), pages 1033-1050.
Xi Chen & Chao Shi & Yining Wang & Yuan Zhou, 2021. "Dynamic Assortment Planning Under Nested Logit Models," Production and Operations Management, Production and Operations Management Society, vol. 30(1), pages 85-102, January.
Daria Dzyabura & Srikanth Jagabathula, 2018. "Offline Assortment Optimization in the Presence of an Online Channel," Management Science, INFORMS, vol. 64(6), pages 2767-2786, June.
David Simchi-Levi & Rui Sun & Huanan Zhang, 2022. "Online Learning and Optimization for Revenue Management Problems with Add-on Discounts," Management Science, INFORMS, vol. 68(10), pages 7402-7421, October.
Jalali, Hamed & Carmen, Raïsa & Van Nieuwenhuyse, Inneke & Boute, Robert, 2019. "Quality and pricing decisions in production/inventory systems," European Journal of Operational Research, Elsevier, vol. 272(1), pages 195-206.
