
The Use of Binary Choice Forests to Model and Estimate Discrete Choices

Author

Listed:
  • Ningyuan Chen
  • Guillermo Gallego
  • Zhuodong Tang

Abstract

Problem definition. In retailing, discrete choice models (DCMs) are commonly used to capture the choice behavior of customers when offered an assortment of products. When estimating DCMs using transaction data, flexible models (such as machine learning models or nonparametric models) are typically hard to interpret and hard to estimate, while tractable models (such as the multinomial logit model) tend to misspecify the complex behavior represented in the data.

Methodology/results. In this study, we use a forest of binary decision trees to represent DCMs. This approach is based on random forests, a popular machine learning algorithm. The resulting model is interpretable: the decision trees can explain the decision-making process of customers during the purchase. We show that our approach can predict the choice probability of any DCM consistently and thus never suffers from misspecification. Moreover, our algorithm predicts choice probabilities for assortments unseen in the training data. The mechanism and errors can be theoretically analyzed. We also prove that the random forest can recover the preference rankings of customers thanks to splitting criteria such as the Gini index and information gain ratio.

Managerial implications. The framework has unique practical advantages. It can capture customers' behavioral patterns such as irrationality or sequential searches when purchasing a product. It handles nonstandard formats of training data that result from aggregation. It can measure product importance based on how frequently a random customer would make decisions depending on the presence of the product. It can also incorporate price information and customer features. Our numerical experiments using synthetic and real data show that using random forests to estimate customer choices can outperform existing methods.
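
To make the role of the splitting criterion concrete, here is a minimal sketch in Python, not the authors' implementation. It assumes each transaction is encoded as a binary vector marking which products were offered, labeled with the product chosen. The helper names `gini` and `best_split` and the toy data are hypothetical, introduced only for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of chosen-product labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(X, y):
    """Find the product whose presence/absence split lowers impurity most.

    X: list of binary rows, X[i][j] == 1 if product j+1 was offered.
    y: list of chosen-product labels, one per row.
    Returns (feature index or None, weighted Gini impurity after the split).
    """
    n = len(y)
    best = (None, gini(y))
    for j in range(len(X[0])):
        absent = [label for row, label in zip(X, y) if row[j] == 0]
        present = [label for row, label in zip(X, y) if row[j] == 1]
        if not absent or not present:
            continue  # this feature does not separate the data
        score = (len(absent) * gini(absent) + len(present) * gini(present)) / n
        if score < best[1]:
            best = (j, score)
    return best


# Hypothetical toy data: one customer type who ranks product 1 > 2 > 3
# and always buys the highest-ranked product that is offered.
X = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 1, 0], [0, 0, 1]]
y = [1, 1, 1, 2, 2, 3]

feature, impurity = best_split(X, y)
print(feature, round(impurity, 4))
```

On this toy data the first split falls on the presence of product 1, the customer's top-ranked product, which gives some intuition for how impurity-based splitting can surface preference rankings. A full random forest would repeat such splits recursively on bootstrapped samples and average the trees' predicted choice frequencies.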

Suggested Citation

  • Ningyuan Chen & Guillermo Gallego & Zhuodong Tang, 2019. "The Use of Binary Choice Forests to Model and Estimate Discrete Choices," Papers 1908.01109, arXiv.org, revised Apr 2024.
  • Handle: RePEc:arx:papers:1908.01109

    Download full text from publisher

    File URL: http://arxiv.org/pdf/1908.01109
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Weitzman, Martin L, 1979. "Optimal Search for the Best Alternative," Econometrica, Econometric Society, vol. 47(3), pages 641-654, May.
    2. Train, Kenneth E., 2009. "Discrete Choice Methods with Simulation," Cambridge Books, Cambridge University Press, number 9780521766555.
    3. Garrett van Ryzin & Gustavo Vulcano, 2017. "Technical Note—An Expectation-Maximization Method to Estimate a Rank-Based Choice Model of Demand," Operations Research, INFORMS, vol. 65(2), pages 396-407, April.
    4. Jose Blanchet & Guillermo Gallego & Vineet Goyal, 2016. "A Markov Chain Approximation to Choice Modeling," Operations Research, INFORMS, vol. 64(4), pages 886-905, August.
    5. H.D. Block & Jacob Marschak, 1959. "Random Orderings and Stochastic Theories of Response," Cowles Foundation Discussion Papers 66, Cowles Foundation for Research in Economics, Yale University.
    6. Gérard Biau & Erwan Scornet, 2016. "Rejoinder on: A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 264-268, June.
    7. Vivek F. Farias & Srikanth Jagabathula & Devavrat Shah, 2013. "A Nonparametric Approach to Modeling Choice with Limited Data," Management Science, INFORMS, vol. 59(2), pages 305-322, December.
    8. Huber, Joel & Payne, John W & Puto, Christopher, 1982. "Adding Asymmetrically Dominated Alternatives: Violations of Regularity and the Similarity Hypothesis," Journal of Consumer Research, Journal of Consumer Research Inc., vol. 9(1), pages 90-98, June.
    9. Gérard Biau & Erwan Scornet, 2016. "A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 197-227, June.
    10. Karthik Natarajan & Miao Song & Chung-Piaw Teo, 2009. "Persistency Model and Its Applications in Choice Modeling," Management Science, INFORMS, vol. 55(3), pages 453-469, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yi-Chun Chen & Velibor V. Mišić, 2022. "Decision Forest: A Nonparametric Approach to Modeling Irrational Choice," Management Science, INFORMS, vol. 68(10), pages 7090-7111, October.
    2. Qi Feng & J. George Shanthikumar & Mengying Xue, 2022. "Consumer Choice Models and Estimation: A Review and Extension," Production and Operations Management, Production and Operations Management Society, vol. 31(2), pages 847-867, February.
    3. Sanjay Dominik Jena & Andrea Lodi & Claudio Sole, 2021. "On the estimation of discrete choice models to capture irrational customer behaviors," Papers 2109.03882, arXiv.org.
    4. Guiyun Feng & Xiaobo Li & Zizhuo Wang, 2017. "Technical Note—On the Relation Between Several Discrete Choice Models," Operations Research, INFORMS, vol. 65(6), pages 1516-1525, December.
    5. Sanjay Dominik Jena & Andrea Lodi & Claudio Sole, 2022. "On the Estimation of Discrete Choice Models to Capture Irrational Customer Behaviors," INFORMS Journal on Computing, INFORMS, vol. 34(3), pages 1606-1625, May.
    6. Strauss, Arne K. & Klein, Robert & Steinhardt, Claudius, 2018. "A review of choice-based revenue management: Theory and methods," European Journal of Operational Research, Elsevier, vol. 271(2), pages 375-387.
    7. Kameng Nip & Zhenbo Wang & Zizhuo Wang, 2021. "Assortment Optimization under a Single Transition Choice Model," Production and Operations Management, Production and Operations Management Society, vol. 30(7), pages 2122-2142, July.
    8. Antoine Désir & Vineet Goyal & Danny Segev & Chun Ye, 2020. "Constrained Assortment Optimization Under the Markov Chain–based Choice Model," Management Science, INFORMS, vol. 66(2), pages 698-721, February.
    9. Aydın Alptekinoğlu & John H. Semple, 2021. "Heteroscedastic Exponomial Choice," Operations Research, INFORMS, vol. 69(3), pages 841-858, May.
    10. Badruddoza, Syed & Amin, Modhurima & McCluskey, Jill, 2019. "Assessing the Importance of an Attribute in a Demand System: Structural Model versus Machine Learning," Working Papers 2019-5, School of Economic Sciences, Washington State University.
    11. Shipra Agrawal & Vashist Avadhanula & Vineet Goyal & Assaf Zeevi, 2019. "MNL-Bandit: A Dynamic Learning Approach to Assortment Selection," Operations Research, INFORMS, vol. 67(5), pages 1453-1485, September.
    12. David Muller & Emerson Melo & Ruben Schlotter, 2023. "A Distributionally Robust Random Utility Model," Papers 2303.05888, arXiv.org.
    13. Frank Huettner & Tamer Boyaci & Yalcin Akcay, 2016. "Consumer choice under limited attention when alternatives have different information costs," ESMT Research Working Papers ESMT-16-04_R2, ESMT European School of Management and Technology, revised 28 Feb 2018.
    14. Ali Aouad & Danny Segev, 2021. "Display Optimization for Vertically Differentiated Locations Under Multinomial Logit Preferences," Management Science, INFORMS, vol. 67(6), pages 3519-3550, June.
    15. Gerardo Berbeglia & Agustín Garassino & Gustavo Vulcano, 2022. "A Comparative Empirical Study of Discrete Choice Models in Retail Operations," Management Science, INFORMS, vol. 68(6), pages 4005-4023, June.
    16. Yi-Chun Chen & Dmitry Mitrofanov, 2023. "A Nonparametric Stochastic Set Model: Identification, Optimization, and Prediction," Papers 2302.04354, arXiv.org, revised Jul 2023.
    17. Frank Huettner & Tamer Boyaci & Yalcin Akcay, 2016. "Consumer choice under limited attention when alternatives have different information costs," ESMT Research Working Papers ESMT-16-04_R3, ESMT European School of Management and Technology, revised 26 Sep 2018.
    18. Frank Huettner & Tamer Boyaci & Yalcin Akcay, 2016. "Consumer choice under limited attention when options have different information costs," ESMT Research Working Papers ESMT-16-04, ESMT European School of Management and Technology, revised 04 Oct 2016.
    19. Meng Qi & Ho‐Yin Mak & Zuo‐Jun Max Shen, 2020. "Data‐driven research in retail operations—A review," Naval Research Logistics (NRL), John Wiley & Sons, vol. 67(8), pages 595-616, December.
    20. Dam, Tien Thanh & Ta, Thuy Anh & Mai, Tien, 2023. "Robust maximum capture facility location under random utility maximization models," European Journal of Operational Research, Elsevier, vol. 310(3), pages 1128-1150.
