Printed from https://ideas.repec.org/p/arx/papers/2503.20711.html

Demand Estimation with Text and Image Data

Authors

  • Giovanni Compiani
  • Ilya Morozov
  • Stephan Seiler

Abstract

We propose a demand estimation approach that leverages unstructured data to infer substitution patterns. Using pre-trained deep learning models, we extract embeddings from product images and textual descriptions and incorporate them into a mixed logit demand model. This approach enables demand estimation even when researchers lack data on product attributes or when consumers value hard-to-quantify attributes such as visual design. Using a choice experiment, we show this approach substantially outperforms standard attribute-based models at counterfactual predictions of second choices. We also apply it to 40 product categories offered on Amazon.com and consistently find that unstructured data are informative about substitution patterns.
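As a rough illustration of the pipeline the abstract describes, the sketch below reduces product embeddings to a few principal components, lets normally distributed random coefficients load on those components in a mixed logit, and computes second-choice probabilities (the probability that a consumer whose first choice is product j picks product k once j is removed). Everything in it is an assumption made for illustration: the random matrix stands in for embeddings from a pre-trained model, and the function names, the number of components, the zero mean utilities, and the scale parameters `sigma` are hypothetical rather than the authors' specification.

```python
import numpy as np

def principal_components(embeddings, n_components):
    """Project embeddings (J x D) onto their top principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # SVD of the centered matrix; rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def individual_probs(delta, x, sigma, draws):
    """Logit choice probabilities for each simulated consumer (R x J)."""
    # Utility: mean utility delta_j plus sum_k sigma_k * nu_ik * x_jk.
    util = delta[None, :] + (draws * sigma[None, :]) @ x.T
    util -= util.max(axis=1, keepdims=True)  # numerical stability
    expu = np.exp(util)
    return expu / expu.sum(axis=1, keepdims=True)

def second_choice_probs(delta, x, sigma, draws):
    """S[j, k] = P(second choice is k | first choice is j)."""
    p = individual_probs(delta, x, sigma, draws)
    n_products = p.shape[1]
    out = np.zeros((n_products, n_products))
    for j in range(n_products):
        # Given a consumer's tastes, removing j rescales the remaining
        # logit probabilities by 1 / (1 - p_ij).
        cond = p / (1.0 - p[:, [j]])
        cond[:, j] = 0.0
        # Weight each simulated consumer by their probability of choosing j.
        out[j] = (p[:, [j]] * cond).sum(axis=0) / p[:, j].sum()
    return out

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(6, 32))   # placeholder for real embeddings
x = principal_components(embeddings, n_components=2)
delta = np.zeros(6)                     # hypothetical mean utilities
sigma = np.array([1.0, 0.5])            # hypothetical taste dispersion
draws = rng.normal(size=(2000, 2))      # simulated taste shocks
S = second_choice_probs(delta, x, sigma, draws)
```

With `sigma` set to zero the model collapses to a plain logit, so second choices are proportional to shares; positive `sigma` shifts second choices toward products that sit close together in the principal-component space, which is the channel through which text and image data can inform substitution patterns.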

Suggested Citation

  • Giovanni Compiani & Ilya Morozov & Stephan Seiler, 2025. "Demand Estimation with Text and Image Data," Papers 2503.20711, arXiv.org, revised Feb 2026.
  • Handle: RePEc:arx:papers:2503.20711

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2503.20711
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Goldberg, Pinelopi Koujianou, 1995. "Product Differentiation and Oligopoly in International Markets: The Case of the U.S. Automobile Industry," Econometrica, Econometric Society, vol. 63(4), pages 891-951, July.
    2. Amil Petrin, 2002. "Quantifying the Benefits of New Products: The Case of the Minivan," Journal of Political Economy, University of Chicago Press, vol. 110(4), pages 705-729, August.
    3. Günter J. Hitsch & Ali Hortaçsu & Xiliang Lin, 2021. "Prices and promotions in U.S. retail markets," Quantitative Marketing and Economics (QME), Springer, vol. 19(3), pages 289-368, December.
    4. Giovanni Compiani, 2022. "Market counterfactuals and the specification of multiproduct demand: A nonparametric approach," Quantitative Economics, Econometric Society, vol. 13(2), pages 545-591, May.
    5. Koppelman, Frank S. & Wen, Chieh-Hua, 2000. "The paired combinatorial logit model: properties, estimation and application," Transportation Research Part B: Methodological, Elsevier, vol. 34(2), pages 75-89, February.
    6. Vivek Bhattacharya & Gastón Illanes & David Stillerman, 2023. "Merger Effects and Antitrust Enforcement: Evidence from US Consumer Packaged Goods," NBER Working Papers 31123, National Bureau of Economic Research, Inc.
    7. Hunt Allcott & Benjamin B Lockwood & Dmitry Taubinsky, 2019. "Regressive Sin Taxes, with an Application to the Optimal Soda Tax," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 134(3), pages 1557-1626.
    8. Jerry A. Hausman, 1996. "Valuation of New Goods under Perfect and Imperfect Competition," NBER Chapters, in: The Economics of New Goods, pages 207-248, National Bureau of Economic Research, Inc.
    9. Kelvin J. Lancaster, 1966. "A New Approach to Consumer Theory," Journal of Political Economy, University of Chicago Press, vol. 74(2), pages 132-157.
    10. Daniel McFadden & Kenneth Train, 2000. "Mixed MNL models for discrete response," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 15(5), pages 447-470.
    11. Pinkse, Joris & Slade, Margaret E., 2004. "Mergers, brand competition, and the price of a pint," European Economic Review, Elsevier, vol. 48(3), pages 617-643, June.
    12. Christopher Conlon & Julie Holland Mortimer, 2021. "Empirical properties of diversion ratios," RAND Journal of Economics, RAND Corporation, vol. 52(4), pages 693-726, December.
    13. Hendrik Döpper & Alexander MacKay & Nathan H. Miller & Joel Stiebale, 2025. "Rising Markups and the Role of Consumer Preferences," Journal of Political Economy, University of Chicago Press, vol. 133(8), pages 2462-2505.

    Citations

    Cited by:

    1. Kevin Zielnicki & Guy Aridor & Aurelien Bibaut & Allen Tran & Winston Chou & Nathan Kallus, 2025. "The Value of Personalized Recommendations: Evidence from Netflix," CESifo Working Paper Series 12257, CESifo.
    2. Clément Gorin & Stephan Heblich & Yanos Zylberberg, 2025. "State of the Art: Economic Development Through the Lens of Paintings," NBER Working Papers 33976, National Bureau of Economic Research, Inc.
    3. Annie Liang, 2025. "Using Machine Learning to Generate, Clarify, and Improve Economic Models," Papers 2508.19136, arXiv.org.
    4. Kevin Zielnicki & Guy Aridor & Aurélien Bibaut & Allen Tran & Winston Chou & Nathan Kallus, 2025. "The Value of Personalized Recommendations: Evidence from Netflix," Papers 2511.07280, arXiv.org, revised Mar 2026.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Steven T. Berry & Philip A. Haile, 2024. "Nonparametric Identification of Differentiated Products Demand Using Micro Data," Econometrica, Econometric Society, vol. 92(4), pages 1135-1162, July.
    2. Greene, David & Hossain, Anushah & Hofmann, Julia & Helfand, Gloria & Beach, Robert, 2018. "Consumer willingness to pay for vehicle attributes: What do we Know?," Transportation Research Part A: Policy and Practice, Elsevier, vol. 118(C), pages 258-279.
    3. Bonnet, Céline & Richards, Timothy J., 2016. "Models of Consumer Demand for Differentiated Products," TSE Working Papers 16-741, Toulouse School of Economics (TSE).
    4. Pietro Tebaldi & Alexander Torgovitsky & Hanbin Yang, 2023. "Nonparametric Estimates of Demand in the California Health Insurance Exchange," Econometrica, Econometric Society, vol. 91(1), pages 107-146, January.
    5. Mogens Fosgerau & Julien Monardo & André de Palma, 2024. "The Inverse Product Differentiation Logit Model," American Economic Journal: Microeconomics, American Economic Association, vol. 16(4), pages 329-370, November.
    6. Pereira, Pedro & Ribeiro, Tiago, 2011. "The impact on broadband access to the Internet of the dual ownership of telephone and cable networks," International Journal of Industrial Organization, Elsevier, vol. 29(2), pages 283-293, March.
    7. Lawrence Goulder, 2007. "Distributional and Efficiency Impacts of Increased U.S. Gasoline Taxes," Discussion Papers 07-009, Stanford Institute for Economic Policy Research.
    8. Wang, Ao, 2023. "Sieve BLP: A semi-nonparametric model of demand for differentiated products," Journal of Econometrics, Elsevier, vol. 235(2), pages 325-351.
    9. Andrew B. Bernard & Stephen J. Redding & Peter K. Schott, 2009. "Products and Productivity," Scandinavian Journal of Economics, Wiley Blackwell, vol. 111(4), pages 681-709, December.
    10. Kidokoro, Yukihiro, 2016. "A micro foundation for discrete choice models with multiple categories of goods," Journal of choice modelling, Elsevier, vol. 19(C), pages 54-72.
    11. Afonso Rodrigues, 2025. "Consumer Choice Over Shopping Baskets: A Linear Demand Approach," Papers 2511.11846, arXiv.org.
    12. Bokhari, Farasat A.S. & Mariuzzo, Franco, 2018. "Demand estimation and merger simulations for drugs: Logits v. AIDS," International Journal of Industrial Organization, Elsevier, vol. 61(C), pages 653-685.
    13. Aviv Nevo, 2000. "A Practitioner's Guide to Estimation of Random‐Coefficients Logit Models of Demand," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 9(4), pages 513-548, December.
    14. Steven T. Berry & Philip A. Haile, 2021. "Foundations of Demand Estimation," NBER Working Papers 29305, National Bureau of Economic Research, Inc.
    15. Federico Ciliberto & GianCarlo Moschini & Edward D. Perry, 2019. "Valuing product innovation: genetically engineered varieties in US corn and soybeans," RAND Journal of Economics, RAND Corporation, vol. 50(3), pages 615-644, September.
    16. Martin O'Connell & Pierre Dubois & Rachel Griffith, 2022. "The Use of Scanner Data for Economics Research," Annual Review of Economics, Annual Reviews, vol. 14(1), pages 723-745, August.
    17. Christopher T. Conlon & Julie Holland Mortimer, 2013. "Demand Estimation under Incomplete Product Availability," American Economic Journal: Microeconomics, American Economic Association, vol. 5(4), pages 1-30, November.
    18. W. Ross Morrow & Steven J. Skerlos, 2011. "Fixed-Point Approaches to Computing Bertrand-Nash Equilibrium Prices Under Mixed-Logit Demand," Operations Research, INFORMS, vol. 59(2), pages 328-345, April.
    19. Reiss, Peter C. & Wolak, Frank A., 2003. "Structural Econometric Modeling: Rationales and Examples from Industrial Organization," Research Papers 1831, Stanford University, Graduate School of Business.
    20. Friberg, Richard & Ganslandt, Mattias, 2006. "An empirical assessment of the welfare effects of reciprocal dumping," Journal of International Economics, Elsevier, vol. 70(1), pages 1-24, September.

