IDEAS home Printed from https://ideas.repec.org/p/jmp/jm2020/pje208.html

Identifying Consumer Preferences from User- and Crowd-Generated Digital Footprints on Amazon.com by Leveraging Machine Learning and Natural Language Processing

Author

Listed:
  • Jikhan Jeong

Abstract

Inexperienced consumers may face high uncertainty about experience goods that require technical knowledge and skills to operate effectively; experienced consumers' prior reviews can therefore be useful to them. However, a one-sided review system (e.g., Amazon.com) only lets consumers write reviews as buyers and contains no feedback from the seller's side, so the information displayed about individual buyers is limited. This study analyzes consumers' digital footprints (DFs) to identify and predict unobserved consumer preferences from online product reviews. It uses Python and high-performance computing to extract reviewers' DFs for a specific product group (programmable thermostats) from a dataset of 141 million Amazon reviews. It identifies consumers' sentiment toward product content dimensions (PCDs) extracted from review text by applying topic modeling and domain expert annotations. Questionable reviews (those posted by 'suspicious one-time reviewers' and 'always-the-same rating reviewers') are excluded.

This paper obtains three main results. First, I find that the factors that affect consumer ratings are: (a) users' DFs (e.g., length of the product review, average rating across all categories, volume of prior reviews overall and in sub-categories), (b) reviewers' attitudes toward eight product content dimensions (smart connectivity, easiness, energy saving, functionality, support, price value, privacy, and the Amazon effect), and (c) other prior reviewers' DFs (e.g., length of the review summary). All the heteroskedastic ordered probit models with DF and sentiment variables fit better than the base model. This paper is the first to identify the effect of the service quality of the online platform (Amazon.com) on ratings.

Second, extreme gradient boosting (XGBoost) obtains the highest F1 score for predicting the ratings of potential consumers before they make a purchase or write a review. All the models containing DF and sentiment variables predict better than the base model. Classifications with fewer labels (three-class or binary) predict better than the five-star rating classification. However, performance for the minority class is low.

Third, a convolutional neural network (CNN) on top of Bidirectional Encoder Representations from Transformers (BERT) embeddings shows the highest F1 score for classifying consumers' sentiment toward a specific PCD. Overall, the approach developed in this paper is applicable, scalable, and interpretable for distinguishing important drivers of consumer reviews for different goods in a specific industry, and it can be used by industry to identify and predict unobserved consumer preferences and sentiment associated with product content dimensions.
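The DF extraction and review filtering described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the record fields (`reviewer`, `category`, `rating`, `text`), the function names, and the toy data are all hypothetical, and the rule that treats every one-time reviewer as suspicious is a simplifying assumption, since the abstract does not state the exclusion criteria.

```python
from collections import defaultdict
from statistics import mean

# Toy review records; field names and values are illustrative assumptions.
reviews = [
    {"reviewer": "A", "category": "thermostat", "rating": 5,
     "text": "Easy to install and saves energy."},
    {"reviewer": "A", "category": "books", "rating": 3, "text": "Fine."},
    {"reviewer": "A", "category": "thermostat", "rating": 4,
     "text": "Wifi setup was confusing at first."},
    {"reviewer": "B", "category": "thermostat", "rating": 1, "text": "Broken."},
    {"reviewer": "C", "category": "thermostat", "rating": 5, "text": "Great!"},
    {"reviewer": "C", "category": "toys", "rating": 5, "text": "Great!"},
]


def group_by_reviewer(records):
    """Collect each reviewer's full review history across categories."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["reviewer"]].append(record)
    return grouped


def is_questionable(history, min_reviews=2):
    """Assumed rule: flag one-time reviewers and constant-rating reviewers."""
    one_time = len(history) < min_reviews
    same_rating = len({record["rating"] for record in history}) == 1
    return one_time or same_rating


def digital_footprint(history, target_category):
    """Compute a few DF variables of the kind named in the abstract."""
    in_category = [r for r in history if r["category"] == target_category]
    word_counts = [len(r["text"].split()) for r in in_category]
    return {
        "n_reviews_total": len(history),
        "n_reviews_in_category": len(in_category),
        "avg_rating_all": mean([r["rating"] for r in history]),
        "avg_review_words": mean(word_counts) if word_counts else 0.0,
    }


def build_features(records, target_category):
    """Return DF variables for non-questionable reviewers of one category."""
    features = {}
    for reviewer, history in group_by_reviewer(records).items():
        if is_questionable(history):
            continue
        if not any(r["category"] == target_category for r in history):
            continue
        features[reviewer] = digital_footprint(history, target_category)
    return features


if __name__ == "__main__":
    print(build_features(reviews, "thermostat"))
```

In this toy data only reviewer A survives the filter: B has a single review and C always gives the same rating. At the scale of the full dataset, a chunked or distributed version of the same grouping step would presumably be needed.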

Suggested Citation

  • Jikhan Jeong, 2020. "Identifying Consumer Preferences from User- and Crowd-Generated Digital Footprints on Amazon.com by Leveraging Machine Learning and Natural Language Processing," 2020 Papers pje208, Job Market Papers.
  • Handle: RePEc:jmp:jm2020:pje208

    Download full text from publisher

    File URL: https://ideas.repec.org/jmp/2020/pje208.pdf
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Greiff, Matthias & Paetzel, Fabian, 2020. "Information about average evaluations spurs cooperation: An experiment on noisy reputation systems," Journal of Economic Behavior & Organization, Elsevier, vol. 180(C), pages 334-356.
    2. Arslan Aziz & Hui Li & Rahul Telang, 2023. "The Consequences of Rating Inflation on Platforms: Evidence from a Quasi-Experiment," Information Systems Research, INFORMS, vol. 34(2), pages 590-608, June.
    3. Gesche, Tobias, 2018. "Reference Price Shifts and Customer Antagonism: Evidence from Reviews for Online Auctions," VfS Annual Conference 2018 (Freiburg, Breisgau): Digital Economy 181650, Verein für Socialpolitik / German Economic Association.
    4. Dominik Gutt & Jürgen Neumann & Wael Jabr & Dennis Kundisch, 2020. "The Fate of the App: Economic Implications of Updating under Reputation Resetting," Working Papers Dissertations 76, Paderborn University, Faculty of Business Administration and Economics.
    5. Vollaard, Ben & van Ours, Jan C., 2022. "Bias in expert product reviews," Journal of Economic Behavior & Organization, Elsevier, vol. 202(C), pages 105-118.
    6. Ana Babić Rosario & Kristine Valck & Francesca Sotgiu, 2020. "Conceptualizing the electronic word-of-mouth process: What we know and need to know about eWOM creation, exposure, and evaluation," Journal of the Academy of Marketing Science, Springer, vol. 48(3), pages 422-448, May.
    7. Christoph Carnehl & Maximilian Schaefer & André Stenzel & Kevin Ducbao Tran, 2022. "Value for Money and Selection: How Pricing Affects Airbnb Ratings," Working Papers 684, IGIER (Innocenzo Gasparini Institute for Economic Research), Bocconi University.
    8. Apostolos Filippas & John J. Horton & Joseph M. Golden, 2022. "Reputation Inflation," Marketing Science, INFORMS, vol. 41(4), pages 733-745, July.
    9. Limin Fang, 2022. "The Effects of Online Review Platforms on Restaurant Revenue, Consumer Learning, and Welfare," Management Science, INFORMS, vol. 68(11), pages 8116-8143, November.
    10. Brett Hollenbeck & Sridhar Moorthy & Davide Proserpio, 2019. "Advertising Strategy in the Presence of Reviews: An Empirical Analysis," Marketing Science, INFORMS, vol. 38(5), pages 793-811, September.
    11. Xitong Li, 2018. "Impact of Average Rating on Social Media Endorsement: The Moderating Role of Rating Dispersion and Discount Threshold," Information Systems Research, INFORMS, vol. 29(3), pages 739-754, September.
    12. Chinonso E. Etumnu & Kenneth Foster & Nicole O. Widmar & Jayson L. Lusk & David L. Ortega, 2020. "Does the distribution of ratings affect online grocery sales? Evidence from Amazon," Agribusiness, John Wiley & Sons, Ltd., vol. 36(4), pages 501-521, October.
    13. Richards, Timothy J. & Tiwari, Ashutosh, 2014. "Social Networks and Restaurant Choice," 2014 AAEA/EAAE/CAES Joint Symposium: Social Networks, Social Media and the Economics of Food, May 29-30, 2014, Montreal, Canada 166112, Agricultural and Applied Economics Association.
    14. Tommaso Bondi, 2019. "Alone, Together. Product Discovery Through Consumer Ratings," Working Papers 19-09, NET Institute.
    15. Martin, Simon & Shelegia, Sandro, 2021. "Underpromise and overdeliver? - Online product reviews and firm pricing," International Journal of Industrial Organization, Elsevier, vol. 79(C).
    16. Davide Proserpio & Wendy Xu & Georgios Zervas, 2018. "You get what you give: theory and evidence of reciprocity in the sharing economy," Quantitative Marketing and Economics (QME), Springer, vol. 16(4), pages 371-407, December.
    17. Andrey Fradkin & David Holtz, 2023. "Do Incentives to Review Help the Market? Evidence from a Field Experiment on Airbnb," Marketing Science, INFORMS, vol. 42(5), pages 853-865, September.
    18. Krügel, Jan Philipp & Paetzel, Fabian, 2021. "The Impact of Fake Reviews on Reputation Systems and Efficiency," VfS Annual Conference 2021 (Virtual Conference): Climate Economics 242415, Verein für Socialpolitik / German Economic Association.
    19. Sarigul, Sercan & Rui, Huaxia, 2014. "Nowcasting Obesity in the U.S. Using Google Search Volume Data," 2014 AAEA/EAAE/CAES Joint Symposium: Social Networks, Social Media and the Economics of Food, May 29-30, 2014, Montreal, Canada 166113, Agricultural and Applied Economics Association.
    20. Ishita Chakraborty & Minkyung Kim & K. Sudhir, 2019. "Attribute Sentiment Scoring With Online Text Reviews : Accounting for Language Structure and Attribute Self-Selection," Cowles Foundation Discussion Papers 2176R2, Cowles Foundation for Research in Economics, Yale University, revised Jun 2021.

    More about this item

    JEL classification:

    • D80 - Microeconomics - - Information, Knowledge, and Uncertainty - - - General
    • M21 - Business Administration and Business Economics; Marketing; Accounting; Personnel Economics - - Business Economics - - - Business Economics
    • M31 - Business Administration and Business Economics; Marketing; Accounting; Personnel Economics - - Marketing and Advertising - - - Marketing
    • C45 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - Neural Networks and Related Topics

