IDEAS home Printed from https://ideas.repec.org/p/qed/wpaper/1449.html

The Bigger Picture: Combining Econometrics with Analytics Improve Forecasts of Movie Success

Author

Listed:
  • Steven Lehrer
  • Tian Xie

    (Queen's University)

Abstract

There exists significant hype regarding how much machine learning and incorporating social media data can improve forecast accuracy in commercial applications. To assess whether the hype is warranted, we use data from the film industry in simulation experiments that contrast econometric approaches with tools from the predictive analytics literature. Further, we propose new strategies that combine elements from each literature in a bid to capture richer patterns of heterogeneity in the underlying relationship governing revenue. Our results demonstrate the importance of social media data and the value of hybrid strategies that combine econometrics and machine learning when conducting forecasts with new big data sources. Specifically, while both least squares support vector regression and recursive partitioning strategies greatly outperform dimension reduction strategies and traditional econometric approaches in forecast accuracy, there are further significant gains from using hybrid approaches. Further, Monte Carlo experiments demonstrate that these benefits arise from the significant heterogeneity in how social media measures and other film characteristics influence box office outcomes.

Suggested Citation

  • Steven Lehrer & Tian Xie, 2020. "The Bigger Picture: Combining Econometrics with Analytics Improve Forecasts of Movie Success," Working Paper 1449, Economics Department, Queen's University.
  • Handle: RePEc:qed:wpaper:1449

    Download full text from publisher

    File URL: https://www.econ.queensu.ca/sites/econ.queensu.ca/files/wpaper/qed_wp_1449.pdf
    File Function: First version 2020
    Download Restriction: no

    Citations

    Cited by:

    1. Qiu, Yue, 2021. "Complete subset least squares support vector regression," Economics Letters, Elsevier, vol. 200(C).
    2. Elena Denisova-Schmidt & Martin Huber & Elvira Leontyeva & Anna Solovyeva, 2021. "Combining experimental evidence with machine learning to assess anti-corruption educational campaigns among Russian university students," Empirical Economics, Springer, vol. 60(4), pages 1661-1684, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jordi McKenzie, 2023. "The economics of movies (revisited): A survey of recent literature," Journal of Economic Surveys, Wiley Blackwell, vol. 37(2), pages 480-525, April.
    2. Zhang, Xinyu & Liu, Chu-An, 2023. "Model averaging prediction by K-fold cross-validation," Journal of Econometrics, Elsevier, vol. 235(1), pages 280-301.
    3. Steven Lehrer & Tian Xie, 2017. "Box Office Buzz: Does Social Media Data Steal the Show from Model Uncertainty When Forecasting for Hollywood?," The Review of Economics and Statistics, MIT Press, vol. 99(5), pages 749-755, December.
    4. Xie, Tian, 2017. "Heteroscedasticity-robust model screening: A useful toolkit for model averaging in big data analytics," Economics Letters, Elsevier, vol. 151(C), pages 119-122.
    5. Liao, Jun & Zou, Guohua, 2020. "Corrected Mallows criterion for model averaging," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    6. Sun, Yuying & Hong, Yongmiao & Wang, Shouyang & Zhang, Xinyu, 2023. "Penalized time-varying model averaging," Journal of Econometrics, Elsevier, vol. 235(2), pages 1355-1377.
    7. Xinyu Zhang & Dalei Yu & Guohua Zou & Hua Liang, 2016. "Optimal Model Averaging Estimation for Generalized Linear Models and Generalized Linear Mixed-Effects Models," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1775-1790, October.
    8. Yuting Wei & Qihua Wang & Wei Liu, 2021. "Model averaging for linear models with responses missing at random," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 73(3), pages 535-553, June.
    9. Haowen Bao & Zongwu Cai & Yuying Sun & Shouyang Wang, 2023. "Penalized Model Averaging for High Dimensional Quantile Regressions," WORKING PAPERS SERIES IN THEORETICAL AND APPLIED ECONOMICS 202302, University of Kansas, Department of Economics, revised Jan 2023.
    10. Lehrer, Steven & Xie, Tian & Zhang, Xinyu, 2021. "Social media sentiment, model uncertainty, and volatility forecasting," Economic Modelling, Elsevier, vol. 102(C).
    11. Peng, Jingfu & Yang, Yuhong, 2022. "On improvability of model selection by model averaging," Journal of Econometrics, Elsevier, vol. 229(2), pages 246-262.
    12. Wei, Yuting & Wang, Qihua, 2021. "Cross-validation-based model averaging in linear models with response missing at random," Statistics & Probability Letters, Elsevier, vol. 171(C).
    13. Yan, Xiaodong & Wang, Hongni & Wang, Wei & Xie, Jinhan & Ren, Yanyan & Wang, Xinjun, 2021. "Optimal model averaging forecasting in high-dimensional survival analysis," International Journal of Forecasting, Elsevier, vol. 37(3), pages 1147-1155.
    14. Fang, Fang & Li, Jialiang & Xia, Xiaochao, 2022. "Semiparametric model averaging prediction for dichotomous response," Journal of Econometrics, Elsevier, vol. 229(2), pages 219-245.
    15. Yuying Sun & Shaoxin Hong & Zongwu Cai, 2023. "Optimal Local Model Averaging for Divergent-Dimensional Functional-Coefficient Regressions," WORKING PAPERS SERIES IN THEORETICAL AND APPLIED ECONOMICS 202309, University of Kansas, Department of Economics, revised Sep 2023.
    16. Qiu, Yue & Wang, Zongrun & Xie, Tian & Zhang, Xinyu, 2021. "Forecasting Bitcoin realized volatility by exploiting measurement error under model uncertainty," Journal of Empirical Finance, Elsevier, vol. 62(C), pages 179-201.
    17. Liao, Jun & Zou, Guohua & Gao, Yan & Zhang, Xinyu, 2021. "Model averaging prediction for time series models with a diverging number of parameters," Journal of Econometrics, Elsevier, vol. 223(1), pages 190-221.
    18. Tian Xie, 2019. "Forecast Bitcoin Volatility with Least Squares Model Averaging," Econometrics, MDPI, vol. 7(3), pages 1-20, September.
    19. Liao, Jun & Zong, Xianpeng & Zhang, Xinyu & Zou, Guohua, 2019. "Model averaging based on leave-subject-out cross-validation for vector autoregressions," Journal of Econometrics, Elsevier, vol. 209(1), pages 35-60.
    20. Haili Zhang & Guohua Zou, 2020. "Cross-Validation Model Averaging for Generalized Functional Linear Model," Econometrics, MDPI, vol. 8(1), pages 1-35, February.

    More about this item

    Keywords

    Machine Learning; Model Specification; Heteroskedasticity; Heterogeneity; Social Media; Big Data;

    JEL classification:

    • C52 - Mathematical and Quantitative Methods > Econometric Modeling > Model Evaluation, Validation, and Selection
    • L82 - Industrial Organization > Industry Studies: Services > Entertainment; Media
    • D03 - Microeconomics > General > Behavioral Microeconomics: Underlying Principles
    • M21 - Business Administration and Business Economics; Marketing; Accounting; Personnel Economics > Business Economics > Business Economics
    • C53 - Mathematical and Quantitative Methods > Econometric Modeling > Forecasting and Prediction Models; Simulation Methods
