A Structured Analysis of Unstructured Big Data by Leveraging Cloud Computing

Author

Listed:
  • Xiao Liu

    (Stern School of Business, New York University, New York, New York 10012)

  • Param Vir Singh

    (Carnegie Mellon University, Pittsburgh, Pennsylvania 15213)

  • Kannan Srinivasan

    (Tepper School of Business, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213)

Abstract

Accurate forecasting of sales/consumption is particularly important for marketing because this information can be used to adjust marketing budget allocations and overall marketing strategies. Recently, online social platforms have produced an unparalleled amount of data on consumer behavior. However, two challenges have limited the use of these data in obtaining meaningful business marketing insights. First, the data are typically in an unstructured format, such as texts, images, audio, and video. Second, the sheer volume of the data makes standard analysis procedures computationally unworkable. In this study, we combine methods from cloud computing, machine learning, and text mining to illustrate how online platform content, such as Twitter, can be effectively used for forecasting. We conduct our analysis on a significant volume of nearly two billion Tweets and 400 billion Wikipedia pages. Our main findings emphasize that, by contrast to basic surface-level measures such as the volume of or sentiments in Tweets, the information content of Tweets and their timeliness significantly improve forecasting accuracy. Our method endogenously summarizes the information in Tweets. The advantage of our method is that the classification of the Tweets is based on what is in the Tweets rather than preconceived topics that may not be relevant. We also find that, by contrast to Twitter, other online data (e.g., Google Trends, Wikipedia views, IMDB reviews, and Huffington Post news) are very weak predictors of TV show demand because users tweet about TV shows before, during, and after a TV show, whereas Google searches, Wikipedia views, IMDB reviews, and news posts typically lag behind the show. Data, as supplemental material, are available at http://dx.doi.org/10.1287/mksc.2015.0972.

Suggested Citation

  • Xiao Liu & Param Vir Singh & Kannan Srinivasan, 2016. "A Structured Analysis of Unstructured Big Data by Leveraging Cloud Computing," Marketing Science, INFORMS, vol. 35(3), pages 363-388, May.
  • Handle: RePEc:inm:ormksc:v:35:y:2016:i:3:p:363-388
    DOI: 10.1287/mksc.2015.0972

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/mksc.2015.0972
    Download Restriction: no



