Printed from https://ideas.repec.org/a/plo/pone00/0071226.html

Early Prediction of Movie Box Office Success Based on Wikipedia Activity Big Data

Authors

  • Márton Mestyán
  • Taha Yasseri
  • János Kertész

Abstract

The use of socially generated “big data” to access information about the collective states of mind of human societies has become a new paradigm in the emerging field of computational social science. A natural application is predicting society's reaction to a new product, in terms of its popularity and adoption rate. However, bridging the gap between “real-time monitoring” and “early prediction” remains a major challenge. Here we report on an effort to build a minimalistic predictive model for the financial success of movies based on the collective activity data of online users. We show that the popularity of a movie can be predicted well before its release by measuring and analyzing the activity level of editors and viewers of the movie's entry in Wikipedia, the well-known online encyclopedia.
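
The abstract does not spell out the form of the model, so the following is only an illustrative sketch of what a "minimalistic predictive model" of this kind could look like, not the authors' method. It assumes a single predictor (pre-release page views of the movie's Wikipedia entry) and a straight-line fit in log-log space. The function names `fit_line` and `predict_revenue` are hypothetical, and all numbers are invented toy values, not data from the paper.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_revenue(page_views, slope, intercept):
    """Map a page-view count to a revenue estimate via the log-log fit."""
    return 10 ** (slope * math.log10(page_views) + intercept)


# Invented toy data (assumption): pre-release page views and box office
# revenue for four past movies. These are NOT figures from the paper.
past_views = [1_000, 10_000, 100_000, 1_000_000]
past_revenue = [1e5, 1e6, 1e7, 1e8]

# Fit in log-log space, where heavy-tailed quantities are easier to model.
log_views = [math.log10(v) for v in past_views]
log_revenue = [math.log10(r) for r in past_revenue]
slope, intercept = fit_line(log_views, log_revenue)

# Estimate revenue for an unreleased movie from its current page views.
estimate = predict_revenue(50_000, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} estimate={estimate:,.0f}")
```

A fuller model along the lines the abstract describes would presumably also include editor activity as a predictor; the single-variable fit is kept here only to make the idea concrete.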

Suggested Citation

  • Márton Mestyán & Taha Yasseri & János Kertész, 2013. "Early Prediction of Movie Box Office Success Based on Wikipedia Activity Big Data," PLOS ONE, Public Library of Science, vol. 8(8), pages 1-8, August.
  • Handle: RePEc:plo:pone00:0071226
    DOI: 10.1371/journal.pone.0071226

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0071226
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0071226&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wang, Zhiqi & Chen, Yue & Glänzel, Wolfgang, 2020. "Preprints as accelerator of scholarly communication: An empirical analysis in Mathematics," Journal of Informetrics, Elsevier, vol. 14(4).
    2. Ortega, José Luis, 2018. "The life cycle of altmetric impact: A longitudinal study of six metrics from PlumX," Journal of Informetrics, Elsevier, vol. 12(3), pages 579-589.
    3. Victoria Tur-Viñes & Jesús Segarra-Saavedra & Tatiana Hidalgo-Marí, 2018. "Use of Twitter in Spanish Communication Journals," Publications, MDPI, vol. 6(3), pages 1-10, July.
    4. Xianwen Wang & Wenli Mao & Shenmeng Xu & Chunbo Zhang, 2014. "Usage history of scientific literature: Nature metrics and metrics of Nature publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 98(3), pages 1923-1933, March.
    5. Brito, Ana C.M. & Silva, Filipi N. & de Arruda, Henrique F. & Comin, Cesar H. & Amancio, Diego R. & Costa, Luciano da F., 2021. "Classification of abrupt changes along viewing profiles of scientific articles," Journal of Informetrics, Elsevier, vol. 15(2).
    6. Juan C. Correa & Henry Laverde-Rojas & Julian Tejada & Fernando Marmolejo-Ramos, 2022. "The Sci-Hub effect on papers’ citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(1), pages 99-126, January.
    7. Liwen Vaughan & Juan Tang & Rongbin Yang, 2017. "Investigating disciplinary differences in the relationships between citations and downloads," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(3), pages 1533-1545, June.
    8. Zhiqi Wang & Wolfgang Glänzel & Yue Chen, 2020. "The impact of preprints in Library and Information Science: an analysis of citations, usage and social attention indicators," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(2), pages 1403-1423, November.
    9. Zohreh Zahedi & Rodrigo Costas & Paul Wouters, 2014. "How well developed are altmetrics? A cross-disciplinary analysis of the presence of ‘alternative metrics’ in scientific publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(2), pages 1491-1513, November.
    10. Akella, Akhil Pandey & Alhoori, Hamed & Kondamudi, Pavan Ravikanth & Freeman, Cole & Zhou, Haiming, 2021. "Early indicators of scientific impact: Predicting citations with altmetrics," Journal of Informetrics, Elsevier, vol. 15(2).
    11. Xianwen Wang & Zhi Wang & Shenmeng Xu, 2013. "Tracing scientist’s research trends realtimely," Scientometrics, Springer;Akadémiai Kiadó, vol. 95(2), pages 717-729, May.
    12. Andrew Wright, 2015. "Defending the Ivory Tower against the end of the world," Journal of Environmental Studies and Sciences, Springer;Association of Environmental Studies and Sciences, vol. 5(1), pages 66-69, March.
    13. Sorin Matei & Nicolas Jullien & Amira Rezgui & Diane Jackson, 2019. "The evolution of online co-production groups and its effects on content quality," Post-Print hal-01985702, HAL.
    14. Barbara McGillivray & Mathias Astell, 2019. "The relationship between usage and citations in an open access mega-journal," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(2), pages 817-838, November.
    15. Niccolò Casnici & Pierpaolo Dondio & Roberto Casarin & Flaminio Squazzoni, 2015. "Decrypting Financial Markets through E-Joint Attention Efforts: On-Line Adaptive Networks of Investors in Periods of Market Uncertainty," PLOS ONE, Public Library of Science, vol. 10(8), pages 1-15, August.
    16. Herm, Steffen & Callsen-Bracker, Hans-Markus & Kreis, Henning, 2014. "When the crowd evaluates soccer players’ market values: Accuracy and evaluation attributes of an online community," Sport Management Review, Elsevier, vol. 17(4), pages 484-492.
    17. Tang, Xuli & Li, Xin & Ding, Ying & Song, Min & Bu, Yi, 2020. "The pace of artificial intelligence innovations: Speed, talent, and trial-and-error," Journal of Informetrics, Elsevier, vol. 14(4).
    18. Beatriz Barros & Ana Fernández-Zubieta & Raul Fidalgo-Merino & Francisco Triguero, 2018. "Scientific knowledge percolation process and social impact: A case study on the biotechnology and microbiology perceptions on Twitter," Science and Public Policy, Oxford University Press, vol. 45(6), pages 804-814.
    19. Kim Holmberg & Mike Thelwall, 2014. "Disciplinary differences in Twitter scholarly communication," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(2), pages 1027-1042, November.
    20. Mike Thelwall, 2018. "Early Mendeley readers correlate with later citation counts," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(3), pages 1231-1240, June.
