
Predicting the popularity of tweets using internal and external knowledge: an empirical Bayes type approach

Author

Listed:
  • Wai Hong Tan

    (Universiti Malaysia Kelantan)

  • Feng Chen

    (UNSW Sydney)

Abstract

The problem of tweet popularity prediction, or forecasting the total number of retweets stemming from an ancestral tweet, has attracted considerable interest recently. The prediction can be accomplished by fitting a point process model to the sequence of retweet times up to a certain censoring time and projecting the fitted model to a future time point. However, models employing such an approach tend to have inferior prediction accuracy when the censoring time is too early for sufficient information to have accumulated. To overcome this, we propose an empirical Bayes type approach to parameter estimation that combines internal knowledge on the times of historical retweets up to the censoring time with external knowledge on complete retweet sequences in the training data. We demonstrate the approach using several point process models with finite-dimensional parameters, where the prior distribution for the parameter of each model is constructed based on the external knowledge, and the likelihood is calculated based on the internal knowledge. The mode of the posterior distribution is used as the estimator of the finite-dimensional parameter, and the mean of the predictive distribution for the number of retweets implied by each of the estimated models is used to predict the tweet popularity. Using a large Twitter data set, we show that the proposed methodology not only enables prediction at time zero, before the arrival of any retweet event, but also substantially improves the prediction performance of existing models, especially at earlier censoring times.
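
As a rough illustration of the estimation idea in the abstract, the following minimal sketch uses a toy model that is not taken from the paper: retweets are assumed to follow an inhomogeneous Poisson process with intensity a * decay * exp(-decay * t), where the decay rate is treated as known and a is the expected final retweet count. The Gamma prior on a, its method-of-moments fit to the final counts of training tweets, and all function names are assumptions made for this example. Under these assumptions the posterior of a is again a Gamma distribution, so its mode has a closed form.

```python
import math
from statistics import mean, variance


def fit_gamma_prior(training_totals):
    """External knowledge: method-of-moments Gamma(shape, rate) fit to the
    final retweet counts of complete training sequences (an assumption of
    this sketch, not the paper's prior construction)."""
    m = mean(training_totals)
    v = variance(training_totals)  # sample variance
    return m * m / v, m / v  # (shape, rate)


def map_estimate(n_observed, censor_time, decay, shape, rate):
    """Posterior mode of a, given n_observed retweets before censor_time.

    Internal knowledge: the Poisson-process likelihood in a is proportional
    to a**n_observed * exp(-a * exposure), so the posterior is
    Gamma(shape + n_observed, rate + exposure).
    """
    exposure = 1.0 - math.exp(-decay * censor_time)
    post_shape = shape + n_observed
    post_rate = rate + exposure
    # The Gamma mode is (shape - 1) / rate, or 0 when shape < 1.
    return max(post_shape - 1.0, 0.0) / post_rate


def predict_total(n_observed, censor_time, decay, shape, rate):
    """Plug-in predictive mean: observed count plus the expected number of
    retweets still to come after censor_time under the estimated model."""
    a_hat = map_estimate(n_observed, censor_time, decay, shape, rate)
    return n_observed + a_hat * math.exp(-decay * censor_time)


# Hypothetical training data: final retweet counts of four past tweets.
prior_shape, prior_rate = fit_gamma_prior([10, 20, 30, 40])

# Prediction at time zero, before any retweet has arrived: the likelihood
# is flat, so the estimate falls back on the prior mode.
at_time_zero = predict_total(0, 0.0, 1.0, prior_shape, prior_rate)

# Prediction after observing 12 retweets in the first half time unit.
after_half_unit = predict_total(12, 0.5, 1.0, prior_shape, prior_rate)
```

In this toy setting the prior dominates at early censoring times and its influence shrinks as retweets accumulate, which mirrors the behaviour the abstract describes for short censoring times.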

Suggested Citation

  • Wai Hong Tan & Feng Chen, 2021. "Predicting the popularity of tweets using internal and external knowledge: an empirical Bayes type approach," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 105(2), pages 335-352, June.
  • Handle: RePEc:spr:alstar:v:105:y:2021:i:2:d:10.1007_s10182-021-00390-z
    DOI: 10.1007/s10182-021-00390-z

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10182-021-00390-z
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jabłońska-Sabuka, Matylda & Sitarz, Robert & Kraslawski, Andrzej, 2014. "Forecasting research trends using population dynamics model with Burgers’ type interaction," Journal of Informetrics, Elsevier, vol. 8(1), pages 111-122.
    2. Ali Daud & Muhammad Ahmad & M. S. I. Malik & Dunren Che, 2015. "Using machine learning techniques for rising star prediction in co-author network," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(2), pages 1687-1711, February.
    3. Zhao, Qihang & Feng, Xiaodong, 2022. "Utilizing citation network structure to predict paper citation counts: A Deep learning approach," Journal of Informetrics, Elsevier, vol. 16(1).
    4. Sharad Goel & Ashton Anderson & Jake Hofman & Duncan J. Watts, 2016. "The Structural Virality of Online Diffusion," Management Science, INFORMS, vol. 62(1), pages 180-196, January.
    5. Cui, Hao & Kertész, János, 2023. "“Born in Rome” or “Sleeping Beauty”: Emergence of hashtag popularity on the Chinese microblog Sina Weibo," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 619(C).
    6. Jaebong Son & Jintae Lee & Kai R. Larsen & Jiyoung Woo, 2020. "Understanding the uncertainty of disaster tweets and its effect on retweeting: The perspectives of uncertainty reduction theory and information entropy," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 71(10), pages 1145-1161, October.
    7. Paige Brown Jarreau & Imogene A Cancellare & Becky J Carmichael & Lance Porter & Daniel Toker & Samantha Z Yammine, 2019. "Using selfies to challenge public stereotypes of scientists," PLOS ONE, Public Library of Science, vol. 14(5), pages 1-23, May.
    8. Arora, Anuja & Bansal, Shivam & Kandpal, Chandrashekhar & Aswani, Reema & Dwivedi, Yogesh, 2019. "Measuring social media influencer index- insights from facebook, Twitter and Instagram," Journal of Retailing and Consumer Services, Elsevier, vol. 49(C), pages 86-101.
    9. Son, Jaebong & Lee, Hyung Koo & Jin, Sung & Lee, Jintae, 2019. "Content features of tweets for effective communication during disasters: A media synchronicity theory perspective," International Journal of Information Management, Elsevier, vol. 45(C), pages 56-68.
    10. António Fonseca & Jorge Louçã, 2018. "Explaining the emergence of online popularity through a model of information diffusion," Computational and Mathematical Organization Theory, Springer, vol. 24(2), pages 169-187, June.
    11. Delbianco Fernando & Tohmé Fernando, 2023. "What is a relevant control?: An algorithmic proposal," Asociación Argentina de Economía Política: Working Papers 4643, Asociación Argentina de Economía Política.
    12. Guang Yang & Dungang Liu & Junyuan Wang & Min‐ge Xie, 2016. "Meta‐analysis framework for exact inferences with application to the analysis of rare events," Biometrics, The International Biometric Society, vol. 72(4), pages 1378-1386, December.
    13. La Vecchia, Davide & Moor, Alban & Scaillet, Olivier, 2023. "A higher-order correct fast moving-average bootstrap for dependent data," Journal of Econometrics, Elsevier, vol. 235(1), pages 65-81.
    14. Xiaokang Luo & Tirthankar Dasgupta & Minge Xie & Regina Y. Liu, 2021. "Leveraging the Fisher randomization test using confidence distributions: Inference, combination and fusion learning," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(4), pages 777-797, September.
    15. Zhao, Xiujie & Chen, Piao & Gaudoin, Olivier & Doyen, Laurent, 2021. "Accelerated degradation tests with inspection effects," European Journal of Operational Research, Elsevier, vol. 292(3), pages 1099-1114.
    16. Erlis Ruli & Laura Ventura, 2021. "Can Bayesian, confidence distribution and frequentist inference agree?," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(1), pages 359-373, March.
    17. Piero Veronese & Eugenio Melilli, 2021. "Confidence Distribution for the Ability Parameter of the Rasch Model," Psychometrika, Springer;The Psychometric Society, vol. 86(1), pages 131-166, March.
    18. Veronese, Piero & Melilli, Eugenio, 2018. "Some asymptotic results for fiducial and confidence distributions," Statistics & Probability Letters, Elsevier, vol. 134(C), pages 98-105.
    19. Ganesh Karapakula, 2023. "Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap," Papers 2301.05703, arXiv.org, revised Jan 2023.
    20. Yang Liu & Jan Hannig, 2017. "Generalized Fiducial Inference for Logistic Graded Response Models," Psychometrika, Springer;The Psychometric Society, vol. 82(4), pages 1097-1125, December.
