IDEAS home Printed from https://ideas.repec.org/a/gam/jdataj/v7y2022i1p8-d721364.html

TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels

Author

Listed:
  • Muhammad Imran

    (Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha 34110, Qatar)

  • Umair Qazi

    (Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha 34110, Qatar)

  • Ferda Ofli

    (Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha 34110, Qatar)

Abstract

As the world struggles with several compounded challenges caused by the COVID-19 pandemic in the health, economic, and social domains, timely access to disaggregated national and sub-national data is important for understanding the emergent situation, but such data are difficult to obtain. The widespread use of social networking sites, especially during mass convergence events such as health emergencies, provides instant access to citizen-generated data offering rich information about public opinions, sentiments, and situational updates that authorities can use to gain insights. We offer a large-scale social sensing dataset comprising two billion multilingual tweets posted from 218 countries by 87 million users in 67 languages. We used state-of-the-art machine learning models to enrich the data with sentiment labels and named entities. Additionally, a gender identification approach is proposed to distinguish users by gender. Furthermore, a geolocalization approach is devised to geotag tweets at country, state, county, and city granularities, enabling a myriad of data analysis tasks to understand real-world issues at national and sub-national levels. We believe this multilingual data with broader geographical and longer temporal coverage will be a cornerstone for researchers to study the impacts of the ongoing global health catastrophe and to manage adverse consequences related to people’s health, livelihood, and social well-being.
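
As a rough illustration of the national and sub-national analysis the abstract describes, the sketch below computes the share of each sentiment label per region at a chosen geographic granularity. The record layout and the field names (`country`, `state`, `sentiment`) are assumptions made for this example only; they are not the dataset's actual schema, and the sample records are invented placeholders.

```python
from collections import Counter, defaultdict

# Hypothetical records: field names and values are illustrative assumptions,
# not the real TBCOV schema or real data.
tweets = [
    {"country": "US", "state": "California", "sentiment": "positive"},
    {"country": "US", "state": "California", "sentiment": "negative"},
    {"country": "US", "state": "Texas", "sentiment": "negative"},
    {"country": "QA", "state": None, "sentiment": "positive"},
]


def sentiment_share(records, level):
    """Return {region: {sentiment_label: fraction}} at one granularity.

    `level` is the name of a geographic field, e.g. "country" or "state".
    Records with no value at that granularity are skipped.
    """
    counts = defaultdict(Counter)
    for record in records:
        region = record.get(level)
        if region is None:
            continue  # tweet was not geotagged at this granularity
        counts[region][record["sentiment"]] += 1

    shares = {}
    for region, counter in counts.items():
        total = sum(counter.values())
        shares[region] = {label: n / total for label, n in counter.items()}
    return shares


by_country = sentiment_share(tweets, "country")
by_state = sentiment_share(tweets, "state")
```

Passing a different field name switches the granularity without changing the aggregation logic, which is the practical benefit of having tweets geotagged at several levels.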

Suggested Citation

  • Muhammad Imran & Umair Qazi & Ferda Ofli, 2022. "TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels," Data, MDPI, vol. 7(1), pages 1-27, January.
  • Handle: RePEc:gam:jdataj:v:7:y:2022:i:1:p:8-:d:721364

    Download full text from publisher

    File URL: https://www.mdpi.com/2306-5729/7/1/8/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2306-5729/7/1/8/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. HeeChel Kim & Hong-Woo Chun & Seonho Kim & Byoung-Youl Coh & Oh-Jin Kwon & Yeong-Ho Moon, 2017. "Longitudinal Study-Based Dementia Prediction for Public Health," IJERPH, MDPI, vol. 14(9), pages 1-16, August.
    2. Luis-Millán González & José Devís-Devís & Maite Pellicer-Chenoll & Miquel Pans & Alberto Pardo-Ibañez & Xavier García-Massó & Fernanda Peset & Fernanda Garzón-Farinós & Víctor Pérez-Samaniego, 2021. "The Impact of COVID-19 on Sport in Twitter: A Quantitative and Qualitative Content Analysis," IJERPH, MDPI, vol. 18(9), pages 1-20, April.
    3. Fernando Arias & Ariel Guerra-Adames & Maytee Zambrano & Efraín Quintero-Guerra & Nathalia Tejedor-Flores, 2022. "Analyzing Spanish-Language Public Sentiment in the Context of a Pandemic and Social Unrest: The Panama Case," IJERPH, MDPI, vol. 19(16), pages 1-19, August.
    4. Fantazzini, Dean, 2020. "Short-term forecasting of the COVID-19 pandemic using Google Trends data: Evidence from 158 countries," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 59, pages 33-54.
    5. Jang-Chul Kim & Sharif Mazumder & Pritam Saha, 2025. "Environmental Risk Concern and Short-Term IPO Performance of Green Stocks During the COVID-19 Crisis Period," JRFM, MDPI, vol. 18(3), pages 1-27, March.
    6. Riza Demirer & Rangan Gupta & Hossein Hassani & Xu Huang, 2020. "Time-Varying Risk Aversion and the Profitability of Carry Trades: Evidence from the Cross-Quantilogram," Economies, MDPI, vol. 8(1), pages 1-12, March.
    7. Francis Rathinam & Sayak Khatua & Zeba Siddiqui & Manya Malik & Pallavi Duggal & Samantha Watson & Xavier Vollenweider, 2021. "Using big data for evaluating development outcomes: A systematic map," Campbell Systematic Reviews, John Wiley & Sons, vol. 17(3), September.
    8. Nguyen, Duc Nguyen & Nguyen, Canh Phuc & Dang, Le Phuong Xuan, 2022. "Uncertainty and corporate default risk: Novel evidence from emerging markets," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 78(C).
    9. Holden, Stein T. & Tilahun, Mesfin, 2021. "Mobile phones, leadership and gender in rural business groups," World Development Perspectives, Elsevier, vol. 24(C).
    10. Hongying Dai & Brian R. Lee & Jianqiang Hao, 2017. "Predicting Asthma Prevalence by Linking Social Media Data and Traditional Surveys," The ANNALS of the American Academy of Political and Social Science, vol. 669(1), pages 75-92, January.
    11. Zeynep Ertem & Dorrie Raymond & Lauren Ancel Meyers, 2018. "Optimal multi-source forecasting of seasonal influenza," PLOS Computational Biology, Public Library of Science, vol. 14(9), pages 1-16, September.
    12. Jose L Herrera & Ravi Srinivasan & John S Brownstein & Alison P Galvani & Lauren Ancel Meyers, 2016. "Disease Surveillance on Complex Social Networks," PLOS Computational Biology, Public Library of Science, vol. 12(7), pages 1-16, July.
    13. Ibrahim Musa & Hyun Woo Park & Lkhagvadorj Munkhdalai & Keun Ho Ryu, 2018. "Global Research on Syndromic Surveillance from 1993 to 2017: Bibliometric Analysis and Visualization," Sustainability, MDPI, vol. 10(10), pages 1-20, September.
    14. Shiferaw, Yegnanew A., 2024. "A spatial analysis of the digital gender gap in South Africa: Are there any fundamental differences?," Technological Forecasting and Social Change, Elsevier, vol. 204(C).
    15. Zhang, Ning & Su, Xiaoman & Qi, Shuyuan, 2023. "An empirical investigation of multiperiod tail risk forecasting models," International Review of Financial Analysis, Elsevier, vol. 86(C).
    16. Isaac Chun-Hai Fung & Jingjing Yin & Keisha D. Pressley & Carmen H. Duke & Chen Mo & Hai Liang & King-Wa Fu & Zion Tsz Ho Tse & Su-I Hou, 2019. "Pedagogical Demonstration of Twitter Data Analysis: A Case Study of World AIDS Day, 2014," Data, MDPI, vol. 4(2), pages 1-12, June.
    17. Paolo Brunori & Giuliano Resce, 2020. "Searching for the peak Google Trends and the Covid-19 outbreak in Italy," SERIES 04-2020, Dipartimento di Economia e Finanza - Università degli Studi di Bari "Aldo Moro", revised Apr 2020.
    18. Fu, Tianwen & Zhuang, Xinkai & Hui, Yongchang & Liu, Jia, 2017. "Convex risk measures based on generalized lower deviation and their applications," International Review of Financial Analysis, Elsevier, vol. 52(C), pages 27-37.
    19. Leonardo Iania & Robbe Collage & Michiel Vereycken, 2023. "The Impact of Uncertainty in Macroeconomic Variables on Stock Returns in the USA," JRFM, MDPI, vol. 16(3), pages 1-15, March.
    20. Nason Maani Hessari & May CI van Schalkwyk & Sian Thomas & Mark Petticrew, 2019. "Alcohol Industry CSR Organisations: What Can Their Twitter Activity Tell Us about Their Independence and Their Priorities? A Comparative Analysis," IJERPH, MDPI, vol. 16(5), pages 1-12, March.
