IDEAS home Printed from https://ideas.repec.org/a/gam/jdataj/v7y2022i1p8-d721364.html

TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels

Author

Listed:
  • Muhammad Imran

    (Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha 34110, Qatar)

  • Umair Qazi

    (Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha 34110, Qatar)

  • Ferda Ofli

    (Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha 34110, Qatar)

Abstract

As the world struggles with several compounded challenges caused by the COVID-19 pandemic in the health, economic, and social domains, timely access to disaggregated national and sub-national data is important for understanding the emergent situation, but such data are difficult to obtain. The widespread usage of social networking sites, especially during mass convergence events such as health emergencies, provides instant access to citizen-generated data offering rich information about public opinions, sentiments, and situational updates useful for authorities to gain insights. We offer a large-scale social sensing dataset comprising two billion multilingual tweets posted from 218 countries by 87 million users in 67 languages. We used state-of-the-art machine learning models to enrich the data with sentiment labels and named entities. Additionally, a gender identification approach is proposed to segregate user gender. Furthermore, a geolocalization approach is devised to geotag tweets at country, state, county, and city granularities, enabling a myriad of data analysis tasks to understand real-world issues at national and sub-national levels. We believe this multilingual data with broader geographical and longer temporal coverage will be a cornerstone for researchers to study the impacts of the ongoing global health catastrophe and to manage adverse consequences related to people’s health, livelihood, and social well-being.

Suggested Citation

  • Muhammad Imran & Umair Qazi & Ferda Ofli, 2022. "TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels," Data, MDPI, vol. 7(1), pages 1-27, January.
  • Handle: RePEc:gam:jdataj:v:7:y:2022:i:1:p:8-:d:721364

    Download full text from publisher

    File URL: https://www.mdpi.com/2306-5729/7/1/8/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2306-5729/7/1/8/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tiantian Gu & Anand Venkateswaran, 2018. "Firm-supplier relations and managerial compensation," Review of Quantitative Finance and Accounting, Springer, vol. 51(3), pages 621-649, October.
    2. Amir Haghighati & Kamran Sedig, 2020. "VARTTA: A Visual Analytics System for Making Sense of Real-Time Twitter Data," Data, MDPI, vol. 5(1), pages 1-25, February.
    3. Hyekyung Woo & Youngtae Cho & Eunyoung Shim & Kihwang Lee & Gilyoung Song, 2015. "Public Trauma after the Sewol Ferry Disaster: The Role of Social Media in Understanding the Public Mood," IJERPH, MDPI, vol. 12(9), pages 1-10, September.
    4. HeeChel Kim & Hong-Woo Chun & Seonho Kim & Byoung-Youl Coh & Oh-Jin Kwon & Yeong-Ho Moon, 2017. "Longitudinal Study-Based Dementia Prediction for Public Health," IJERPH, MDPI, vol. 14(9), pages 1-16, August.
    5. Luis-Millán González & José Devís-Devís & Maite Pellicer-Chenoll & Miquel Pans & Alberto Pardo-Ibañez & Xavier García-Massó & Fernanda Peset & Fernanda Garzón-Farinós & Víctor Pérez-Samaniego, 2021. "The Impact of COVID-19 on Sport in Twitter: A Quantitative and Qualitative Content Analysis," IJERPH, MDPI, vol. 18(9), pages 1-20, April.
    6. Bhandari, Aarushi & Burroway, Rebekah, 2023. "Hold the phone! A cross-national analysis of Women's education, mobile phones, and HIV infections in low- and middle-income countries, 1990–2018," Social Science & Medicine, Elsevier, vol. 334(C).
    7. Deng, Yuping & Wu, Yanrui & Xu, Helian, 2019. "Political turnover and firm pollution discharges: An empirical study," China Economic Review, Elsevier, vol. 58(C).
    8. Wen, Fenghua & Li, Cui & Sha, Han & Shao, Liuguo, 2021. "How does economic policy uncertainty affect corporate risk-taking? Evidence from China," Finance Research Letters, Elsevier, vol. 41(C).
    9. Paolo BRUNORI & Giuliano RESCE, 2020. "Searching for the peak Google Trends and the Covid-19 outbreak in Italy," Working Papers - Economics wp2020_05.rdf, Universita' degli Studi di Firenze, Dipartimento di Scienze per l'Economia e l'Impresa.
    10. Fernando Arias & Ariel Guerra-Adames & Maytee Zambrano & Efraín Quintero-Guerra & Nathalia Tejedor-Flores, 2022. "Analyzing Spanish-Language Public Sentiment in the Context of a Pandemic and Social Unrest: The Panama Case," IJERPH, MDPI, vol. 19(16), pages 1-19, August.
    11. Fantazzini, Dean, 2020. "Short-term forecasting of the COVID-19 pandemic using Google Trends data: Evidence from 158 countries," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 59, pages 33-54.
    12. Ira Puspitasari & Alia Firdauzy, 2019. "Characterizing Consumer Behavior in Leveraging Social Media for E-Patient and Health-Related Activities," IJERPH, MDPI, vol. 16(18), pages 1-17, September.
    13. Anni Arumsari Fitriany & Piotr J. Flatau & Khoirunurrofik Khoirunurrofik & Nelly Florida Riama, 2021. "Assessment on the Use of Meteorological and Social Media Information for Forest Fire Detection and Prediction in Riau, Indonesia," Sustainability, MDPI, vol. 13(20), pages 1-13, October.
    14. Guangyu Hu & Xueyan Han & Huixuan Zhou & Yuanli Liu, 2019. "Public Perception on Healthcare Services: Evidence from Social Media Platforms in China," IJERPH, MDPI, vol. 16(7), pages 1-10, April.
    15. David A. Broniatowski, 2018. "Building the tower without climbing it: Progress in engineering systems," Systems Engineering, John Wiley & Sons, vol. 21(3), pages 259-281, May.
    16. Umar Ali Bukar & Fatimah Sidi & Marzanah A. Jabar & Rozi Nor Haizan Nor & Salfarina Abdullah & Iskandar Ishak & Mustafa Alabadla & Ali Alkhalifah, 2022. "How Advanced Technological Approaches Are Reshaping Sustainable Social Media Crisis Management and Communication: A Systematic Review," Sustainability, MDPI, vol. 14(10), pages 1-26, May.
    17. Erica L. Gallindo & Hobson A. Cruz & Mário W. L. Moreira, 2021. "Critical Examination Using Business Intelligence on the Gender Gap in Information Technology in Brazil," Mathematics, MDPI, vol. 9(15), pages 1-9, August.
    18. Riza Demirer & Rangan Gupta & Hossein Hassani & Xu Huang, 2020. "Time-Varying Risk Aversion and the Profitability of Carry Trades: Evidence from the Cross-Quantilogram," Economies, MDPI, vol. 8(1), pages 1-12, March.
    19. Francis Rathinam & Sayak Khatua & Zeba Siddiqui & Manya Malik & Pallavi Duggal & Samantha Watson & Xavier Vollenweider, 2021. "Using big data for evaluating development outcomes: A systematic map," Campbell Systematic Reviews, John Wiley & Sons, vol. 17(3), September.
    20. Nguyen, Duc Nguyen & Nguyen, Canh Phuc & Dang, Le Phuong Xuan, 2022. "Uncertainty and corporate default risk: Novel evidence from emerging markets," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 78(C).
