IDEAS home Printed from https://ideas.repec.org/a/plo/pdig00/0000021.html

Predicting zip code-level vaccine hesitancy in US Metropolitan Areas using machine learning models on public tweets

Authors
  • Sara Melotte
  • Mayank Kejriwal

Abstract

Although the recent rise and uptake of COVID-19 vaccines in the United States has been encouraging, there continues to be significant vaccine hesitancy in various geographic and demographic clusters of the adult population. Surveys, such as the one conducted by Gallup over the past year, can be useful in determining vaccine hesitancy, but they can be expensive to conduct and do not provide real-time data. At the same time, the advent of social media suggests that it may be possible to get vaccine hesitancy signals at an aggregate level, such as at the level of zip codes. Theoretically, machine learning models can be learned using socioeconomic (and other) features from publicly available sources. Experimentally, it remains an open question whether such an endeavor is feasible, and how it would compare to non-adaptive baselines. In this article, we present a proper methodology and experimental study for addressing this question. We use publicly available Twitter data collected over the previous year. Our goal is not to devise novel machine learning algorithms, but to rigorously evaluate and compare established models. Here we show that the best models significantly outperform non-learning baselines. They can also be set up using open-source tools and software.

Author summary: The rapid development of COVID-19 vaccines has been touted as a miracle of modern medicine and industry-government partnerships. Unfortunately, vaccine hesitancy has been stubbornly high in many countries, including within the United States. Surveys, such as those conducted by organizations like Gallup, have historically proved to be useful tools with which to poll a representative sample of people on vaccine sentiment, but they can be expensive to administer and tend to rely on small sample sizes. As a complementary alternative, we propose to use publicly streaming data from Twitter to quantify vaccine hesitancy at the level of zip codes in near real-time. Unfortunately, the noise and bias often present in social media raise the practical question of whether a cost-effective approach is feasible. In this article, we present both a methodology and an experimental study for addressing these challenges using simple machine learning models. Our goal is not to devise novel algorithms but to rigorously evaluate and compare established models. Using public US-based Twitter data collected in the wake of the pandemic, we find that learning-based models outperform non-learning-based models by significant margins. The methods we evaluate can easily be set up using open-source software packages.
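
The comparison the abstract describes, a learned model evaluated against a non-learning baseline, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the single feature stands in for a hypothetical zip code-level signal, and the one-variable least-squares fit and mean-value baseline are not the authors' models, features, or results. It only shows the shape of such an evaluation on a held-out split.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mae(preds, targets):
    """Mean absolute error between predictions and targets."""
    return mean(abs(p - t) for p, t in zip(preds, targets))


random.seed(0)

# Synthetic stand-in data (assumption): one feature per hypothetical zip code
# and a hesitancy rate that depends on it linearly, plus Gaussian noise.
n = 200
feature = [random.uniform(0, 1) for _ in range(n)]
hesitancy = [0.2 + 0.3 * x + random.gauss(0, 0.03) for x in feature]

# Hold out the last 50 points for evaluation.
train_x, test_x = feature[:150], feature[150:]
train_y, test_y = hesitancy[:150], hesitancy[150:]

# Non-learning baseline: always predict the mean of the training targets.
baseline_pred = [mean(train_y)] * len(test_y)

# Learned model: a line fitted on the training split only.
slope, intercept = fit_line(train_x, train_y)
model_pred = [slope * x + intercept for x in test_x]

baseline_mae = mae(baseline_pred, test_y)
model_mae = mae(model_pred, test_y)
print(f"baseline MAE: {baseline_mae:.4f}")
print(f"model MAE:    {model_mae:.4f}")
```

On this synthetic data the fitted line has a lower held-out error than the constant baseline, because the target was constructed to depend on the feature. With real, noisy social media data the gap has to be measured rather than assumed, which is the question the article sets out to study.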

Suggested Citation

  • Sara Melotte & Mayank Kejriwal, 2022. "Predicting zip code-level vaccine hesitancy in US Metropolitan Areas using machine learning models on public tweets," PLOS Digital Health, Public Library of Science, vol. 1(4), pages 1-19, April.
  • Handle: RePEc:plo:pdig00:0000021
    DOI: 10.1371/journal.pdig.0000021

    Download full text from publisher

    File URL: https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000021
    Download Restriction: no

    File URL: https://journals.plos.org/digitalhealth/article/file?id=10.1371/journal.pdig.0000021&type=printable
    Download Restriction: no

