Incorporating connectivity among Internet search data for enhanced influenza-like illness tracking

Author

Listed:
  • Shaoyang Ning
  • Ahmed Hussain
  • Qing Wang

Abstract

Big data collected from the Internet possess great potential to reveal the ever-changing trends in society. In particular, accurate infectious disease tracking with Internet data has grown in popularity, providing invaluable information for public health decision makers and the general public. However, much of the complex connectivity among the Internet search data is not effectively addressed by existing disease tracking frameworks. To this end, we propose ARGO-C (Augmented Regression with Clustered GOogle data), an integrative, statistically principled approach that incorporates the clustering structure of Internet search data to enhance the accuracy and interpretability of disease tracking. Focusing on multi-resolution %ILI (influenza-like illness) tracking, we demonstrate the improved performance and robustness of ARGO-C over benchmark methods at various geographical resolutions. We also highlight the adaptability of ARGO-C for tracking diseases other than influenza, as well as other social or economic trends.

Suggested Citation

  • Shaoyang Ning & Ahmed Hussain & Qing Wang, 2024. "Incorporating connectivity among Internet search data for enhanced influenza-like illness tracking," PLOS ONE, Public Library of Science, vol. 19(8), pages 1-20, August.
  • Handle: RePEc:plo:pone00:0305579
    DOI: 10.1371/journal.pone.0305579

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0305579
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0305579&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lee, Kyu Ha & Chakraborty, Sounak & Sun, Jianguo, 2017. "Variable selection for high-dimensional genomic data with censored outcomes using group lasso prior," Computational Statistics & Data Analysis, Elsevier, vol. 112(C), pages 1-13.
    2. Borup, Daniel & Rapach, David E. & Schütte, Erik Christian Montes, 2023. "Mixed-frequency machine learning: Nowcasting and backcasting weekly initial claims with daily internet search volume data," International Journal of Forecasting, Elsevier, vol. 39(3), pages 1122-1144.
    3. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    4. Shuichi Kawano, 2014. "Selection of tuning parameters in bridge regression models via Bayesian information criterion," Statistical Papers, Springer, vol. 55(4), pages 1207-1223, November.
    5. Yize Zhao & Matthias Chung & Brent A. Johnson & Carlos S. Moreno & Qi Long, 2016. "Hierarchical Feature Selection Incorporating Known and Novel Biological Information: Identifying Genomic Features Related to Prostate Cancer Recurrence," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1427-1439, October.
    6. Borah, Abhishek & Rutz, Oliver, 2024. "Enhanced sales forecasting model using textual search data: Fusing dynamics with big data," International Journal of Research in Marketing, Elsevier, vol. 41(4), pages 632-647.
    7. James Chapman & Ajit Desai, . "Using payments data to nowcast macroeconomic variables during the onset of Covid-19," Journal of Financial Market Infrastructures, Journal of Financial Market Infrastructures.
    8. Bilin Zeng & Xuerong Meggie Wen & Lixing Zhu, 2017. "A link-free sparse group variable selection method for single-index model," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(13), pages 2388-2400, October.
    9. Capanu, Marinela & Giurcanu, Mihai & Begg, Colin B. & Gönen, Mithat, 2023. "Subsampling based variable selection for generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 184(C).
    10. Yu-Min Yen, 2010. "A Note on Sparse Minimum Variance Portfolios and Coordinate-Wise Descent Algorithms," Papers 1005.5082, arXiv.org, revised Sep 2013.
    11. Tomáš Plíhal, 2021. "Scheduled macroeconomic news announcements and Forex volatility forecasting," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 40(8), pages 1379-1397, December.
    12. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    13. d'Aspremont, Alexandre & Ben Arous, Simon & Bricongne, Jean-Charles & Lietti, Benjamin & Meunier, Baptiste, 2025. "Satellites turn “concrete”: Tracking cement with satellite data and neural networks," Journal of Econometrics, Elsevier, vol. 249(PC).
    14. Osamu Komori & Shinto Eguchi & John B. Copas, 2015. "Generalized t-statistic for two-group classification," Biometrics, The International Biometric Society, vol. 71(2), pages 404-416, June.
    15. Murat Genç & M. Revan Özkale, 2021. "Usage of the GO estimator in high dimensional linear models," Computational Statistics, Springer, vol. 36(1), pages 217-239, March.
    16. Victor Chernozhukov & Christian Hansen & Yuan Liao, 2015. "A lava attack on the recovery of sums of dense and sparse signals," CeMMAP working papers CWP56/15, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    17. Chen, Bingzhen & Zhai, Wenjuan, 2025. "Unified algorithms for distributed regularized linear regression model," Mathematics and Computers in Simulation (MATCOM), Elsevier, vol. 229(C), pages 867-884.
    18. Wang, Shixuan & Syntetos, Aris A. & Liu, Ying & Di Cairano-Gilfedder, Carla & Naim, Mohamed M., 2023. "Improving automotive garage operations by categorical forecasts using a large number of variables," European Journal of Operational Research, Elsevier, vol. 306(2), pages 893-908.
    19. Zhang, Tonglin, 2024. "Variables selection using L0 penalty," Computational Statistics & Data Analysis, Elsevier, vol. 190(C).
    20. Takumi Saegusa & Tianzhou Ma & Gang Li & Ying Qing Chen & Mei-Ling Ting Lee, 2020. "Variable Selection in Threshold Regression Model with Applications to HIV Drug Adherence Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 12(3), pages 376-398, December.
