
Assessing Spurious Correlations in Big Search Data

Author

Listed:
  • Jesse T. Richman

    (Department of Political Science and Geography, Old Dominion University, BAL 7000, Norfolk, VA 23529, USA)

  • Ryan J. Roberts

    (Department of Public Service, Gardner-Webb University, Boiling Springs, NC 28017, USA)

Abstract

Big search data offers the opportunity to identify new and potentially real-time measures and predictors of important political, geographic, social, cultural, economic, and epidemiological phenomena, measures that might serve an important role as leading indicators in forecasts and nowcasts. However, it also presents vast new risks that scientists or the public will identify meaningless and totally spurious ‘relationships’ between variables. This study is the first to quantify that risk in the context of search data. We find that spurious correlations arise at exceptionally high frequencies among probability distributions examined for random variables based upon gamma (1, 1) and Gaussian random walk distributions. Quantifying these spurious correlations and their likely magnitude for various distributions has value for several reasons. First, analysts can make progress toward accurate inference. Second, they can avoid unwarranted credulity. Third, they can demand appropriate disclosure from the study authors.
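The risk the abstract describes can be illustrated with a small simulation. The sketch below is not the authors' procedure. It assumes a hypothetical setup in which one target series is compared against many independently generated candidate series, and the largest absolute Pearson correlation is reported. The function names, the series length (50), and the number of candidates (500) are illustrative choices, not values taken from the paper.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def random_walk(rng, n):
    """Gaussian random walk: cumulative sum of standard normal steps."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def gamma_series(rng, n):
    """Independent draws from a gamma distribution with shape 1 and scale 1."""
    return [rng.gammavariate(1.0, 1.0) for _ in range(n)]


def best_abs_correlation(rng, make_series, n_obs, n_candidates):
    """Largest |r| between one target series and many unrelated candidates.

    Every series is generated independently, so any correlation found
    here is spurious by construction.
    """
    target = make_series(rng, n_obs)
    return max(
        abs(pearson(target, make_series(rng, n_obs)))
        for _ in range(n_candidates)
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    for name, maker in [("gamma(1, 1)", gamma_series),
                        ("Gaussian random walk", random_walk)]:
        best = best_abs_correlation(rng, maker, n_obs=50, n_candidates=500)
        print(f"{name}: largest spurious |r| = {best:.3f}")
```

Under these assumptions, the best match among unrelated random walks is typically far stronger than the best match among unrelated gamma draws, because each walk accumulates its own random trend. Increasing `n_candidates` raises the largest spurious correlation in both cases, which is why the size of the searched pool matters when a reported correlation is evaluated.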

Suggested Citation

  • Jesse T. Richman & Ryan J. Roberts, 2023. "Assessing Spurious Correlations in Big Search Data," Forecasting, MDPI, vol. 5(1), pages 1-12, February.
  • Handle: RePEc:gam:jforec:v:5:y:2023:i:1:p:15-296:d:1082814

    Download full text from publisher

    File URL: https://www.mdpi.com/2571-9394/5/1/15/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2571-9394/5/1/15/
    Download Restriction: no

    References listed on IDEAS

    1. Adrian Letchford & Tobias Preis & Helen Susannah Moat, 2016. "Quantifying the Search Behaviour of Different Demographics Using Google Correlate," PLOS ONE, Public Library of Science, vol. 11(2), pages 1-11, February.
    2. Hyunyoung Choi & Hal Varian, 2012. "Predicting the Present with Google Trends," The Economic Record, The Economic Society of Australia, vol. 88(s1), pages 2-9, June.
    3. Jeremy Ginsberg & Matthew H. Mohebbi & Rajan S. Patel & Lynnette Brammer & Mark S. Smolinski & Larry Brilliant, 2009. "Detecting influenza epidemics using search engine query data," Nature, Nature, vol. 457(7232), pages 1012-1014, February.
    4. Ahmed Shoukry Rashad, 2022. "The Power of Travel Search Data in Forecasting the Tourism Demand in Dubai," Forecasting, MDPI, vol. 4(3), pages 1-11, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. David H Chae & Sean Clouston & Mark L Hatzenbuehler & Michael R Kramer & Hannah L F Cooper & Sacoby M Wilson & Seth I Stephens-Davidowitz & Robert S Gold & Bruce G Link, 2015. "Association between an Internet-Based Measure of Area Racism and Black Mortality," PLOS ONE, Public Library of Science, vol. 10(4), pages 1-12, April.
    2. Ishani Chaudhuri & Parthajit Kayal, 2022. "Predicting Power of Ticker Search Volume in Indian Stock Market," Working Papers 2022-214, Madras School of Economics, Chennai, India.
    3. Yang, Xin & Pan, Bing & Evans, James A. & Lv, Benfu, 2015. "Forecasting Chinese tourist volume with search engine data," Tourism Management, Elsevier, vol. 46(C), pages 386-397.
    4. Bentzen, Jeanet Sinding, 2021. "In crisis, we pray: Religiosity and the COVID-19 pandemic," Journal of Economic Behavior & Organization, Elsevier, vol. 192(C), pages 541-583.
    5. Aksoy, Cevat Giray & Ganslmeier, Michael & Poutvaara, Panu, 2020. "Public Attention and Policy Responses to COVID-19 Pandemic," IZA Discussion Papers 13427, Institute of Labor Economics (IZA).
    6. Daniele Barchiesi & Helen Susannah Moat & Christian Alis & Steven Bishop & Tobias Preis, 2015. "Quantifying International Travel Flows Using Flickr," PLOS ONE, Public Library of Science, vol. 10(7), pages 1-8, July.
    7. Breithaupt, Patrick & Kesler, Reinhold & Niebel, Thomas & Rammer, Christian, 2020. "Intangible capital indicators based on web scraping of social media," ZEW Discussion Papers 20-046, ZEW - Leibniz Centre for European Economic Research.
    8. JooSeok Oh & Timothy Paul Connerton & Hyun-Jung Kim, 2019. "The Rediscovery of Brand Experience Dimensions with Big Data Analysis: Building for a Sustainable Brand," Sustainability, MDPI, vol. 11(19), pages 1-21, September.
    9. Götz, Thomas B. & Knetsch, Thomas A., 2019. "Google data in bridge equation models for German GDP," International Journal of Forecasting, Elsevier, vol. 35(1), pages 45-66.
    10. Jianchun Fang & Wanshan Wu & Zhou Lu & Eunho Cho, 2019. "Using Baidu Index To Nowcast Mobile Phone Sales In China," The Singapore Economic Review (SER), World Scientific Publishing Co. Pte. Ltd., vol. 64(01), pages 83-96, March.
    11. Kristina Gligorić & Arnaud Chiolero & Emre Kıcıman & Ryen W. White & Robert West, 2022. "Population-scale dietary interests during the COVID-19 pandemic," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    12. Long Wen & Chang Liu & Haiyan Song, 2019. "Forecasting tourism demand using search query data: A hybrid modelling approach," Tourism Economics, vol. 25(3), pages 309-329, May.
    13. Abay, Kibrom A. & Hirfrfot, Kibrom Tafere & Woldemichael, Andinet, 2020. "Winners and Losers from COVID-19: Global Evidence from Google Search," Policy Research Working Paper Series 9268, The World Bank.
    14. Jingwen Liu & Peng Zou & Yu Ma, 2022. "The Effect of Air Pollution on Food Preferences," Journal of the Academy of Marketing Science, Springer, vol. 50(2), pages 410-423, March.
    15. Stephen L. France & Yuying Shi, 2017. "Aggregating Google Trends: Multivariate Testing and Analysis," Papers 1712.03152, arXiv.org, revised Mar 2018.
    16. Qian Chen & Xiang Gao & Jianming Mo & Zhouling Xu, 2022. "Market Reaction to Local Attention around Earnings Announcements in China: Evidence from Internet Search Activity," IJFS, MDPI, vol. 10(4), pages 1-26, October.
    17. Corey Lang & John David Ryder, 2016. "The effect of tropical cyclones on climate change engagement," Climatic Change, Springer, vol. 135(3), pages 625-638, April.
    18. Smales, L.A., 2021. "Investor attention and global market returns during the COVID-19 crisis," International Review of Financial Analysis, Elsevier, vol. 73(C).
    19. Oestmann, Marco & Bennöhr, Lars, 2015. "Determinants of house price dynamics. What can we learn from search engine data?," Review of Economics, De Gruyter, vol. 66(1), pages 99-127, April.
    20. Georg von Graevenitz & Christian Helmers & Valentine Millot & Oliver Turnbull, 2016. "Does Online Search Predict Sales? Evidence from Big Data for Car Markets in Germany and the UK," Working Papers 71, Queen Mary, University of London, School of Business and Management, Centre for Globalisation Research.
