
Opportunities and Challenges: Lessons from Analyzing Terabytes of Scanner Data

Author

  • Serena Ng

Abstract

This paper seeks to better understand what makes big data analysis different, what we can and cannot do with existing econometric tools, and what issues need to be dealt with in order to work with the data efficiently. As a case study, I set out to extract any business cycle information that might exist in four terabytes of weekly scanner data. The main challenge is to handle the volume, variety, and characteristics of the data within the constraints of our computing environment. Scalable and efficient algorithms are available to ease the computation burden, but they often have unknown statistical properties and are not designed for the purpose of efficient estimation or optimal inference. As well, economic data have unique characteristics that generic algorithms may not accommodate. There is a need for computationally efficient econometric methods as big data is likely here to stay.

Suggested Citation

  • Serena Ng, 2017. "Opportunities and Challenges: Lessons from Analyzing Terabytes of Scanner Data," NBER Working Papers 23673, National Bureau of Economic Research, Inc.
  • Handle: RePEc:nbr:nberwo:23673
    Note: TWP

    Download full text from publisher

    File URL: http://www.nber.org/papers/w23673.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Liran Einav & Jonathan Levin, 2014. "The Data Revolution and Economic Analysis," Innovation Policy and the Economy, University of Chicago Press, vol. 14(1), pages 1-24.
    2. Gary Koop & Luca Onorante, 2019. "Macroeconomic Nowcasting Using Google Probabilities," Advances in Econometrics, in: Topics in Identification, Limited Dependent Variables, Partial Observability, Experimentation, and Flexible Modeling: Part A, volume 40, pages 17-40, Emerald Group Publishing Limited.
    3. Alberto Cavallo & Eduardo Cavallo & Roberto Rigobon, 2014. "Prices and Supply Disruptions during Natural Disasters," Review of Income and Wealth, International Association for Research in Income and Wealth, vol. 60(S2), pages 449-471, November.
    4. Pierce, David A & Grupe, Michael R & Cleveland, William P, 1984. "Seasonal Adjustment of the Weekly Monetary Aggregates: A Model-based Approach," Journal of Business & Economic Statistics, American Statistical Association, vol. 2(3), pages 260-270, July.
    5. Alexandre Belloni & Victor Chernozhukov & Christian Hansen, 2014. "High-Dimensional Methods and Inference on Structural and Treatment Effects," Journal of Economic Perspectives, American Economic Association, vol. 28(2), pages 29-50, Spring.
    6. Dolan Antenucci & Michael Cafarella & Margaret Levenstein & Christopher Ré & Matthew D. Shapiro, 2014. "Using Social Media to Measure Labor Market Flows," NBER Working Papers 20010, National Bureau of Economic Research, Inc.
    7. Judith A. Chevalier & Anil K. Kashyap & Peter E. Rossi, 2003. "Why Don't Prices Rise During Periods of Peak Demand? Evidence from Scanner Data," American Economic Review, American Economic Association, vol. 93(1), pages 15-37, March.
    8. Athey, Susan & Imbens, Guido W., 2015. "Machine Learning for Estimating Heterogeneous Causal Effects," Research Papers 3350, Stanford University, Graduate School of Business.
    9. Hal R. Varian, 2014. "Big Data: New Tricks for Econometrics," Journal of Economic Perspectives, American Economic Association, vol. 28(2), pages 3-28, Spring.
    10. Olivier Coibion & Yuriy Gorodnichenko & Gee Hee Hong, 2015. "The Cyclicality of Sales, Regular and Effective Prices: Business Cycle and Policy Implications," American Economic Review, American Economic Association, vol. 105(3), pages 993-1029, March.
    11. I. T. Jolliffe, 1972. "Discarding Variables in a Principal Component Analysis. I: Artificial Data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 21(2), pages 160-173, June.
    12. William S. Cleveland, 2001. "Data Science: an Action Plan for Expanding the Technical Areas of the Field of Statistics," International Statistical Review, International Statistical Institute, vol. 69(1), pages 21-26, April.
    13. Jessie Handbury & Tsutomu Watanabe & David E. Weinstein, 2013. "How Much Do Official Price Indexes Tell Us about Inflation?," NBER Working Papers 19504, National Bureau of Economic Research, Inc.
    14. Jonathan H. Wright, 2013. "Unseasonal Seasonals?," Brookings Papers on Economic Activity, Economic Studies Program, The Brookings Institution, vol. 47(2 (Fall)), pages 65-126.
    15. Harvey, Andrew & Koopman, Siem Jan & Riani, Marco, 1997. "The Modeling and Seasonal Adjustment of Weekly Observations," Journal of Business & Economic Statistics, American Statistical Association, vol. 15(3), pages 354-368, July.
    16. Hyunyoung Choi & Hal Varian, 2012. "Predicting the Present with Google Trends," The Economic Record, The Economic Society of Australia, vol. 88(s1), pages 2-9, June.
    17. Jeremy Ginsberg & Matthew H. Mohebbi & Rajan S. Patel & Lynnette Brammer & Mark S. Smolinski & Larry Brilliant, 2009. "Detecting influenza epidemics using search engine query data," Nature, Nature, vol. 457(7232), pages 1012-1014, February.
    18. Christian Broda & Ephraim Leibtag & David E. Weinstein, 2009. "The Role of Prices in Measuring the Poor's Living Standards," Journal of Economic Perspectives, American Economic Association, vol. 23(2), pages 77-97, Spring.
    19. Jonathan H. Wright, 2013. "Unseasonal Seasonals?," Brookings Papers on Economic Activity, Economic Studies Program, The Brookings Institution, vol. 44(2 (Fall)), pages 65-126.

    RePEc Biblio mentions

    As found on the RePEc Biblio, the curated bibliography for Economics:
    1. Econometrics > Big Data

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Miranda-Zanetti, Maximilano & Delbianco, Fernando & Tohmé, Fernando, 2019. "Tampering with inflation data: A Benford law-based analysis of national statistics in Argentina," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 525(C), pages 761-770.
    2. Mogliani, Matteo & Simoni, Anna, 2021. "Bayesian MIDAS penalized regressions: Estimation, selection, and prediction," Journal of Econometrics, Elsevier, vol. 222(1), pages 833-860.
    3. Plakandaras, Vasilios & Gogas, Periklis & Papadimitriou, Theophilos & Gupta, Rangan, 2019. "A re-evaluation of the term spread as a leading indicator," International Review of Economics & Finance, Elsevier, vol. 64(C), pages 476-492.
    4. Sokbae Lee & Serena Ng, 2020. "An Econometric Perspective on Algorithmic Subsampling," Annual Review of Economics, Annual Reviews, vol. 12(1), pages 45-80, August.
    5. Laurent Ferrara & Anna Simoni, 2023. "When are Google Data Useful to Nowcast GDP? An Approach via Preselection and Shrinkage," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 41(4), pages 1188-1202, October.
    6. Daníelsson, Jón & Macrae, Robert & Uthemann, Andreas, 2022. "Artificial intelligence and systemic risk," Journal of Banking & Finance, Elsevier, vol. 140(C).
    7. Harold D. Chiang & Jiatong Li & Yuya Sasaki, 2021. "Algorithmic subsampling under multiway clustering," Papers 2103.00557, arXiv.org, revised Oct 2022.
    8. Bluhm, Benjamin & Cutura, Jannic, 2020. "Econometrics at scale: Spark up big data in economics," SAFE Working Paper Series 266, Leibniz Institute for Financial Research SAFE.
    9. Tao Zou & Xian Li & Xuan Liang & Hansheng Wang, 2021. "On the Subbagging Estimation for Massive Data," Papers 2103.00631, arXiv.org.
    10. Jun Yu & HaiYing Wang, 2022. "Subdata selection algorithm for linear model discrimination," Statistical Papers, Springer, vol. 63(6), pages 1883-1906, December.
    11. Rishab Guha & Serena Ng, 2019. "A Machine Learning Analysis of Seasonal and Cyclical Sales in Weekly Scanner Data," NBER Chapters, in: Big Data for Twenty-First-Century Economic Statistics, pages 403-436, National Bureau of Economic Research, Inc.
    12. Gonzalo, Jesús & Pitarakis, Jean-Yves, 2021. "Spurious relationships in high-dimensional systems with strong or mild persistence," International Journal of Forecasting, Elsevier, vol. 37(4), pages 1480-1497.
    13. Serena Ng & Susannah Scanlan, 2023. "Constructing High Frequency Economic Indicators by Imputation," Papers 2303.01863, arXiv.org, revised Oct 2023.
    14. Christopher Dobronyi & Christian Gouriéroux, 2020. "Consumer Theory with Non-Parametric Taste Uncertainty and Individual Heterogeneity," Papers 2010.13937, arXiv.org, revised Jan 2021.
    15. Dekimpe, Marnik G., 2020. "Retailing and retailing research in the age of big data analytics," International Journal of Research in Marketing, Elsevier, vol. 37(1), pages 3-14.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Götz, Thomas B. & Knetsch, Thomas A., 2019. "Google data in bridge equation models for German GDP," International Journal of Forecasting, Elsevier, vol. 35(1), pages 45-66.
    2. Tuhkuri, Joonas, 2016. "Forecasting Unemployment with Google Searches," ETLA Working Papers 35, The Research Institute of the Finnish Economy.
    3. Nathan, Max & Rosso, Anna, 2015. "Mapping digital businesses with big data: Some early findings from the UK," Research Policy, Elsevier, vol. 44(9), pages 1714-1733.
    4. Etienne Gagnon & David López-Salido, 2020. "Small Price Responses to Large Demand Shocks," Journal of the European Economic Association, European Economic Association, vol. 18(2), pages 792-828.
    5. Georg von Graevenitz & Christian Helmers & Valentine Millot & Oliver Turnbull, 2016. "Does Online Search Predict Sales? Evidence from Big Data for Car Markets in Germany and the UK," Working Papers 71, Queen Mary, University of London, School of Business and Management, Centre for Globalisation Research.
    6. Jian Gao & Tao Zhou, 2017. "Quantifying China's Regional Economic Complexity," Papers 1703.01292, arXiv.org, revised Nov 2017.
    7. Max Nathan & Anna Rosso, 2014. "Mapping Information Economy Businesses with Big Data: Findings for the UK," CEP Occasional Papers 44, Centre for Economic Performance, LSE.
    8. Poza, Carlos & Monge, Manuel, 2020. "A real time leading economic indicator based on text mining for the Spanish economy. Fractional cointegration VAR and Continuous Wavelet Transform analysis," International Economics, Elsevier, vol. 163(C), pages 163-175.
    9. Konstantinos N. Konstantakis & Despoina Paraskeuopoulou & Panayotis G. Michaelides & Efthymios G. Tsionas, 2021. "Bank deposits and Google searches in a crisis economy: Bayesian non‐linear evidence for Greece (2009–2015)," International Journal of Finance & Economics, John Wiley & Sons, Ltd., vol. 26(4), pages 5408-5424, October.
    10. Tuhkuri, Joonas, 2016. "ETLAnow: A Model for Forecasting with Big Data – Forecasting Unemployment with Google Searches in Europe," ETLA Reports 54, The Research Institute of the Finnish Economy.
    11. Nathan, Max & Rosso, Anna, 2014. "Mapping information economy businesses with big data: findings from the UK," LSE Research Online Documents on Economics 60615, London School of Economics and Political Science, LSE Library.
    12. Christopher Hansman & Harrison Hong & Áureo de Paula & Vishal Singh, 2020. "A Sticky-Price View of Hoarding," NBER Working Papers 27051, National Bureau of Economic Research, Inc.
    13. James T. E. Chapman & Ajit Desai, 2023. "Macroeconomic Predictions Using Payments Data and Machine Learning," Forecasting, MDPI, vol. 5(4), pages 1-32, November.
    14. Resce, Giuliano & Maynard, Diana, 2018. "What matters most to people around the world? Retrieving Better Life Index priorities on Twitter," Technological Forecasting and Social Change, Elsevier, vol. 137(C), pages 61-75.
    15. Khai Xiang Chiong & Matthew Shum, 2019. "Random Projection Estimation of Discrete-Choice Models with Large Choice Sets," Management Science, INFORMS, vol. 65(1), pages 256-271, January.
    16. Levent Bulut, 2015. "Google Trends and Forecasting Performance of Exchange Rate Models," IPEK Working Papers 1505, Ipek University, Department of Economics.
    17. Glandon, PJ, 2018. "Sales and the (Mis)measurement of price level fluctuations," Journal of Macroeconomics, Elsevier, vol. 58(C), pages 60-77.
    18. Böhme, Marcus H. & Gröger, André & Stöhr, Tobias, 2020. "Searching for a better life: Predicting international migration with online search keywords," Journal of Development Economics, Elsevier, vol. 142(C).
    19. van der Wielen, Wouter & Barrios, Salvador, 2021. "Economic sentiment during the COVID pandemic: Evidence from search behaviour in the EU," Journal of Economics and Business, Elsevier, vol. 115(C).
    20. Davide Viviano & Jelena Bradic, 2019. "Synthetic learner: model-free inference on treatments over time," Papers 1904.01490, arXiv.org, revised Aug 2022.

    More about this item

    JEL classification:

    • C55 - Mathematical and Quantitative Methods > Econometric Modeling > Large Data Sets: Modeling and Analysis
    • C81 - Mathematical and Quantitative Methods > Data Collection and Data Estimation Methodology; Computer Programs > Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
