Printed from https://ideas.repec.org/p/arx/papers/2203.09118.html

Time and the Value of Data

Author

Listed:
  • Ehsan Valavi
  • Joel Hestness
  • Newsha Ardalani
  • Marco Iansiti

Abstract

Managers often believe that collecting more data will continually improve the accuracy of their machine learning models. However, we argue in this paper that when data lose relevance over time, it may be optimal to collect a limited amount of recent data instead of keeping around an infinite supply of older (less relevant) data. In addition, we argue that increasing the stock of data by including older datasets may, in fact, damage the model's accuracy. Expectedly, the model's accuracy improves by increasing the flow of data (defined as data collection rate); however, it requires other tradeoffs in terms of refreshing or retraining machine learning models more frequently. Using these results, we investigate how the business value created by machine learning models scales with data and when the stock of data establishes a sustainable competitive advantage. We argue that data's time-dependency weakens the barrier to entry that the stock of data creates. As a result, a competing firm equipped with a limited (yet sufficient) amount of recent data can develop more accurate models. This result, coupled with the fact that older datasets may deteriorate models' accuracy, suggests that created business value doesn't scale with the stock of available data unless the firm offloads less relevant data from its data repository. Consequently, a firm's growth policy should incorporate a balance between the stock of historical data and the flow of new data. We complement our theoretical results with an experiment. In the experiment, we empirically measure the loss in the accuracy of a next word prediction model trained on datasets from various time periods. Our empirical measurements confirm the economic significance of the value decline over time. For example, 100MB of text data, after seven years, becomes as valuable as 50MB of current data for the next word prediction task.
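The closing example in the abstract (100MB of seven-year-old text being worth about 50MB of current text) can be turned into a small back-of-the-envelope calculation. The sketch below is only an illustration: it assumes value decays exponentially with age, with a seven-year half-life read off that single data point. The abstract does not state a functional form, and the names `effective_size_mb` and `effective_stock_mb` are hypothetical. The sketch also covers only the decline in value, not the case where older data actively harms accuracy.

```python
def effective_size_mb(size_mb: float, age_years: float,
                      half_life_years: float = 7.0) -> float:
    """Size of current data assumed equivalent to `size_mb` of aged data.

    ASSUMPTION: exponential decay with a fixed half-life. The 7-year default
    matches the abstract's single example (100MB -> 50MB after seven years);
    the decay shape itself is not taken from the paper.
    """
    if size_mb < 0 or age_years < 0 or half_life_years <= 0:
        raise ValueError("sizes and ages must be non-negative, half-life positive")
    return size_mb * 0.5 ** (age_years / half_life_years)


def effective_stock_mb(flow_mb_per_year: float, years_kept: int,
                       half_life_years: float = 7.0) -> float:
    """Effective value of a stock built from a constant yearly flow.

    Each yearly cohort is discounted by its age (0 = collected this year).
    """
    return sum(
        effective_size_mb(flow_mb_per_year, age, half_life_years)
        for age in range(years_kept)
    )


if __name__ == "__main__":
    print(effective_size_mb(100, 7))    # the abstract's example
    print(effective_stock_mb(100, 10))  # ten years of data at 100MB/year
    print(effective_stock_mb(100, 50))  # keeping fifty years adds little more
```

Under this assumed decay, the effective stock stays below `flow / (1 - 0.5 ** (1 / half_life))` however long data is retained, which is one way to read the claim that the flow of new data matters more than the accumulated stock.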

Suggested Citation

  • Ehsan Valavi & Joel Hestness & Newsha Ardalani & Marco Iansiti, 2022. "Time and the Value of Data," Papers 2203.09118, arXiv.org.
  • Handle: RePEc:arx:papers:2203.09118

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2203.09118
    File Function: Latest version
    Download Restriction: no
