When Should we Expect Non-Decreasing Returns from Data in Prediction Tasks?

Author

Listed:
  • Maximilian Schaefer

Abstract

This article studies how the prediction accuracy for a response variable changes as the number of predictors increases, when all variables follow a multivariate normal distribution. Assuming that the correlations between variables are independently drawn, I show that adding variables leads to globally increasing returns to scale when the mean of the correlation distribution is zero. The speed of learning increases with the variance of the correlation distribution. I use simulations to study the more complex case of correlation distributions with a non-zero mean and find a pattern of decreasing returns followed by increasing returns to scale, provided the correlation distribution is not degenerate (that is, its variance is non-zero); in the degenerate case, globally decreasing returns emerge. I train a collaborative filtering algorithm on the MovieLens 1M dataset to analyze returns from adding variables in a more realistic setting and find globally increasing returns to scale across 2,000 variables. The results suggest significant scale advantages from additional variables in prediction tasks.
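
To make the degenerate case concrete, the following is a minimal sketch and is not taken from the paper. It assumes that prediction accuracy is measured by the population R^2 of the best predictor. Under joint normality this equals r' R^{-1} r, where r holds the correlations between the response and the predictors and R is the correlation matrix of the predictors (a standard result). The function names `population_r2` and `equicorrelated` and the values p = 5 and rho = 0.5 are illustrative assumptions.

```python
import numpy as np

def population_r2(corr, k):
    """Population R^2 of the best predictor of variable 0 from variables 1..k.

    corr is the correlation matrix of (y, x_1, ..., x_p) with y in row/column 0.
    Under joint normality, Var(y | x_1..x_k) = 1 - r' R^{-1} r, so R^2 = r' R^{-1} r.
    """
    corr = np.asarray(corr, dtype=float)
    r = corr[0, 1:k + 1]            # correlations between y and the k predictors
    R = corr[1:k + 1, 1:k + 1]      # correlations among the k predictors
    return float(r @ np.linalg.solve(R, r))

def equicorrelated(p, rho):
    """Correlation matrix of y and p predictors where every pair has correlation rho.

    This is a degenerate correlation distribution: non-zero mean, zero variance.
    """
    m = np.full((p + 1, p + 1), rho, dtype=float)
    np.fill_diagonal(m, 1.0)
    return m

# Illustrative values (assumptions, not from the paper).
corr = equicorrelated(p=5, rho=0.5)
r2 = [population_r2(corr, k) for k in range(1, 6)]
gains = np.diff([0.0] + r2)   # marginal gain in R^2 from each added predictor

for k, (value, gain) in enumerate(zip(r2, gains), start=1):
    print(f"k={k}: R^2={value:.4f}, marginal gain={gain:.4f}")
```

In this equicorrelated setup R^2 has the closed form k rho^2 / (1 + (k - 1) rho), so each added predictor contributes less than the previous one. That is consistent with the globally decreasing returns the abstract reports for the degenerate case. The non-degenerate cases require randomly drawn correlation matrices that remain positive definite, which this sketch does not attempt.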

Suggested Citation

  • Maximilian Schaefer, 2025. "When Should we Expect Non-Decreasing Returns from Data in Prediction Tasks?," Papers 2503.03602, arXiv.org.
  • Handle: RePEc:arx:papers:2503.03602

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2503.03602
    File Function: Latest version
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bruno Carballa Smichowski & Yassine Lefouili & Andrea Mantovani & Carlo Reggiani, 2025. "Data Sharing or Analytics Sharing ?," Working Papers hal-04956937, HAL.
    2. Schaefer, Maximilian & Sapi, Geza, 2023. "Complementarities in learning from data: Insights from general search," Information Economics and Policy, Elsevier, vol. 65(C).
    3. Ehsan Valavi & Joel Hestness & Newsha Ardalani & Marco Iansiti, 2022. "Time and the Value of Data," Papers 2203.09118, arXiv.org.
    4. Bergemann, Dirk & Ottaviani, Marco, 2021. "Information Markets and Nonmarkets," CEPR Discussion Papers 16459, C.E.P.R. Discussion Papers.
    5. Georgios Petropoulos & Bertin Martens & Geoffrey Parker & Marshall Van Alstyne, 2023. "Platform Competition and Information Sharing," CESifo Working Paper Series 10663, CESifo.
    6. Bertin Martens, 2020. "An economic perspective on data and platform market power," JRC Working Papers on Digital Economy 2020-09, Joint Research Centre.
    7. Yosuke Uno & Akira Sonoda & Masaki Bessho, 2021. "The Economics of Privacy: A Primer Especially for Policymakers," Bank of Japan Working Paper Series 21-E-11, Bank of Japan.
    8. Flavio Pino, 2022. "The microeconomics of data – a survey," Economia e Politica Industriale: Journal of Industrial and Business Economics, Springer;Associazione Amici di Economia e Politica Industriale, vol. 49(3), pages 635-665, September.
    9. Catherine E. Tucker, 2023. "The Economics of Privacy: An Agenda," NBER Chapters, in: The Economics of Privacy, National Bureau of Economic Research, Inc.
    10. S Nageeb Ali & Greg Lewis & Shoshana Vasserman, 2023. "Voluntary Disclosure and Personalized Pricing," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 90(2), pages 538-571.
    11. Yiquan Gu & Leonardo Madio & Carlo Reggiani, 2022. "Data brokers co-opetition [The impact of big data on firm performance: an empirical investigation]," Oxford Economic Papers, Oxford University Press, vol. 74(3), pages 820-839.
    12. Francesco Angelini & Luca V. Ballestra & Massimiliano Castellani, 2022. "Digital leisure and the gig economy: a two-sector model of growth," Papers 2212.02119, arXiv.org.
    13. Olivier Armantier & Sebastian Doerr & Jon Frost & Andreas Fuster & Kelly Shue, 2024. "Nothing to hide? Gender and age differences in willingness to share data," Swiss Finance Institute Research Paper Series 24-99, Swiss Finance Institute.
    14. Bertin Martens & Alexandre de Streel & Inge Graef & Thomas Tombal & Nestor Duch-Brown, 2020. "Business-to-Business data sharing: An economic and legal analysis," JRC Working Papers on Digital Economy 2020-05, Joint Research Centre.
    15. Guy Aridor & Yeon-Koo Che & Tobias Salz, 2020. "The Effect of Privacy Regulation on the Data Industry: Empirical Evidence from GDPR," NBER Working Papers 26900, National Bureau of Economic Research, Inc.
    16. Wim Naudé & Ricardo Vinuesa, 2020. "Data, global development, and COVID-19: Lessons and consequences," WIDER Working Paper Series wp-2020-109, World Institute for Development Economic Research (UNU-WIDER).
    17. Laura Abrardi & Carlo Cambini & Laura Rondi, 2022. "Artificial intelligence, firms and consumer behavior: A survey," Journal of Economic Surveys, Wiley Blackwell, vol. 36(4), pages 969-991, September.
    18. Steffen, Nico & Wiewiorra, Lukas & Kroon, Peter, 2021. "Wettbewerb und Regulierung in der Plattform- und Datenökonomie," WIK Discussion Papers 481, WIK Wissenschaftliches Institut für Infrastruktur und Kommunikationsdienste GmbH.
    19. Dirk Bergemann & Alessandro Bonatti, 2024. "Data, Competition, and Digital Platforms," American Economic Review, American Economic Association, vol. 114(8), pages 2553-2595, August.
