
Reinforcement Learning in Economics and Finance

Author

Listed:
  • Arthur Charpentier
  • Romuald Elie
  • Carl Remlinger

Abstract

Reinforcement learning algorithms describe how an agent can learn an optimal action policy in a sequential decision process through repeated experience. In a given environment, the agent's policy yields running and terminal rewards. As in online learning, the agent learns sequentially. As in multi-armed bandit problems, when the agent picks an action, it cannot infer ex post the rewards that other actions would have induced. In reinforcement learning, the agent's actions have consequences: they influence not only rewards but also future states of the world. The goal of reinforcement learning is to find an optimal policy, a mapping from the states of the world to the set of actions, that maximizes cumulative reward, which makes it a long-term strategy. Exploring might be sub-optimal over a short horizon but can lead to better long-term outcomes. Many optimal control problems, popular in economics for more than forty years, can be expressed in the reinforcement learning framework, and recent advances in computational science, in particular deep learning algorithms, can be used by economists to solve complex behavioral problems. In this article, we survey the state of the art in reinforcement learning techniques and present applications in economics, game theory, operations research and finance.
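
To make the trade-off described in the abstract concrete, here is a minimal sketch of tabular Q-learning with epsilon-greedy exploration. It is an illustration only and is not taken from the paper: the four-state chain environment, the reward values, the hyperparameters, and every name (`step`, `q_learning`, `greedy_policy`) are assumptions chosen for this example. In each state the agent can "exit" for a small immediate reward or "advance" for nothing now and a large reward at the end of the chain, so the short-term best action differs from the long-term best one.

```python
import random

# Hypothetical toy environment (an assumption for illustration):
# states 0..3, where state 3 is terminal.
N_STATES = 4
ACTIONS = (0, 1)  # 0 = "exit" (small reward now), 1 = "advance" (reward later)


def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    if action == 0:
        return state, 1.0, True           # small payoff, episode ends
    nxt = state + 1
    if nxt == N_STATES - 1:
        return nxt, 10.0, True            # large payoff at the end of the chain
    return nxt, 0.0, False                # no payoff yet, but the state changes


def q_learning(episodes=20000, alpha=0.1, gamma=0.9, epsilon=0.3, seed=0):
    """Learn a table q[state][action] of estimated discounted returns."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            nxt, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best value of next state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_policy(q):
    """Map each non-terminal state to its highest-valued action."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print("learned policy:", greedy_policy(q_learning()))
    print("myopic policy: ", greedy_policy(q_learning(gamma=0.0)))
```

With a discount factor of 0.9, advancing from the first state is worth 0.9 × 0.9 × 10 = 8.1 in this toy setting, more than the immediate exit reward of 1, but the agent can only discover this by occasionally exploring actions that look worse in the short run. Setting `gamma=0.0` gives a myopic agent that only values the immediate reward.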

Suggested Citation

  • Arthur Charpentier & Romuald Elie & Carl Remlinger, 2020. "Reinforcement Learning in Economics and Finance," Papers 2003.10014, arXiv.org.
  • Handle: RePEc:arx:papers:2003.10014

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2003.10014
    File Function: Latest version
    Download Restriction: no

    Citations

    Cited by:

    1. Johann Lussange & Boris Gutkin, 2023. "Order book regulatory impact on stock market quality: a multi-agent reinforcement learning perspective," Papers 2302.04184, arXiv.org.
    2. Rui (Aruhan) Shi, 2021. "Learning from Zero: How to Make Consumption-Saving Decisions in a Stochastic Environment with an AI Algorithm," CESifo Working Paper Series 9255, CESifo.
    3. Andrew Paskaramoorthy & Terence van Zyl & Tim Gebbie, 2020. "A Framework for Online Investment Algorithms," Papers 2003.13360, arXiv.org.
    4. Rui (Aruhan) Shi, 2021. "Learning from zero: how to make consumption-saving decisions in a stochastic environment with an AI algorithm," Papers 2105.10099, arXiv.org, revised Feb 2022.
    5. Juan Manuel Sánchez-Cartas & Alberto Tejero & Gonzalo León, 2021. "Algorithmic Pricing and Price Gouging. Consequences of High-Impact, Low Probability Events," Sustainability, MDPI, vol. 13(5), pages 1-14, February.
    6. Ben Hambly & Renyuan Xu & Huining Yang, 2020. "Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon," Papers 2011.10300, arXiv.org, revised Jun 2021.
    7. Laura Leal & Mathieu Laurière & Charles-Albert Lehalle, 2020. "Learning a functional control for high-frequency finance," Papers 2006.09611, arXiv.org, revised Feb 2021.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Arthur Charpentier & Romuald Élie & Carl Remlinger, 2023. "Reinforcement Learning in Economics and Finance," Computational Economics, Springer;Society for Computational Economics, vol. 62(1), pages 425-462, June.
    2. Victor Aguirregabiria & Jihye Jeon, 2020. "Firms’ Beliefs and Learning: Models, Identification, and Empirical Evidence," Review of Industrial Organization, Springer;The Industrial Organization Society, vol. 56(2), pages 203-235, March.
    3. Sebastian Galiani & Juan Pantano, 2021. "Structural Models: Inception and Frontier," NBER Working Papers 28698, National Bureau of Economic Research, Inc.
    4. Victor Aguirregabiria & Aviv Nevo, 2010. "Recent Developments in Empirical IO: Dynamic Demand and Dynamic Games," Working Papers tecipa-419, University of Toronto, Department of Economics.
    5. Peter Arcidiacono & Paul B. Ellickson, 2011. "Practical Methods for Estimation of Dynamic Discrete Choice Models," Annual Review of Economics, Annual Reviews, vol. 3(1), pages 363-394, September.
    6. Duffy, John, 2006. "Agent-Based Models and Human Subject Experiments," Handbook of Computational Economics, in: Leigh Tesfatsion & Kenneth L. Judd (ed.), Handbook of Computational Economics, edition 1, volume 2, chapter 19, pages 949-1011, Elsevier.
    7. Andriy Norets, 2010. "Continuity and differentiability of expected value functions in dynamic discrete choice models," Quantitative Economics, Econometric Society, vol. 1(2), pages 305-322, November.
    8. Hanming Fang & Yang Wang, 2015. "Estimating Dynamic Discrete Choice Models With Hyperbolic Discounting, With An Application To Mammography Decisions," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 56(2), pages 565-596, May.
    9. Hiroyuki Kasahara & Katsumi Shimotsu, 2018. "Estimation of Discrete Choice Dynamic Programming Models," The Japanese Economic Review, Japanese Economic Association, vol. 69(1), pages 28-58, March.
    10. Aguirregabiria, Victor & Mira, Pedro, 2010. "Dynamic discrete choice structural models: A survey," Journal of Econometrics, Elsevier, vol. 156(1), pages 38-67, May.
    11. Victor Aguirregabiria & Margaret Slade, 2017. "Empirical models of firms and industries," Canadian Journal of Economics/Revue canadienne d'économique, John Wiley & Sons, vol. 50(5), pages 1445-1488, December.
    12. Blevins, Jason R. & Kim, Minhae, 2024. "Nested Pseudo likelihood estimation of continuous-time dynamic discrete games," Journal of Econometrics, Elsevier, vol. 238(2).
    13. Jason R. Blevins & Wei Shi & Donald R. Haurin & Stephanie Moulton, 2020. "A Dynamic Discrete Choice Model Of Reverse Mortgage Borrower Behavior," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 61(4), pages 1437-1477, November.
    14. Doraszelski, Ulrich & Escobar, Juan F., 2010. "A theory of regular Markov perfect equilibria in dynamic stochastic games: genericity, stability, and purification," Theoretical Economics, Econometric Society, vol. 5(3), September.
    15. Kets, W., 2008. "Networks and learning in game theory," Other publications TiSEM 7713fce1-3131-498c-8c6f-3, Tilburg University, School of Economics and Management.
    16. Peter Arcidiacono & Robert A. Miller, 2011. "Conditional Choice Probability Estimation of Dynamic Discrete Choice Models With Unobserved Heterogeneity," Econometrica, Econometric Society, vol. 79(6), pages 1823-1867, November.
    17. George‐Levi Gayle & Limor Golan & Mehmet A. Soytas, 2018. "Estimation of dynastic life‐cycle discrete choice models," Quantitative Economics, Econometric Society, vol. 9(3), pages 1195-1241, November.
    18. Gayle, George-Levi & Golan, Limor & Soytas, Mehmet A., 2022. "What is the source of the intergenerational correlation in earnings?," Journal of Monetary Economics, Elsevier, vol. 129(C), pages 24-45.
    19. Arcidiacono, Peter & Miller, Robert A., 2020. "Identifying dynamic discrete choice models off short panels," Journal of Econometrics, Elsevier, vol. 215(2), pages 473-485.
    20. Sara Amoroso, 2014. "The hidden costs of R&D collaboration," JRC Working Papers on Corporate R&D and Innovation 2014-02, Joint Research Centre.
