IDEAS home Printed from https://ideas.repec.org/a/kap/compec/v62y2023i1d10.1007_s10614-021-10119-4.html

Reinforcement Learning in Economics and Finance

Author

Listed:
  • Arthur Charpentier

    (Université du Québec à Montréal (UQAM))

  • Romuald Élie

    (LAMA, Université Gustave Eiffel, CNRS)

  • Carl Remlinger

    (LAMA, Université Gustave Eiffel, CNRS)

Abstract

Reinforcement learning algorithms describe how an agent can learn an optimal action policy in a sequential decision process through repeated experience. In a given environment, the agent's policy yields running and terminal rewards. As in online learning, the agent learns sequentially. As in multi-armed bandit problems, when the agent picks an action, it cannot infer ex post the rewards that other actions would have induced. In reinforcement learning, actions have consequences: they influence not only rewards but also future states of the world. The goal of reinforcement learning is to find an optimal policy, that is, a mapping from the states of the world to the set of actions that maximizes cumulative reward, which makes it a long-term strategy. Exploring might be sub-optimal over a short horizon but can lead to better long-term outcomes. Many optimal control problems, popular in economics for more than forty years, can be expressed in the reinforcement learning framework, and recent advances in computational science, in particular deep learning algorithms, can be used by economists to solve complex behavioral problems. In this article, we survey the state of the art in reinforcement learning techniques and present applications in economics, game theory, operations research and finance.
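
The learning loop summarized in the abstract can be illustrated with a minimal sketch of tabular Q-learning (listed among the keywords) with epsilon-greedy exploration. This is not code from the article: the five-state chain environment, the reward of 1 at the last state, the parameter values, and the names `step`, `q_learning` and `greedy_policy` are all assumptions chosen for illustration.

```python
import random

N_STATES = 5      # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Assumed deterministic dynamics: reward 1 only on reaching the last state."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a state-action value table by repeated experience."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                # Explore: possibly sub-optimal now, informative later.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: greedy action, ties broken at random.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # The action changes both the reward and the next state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q):
    """Policy as a mapping from each non-terminal state to an action."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Under these assumptions, `greedy_policy(q_learning())` should select action 1 (move right) in every non-terminal state, since only that direction leads to the discounted terminal reward.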

Suggested Citation

  • Arthur Charpentier & Romuald Élie & Carl Remlinger, 2023. "Reinforcement Learning in Economics and Finance," Computational Economics, Springer;Society for Computational Economics, vol. 62(1), pages 425-462, June.
  • Handle: RePEc:kap:compec:v:62:y:2023:i:1:d:10.1007_s10614-021-10119-4
    DOI: 10.1007/s10614-021-10119-4

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10614-021-10119-4
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s10614-021-10119-4?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Rust, John, 1987. "Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher," Econometrica, Econometric Society, vol. 55(5), pages 999-1033, September.
    2. Fudenberg, Drew & Levine, David, 1998. "Learning in games," European Economic Review, Elsevier, vol. 42(3-5), pages 631-639, May.
    3. Victor Aguirregabiria & Pedro Mira, 2002. "Swapping the Nested Fixed Point Algorithm: A Class of Estimators for Discrete Markov Decision Models," Econometrica, Econometric Society, vol. 70(4), pages 1519-1543, July.
    4. Marcet, Albert & Sargent, Thomas J., 1989. "Convergence of least squares learning mechanisms in self-referential linear stochastic models," Journal of Economic Theory, Elsevier, vol. 48(2), pages 337-368, August.
    5. Thierry Magnac & David Thesmar, 2002. "Identifying Dynamic Discrete Decision Processes," Econometrica, Econometric Society, vol. 70(2), pages 801-816, March.
    6. Che‐Lin Su & Kenneth L. Judd, 2012. "Constrained Optimization Approaches to Estimation of Structural Models," Econometrica, Econometric Society, vol. 80(5), pages 2213-2230, September.
    7. V. Joseph Hotz & Robert A. Miller, 1993. "Conditional Choice Probabilities and the Estimation of Dynamic Models," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 60(3), pages 497-529.
    8. Aguirregabiria, Victor & Mira, Pedro, 2010. "Dynamic discrete choice structural models: A survey," Journal of Econometrics, Elsevier, vol. 156(1), pages 38-67, May.
    9. Franke, Reiner, 2003. "Reinforcement learning in the El Farol model," Journal of Economic Behavior & Organization, Elsevier, vol. 51(3), pages 367-388, July.
    10. Bergemann, Dirk & Valimaki, Juuso, 1996. "Learning and Strategic Pricing," Econometrica, Econometric Society, vol. 64(5), pages 1125-1149, September.
    11. Arthur Charpentier & Emmanuel Flachaire & Antoine Ly, 2018. "Econometrics and Machine Learning," Economie et Statistique / Economics and Statistics, Institut National de la Statistique et des Etudes Economiques (INSEE), issue 505-506, pages 147-169.
    12. Rothschild, Michael, 1974. "A two-armed bandit theory of market pricing," Journal of Economic Theory, Elsevier, vol. 9(2), pages 185-202, October.
    13. Dirk Bergemann & Ulrich Hege, 2005. "The Financing of Innovation: Learning and Stopping," RAND Journal of Economics, The RAND Corporation, vol. 36(4), pages 719-752, Winter.
    14. Kiyotaki, Nobuhiro & Wright, Randall, 1989. "On Money as a Medium of Exchange," Journal of Political Economy, University of Chicago Press, vol. 97(4), pages 927-954, August.
    15. Richard Ericson & Ariel Pakes, 1995. "Markov-Perfect Industry Dynamics: A Framework for Empirical Work," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 62(1), pages 53-82.
    16. Miller, Robert A, 1984. "Job Matching and Occupational Choice," Journal of Political Economy, University of Chicago Press, vol. 92(6), pages 1086-1120, December.
    17. Arthur, W Brian, 1991. "Designing Economic Agents that Act Like Human Agents: A Behavioral Approach to Bounded Rationality," American Economic Review, American Economic Association, vol. 81(2), pages 353-359, May.
    18. Bernheim, B Douglas, 1984. "Rationalizable Strategic Behavior," Econometrica, Econometric Society, vol. 52(4), pages 1007-1028, July.
    19. Maskin, Eric & Tirole, Jean, 1988. "A Theory of Dynamic Oligopoly, I: Overview and Quantity Competition with Large Fixed Costs," Econometrica, Econometric Society, vol. 56(3), pages 549-569, May.
    20. Tilman Börgers & Antonio J. Morales & Rajiv Sarin, 2004. "Expedient and Monotone Learning Rules," Econometrica, Econometric Society, vol. 72(2), pages 383-405, March.
    21. Arthur, W Brian, 1994. "Inductive Reasoning and Bounded Rationality," American Economic Review, American Economic Association, vol. 84(2), pages 406-411, May.
    22. Ed Hopkins, 2002. "Two Competing Models of How People Learn in Games," Econometrica, Econometric Society, vol. 70(6), pages 2141-2166, November.
    23. Sinitskaya, Ekaterina & Tesfatsion, Leigh, 2015. "Macroeconomies as constructively rational games," Journal of Economic Dynamics and Control, Elsevier, vol. 61(C), pages 152-182.
    24. Vira Semenova, 2018. "Machine Learning for Dynamic Discrete Choice," Papers 1808.02569, arXiv.org, revised Nov 2018.
    25. Rustichini, Aldo, 1999. "Optimal Properties of Stimulus-Response Learning Models," Games and Economic Behavior, Elsevier, vol. 29(1-2), pages 244-273, October.
    26. Maximilian Kasy & Anja Sautmann, 2021. "Adaptive Treatment Assignment in Experiments for Policy Choice," Econometrica, Econometric Society, vol. 89(1), pages 113-132, January.
    27. Xavier Gabaix, 2014. "A Sparsity-Based Model of Bounded Rationality," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 129(4), pages 1661-1710.
    28. Harald Uhlig & Martin Lettau, 1999. "Rules of Thumb versus Dynamic Programming," American Economic Review, American Economic Association, vol. 89(1), pages 148-174, March.
    29. McLennan, Andrew, 1984. "Price dispersion and incomplete learning in the long run," Journal of Economic Dynamics and Control, Elsevier, vol. 7(3), pages 331-347, September.
    30. Horst, Ulrich, 2005. "Stationary equilibria in discounted stochastic games with weakly interacting players," Games and Economic Behavior, Elsevier, vol. 51(1), pages 83-108, April.
    31. Athey, Susan & Imbens, Guido W., 2019. "Machine Learning Methods Economists Should Know About," Research Papers 3776, Stanford University, Graduate School of Business.
    32. Koichiro Ito & Mar Reguant, 2016. "Sequential Markets, Market Power, and Arbitrage," American Economic Review, American Economic Association, vol. 106(7), pages 1921-1957, July.
    33. Raghabendra Chattopadhyay & Esther Duflo, 2004. "Women as Policy Makers: Evidence from a Randomized Policy Experiment in India," Econometrica, Econometric Society, vol. 72(5), pages 1409-1443, September.
    34. Arthur Charpentier & Emmanuel Flachaire & Antoine Ly, 2017. "Économétrie et Machine Learning," Papers 1708.06992, arXiv.org, revised Mar 2018.
    35. Thomas Spooner & John Fearnley & Rahul Savani & Andreas Koukorinis, 2018. "Market Making via Reinforcement Learning," Papers 1804.04216, arXiv.org.
    36. Bill Gibson, 2007. "A Multi-Agent Systems Approach to Microeconomic Foundations of Macro," UMASS Amherst Economics Working Papers 2007-10, University of Massachusetts Amherst, Department of Economics.
    37. Weitzman, Martin L, 1979. "Optimal Search for the Best Alternative," Econometrica, Econometric Society, vol. 47(3), pages 641-654, May.
    38. Ariel Pakes & Mark Schankerman, 1984. "The Rate of Obsolescence of Patents, Research Gestation Lags, and the Private Rate of Return to Research Resources," NBER Chapters, in: R&D, Patents, and Productivity, pages 73-88, National Bureau of Economic Research, Inc.
    39. Godfrey Keller & Sven Rady, 1999. "Optimal Experimentation in a Changing Environment," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 66(3), pages 475-507.
    40. Cyert, Richard M & DeGroot, Morris H, 1974. "Rational Expectations and Bayesian Analysis," Journal of Political Economy, University of Chicago Press, vol. 82(3), pages 521-536, May/June.
    41. Erev, Ido & Roth, Alvin E, 1998. "Predicting How People Play Games: Reinforcement Learning in Experimental Games with Unique, Mixed Strategy Equilibria," American Economic Review, American Economic Association, vol. 88(4), pages 848-881, September.
    42. Granato, Jim & Guse, Eran A. & Wong, M. C. Sunny, 2008. "Learning From The Expectations Of Others," Macroeconomic Dynamics, Cambridge University Press, vol. 12(3), pages 345-377, June.
    43. Marcet, Albert & Sargent, Thomas J, 1989. "Convergence of Least-Squares Learning in Environments with Hidden State Variables and Private Information," Journal of Political Economy, University of Chicago Press, vol. 97(6), pages 1306-1322, December.
    44. Merrill M. Flood, 1956. "The Traveling-Salesman Problem," Operations Research, INFORMS, vol. 4(1), pages 61-75, February.
    45. W. Brian Arthur, 1994. "Inductive Reasoning, Bounded Rationality and the Bar Problem," Working Papers 94-03-014, Santa Fe Institute.
    46. Bastien Baldacci & Iuliia Manziuk & Thibaut Mastrolia & Mathieu Rosenbaum, 2019. "Market making and incentives design in the presence of a dark pool: a deep reinforcement learning approach," Papers 1912.01129, arXiv.org.
    47. Chaim Fershtman & Ariel Pakes, 2012. "Dynamic Games with Asymmetric Information: A Framework for Empirical Work," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 127(4), pages 1611-1661.
    48. Basci, Erdem, 1999. "Learning by imitation," Journal of Economic Dynamics and Control, Elsevier, vol. 23(9-10), pages 1569-1585, September.
    49. Bergemann, Dirk & Hege, Ulrich, 1998. "Venture capital financing, moral hazard, and learning," Journal of Banking & Finance, Elsevier, vol. 22(6-8), pages 703-735, August.
    50. Susan Athey & Guido W. Imbens, 2019. "Machine Learning Methods That Economists Should Know About," Annual Review of Economics, Annual Reviews, vol. 11(1), pages 685-725, August.
    51. Maskin, Eric & Tirole, Jean, 1988. "A Theory of Dynamic Oligopoly, II: Price Competition, Kinked Demand Curves, and Edgeworth Cycles," Econometrica, Econometric Society, vol. 56(3), pages 571-599, May.
    52. Pakes, Ariel S, 1986. "Patents as Options: Some Estimates of the Value of Holding European Patent Stocks," Econometrica, Econometric Society, vol. 54(4), pages 755-784, July.
    53. Jovanovic, Boyan, 1982. "Selection and the Evolution of Industry," Econometrica, Econometric Society, vol. 50(3), pages 649-670, May.
    54. Michael Schwind, 2007. "Dynamic Pricing and Automated Resource Allocation for Complex Information Services," Lecture Notes in Economics and Mathematical Systems, Springer, number 978-3-540-68003-1, December.
    55. G. A. Croes, 1958. "A Method for Solving Traveling-Salesman Problems," Operations Research, INFORMS, vol. 6(6), pages 791-812, December.
    56. Drew Fudenberg & David K. Levine, 1998. "The Theory of Learning in Games," MIT Press Books, The MIT Press, edition 1, volume 1, number 0262061945, December.
    57. Sergiu Hart & Andreu Mas-Colell, 2013. "Uncoupled Dynamics Do Not Lead To Nash Equilibrium," World Scientific Book Chapters, in: Simple Adaptive Strategies From Regret-Matching to Uncoupled Dynamics, chapter 7, pages 153-163, World Scientific Publishing Co. Pte. Ltd.
    58. Feldman, Mark, 1987. "Bayesian learning and convergence to rational expectations," Journal of Mathematical Economics, Elsevier, vol. 16(3), pages 297-313, June.
    59. Wolpin, Kenneth I, 1984. "An Estimable Dynamic Stochastic Model of Fertility and Child Mortality," Journal of Political Economy, University of Chicago Press, vol. 92(5), pages 852-874, October.
    60. Pearce, David G, 1984. "Rationalizable Strategic Behavior and the Problem of Perfection," Econometrica, Econometric Society, vol. 52(4), pages 1029-1050, July.
    61. Sendhil Mullainathan & Jann Spiess, 2017. "Machine Learning: An Applied Econometric Approach," Journal of Economic Perspectives, American Economic Association, vol. 31(2), pages 87-106, Spring.
    62. Escobar, Juan F., 2013. "Equilibrium analysis of dynamic models of imperfect competition," International Journal of Industrial Organization, Elsevier, vol. 31(1), pages 92-101.
    63. Aumann, Robert J., 1997. "Rationality and Bounded Rationality," Games and Economic Behavior, Elsevier, vol. 21(1-2), pages 2-14, October.
    64. Kanishka Misra & Eric M. Schwartz & Jacob Abernethy, 2019. "Dynamic Online Pricing with Incomplete Information Using Multiarmed Bandit Experiments," Marketing Science, INFORMS, vol. 38(2), pages 226-252, March.
    65. Lars Peter Hansen & Thomas J. Sargent, 2013. "Recursive Models of Dynamic Linear Economies," Economics Books, Princeton University Press, edition 1, number 10141.
    66. Waltman, Ludo & Kaymak, Uzay, 2008. "Q-learning agents in a Cournot oligopoly model," Journal of Economic Dynamics and Control, Elsevier, vol. 32(10), pages 3275-3293, October.
    67. Sumitra Ganesh & Nelson Vadori & Mengda Xu & Hua Zheng & Prashant Reddy & Manuela Veloso, 2019. "Reinforcement Learning for Market Making in a Multi-agent Dealer Market," Papers 1911.05892, arXiv.org.
    68. Tatsiana Levina & Yuri Levin & Jeff McGill & Mikhail Nediak, 2009. "Dynamic Pricing with Online Learning and Strategic Consumers: An Application of the Aggregating Algorithm," Operations Research, INFORMS, vol. 57(2), pages 327-341, April.

    Citations



    Cited by:

    1. Guangsheng Yu & Qin Wang & Caijun Sun & Lam Duc Nguyen & H. M. N. Dilum Bandara & Shiping Chen, 2024. "Maximizing NFT Incentives: References Make You Rich," Papers 2402.06459, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Arthur Charpentier & Romuald Elie & Carl Remlinger, 2020. "Reinforcement Learning in Economics and Finance," Papers 2003.10014, arXiv.org.
    2. Victor Aguirregabiria & Jihye Jeon, 2020. "Firms’ Beliefs and Learning: Models, Identification, and Empirical Evidence," Review of Industrial Organization, Springer;The Industrial Organization Society, vol. 56(2), pages 203-235, March.
    3. Sebastian Galiani & Juan Pantano, 2021. "Structural Models: Inception and Frontier," NBER Working Papers 28698, National Bureau of Economic Research, Inc.
    4. Duffy, John, 2006. "Agent-Based Models and Human Subject Experiments," Handbook of Computational Economics, in: Leigh Tesfatsion & Kenneth L. Judd (ed.), Handbook of Computational Economics, edition 1, volume 2, chapter 19, pages 949-1011, Elsevier.
    5. Kets, W., 2008. "Networks and learning in game theory," Other publications TiSEM 7713fce1-3131-498c-8c6f-3, Tilburg University, School of Economics and Management.
    6. Hanming Fang & Yang Wang, 2015. "Estimating Dynamic Discrete Choice Models With Hyperbolic Discounting, With An Application To Mammography Decisions," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 56(2), pages 565-596, May.
    7. Aguirregabiria, Victor & Mira, Pedro, 2010. "Dynamic discrete choice structural models: A survey," Journal of Econometrics, Elsevier, vol. 156(1), pages 38-67, May.
    8. Jason R. Blevins & Wei Shi & Donald R. Haurin & Stephanie Moulton, 2020. "A Dynamic Discrete Choice Model Of Reverse Mortgage Borrower Behavior," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 61(4), pages 1437-1477, November.
    9. Peter Arcidiacono & Paul B. Ellickson, 2011. "Practical Methods for Estimation of Dynamic Discrete Choice Models," Annual Review of Economics, Annual Reviews, vol. 3(1), pages 363-394, September.
    10. Aguirregabiria, Victor & Nevo, Aviv, 2010. "Recent developments in empirical IO: dynamic demand and dynamic games," MPRA Paper 27814, University Library of Munich, Germany.
    11. Andriy Norets, 2010. "Continuity and differentiability of expected value functions in dynamic discrete choice models," Quantitative Economics, Econometric Society, vol. 1(2), pages 305-322, November.
    12. Arcidiacono, Peter & Miller, Robert A., 2020. "Identifying dynamic discrete choice models off short panels," Journal of Econometrics, Elsevier, vol. 215(2), pages 473-485.
    13. Hopkins, Ed, 2007. "Adaptive learning models of consumer behavior," Journal of Economic Behavior & Organization, Elsevier, vol. 64(3-4), pages 348-368.
    14. Hiroyuki Kasahara & Katsumi Shimotsu, 2018. "Estimation of Discrete Choice Dynamic Programming Models," The Japanese Economic Review, Japanese Economic Association, vol. 69(1), pages 28-58, March.
    15. Nikhil Agarwal & Itai Ashlagi & Michael A. Rees & Paulo Somaini & Daniel Waldinger, 2021. "Equilibrium Allocations Under Alternative Waitlist Designs: Evidence From Deceased Donor Kidneys," Econometrica, Econometric Society, vol. 89(1), pages 37-76, January.
    16. Victor Aguirregabiria & Margaret Slade, 2017. "Empirical models of firms and industries," Canadian Journal of Economics, Canadian Economics Association, vol. 50(5), pages 1445-1488, December.
    17. Blevins, Jason R. & Kim, Minhae, 2024. "Nested Pseudo likelihood estimation of continuous-time dynamic discrete games," Journal of Econometrics, Elsevier, vol. 238(2).
    18. Rambha, Tarun & Nozick, Linda K. & Davidson, Rachel, 2021. "Modeling hurricane evacuation behavior using a dynamic discrete choice framework," Transportation Research Part B: Methodological, Elsevier, vol. 150(C), pages 75-100.
    19. Le-Yu Chen, 2009. "Identification of structural dynamic discrete choice models," CeMMAP working papers CWP08/09, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    20. Nikhil Agarwal & Itai Ashlagi & Michael A. Rees & Paulo J. Somaini & Daniel C. Waldinger, 2019. "Equilibrium Allocations under Alternative Waitlist Designs: Evidence from Deceased Donor Kidneys," NBER Working Papers 25607, National Bureau of Economic Research, Inc.

    More about this item

    Keywords

    Causality; Control; Machine learning; Markov decision process; Multi-armed bandits; Online-learning; Q-learning; Regret; Reinforcement learning; Rewards; Sequential learning;

    JEL classification:

    • C18 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Methodological Issues: General
    • C41 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - Duration Analysis; Optimal Timing Strategies
    • C44 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - Operations Research; Statistical Decision Theory
    • C54 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Quantitative Policy Modeling
    • C57 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Econometrics of Games and Auctions
    • C61 - Mathematical and Quantitative Methods - - Mathematical Methods; Programming Models; Mathematical and Simulation Modeling - - - Optimization Techniques; Programming Models; Dynamic Analysis
    • C63 - Mathematical and Quantitative Methods - - Mathematical Methods; Programming Models; Mathematical and Simulation Modeling - - - Computational Techniques
    • C68 - Mathematical and Quantitative Methods - - Mathematical Methods; Programming Models; Mathematical and Simulation Modeling - - - Computable General Equilibrium Models
    • C70 - Mathematical and Quantitative Methods - - Game Theory and Bargaining Theory - - - General
    • C90 - Mathematical and Quantitative Methods - - Design of Experiments - - - General
    • D40 - Microeconomics - - Market Structure, Pricing, and Design - - - General
    • D70 - Microeconomics - - Analysis of Collective Decision-Making - - - General
    • D83 - Microeconomics - - Information, Knowledge, and Uncertainty - - - Search; Learning; Information and Knowledge; Communication; Belief; Unawareness

