On the Convergence of Reinforcement Learning
Abstract
This paper examines the convergence of payoffs and strategies in Erev and Roth's model of reinforcement learning. When all players use this rule, it eliminates iteratively dominated strategies, and in two-person constant-sum games average payoffs converge to the value of the game. Strategies converge in constant-sum games with unique equilibria if they are pure or, in 2 × 2 games, also if they are mixed. The long-run behaviour of the learning rule is governed by equations related to Maynard Smith's version of the replicator dynamic. Properties of the learning rule against general opponents are also studied. In particular, it is shown that the rule guarantees that the lim sup of a player's average payoffs is at least his minmax payoff.
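As a rough illustration of the kind of rule the abstract refers to, the sketch below simulates a basic cumulative reinforcement rule of the type usually attributed to Erev and Roth: each player keeps one propensity per action, plays an action with probability proportional to its propensity, and adds the realised payoff to the propensity of the action just played. This listing does not reproduce the paper's exact specification, so the function name `simulate`, the initial propensity of 10, and the example game (a matching-pennies variant with strictly positive payoffs, constant sum 3, value 1.5 for the row player) are all assumptions made for the example.

```python
import random


def simulate(payoff_row, payoff_col, rounds, seed=0, initial=10.0):
    """Run a basic cumulative reinforcement rule for two players.

    Hypothetical helper for illustration only. Payoffs are assumed to be
    strictly positive so that propensities stay positive.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(payoff_row), len(payoff_row[0])
    q_row = [initial] * n_rows  # row player's propensities
    q_col = [initial] * n_cols  # column player's propensities
    total_row = 0.0

    for _ in range(rounds):
        # Each action is chosen with probability proportional to its propensity.
        i = rng.choices(range(n_rows), weights=q_row)[0]
        j = rng.choices(range(n_cols), weights=q_col)[0]
        # Only the action actually played is reinforced, by the realised payoff.
        q_row[i] += payoff_row[i][j]
        q_col[j] += payoff_col[i][j]
        total_row += payoff_row[i][j]

    strat_row = [q / sum(q_row) for q in q_row]
    strat_col = [q / sum(q_col) for q in q_col]
    return total_row / rounds, strat_row, strat_col


# Assumed example game: the row player is paid 2 on a match and 1 otherwise;
# the column player receives 3 minus the row player's payoff.
ROW = [[2.0, 1.0], [1.0, 2.0]]
COL = [[3.0 - x for x in row] for row in ROW]

avg_row, strat_row, strat_col = simulate(ROW, COL, rounds=20000)
print("row player's average payoff:", avg_row)
print("row strategy:", strat_row, "column strategy:", strat_col)
```

In this example game the row player's average payoff can be compared with the value 1.5 and both strategies with the mixed equilibrium (1/2, 1/2). Because the effective step size shrinks as propensities accumulate, any such comparison from a finite run is only suggestive.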
Bibliographic Info
Paper provided by University of Oxford, Department of Economics in its series Economics Series Working Papers with number 96.
Date of creation: 01 Mar 2002
Keywords: reinforcement learning; games
JEL classification:
- C72: Mathematical and Quantitative Methods > Game Theory and Bargaining Theory > Noncooperative Games
- D83: Microeconomics > Information, Knowledge, and Uncertainty > Search, Learning, and Information
References:
- Sergiu Hart & Andreu Mas-Colell, 1996. "A simple adaptive procedure leading to correlated equilibrium," Economics Working Papers 200, Department of Economics and Business, Universitat Pompeu Fabra, revised Dec 1996.
- Sergiu Hart & Andreu Mas-Colell, 2000. "A Simple Adaptive Procedure Leading to Correlated Equilibrium," Econometrica, Econometric Society, vol. 68(5), pages 1127-1150, September.
- Sergiu Hart & Andreu Mas-Colell, 1997. "A Simple Adaptive Procedure Leading to Correlated Equilibrium," Game Theory and Information 9703006, EconWPA, revised 24 Mar 1997.
- S. Hart & A. Mas-Colell, 2010. "A Simple Adaptive Procedure Leading to Correlated Equilibrium," Levine's Working Paper Archive 572, David K. Levine.
- Martin Posch, 1997. "Cycling in a stochastic learning algorithm for normal form games," Journal of Evolutionary Economics, Springer, vol. 7(2), pages 193-207.
- Colin Camerer & Teck-Hua Ho, 1999. "Experience-weighted Attraction Learning in Normal Form Games," Econometrica, Econometric Society, vol. 67(4), pages 827-874, July.
- Kuan, Chung-Ming & White, Halbert, 1994. "Adaptive Learning with Nonlinear Dynamics Driven by Dependent Processes," Econometrica, Econometric Society, vol. 62(5), pages 1087-1114, September.
- Ed Hopkins, 2001. "Two Competing Models of How People Learn in Games," NajEcon Working Paper Reviews
- T. Borgers & R. Sarin, 2010. "Learning Through Reinforcement and Replicator Dynamics," Levine's Working Paper Archive 380, David K. Levine.
- Borgers, Tilman & Sarin, Rajiv, 1997. "Learning Through Reinforcement and Replicator Dynamics," Journal of Economic Theory, Elsevier, vol. 77(1), pages 1-14, November.
- Tilman Börgers & Rajiv Sarin. "Learning Through Reinforcement and Replicator Dynamics," ELSE working papers 051, ESRC Centre on Economics Learning and Social Evolution.
- Benaim, Michel & Hirsch, Morris W., 1999. "Mixed Equilibria and Dynamical Systems Arising from Fictitious Play in Perturbed Games," Games and Economic Behavior, Elsevier, vol. 29(1-2), pages 36-72, October.
- Josef Hofbauer & Karl H. Schlag, 2000. "Sophisticated imitation in cyclic games," Journal of Evolutionary Economics, Springer, vol. 10(5), pages 523-543.
- Laslier, Jean-Francois & Topol, Richard & Walliser, Bernard, 2001. "A Behavioral Learning Process in Games," Games and Economic Behavior, Elsevier, vol. 37(2), pages 340-366, November.
- Laslier, J.-F. & Topol, R. & Walliser, B., 1999. "A Behavioral Learning Process in Games," Papers 99-03, Paris X - Nanterre, U.F.R. de Sc. Ec. Gest. Maths Infor..
- J.-F. Laslier & R. Topol & B. Walliser, 1999. "A behavioral learning process in games," THEMA Working Papers 99-03, THEMA (THéorie Economique, Modélisation et Applications), Université de Cergy-Pontoise.
- Gale, John & Binmore, Kenneth G. & Samuelson, Larry, 1995. "Learning to be imperfect: The ultimatum game," Games and Economic Behavior, Elsevier, vol. 8(1), pages 56-90.
- Sergiu Hart & Andreu Mas-Colell, 1999. "A General Class of Adaptive Strategies," Game Theory and Information 9904001, EconWPA, revised 23 Mar 2000.
- Arthur, W. Brian, 1993. "On Designing Economic Agents That Behave Like Human Agents," Journal of Evolutionary Economics, Springer, vol. 3(1), pages 1-22, February.