On the Convergence of Reinforcement Learning
Abstract: This paper examines the convergence of payoffs and strategies in Erev and Roth's model of reinforcement learning. When all players use this rule, it eliminates iteratively dominated strategies, and in two-person constant-sum games average payoffs converge to the value of the game. Strategies converge in constant-sum games with unique equilibria if they are pure or, in 2 × 2 games, also if they are mixed. The long-run behaviour of the learning rule is governed by equations related to Maynard Smith's version of the replicator dynamic. Properties of the learning rule against general opponents are also studied. In particular, it is shown that it guarantees that the lim sup of a player's average payoffs is at least his minmax payoff.
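The kind of learning rule discussed in the abstract can be illustrated with a minimal sketch. It assumes the basic cumulative-propensity form commonly associated with Erev and Roth: each action has a propensity, actions are chosen with probability proportional to propensity, and the payoff received is added to the propensity of the action played. The abstract does not give the paper's exact specification, so this form, the function names `choose` and `simulate`, the initial propensities, and the 2 × 2 constant-sum payoff matrix are all assumptions made for illustration.

```python
import random


def choose(propensities, rng):
    """Sample an action with probability proportional to its propensity."""
    threshold = rng.random() * sum(propensities)
    running = 0.0
    for action, weight in enumerate(propensities):
        running += weight
        if threshold < running:
            return action
    return len(propensities) - 1  # guard against floating-point round-off


def simulate(row_payoffs, total, rounds=10_000, initial=1.0, seed=0):
    """Both players reinforce the action they played by the payoff received.

    row_payoffs[i][j] is the row player's payoff; the column player gets
    total - row_payoffs[i][j] (constant-sum game). All payoffs are assumed
    to be strictly positive so that propensities stay positive.
    Returns (average row payoff, row propensities, column propensities).
    """
    rng = random.Random(seed)
    row_q = [initial] * len(row_payoffs)
    col_q = [initial] * len(row_payoffs[0])
    row_sum = 0.0
    for _ in range(rounds):
        i = choose(row_q, rng)
        j = choose(col_q, rng)
        u_row = row_payoffs[i][j]
        row_q[i] += u_row          # reinforce only the action actually played
        col_q[j] += total - u_row
        row_sum += u_row
    return row_sum / rounds, row_q, col_q


if __name__ == "__main__":
    # Hypothetical matching-pennies-style game with payoffs shifted to be positive.
    game = [[2.0, 1.0], [1.0, 2.0]]
    avg, row_q, col_q = simulate(game, total=3.0)
    print("average row payoff:", avg)
    print("row strategy:", [q / sum(row_q) for q in row_q])
```

In this hypothetical game the unique equilibrium is mixed (each action with probability 1/2) and the value to the row player is 1.5, so the abstract's result suggests the printed average payoff should drift toward 1.5 as `rounds` grows. Convergence can be slow, because each new payoff is small relative to the accumulated propensities.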
Bibliographic Info: Paper provided by University of Oxford, Department of Economics in its series Economics Series Working Papers with number 96.
Date of creation: 01 Mar 2002
Keywords: reinforcement learning; games
JEL classification:
- C72 - Mathematical and Quantitative Methods - - Game Theory and Bargaining Theory - - - Noncooperative Games
- D83 - Microeconomics - - Information, Knowledge, and Uncertainty - - - Search, Learning, and Information