Long-run choice anomalies in reinforcement learning with bounded memory

Author

Listed:

Giffin, Erin
Lillethun, Erik

Abstract

Violations of expected utility (EU) maximization have been demonstrated in many settings; however, anomalies are often reduced after repeated choices. We examine whether sufficient experiential learning allows convergence to EU maximization. In the model, a decision maker with long but finite memory repeatedly makes choices in the same decision problem under uncertainty. We focus on the existence and severity of one unambiguous type of long-run choice anomaly: ranking reversals (a non-EU-maximizing action being the most frequently chosen in the long run). We show that reversals exist for almost all preferences, even in realistic examples. Reversals tend to happen when payoff differences are heavily skewed. Longer memory does not eliminate the possibility of ranking reversals, but it does make reversals less severe. Our key takeaway is that finite memory can produce major violations of the expected utility ranking even in a model where both memory and the decision-making process are unbiased.
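The intuition that skewed payoffs and short memory can favor a lower-EU action can be illustrated with a toy calculation. This is a minimal sketch under assumptions that are not taken from the paper: the decision maker remembers the last `m` i.i.d. payoff draws of a risky action, compares their unweighted average with a safe payoff of 0, and picks whichever is higher. The function name `prob_choose_risky` and the payoff values are hypothetical; the authors' actual model may differ.

```python
from math import comb

def prob_choose_risky(m, p_high, high, low, safe=0.0):
    """Probability that the mean of m remembered risky payoffs beats the safe payoff.

    Assumption (not from the paper): the m remembered draws are i.i.d., so the
    number of high payoffs in memory is Binomial(m, p_high). Ties count as half.
    """
    total = 0.0
    for k in range(m + 1):
        mean = (k * high + (m - k) * low) / m
        weight = comb(m, k) * p_high**k * (1 - p_high)**(m - k)
        if mean > safe:
            total += weight
        elif mean == safe:
            total += 0.5 * weight
    return total

# Hypothetical, heavily skewed risky action: rare large gain, frequent small loss.
P_HIGH, HIGH, LOW = 0.05, 10.0, -0.4
expected_payoff = P_HIGH * HIGH + (1 - P_HIGH) * LOW  # positive, so risky beats safe in expectation

for m in (5, 20, 50):
    print(m, round(prob_choose_risky(m, P_HIGH, HIGH, LOW), 3))
```

With `m = 5`, the risky action is chosen only when at least one high payoff is in memory, which happens with probability 1 - 0.95^5 (about 0.23), so the safe action is chosen most of the time even though the risky action has the higher expected payoff. In this particular example the risky action is chosen more than half the time for `m = 20` and `m = 50`; this says nothing about other payoff distributions.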

Suggested Citation

Giffin, Erin & Lillethun, Erik, 2025. "Long-run choice anomalies in reinforcement learning with bounded memory," Journal of Economic Behavior & Organization, Elsevier, vol. 231(C).

Handle: RePEc:eee:jeborg:v:231:y:2025:i:c:s0167268125000216
DOI: 10.1016/j.jebo.2025.106901

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Jaspersen, Johannes G. & Montibeller, Gilberto, 2020. "On the learning patterns and adaptive behavior of terrorist organizations," European Journal of Operational Research, Elsevier, vol. 282(1), pages 221-234.
Chmura, Thorsten & Goerg, Sebastian J. & Selten, Reinhard, 2012. "Learning in experimental 2×2 games," Games and Economic Behavior, Elsevier, vol. 76(1), pages 44-73.
- Chmura, Thorsten & Goerg, Sebastian J. & Selten, Reinhard, 2008. "Learning in experimental 2×2 games," Bonn Econ Discussion Papers 18/2008, University of Bonn, Bonn Graduate School of Economics (BGSE).
- Thorsten Chmura & Sebastian Goerg & Reinhard Selten, 2011. "Learning in experimental 2 x 2 games," Discussion Paper Series of the Max Planck Institute for Research on Collective Goods 2011_26, Max Planck Institute for Research on Collective Goods.
Jinkwon Lee, 2007. "Repetition And Financial Incentives In Economics Experiments," Journal of Economic Surveys, Wiley Blackwell, vol. 21(3), pages 628-681, July.
Oyarzun, Carlos & Sarin, Rajiv, 2013. "Learning and risk aversion," Journal of Economic Theory, Elsevier, vol. 148(1), pages 196-225.
- Carlos Oyarzun & Rajiv Sarin, 2005. "Learning and Risk Aversion," Levine's Bibliography 784828000000000482, UCLA Department of Economics.
- Carlos Oyarzun & Rajiv Sarin, 2012. "Learning and Risk Aversion," Levine's Working Paper Archive 786969000000000572, David K. Levine.
Masiliūnas, Aidas, 2023. "Learning in rent-seeking contests with payoff risk and foregone payoff information," Games and Economic Behavior, Elsevier, vol. 140(C), pages 50-72.
Yechiam, Eldad & Busemeyer, Jerome R., 2008. "Evaluating generalizability and parameter consistency in learning models," Games and Economic Behavior, Elsevier, vol. 63(1), pages 370-394, May.
Jonathan Newton, 2018. "Evolutionary Game Theory: A Renaissance," Games, MDPI, vol. 9(2), pages 1-67, May.
Shu-Heng Chen & Yi-Lin Hsieh, 2011. "Reinforcement Learning in Experimental Asset Markets," Eastern Economic Journal, Palgrave Macmillan;Eastern Economic Association, vol. 37(1), pages 109-133.
Chen, Shu-Heng, 2012. "Varieties of agents in agent-based computational economics: A historical and an interdisciplinary perspective," Journal of Economic Dynamics and Control, Elsevier, vol. 36(1), pages 1-25.
Atanasios Mitropoulos, 2001. "Learning Under Little Information: An Experiment on Mutual Fate Control," Game Theory and Information 0110003, University Library of Munich, Germany.
Duffy, John, 2006. "Agent-Based Models and Human Subject Experiments," Handbook of Computational Economics, in: Leigh Tesfatsion & Kenneth L. Judd (ed.), Handbook of Computational Economics, edition 1, volume 2, chapter 19, pages 949-1011, Elsevier.
- John Duffy, 2004. "Agent-Based Models and Human Subject Experiments," Computational Economics 0412001, University Library of Munich, Germany.
Andreas Ortmann & Leonidas Spiliopoulos, 2017. "The beauty of simplicity? (Simple) heuristics and the opportunities yet to be realized," Chapters, in: Morris Altman (ed.), Handbook of Behavioural Economics and Smart Decision-Making, chapter 7, pages 119-136, Edward Elgar Publishing.
- Andreas Ortmann & Leonidas Spiliopoulos, 2015. "The beauty of simplicity? (Simple) heuristics and the opportunities yet to be realized," Discussion Papers 2015-25, School of Economics, The University of New South Wales.
Mitropoulos, Atanasios, 2001. "Learning under minimal information: An experiment on mutual fate control," Journal of Economic Psychology, Elsevier, vol. 22(4), pages 523-557, August.
Spiliopoulos, Leonidas, 2008. "Do repeated game players detect patterns in opponents? Revisiting the Nyarko & Schotter belief elicitation experiment," MPRA Paper 6666, University Library of Munich, Germany.
Terracol, Antoine & Vaksmann, Jonathan, 2009. "Dumbing down rational players: Learning and teaching in an experimental game," Journal of Economic Behavior & Organization, Elsevier, vol. 70(1-2), pages 54-71, May.
- Antoine Terracol & Jonathan Vaksmann, 2007. "Dumbing down rational players: Learning and teaching in an experimental game," Documents de travail du Centre d'Economie de la Sorbonne bla07017, Université Panthéon-Sorbonne (Paris 1), Centre d'Economie de la Sorbonne.
- Antoine Terracol & Jonathan Vaksmann, 2009. "Dumbing down rational players: Learning and teaching in an experimental game," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) hal-00607223, HAL.
- Antoine Terracol & Jonathan Vaksmann, 2009. "Dumbing down rational players: Learning and teaching in an experimental game," Post-Print hal-00607223, HAL.
- Antoine Terracol & Jonathan Vaksmann, 2007. "Dumbing down rational players: learning and teaching in an experimental game," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) halshs-00145436, HAL.
- Antoine Terracol & Jonathan Vaksmann, 2009. "Dumbing down rational players: Learning and teaching in an experimental game," PSE-Ecole d'économie de Paris (Postprint) hal-00607223, HAL.
- Antoine Terracol & Jonathan Vaksmann, 2009. "Dumbing down rational players: Learning and teaching in an experimental game," Post-Print hal-00672292, HAL.
- Antoine Terracol & Jonathan Vaksmann, 2007. "Dumbing down rational players: learning and teaching in an experimental game," Post-Print halshs-00145436, HAL.
Ianni, A., 2002. "Reinforcement learning and the power law of practice: some analytical results," Discussion Paper Series In Economics And Econometrics 203, Economics Division, School of Social Sciences, University of Southampton.
Erhao Xie, 2019. "Monetary Payoff and Utility Function in Adaptive Learning Models," Staff Working Papers 19-50, Bank of Canada.
Osili, Una Okonkwo & Paulson, Anna, 2014. "Crises and confidence: Systemic banking crises and depositor behavior," Journal of Financial Economics, Elsevier, vol. 111(3), pages 646-660.
Cason, Timothy N. & Saijo, Tatsuyoshi & Yamato, Takehiko & Yokotani, Konomu, 2004. "Non-excludable public good experiments," Games and Economic Behavior, Elsevier, vol. 49(1), pages 81-102, October.
- Saijo, Tatsuyoshi & Yamato, Takehiko & Yokotani, Konomu & Cason, Timothy N., 2002. "Non-Excludable Public Good Experiments," Working Papers 1154, California Institute of Technology, Division of the Humanities and Social Sciences.
- Tatsuyoshi Saijo, 2003. "Non-Excludable Public Good Experiments," Theory workshop papers 505798000000000027, UCLA Department of Economics.
Claude Meidinger, 2018. "Cooperation and evolution of meaning in senders-receivers games," Post-Print halshs-01960762, HAL.

More about this item

Keywords

Expected utility; Choice anomalies; Reinforcement learning; Bounded memory; Markov chains;

JEL classification:

D81 - Microeconomics - - Information, Knowledge, and Uncertainty - - - Criteria for Decision-Making under Risk and Uncertainty
D83 - Microeconomics - - Information, Knowledge, and Uncertainty - - - Search; Learning; Information and Knowledge; Communication; Belief; Unawareness
D91 - Microeconomics - - Micro-Based Behavioral Economics - - - Role and Effects of Psychological, Emotional, Social, and Cognitive Factors on Decision Making
