Structure Learning in Human Sequential Decision-Making

Authors

Daniel E Acuña
Paul Schrater

Abstract

Studies of sequential decision-making in humans frequently find suboptimal performance relative to an ideal actor that has perfect knowledge of how rewards and events are generated in the environment. We argue that, rather than being suboptimal, humans face a more complex learning problem: they must also learn the structure of reward generation in the environment. We formulate the problem of structure learning in sequential decision tasks using Bayesian reinforcement learning, and show that learning the generative model for rewards qualitatively changes the behavior of an optimal learning agent. To test whether people exhibit structure learning, we performed experiments involving a mixture of one-armed and two-armed bandit reward models, where structure learning produces many of the qualitative behaviors deemed suboptimal in previous studies. Our results demonstrate that humans can perform structure learning in a near-optimal manner.

Author Summary

Every decision-making experiment has a structure that specifies how rewards are obtained, and this structure is usually explained to the subject at the beginning of the experiment. Participants frequently fail to act as if they understand the experimental structure, even in tasks as simple as determining which of two biased coins they should choose to maximize the number of trials that produce “heads”. We hypothesize that participants' behavior is not driven by top-down instructions; rather, participants must learn from experience how rewards are generated. We formalize this hypothesis using a fully rational Bayesian reinforcement learning approach that models optimal structure learning in sequential decision-making. In an experimental test of structure learning in humans, we show that humans learn reward structure from experience in a near-optimal manner.
Our results demonstrate that behavior purported to show that humans are error-prone and suboptimal decision makers can result from an optimal learning approach. Our findings provide a compelling new family of rational hypotheses for behavior previously deemed irrational, including under- and over-exploration.
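The abstract frames structure learning as inferring which generative model produces the rewards, not only the reward rates within a fixed model. The snippet below is a minimal Python sketch of that inference step alone, under assumptions that are not taken from the paper: in a hypothetical "one-armed" structure, arm A pays off with a known probability `p_known` (0.5 here) while arm B's probability is unknown; in the "two-armed" structure both probabilities are unknown, with independent uniform Beta(1, 1) priors. The posterior over the two structures then follows from Bayes' rule applied to Beta-Bernoulli marginal likelihoods. All function names and parameters are illustrative, and the sketch leaves out the action-selection (planning) part of the Bayesian reinforcement learning agent the authors describe.

```python
from math import exp, lgamma, log


def log_beta(a: float, b: float) -> float:
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_evidence_unknown(successes: int, failures: int,
                         a: float = 1.0, b: float = 1.0) -> float:
    """Log marginal likelihood of a win/loss sequence when the arm's reward
    probability is unknown and integrated out under a Beta(a, b) prior."""
    return log_beta(a + successes, b + failures) - log_beta(a, b)


def log_evidence_known(successes: int, failures: int, p: float) -> float:
    """Log likelihood of a win/loss sequence when the reward probability p is known."""
    return successes * log(p) + failures * log(1.0 - p)


def posterior_one_armed(arm_a: tuple, arm_b: tuple,
                        p_known: float = 0.5,
                        prior_one_armed: float = 0.5) -> float:
    """Posterior probability of the hypothetical 'one-armed' structure.

    arm_a, arm_b: (successes, failures) observed on each arm.
    Assumed structures (illustrative, not from the paper):
      one-armed: arm A has known probability p_known, arm B is unknown.
      two-armed: both arms unknown, independent Beta(1, 1) priors.
    """
    sa, fa = arm_a
    sb, fb = arm_b
    # Arm B is modeled identically under both structures, so its term cancels
    # in the ratio; it is kept to make each structure's evidence explicit.
    log_one = (log(prior_one_armed)
               + log_evidence_known(sa, fa, p_known)
               + log_evidence_unknown(sb, fb))
    log_two = (log(1.0 - prior_one_armed)
               + log_evidence_unknown(sa, fa)
               + log_evidence_unknown(sb, fb))
    # Normalize in log space for numerical stability.
    m = max(log_one, log_two)
    w_one, w_two = exp(log_one - m), exp(log_two - m)
    return w_one / (w_one + w_two)


if __name__ == "__main__":
    print(round(posterior_one_armed((5, 5), (3, 7)), 3))
    print(round(posterior_one_armed((10, 0), (3, 7)), 3))
```

With a balanced record on arm A (5 wins, 5 losses), the known-probability structure is favored (posterior 2772/3796, about 0.73); after 10 straight wins on arm A, the evidence shifts to the two-armed structure (posterior 11/1035, about 0.011). In this toy setting, the agent's belief about structure, and not only about reward rates, changes with experience.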

Suggested Citation

Daniel E Acuña & Paul Schrater, 2010. "Structure Learning in Human Sequential Decision-Making," PLOS Computational Biology, Public Library of Science, vol. 6(12), pages 1-12, December.

Handle: RePEc:plo:pcbi00:1001003
DOI: 10.1371/journal.pcbi.1001003

References listed on IDEAS

Erev, Ido & Roth, Alvin E, 1998. "Predicting How People Play Games: Reinforcement Learning in Experimental Games with Unique, Mixed Strategy Equilibria," American Economic Review, American Economic Association, vol. 88(4), pages 848-881, September.
Jeffrey Banks & David Porter & Mark Olson, 1997. "An experimental analysis of the bandit problem," Economic Theory, Springer;Society for the Advancement of Economic Theory (SAET), vol. 10(1), pages 55-77.
Noah Gans & George Knox & Rachel Croson, 2007. "Simple Models of Discrete Choice and Their Performance in Bandit Experiments," Manufacturing & Service Operations Management, INFORMS, vol. 9(4), pages 383-408, December.
Nathaniel D. Daw & John P. O'Doherty & Peter Dayan & Ben Seymour & Raymond J. Dolan, 2006. "Cortical substrates for exploratory decisions in humans," Nature, Nature, vol. 441(7095), pages 876-879, June.
Robert J. Meyer & Yong Shi, 1995. "Sequential Choice Under Ambiguity: Intuitive Solutions to the Armed-Bandit Problem," Management Science, INFORMS, vol. 41(5), pages 817-834, May.
Yutaka Sakai & Tomoki Fukai, 2008. "When Does Reward Maximization Lead to Matching Law?," PLOS ONE, Public Library of Science, vol. 3(11), pages 1-7, November.

Citations

Citations are extracted by the CitEc Project.

Cited by:

Janet M. Currie & W. Bentley MacLeod, 2018. "Understanding Doctor Decision Making: The Case of Depression," NBER Working Papers 24955, National Bureau of Economic Research, Inc.
- Janet M. Currie & W. Bentley MacLeod, 2020. "Understanding Doctor Decision Making: The Case of Depression," Working Papers 2020-77, Princeton University, Economics Department.
Francesco Rigoli & Christoph Mathys & Karl J Friston & Raymond J Dolan, 2017. "A unifying Bayesian account of contextual effects in value-based choice," PLOS Computational Biology, Public Library of Science, vol. 13(10), pages 1-28, October.
Elyse H Norton & Stephen M Fleming & Nathaniel D Daw & Michael S Landy, 2017. "Suboptimal Criterion Learning in Static and Dynamic Environments," PLOS Computational Biology, Public Library of Science, vol. 13(1), pages 1-28, January.
Amir Dezfouli & Kristi Griffiths & Fabio Ramos & Peter Dayan & Bernard W Balleine, 2019. "Models that learn how humans learn: The case of decision-making and its disorders," PLOS Computational Biology, Public Library of Science, vol. 15(6), pages 1-33, June.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Alina Ferecatu & Arnaud De Bruyn, 2022. "Understanding Managers’ Trade-Offs Between Exploration and Exploitation," Marketing Science, INFORMS, vol. 41(1), pages 139-165, January.
Hu, Yingyao & Kayaba, Yutaka & Shum, Matthew, 2013. "Nonparametric learning rules from bandit experiments: The eyes have it!," Games and Economic Behavior, Elsevier, vol. 81(C), pages 215-231.
- Yingyao Hu & Yutaka Kayaba & Matthew Shum, 2010. "Nonparametric learning rules from bandit experiments: the eyes have it!," CeMMAP working papers CWP15/10, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Yingyao Hu & Yutaka Kayaba & Matt Shum, 2010. "Nonparametric Learning Rules from Bandit Experiments: The Eyes have it!," Economics Working Paper Archive 560, The Johns Hopkins University,Department of Economics.
Yilmaz Kocer, 2010. "Endogenous Learning with Bounded Memory," Working Papers 1290, Princeton University, Department of Economics, Econometric Research Program.
Noah Gans & George Knox & Rachel Croson, 2007. "Simple Models of Discrete Choice and Their Performance in Bandit Experiments," Manufacturing & Service Operations Management, INFORMS, vol. 9(4), pages 383-408, December.
Eric Guerci & Nobuyuki Hanaki & Naoki Watanabe, 2017. "Meaningful learning in weighted voting games: an experiment," Theory and Decision, Springer, vol. 83(1), pages 131-153, June.
- Eric Guerci & Nobuyuki Hanaki & Naoki Watanabe, 2015. "Meaningful Learning in Weighted Voting Games: An Experiment," GREDEG Working Papers 2015-40, Groupe de REcherche en Droit, Economie, Gestion (GREDEG CNRS), Université Côte d'Azur, France.
- Eric Guerci & Nobuyuki Hanaki & Naoki Watanabe, 2017. "Meaningful Learning in Weighted Voting Games: An Experiment," Post-Print halshs-01216244, HAL.
Gars, Jared & Ward, Patrick S., 2019. "Can differences in individual learning explain patterns of technology adoption? Evidence on heterogeneous learning patterns and hybrid rice adoption in Bihar, India," World Development, Elsevier, vol. 115(C), pages 178-189.
Johannes Hoelzemann & Nicolas Klein, 2021. "Bandits in the lab," Quantitative Economics, Econometric Society, vol. 12(3), pages 1021-1051, July.
- Johannes Hoelzemann & Nicolas Klein, 2018. "Bandits in the Lab," Cahiers de recherche 12-2018, Centre interuniversitaire de recherche en économie quantitative, CIREQ.
- Johannes Hoelzemann & Nicolas Klein, 2018. "Bandits in the Lab," Cahiers de recherche 2018-09, Université de Montréal, Département de sciences économiques.
Gars, Jared & Ward, Patrick S., 2016. "The role of learning in technology adoption: Evidence on hybrid rice adoption in Bihar, India," IFPRI discussion papers 1591, International Food Policy Research Institute (IFPRI).
Hudja, Stanton, 2021. "Is Experimentation Invariant to Group Size? A Laboratory Analysis of Innovation Contests," Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), Elsevier, vol. 91(C).
Eric Guerci & Nobuyuki Hanaki & Naoki Watanabe, 2015. "Meaningful Learning in Weighted Voting Games: An Experiment," Working Papers halshs-01216244, HAL.
Christopher Anderson, 2012. "Ambiguity aversion in multi-armed bandit problems," Theory and Decision, Springer, vol. 72(1), pages 15-33, January.
Andrew M. Davis & Vishal Gaur & Dayoung Kim, 2021. "Consumer Learning from Own Experience and Social Information: An Experimental Study," Management Science, INFORMS, vol. 67(5), pages 2924-2943, May.
Paul M. Krueger & Robert C. Wilson & Jonathan D. Cohen, 2017. "Strategies for exploration in the domain of losses," Judgment and Decision Making, Society for Judgment and Decision Making, vol. 12(2), pages 104-117, March.
Nobuyuki Hanaki & Alan Kirman & Paul Pezanis-Christou, 2016. "Counter Intuitive Learning: An Exploratory Study," School of Economics and Public Policy Working Papers 2016-12, University of Adelaide, School of Economics and Public Policy.
- Nobuyuki Hanaki & Alan Kirman & Paul Pezanis-Christou, 2016. "Counter intuitive learning: An exploratory study," Working Papers hal-01358716, HAL.
- Nobuyuki Hanaki & Alan P. Kirman & Paul Pezanis-Christou, 2016. "Counter Intuitive Learning: An Exploratory Study," CESifo Working Paper Series 6029, CESifo.
Christina Fang & Daniel Levinthal, 2009. "Near-Term Liability of Exploitation: Exploration and Exploitation in Multistage Problems," Organization Science, INFORMS, vol. 20(3), pages 538-551, June.
Ayaka Kato & Kenji Morita, 2016. "Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation," PLOS Computational Biology, Public Library of Science, vol. 12(10), pages 1-41, October.
Jean Paul Rabanal & Aleksei Chernulich & John Horowitz & Olga A. Rud & Manizha Sharifova, 2019. "Market timing under public and private information," Working Papers 151, Peruvian Economic Association.
Naoki Watanabe, 2022. "Reconsidering Meaningful Learning in a Bandit Experiment on Weighted Voting: Subjects’ Search Behavior," The Review of Socionetwork Strategies, Springer, vol. 16(1), pages 81-107, April.
Marcoul, Philippe & Weninger, Quinn, 2008. "Search and active learning with correlated information: Empirical evidence from mid-Atlantic clam fishermen," Journal of Economic Dynamics and Control, Elsevier, vol. 32(6), pages 1921-1948, June.
- Marcoul, Philippe & Weninger, Quinn, 2008. "Search and Active Learning with Correlated Information: Empirical Evidence from Mid-Atlantic Clam Fishermen," Staff General Research Papers Archive 11601, Iowa State University, Department of Economics.
- Marcoul, Philippe & Weninger, Quinn, 2008. "Search and active learning with correlated information: Empirical evidence from mid-Atlantic clam fishermen," ISU General Staff Papers 200806010700001485, Iowa State University, Department of Economics.
Maime Guan & Ryan Stokes & Joachim Vandekerckhove & Michael D. Lee, 2020. "A cognitive modeling analysis of risk in sequential choice tasks," Judgment and Decision Making, Society for Judgment and Decision Making, vol. 15(5), pages 823-850, September.

IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.

Browse Econ Literature

More features

Structure Learning in Human Sequential Decision-Making

Author

Abstract

Suggested Citation

Download full text from publisher

References listed on IDEAS

Citations

Most related items

More about this item

Statistics

Corrections

More services and features

MyIDEAS

Author registration

Rankings

RePEc Genealogy

RePEc Biblio

MPRA

New papers by email

EconAcademics

Plagiarism

About RePEc

RePEc home

Blog

Help/FAQ

RePEc team

Participating archives

Privacy statement

Help us

Corrections

Volunteers

Get papers listed

Open a RePEc archive

Get RePEc data