
When Does Model-Based Control Pay Off?

Authors
  • Wouter Kool
  • Fiery A Cushman
  • Samuel J Gershman

Abstract

Many accounts of decision making and reinforcement learning posit the existence of two distinct systems that control choice: a fast, automatic system and a slow, deliberative system. Recent research formalizes this distinction by mapping these systems to “model-free” and “model-based” strategies in reinforcement learning. Model-free strategies are computationally cheap, but sometimes inaccurate, because action values can be accessed by inspecting a look-up table constructed through trial-and-error. In contrast, model-based strategies compute action values through planning in a causal model of the environment, which is more accurate but also more cognitively demanding. It is assumed that this trade-off between accuracy and computational demand plays an important role in the arbitration between the two strategies, but we show that the hallmark task for dissociating model-free and model-based strategies, as well as several related variants, do not embody such a trade-off. We describe five factors that reduce the effectiveness of the model-based strategy on these tasks by reducing its accuracy in estimating reward outcomes and decreasing the importance of its choices. Based on these observations, we describe a version of the task that formally and empirically obtains an accuracy-demand trade-off between model-free and model-based strategies. Moreover, we show that human participants spontaneously increase their reliance on model-based control on this task, compared to the original paradigm. Our novel task and our computational analyses may prove important in subsequent empirical investigations of how humans balance accuracy and demand.

Author Summary

When you make a choice about what groceries to get for dinner, you can rely on two different strategies. You can make your choice by relying on habit, simply buying the items you need to make a meal that is second nature to you. However, you can also plan your actions in a more deliberative way, realizing that the friend who will join you is a vegetarian, and therefore you should not make the burgers that have become a staple in your cooking. These two strategies differ in how computationally demanding and accurate they are. While the habitual strategy is less computationally demanding (costs less effort and time), the deliberative strategy is more accurate. Scientists have been able to study the distinction between these strategies using a task that allows them to measure how much people rely on habit and planning strategies. Interestingly, we have discovered that in this task, the deliberative strategy does not increase performance accuracy, and hence does not induce a trade-off between accuracy and demand. We describe why this happens, and improve the task so that it embodies an accuracy-demand trade-off, providing evidence for theories of cost-based arbitration between cognitive strategies.
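
The distinction the abstract draws between a model-free look-up table and model-based planning can be sketched in a few lines of Python. This is only an illustrative toy, not the authors' task or computational model: the two-action, two-state layout, the transition matrix `TRANSITIONS`, the reward probabilities, the learning rates, and all function names are assumptions made up for this example.

```python
import random

# Hypothetical two-step layout (an assumption for illustration only):
# TRANSITIONS[a][s] is the probability that first-stage action a
# leads to second-stage state s.
TRANSITIONS = [[0.7, 0.3], [0.3, 0.7]]


def model_free_update(q_table, action, reward, learning_rate=0.5):
    """Model-free: move the cached value of the chosen action toward the reward."""
    q_table[action] += learning_rate * (reward - q_table[action])
    return q_table


def model_based_values(transitions, state_values):
    """Model-based: plan by taking the expected state value under the transition model."""
    return [sum(p * v for p, v in zip(row, state_values)) for row in transitions]


def simulate(n_trials=200, seed=0):
    """Run both learners on the same randomly explored trials."""
    rng = random.Random(seed)
    reward_prob = [0.8, 0.2]   # assumed reward probability of each second-stage state
    q_mf = [0.0, 0.0]          # look-up table over first-stage actions
    state_values = [0.0, 0.0]  # learned values of second-stage states
    for _ in range(n_trials):
        action = rng.randrange(2)  # pure random exploration, for simplicity
        state = 0 if rng.random() < TRANSITIONS[action][0] else 1
        reward = 1.0 if rng.random() < reward_prob[state] else 0.0
        model_free_update(q_mf, action, reward, learning_rate=0.1)
        state_values[state] += 0.1 * (reward - state_values[state])
    return q_mf, model_based_values(TRANSITIONS, state_values)


if __name__ == "__main__":
    q_mf, q_mb = simulate()
    print("model-free values: ", q_mf)
    print("model-based values:", q_mb)
```

The model-free learner only ever reads and writes its table, which is what makes it cheap. The model-based learner must combine the transition model with second-stage value estimates each time it evaluates an action, which is the extra computation the abstract refers to as demand.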

Suggested Citation

  • Wouter Kool & Fiery A Cushman & Samuel J Gershman, 2016. "When Does Model-Based Control Pay Off?," PLOS Computational Biology, Public Library of Science, vol. 12(8), pages 1-34, August.
  • Handle: RePEc:plo:pcbi00:1005090
    DOI: 10.1371/journal.pcbi.1005090

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005090
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1005090&type=printable
    Download Restriction: no




