
Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation

Author

Listed:
  • Ayaka Kato
  • Kenji Morita

Abstract

It has been suggested that dopamine (DA) represents the reward-prediction-error (RPE) defined in reinforcement learning, and that DA therefore responds to unpredicted but not predicted reward. However, recent studies have found sustained DA responses towards predictable rewards in tasks involving self-paced behavior, and have suggested that this response represents a motivational signal. We have previously shown that RPE can be sustained if there is decay/forgetting of learned values, which can be implemented as decay of the synaptic strengths storing learned values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of the value-decay in self-paced approach behavior, modeled as a series of ‘Go’ or ‘No-Go’ selections towards a goal. Through simulations, we found that, counterintuitively, the value-decay can enhance motivation, specifically, facilitate fast goal-reaching. Mathematical analyses revealed that the underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of ‘Go’ values towards the goal, and (2) value-contrasts between ‘Go’ and ‘No-Go’ are generated because chosen values are continually updated while unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account for DA's roles in value-learning and motivation. Our results also suggest that when biological systems for value-learning remain active even though learning has apparently converged, the systems might be in a state of dynamic equilibrium in which learning and forgetting are balanced.

Author Summary: Dopamine (DA) has been suggested to have two reward-related roles: (1) representing reward-prediction-error (RPE), and (2) providing motivational drive. Role (1) is based on the physiological finding that DA responds to unpredicted but not predicted reward, whereas role (2) is supported by the pharmacological finding that blockade of DA signaling causes motivational impairments such as slowdown of self-paced behavior. So far, these two roles have been thought to be served by two different temporal patterns of DA signals: role (1) by phasic signals and role (2) by tonic/sustained signals. However, recent studies have found sustained DA signals with features indicative of both roles (1) and (2), complicating this picture. Meanwhile, whereas the synaptic/circuit mechanisms for role (1), i.e., how RPE is calculated upstream of DA neurons and how RPE-dependent updates of learned values occur through DA-dependent synaptic plasticity, have now become clarified, mechanisms for role (2) remain unclear. In this work, we modeled self-paced behavior as a series of ‘Go’ or ‘No-Go’ selections in the framework of reinforcement learning assuming DA's role (1), and demonstrated that incorporating decay/forgetting of learned values, presumably implemented as decay of the synaptic strengths storing those values, provides a potential unified mechanistic account for DA's two roles, together with its various temporal patterns.
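
To make the described mechanism concrete, the following minimal Python sketch implements reinforcement learning with decay of learned values on a self-paced ‘Go’/‘No-Go’ chain task, in the spirit of the model summarized above. It is not the authors' implementation: the Q-learning-style update target, the softmax action selection, the chain length, and all parameter values are illustrative assumptions.

import numpy as np

# Hypothetical illustration (not the authors' code): Q-learning with decay of
# learned values ("forgetting") on a self-paced task, modeled as repeated
# 'Go' (advance one state) or 'No-Go' (stay) choices along a chain of states,
# with reward delivered only at the goal state. All parameters are assumptions.

N_STATES = 7        # states 0..6; state 6 is the rewarded goal
ALPHA    = 0.5      # learning rate
GAMMA    = 0.97     # temporal discount factor
BETA     = 5.0      # softmax inverse temperature
DECAY    = 0.01     # per-step decay of all stored values (0 switches forgetting off)
REWARD   = 1.0
N_TRIALS = 500

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, 2))   # Q[state, action]; action 0 = 'No-Go', 1 = 'Go'

def softmax_choice(q_row):
    p = np.exp(BETA * q_row)
    p /= p.sum()
    return rng.choice(2, p=p)

latencies = []                # time steps needed to reach the goal on each trial
for trial in range(N_TRIALS):
    s, steps = 0, 0
    while s < N_STATES - 1:
        a = softmax_choice(Q[s])
        s_next = s + 1 if a == 1 else s
        r = REWARD if s_next == N_STATES - 1 else 0.0
        v_next = 0.0 if s_next == N_STATES - 1 else Q[s_next].max()
        rpe = r + GAMMA * v_next - Q[s, a]     # reward-prediction error (TD error)
        Q[s, a] += ALPHA * rpe                 # RPE-dependent update of the chosen value
        Q *= (1.0 - DECAY)                     # all stored values decay every time step
        s, steps = s_next, steps + 1
    latencies.append(steps)

print("mean steps to goal over the last 100 trials:", np.mean(latencies[-100:]))

With DECAY > 0, chosen ‘Go’ values keep being refreshed by sustained positive RPEs while unchosen ‘No-Go’ values simply decay, which is the gradient/contrast mechanism described in the abstract; comparing the printed latency with and without decay gives a rough, parameter-dependent analogue of the paper's central comparison.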

Suggested Citation

  • Ayaka Kato & Kenji Morita, 2016. "Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation," PLOS Computational Biology, Public Library of Science, vol. 12(10), pages 1-41, October.
  • Handle: RePEc:plo:pcbi00:1005145
    DOI: 10.1371/journal.pcbi.1005145

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005145
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1005145&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1005145?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Paul E. M. Phillips & Garret D. Stuber & Michael L. A. V. Heien & R. Mark Wightman & Regina M. Carelli, 2003. "Subsecond dopamine release promotes cocaine seeking," Nature, Nature, vol. 422(6932), pages 614-618, April.
    2. Mark W. Howe & Patrick L. Tierney & Stefan G. Sandberg & Paul E. M. Phillips & Ann M. Graybiel, 2013. "Prolonged dopamine signalling in striatum signals proximity and value of distant rewards," Nature, Nature, vol. 500(7464), pages 575-579, August.
    3. Neir Eshel & Michael Bukwich & Vinod Rao & Vivian Hemmelder & Ju Tian & Naoshige Uchida, 2015. "Erratum: Arithmetic and local circuitry underlying dopamine prediction errors," Nature, Nature, vol. 527(7578), pages 398-398, November.
    4. Neir Eshel & Michael Bukwich & Vinod Rao & Vivian Hemmelder & Ju Tian & Naoshige Uchida, 2015. "Arithmetic and local circuitry underlying dopamine prediction errors," Nature, Nature, vol. 525(7568), pages 243-246, September.
    5. John N. J. Reynolds & Brian I. Hyland & Jeffery R. Wickens, 2001. "A cellular mechanism of reward-related learning," Nature, Nature, vol. 413(6851), pages 67-70, September.
    6. Erev, Ido & Roth, Alvin E, 1998. "Predicting How People Play Games: Reinforcement Learning in Experimental Games with Unique, Mixed Strategy Equilibria," American Economic Review, American Economic Association, vol. 88(4), pages 848-881, September.
    7. Eric A. Yttri & Joshua T. Dudman, 2016. "Opponent and bidirectional control of movement velocity in the basal ganglia," Nature, Nature, vol. 533(7603), pages 402-406, May.
    8. M. W. Howe & D. A. Dombeck, 2016. "Rapid signalling in distinct dopaminergic axons during locomotion and reward," Nature, Nature, vol. 535(7613), pages 505-510, July.
    9. Nathaniel D. Daw & John P. O'Doherty & Peter Dayan & Ben Seymour & Raymond J. Dolan, 2006. "Cortical substrates for exploratory decisions in humans," Nature, Nature, vol. 441(7095), pages 876-879, June.
    10. Paul E. M. Phillips & Garret D. Stuber & Michael L. A. V. Heien & R. Mark Wightman & Regina M. Carelli, 2003. "Erratum: Subsecond dopamine release promotes cocaine seeking," Nature, Nature, vol. 423(6938), pages 461-461, May.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. Jaron T Colas & Wolfgang M Pauli & Tobias Larsen & J Michael Tyszka & John P O’Doherty, 2017. "Distinct prediction errors in mesostriatal circuits of the human brain mediate learning about the values of both states and actions: evidence from high-resolution fMRI," PLOS Computational Biology, Public Library of Science, vol. 13(10), pages 1-32, October.
    2. Kathleen Wiencke & Annette Horstmann & David Mathar & Arno Villringer & Jane Neumann, 2020. "Dopamine release, diffusion and uptake: A computational model for synaptic and volume transmission," PLOS Computational Biology, Public Library of Science, vol. 16(11), pages 1-26, November.
    3. Vincent Moens & Alexandre Zénon, 2019. "Learning and forgetting using reinforced Bayesian change detection," PLOS Computational Biology, Public Library of Science, vol. 15(4), pages 1-41, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Miguel Á. Luján & Dan P. Covey & Reana Young-Morrison & LanYuan Zhang & Andrew Kim & Fiorella Morgado & Sachin Patel & Caroline E. Bass & Carlos Paladini & Joseph F. Cheer, 2023. "Mobilization of endocannabinoids by midbrain dopamine neurons is required for the encoding of reward prediction," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    2. Hachi E. Manzur & Ksenia Vlasov & You-Jhe Jhong & Hung-Yen Chen & Shih-Chieh Lin, 2023. "The behavioral signature of stepwise learning strategy in male rats and its neural correlate in the basal forebrain," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    3. John N. J. Reynolds & Riccardo Avvisati & Paul D. Dodson & Simon D. Fisher & Manfred J. Oswald & Jeffery R. Wickens & Yan-Feng Zhang, 2022. "Coincidence of cholinergic pauses, dopaminergic activation and depolarisation of spiny projection neurons drives synaptic plasticity in the striatum," Nature Communications, Nature, vol. 13(1), pages 1-9, December.
    4. Maël Lebreton & Karin Bacily & Stefano Palminteri & Jan B Engelmann, 2019. "Contextual influence on confidence judgments in human reinforcement learning," PLOS Computational Biology, Public Library of Science, vol. 15(4), pages 1-27, April.
    5. Tal Neiman & Yonatan Loewenstein, 2011. "Reinforcement learning in professional basketball players," Discussion Paper Series dp593, The Federmann Center for the Study of Rationality, the Hebrew University, Jerusalem.
    6. Phanish Puranam & Murali Swamy, 2016. "How Initial Representations Shape Coupled Learning Processes," Organization Science, INFORMS, vol. 27(2), pages 323-335, April.
    7. Giovanni Leone & Charlotte Postel & Alison Mary & Florence Fraisse & Thomas Vallée & Fausto Viader & Vincent Sayette & Denis Peschanski & Jaques Dayan & Francis Eustache & Pierre Gagnepain, 2022. "Altered predictive control during memory suppression in PTSD," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    8. R Becket Ebitz & Brianna J Sleezer & Hank P Jedema & Charles W Bradberry & Benjamin Y Hayden, 2019. "Tonic exploration governs both flexibility and lapses," PLOS Computational Biology, Public Library of Science, vol. 15(11), pages 1-37, November.
    9. Yu-Hsuan Lin & Kuan-I Lin & Yuan-Chien Pan & Sheng-Hsuan Lin, 2020. "Investigation of the Role of Anxiety and Depression on the Formation of Phantom Vibration and Ringing Syndrome Caused by Working Stress during Medical Internship," IJERPH, MDPI, vol. 17(20), pages 1-10, October.
    10. Allen P. F. Chen & Lu Chen & Kaiyo W. Shi & Eileen Cheng & Shaoyu Ge & Qiaojie Xiong, 2023. "Nigrostriatal dopamine modulates the striatal-amygdala pathway in auditory fear conditioning," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    11. Lior Matityahu & Naomi Gilin & Gideon A. Sarpong & Yara Atamna & Lior Tiroshi & Nicolas X. Tritsch & Jeffery R. Wickens & Joshua A. Goldberg, 2023. "Acetylcholine waves and dopamine release in the striatum," Nature Communications, Nature, vol. 14(1), pages 1-23, December.
    12. Daniel E Acuña & Paul Schrater, 2010. "Structure Learning in Human Sequential Decision-Making," PLOS Computational Biology, Public Library of Science, vol. 6(12), pages 1-12, December.
    13. Alina Ferecatu & Arnaud De Bruyn, 2022. "Understanding Managers’ Trade-Offs Between Exploration and Exploitation," Marketing Science, INFORMS, vol. 41(1), pages 139-165, January.
    14. Allen P. F. Chen & Jeffrey M. Malgady & Lu Chen & Kaiyo W. Shi & Eileen Cheng & Joshua L. Plotkin & Shaoyu Ge & Qiaojie Xiong, 2022. "Nigrostriatal dopamine pathway regulates auditory discrimination behavior," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    15. Noah Gans & George Knox & Rachel Croson, 2007. "Simple Models of Discrete Choice and Their Performance in Bandit Experiments," Manufacturing & Service Operations Management, INFORMS, vol. 9(4), pages 383-408, December.
    16. Terry E. Daniel & Eyran J. Gisches & Amnon Rapoport, 2009. "Departure Times in Y-Shaped Traffic Networks with Multiple Bottlenecks," American Economic Review, American Economic Association, vol. 99(5), pages 2149-2176, December.
    17. Iftekhar, M. S. & Tisdell, J. G., 2018. "Learning in repeated multiple unit combinatorial auctions: An experimental study," Working Papers 267301, University of Western Australia, School of Agricultural and Resource Economics.
    18. Ianni, A., 2002. "Reinforcement learning and the power law of practice: some analytical results," Discussion Paper Series In Economics And Econometrics 203, Economics Division, School of Social Sciences, University of Southampton.
    19. Yongping Bao & Ludwig Danwitz & Fabian Dvorak & Sebastian Fehrler & Lars Hornuf & Hsuan Yu Lin & Bettina von Helversen, 2022. "Similarity and Consistency in Algorithm-Guided Exploration," CESifo Working Paper Series 10188, CESifo.
    20. Benaïm, Michel & Hofbauer, Josef & Hopkins, Ed, 2009. "Learning in games with unstable equilibria," Journal of Economic Theory, Elsevier, vol. 144(4), pages 1694-1709, July.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pcbi00:1005145. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do so here. This allows you to link your profile to this item, and to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ploscompbiol (email available below). General contact details of provider: https://journals.plos.org/ploscompbiol/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.