
Enhancing reinforcement learning models by including direct and indirect pathways improves performance on striatal dependent tasks

Authors

Listed:
  • Kim T Blackwell
  • Kenji Doya

Abstract

A major advance in understanding learning behavior stems from experiments showing that reward learning requires dopamine inputs to striatal neurons and arises from synaptic plasticity of cortico-striatal synapses. Numerous reinforcement learning models mimic this dopamine-dependent synaptic plasticity by using the reward prediction error, which resembles dopamine neuron firing, to learn the best action in response to a set of cues. Though these models can explain many facets of behavior, reproducing some types of goal-directed behavior, such as renewal and reversal, requires additional model components. Here we present a reinforcement learning model, TD2Q, which better corresponds to the basal ganglia by using two Q matrices, one representing direct-pathway neurons (G) and the other representing indirect-pathway neurons (N). Unlike previous two-Q architectures, TD2Q updates both the G and N matrices using the temporal difference reward prediction error, a novel and critical feature of the model. A best action is selected for N and for G using a softmax with a reward-dependent adaptive exploration parameter, and differences between the two are then resolved by a second selection step applied to the two action probabilities. The model is tested on a range of multi-step tasks, including extinction, renewal, and discrimination; switching reward probability learning; and sequence learning. Simulations show that TD2Q produces behaviors similar to those of rodents in choice and sequence learning tasks, and that use of the temporal difference reward prediction error is required to learn multi-step tasks. Blocking the update rule on the N matrix blocks discrimination learning, as observed experimentally. Performance in the sequence learning task is dramatically improved with two matrices. These results suggest that including additional aspects of basal ganglia physiology can improve the performance of reinforcement learning models, better reproduce animal behaviors, and provide insight into the roles of direct- and indirect-pathway striatal neurons.

Author summary: Humans and animals are exceedingly adept at learning to perform complicated tasks when the only feedback is reward for correct actions. Early phases of learning are characterized by exploration of possible actions, and later phases by optimization of the action sequence. Experimental evidence suggests that reward is encoded by the dopamine signal, and that dopamine can also influence the degree of exploration. Reinforcement learning algorithms are machine learning algorithms that use the reward signal to determine the value of taking an action. These algorithms have some similarity to information processing by the basal ganglia and can explain several types of learning behavior. We extend one of these algorithms, Q learning, to increase its similarity to basal ganglia circuitry and evaluate its performance on several learning tasks. We show that by incorporating two opposing basal ganglia pathways, we can improve performance on operant conditioning tasks and a difficult sequence learning task. These results suggest that incorporating additional aspects of brain circuitry could further improve the performance of reinforcement learning algorithms.
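The abstract describes the core mechanics of TD2Q: two Q matrices (G for the direct pathway, N for the indirect pathway) updated from a shared temporal-difference reward prediction error, softmax action selection for each matrix with a reward-dependent exploration parameter, and a second selection step that reconciles the two action-probability vectors. The following is a minimal sketch of that scheme based only on the abstract; the opponent sign on the N update, the multiplicative combination of the two softmax distributions, the form of the adaptive exploration parameter, and all parameter values are illustrative assumptions, not the published TD2Q equations.

```python
import numpy as np

def softmax(values, beta):
    """Softmax over a vector of action values with inverse temperature beta."""
    z = beta * (values - np.max(values))
    expz = np.exp(z)
    return expz / expz.sum()

class TD2QSketch:
    """Toy two-Q (direct/indirect pathway) temporal-difference learner."""

    def __init__(self, n_states, n_actions, alpha=0.2, gamma=0.9, beta0=2.0, rng=None):
        self.G = np.zeros((n_states, n_actions))  # direct-pathway ("Go") action values
        self.N = np.zeros((n_states, n_actions))  # indirect-pathway ("NoGo") action values
        self.alpha = alpha        # learning rate (illustrative)
        self.gamma = gamma        # temporal discount factor (illustrative)
        self.beta0 = beta0        # baseline inverse temperature (illustrative)
        self.beta = beta0
        self.avg_reward = 0.0
        self.rng = rng or np.random.default_rng()

    def choose_action(self, state):
        # First selection step: a softmax applied to each matrix separately.
        p_g = softmax(self.G[state], self.beta)
        p_n = softmax(-self.N[state], self.beta)  # assumption: high N suppresses an action
        # Second selection step: resolve differences by combining the two
        # action-probability vectors and renormalizing (assumed scheme).
        p = p_g * p_n
        p /= p.sum()
        return self.rng.choice(len(p), p=p)

    def update(self, state, action, reward, next_state, done=False):
        # Net action value used for the TD target is assumed to be G - N.
        v_next = 0.0 if done else np.max(self.G[next_state] - self.N[next_state])
        v_now = self.G[state, action] - self.N[state, action]
        rpe = reward + self.gamma * v_next - v_now   # dopamine-like TD error
        self.G[state, action] += self.alpha * rpe    # direct pathway strengthened by positive RPE
        self.N[state, action] -= self.alpha * rpe    # indirect pathway: opponent update (assumption)
        # Reward-dependent adaptive exploration (assumed form): exploit more
        # as the running-average reward grows.
        self.avg_reward += 0.05 * (reward - self.avg_reward)
        self.beta = self.beta0 * (1.0 + max(self.avg_reward, 0.0))
        return rpe
```

On a simple bandit- or grid-style task, calling choose_action and then update on each step is enough to see the G and N matrices diverge for rewarded versus unrewarded actions; the published model additionally specifies how states and multi-step tasks such as sequence learning are represented, which this sketch does not attempt to reproduce.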

Suggested Citation

  • Kim T Blackwell & Kenji Doya, 2023. "Enhancing reinforcement learning models by including direct and indirect pathways improves performance on striatal dependent tasks," PLOS Computational Biology, Public Library of Science, vol. 19(8), pages 1-31, August.
  • Handle: RePEc:plo:pcbi00:1011385
    DOI: 10.1371/journal.pcbi.1011385

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011385
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1011385&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jaron T Colas & Wolfgang M Pauli & Tobias Larsen & J Michael Tyszka & John P O’Doherty, 2017. "Distinct prediction errors in mesostriatal circuits of the human brain mediate learning about the values of both states and actions: evidence from high-resolution fMRI," PLOS Computational Biology, Public Library of Science, vol. 13(10), pages 1-32, October.
    2. Lucas Lehnert & Michael L Littman & Michael J Frank, 2020. "Reward-predictive representations generalize across tasks in reinforcement learning," PLOS Computational Biology, Public Library of Science, vol. 16(10), pages 1-27, October.
    3. Jonathan Nicholas & Nathaniel D. Daw & Daphna Shohamy, 2025. "Proactive and reactive construction of memory-based preferences," Nature Communications, Nature, vol. 16(1), pages 1-13, December.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pcbi00:1011385. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do so here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ploscompbiol (email available below). General contact details of provider: https://journals.plos.org/ploscompbiol/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.