Author
Listed:
- Moritz Möller
- Sanjay Manohar
- Rafal Bogacz
Abstract
To accurately predict rewards associated with states or actions, the variability of observations has to be taken into account. In particular, when observations are noisy, individual rewards should have less influence on the tracking of the average reward, and the estimate of the mean reward should be updated to a smaller extent after each observation. However, it is not known how the magnitude of observation noise might be tracked and used to control prediction updates in the brain's reward system. Here, we introduce a new model that uses simple, tractable learning rules to track the mean and standard deviation of reward, and leverages prediction errors scaled by uncertainty as the central feedback signal. We show that the new model outperforms conventional reinforcement learning models in a value-tracking task and approaches the theoretical limit of performance provided by the Kalman filter. Further, we propose a possible biological implementation of the model in the basal ganglia circuit. In the proposed network, dopaminergic neurons encode reward prediction errors scaled by the standard deviation of rewards. We show that such scaling may arise if striatal neurons learn the standard deviation of rewards and modulate the activity of dopaminergic neurons. The model is consistent with experimental findings on the scaling of dopamine prediction errors relative to reward magnitude, and with many features of striatal plasticity. Our results span the levels of implementation, algorithm, and computation, and may have important implications for understanding the dopaminergic prediction error signal and its relation to adaptive and effective learning.
Author summary
The basal ganglia are a collection of subcortical nuclei in the mammalian brain. This system and its dopaminergic inputs are associated with learning from rewards; here, dopamine is thought to signal errors in reward prediction. The structure and function of the basal ganglia are not yet fully understood. For example, the basal ganglia are split into two antagonistic pathways, but the reason for this split and the roles of the two pathways are unknown. Further, it has been found that under some circumstances rewards of different sizes lead to dopamine responses of similar size, which cannot be explained by the reward prediction error theory. Here, we propose a new model of learning in the basal ganglia: the scaled prediction error model. According to our model, both the reward average and the reward uncertainty are tracked and represented in the two basal ganglia pathways. The learned reward uncertainty is then used to scale dopaminergic reward prediction errors, which effectively renders learning adaptive to reward noise. We show that such learning is more robust than learning from unscaled prediction errors and that it explains several physiological features of the basal ganglia system.
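To make the algorithmic idea in the abstract concrete, below is a minimal Python sketch of uncertainty-scaled delta-rule learning compared against a Kalman-filter baseline on a simple value-tracking task. This is not the paper's exact model: the spread-tracking rule, the learning rates (alpha_v, alpha_s), and the task parameters are illustrative assumptions introduced here for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative value-tracking task: a slowly drifting mean reward is
# observed through Gaussian noise (all parameters are assumptions).
T, drift_sd, noise_sd = 500, 0.05, 2.0
true_mean = np.cumsum(rng.normal(0.0, drift_sd, T))
rewards = true_mean + rng.normal(0.0, noise_sd, T)

def scaled_pe(rewards, alpha_v=0.2, alpha_s=0.1, s_init=1.0):
    """Delta rules tracking the reward mean and spread: the value update
    is driven by a prediction error divided by the spread estimate, so a
    noisier reward stream yields a smaller effective learning rate."""
    v, s, trace = 0.0, s_init, []
    for r in rewards:
        delta = (r - v) / s              # uncertainty-scaled prediction error
        v += alpha_v * delta
        s += alpha_s * (abs(r - v) - s)  # track the spread of the residuals
        trace.append(v)
    return np.array(trace)

def kalman(rewards, q=0.05**2, r_var=2.0**2):
    """Kalman filter for the same random-walk model; its gain gives the
    statistically optimal update that the scaled rule approximates."""
    v, p, trace = 0.0, 1.0, []
    for r in rewards:
        p += q                           # predict step: drift adds variance
        k = p / (p + r_var)              # Kalman gain
        v += k * (r - v)                 # correct toward the observation
        p *= 1.0 - k
        trace.append(v)
    return np.array(trace)

for name, est in (("scaled PE", scaled_pe(rewards)), ("Kalman", kalman(rewards))):
    print(f"{name:9s} RMSE = {np.sqrt(np.mean((est - true_mean) ** 2)):.3f}")
```

Dividing the prediction error by the spread estimate is what makes learning adaptive to reward noise: when the residuals grow, s grows with them and shrinks the effective step size, much as a Kalman gain shrinks when observation variance is high relative to state uncertainty.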
Suggested Citation
Moritz Möller & Sanjay Manohar & Rafal Bogacz, 2022.
"Uncertainty–guided learning with scaled prediction errors in the basal ganglia,"
PLOS Computational Biology, Public Library of Science, vol. 18(5), pages 1-24, May.
Handle:
RePEc:plo:pcbi00:1009816
DOI: 10.1371/journal.pcbi.1009816
Most related items
These are the items that most often cite the same works as this one and are cited by the same works as this one.
- Minsu Abel Yang & Min Whan Jung & Sang Wan Lee, 2025.
"Striatal arbitration between choice strategies guides few-shot adaptation,"
Nature Communications, Nature, vol. 16(1), pages 1-26, December.
- Payam Piray & Nathaniel D. Daw, 2024.
"Computational processes of simultaneous learning of stochasticity and volatility in humans,"
Nature Communications, Nature, vol. 15(1), pages 1-16, December.
- Payam Piray & Nathaniel D. Daw, 2021.
"A model for learning based on the joint estimation of stochasticity and volatility,"
Nature Communications, Nature, vol. 12(1), pages 1-16, December.