Author
Listed:
- Ari E Kahn
- Dani S Bassett
- Nathaniel D Daw
Abstract
Decisions in humans and other organisms depend, in part, on learning and using models that capture the statistical structure of the world, including the long-run expected outcomes of our actions. One prominent approach to forecasting such long-run outcomes is the successor representation (SR), which predicts future states aggregated over multiple timesteps. Although much behavioral and neural evidence suggests that people and animals use such a representation, it remains unknown how they acquire it. It has frequently been assumed to be learned by temporal difference bootstrapping (SR-TD(0)), but this assumption has largely not been empirically tested or compared to alternatives including eligibility traces (SR-TD(λ>0)). Here we address this gap by leveraging trial-by-trial reaction times in graph sequence learning tasks, which are favorable for studying learning dynamics because the long horizons in these studies differentiate the transient update dynamics of different learning rules. We examined the behavior of SR-TD(λ) on a probabilistic graph learning task alongside a number of alternatives, and found that behavior was best explained by a hybrid model which learned via SR-TD(λ) alongside an additional predictive model of recency. The relatively large λ we estimate indicates a predominant role of eligibility trace mechanisms over the bootstrap-based chaining typically assumed. Our results provide insight into how humans learn predictive representations, and demonstrate that people simultaneously learn the SR alongside lower-order predictions.
Author summary
Our ability to plan intelligently requires predicting the state of the world multiple steps into the future. Enumerating future outcomes step-by-step, however, is slow and costly. Instead, research has shown that people rely on simplified models of the world that skip across multiple steps at once. How do we construct these simplified models?
One promising idea is the successor representation (SR), which predicts future events via a simple and neurally plausible computation. The SR has been shown to explain a range of behavioral phenomena, but these studies have not identified which of many candidate learning rules the brain uses to build the SR. Two plausible mechanisms for learning associations over delays, bootstrapping and eligibility traces, converge to identical simplified world models, so existing studies of the SR, which focus on well-trained behavior, cannot distinguish between them. Here, we answer this question by examining behavior on a graph learning task, where stimulus-by-stimulus reaction times have been shown to reflect predictions over long temporal horizons. Through both model fitting and model-agnostic comparisons, we find that behavior is best explained by a learning rule heavily dependent on eligibility traces, in contrast to previous work, which generally assumed an (untested) bootstrapping update rule.
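The difference between bootstrapping and eligibility traces can be made concrete with a minimal sketch of a textbook SR-TD(λ) update. This is an illustration under stated assumptions, not the authors' fitted model: the function name `sr_td_lambda`, the parameter values, the use of accumulating traces, and the convention that the SR counts the successor state (rather than the current one) are all choices made here for the example.

```python
def sr_td_lambda(states, n_states, alpha=0.5, gamma=0.9, lam=0.0):
    """Learn a successor matrix M from one state sequence.

    Hypothetical helper for illustration. M[i][j] estimates the
    discounted future occupancy of state j after visiting state i.
    lam=0 is pure bootstrapping (SR-TD(0)); lam>0 adds eligibility traces.
    """
    M = [[0.0] * n_states for _ in range(n_states)]
    trace = [0.0] * n_states
    for s, s_next in zip(states, states[1:]):
        # Decay every trace, then mark the state just left (accumulating trace).
        trace = [gamma * lam * e for e in trace]
        trace[s] += 1.0
        # Vector-valued TD error: one entry per possible successor j.
        delta = [
            (1.0 if j == s_next else 0.0) + gamma * M[s_next][j] - M[s][j]
            for j in range(n_states)
        ]
        # Every state with a nonzero trace shares in the update.
        for i in range(n_states):
            if trace[i] != 0.0:
                for j in range(n_states):
                    M[i][j] += alpha * trace[i] * delta[j]
    return M


# One pass through 0 -> 1 -> 2 under each rule.
M_boot = sr_td_lambda([0, 1, 2], n_states=3, lam=0.0)
M_trace = sr_td_lambda([0, 1, 2], n_states=3, lam=1.0)
```

After a single pass, `M_boot[0][2]` is still zero, because bootstrapping can only pass the 1 → 2 association back to state 0 on a later visit. `M_trace[0][2]` is already positive, because the decayed trace for state 0 is still active when state 2 arrives. Both rules converge to the same matrix with enough experience, which is why early, trial-by-trial behavior is needed to tell them apart.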
Suggested Citation
Ari E Kahn & Dani S Bassett & Nathaniel D Daw, 2025.
"Trial-by-trial learning of successor representations in human behavior,"
PLOS Computational Biology, Public Library of Science, vol. 21(11), pages 1-18, November.
Handle:
RePEc:plo:pcbi00:1013696
DOI: 10.1371/journal.pcbi.1013696