
Novelty is not surprise: Human exploratory and adaptive behavior in sequential decision-making

Author

Listed:
  • He A Xu
  • Alireza Modirshanechi
  • Marco P Lehmann
  • Wulfram Gerstner
  • Michael H Herzog

Abstract

Classic reinforcement learning (RL) theories cannot explain human behavior in the absence of external reward or when the environment changes. Here, we employ a deep sequential decision-making paradigm with sparse reward and abrupt environmental changes. To explain the behavior of human participants in these environments, we show that RL theories need to include surprise and novelty, each with a distinct role. While novelty drives exploration before the first encounter of a reward, surprise increases the rate of learning of a world-model as well as of model-free action-values. Even though the world-model is available for model-based RL, we find that human decisions are dominated by model-free action choices. The world-model is only marginally used for planning, but it is important to detect surprising events. Our theory predicts human action choices with high probability and allows us to dissociate surprise, novelty, and reward in EEG signals.

Author summary: Humans like to explore their environment: children play with toys, tourists explore touristic sites, and readers start a new book. Exploration is useful to build knowledge about the world in the form of a ‘world-model’. However, since the world is complex and changing, the learned world-model is sometimes wrong: if so, the feeling of surprise arises. Here, we distinguish surprise from novelty; we show that humans use surprise as a signal to decide when to adapt their behavior, while they use novelty to decide where and what to explore, in order to eventually develop an improved world-model. Intuitively, it seems obvious to use world-models to plan future actions. However, we show that in a complex and changing environment where planning needs heavy computations, participants rarely follow an explicit plan and take their actions mainly by shaping habits. Importantly, we show that the main role of their world-model is to signal when to be surprised and, hence, when to adapt their habits.
In summary, our results show how surprise and novelty interact with human reinforcement learning, contribute to human adaptive and exploratory behavior, and correlate with EEG signals.
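
The distinction drawn in the abstract can be illustrated with a small sketch. It is not the authors' model: the count-based novelty measure, the add-one smoothing, the use of Shannon surprise, and the saturating learning-rate rule are all assumptions chosen for illustration, and every function and parameter name is hypothetical. It only shows that novelty can be computed from how rarely a state has been visited, while surprise is computed from how poorly a learned world-model predicted a transition and can then scale the learning rate.

```python
import math


def novelty(visit_counts, state, n_states):
    """Novelty of a state: negative log of its smoothed visit frequency.

    Assumed form for illustration; rarely visited states score higher.
    """
    total = sum(visit_counts.values())
    p = (visit_counts.get(state, 0) + 1) / (total + n_states)
    return -math.log(p)


def surprise(transition_counts, state, action, next_state, n_states):
    """Shannon surprise of a transition under a count-based world-model.

    Assumed form for illustration; transitions the model considers
    unlikely score higher.
    """
    row = transition_counts.get((state, action), {})
    total = sum(row.values())
    p = (row.get(next_state, 0) + 1) / (total + n_states)
    return -math.log(p)


def modulated_learning_rate(surprise_value, base_rate=0.1, max_rate=0.9, scale=1.0):
    """Learning rate that rises from base_rate toward max_rate with surprise.

    The saturating exponential is a hypothetical choice, not taken from the paper.
    """
    gain = 1.0 - math.exp(-scale * surprise_value)
    return base_rate + (max_rate - base_rate) * gain


if __name__ == "__main__":
    # Hypothetical three-state environment: state 0 visited often, state 1 never.
    visits = {0: 5}
    print("novelty of familiar state:", novelty(visits, 0, n_states=3))
    print("novelty of unseen state:  ", novelty(visits, 1, n_states=3))

    # World-model has seen (state 0, action "a") lead to state 1 ten times.
    model = {(0, "a"): {1: 10}}
    expected = surprise(model, 0, "a", 1, n_states=3)
    unexpected = surprise(model, 0, "a", 2, n_states=3)
    print("learning rate after expected transition:  ", modulated_learning_rate(expected))
    print("learning rate after unexpected transition:", modulated_learning_rate(unexpected))
```

In this sketch a state can be familiar (low novelty) and still produce a surprising transition after the environment changes, which is the sense in which the two signals can be dissociated.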

Suggested Citation

  • He A Xu & Alireza Modirshanechi & Marco P Lehmann & Wulfram Gerstner & Michael H Herzog, 2021. "Novelty is not surprise: Human exploratory and adaptive behavior in sequential decision-making," PLOS Computational Biology, Public Library of Science, vol. 17(6), pages 1-32, June.
  • Handle: RePEc:plo:pcbi00:1009070
    DOI: 10.1371/journal.pcbi.1009070

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009070
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1009070&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1009070?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Thomas Akam & Rui Costa & Peter Dayan, 2015. "Simple Plans or Sophisticated Habits? State, Transition and Learning Interactions in the Two-Step Task," PLOS Computational Biology, Public Library of Science, vol. 11(12), pages 1-25, December.
    2. Roland T. Rust & David C. Schmittlein, 1985. "A Bayesian Cross-Validated Likelihood Method for Comparing Alternative Specifications of Quantitative Models," Marketing Science, INFORMS, vol. 4(1), pages 20-40.
    3. Mathias Pessiglione & Ben Seymour & Guillaume Flandin & Raymond J. Dolan & Chris D. Frith, 2006. "Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans," Nature, Nature, vol. 442(7106), pages 1042-1045, August.
    4. Sam Gijsen & Miro Grundei & Robert T Lange & Dirk Ostwald & Felix Blankenburg, 2021. "Neural surprise in somatosensory Bayesian learning," PLOS Computational Biology, Public Library of Science, vol. 17(2), pages 1-36, February.
    5. Florent Meyniel & Maxime Maheu & Stanislas Dehaene, 2016. "Human Inferences about Sequences: A Minimal Transition Probability Model," PLOS Computational Biology, Public Library of Science, vol. 12(12), pages 1-26, December.
    6. Jean Daunizeau & Vincent Adam & Lionel Rigoux, 2014. "VBA: A Probabilistic Treatment of Nonlinear Models for Neurobiological and Behavioural Data," PLOS Computational Biology, Public Library of Science, vol. 10(1), pages 1-16, January.
    7. Samuel J Gershman & Angela Radulescu & Kenneth A Norman & Yael Niv, 2014. "Statistical Computations Underlying the Dynamics of Memory Updating," PLOS Computational Biology, Public Library of Science, vol. 10(11), pages 1-13, November.
    8. Wouter Kool & Fiery A Cushman & Samuel J Gershman, 2016. "When Does Model-Based Control Pay Off?," PLOS Computational Biology, Public Library of Science, vol. 12(8), pages 1-34, August.
    9. Vincent Moens & Alexandre Zénon, 2019. "Learning and forgetting using reinforced Bayesian change detection," PLOS Computational Biology, Public Library of Science, vol. 15(4), pages 1-41, April.
    10. Carolina Feher da Silva & Todd A. Hare, 2020. "Humans primarily use model-based inference in the two-stage task," Nature Human Behaviour, Nature, vol. 4(10), pages 1053-1066, October.
    11. E Fong & C C Holmes, 2020. "On the marginal likelihood and cross-validation," Biometrika, Biometrika Trust, vol. 107(2), pages 489-496.
    12. Micha Heilbron & Florent Meyniel, 2019. "Confidence resets reveal hierarchical adaptive learning in humans," PLOS Computational Biology, Public Library of Science, vol. 15(4), pages 1-24, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Antoine Collomb-Clerc & Maëlle C. M. Gueguen & Lorella Minotti & Philippe Kahane & Vincent Navarro & Fabrice Bartolomei & Romain Carron & Jean Regis & Stephan Chabardès & Stefano Palminteri & Julien B, 2023. "Human thalamic low-frequency oscillations correlate with expected value and outcomes during reinforcement learning," Nature Communications, Nature, vol. 14(1), pages 1-10, December.
    2. Florent Meyniel, 2020. "Brain dynamics for confidence-weighted learning," PLOS Computational Biology, Public Library of Science, vol. 16(6), pages 1-27, June.
    3. Flavia Mancini & Suyi Zhang & Ben Seymour, 2022. "Computational and neural mechanisms of statistical pain learning," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    4. Maël Lebreton & Karin Bacily & Stefano Palminteri & Jan B Engelmann, 2019. "Contextual influence on confidence judgments in human reinforcement learning," PLOS Computational Biology, Public Library of Science, vol. 15(4), pages 1-27, April.
    5. Sam Gijsen & Miro Grundei & Robert T Lange & Dirk Ostwald & Felix Blankenburg, 2021. "Neural surprise in somatosensory Bayesian learning," PLOS Computational Biology, Public Library of Science, vol. 17(2), pages 1-36, February.
    6. Stefano Palminteri & Emma J Kilford & Giorgio Coricelli & Sarah-Jayne Blakemore, 2016. "The Computational Development of Reinforcement Learning during Adolescence," PLOS Computational Biology, Public Library of Science, vol. 12(6), pages 1-25, June.
    7. Bruno Miranda & W M Nishantha Malalasekera & Timothy E Behrens & Peter Dayan & Steven W Kennerley, 2020. "Combined model-free and model-sensitive reinforcement learning in non-human primates," PLOS Computational Biology, Public Library of Science, vol. 16(6), pages 1-25, June.
    8. Chih-Chung Ting & Nahuel Salem-Garcia & Stefano Palminteri & Jan B. Engelmann & Maël Lebreton, 2023. "Neural and computational underpinnings of biased confidence in human reinforcement learning," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    9. Wouter Kool & Fiery A Cushman & Samuel J Gershman, 2016. "When Does Model-Based Control Pay Off?," PLOS Computational Biology, Public Library of Science, vol. 12(8), pages 1-34, August.
    10. Micha Heilbron & Florent Meyniel, 2019. "Confidence resets reveal hierarchical adaptive learning in humans," PLOS Computational Biology, Public Library of Science, vol. 15(4), pages 1-24, April.
    11. Emre Demirkaya & Yang Feng & Pallavi Basu & Jinchi Lv, 2022. "Large-scale model selection in misspecified generalized linear models [Information theory and an extension of the maximum likelihood principle]," Biometrika, Biometrika Trust, vol. 109(1), pages 123-136.
    12. Namwoon Kim & Jin K. Han & Rajendra K. Srivastava, 2002. "A Dynamic IT Adoption Model for the SOHO Market: PC Generational Decisions with Technological Expectations," Management Science, INFORMS, vol. 48(2), pages 222-240, February.
    13. repec:wyi:journl:002122 is not listed on IDEAS
    14. David Vaquero-Puyuelo & Concepción De-la-Cámara & Beatriz Olaya & Patricia Gracia-García & Antonio Lobo & Raúl López-Antón & Javier Santabárbara, 2021. "Anhedonia as a Potential Risk Factor of Alzheimer’s Disease in a Community-Dwelling Elderly Sample: Results from the ZARADEMP Project," IJERPH, MDPI, vol. 18(4), pages 1-12, February.
    15. Kim, Namwoon & Srivastava, Rajendra K., 2007. "Modeling cross-price effects on inter-category dynamics: The case of three computing platforms," Omega, Elsevier, vol. 35(3), pages 290-301, June.
    16. Abhik Roy & Jagmohan Raju, 2011. "The influence of demand factors on dynamic competitive pricing strategy: An empirical study," Marketing Letters, Springer, vol. 22(3), pages 259-281, September.
    17. Eric T. Bradlow & David C. Schmittlein, 2000. "The Little Engines That Could: Modeling the Performance of World Wide Web Search Engines," Marketing Science, INFORMS, vol. 19(1), pages 43-62, June.
    18. Zixuan Tang & Chen Qu & Yang Hu & Julien Benistant & Frederic Moisan & Edmund Derrington & Jean-Claude Dreher, 2023. "Strengths of social ties modulate brain computations for third-party punishment," Post-Print hal-04325737, HAL.
    19. Hanan Shteingart & Tal Neiman & Yonatan Loewenstein, 2012. "The Role of First Impression in Operant Learning," Discussion Paper Series dp626, The Federmann Center for the Study of Rationality, the Hebrew University, Jerusalem.
    20. Isabella Rischall & Laura Hunter & Greg Jensen & Jacqueline Gottlieb, 2023. "Inefficient prioritization of task-relevant attributes during instrumental information demand," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    21. Sophie Massin, 2011. "La notion d'addiction en économie : La théorie du choix rationnel à l'épreuve," Revue d'économie politique, Dalloz, vol. 121(5), pages 713-750.
