Printed from https://ideas.repec.org/a/plo/pcbi00/1008317.html

Reward-predictive representations generalize across tasks in reinforcement learning

Author

Listed:
  • Lucas Lehnert
  • Michael L Littman
  • Michael J Frank

Abstract

In computer science, reinforcement learning is a powerful framework with which artificial agents can learn to maximize their performance for any given Markov decision process (MDP). Advances over the last decade, combined with deep neural networks, have allowed such agents to surpass human performance in many difficult task settings. However, these frameworks perform far less favorably when evaluated on their ability to generalize or transfer representations across different tasks. Existing algorithms that facilitate transfer are typically limited to cases in which the transition function or the optimal policy is portable to new contexts, but the "deep transfer" characteristic of human behavior has remained elusive. Such transfer typically requires the discovery of abstractions that permit analogical reuse of previously learned representations in superficially distinct tasks. Here, we demonstrate that abstractions that minimize error in predictions of reward outcomes generalize across tasks with different transition and reward functions. Such reward-predictive representations compress the state space of a task into a lower-dimensional representation by combining states that are equivalent in terms of both the transition and reward functions. Because only state equivalences are considered, the resulting state representation is not tied to the transition and reward functions themselves and thus generalizes across tasks with different reward and transition functions. These results contrast with those obtained using abstractions that myopically maximize reward in any given MDP, and they motivate further experiments in humans and animals to investigate whether the neural and cognitive systems involved in state representation perform abstractions that facilitate such equivalence relations.

Author summary: Humans are capable of transferring abstract knowledge from one task to another. For example, in a right-hand-drive country, a driver has to use the right arm to operate the shifter. A driver who learned to drive in a right-hand-drive country can adapt to operating a left-hand-drive car and use the other arm for shifting instead of re-learning how to drive. Although the two tasks require different coordination of motor skills, they are the same in an abstract sense: in both, a car is operated and there is the same progression from 1st to 2nd gear and so on. We study distinct algorithms by which a reinforcement-learning agent can discover state representations that encode knowledge about a particular task, and we evaluate how well they generalize. Through a sequence of simulation results, we show that state abstractions that minimize errors in predictions of future reward outcomes generalize across tasks, even those that superficially differ in both the goals (rewards) and the transitions from one state to the next. This work motivates biological studies to determine whether distinct circuits are adapted to maximize reward vs. to discover useful state representations.
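The core idea of combining states that are equivalent in terms of both the transition and reward functions can be illustrated concretely. The sketch below is not the authors' code; it is a minimal toy example, with a hypothetical four-state deterministic MDP (`T`, `R`) and a hand-picked abstraction `phi`, that checks whether an abstraction is reward-predictive in the bisimulation-style sense described in the abstract: states mapped to the same abstract state must agree on one-step reward and on the abstract identity of their successors.

```python
# Toy check (illustrative only): is a candidate state abstraction
# reward-predictive, i.e. do states sharing an abstract state agree
# on one-step reward and on the abstract label of their next state?
import itertools

# Hypothetical 4-state, single-action MDP: states 0/1 duplicate 2/3.
# T[s] = deterministic next state, R[s] = one-step reward.
T = {0: 1, 1: 0, 2: 3, 3: 2}
R = {0: 0.0, 1: 1.0, 2: 0.0, 3: 1.0}

# Candidate abstraction merging the behaviourally equivalent states.
phi = {0: "A", 1: "B", 2: "A", 3: "B"}

def is_reward_predictive(T, R, phi):
    """Return True if every pair of states with the same abstract
    state agrees on reward and on the abstract successor state."""
    for s1, s2 in itertools.combinations(T, 2):
        if phi[s1] == phi[s2]:
            if R[s1] != R[s2] or phi[T[s1]] != phi[T[s2]]:
                return False
    return True

print(is_reward_predictive(T, R, phi))                   # True
print(is_reward_predictive(T, R, {s: "X" for s in T}))   # False
```

Because the valid abstraction records only which states are equivalent, not the particular rewards or transitions, the same partition can be reused in a new task whose rewards and transitions differ but whose equivalence structure is preserved; this is the sense in which reward-predictive representations transfer.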

Suggested Citation

  • Lucas Lehnert & Michael L Littman & Michael J Frank, 2020. "Reward-predictive representations generalize across tasks in reinforcement learning," PLOS Computational Biology, Public Library of Science, vol. 16(10), pages 1-27, October.
  • Handle: RePEc:plo:pcbi00:1008317
    DOI: 10.1371/journal.pcbi.1008317

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008317
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1008317&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1008317?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Amirhosein Mosavi & Yaser Faghan & Pedram Ghamisi & Puhong Duan & Sina Faizollahzadeh Ardabili & Ely Salwana & Shahab S. Band, 2020. "Comprehensive Review of Deep Reinforcement Learning Methods and Applications in Economics," Mathematics, MDPI, vol. 8(10), pages 1-42, September.
    2. Jaron T Colas & Wolfgang M Pauli & Tobias Larsen & J Michael Tyszka & John P O’Doherty, 2017. "Distinct prediction errors in mesostriatal circuits of the human brain mediate learning about the values of both states and actions: evidence from high-resolution fMRI," PLOS Computational Biology, Public Library of Science, vol. 13(10), pages 1-32, October.
    3. Momchil S Tomov & Samyukta Yagati & Agni Kumar & Wanqian Yang & Samuel J Gershman, 2020. "Discovery of hierarchical representations for efficient planning," PLOS Computational Biology, Public Library of Science, vol. 16(4), pages 1-42, April.
    4. Liu, Hui & Yu, Chengqing & Wu, Haiping & Duan, Zhu & Yan, Guangxi, 2020. "A new hybrid ensemble deep reinforcement learning model for wind speed short term forecasting," Energy, Elsevier, vol. 202(C).
    5. Ruohan Zhang & Shun Zhang & Matthew H Tong & Yuchen Cui & Constantin A Rothkopf & Dana H Ballard & Mary M Hayhoe, 2018. "Modeling sensory-motor decisions in natural behavior," PLOS Computational Biology, Public Library of Science, vol. 14(10), pages 1-22, October.
    6. Vincenzo Varriale & Antonello Cammarano & Francesca Michelino & Mauro Caputo, 2021. "Sustainable Supply Chains with Blockchain, IoT and RFID: A Simulation on Order Management," Sustainability, MDPI, vol. 13(11), pages 1-23, June.
    7. Valeria Costantini & Francesco Crespi & Giovanni Marin & Elena Paglialunga, 2016. "Eco-innovation, sustainable supply chains and environmental performance in European industries," LEM Papers Series 2016/19, Laboratory of Economics and Management (LEM), Sant'Anna School of Advanced Studies, Pisa, Italy.
    8. Lee, Alice J. & Ames, Daniel R., 2017. "“I can’t pay more” versus “It’s not worth more”: Divergent effects of constraint and disparagement rationales in negotiations," Organizational Behavior and Human Decision Processes, Elsevier, vol. 141(C), pages 16-28.
    9. Hussain, Hadia & Murtaza, Murtaza & Ajmal, Areeb & Ahmed, Afreen & Khan, Muhammad Ovais Khalid, 2020. "A study on the effects of social media advertisement on consumer’s attitude and customer response," MPRA Paper 104675, University Library of Munich, Germany.
    10. A. G. Fatullayev & Nizami A. Gasilov & Şahin Emrah Amrahov, 2019. "Numerical solution of linear inhomogeneous fuzzy delay differential equations," Fuzzy Optimization and Decision Making, Springer, vol. 18(3), pages 315-326, September.
    11. Cyril Chalendard, 2015. "Use of internal information, external information acquisition and customs underreporting," Working Papers halshs-01179445, HAL.
    12. Arun Advani & William Elming & Jonathan Shaw, 2023. "The Dynamic Effects of Tax Audits," The Review of Economics and Statistics, MIT Press, vol. 105(3), pages 545-561, May.
    13. Philippe Aghion & Ufuk Akcigit & Matthieu Lequien & Stefanie Stantcheva, 2017. "Tax Simplicity and Heterogeneous Learning," NBER Working Papers 24049, National Bureau of Economic Research, Inc.
    14. Tulika Saha & Sriparna Saha & Pushpak Bhattacharyya, 2020. "Towards sentiment aided dialogue policy learning for multi-intent conversations using hierarchical reinforcement learning," PLOS ONE, Public Library of Science, vol. 15(7), pages 1-28, July.
    15. Marie Bjørneby & Annette Alstadsæter & Kjetil Telle, 2018. "Collusive tax evasion by employers and employees. Evidence from a randomized fi eld experiment in Norway," Discussion Papers 891, Statistics Norway, Research Department.
    16. Chuangen Gao & Shuyang Gu & Jiguo Yu & Hai Du & Weili Wu, 2022. "Adaptive seeding for profit maximization in social networks," Journal of Global Optimization, Springer, vol. 82(2), pages 413-432, February.
    17. Koessler, Frederic & Laclau, Marie & Renault, Jérôme & Tomala, Tristan, 2022. "Long information design," Theoretical Economics, Econometric Society, vol. 17(2), May.
    18. Jamal El-Den & Pratap Adikhari & Pratap Adikhari, 2017. "Social media in the service of social entrepreneurship: Identifying factors for better services," Journal of Advances in Humanities and Social Sciences, Dr. Yi-Hsing Hsieh, vol. 3(2), pages 105-114.
    19. Annette Alstadsæter & Wojciech Kopczuk & Kjetil Telle, 2019. "Social networks and tax avoidance: evidence from a well-defined Norwegian tax shelter," International Tax and Public Finance, Springer;International Institute of Public Finance, vol. 26(6), pages 1291-1328, December.
    20. Xiongnan Jin & Sejin Chun & Jooik Jung & Kyong-Ho Lee, 0. "A fast and scalable approach for IoT service selection based on a physical service model," Information Systems Frontiers, Springer, vol. 0, pages 1-16.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pcbi00:1008317. See general information about how to correct material in RePEc.

If you have authored this item and are not yet registered with RePEc, we encourage you to register here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ploscompbiol (email available below). General contact details of provider: https://journals.plos.org/ploscompbiol/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.