
Reinforcement Learning on Slow Features of High-Dimensional Input Streams

Author

Listed:
  • Robert Legenstein
  • Niko Wilbert
  • Laurenz Wiskott

Abstract

Humans and animals are able to learn complex behaviors based on a massive stream of sensory information from different modalities. Early animal studies have identified learning mechanisms that are based on reward and punishment such that animals tend to avoid actions that lead to punishment whereas rewarded actions are reinforced. However, most algorithms for reward-based learning are only applicable if the dimensionality of the state-space is sufficiently small or its structure is sufficiently simple. Therefore, the question arises how the problem of learning on high-dimensional data is solved in the brain. In this article, we propose a biologically plausible generic two-stage learning system that can directly be applied to raw high-dimensional input streams. The system is composed of a hierarchical slow feature analysis (SFA) network for preprocessing and a simple neural network on top that is trained based on rewards. We demonstrate by computer simulations that this generic architecture is able to learn quite demanding reinforcement learning tasks on high-dimensional visual input streams in a time that is comparable to the time needed when an explicit highly informative low-dimensional state-space representation is given instead of the high-dimensional visual input. The learning speed of the proposed architecture in a task similar to the Morris water maze task is comparable to that found in experimental studies with rats. This study thus supports the hypothesis that slowness learning is one important unsupervised learning principle utilized in the brain to form efficient state representations for behavioral learning.

Author Summary

Humans and animals are able to learn complex behaviors based on a massive stream of sensory information from different modalities. Early animal studies have identified learning mechanisms that are based on reward and punishment such that animals tend to avoid actions that lead to punishment whereas rewarded actions are reinforced. It is an open question how sensory information is processed by the brain in order to learn and perform rewarding behaviors. In this article, we propose a learning system that combines the autonomous extraction of important information from the sensory input with reward-based learning. The extraction of salient information is learned by exploiting the temporal continuity of real-world stimuli. A subsequent neural circuit then learns rewarding behaviors based on this representation of the sensory input. We demonstrate in two control tasks that this system is capable of learning complex behaviors on raw visual input.
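
The abstract describes a two-stage architecture: a slow feature analysis (SFA) stage turns a raw high-dimensional stream into a low-dimensional, slowly varying representation, and a simple reward-trained stage then learns behavior on top of it. The sketch below is not the authors' code; it illustrates the idea in its simplest form, with plain linear SFA and a toy tabular reward update, and the input signal, reward rule, and all parameter values are illustrative assumptions.

    # Minimal two-stage sketch: linear SFA followed by a toy reward-based
    # learner on the extracted slow features. Illustrative only.
    import numpy as np

    def linear_sfa(X, n_components=2):
        """Linear slow feature analysis.
        X: (T, D) time series. Returns W (D, n_components) such that the
        outputs (X - mean) @ W vary as slowly as possible with unit variance."""
        Xc = X - X.mean(axis=0)                  # zero mean
        dX = np.diff(Xc, axis=0)                 # temporal derivative
        C = np.cov(Xc, rowvar=False)             # covariance of the signal
        Cdot = np.cov(dX, rowvar=False)          # covariance of the derivative
        # Generalized eigenvalue problem Cdot w = lambda C w;
        # the smallest eigenvalues correspond to the slowest features.
        eigvals, eigvecs = np.linalg.eig(np.linalg.solve(C, Cdot))
        order = np.argsort(eigvals.real)
        W = eigvecs[:, order[:n_components]].real
        # Scale each component so the extracted feature has unit variance.
        W /= np.sqrt(np.einsum('ij,ij->j', W, C @ W))
        return W

    # Toy high-dimensional stream: one slow latent cause embedded in noise.
    rng = np.random.default_rng(0)
    T, D = 2000, 50
    latent = np.sin(np.linspace(0, 8 * np.pi, T))          # slowly varying cause
    X = np.outer(latent, rng.normal(size=D)) + 0.3 * rng.normal(size=(T, D))

    W = linear_sfa(X, n_components=2)
    slow = (X - X.mean(axis=0)) @ W                        # low-dimensional state

    # Stage 2 (illustrative): tabular reward-based update on a discretized
    # slow feature. Reward is +1 when the action matches the feature's sign.
    n_bins, n_actions, alpha, eps = 10, 2, 0.1, 0.1
    bins = np.linspace(slow[:, 0].min(), slow[:, 0].max(), n_bins - 1)
    Q = np.zeros((n_bins, n_actions))
    for t in range(T):
        s = np.digitize(slow[t, 0], bins)
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        r = 1.0 if (a == 1) == (slow[t, 0] > 0) else 0.0
        Q[s, a] += alpha * (r - Q[s, a])                   # bandit-style update
    print("Learned action preference per state:", np.argmax(Q, axis=1))

In the paper, the first stage is a hierarchical, nonlinear SFA network applied to visual input and the second stage is a neural network trained with a reward-based rule; the linear SFA and tabular update above merely stand in for those components.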

Suggested Citation

  • Robert Legenstein & Niko Wilbert & Laurenz Wiskott, 2010. "Reinforcement Learning on Slow Features of High-Dimensional Input Streams," PLOS Computational Biology, Public Library of Science, vol. 6(8), pages 1-13, August.
  • Handle: RePEc:plo:pcbi00:1000894
    DOI: 10.1371/journal.pcbi.1000894

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000894
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1000894&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1000894?utm_source=ideas
    LibKey link: If access is restricted and your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item.

    References listed on IDEAS

    1. Chklovskii, Dmitri B & Koulakov, Alexei A, 2000. "A wire length minimization approach to ocular dominance patterns in mammalian visual cortex," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 284(1), pages 318-334.
    2. Attila Losonczy & Judit K. Makara & Jeffrey C. Magee, 2008. "Compartmentalized dendritic plasticity and input feature storage in neurons," Nature, Nature, vol. 452(7186), pages 436-441, March.
    3. Mathias Franzius & Henning Sprekeler & Laurenz Wiskott, 2007. "Slowness and Sparseness Lead to Place, Head-Direction, and Spatial-View Cells," PLOS Computational Biology, Public Library of Science, vol. 3(8), pages 1-18, August.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. Hanan Shteingart & Yonatan Loewenstein, 2014. "Reinforcement Learning and Human Behavior," Discussion Paper Series dp656, The Federmann Center for the Study of Rationality, the Hebrew University, Jerusalem.
    2. Sven Dähne & Niko Wilbert & Laurenz Wiskott, 2014. "Slow Feature Analysis on Retinal Waves Leads to V1 Complex Cells," PLOS Computational Biology, Public Library of Science, vol. 10(5), pages 1-13, May.
    3. Gianluigi Mongillo & Hanan Shteingart & Yonatan Loewenstein, 2014. "The Misbehavior of Reinforcement Learning," Discussion Paper Series dp661, The Federmann Center for the Study of Rationality, the Hebrew University, Jerusalem.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Matteo Farinella & Daniel T Ruedt & Padraig Gleeson & Frederic Lanore & R Angus Silver, 2014. "Glutamate-Bound NMDARs Arising from In Vivo-like Network Activity Extend Spatio-temporal Integration in a L5 Cortical Pyramidal Cell Model," PLOS Computational Biology, Public Library of Science, vol. 10(4), pages 1-21, April.
    2. Balázs Ujfalussy & Tamás Kiss & Péter Érdi, 2009. "Parallel Computational Subunits in Dentate Granule Cells Generate Multiple Place Fields," PLOS Computational Biology, Public Library of Science, vol. 5(9), pages 1-16, September.
    3. Dejan Pecevski & Lars Buesing & Wolfgang Maass, 2011. "Probabilistic Inference in General Graphical Models through Sampling in Stochastic Networks of Spiking Neurons," PLOS Computational Biology, Public Library of Science, vol. 7(12), pages 1-25, December.
    4. Ian Cone & Claudia Clopath, 2024. "Latent representations in hippocampal network model co-evolve with behavioral exploration of task structure," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    5. Hanle Zheng & Zhong Zheng & Rui Hu & Bo Xiao & Yujie Wu & Fangwen Yu & Xue Liu & Guoqi Li & Lei Deng, 2024. "Temporal dendritic heterogeneity incorporated with spiking neural networks for learning multi-timescale dynamics," Nature Communications, Nature, vol. 15(1), pages 1-20, December.
    6. Sven Dähne & Niko Wilbert & Laurenz Wiskott, 2014. "Slow Feature Analysis on Retinal Waves Leads to V1 Complex Cells," PLOS Computational Biology, Public Library of Science, vol. 10(5), pages 1-13, May.
    7. Linda Judák & Balázs Chiovini & Gábor Juhász & Dénes Pálfi & Zsolt Mezriczky & Zoltán Szadai & Gergely Katona & Benedek Szmola & Katalin Ócsai & Bernadett Martinecz & Anna Mihály & Ádám Dénes & Bálint, 2022. "Sharp-wave ripple doublets induce complex dendritic spikes in parvalbumin interneurons in vivo," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    8. Ruy Gómez-Ocádiz & Massimiliano Trippa & Chun-Lei Zhang & Lorenzo Posani & Simona Cocco & Rémi Monasson & Christoph Schmidt-Hieber, 2022. "A synaptic signal for novelty processing in the hippocampus," Nature Communications, Nature, vol. 13(1), pages 1-15, December.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pcbi00:1000894. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ploscompbiol (email available below). General contact details of provider: https://journals.plos.org/ploscompbiol/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.