Continual Reinforcement Learning with Multi-Timescale Successor Features
Raymond Chua, Blake Richards, Doina Precup, McGill University, Canada; Christos Kaplanis, DeepMind, United Kingdom
Poster Session 2
Pacific Ballroom H-O
Fri, 26 Aug, 19:30 - 21:30 Pacific Time (UTC -7)
Learning and memory consolidation in the brain occur over multiple timescales. Inspired by this observation, it has been shown that catastrophic forgetting in reinforcement learning (RL) agents can be mitigated by consolidating Q-value function parameters at multiple timescales. In this work, we combine this approach with successor features, and show that by consolidating successor features and preferences learned over multiple timescales we can further mitigate catastrophic forgetting. In particular, we show that agents trained with this approach rapidly recall previously rewarding sites in large environments, whereas those trained without this decomposition and consolidation mechanism do not. These results therefore contribute to our understanding of the functional role of synaptic plasticity and memory systems operating at multiple timescales, and demonstrate that RL can be improved by capturing features of biological memory with greater fidelity.
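To make the abstract's mechanism concrete, here is a minimal sketch of the two ideas it combines: the successor-feature decomposition Q(s, a) = ψ(s, a)·w, and consolidation of both ψ and the preference vector w along a chain of progressively slower parameter copies. This is an illustrative simplification under assumed details (tabular features, a single coupling constant `g`, greedy bootstrapping), not the paper's actual implementation; all names and hyperparameters here are hypothetical.

```python
import numpy as np

class MultiTimescaleSF:
    """Sketch of successor features (psi) and preferences (w), each
    paired with a chain of slower copies for consolidation.
    Hypothetical simplification; not the authors' implementation."""

    def __init__(self, n_states, n_actions, d, n_timescales=3, g=0.3):
        rng = np.random.default_rng(0)
        # chain[0] is the fast, directly learned copy; later entries
        # are consolidated copies operating at slower timescales
        self.psi = [rng.normal(size=(n_states, n_actions, d)) * 0.01
                    for _ in range(n_timescales)]
        self.w = [np.zeros(d) for _ in range(n_timescales)]
        self.g = g  # coupling strength between adjacent timescales

    def q_values(self, s):
        # SF decomposition: Q(s, a) = psi(s, a) . w (fast copies)
        return self.psi[0][s] @ self.w[0]

    def td_update(self, s, a, r, phi, s_next, gamma=0.95, lr=0.1):
        # SF TD target: phi(s, a) + gamma * psi(s', a*) for greedy a*
        a_next = int(np.argmax(self.q_values(s_next)))
        target = phi + gamma * self.psi[0][s_next, a_next]
        self.psi[0][s, a] += lr * (target - self.psi[0][s, a])
        # preference update: fit w so that r ~ phi . w
        self.w[0] += lr * (r - phi @ self.w[0]) * phi

    def consolidate(self):
        # each slower copy drifts toward its faster neighbour, with
        # geometrically decreasing rates (deeper copies change slower)
        for k in range(1, len(self.psi)):
            rate = self.g ** k
            self.psi[k] += rate * (self.psi[k - 1] - self.psi[k])
            self.w[k] += rate * (self.w[k - 1] - self.w[k])
        # the slow copies in turn pull the fast copy back, acting as a
        # prior that resists catastrophic overwriting of old knowledge
        self.psi[0] += self.g * (self.psi[1] - self.psi[0])
        self.w[0] += self.g * (self.w[1] - self.w[0])
```

Because the reward enters only through w, a return to a previously rewarding context can in principle be handled by re-adapting the small vector w while the consolidated ψ chain preserves the environment's transition structure — the intuition behind the rapid recall the abstract reports.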