Tor Lattimore
[intermediate/advanced] Tools and Techniques of Reinforcement Learning to Overcome Bellman’s Curse of Dimensionality
Bellman’s key idea, using value functions to organize the search for good policies, is at the heart of all successful reinforcement learning algorithms, whether they are derived from the principles of dynamic programming or climb the performance surface by following its gradient. The value functions themselves need to be approximated, and when their domain, such as the state space, is high-dimensional, naive approaches suffer from the curse of dimensionality. In this lecture I will focus on which computational tools and techniques can avoid the curse, and on when and how we can develop them.
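As a rough illustration of the two notions in the summary above, the sketch below runs value iteration on a toy two-state MDP and counts how many entries a tabular value function would need on a uniform grid. The function names, the toy MDP, the discount factor, and the grid resolution are all assumptions chosen here for illustration; none of them come from the lecture material.

```python
def table_size(points_per_dim: int, dim: int) -> int:
    """Entries in a tabular value function on a uniform grid (hypothetical helper)."""
    return points_per_dim ** dim


def value_iteration(P, r, gamma=0.9, tol=1e-8):
    """Repeatedly apply the Bellman optimality operator until the update is small.

    P[s][a] is a list of (next_state, probability) pairs and
    r[s][a] is the immediate reward for taking action a in state s.
    """
    v = [0.0] * len(P)
    while True:
        new_v = [
            max(
                r[s][a] + gamma * sum(p * v[t] for t, p in P[s][a])
                for a in range(len(P[s]))
            )
            for s in range(len(P))
        ]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v


# Toy MDP (an assumption): action 0 stays put, action 1 switches state.
# Staying in state 1 pays reward 1; everything else pays 0.
P = [
    [[(0, 1.0)], [(1, 1.0)]],
    [[(1, 1.0)], [(0, 1.0)]],
]
r = [
    [0.0, 0.0],
    [1.0, 0.0],
]

v_star = value_iteration(P, r)

# With 10 grid points per dimension, the table grows as 10**d.
sizes = {d: table_size(10, d) for d in (1, 3, 6)}
```

Here the optimal values are available in closed form (staying in state 1 forever is worth 1 / (1 - 0.9) = 10, and state 0 is worth 0.9 times that), so the toy problem is easy. The table-size count hints at why the same tabular approach breaks down once the state space has more than a few dimensions.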
- Markov decision processes (MDPs) and value functions
- Basic results on the complexity of learning and planning in MDPs
- Using function approximation: From policy iteration to policy gradient
- How does entropy regularization fit into the picture?
- Low rank MDPs and relatives
- On the limits of efficient algorithms that use function approximation
Prerequisites: linear algebra (matrices, vectors, tensors, norms), calculus (functions over vector spaces, derivatives, Lipschitz continuity, Banach’s fixed point theorem), probability theory (probabilities, expectations, concentration of measure for averages, martingales), and the basics of Markov decision processes (definitions, value functions, optimal value functions).
We will review some of these, but only very briefly.
Short bio
Tor Lattimore is a staff research scientist at DeepMind working mostly on decision-making algorithms. He was previously an assistant professor at Indiana University, Bloomington and before that a postdoc at the University of Alberta. He received a PhD from the Australian National University under the supervision of Marcus Hutter. Together with Csaba Szepesvári, he is the author of the book “Bandit Algorithms”, which is published by Cambridge University Press and freely available at http://banditalgs.com.