Sean Meyn
[introductory/intermediate] Reinforcement Learning: Fundamentals, and Roadmaps for Successful Design
Summary
The theory of reinforcement learning (RL) is currently grounded in the theory of optimal control, typically Markov decision processes (MDPs). The dream of RL is automatic control that is truly automatic: without any knowledge of physics or biology or medicine, an RL algorithm tunes itself to become a super controller, promising the smoothest ride into space and the most expert micro-surgeon! We are currently far from practical autonomous driving or surgery, but the science is progressing quickly. With so much enthusiasm, we can expect many breakthroughs in the near future.
This lecture series presents the basics of algorithm design, and should be of interest to both newcomers and experienced researchers in RL. It “begins at the beginning”, grounding algorithm design in refinements of the “ODE method”. The focus is algorithm design, with an emphasis on fast convergence. Also fundamental in RL are information-theoretic concepts used to investigate the design of “exploration” for learning. The course will visit this topic, but in far less depth.
Syllabus
The content will closely follow the new monograph by Sean Meyn, “Control Systems and Reinforcement Learning”, covering the following topics:
1. Revisiting the ODE method (drawing from chapters 4 and 8):
This is a survey relevant to a much broader community of researchers in machine learning. The key message: algorithm acceleration is possible once we better understand the nonlinear dynamics associated with the algorithms we construct, and many new techniques are available to gain this understanding.
2. Variance matters (from chapters 7 and 8):
How nonlinear dynamics coupled with random disturbances impact convergence rates.
3. Stochastic control and TD learning (chapters 9 and 10):
Putting algorithm design to work: TD and Q-learning are often very slow to converge. We will learn why, and how these algorithms can be accelerated (a minimal illustration follows this list).
4. Zap and Deep Q-learning (chapters 9 and 10, and recent literature):
New techniques to tame complex nonlinear algorithm dynamics.
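To make topics 1 and 3 concrete: each algorithm in the course can be viewed as a stochastic approximation recursion whose asymptotic behavior is governed by a mean-flow ODE. Below is a minimal sketch of tabular TD(0) for policy evaluation from this viewpoint. The 3-state chain, rewards, discount factor, and step-size schedule are all invented for illustration; this sketch is not taken from the monograph.

```python
import numpy as np

# Toy 3-state Markov reward process.  All numbers here are invented
# for illustration; they do not come from the monograph.
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4]])   # transition matrix P(x, x')
r = np.array([1.0, 0.0, 2.0])     # reward collected in each state
gamma = 0.9                       # discount factor

rng = np.random.default_rng(0)
theta = np.zeros(3)               # value estimate: V(x) ~ theta[x]

x = 0
for n in range(1, 200_001):
    x_next = rng.choice(3, p=P[x])
    # TD(0) is a stochastic approximation recursion of the form
    #     theta_{n+1} = theta_n + a_n * f(theta_n, Phi_{n+1}),
    # where f is the temporal-difference term below.  The ODE method
    # says the estimates track solutions of the mean-flow ODE
    #     d/dt v = f_bar(v),
    # whose equilibrium here is the true value V = (I - gamma P)^{-1} r.
    td_error = r[x] + gamma * theta[x_next] - theta[x]
    a_n = 1.0 / n   # classical step size; the course explains why naive
                    # schedules like this can make convergence very slow
    theta[x] += a_n * td_error
    x = x_next

# Compare against the exact solution of the Bellman equation.
V_true = np.linalg.solve(np.eye(3) - gamma * P, r)
print("TD(0) estimate:", theta)
print("true value    :", V_true)
```

Topics 2–4 concern how the noise and the nonlinearity of the mean flow determine the convergence rate of such recursions, and how the gain can be redesigned to accelerate them; for example, Zap Q-learning replaces the scalar gain a_n with a matrix gain built from a running estimate of the linearization of the mean flow.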
References
- S. Meyn. Control Systems and Reinforcement Learning. Cambridge University Press, to be published. Pre-publication version available online: https://meyn.ece.ufl.edu/2021/08/01/control-systems-and-reinforcement-learning/
- A. M. Devraj, A. Busic, and S. Meyn. Fundamental design principles for reinforcement learning algorithms. In K. G. Vamvoudakis, Y. Wan, F. L. Lewis, and D. Cansever, editors, Handbook of Reinforcement Learning and Control, Studies in Systems, Decision and Control series (SSDC, volume 325). Springer, 2021.
- Theory of Reinforcement Learning Boot Camp, Simons Institute, Aug 31–Sep 4, 2020. Videos available online: https://simons.berkeley.edu/workshops/rl-2020-bc
- S. Chen, A. M. Devraj, F. Lu, A. Busic, and S. Meyn. Zap Q-Learning with nonlinear function approximation. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 16879–16890, 2020. Also available as arXiv e-print 1910.05405.
Pre-requisites
Reinforcement learning has several foundations, including algorithm design and Markov decision process theory (roughly equivalent to stochastic control theory). The course will present a fresh look at these foundations, showing how they lead to standard algorithms for reinforcement learning as well as recent techniques designed to improve reliability. The necessary control theory will be reviewed in the lectures. It is most important that students come with a good grasp of stochastic process fundamentals, as well as the basics of ordinary differential equations and matrix algebra.
Short bio
Sean Meyn was raised by the beach in Southern California. Following his BA in mathematics at UCLA, he moved on to pursue a PhD with Peter Caines at McGill University. After about 20 years as a professor of ECE at the University of Illinois, in 2012 he moved to beautiful Gainesville. He is now Professor and Robert C. Pittman Eminent Scholar Chair in the Department of Electrical and Computer Engineering at the University of Florida, and director of the Laboratory for Cognition and Control. He also holds an Inria International Chair to support research with colleagues in France. His interests span many aspects of stochastic control, stochastic processes, information theory, and optimization. For the past decade, his applied research has focused on engineering, markets, and policy in energy systems.