DeepLearn 2022 Summer
6th International Gran Canaria School
on Deep Learning
Las Palmas de Gran Canaria, Spain · July 25-29, 2022
Tor Lattimore

DeepMind

[intermediate/advanced] Tools and Techniques of Reinforcement Learning to Overcome Bellman’s Curse of Dimensionality

Summary

Bellman's key idea of using value functions to organize the search for good policies is at the heart of all successful reinforcement learning algorithms, whether they are derived from the principles of dynamic programming or climb the performance surface by following its gradient. The value functions themselves need to be approximated, and when their domain, such as the state space, is high-dimensional, naive approaches suffer from the curse of dimensionality. In this lecture I will focus on when and how we can develop clever computational tools and techniques that avoid the curse, and on what these tools are.
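
For reference, the object at the center of this story is the optimal value function, the unique solution of the Bellman optimality equation (standard textbook notation; nothing here is specific to the lecture):

\[
V^\star(s) \;=\; \max_{a \in \mathcal{A}} \Big[ r(s,a) + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s, a)\, V^\star(s') \Big].
\]

Solving this exactly by dynamic programming takes on the order of |S|^2 |A| operations per sweep. If the state is a d-dimensional vector discretized into k levels per coordinate, then |S| = k^d grows exponentially in d, which is the curse the lecture title refers to.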

Syllabus

  • Markov decision processes (MDPs) and value functions (see the value-iteration sketch after this list)
  • Basic results on the complexity of learning and planning in MDPs
  • Using function approximation: From policy iteration to policy gradient
  • How does entropy regularization fit into the picture?
  • Low rank MDPs and relatives
  • On the limits of efficient algorithms that use function approximation
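
As a concrete companion to the first syllabus item, below is a minimal value-iteration sketch on a toy two-state, two-action MDP. The MDP and every number in it are invented for illustration; this is a sketch of the textbook algorithm, not lecture code.

import numpy as np

# Toy MDP, invented for illustration: 2 states, 2 actions.
# P[a, s, s'] is the probability of moving from s to s' under action a;
# r[s, a] is the expected immediate reward; gamma is the discount factor.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.1, 0.9],
               [0.8, 0.2]]])
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])
gamma = 0.9

V = np.zeros(2)
for _ in range(1000):
    # One application of the Bellman optimality operator:
    # Q[s, a] = r[s, a] + gamma * sum_{s'} P[a, s, s'] * V[s']
    Q = r + gamma * np.einsum('ast,t->sa', P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:  # sup-norm stopping rule
        V = V_new
        break
    V = V_new

print("V* ~", V, " greedy policy:", Q.argmax(axis=1))

Each sweep touches every (s, a, s') triple, which is exactly what becomes infeasible when the state space is exponentially large; the later syllabus items concern what can and cannot be salvaged with function approximation.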

References

* Csaba Szepesvári: Reinforcement Learning Theory: Lecture Notes. 2021-2022. https://rltheory.github.io/

* Lattimore, T., & Szepesvári, C. (2020). Bandit algorithms. Cambridge University Press

* Tor Lattimore, Csaba Szepesvári, and Gellért Weisz. 2020. Learning with Good Feature Representations in Bandits and in RL with a Generative Model. ICML and arXiv:1911.07676.

* Weisz, Gellért, Philip Amortila, Barnabás Janzer, Yasin Abbasi-Yadkori, Nan Jiang, and Csaba Szepesvári. 2021. On Query-Efficient Planning in MDPs under Linear Realizability of the Optimal State-Value Function.

* G. Weisz, C. Szepesvári, A. György: TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions, 2022.

* Abbasi-Yadkori, Y., Lazic, N., and Szepesvári, C. Model-free linear quadratic control via reduction to expert prediction. In AISTATS, 2019.

* POLITEX: Regret Bounds for Policy Iteration using Expert Prediction. Abbasi-Yadkori, Y.; Bartlett, P.; Bhatia, K.; Lazic, N.; Szepesvári, C.; and Weisz, G. In ICML, pages 3692–3702, May 2019.

* Puterman, Martin L. 2005. Markov Decision Processes (Discrete Stochastic Dynamic Programming). Wiley-Interscience.

* Wang, Ruosong, Dean P. Foster, and Sham M. Kakade. 2020. What Are the Statistical Limits of Offline RL with Linear Function Approximation? arXiv [cs.LG].

* Even-Dar, E., Kakade, S. M., and Mansour, Y. Online Markov decision processes. Mathematics of Operations Research, 34(3):726–736, 2009.

* Chen, Y., & Wang, M. (2017). Lower bound on the computational complexity of discounted Markov decision problems. arXiv preprint arXiv:1705.07312.

* Singh, S. P., & Yee, R. C. (1994). An upper bound on the loss from approximate optimal-value functions. Machine Learning, 16(3), 227-233.

* Scherrer, B. (2016). Improved and generalized upper bounds on the complexity of policy iteration. Mathematics of Operations Research, 41(3), 758-774.

* Kearns, M., Mansour, Y., & Ng, A. Y. (2002). A sparse sampling algorithm for near-optimal planning in large Markov decision processes. Machine learning, 49(2), 193-208.

* Rémi Munos (2014). From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning. Foundations and Trends in Machine Learning: Vol. 7: No. 1, pp 1-129.

* Richard Bellman, Robert Kalaba and Bella Kotkin. 1963. Polynomial Approximation–A New Computational Technique in Dynamic Programming: Allocation Processes. Mathematics of Computation, 17 (82): 155-161.

* Schweitzer, Paul J., and Abraham Seidmann. 1985. Generalized Polynomial Approximations in Markovian Decision Processes. Journal of Mathematical Analysis and Applications 110 (2): 568–82.

* Antos, Andras, Csaba Szepesvári, and Rémi Munos. 2007. Learning near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path. Machine Learning 71 (1): 89–129.

* Bertsekas, Dimitri P. 2009. Chapter 6: Approximate Dynamic Programming, January, 1–118.

* Lagoudakis, M. G. and Parr, R. Least-squares policy iteration. The Journal of Machine Learning Research, 4:1107–1149, 2003.

* Munos, R. 2003. Error Bounds for Approximate Policy Iteration. ICML.

* Scherrer, Bruno. 2013. On the Performance Bounds of Some Policy Search Dynamic Programming Algorithms.

* Dimitri P. Bertsekas and John N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, Belmont, Massachusetts, 1996.

* Whitt, Ward. 1979. Approximations of Dynamic Programs, II. Mathematics of Operations Research 4 (2): 179–85.

* Powell, Warren B. 2011. Approximate Dynamic Programming. Solving the Curses of Dimensionality. Hoboken, NJ, USA: John Wiley & Sons, Inc.

* Lewis, Frank L., and Derong Liu. 2013. Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. Hoboken, NJ, USA: John Wiley & Sons, Inc.

* Sutton, R. S., McAllester, D. A., Singh, S. P., and Mansour, Y. (1999). Policy gradient methods for reinforcement learning with function approximation. In Neural Information Processing Systems 12, pages 1057–1063.

* Silver, David, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller. 2014. Deterministic Policy Gradient Algorithms. In ICML.

* Bhandari, Jalaj, and Daniel Russo. 2019. Global Optimality Guarantees For Policy Gradient Methods, June.

* Agarwal, Alekh, Sham M. Kakade, Jason D. Lee, and Gaurav Mahajan. 2019. On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift. arXiv.

* Mei, Jincheng, Chenjun Xiao, Csaba Szepesvári, and Dale Schuurmans. 2020. On the Global Convergence Rates of Softmax Policy Gradient Methods.

* Zhang, Junyu, Alec Koppel, Amrit Singh Bedi, Csaba Szepesvári, and Mengdi Wang. 2020. Variational Policy Gradient Method for Reinforcement Learning with General Utilities.

* Bhandari, Jalaj, and Daniel Russo. 2020. A Note on the Linear Convergence of Policy Gradient Methods. arXiv.

* Chung, Wesley, Valentin Thomas, Marlos C. Machado, and Nicolas Le Roux. 2020. Beyond Variance Reduction: Understanding the True Impact of Baselines on Policy Optimization.

* Li, Gen, Yuting Wei, Yuejie Chi, Yuantao Gu, and Yuxin Chen. 2021. Softmax Policy Gradient Methods Can Take Exponential Time to Converge.

Pre-requisites

Linear algebra (matrices, vectors, tensors, norms), calculus (functions over vector spaces, derivative, Lipschitz continuity, Banach’s fixed point theorem), probability theory (probabilities, expectations, concentration of measure for averages, martingales), basics of Markov decision processes (definitions, value functions, optimal value functions).

We will review some of these, but only very briefly.
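
One place these prerequisites come together (a standard argument, recalled here only as a pointer): the Bellman optimality operator T, defined by (TV)(s) = max_a [ r(s,a) + \gamma \sum_{s'} P(s' \mid s,a) V(s') ], is a \gamma-contraction in the supremum norm,

\[
\| T V_1 - T V_2 \|_\infty \le \gamma \, \| V_1 - V_2 \|_\infty ,
\]

so Banach's fixed point theorem gives a unique fixed point V^\star and geometric convergence of value iteration, the algorithm sketched after the syllabus above.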

Short bio

Tor Lattimore is a staff research scientist at DeepMind, working mostly on decision-making algorithms. He was previously an assistant professor at Indiana University, Bloomington, and before that a postdoc at the University of Alberta. He received his PhD from the Australian National University under the supervision of Marcus Hutter. Together with Csaba Szepesvári, he is the author of the book “Bandit Algorithms”, published by Cambridge University Press and freely available at http://banditalgs.com.

Other Courses

  • Wahid Bhimji
  • Joachim M. Buhmann
  • Kate Saenko
  • Arindam Banerjee
  • Pierre Baldi
  • Mikhail Belkin
  • Arthur Gretton
  • Phillip Isola
  • Mohit Iyyer
  • Irwin King
  • Vincent Lepetit
  • Dimitris N. Metaxas
  • Sean Meyn
  • Louis-Philippe Morency
  • Wojciech Samek
  • Clarisa Sánchez
  • Björn W. Schuller
  • Jonathon Shlens
  • Johan Suykens
  • A. Murat Tekalp
  • Alexandre Tkatchenko
  • Li Xiong
  • Ming Yuan


CO-ORGANIZERS

Universidad de Las Palmas de Gran Canaria

Universitat Rovira i Virgili

Institute for Research Development, Training and Advice – IRDTA, Brussels/London
