DeepLearn 2023 Spring
9th International School
on Deep Learning
Bari, Italy · April 03-07, 2023
Holger Rauhut

RWTH Aachen University

[intermediate] Gradient Descent Methods for Learning Neural Networks: Convergence and Implicit Bias

Summary

Gradient descent and stochastic gradient descent methods are at the core of training deep neural networks. Due to the non-convexity of the loss functional and to overparameterization, the convergence properties of these methods are not yet well understood. This lecture series aims to introduce mathematical aspects of learning deep neural networks and to present initial results for simplified cases.
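
As a point of reference, the two update rules can be written down in a few lines of NumPy. This is a generic sketch, not code from the lecture; the function and parameter names (loss_grad, step_size, batch_size) are our own illustrative choices, and loss_grad is assumed to return the gradient of the loss with respect to the parameters.

import numpy as np

def gradient_descent(theta, loss_grad, X, Y, step_size=1e-2, n_steps=1000):
    # Full-batch gradient descent: theta_{k+1} = theta_k - step_size * grad L(theta_k).
    for _ in range(n_steps):
        theta = theta - step_size * loss_grad(theta, X, Y)
    return theta

def stochastic_gradient_descent(theta, loss_grad, X, Y, step_size=1e-2,
                                n_steps=1000, batch_size=32, seed=0):
    # SGD: in each step the gradient is estimated on a random mini-batch only.
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    for _ in range(n_steps):
        idx = rng.choice(n, size=min(batch_size, n), replace=False)
        theta = theta - step_size * loss_grad(theta, X[idx], Y[idx])
    return theta

# Example gradient for linear least squares, L(theta) = 0.5/n * ||X theta - Y||^2:
# lsq_grad = lambda theta, X, Y: X.T @ (X @ theta - Y) / len(Y)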

After a general introduction to (stochastic) gradient descent methods for deep learning, we will focus on linear neural networks (i.e., networks with linear activation function) for the theoretical analysis. While linear networks are not expressive enough for most applications, their mathematical analysis still poses significant challenges, which should be understood before passing to nonlinear networks. Rather than starting with (stochastic) gradient descent (SGD) methods, it is beneficial to first study the corresponding gradient flow, which avoids a discussion of step-size choices. We will show convergence to critical points of the loss functional and, for the square loss, convergence to global minima (both for the gradient flow and for gradient descent). Moreover, the factorization structure of linear networks induces a Riemannian geometry, so that the flow of the end-to-end network matrix can be interpreted as a Riemannian gradient flow.
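
For orientation, the central objects of this analysis can be written out as follows; the notation is ours (chosen to be consistent with the references below), and the lectures may use different conventions.

% Deep linear network of depth N with weight matrices W_1, ..., W_N,
% data matrix X and label matrix Y, trained under the square (Frobenius) loss:
L(W_1,\dots,W_N) \;=\; \tfrac{1}{2}\,\bigl\| W_N W_{N-1}\cdots W_1 X - Y \bigr\|_F^2 .

% The associated gradient flow replaces the discrete gradient descent iteration by the ODE
\dot W_j(t) \;=\; -\,\nabla_{W_j} L\bigl(W_1(t),\dots,W_N(t)\bigr), \qquad j = 1,\dots,N,

% and the induced flow of the end-to-end matrix W(t) = W_N(t)\cdots W_1(t) is the object
% that admits an interpretation as a Riemannian gradient flow on a manifold of
% fixed-rank matrices.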

In many learning scenarios one uses significantly more neural network parameters than training data. In this regime many networks interpolate the data exactly, so that the loss functional has many global minimizers; nevertheless, learned neural networks generalize very well to unseen data, in contrast to the intuition from classical statistics that such a scenario would lead to overfitting. The learning algorithms that are used, i.e., (stochastic) gradient descent methods, together with their initialization, impose an implicit bias on which minimizer is computed. This implicit bias of (S)GD seems to be very favorable in practice. A working hypothesis is that (S)GD with small initialization promotes low complexity in a suitable sense. We will present first mathematical results in this direction for linear networks, where sparsity and/or low rank is promoted.
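
A self-contained toy experiment in the simplest "diagonal" linear network already shows this effect. The sketch below is our own illustration in the spirit of the sparsity references listed under References, not the lecturer's code; all problem sizes, the step size and the initialization scale are arbitrary choices. Plain gradient descent on an overparameterized, underdetermined least-squares problem with small initialization tends to select an (approximately) sparse interpolant among the infinitely many global minimizers.

import numpy as np

rng = np.random.default_rng(0)
m, n, s = 40, 100, 5                          # measurements, ambient dimension, sparsity
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
support = rng.choice(n, size=s, replace=False)
x_true[support] = rng.choice([-1.0, 1.0], size=s)
b = A @ x_true                                # underdetermined: many x satisfy A x = b

# Overparameterize x = u*u - v*v (entrywise); the small initialization drives the bias.
alpha, lr, steps = 1e-3, 1e-2, 50_000
u = alpha * np.ones(n)
v = alpha * np.ones(n)

for _ in range(steps):
    x = u * u - v * v
    g = A.T @ (A @ x - b)                     # gradient of 0.5*||A x - b||^2 w.r.t. x
    u, v = u - lr * 2 * u * g, v + lr * 2 * v * g   # chain rule through x = u*u - v*v

x = u * u - v * v
print("residual ||A x - b||:", np.linalg.norm(A @ x - b))
print("entries with |x_i| > 0.1 (true sparsity 5):", int(np.sum(np.abs(x) > 0.1)))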

Syllabus

  • Introduction to training deep networks
  • Convergence theory for gradient flow and gradient descent for linear neural networks
  • Mathematical analysis of the implicit bias of gradient flow and gradient descent for learning linear neural networks in overparameterized scenarios

References

S. Arora, N. Cohen, N. Golowich, and W. Hu. A convergence analysis of gradient descent for deep linear neural networks, ICLR, 2019. arXiv:1810.02281.

S. Azulay, E. Moroshko, M. S. Nacson, B. Woodworth, N. Srebro, A. Globerson, and D. Soudry. On the implicit bias of initialization shape: Beyond infinitesimal mirror descent, 2021. arXiv:2102.09769.

B. Bah, H. Rauhut, U. Terstiege, M. Westdickenberg. Learning deep linear neural networks: Riemannian gradient flows and convergence to global minimizers. Information and Inference, Volume 11, Issue 1, 2022, pp 307–353.

G. M. Nguegnang, H. Rauhut, U. Terstiege. Convergence of gradient descent for learning linear neural networks. Preprint, 2021. arXiv:2108.02040.

H.-H. Chou, C. Gieshoff, J. Maly, H. Rauhut. Gradient Descent for Deep Matrix Factorization: Dynamics and Implicit Bias towards Low Rank. Preprint, 2020. arXiv:2011.13772.

H.-H. Chou, J. Maly, H. Rauhut. More is Less: Inducing Sparsity via Overparameterization. Preprint, 2022.

B. Neyshabur, R. Tomioka, and N. Srebro. In search of the real inductive bias: On the role of implicit regularization in deep learning. ICLR, 2015.

F. Wu and P. Rebeschini. Implicit regularization in matrix sensing via mirror descent. Advances in Neural Information Processing Systems, 34, 2021.

C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals. Understanding deep learning requires rethinking generalization. ICLR, 2017.

Pre-requisites

Multivariate analysis. Linear algebra. Basic knowledge of optimization is helpful, but not necessary.

Short bio

1996 – 2001 Study of Mathematics at Technical University of Munich

2002 – 2004 Doctoral studies in Mathematics, Technical University of Munich (Supervisor: Prof. Dr. Rupert Lasser)

2005 – 2008 Postdoc at University of Vienna, Numerical Harmonic Analysis Group (Mentor: Prof. Dr. Hans Feichtinger)

2008 Habilitation in Mathematics

2008 – 2013 Professor of Mathematics (“Bonn Junior Fellow”) at University of Bonn, Hausdorff Center for Mathematics

Since 2013 Professor of Mathematics, RWTH Aachen University, Chair for Mathematics of Information Processing

2016 – 2018 Head of Department of Mathematics, RWTH Aachen University

2018 – 2022 Member of the Senate, RWTH Aachen University

Since 2022 Spokesperson of Collaborative Research Center “Sparsity and Singular Structures” (SFB 1481)

Other Courses

Babak Ehteshami Bejnordi
Sergei V. Gleyzer
Vipin Kumar
Jacob Goldberger
Christoph Lampert
Yingbin Liang
Xiaoming Liu
Michael Mahoney
Liza Mijovic
William S. Noble
Bhiksha Raj
Bart ter Haar Romeny
Tara Sainath
Martin Schultz
Adi Laurentiu Tarca
Emma Tolley
Michalis Vazirgiannis
Atlas Wang
Guo-Wei Wei
Lei Xing
Xiaowei Xu
