Quanquan Gu
[intermediate/advanced] Benign Overfitting in Machine Learning: From Linear Models to Neural Networks
Summary
In modern machine learning, complex models such as deep neural networks have become increasingly popular. These complicated models are known to be able to fit noisy training data sets while at the same time achieving small test errors. This benign overfitting phenomenon is not unique to deep learning: even for linear models and kernel methods, recent work has demonstrated that interpolators of noisy training data can still perform near-optimally on test data. In this short course, I will present a series of recent results on benign overfitting, ranging from the minimum-norm interpolator and constant step-size stochastic gradient descent (SGD) to two-layer convolutional neural networks. I will also briefly discuss benign overfitting in adversarial training.
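To make the linear-regression setting concrete, the following minimal NumPy sketch fits a minimum-norm interpolator (via the pseudoinverse) to noisy labels in an overparameterized problem; the dimensions, noise level, and decaying feature covariance are illustrative assumptions, not values taken from the course or the referenced papers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative overparameterized setup: d >> n, with a fast-decaying
# feature covariance (one regime in which benign overfitting can occur).
n, d = 100, 2000
eigvals = 1.0 / (np.arange(1, d + 1) ** 2)          # decaying covariance spectrum (assumed)
X = rng.standard_normal((n, d)) * np.sqrt(eigvals)  # features with covariance diag(eigvals)
theta_star = np.zeros(d)
theta_star[0] = 1.0                                  # simple ground-truth signal (assumed)
y = X @ theta_star + 0.1 * rng.standard_normal(n)    # noisy labels

# Minimum-norm interpolator: theta_hat = X^+ y fits the noisy training data exactly.
theta_hat = np.linalg.pinv(X) @ y

# Training error is (numerically) zero, yet the excess risk
# (theta_hat - theta_star)^T Sigma (theta_hat - theta_star) can remain small.
train_mse = np.mean((X @ theta_hat - y) ** 2)
excess_risk = (theta_hat - theta_star) @ (eigvals * (theta_hat - theta_star))
print(f"train MSE: {train_mse:.2e}, excess risk: {excess_risk:.4f}")
```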
Syllabus
- Benign overfitting in linear regression/ridge regression
- Benign overfitting in stochastic gradient descent
- Benign overfitting in two-layer convolutional neural networks
- Benign overfitting in adversarial training
References
Belkin, M., Ma, S., & Mandal, S. (2018, July). To understand deep learning we need to understand kernel learning. In International Conference on Machine Learning (pp. 541-549).
Hastie, T., Montanari, A., Rosset, S., & Tibshirani, R. J. (2019). Surprises in high-dimensional ridgeless least squares interpolation. arXiv preprint arXiv:1903.08560.
Belkin, M., Hsu, D., & Xu, J. (2020). Two models of double descent for weak features. SIAM Journal on Mathematics of Data Science, 2(4), 1167-1180.
Muthukumar, V., Vodrahalli, K., Subramanian, V., & Sahai, A. (2020). Harmless interpolation of noisy data in regression. IEEE Journal on Selected Areas in Information Theory, 1(1), 67-83.
Bartlett, P. L., Long, P. M., Lugosi, G., & Tsigler, A. (2020). Benign overfitting in linear regression. Proceedings of the National Academy of Sciences, 117(48), 30063-30070.
Tsigler, A., & Bartlett, P. L. (2020). Benign overfitting in ridge regression. arXiv preprint arXiv:2009.14286.
Chatterji, N. S., & Long, P. M. (2021). Finite-sample analysis of interpolating linear classifiers in the overparameterized regime. Journal of Machine Learning Research, 22(129), 1-30.
Cao, Y., Gu, Q., & Belkin, M. (2021). Risk bounds for over-parameterized maximum margin classification on sub-gaussian mixtures. Advances in Neural Information Processing Systems, 34.
Zou, D., Wu, J., Braverman, V., Gu, Q., & Kakade, S. M. (2021). Benign Overfitting of Constant-Stepsize SGD for Linear Regression. In COLT.
Wu, J., Zou, D., Braverman, V., Gu, Q., & Kakade, S. M. (2021). Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression. arXiv preprint arXiv:2110.06198.
Zou, D., Wu, J., Braverman, V., Gu, Q., & Kakade, S. M. (2021). The Benefits of Implicit Regularization from SGD in Least Squares Problems. In NeurIPS.
Zou, D., Wu, J., Braverman, V., Gu, Q., & Kakade, S. M. (2022). Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime. arXiv preprint arXiv:2203.03159.
Zou, D., Cao, Y., Li, Y., & Gu, Q. (2021). Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization. arXiv preprint arXiv:2108.11371.
Cao, Y., Chen, Z., Belkin, M., & Gu, Q. (2022). Benign Overfitting in Two-layer Convolutional Neural Networks. arXiv preprint arXiv:2202.06526.
Chen, J., Cao, Y., & Gu, Q. (2021). Benign Overfitting in Adversarially Robust Linear Classification. arXiv preprint arXiv:2112.15250.
Pre-requisites
Linear algebra, calculus, probability theory and statistics, machine learning, optimization.
Short bio
Quanquan Gu is an Assistant Professor of Computer Science at UCLA. His research is in the area of artificial intelligence and machine learning, with a focus on developing and analyzing nonconvex optimization algorithms for machine learning to understand large-scale, dynamic, complex, and heterogeneous data, and on building the theoretical foundations of deep learning and reinforcement learning. He received his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 2014. He received an Alfred P. Sloan Research Fellowship and an NSF CAREER Award, among many other research awards from industry. He has published 100+ peer-reviewed papers in top machine learning venues such as JMLR, MLJ, COLT, ICML, NeurIPS, AISTATS, and UAI. He also serves as an associate/section editor for the Journal of Artificial Intelligence and PLOS ONE, and as an area chair or senior program committee member for ICML, NeurIPS, ICLR, AISTATS, AAAI, UAI, and IJCAI.