Yingbin Liang
[intermediate/advanced] Theory on Training Dynamics of Transformers
Summary
Transformers, as foundation models, have recently revolutionized many machine learning (ML) applications. Alongside their tremendous empirical successes, theoretical studies have emerged to explain why transformers can be trained to achieve such remarkable performance. This tutorial aims to provide an overview of recent theoretical investigations that characterize the training dynamics of transformer-based ML models. It will also explain the primary techniques and tools employed in these analyses, which draw on information-theoretic concepts as well as tools from learning theory, stochastic optimization, dynamical systems, and probability.
Syllabus
The tutorial will begin with an introduction to basic transformer models, and then delve into several ML problems in which transformers have found extensive application, such as in-context learning, next-token prediction, and self-supervised learning. For each learning problem, the tutorial will cover the problem formulation, the main theoretical techniques for characterizing the training process, the convergence guarantees and the optimality of the attention models at convergence, the implications for the learning problem, and the insights and guidelines they offer for practical solutions. Finally, the tutorial will discuss future directions and open problems in this actively evolving field.
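To give a flavor of the settings analyzed in this line of work, the sketch below trains a one-layer softmax-attention model by gradient-based optimization on an in-context linear regression task, one of the canonical problems in the references listed below. It is a minimal, hypothetical illustration rather than the construction used in the tutorial or in the cited papers; the token format, the single-layer architecture, and the optimizer are illustrative assumptions.

```python
# Hypothetical minimal sketch (not the tutorial's construction): gradient-based
# training of a one-layer softmax-attention model on in-context linear regression.
import torch

torch.manual_seed(0)
d, n_ctx, batch = 5, 20, 64          # feature dim, context length, batch size


def sample_prompts(batch, n_ctx, d):
    """Each prompt holds n_ctx labeled examples plus one query; y = <w, x>."""
    w = torch.randn(batch, d, 1)              # fresh task vector per prompt
    x = torch.randn(batch, n_ctx + 1, d)      # last row is the query input
    y = (x @ w).squeeze(-1)                   # labels, including the query label
    return x, y


class OneLayerAttention(torch.nn.Module):
    """Single softmax-attention layer reading (x_i, y_i) tokens and a query
    token (x_query, 0), outputting a scalar prediction for y_query."""

    def __init__(self, d):
        super().__init__()
        self.WQ = torch.nn.Linear(d + 1, d + 1, bias=False)
        self.WK = torch.nn.Linear(d + 1, d + 1, bias=False)
        self.WV = torch.nn.Linear(d + 1, 1, bias=False)

    def forward(self, x, y):
        y_in = y.clone()
        y_in[:, -1] = 0.0                      # the query's label is hidden
        tokens = torch.cat([x, y_in.unsqueeze(-1)], dim=-1)  # (B, n_ctx+1, d+1)
        q = self.WQ(tokens[:, -1:, :])                        # query token only
        k, v = self.WK(tokens), self.WV(tokens)
        scores = q @ k.transpose(1, 2) / tokens.size(-1) ** 0.5
        attn = torch.softmax(scores, dim=-1)
        return (attn @ v).squeeze(-1).squeeze(-1)             # predicted y_query


model = OneLayerAttention(d)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(2000):
    x, y = sample_prompts(batch, n_ctx, d)
    loss = ((model(x, y) - y[:, -1]) ** 2).mean()   # squared loss on the query
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        print(f"step {step:4d}  loss {loss.item():.4f}")
```

How the training loss of such toy attention models evolves, and what the attention pattern looks like at convergence, are examples of the questions the tutorial addresses.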
References
Yu Huang, Yuan Cheng, Yingbin Liang. “In-context convergence of transformers”, Proc. International Conference on Machine Learning (ICML), 2024.
Tong Yang, Yu Huang, Yingbin Liang, Yuejie Chi. “In-context learning with representations: Contextual generalization of trained transformers”, Proc. Advances in Neural Information Processing Systems (NeurIPS), 2024.
Ruiquan Huang, Yingbin Liang, Jing Yang. “Non-asymptotic convergence of training transformers for next-token prediction”, Proc. Advances in Neural Information Processing Systems (NeurIPS), 2024.
Yu Huang, Zixin Wen, Yuejie Chi, Yingbin Liang. “How transformers learn diverse attention correlations in masked vision pretraining”, arXiv:2403.02233, 2024.
Pre-requisites
Basics of deep learning, familiarity with language models (preferred), basics of optimization, and probability theory.
Short bio
Dr. Yingbin Liang is currently a Professor in the Department of Electrical and Computer Engineering at The Ohio State University (OSU) and a core faculty member of the Ohio State Translational Data Analytics Institute (TDAI). She also serves as the Deputy Director of the AI-EDGE Institute at OSU. Dr. Liang received her Ph.D. degree in Electrical Engineering from the University of Illinois at Urbana-Champaign in 2005, and served on the faculty of the University of Hawaii and Syracuse University before joining OSU. Her research interests include machine learning, optimization, information theory, and statistical signal processing. Dr. Liang received the National Science Foundation CAREER Award and the State of Hawaii Governor Innovation Award in 2009, and the EURASIP Best Paper Award in 2014. She is an IEEE Fellow.