[intermediate/advanced] Bilevel Optimization and Applications in Deep Learning
Many modern machine learning (ML) problems, such as meta-learning, hyperparameter optimization, neural architecture search, and reinforcement learning, naturally exhibit a bilevel optimization (BO) structure: an inner problem is (approximately) solved, and its solution then enters the objective of an outer problem to be further optimized. BO has thus arisen as a powerful paradigm for principled algorithm design and performance characterization in bilevel ML problems. Recent years have seen extensive interest in advancing BO algorithms and in leveraging these techniques to improve the efficiency and scalability of bilevel deep learning. This lecture aims to introduce the basic concepts and algorithm design principles of BO, and to present recent research advances in BO and their applications in several major bilevel ML problems.
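In standard notation (a sketch for concreteness; the symbol names are illustrative), the bilevel structure couples an outer objective f with the minimizer of an inner objective g:

```latex
\min_{x \in \mathbb{R}^p} \; \Phi(x) := f\bigl(x, y^*(x)\bigr)
\quad \text{s.t.} \quad
y^*(x) \in \arg\min_{y \in \mathbb{R}^q} \; g(x, y)
```

For example, in hyperparameter optimization, x collects the hyperparameters, g is the training loss, and f is the validation loss evaluated at the trained model y*(x).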
Specifically, we will first introduce the formulation of BO and the types of ML problems that BO can model. We will then introduce several popular BO algorithms, including approximate implicit differentiation (AID) and iterative differentiation (ITD) type algorithms and their stochastic variants, and compare these algorithms with respect to convergence rate, computational cost, and scalability. We will further discuss several important implementation issues, such as the impact of loops, second-order computations, and Hessian-free design, and how they affect the performance of BO algorithms in deep learning. We will then present applications of BO algorithms in meta-learning, hyperparameter optimization, and representation learning, together with experimental validations of these algorithms. We will conclude the lecture with remarks on open problems and future directions.
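As a minimal illustration of the ITD idea mentioned above (a toy sketch, not the lecture's code: the quadratic inner and outer objectives and all names are invented for illustration), one can unroll K inner gradient steps and propagate the derivative dy/dx through the unrolled trajectory to form a hypergradient for the outer problem:

```python
def inner_grad(x, y):
    # ∂g/∂y for the toy inner problem g(x, y) = 0.5 * (y - x)**2,
    # whose exact minimizer is y*(x) = x
    return y - x

def itd_hypergradient(x, y0=0.0, alpha=0.5, K=20):
    """ITD (iterative differentiation): unroll K inner gradient steps
    and track dy/dx through the unrolled update y <- y - alpha*(y - x)."""
    y, dydx = y0, 0.0
    for _ in range(K):
        g = inner_grad(x, y)
        dydx = dydx - alpha * (dydx - 1.0)  # derivative of the update w.r.t. x
        y = y - alpha * g
    # outer objective f(x, y) = (y - 1)**2; here ∂f/∂x = 0, so the
    # hypergradient is ∂f/∂y * dy/dx (chain rule through the inner loop)
    dfdy = 2.0 * (y - 1.0)
    return dfdy * dydx, y

# outer loop: gradient descent on x using the ITD hypergradient
x = 5.0
for _ in range(100):
    hg, y = itd_hypergradient(x)
    x -= 0.1 * hg
print(x)  # since y*(x) = x, the outer optimum drives x toward 1
```

In deep learning, the same unrolling is typically done with automatic differentiation rather than hand-derived updates, and its memory cost grows with the number of unrolled inner steps, which motivates the AID and Hessian-free alternatives discussed in the lecture.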
- Introduction of bilevel optimization and applications in ML
- Bilevel optimization algorithms and performance
- Implementation issues in deep learning
- Application examples: meta-learning, hyperparameter optimization, etc.
Kaiyi Ji, Junjie Yang, Yingbin Liang. “Bilevel optimization: Convergence analysis and enhanced design”, Proc. International Conference on Machine Learning (ICML), 2021.
Junjie Yang, Kaiyi Ji, Yingbin Liang. “Provably faster algorithms for bilevel optimization”, Proc. Advances in Neural Information Processing Systems (NeurIPS), 2021.
Daouda Sow, Kaiyi Ji, Yingbin Liang. “On the convergence theory for Hessian-free bilevel algorithms”, Proc. Advances in Neural Information Processing Systems (NeurIPS), 2022.
Kaiyi Ji, Jason D. Lee, Yingbin Liang, H. Vincent Poor. “Convergence of meta-learning with task-specific adaptation over partial parameters”, Proc. Advances in Neural Information Processing Systems (NeurIPS), 2020.
Ankur Sinha, Pekka Malo, and Kalyanmoy Deb. “A review on bilevel optimization: from classical to evolutionary approaches and applications”, IEEE Transactions on Evolutionary Computation, 22(2):276–295, 2017.
Familiarity with basic optimization concepts and (stochastic) gradient descent methods. Knowledge of basic machine learning problems such as classification and meta-learning.
Dr. Yingbin Liang is currently a Professor in the Department of Electrical and Computer Engineering at The Ohio State University (OSU), and a core faculty member of the Ohio State Translational Data Analytics Institute (TDAI). She also serves as the Deputy Director of the AI-EDGE Institute at OSU. Dr. Liang received the Ph.D. degree in Electrical Engineering from the University of Illinois at Urbana-Champaign in 2005, and served on the faculty of the University of Hawaii and Syracuse University before joining OSU. Dr. Liang’s research interests include machine learning, optimization, information theory, and statistical signal processing. Dr. Liang received the National Science Foundation CAREER Award and the State of Hawaii Governor Innovation Award in 2009. She also received the EURASIP Best Paper Award in 2014. She is an IEEE Fellow.