
Cho-Jui Hsieh
[intermediate/advanced] Optimizers for Large Language Model Training
Summary
These lectures will cover the theoretical foundations and practical algorithms of widely used deep-learning optimizers, along with the key challenges of large-scale model training.
Syllabus
- Introduction to (continuous) optimization
- Gradient descent and stochastic gradient descent
- Adaptive optimizers and momentum (update rules sketched after this list)
- Second-order optimizers
- Muon optimizer
- Distributed and large batch training
- Scale-invariant optimizers for LLM fine-tuning
- Challenges in large-scale LLM training
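As a taste of the first part of the syllabus, the sketch below contrasts the update rules of plain (stochastic) gradient descent, heavy-ball momentum, and Adam on a toy quadratic objective. It is an illustrative NumPy sketch, not part of the official course material; the toy objective and all hyperparameter values are chosen only for demonstration.

```python
# Illustrative sketch (not official course material): the update rules behind
# three syllabus topics -- plain SGD, SGD with momentum, and Adam -- applied
# to the toy objective f(w) = 0.5 * ||w||^2, whose gradient is simply w.
import numpy as np

def grad(w):
    # Gradient of f(w) = 0.5 * ||w||^2.
    return w

def sgd(w, lr=0.1, steps=100):
    # Plain gradient descent: w <- w - lr * g
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def sgd_momentum(w, lr=0.1, beta=0.9, steps=100):
    # Heavy-ball momentum: m <- beta * m + g;  w <- w - lr * m
    m = np.zeros_like(w)
    for _ in range(steps):
        m = beta * m + grad(w)
        w = w - lr * m
    return w

def adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=100):
    # Adam: bias-corrected first/second moment estimates rescale each coordinate.
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

if __name__ == "__main__":
    w0 = np.array([3.0, -2.0])
    for name, opt in [("SGD", sgd), ("Momentum", sgd_momentum), ("Adam", adam)]:
        print(name, opt(w0.copy()))
```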
References
Pre-requisites
Calculus, linear algebra, mathematical analysis, and machine learning.
Short bio
Cho-Jui Hsieh is an associate professor in the Computer Science Department at UCLA. His work primarily focuses on improving the efficiency and robustness of machine learning systems, and he has made significant contributions to several widely used machine learning packages. He has received the NSF CAREER Award, the Samsung AI Researcher of the Year award, and the Google Research Scholar Award, and his work has been recognized with several paper awards at ICLR, KDD, ICDM, ICPP, and SC.