Atlas Wang
[intermediate] Sparse Neural Networks: From Practice to Theory
Summary
A sparse neural network (NN) has most of its parameters set to zero and is traditionally viewed as the product of NN compression (i.e., pruning). Recently, however, sparsity has emerged as an important bridge for modeling the underlying low dimensionality of NNs and for understanding their generalization, optimization dynamics, implicit regularization, expressivity, and robustness. Deep NNs trained with sparsity-aware priors have also demonstrated significantly improved performance through a full stack of applied work on algorithms, systems, and hardware. In this talk, I plan to cover some of our recent progress on the practical, theoretical, and scientific aspects of sparse NNs. I will scratch the surface of three questions: (1) practically, why one should love a sparse NN, beyond its use as a post-training compression tool; (2) theoretically, what guarantees one can expect from sparse NNs; and (3) what the future prospects of exploiting sparsity are.
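To make the "product of NN compression (i.e., pruning)" view concrete, below is a minimal sketch of one-shot global magnitude pruning in PyTorch. The model, layer sizes, and the 90% sparsity level are illustrative assumptions, not the specific procedures covered in the talk.

```python
# A minimal sketch of one-shot global magnitude pruning (illustrative only).
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, sparsity: float = 0.9):
    """Zero out the `sparsity` fraction of weights with smallest magnitude,
    returning the binary masks that define the resulting sparse network."""
    weights = [p for _, p in model.named_parameters() if p.dim() > 1]  # weight matrices only
    all_scores = torch.cat([p.detach().abs().flatten() for p in weights])
    k = max(1, int(sparsity * all_scores.numel()))
    threshold = torch.kthvalue(all_scores, k).values  # global magnitude cutoff
    masks = []
    with torch.no_grad():
        for p in weights:
            mask = (p.abs() > threshold).float()
            p.mul_(mask)  # apply the mask: most parameters become exactly zero
            masks.append(mask)
    return masks

# Hypothetical usage on a toy MLP.
model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
masks = magnitude_prune(model, sparsity=0.9)
```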
Syllabus
- “Old School” Pruning in Neural Networks
- Towards End-to-End Sparsity: Before and During Training
- Lottery Ticket Hypothesis and Variants (a sketch of iterative magnitude pruning follows this list)
- Dynamic Sparse Training
- Sparse Transfer Learning
- Sparse Mixture-of-Experts
- Blessings of Sparsity beyond Efficiency
- Theoretical Foundations of Sparse Neural Networks
- Challenges Ahead: Scaling up Sparsity
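As a companion to the syllabus item on the Lottery Ticket Hypothesis, here is a hedged sketch of iterative magnitude pruning (IMP) with weight rewinding in PyTorch. The `train` routine (assumed to re-apply the masks after every optimizer step), the pruning rate, and the number of rounds are illustrative stand-ins rather than the exact recipe of Frankle and Carbin (2019).

```python
# A sketch of iterative magnitude pruning (IMP) with weight rewinding,
# the procedure associated with the Lottery Ticket Hypothesis.
import copy
import torch
import torch.nn as nn

def find_winning_ticket(model, train, prune_rounds=5, prune_rate=0.2):
    init_state = copy.deepcopy(model.state_dict())  # theta_0: keep the original initialization
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}

    for _ in range(prune_rounds):
        # `train` is a hypothetical routine that trains the model while
        # re-applying `masks` after each optimizer step so pruned weights stay zero.
        train(model, masks)
        for name, p in model.named_parameters():
            if name in masks:
                scores = (p.detach().abs() * masks[name]).flatten()
                alive = int(masks[name].sum().item())
                k = max(1, int(prune_rate * alive))  # prune 20% of the surviving weights
                threshold = torch.topk(scores, alive - k, largest=True).values.min()
                masks[name] *= (p.detach().abs() >= threshold).float()
        model.load_state_dict(init_state)  # rewind the surviving weights to their initial values
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name in masks:
                    p.mul_(masks[name])  # the sparse "winning ticket" to be retrained
    return model, masks
```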
References
Hoefler, Torsten, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, and Alexandra Peste. “Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks.” Journal of Machine Learning Research 22, no. 241 (2021): 1-124.
Frankle, Jonathan, and Michael Carbin. “The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks.” In International Conference on Learning Representations. 2019.
Chen, Tianlong, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Zhangyang Wang, and Michael Carbin. “The lottery ticket hypothesis for pre-trained BERT networks.” Advances in Neural Information Processing Systems 33 (2020): 15834-15846.
Evci, Utku, Trevor Gale, Jacob Menick, Pablo Samuel Castro, and Erich Elsen. “Rigging the lottery: Making all tickets winners.” In International Conference on Machine Learning, pp. 2943-2952. PMLR, 2020.
Liu, Shiwei, Lu Yin, Decebal Constantin Mocanu, and Mykola Pechenizkiy. “Do we actually need dense over-parameterization? In-time over-parameterization in sparse training.” In International Conference on Machine Learning, pp. 6989-7000. PMLR, 2021.
Fedus, William, Jeff Dean, and Barret Zoph. “A review of sparse expert models in deep learning.” arXiv preprint arXiv:2209.01667 (2022).
Chen, Tianlong, Zhenyu Zhang, Jun Wu, Randy Huang, Sijia Liu, Shiyu Chang, and Zhangyang Wang. “Can You Win Everything with a Lottery Ticket?” Transactions of Machine Learning Research. 2022.
Malach, Eran, Gilad Yehudai, Shai Shalev-Shwartz, and Ohad Shamir. “Proving the lottery ticket hypothesis: Pruning is all you need.” In International Conference on Machine Learning, pp. 6682-6691. PMLR, 2020.
Yang, Hongru, and Zhangyang Wang. “On the Neural Tangent Kernel Analysis of Randomly Pruned Wide Neural Networks.” arXiv preprint arXiv:2203.14328 (2022).
Pre-requisites
Familiarity with linear algebra, probability, calculus, and deep learning.
Short bio
Prof. Wang is currently the Jack Kilby/Texas Instruments Endowed Assistant Professor in the ECE Department of UT Austin. He is also a faculty member of the UT Computer Science Department and the Oden Institute CSEM program. Prof. Wang has broad research interests spanning from the theory to the applications of machine learning. At present, his core research mission is to leverage, understand, and expand the role of sparsity, from classical optimization to modern neural networks, whose impact spans many important topics such as efficient training/inference/transfer (especially of large foundation models), robustness and trustworthiness, learning to optimize (L2O), generative AI, and graph learning. His research has received extensive funding support, high-profile media coverage, and numerous awards and recognitions. More details can be found at: https://vita-group.github.io/.