Atlas Wang
[intermediate] Sparse Neural Networks: From Practice to Theory
Summary
A sparse neural network (NN) has most of its parameters set to zero and is traditionally viewed as the product of NN compression (i.e., pruning). Recently, however, sparsity has emerged as an important bridge for modeling the underlying low dimensionality of NNs and for understanding their generalization, optimization dynamics, implicit regularization, expressivity, and robustness. Deep NNs trained with sparsity-aware priors have also demonstrated significantly improved performance through a full stack of applied work on algorithms, systems, and hardware. In this talk, I plan to cover some of our recent progress on the practical, theoretical, and scientific aspects of sparse NNs. I will scratch the surface of three questions: (1) practically, why one should love a sparse NN, beyond its use as a post-training compression tool; (2) theoretically, what guarantees one can expect from sparse NNs; and (3) what the future prospects of exploiting sparsity are.
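To make the "product of NN compression (i.e., pruning)" view concrete, below is a minimal sketch of one-shot global magnitude pruning in PyTorch. The model, layer sizes, and the 90% sparsity level are illustrative assumptions, not the specific procedures covered in the talk.

```python
# A minimal sketch of one-shot global magnitude pruning (illustrative only).
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, sparsity: float = 0.9):
    """Zero out the `sparsity` fraction of weights with smallest magnitude,
    returning the binary masks that define the resulting sparse network."""
    weights = [p for _, p in model.named_parameters() if p.dim() > 1]  # weight matrices only
    all_scores = torch.cat([p.detach().abs().flatten() for p in weights])
    k = max(1, int(sparsity * all_scores.numel()))
    threshold = torch.kthvalue(all_scores, k).values  # global magnitude cutoff
    masks = []
    with torch.no_grad():
        for p in weights:
            mask = (p.abs() > threshold).float()
            p.mul_(mask)  # apply the mask: most parameters become exactly zero
            masks.append(mask)
    return masks

# Hypothetical usage on a toy MLP.
model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
masks = magnitude_prune(model, sparsity=0.9)
```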
Syllabus
- “Old School” Pruning in Neural Networks
- Towards End-to-End Sparsity: Before and During Training
- Lottery Ticket Hypothesis and Variants (a sketch of iterative magnitude pruning follows this list)
- Dynamic Sparse Training
- Sparse Transfer Learning
- Sparse Mixture-of-Experts
- Blessings of Sparsity beyond Efficiency
- Theoretical Foundations of Sparse Neural Networks
- Challenges Ahead: Scaling up Sparsity
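As a companion to the syllabus item on the Lottery Ticket Hypothesis, here is a hedged sketch of iterative magnitude pruning (IMP) with weight rewinding in PyTorch. The `train` routine (assumed to re-apply the masks after every optimizer step), the pruning rate, and the number of rounds are illustrative stand-ins rather than the exact recipe of Frankle and Carbin (2019).

```python
# A sketch of iterative magnitude pruning (IMP) with weight rewinding,
# the procedure associated with the Lottery Ticket Hypothesis.
import copy
import torch
import torch.nn as nn

def find_winning_ticket(model, train, prune_rounds=5, prune_rate=0.2):
    init_state = copy.deepcopy(model.state_dict())  # theta_0: keep the original initialization
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}

    for _ in range(prune_rounds):
        # `train` is a hypothetical routine that trains the model while
        # re-applying `masks` after each optimizer step so pruned weights stay zero.
        train(model, masks)
        for name, p in model.named_parameters():
            if name in masks:
                scores = (p.detach().abs() * masks[name]).flatten()
                alive = int(masks[name].sum().item())
                k = max(1, int(prune_rate * alive))  # prune 20% of the surviving weights
                threshold = torch.topk(scores, alive - k, largest=True).values.min()
                masks[name] *= (p.detach().abs() >= threshold).float()
        model.load_state_dict(init_state)  # rewind the surviving weights to their initial values
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name in masks:
                    p.mul_(masks[name])  # the sparse "winning ticket" to be retrained
    return model, masks
```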
References
Hoefler, Torsten, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, and Alexandra Peste. “Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks.” Journal of Machine Learning Research 22, no. 241 (2021): 1-124.
Frankle, Jonathan, and Michael Carbin. “The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks.” In International Conference on Learning Representations. 2019.
Chen, Tianlong, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Zhangyang Wang, and Michael Carbin. “The lottery ticket hypothesis for pre-trained BERT networks.” Advances in Neural Information Processing Systems 33 (2020): 15834-15846.
Evci, Utku, Trevor Gale, Jacob Menick, Pablo Samuel Castro, and Erich Elsen. “Rigging the lottery: Making all tickets winners.” In International Conference on Machine Learning, pp. 2943-2952. PMLR, 2020.
Liu, Shiwei, Lu Yin, Decebal Constantin Mocanu, and Mykola Pechenizkiy. “Do we actually need dense over-parameterization? In-time over-parameterization in sparse training.” In International Conference on Machine Learning, pp. 6989-7000. PMLR, 2021.
Fedus, William, Jeff Dean, and Barret Zoph. “A review of sparse expert models in deep learning.” arXiv preprint arXiv:2209.01667 (2022).
Chen, Tianlong, Zhenyu Zhang, Jun Wu, Randy Huang, Sijia Liu, Shiyu Chang, and Zhangyang Wang. “Can You Win Everything with a Lottery Ticket?” Transactions of Machine Learning Research. 2022.
Malach, Eran, Gilad Yehudai, Shai Shalev-Shwartz, and Ohad Shamir. “Proving the lottery ticket hypothesis: Pruning is all you need.” In International Conference on Machine Learning, pp. 6682-6691. PMLR, 2020.
Yang, Hongru, and Zhangyang Wang. “On the Neural Tangent Kernel Analysis of Randomly Pruned Wide Neural Networks.” arXiv preprint arXiv:2203.14328 (2022).
Pre-requisites
Familiarity with linear algebra, probability, calculus, and deep learning.
Short bio
Prof. Wang is currently the Jack Kilby/Texas Instruments Endowed Assistant Professor in the ECE Department of UT Austin. He is also a faculty member of the UT Computer Science Department and the Oden Institute CSEM program. Prof. Wang has broad research interests spanning from the theory to the applications of machine learning. At present, his core research mission is to leverage, understand, and expand the role of sparsity, from classical optimization to modern neural networks, whose impact spans many important topics such as efficient training/inference/transfer (especially of large foundation models), robustness and trustworthiness, learning to optimize (L2O), generative AI, and graph learning. His research has received extensive funding support, high-profile media coverage, and numerous awards and recognitions. More details can be found at: https://vita-group.github.io/.