Atlas Wang
[intermediate] Low Rank Strikes Back in the Era of Large Language Models
Summary
This tutorial explores the growing importance of low-rank approximation techniques for large language models (LLMs). The sessions cover theoretical foundations, empirical observations, and practical applications of low-rank structure for improving the efficiency, interpretability, and robustness of LLMs. Topics include attention approximation, weight compression, gradient projection, and low-rank fine-tuning. Participants will gain insight into how low-rank methods reduce computational and memory costs and sharpen the mechanistic understanding of LLMs.
Syllabus
Session I: Low-Rank Attention Approximation
- Overview of attention mechanisms in LLMs.
- Computational challenges and low-rank approximation solutions (a code sketch follows this outline).
- Recent advances connecting low-rank attention with state-space models and efficient inference.
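To make the Session I material concrete, here is a minimal PyTorch sketch of Linformer-style low-rank attention (Wang et al., 2020), in which learned projections compress the key/value sequence dimension. The tensor shapes, random projection matrices, and function name are illustrative assumptions, not code from any of the referenced papers.

```python
import torch

def low_rank_attention(q, k, v, proj_k, proj_v):
    """Linformer-style sketch: project keys/values along the sequence
    dimension so the attention map is (seq_len x rank) rather than
    (seq_len x seq_len)."""
    d = q.size(-1)
    k_low = proj_k @ k                                  # (batch, rank, d)
    v_low = proj_v @ v                                  # (batch, rank, d)
    scores = q @ k_low.transpose(-2, -1) / d ** 0.5     # (batch, seq, rank)
    return torch.softmax(scores, dim=-1) @ v_low        # (batch, seq, d)

# Toy usage with made-up sizes: sequence length 128 compressed to rank 16.
batch, n, d, r = 2, 128, 64, 16
q, k, v = (torch.randn(batch, n, d) for _ in range(3))
E = torch.randn(r, n) / n ** 0.5    # learned in practice; random here
F = torch.randn(r, n) / n ** 0.5
out = low_rank_attention(q, k, v, E, F)
print(out.shape)                    # torch.Size([2, 128, 64])
```

For a fixed rank, both the time and memory of the attention map then scale linearly rather than quadratically in sequence length.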
Session II: Low-Rank Gradient Structures
- Emergent low-rank structures in gradients during training.
- Gradient low-rank projection (GaLore) for memory-efficient training (sketched after this outline).
- Convergence analysis and empirical evaluations.
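As a rough companion to Session II, the following is a minimal sketch of gradient low-rank projection in the spirit of GaLore (Zhao et al., 2024). It is not the reference implementation: plain SGD stands in for Adam-style state in the projected space, the projection is refreshed every step rather than periodically, and all sizes are placeholders.

```python
import torch

def galore_style_step(weight, grad, rank, lr=1e-2):
    """Sketch of gradient low-rank projection: project the gradient onto its
    top-`rank` left singular subspace, take an optimizer step in that small
    space, then project the update back to the full weight."""
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]                  # (m, rank) projection, refreshed every step here
    g_low = P.T @ grad               # (rank, n) compressed gradient
    update_low = lr * g_low          # plain SGD stands in for Adam-style state
    weight -= P @ update_low         # project back and apply
    return weight

# Toy usage on a made-up 256 x 128 weight matrix.
W, G = torch.randn(256, 128), torch.randn(256, 128)
W = galore_style_step(W, G, rank=8)
print(W.shape)                       # torch.Size([256, 128])
```

The memory saving comes from keeping optimizer state only for the (rank x n) projected gradient instead of the full (m x n) matrix.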
Session III: Low-Rank Structures in Weights and Features
- Matrix and tensor decomposition for compression and fine-tuning (a truncated-SVD sketch follows this outline).
- Phenomena of low-rank collapse in token spaces.
- Generalization and safety implications of low-rank modifications.
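The weight-decomposition idea in Session III can be illustrated with a plain truncated SVD; the sketch below uses arbitrary layer sizes, and methods covered in the session (e.g., ASVD) refine this by weighting the decomposition with activation statistics, which is not shown here.

```python
import torch

def truncated_svd_compress(W, rank):
    """Sketch of low-rank weight compression: replace an (m x n) matrix by
    two factors of shapes (m x rank) and (rank x n), cutting the parameter
    count from m*n to rank*(m + n)."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]       # fold singular values into the left factor
    B = Vh[:rank, :]
    return A, B

# Toy usage: compress a made-up 512 x 512 layer to rank 32.
W = torch.randn(512, 512)
A, B = truncated_svd_compress(W, rank=32)
rel_err = torch.linalg.norm(W - A @ B) / torch.linalg.norm(W)
print(A.shape, B.shape, f"relative error {rel_err:.2f}")
```

LoRA-style fine-tuning is the complementary idea: the pretrained weight stays frozen and only a similar pair of low-rank factors, added on top of it, is trained.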
Open Research Questions
- Interplay of low-rankness, sparsity, and quantization.
- Mechanistic interpretability and theoretical understanding.
References
John Wright and Yi Ma. High-dimensional data analysis with low-dimensional models: Principles, computation, and applications. Cambridge University Press, 2022.
Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? Journal of the ACM, 58(3):1–37, 2011.
Ehsan Elhamifar and René Vidal. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11):2765–2781, 2013.
Mingxue Xu, Yao Lei Xu, and Danilo P. Mandic. TensorGPT: Efficient compression of the embedding layer in LLMs based on the tensor-train decomposition. https://arxiv.org/pdf/2307.00526, 2023.
Zhihang Yuan, Yuzhang Shang, Yue Song, Qiang Wu, Yan Yan, and Guangyu Sun. ASVD: Activation-aware singular value decomposition for compressing large language models. https://arxiv.org/pdf/2312.05821, 2023.
Ayush Kaushal, Tejas Vaidhya, and Irina Rish. LORD: Low rank decomposition of monolingual code LLMs for one-shot compression. https://arxiv.org/pdf/2309.14021, 2023.
Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, and Hao Ma. Linformer: Self-attention with linear complexity. https://arxiv.org/pdf/2006.04768, 2020.
Beidi Chen, Tri Dao, Eric Winsor, Zhao Song, Atri Rudra, and Christopher Ré. Scatterbrain: Unifying sparse and low-rank attention. NeurIPS, 34:17413–17426, 2021.
Harry Dong, Xinyu Yang, Zhenyu Zhang, Zhangyang Wang, Yuejie Chi, and Beidi Chen. Get more with less: Synthesizing recurrence with KV cache compression for efficient LLM inference. ICML, 2024.
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. ICLR, 2022.
Soufiane Hayou, Nikhil Ghosh, and Bin Yu. LoRA+: Efficient low rank adaptation of large models. ICML, 2024.
Rui Pan, Xiang Liu, Shizhe Diao, Renjie Pi, Jipeng Zhang, Chi Han, and Tong Zhang. LISA: Layerwise importance sampling for memory-efficient large language model fine-tuning. https://arxiv.org/pdf/2403.17919, 2024.
Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, and Min-Hung Chen. DoRA: Weight-decomposed low-rank adaptation. ICML, 2024.
Vladislav Lialin, Sherin Muckatira, Namrata Shivagunde, and Anna Rumshisky. ReLoRA: High-rank training through low-rank updates. ICLR, 2024.
Jiawei Zhao, Zhenyu Zhang, Beidi Chen, Zhangyang Wang, Anima Anandkumar, and Yuandong Tian. GaLore: Memory-efficient LLM training by gradient low-rank projection. ICML, 2024.
Zi Yang, Samridhi Choudhary, Xinfeng Xie, Cao Gao, Siegfried Kunzmann, and Zheng Zhang. CoMERA: Computing- and memory-efficient training via rank-adaptive tensor optimization. https://arxiv.org/pdf/2405.14377, 2024.
Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, and Peter Henderson. Assessing the brittleness of safety alignment via pruning and low-rank modifications. ICML, 2024.
Srinadh Bhojanapalli, Chulhee Yun, Ankit Singh Rawat, Sashank Reddi, and Sanjiv Kumar. Low-rank bottleneck in multi-head attention models. ICML, PMLR 119:864–873, 2020.
Jialin Mao, Itay Griniasty, Han Kheng Teoh, Rahul Ramesh, Rubing Yang, Mark K. Transtrum, James P. Sethna, and Pratik Chaudhari. The training process of many deep networks explores the same low-dimensional manifold. PNAS, 121(12):e2310002121, 2024.
Vardan Papyan, X.Y. Han, and David L. Donoho. Prevalence of neural collapse during the terminal phase of deep learning training. PNAS, 117(40):24652–24663, 2020.
Yihe Dong, Jean-Baptiste Cordonnier, and Andreas Loukas. Attention is not all you need: Pure attention loses rank doubly exponentially with depth. ICML, PMLR 139:2793–2803, 2021.
Pratyusha Sharma, Jordan T. Ash, and Dipendra Misra. The truth is in there: Improving reasoning in language models with layer-selective rank reduction. ICLR, 2024.
Dan Biderman, Jose Gonzalez Ortiz, Jacob Portes, Mansheej Paul, Philip Greengard, Connor Jennings, Daniel King, Sam Havens, Vitaliy Chiley, Jonathan Frankle, et al. LoRA learns less and forgets less. https://arxiv.org/pdf/2405.09673, 2024.
Pre-requisites
Basic understanding of machine learning principles, including neural networks and language models. Familiarity with attention mechanisms and optimization techniques. Foundational knowledge in linear algebra and matrix decompositions is helpful but not mandatory.
Short bio
Professor Zhangyang “Atlas” Wang is a tenured Associate Professor at The University of Texas at Austin, holding the Temple Foundation Endowed Faculty Fellowship. He is currently on leave to serve as Research Director for XTX Markets, leading AI innovations in algorithmic trading. His research spans machine learning, optimization, generative AI, and neurosymbolic AI, with a focus on low-dimensional representations for efficient and reliable learning. Prof. Wang has received numerous awards, including the NSF CAREER Award and IEEE AI’s 10 To Watch, and has mentored students who have won many prestigious fellowships. He is an ACM Distinguished Speaker and IEEE Senior Member. See his full bio at: https://vita-group.github.io/research.html.