Xiaowei Xu
[intermediate/advanced] From Transformer to ChatGPT and Beyond: How Do Large Language Models Revolutionize AI?
Summary
In recent years, the development of large language models built on the transformer architecture has led to a paradigm shift in the field of artificial intelligence. These models have transformed natural language processing tasks, including machine translation, question answering, and language generation. One such model, ChatGPT, is trained on vast amounts of text data to generate human-like responses to textual prompts.
This lecture tutorial will provide a comprehensive overview of the journey from the transformer architecture to ChatGPT and beyond. We will explore the training methodologies and architectures used to build these models and how they have paved the way for future research in AI. Two major paradigms for adapting large language models to downstream tasks, fine-tuning and in-context learning, will also be covered. Additionally, we will demonstrate how to use pre-trained large language models for causal inference and other cutting-edge machine learning tasks.
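To make the contrast between the two paradigms concrete: fine-tuning updates the model's weights on task-specific data, whereas in-context learning leaves the weights untouched and instead places task demonstrations directly in the prompt. The sketch below assembles a hypothetical few-shot sentiment prompt; the task, labels, and prompt format are invented for illustration, not taken from any particular model's documentation.

```python
def build_few_shot_prompt(examples, query):
    """Assemble a few-shot prompt: demonstrations followed by the query.

    In-context learning supplies labeled demonstrations in the prompt
    itself; unlike fine-tuning, the model's weights are never updated.
    """
    blocks = []
    for text, label in examples:
        blocks.append(f"Review: {text}\nSentiment: {label}")
    # The final block ends at "Sentiment:" so the model completes the label.
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)


demos = [
    ("A moving, beautifully acted film.", "positive"),
    ("Two hours of my life I will never get back.", "negative"),
]
prompt = build_few_shot_prompt(demos, "An instant classic.")
print(prompt)
```

Sending such a prompt to any pre-trained language model turns the frozen model into a sentiment classifier for the duration of one request, which is why in-context learning is attractive when labeled data or compute for fine-tuning is scarce.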
Furthermore, we will discuss the ethical implications of these models and their impact on society. By the end of this tutorial, participants will gain a comprehensive understanding of the evolution of large language models and their significance in shaping the future of AI. Whether you are a researcher or a practitioner in the field, this tutorial will provide valuable insights into the latest developments in natural language processing and the potential applications of large language models.
Syllabus
- Introduction
- Generative models
- Language model
- Transformer model
- Scaling law of large language models
- BERT: Bidirectional Encoder Representations from Transformers
- Pre-training and fine-tuning
- GPT: Generative Pre-trained Transformer
- In-context learning
- ChatGPT
- Language model powered causal inference: the art of discovery of cause-and-effect relationships from text
- Conclusion and future directions
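The scaling-law item in the syllabus refers to the empirical power laws reported by Kaplan et al. (2020, in the references below): test loss falls predictably as a power law in model size and dataset size. In the paper's notation, roughly

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D},
```

where $N$ is the number of non-embedding parameters, $D$ the number of training tokens, $N_c$ and $D_c$ are fitted constants, and the fitted exponents are approximately $\alpha_N \approx 0.076$ and $\alpha_D \approx 0.095$. The practical consequence motivating the lecture: loss improvements are predictable across several orders of magnitude of scale, which justified training ever-larger models such as GPT-3.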
References
Vaswani, Ashish, et al. “Attention is all you need.” Advances in neural information processing systems 30 (2017).
Radford, Alec, et al. “Improving language understanding by generative pre-training.” (2018).
Radford, Alec, et al. “Language models are unsupervised multitask learners.” OpenAI blog 1.8 (2019): 9.
Kaplan, Jared, et al. “Scaling laws for neural language models.” arXiv preprint arXiv:2001.08361 (2020).
Brown, Tom, et al. “Language models are few-shot learners.” Advances in neural information processing systems 33 (2020): 1877-1901.
Wei, Jason, et al. "Chain of Thought Prompting Elicits Reasoning in Large Language Models." arXiv preprint arXiv:2201.11903 (2022).
Wang, Yizhong, et al. "Self-Instruct: Aligning Language Models with Self-Generated Instructions." arXiv preprint arXiv:2212.10560 (2022).
Taori, Rohan, et al. "Alpaca: A Strong, Replicable Instruction-Following Model." Stanford CRFM Blog (2023).
Cheng, Daixuan, et al. “UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation.” arXiv preprint arXiv:2303.08518 (2023).
Shinn, Noah, Beck Labash, and Ashwin Gopinath. “Reflexion: an autonomous agent with dynamic memory and self-reflection.” arXiv preprint arXiv:2303.11366 (2023).
Wang, Xingqiao, et al. “InferBERT: a transformer-based causal inference framework for enhancing pharmacovigilance.” Frontiers in Artificial Intelligence 4 (2021): 659622.
Lambert, Nathan, et al. “Illustrating Reinforcement Learning from Human Feedback (RLHF)”, Hugging Face Blog, 2022.
Ouyang, Long, et al. “Training language models to follow instructions with human feedback.” arXiv preprint arXiv:2203.02155 (2022).
Wang, Xingqiao, et al. “DeepCausality: A general AI-powered causal inference framework for free text: A case study of LiverTox.” Frontiers in Artificial Intelligence 5 (2022).
Qin, Chengwei, et al. “Is ChatGPT a General-Purpose Natural Language Processing Task Solver?.” arXiv preprint arXiv:2302.06476 (2023).
OpenAI. "How should AI systems behave, and who should decide?" OpenAI Blog (2023).
Pre-requisites
Mathematics and machine learning at the level of an undergraduate degree in computer science: basic multivariate calculus, probability theory, linear algebra, probabilistic graphical models, and neural networks.
Short bio
Xiaowei Xu, a professor of Information Science at the University of Arkansas at Little Rock (UALR), received his Ph.D. in Computer Science from the University of Munich in 1998. Before his appointment at UALR, he was a senior research scientist at Siemens in Munich, Germany. His research spans data mining, machine learning, and artificial intelligence. Dr. Xu is a recipient of the 2014 ACM SIGKDD Test of Time Award for his contribution to the density-based clustering algorithm DBSCAN, one of the most widely used clustering algorithms.