Xiaowei Xu
[intermediate/advanced] From Transformer to ChatGPT and Beyond: How Do Large Language Models Revolutionize AI?
Summary
In recent years, the development of large language models built on the transformer architecture has led to a paradigm shift in the field of artificial intelligence. These models have transformed natural language processing tasks, including machine translation, question answering, and language generation. One such model, ChatGPT, is trained on vast amounts of text data to generate human-like responses to textual prompts.
This lecture tutorial will provide a comprehensive overview of the journey from the transformer architecture to ChatGPT and beyond. We will explore the training methodologies and architectures used to build these models and how they have paved the way for future research in AI. Two major paradigms for adapting large language models to downstream tasks, fine-tuning and in-context learning, will also be covered. Additionally, we will demonstrate how to use pre-trained large language models for causal inference and other cutting-edge machine learning tasks.
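To make the contrast between the two paradigms concrete: fine-tuning updates the model's weights on task-specific data, whereas in-context learning leaves the weights untouched and instead places task demonstrations directly in the prompt. The sketch below assembles a hypothetical few-shot sentiment prompt; the task, labels, and prompt format are invented for illustration, not taken from any particular model's documentation.

```python
def build_few_shot_prompt(examples, query):
    """Assemble a few-shot prompt: demonstrations followed by the query.

    In-context learning supplies labeled demonstrations in the prompt
    itself; unlike fine-tuning, the model's weights are never updated.
    """
    blocks = []
    for text, label in examples:
        blocks.append(f"Review: {text}\nSentiment: {label}")
    # The final block ends at "Sentiment:" so the model completes the label.
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)


demos = [
    ("A moving, beautifully acted film.", "positive"),
    ("Two hours of my life I will never get back.", "negative"),
]
prompt = build_few_shot_prompt(demos, "An instant classic.")
print(prompt)
```

Sending such a prompt to any pre-trained language model turns the frozen model into a sentiment classifier for the duration of one request, which is why in-context learning is attractive when labeled data or compute for fine-tuning is scarce.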
Furthermore, we will discuss the ethical implications of these models and their impact on society. By the end of this tutorial, participants will gain a comprehensive understanding of the evolution of large language models and their significance in shaping the future of AI. Whether you are a researcher or a practitioner in the field, this tutorial will provide valuable insights into the latest developments in natural language processing and the potential applications of large language models.
Syllabus
- Introduction
- Generative models
- Language model
- Transformer model
- Scaling law of large language models
- BERT: Bidirectional Encoder Representations from Transformers
- Pre-training and fine-tuning
- GPT: Generative Pre-trained Transformer
- In-context learning
- ChatGPT
- Language model powered causal inference: the art of discovery of cause-and-effect relationships from text
- Conclusion and future directions
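The scaling-law item in the syllabus refers to the empirical power laws reported by Kaplan et al. (2020, in the references below): test loss falls predictably as a power law in model size and dataset size. In the paper's notation, roughly

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D},
```

where $N$ is the number of non-embedding parameters, $D$ the number of training tokens, $N_c$ and $D_c$ are fitted constants, and the fitted exponents are approximately $\alpha_N \approx 0.076$ and $\alpha_D \approx 0.095$. The practical consequence motivating the lecture: loss improvements are predictable across several orders of magnitude of scale, which justified training ever-larger models such as GPT-3.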
References
Vaswani, Ashish, et al. “Attention is all you need.” Advances in neural information processing systems 30 (2017).
Radford, Alec, et al. “Improving language understanding by generative pre-training.” (2018).
Radford, Alec, et al. “Language models are unsupervised multitask learners.” OpenAI blog 1.8 (2019): 9.
Kaplan, Jared, et al. “Scaling laws for neural language models.” arXiv preprint arXiv:2001.08361 (2020).
Brown, Tom, et al. “Language models are few-shot learners.” Advances in neural information processing systems 33 (2020): 1877-1901.
Wei, Jason, et al. "Chain of Thought Prompting Elicits Reasoning in Large Language Models." arXiv preprint arXiv:2201.11903 (2022).
Wang, Yizhong, et al. "Self-Instruct: Aligning Language Models with Self-Generated Instructions." arXiv preprint arXiv:2212.10560 (2022).
Taori, Rohan, et al. "Alpaca: A Strong, Replicable Instruction-Following Model." Stanford CRFM Blog (2023).
Cheng, Daixuan, et al. “UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation.” arXiv preprint arXiv:2303.08518 (2023).
Shinn, Noah, Beck Labash, and Ashwin Gopinath. “Reflexion: an autonomous agent with dynamic memory and self-reflection.” arXiv preprint arXiv:2303.11366 (2023).
Wang, Xingqiao, et al. “InferBERT: a transformer-based causal inference framework for enhancing pharmacovigilance.” Frontiers in Artificial Intelligence 4 (2021): 659622.
Lambert, Nathan, et al. “Illustrating Reinforcement Learning from Human Feedback (RLHF)”, Hugging Face Blog, 2022.
Ouyang, Long, et al. “Training language models to follow instructions with human feedback.” arXiv preprint arXiv:2203.02155 (2022).
Wang, Xingqiao, et al. “DeepCausality: A general AI-powered causal inference framework for free text: A case study of LiverTox.” Frontiers in Artificial Intelligence 5 (2022).
Qin, Chengwei, et al. “Is ChatGPT a General-Purpose Natural Language Processing Task Solver?.” arXiv preprint arXiv:2302.06476 (2023).
OpenAI. "How should AI systems behave, and who should decide?" OpenAI Blog (2023).
Pre-requisites
Mathematics and machine learning at the level of an undergraduate degree in computer science: basic multivariate calculus, probability theory, linear algebra, probabilistic graphical models, and neural networks.
Short bio
Xiaowei Xu, a professor of Information Science at the University of Arkansas at Little Rock (UALR), received his Ph.D. in Computer Science from the University of Munich in 1998. Before his appointment at UALR, he was a senior research scientist at Siemens in Munich, Germany. His research spans data mining, machine learning, and artificial intelligence. Dr. Xu is a recipient of the 2014 ACM SIGKDD Test of Time Award for his contribution to the density-based clustering algorithm DBSCAN, one of the most widely used clustering algorithms.