Mohit Iyyer
[intermediate/advanced] Natural Language Generation
Summary
Natural language generation has seen increased research and industry interest since the advent of large-scale pretrained neural language models (NLMs) such as GPT-3. In addition to improving the state of the art for tasks such as machine translation and text summarization, these models have opened up research opportunities for open-ended text generation tasks such as story generation and long-form question answering. Furthermore, they have spurred a new line of research on “prompt-based learning” that aims to unify many disparate NLP tasks (e.g., text classification, generation, and question answering) into a text-to-text format that can be solved by a single backbone model. In this course, we will begin with a short overview of NLM architectures, training datasets, learning objectives, and scaling. Then, we will dive into NLM applications to text generation tasks, followed by an exploration of prompt-based learning.
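As a concrete illustration of the prompt-based, text-to-text framing described above, here is a minimal sketch of few-shot, in-context sentiment classification with a pretrained causal language model. It assumes the Hugging Face transformers library and uses a small GPT-2 checkpoint purely as a stand-in for a much larger model such as GPT-3; the review texts and label words are invented for illustration.

```python
# Few-shot, in-context classification cast as next-token prediction.
# Assumes: pip install transformers torch; "gpt2" is only a stand-in backbone.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# The task is expressed entirely as text: a few labeled examples
# followed by an unlabeled one, and the model completes the label.
prompt = (
    "Review: The plot was predictable and the acting was wooden.\n"
    "Sentiment: negative\n\n"
    "Review: A moving, beautifully shot film.\n"
    "Sentiment: positive\n\n"
    "Review: I kept checking my watch the whole time.\n"
    "Sentiment:"
)

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_new_tokens=1,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated token, i.e., the predicted label word.
label = tokenizer.decode(output_ids[0, inputs["input_ids"].shape[1]:])
print(label.strip())
```

A small model will often get such completions wrong; the point of the GPT-3 reading below is that this recipe, which involves no gradient updates at all, becomes surprisingly effective as models are scaled up.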
Syllabus
- Introduction to neural language models: training objectives, architectures, datasets, evaluation, scaling (the standard training objective is written out after this list)
- Applications to text generation tasks, looking at both output quality and training/inference efficiency: machine translation, long-form question answering
- Approaches to prompt-based learning, and their successes and failures across tasks: discrete prompts, learned continuous prompts, prefix/prompt tuning (see the sketch below)
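For reference, the training objective in the first bullet is the standard autoregressive next-token objective: for an NLM with parameters θ and a training sequence x_1, …, x_T, the model minimizes the token-level cross-entropy

```latex
\mathcal{L}(\theta) \;=\; -\sum_{t=1}^{T} \log p_\theta\bigl(x_t \mid x_{<t}\bigr)
```

i.e., it maximizes the log-likelihood of each token given its left context.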
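To give a flavor of the learned-continuous-prompt methods in the last bullet, the sketch below follows the spirit of Lester et al. (2021): a small number of trainable “virtual token” embeddings are prepended to the input embeddings while the pretrained model itself stays frozen. It assumes PyTorch and the Hugging Face transformers library, again uses GPT-2 as a stand-in backbone (Lester et al. use T5), and the prompt length, learning rate, and training text are illustrative placeholders.

```python
# Minimal soft prompt tuning: train only a handful of continuous
# "virtual token" embeddings; the pretrained LM stays frozen.
# Assumes: pip install transformers torch; hyperparameters are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
for param in model.parameters():
    param.requires_grad_(False)  # freeze every backbone parameter

embed = model.get_input_embeddings()        # the token-embedding matrix
n_virtual, dim = 20, embed.embedding_dim    # 20 learned virtual tokens
soft_prompt = torch.nn.Parameter(0.02 * torch.randn(n_virtual, dim))
optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)

def training_step(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids        # (1, T)
    token_embeds = embed(ids)                                   # (1, T, dim)
    # Prepend the soft prompt in embedding space.
    inputs_embeds = torch.cat([soft_prompt.unsqueeze(0), token_embeds], dim=1)
    # Ignore the loss on the virtual-token positions (-100 is ignored).
    labels = torch.cat(
        [torch.full((1, n_virtual), -100, dtype=torch.long), ids], dim=1
    )
    loss = model(inputs_embeds=inputs_embeds, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

print(training_step("Review: A moving, beautifully shot film. Sentiment: positive"))
```

Only the small block of prompt parameters receives gradient updates, which is what makes the approach parameter-efficient; the Lester et al. reading below examines how its quality compares with full fine-tuning as the backbone model grows.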
References
Jurafsky & Martin, Ch. 3.1-3.5 (language modeling)
Jurafsky & Martin, Ch. 7 (neural language models)
Vaswani et al., “Attention Is All You Need”, NeurIPS 2017 (the paper that introduced Transformers)
Peters et al., “Deep contextualized word representations”, NAACL 2018 (“ELMo”)
Brown et al., “Language Models are Few-Shot Learners”, NeurIPS 2020 (“GPT-3”)
Xue et al., “ByT5: Towards a token-free future with pre-trained byte-to-byte models”, 2021
Celikyilmaz et al., “Evaluation of Text Generation: A Survey”, 2020
Krishna et al., “Hurdles to Progress in Long-form Question Answering”, NAACL 2021
Lester et al., “The Power of Scale for Parameter-Efficient Prompt Tuning”, EMNLP 2021
Pre-requisites
Basic knowledge of machine learning, linear algebra, and probability.
Short bio
Mohit Iyyer is an assistant professor in computer science at the University of Massachusetts Amherst. His research focuses broadly on designing machine learning models for discourse-level language generation (e.g., for story generation and machine translation), and his group also works on tasks involving creative language understanding (e.g., modeling fictional narratives and characters). He is the recipient of best paper awards at NAACL (2016, 2018) and a best demo award at NeurIPS 2015. He received his PhD in computer science from the University of Maryland, College Park in 2017, advised by Jordan Boyd-Graber and Hal Daumé III, and spent the following year as a researcher at the Allen Institute for Artificial Intelligence.