Mohit Iyyer
[intermediate/advanced] Natural Language Generation
Summary
Natural language generation has seen increased research and industry interest since the advent of large-scale pretrained neural language models (NLMs) such as GPT-3. In addition to improving the state of the art for tasks such as machine translation and text summarization, these models have opened up research opportunities for open-ended text generation tasks such as story generation and long-form question answering. Furthermore, they have spurred a new line of research on “prompt-based learning” that aims to unify many disparate NLP tasks (e.g., text classification, generation, and question answering) into a text-to-text format that can be solved by a single backbone model. In this course, we will begin with a short overview of NLM architectures, training datasets, learning objectives, and scaling. Then, we will dive into NLM applications to text generation tasks, followed by an exploration of prompt-based learning.
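As a concrete illustration of the prompt-based, text-to-text framing described above, here is a minimal sketch of few-shot, in-context sentiment classification with a pretrained causal language model. It assumes the Hugging Face transformers library and uses a small GPT-2 checkpoint purely as a stand-in for a much larger model such as GPT-3; the review texts and label words are invented for illustration.

```python
# Few-shot, in-context classification cast as next-token prediction.
# Assumes: pip install transformers torch; "gpt2" is only a stand-in backbone.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# The task is expressed entirely as text: a few labeled examples
# followed by an unlabeled one, and the model completes the label.
prompt = (
    "Review: The plot was predictable and the acting was wooden.\n"
    "Sentiment: negative\n\n"
    "Review: A moving, beautifully shot film.\n"
    "Sentiment: positive\n\n"
    "Review: I kept checking my watch the whole time.\n"
    "Sentiment:"
)

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_new_tokens=1,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated token, i.e., the predicted label word.
label = tokenizer.decode(output_ids[0, inputs["input_ids"].shape[1]:])
print(label.strip())
```

A small model will often get such completions wrong; the point of the GPT-3 reading below is that this recipe, which involves no gradient updates at all, becomes surprisingly effective as models are scaled up.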
Syllabus
- Introduction to neural language models: training objectives, architectures, datasets, evaluation, scaling (the standard training objective is written out after this list)
- Applications to text generation tasks, looking at both output quality and training/inference efficiency: machine translation, long-form question answering
- Approaches to prompt-based learning, and their successes and failures across tasks: discrete prompts, learned continuous prompts, prefix/prompt tuning (see the sketch below)
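For reference, the training objective in the first bullet is the standard autoregressive next-token objective: for an NLM with parameters θ and a training sequence x_1, …, x_T, the model minimizes the token-level cross-entropy

```latex
\mathcal{L}(\theta) \;=\; -\sum_{t=1}^{T} \log p_\theta\bigl(x_t \mid x_{<t}\bigr)
```

i.e., it maximizes the log-likelihood of each token given its left context.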
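To give a flavor of the learned-continuous-prompt methods in the last bullet, the sketch below follows the spirit of Lester et al. (2021): a small number of trainable “virtual token” embeddings are prepended to the input embeddings while the pretrained model itself stays frozen. It assumes PyTorch and the Hugging Face transformers library, again uses GPT-2 as a stand-in backbone (Lester et al. use T5), and the prompt length, learning rate, and training text are illustrative placeholders.

```python
# Minimal soft prompt tuning: train only a handful of continuous
# "virtual token" embeddings; the pretrained LM stays frozen.
# Assumes: pip install transformers torch; hyperparameters are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
for param in model.parameters():
    param.requires_grad_(False)  # freeze every backbone parameter

embed = model.get_input_embeddings()        # the token-embedding matrix
n_virtual, dim = 20, embed.embedding_dim    # 20 learned virtual tokens
soft_prompt = torch.nn.Parameter(0.02 * torch.randn(n_virtual, dim))
optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)

def training_step(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids        # (1, T)
    token_embeds = embed(ids)                                   # (1, T, dim)
    # Prepend the soft prompt in embedding space.
    inputs_embeds = torch.cat([soft_prompt.unsqueeze(0), token_embeds], dim=1)
    # Ignore the loss on the virtual-token positions (-100 is ignored).
    labels = torch.cat(
        [torch.full((1, n_virtual), -100, dtype=torch.long), ids], dim=1
    )
    loss = model(inputs_embeds=inputs_embeds, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

print(training_step("Review: A moving, beautifully shot film. Sentiment: positive"))
```

Only the small block of prompt parameters receives gradient updates, which is what makes the approach parameter-efficient; the Lester et al. reading below examines how its quality compares with full fine-tuning as the backbone model grows.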
References
Jurafsky & Martin, Ch. 3.1-3.5 (language modeling)
Jurafsky & Martin, Ch. 7 (neural language models)
Vaswani et al., “Attention Is All You Need”, NeurIPS 2017 (the paper that introduced Transformers)
Peters et al., “Deep contextualized word representations”, NAACL 2018 (“ELMo”)
Brown et al., “Language Models are Few-Shot Learners”, NeurIPS 2020 (“GPT-3”)
Xue et al., “ByT5: Towards a token-free future with pre-trained byte-to-byte models”, 2021
Celikyilmaz et al., “Evaluation of Text Generation: A Survey”, 2020
Krishna et al., “Hurdles to Progress in Long-form Question Answering”, NAACL 2021
Lester et al., “The Power of Scale for Parameter-Efficient Prompt Tuning”, EMNLP 2021
Pre-requisites
Basic knowledge of machine learning, linear algebra, and probability.
Short bio
Mohit Iyyer is an assistant professor in computer science at the University of Massachusetts Amherst. His research focuses broadly on designing machine learning models for discourse-level language generation (e.g., for story generation and machine translation), and his group also works on tasks involving creative language understanding (e.g., modeling fictional narratives and characters). He is the recipient of best paper awards at NAACL (2016, 2018) and a best demo award at NeurIPS 2015. He received his PhD in computer science from the University of Maryland, College Park in 2017, advised by Jordan Boyd-Graber and Hal Daumé III, and spent the following year as a researcher at the Allen Institute for Artificial Intelligence.