Tin Kam Ho
[introductory/intermediate] Deep Learning Applications in Natural Language Understanding
Summary
Deep learning has brought about many new and highly effective techniques for natural language processing and understanding. In this tutorial, we start with a brief review of conventional methods for processing natural language, continue with an introduction to deep learning approaches, and discuss how the two differ. We then describe several common tasks in natural language understanding that are formulated to leverage the deep learning framework (information extraction, summarization, search, sentiment analysis, text classification and clustering, question answering, translation, text generation, …). We finish with a high-level introduction to some advanced topics, including new opportunities with multiple languages, non-natural languages, and foundation models, as well as open challenges.
Syllabus
- Introduction to natural language processing (NLP), common application tasks, and solutions with conventional methods (0.5 hour).
- Basic methods for learning representations of text: dense vector embeddings of text, their construction, and their applications (1 hour); see the first code sketch after this list.
- Advanced methods of deep learning for text analysis: transfer learning, transformers, pre-trained models, fine-tuning, and their uses (1 hour); see the second code sketch after this list.
- Combined solutions and practical concerns in real-world applications, case studies (1 hour).
- NLP methods for multiple languages and non-natural languages (0.25 hour).
- New directions and open challenges (0.25 hour).
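
To make the embedding topic in the syllabus concrete, here is a minimal sketch of turning sentences into dense vectors with a pre-trained transformer. It assumes the Hugging Face transformers library and PyTorch are installed; the bert-base-uncased checkpoint and the mean-pooling step are illustrative choices, not prescriptions from the tutorial.

    # Minimal sketch: dense sentence embeddings from a pre-trained transformer.
    # Assumes the Hugging Face "transformers" library and PyTorch; the
    # "bert-base-uncased" checkpoint and mean pooling are illustrative choices.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    sentences = ["Deep learning transformed NLP.",
                 "Neural networks changed language processing."]

    # Tokenize both sentences into one padded batch of tensors.
    batch = tokenizer(sentences, padding=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**batch)

    # Mean-pool token embeddings (ignoring padding) to get one vector per sentence.
    mask = batch["attention_mask"].unsqueeze(-1).float()
    embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

    # Semantically similar sentences end up close together in the vector space.
    sim = torch.nn.functional.cosine_similarity(embeddings[0:1], embeddings[1:2])
    print(f"cosine similarity: {sim.item():.3f}")

Mean pooling is only one of several ways to derive a sentence vector; using the [CLS] token or a purpose-trained sentence encoder are common alternatives.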
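
Similarly, here is a minimal sketch of the fine-tuning workflow covered in the syllabus: adapting a pre-trained model to a downstream classification task. It likewise assumes the Hugging Face transformers and datasets libraries; the distilbert-base-uncased checkpoint, the IMDB dataset, and the tiny training budget are hypothetical choices made to keep the example small.

    # Minimal sketch: fine-tuning a pre-trained transformer for text classification.
    # Assumes the Hugging Face "transformers" and "datasets" libraries; the
    # checkpoint, dataset, and training budget are illustrative, not prescriptive.
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    dataset = load_dataset("imdb")  # movie reviews with binary sentiment labels
    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

    def tokenize(batch):
        # Truncate/pad reviews to a fixed length so they can be batched.
        return tokenizer(batch["text"], truncation=True,
                         padding="max_length", max_length=256)

    encoded = dataset.map(tokenize, batched=True)

    # Start from pre-trained weights; only the new classification head is random.
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)

    args = TrainingArguments(output_dir="finetune-out",
                             num_train_epochs=1,
                             per_device_train_batch_size=8)

    # A small training subset keeps this sketch quick; real runs use the full split.
    trainer = Trainer(model=model, args=args,
                      train_dataset=encoded["train"].shuffle(seed=0).select(range(2000)))
    trainer.train()

Because the encoder already carries general linguistic knowledge from pre-training, a brief fine-tuning run like this typically outperforms training the same architecture from scratch.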
References
Conventional NLP methods:
Steven Bird, Ewan Klein, and Edward Loper.
Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit
https://www.nltk.org/book/
Deep learning methods for NLP:
Stanford University, CS224n: Natural Language Processing with Deep Learning
Lecture videos and other course material are publicly available:
http://web.stanford.edu/class/cs224n/
Hands-on code examples:
https://github.com/graykode/nlp-tutorial
NLP progress:
A repository that tracks progress in natural language processing, including datasets and the current state of the art for the most common NLP tasks.
https://nlpprogress.com/
The NLP Index:
A website tracking many tasks, datasets, papers, and GitHub repos.
https://index.quantumstat.com/
The ACL Anthology:
A large archive of papers on computational linguistics and natural language processing.
https://aclanthology.org/
Pre-requisites
Basic knowledge of machine learning and neural networks, and experience programming in Python.
Short bio
Tin Kam Ho is a senior AI scientist with extensive experience in basic and applied research in pattern recognition and machine learning. She joined IBM Watson in 2014, where she has led projects in semantic modeling of natural languages, knowledge discovery, text summarization, question answering, and conversational systems (chatbots). From 1992 to 2014, she was with Bell Labs at Murray Hill, first as a research scientist and later as Head of the Statistics and Learning Research Department. In core machine learning methodology, she pioneered research on multiple classifier systems and ensemble learning, random decision forests, and data complexity analysis. She received a PhD in Computer Science from the State University of New York at Buffalo in 1992.