Tin Kam Ho
[introductory/intermediate] Deep Learning Applications in Natural Language Understanding
Summary
Deep learning has brought about many new and highly effective techniques for natural language processing and understanding. In this tutorial, we start with a brief review of conventional methods for processing natural language, continue with an introduction to deep learning approaches, and discuss how the two differ. We then describe several common tasks in natural language understanding that are formulated to leverage the deep learning framework (information extraction, summarization, search, sentiment analysis, text classification and clustering, question answering, translation, text generation, …). We finish with a high-level introduction to some advanced topics, including new opportunities with multiple languages, non-natural languages, and foundation models, as well as open challenges.
Syllabus
- Introduction to natural language processing (NLP), common application tasks, and solutions with conventional methods (0.5 hour).
- Basic methods for learning representations of text: dense vector embeddings of text, their construction, and their applications (1 hour); see the first code sketch after this list.
- Advanced methods of deep learning for text analysis: transfer learning, transformers, pre-trained models, fine-tuning, and their uses (1 hour); see the second code sketch after this list.
- Combined solutions and practical concerns in real-world applications, case studies (1 hour).
- NLP methods for multiple languages and non-natural languages (0.25 hour).
- New directions and open challenges (0.25 hour).
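
To make the embedding topic in the syllabus concrete, here is a minimal sketch of turning sentences into dense vectors with a pre-trained transformer. It assumes the Hugging Face transformers library and PyTorch are installed; the bert-base-uncased checkpoint and the mean-pooling step are illustrative choices, not prescriptions from the tutorial.

    # Minimal sketch: dense sentence embeddings from a pre-trained transformer.
    # Assumes the Hugging Face "transformers" library and PyTorch; the
    # "bert-base-uncased" checkpoint and mean pooling are illustrative choices.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    sentences = ["Deep learning transformed NLP.",
                 "Neural networks changed language processing."]

    # Tokenize both sentences into one padded batch of tensors.
    batch = tokenizer(sentences, padding=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**batch)

    # Mean-pool token embeddings (ignoring padding) to get one vector per sentence.
    mask = batch["attention_mask"].unsqueeze(-1).float()
    embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

    # Semantically similar sentences end up close together in the vector space.
    sim = torch.nn.functional.cosine_similarity(embeddings[0:1], embeddings[1:2])
    print(f"cosine similarity: {sim.item():.3f}")

Mean pooling is only one of several ways to derive a sentence vector; using the [CLS] token or a purpose-trained sentence encoder are common alternatives.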
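
Similarly, here is a minimal sketch of the fine-tuning workflow covered in the syllabus: adapting a pre-trained model to a downstream classification task. It likewise assumes the Hugging Face transformers and datasets libraries; the distilbert-base-uncased checkpoint, the IMDB dataset, and the tiny training budget are hypothetical choices made to keep the example small.

    # Minimal sketch: fine-tuning a pre-trained transformer for text classification.
    # Assumes the Hugging Face "transformers" and "datasets" libraries; the
    # checkpoint, dataset, and training budget are illustrative, not prescriptive.
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    dataset = load_dataset("imdb")  # movie reviews with binary sentiment labels
    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

    def tokenize(batch):
        # Truncate/pad reviews to a fixed length so they can be batched.
        return tokenizer(batch["text"], truncation=True,
                         padding="max_length", max_length=256)

    encoded = dataset.map(tokenize, batched=True)

    # Start from pre-trained weights; only the new classification head is random.
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)

    args = TrainingArguments(output_dir="finetune-out",
                             num_train_epochs=1,
                             per_device_train_batch_size=8)

    # A small training subset keeps this sketch quick; real runs use the full split.
    trainer = Trainer(model=model, args=args,
                      train_dataset=encoded["train"].shuffle(seed=0).select(range(2000)))
    trainer.train()

Because the encoder already carries general linguistic knowledge from pre-training, a brief fine-tuning run like this typically outperforms training the same architecture from scratch.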
References
Conventional NLP methods:
Steven Bird, Ewan Klein, and Edward Loper.
Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit
https://www.nltk.org/book/
Deep learning methods for NLP:
Stanford University, CS224n: Natural Language Processing with Deep Learning
Lecture videos and other course material are publicly available:
http://web.stanford.edu/class/cs224n/
Hands-on code examples:
https://github.com/graykode/nlp-tutorial
NLP progress:
A repository that tracks progress in natural language processing, including datasets and the current state of the art for the most common NLP tasks.
https://nlpprogress.com/
The NLP Index:
A website tracking many tasks, datasets, papers, and GitHub repos.
https://index.quantumstat.com/
The ACL Anthology:
A large archive of papers on computational linguistics and natural language processing.
https://aclanthology.org/
Pre-requisites
Basic knowledge of machine learning and neural networks, and experience programming in Python.
Short bio
Tin Kam Ho is a senior AI scientist with extensive experience in basic and applied research in pattern recognition and machine learning. She joined IBM Watson in 2014, where she has led projects in semantic modeling of natural languages, knowledge discovery, text summarization, question answering, and conversational systems (chatbots). From 1992 to 2014, she was with Bell Labs at Murray Hill, first as a research scientist and later as Head of the Statistics and Learning Research Department. In core machine learning methodology, she pioneered research on multiple classifier systems and ensemble learning, random decision forests, and data complexity analysis. She received a PhD in Computer Science from the State University of New York at Buffalo in 1992.