Othmane Rifki
[introductory/advanced] Speech and Language Processing in Modern Applications
Summary
Language is the fundamental building block of human communication, and we live in an age where communication is dominated by digital representations of language. In this course, I will explore the fundamentals of Natural Language Processing (NLP), which aims not only to understand single words or utterances, but to interpret them in context and even generate them. This technology underpins the language functions we use every day: web search, social networks, translation, speech-to-text, and many others. We will learn how to analyze massive quantities of unstructured text and speech data, how to build models that uncover contextual patterns, and how to produce insights from text and audio. We will focus on state-of-the-art Transformer-based models such as BERT for text and wav2vec for speech. We will also explore how modern data-intensive applications serve and support these models at scale while maintaining reliability and reducing latency. By the end of this course, you will be ready to design, build, and serve an NLP application and interact with it via the web.
Syllabus
The lectures will discuss various aspects of building an NLP application. We will cover how to build text embeddings, how the Transformer architecture and self-attention work, how to leverage self-supervision and pre-trained models to achieve superior NLP results, and finally how to manage inference challenges and deploy your models for live applications.
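As a small taste of the self-attention mechanism mentioned above, here is a minimal sketch of scaled dot-product attention in pure Python. The toy vectors are illustrative only, not drawn from any real model, and a real implementation would use a tensor library rather than hand-rolled loops:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    # Q, K, V are lists of equal-length vectors (lists of floats).
    d = len(Q[0])
    out = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Output is a weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Three toy "token" vectors of dimension 2 (hypothetical numbers).
Q = K = V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in attention(Q, K, V):
    print(row)
```

Each output row is a convex combination of the value vectors, weighted by how strongly the corresponding query attends to each key; stacking several such heads and layers gives the Transformer architecture covered in the lectures.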
References
Speech and Language Processing, by Dan Jurafsky and James H. Martin: https://web.stanford.edu/~jurafsky/slp3/
Transformers for Natural Language Processing, by Denis Rothman.
Natural Language Processing with Transformers, by Lewis Tunstall, Leandro von Werra, & Thomas Wolf.
Designing Data-Intensive Applications, by Martin Kleppmann.
Pre-requisites
Experience with Python coding and the use of libraries, functions, and parameters. Basic understanding of a deep learning framework such as TensorFlow, PyTorch, or Keras. Basic knowledge of machine learning.
Short bio
Othmane Rifki received his PhD in 2017 from the Department of Physics at the University of Oklahoma. His research focused on searching for rare signals of new particles and new forces at the Large Hadron Collider at CERN in Geneva, Switzerland (cern.ch). During this work, he relied on modern machine learning techniques to find signals amid huge amounts of background noise. For the last two years, Othmane has been the principal applied scientist at Spectrum Labs (spectrumlabsai.com), a startup that aims to understand language on the internet using contextual AI. His main focus is building NLP and speech applications that are served at scale to detect disruptive and toxic behavior in real time.