Hermann Ney
[intermediate/advanced] Machine Learning and Deep Learning for Speech & Language Technology: A Probabilistic Perspective
Summary
Today, data-driven methods such as machine learning and artificial neural networks (ANNs) are widely used for speech and language processing, e.g. for automatic speech recognition (ASR) and machine translation. We will revisit the evolution of these methods over the last 50 years and present a unifying view of their principles from a probabilistic perspective.
Specifically, we will address the following aspects of probabilistic modelling:
– What is the probabilistic interpretation of ANN outputs?
– What is the relation between the task performance (e.g. word error rate in ASR) and the decision rule for generating the output sequence (e.g. Bayes decision rule)?
– What are the relations between training criteria (like cross-entropy) and task performance?
– How do we model the dependencies between input and output sequences in sequence-to-sequence processing?
– What are synchronization mechanisms between input and output sequences (e.g. hidden Markov models, finite-state transducers, cross-attention)?
– What role does the language model play in the context of end-to-end models?
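To make the first questions above concrete, the two central equations can be sketched as follows (notation is illustrative): the Bayes decision rule selects the output sequence with maximum posterior probability given the input sequence, and the cross-entropy criterion trains the model to assign high probability to the correct class labels.

```latex
% Bayes decision rule: map the input sequence x_1^T to the
% output sequence w_1^N with maximum posterior probability.
\hat{w}_1^{\hat{N}} \;=\; \operatorname*{arg\,max}_{N,\;w_1^N} \; p(w_1^N \mid x_1^T)

% Cross-entropy training criterion over training pairs (x_n, c_n):
% minimizing F(\theta) drives the ANN outputs p_\theta(c \mid x)
% towards estimates of the class posterior probabilities.
F(\theta) \;=\; -\sum_{n=1}^{N} \log p_\theta(c_n \mid x_n)
```

The link between the two is the standard result that, at the minimum of the cross-entropy criterion, the (softmax) network outputs approximate the true posteriors needed by the Bayes decision rule.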
Syllabus
- Part 1: Probabilistic foundations, Bayes decision theory, probabilistic interpretation of neural networks, training criteria.
- Part 2: Sequence processing and specific ANN structures (hidden Markov models, finite-state transducers, cross-attention).
- Part 3: Deep Learning and HLT tasks (speech recognition, language modelling, machine translation).
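As a minimal sketch of one synchronization mechanism named in Part 2, scaled dot-product cross-attention lets each output position compute a normalized weighting over all input positions; the shapes and variable names below are illustrative, not taken from the course material.

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention (sketch).

    Each of the N_out query vectors attends over the N_in input
    positions; the result is a convex combination of the values.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)           # (N_out, N_in)
    # Softmax over the input positions (numerically stabilized).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values                          # (N_out, d_v)

# Toy example: 2 output positions attending over 3 input positions.
rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = cross_attention(Q, K, V)
print(out.shape)  # (2, 4)
```

Unlike an HMM or finite-state transducer, which enforce a monotonic alignment between input and output, the attention weights here impose no ordering constraint.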
References
H. Bourlard, N. Morgan: Connectionist Speech Recognition – A Hybrid Approach, Kluwer Academic Publishers, 1994.
D. Yu, L. Deng: Automatic Speech Recognition: A Deep Learning Approach. Springer, 2014.
D. Jurafsky, J. H. Martin: Speech and Language Processing. 3rd edition draft, 2017, https://web.stanford.edu/~jurafsky/slp3/
Y. Goldberg: Neural Network Methods for Natural Language Processing. Morgan & Claypool Publishers, 2017.
K.P. Murphy: Probabilistic Machine Learning: An Introduction. MIT Press, 2022.
K.P. Murphy: Probabilistic Machine Learning: Advanced Topics. MIT Press, 2023.
Pre-requisites
Linear algebra, numerical mathematics, probability and statistics, elementary machine learning.
Short bio
Hermann Ney is director of science at AppTek, McLean, VA and senior professor of computer science at RWTH Aachen University, Germany. His main research interests lie in the area of machine learning, neural networks and applications to speech recognition, machine translation and other tasks in natural language processing.
He and his team have contributed to a large number of large-scale joint projects in Europe (e.g. TC-STAR, QUAERO, TRANSLECTURES, EU-BRIDGE) and the USA (e.g. GALE, BOLT, BABEL). His work has resulted in more than 700 conference and journal papers, with an h-index of 113 and 64,000 citations (based on Google Scholar). More than 50 of his former PhD students work for IT companies such as Amazon, Apple, Cerence, eBay, Google and Nuance.
The results of his research have contributed to various operational research prototypes and commercial systems. In 1993, Philips Dictation Systems Vienna introduced a large-vocabulary continuous-speech recognition product for medical applications. In 1997, Philips Dialogue Systems Aachen introduced a spoken dialogue system for train timetable information via the telephone. In the German project VERBMOBIL, his team introduced the phrase-based approach to data-driven machine translation, which in 2008 was used by his former PhD students at Google as the starting point for the service Google Translate. In the EU project TC-STAR, the first research prototype system for spoken language translation of real-life domains was built.
Awards: 2005 Technical Achievement Award of the IEEE Signal Processing Society; 2013 Award of Honour of the International Association for Machine Translation; 2019 IEEE James L. Flanagan Speech and Audio Processing Award; 2021 ISCA Medal for Scientific Achievements (ISCA = Int. Speech Communication Ass.).