Hermann Ney
[intermediate/advanced] Machine Learning and Deep Learning for Speech & Language Technology: A Probabilistic Perspective
Summary
Today, data-driven methods such as machine learning and artificial neural networks (ANNs) are widely used for speech and language processing, e.g. for automatic speech recognition (ASR) and machine translation. We will revisit the evolution of these methods over the last 50 years and present a unifying view of their principles from a probabilistic perspective.
Specifically, we will address the following aspects of probabilistic modelling:
– What is the probabilistic interpretation of ANN outputs?
– What is the relation between the task performance (e.g. word error rate in ASR) and the decision rule for generating the output sequence (e.g. Bayes decision rule)?
– What are the relations between training criteria (like cross-entropy) and task performance?
– How do we model the dependencies between input and output sequences in sequence-to-sequence processing?
– What are synchronization mechanisms between input and output sequences (e.g. hidden Markov models, finite-state transducers, cross-attention)?
– What role does the language model play in the context of end-to-end models?
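To make the first questions above concrete, the two central equations can be sketched as follows (notation is illustrative): the Bayes decision rule selects the output sequence with maximum posterior probability given the input sequence, and the cross-entropy criterion trains the model to assign high probability to the correct class labels.

```latex
% Bayes decision rule: map the input sequence x_1^T to the
% output sequence w_1^N with maximum posterior probability.
\hat{w}_1^{\hat{N}} \;=\; \operatorname*{arg\,max}_{N,\;w_1^N} \; p(w_1^N \mid x_1^T)

% Cross-entropy training criterion over training pairs (x_n, c_n):
% minimizing F(\theta) drives the ANN outputs p_\theta(c \mid x)
% towards estimates of the class posterior probabilities.
F(\theta) \;=\; -\sum_{n=1}^{N} \log p_\theta(c_n \mid x_n)
```

The link between the two is the standard result that, at the minimum of the cross-entropy criterion, the (softmax) network outputs approximate the true posteriors needed by the Bayes decision rule.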
Syllabus
- Part 1: Probabilistic foundations, Bayes decision theory, probabilistic interpretation of neural networks, training criteria.
- Part 2: Sequence processing and specific ANN structures (hidden Markov models, finite-state transducers, cross-attention).
- Part 3: Deep Learning and HLT tasks (speech recognition, language modelling, machine translation).
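As a minimal sketch of one synchronization mechanism named in Part 2, scaled dot-product cross-attention lets each output position compute a normalized weighting over all input positions; the shapes and variable names below are illustrative, not taken from the course material.

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention (sketch).

    Each of the N_out query vectors attends over the N_in input
    positions; the result is a convex combination of the values.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)           # (N_out, N_in)
    # Softmax over the input positions (numerically stabilized).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values                          # (N_out, d_v)

# Toy example: 2 output positions attending over 3 input positions.
rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = cross_attention(Q, K, V)
print(out.shape)  # (2, 4)
```

Unlike an HMM or finite-state transducer, which enforce a monotonic alignment between input and output, the attention weights here impose no ordering constraint.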
References
H. Bourlard, N. Morgan: Connectionist Speech Recognition – A Hybrid Approach, Kluwer Academic Publishers, 1994.
D. Yu, L. Deng: Automatic Speech Recognition: A Deep Learning Approach. Springer, 2014.
D. Jurafsky, J. H. Martin: Speech and Language Processing. 3rd edition draft, 2017, https://web.stanford.edu/~jurafsky/slp3/
Y. Goldberg: Neural Network Methods for Natural Language Processing. Morgan & Claypool Publishers, 2017.
K.P. Murphy: Probabilistic Machine Learning: An Introduction. MIT Press, 2022.
K.P. Murphy: Probabilistic Machine Learning: Advanced Topics. MIT Press, 2023.
Pre-requisites
Linear algebra, numerical mathematics, probability and statistics, elementary machine learning.
Short bio
Hermann Ney is director of science at AppTek, McLean, VA and senior professor of computer science at RWTH Aachen University, Germany. His main research interests lie in the area of machine learning, neural networks and applications to speech recognition, machine translation and other tasks in natural language processing.
He and his team have contributed to a large number of large-scale joint projects in Europe (e.g. TC-STAR, QUAERO, TRANSLECTURES, EU-BRIDGE) and the USA (e.g. GALE, BOLT, BABEL). His work has resulted in more than 700 conference and journal papers, with an h-index of 113 and 64,000 citations (based on Google Scholar). More than 50 of his former PhD students work for IT companies such as Amazon, Apple, Cerence, eBay, Google and Nuance.
The results of his research have contributed to various operational research prototypes and commercial systems. In 1993, Philips Dictation Systems Vienna introduced a large-vocabulary continuous-speech recognition product for medical applications. In 1997, Philips Dialogue Systems Aachen introduced a spoken dialogue system for train timetable information via the telephone. In the German project VERBMOBIL, his team introduced the phrase-based approach to data-driven machine translation, which in 2008 was used by his former PhD students at Google as the starting point for the service Google Translate. In the EU project TC-STAR, the first research prototype system for spoken language translation of real-life domains was built.
Awards: 2005 Technical Achievement Award of the IEEE Signal Processing Society; 2013 Award of Honour of the International Association for Machine Translation; 2019 IEEE James L. Flanagan Speech and Audio Processing Award; 2021 ISCA Medal for Scientific Achievements (ISCA = Int. Speech Communication Ass.).