Karen Livescu
[intermediate/advanced] Speech Processing: Automatic Speech Recognition and beyond
Summary
Spoken language interfaces such as smart speakers and voice dictation systems have become commonplace. This course will give a tour of the several decades of progress that have made this possible, starting from the core task of automatic speech recognition and extending to the additional tasks involved in enabling computers to use speech in all of the ways that humans do. The course will describe in detail some of the most successful approaches, including both established methods and more recent advances such as deep representation learning.
Syllabus
- Historical overview of automatic speech recognition (ASR): signal processing, hidden Markov models, and deep learning
- Deep dive: Models and learning for ASR
- Beyond ASR: Speech retrieval, synthesis, translation, spoken language understanding, and more
- Recent advances: Representation learning for speech
References
- D. Jurafsky and J. H. Martin, Speech and Language Processing, 3rd edition, Dec. 30, 2020 draft, Chapter 26, https://web.stanford.edu/~jurafsky/slp3/ed3book_dec302020.pdf.
- H. Bourlard and N. Morgan, Connectionist Speech Recognition – A Hybrid Approach, Kluwer Academic Publishers, 1994.
- G. Hinton et al., “Deep neural networks for acoustic modeling in speech recognition,” IEEE Signal Processing Magazine, November 2012.
- W. Chan et al., “Listen, attend, and spell,” arXiv:1508.01211.
- A. Hannun, “Sequence modeling with CTC,” https://distill.pub/2017/ctc/.
- A. Baevski et al., “wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations,” NeurIPS 2020.
Pre-requisites
Familiarity with linear algebra, probability, and basic machine learning.
Short bio
Karen Livescu is an Associate Professor at TTI-Chicago. She completed her PhD in electrical engineering and computer science at MIT. Her main research interests are in speech and language processing, as well as related problems in machine learning. Her recent work includes unsupervised and multi-view representation learning, acoustic word embeddings, visually grounded speech and language models, and automatic sign language recognition. She is a 2021 IEEE SPS Distinguished Lecturer and an ISCA Fellow. Other recent professional activities include serving as a program chair of ICLR 2019 and ASRU 2015/2017/2019, and as an Associate Editor for IEEE T-PAMI and IEEE OJ-SP.