Karen Livescu

Toyota Technological Institute at Chicago

[intermediate/advanced] Speech Processing: Automatic Speech Recognition and beyond

Summary

Spoken language interfaces such as smart speakers and voice dictation systems have become commonplace. This course will give a tour of the several decades of progress that have made this possible, starting from the core task of automatic speech recognition and extending to the additional tasks involved in enabling computers to use speech in all of the ways that humans do. The course will describe in detail some of the most successful approaches, including both established methods and more recent advances such as deep representation learning.

Syllabus

Historical overview of automatic speech recognition (ASR): signal processing, hidden Markov models, and deep learning
Deep dive: Models and learning for ASR
Beyond ASR: Speech retrieval, synthesis, translation, spoken language understanding, and more
Recent advances: Representation learning for speech

References

D. Jurafsky and J. H. Martin, Speech and Language Processing, 3rd edition, Dec. 30 2020 draft, Chapter 26, https://web.stanford.edu/~jurafsky/slp3/ed3book_dec302020.pdf.
H. Bourlard and N. Morgan, Connectionist Speech Recognition – A Hybrid Approach, Kluwer Academic Publishers, 1994.
G. Hinton et al., “Deep neural networks for acoustic modeling in speech recognition,” IEEE Signal Processing Magazine, November 2012.
W. Chan et al., “Listen, attend, and spell,” arXiv:1508.01211.
A. Hannun, “Sequence modeling with CTC,” https://distill.pub/2017/ctc/.
A. Baevski et al., “wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations,” NeurIPS 2020.

Pre-requisites

Familiarity with linear algebra, probability, and basic machine learning.

Short bio

Karen Livescu is an Associate Professor at TTI-Chicago. She completed her PhD in electrical engineering and computer science at MIT. Her main research interests are in speech and language processing, as well as related problems in machine learning. Her recent work includes unsupervised and multi-view representation learning, acoustic word embeddings, visually grounded speech and language models, and automatic sign language recognition. She is a 2021 IEEE SPS Distinguished Lecturer and an ISCA Fellow. Other recent professional activities include serving as a program chair of ICLR 2019 and ASRU 2015/2017/2019, and as an Associate Editor for IEEE T-PAMI and IEEE OJ-SP.
