Björn Schuller

Imperial College London

[introductory/intermediate] Deep Multimedia Processing

Summary

This course will deal with deep learning for unimodal, multimodal, and multisensorial signal analysis and synthesis. The modalities considered are mainly audio, video, text, and physiological signals. The methods shown will, however, be applicable to a broad range of further signal types. We will first deal with pre-processing for denoising, dereverberation, or packet loss concealment. This will be followed by representation learning, such as by convolutional neural networks or sequence-to-sequence encoder-decoder architectures, as a basis for end-to-end learning from raw signals or symbolic representations. Then, we shall discuss modelling for decision making, such as by recurrent neural networks with long short-term memory or gated recurrent units, including handling dynamics by connectionist temporal classification. This will also include a discussion of the usage of attention on different levels. From there, we will move to transformers and their different types. We will further elaborate on the impact of topologies, including multiple targets with shared layers, and on how to move towards self-shaping networks in the sense of Automated Machine Learning (AutoML). In the last part, we will deal with some practical questions. These include data efficiency, such as by weak supervision with the human in the loop, data augmentation, e.g., by diffusion models, active and semi-supervised learning, transfer learning, self-learning, or generative adversarial networks. Further, we will have a glance at modelling efficiency, such as by squeezing networks. Solutions enhancing privacy, trustability, fairness, and explainability will include federated learning, confidence measurement, and diverse means of sonification and visualisation. The content shown will be accompanied by open-source implementations in the corresponding toolkits, available on GitHub. Application examples will mainly come from the domains of Computer Audition, Affective Computing, and mHealth.

Syllabus

Pre-Processing and Representation Learning (Signal Enhancement, Packet Loss Concealment, CNNs, S2S, End-to-End Learning).
Modelling for Decision Making (Attention, Feature Space Optimisation, RNNs, LSTM, GRUs, CTC, Transformers, AutoML).
Data and Model Efficiency (GANs, Diffusion Models, Transfer Learning, Data Augmentation, Weak Supervision, Cooperative Learning, Self-Learning, Squeezing).
Privacy, Trustability, Explainability (e.g., Federated Learning, Confidence Measurement, Visualisation).

References

The Handbook of Multimodal-Multisensor Interfaces. Vol. 2, S. Oviatt, B. Schuller, P.R. Cohen, D. Sonntag, G. Potamianos, A. Krüger (eds.), 2018.

https://github.com/N-HANS/N-HANS

https://github.com/end2you/end2you

https://github.com/openXBOW/openXBOW

https://github.com/auDeep/auDeep

https://github.com/DeepSpectrum/DeepSpectrum

https://www.audeering.com/opensmile/

https://www.ihearu-play.eu/

Pre-requisites

Basic Machine Learning and Signal Processing knowledge.

Short bio

Björn W. Schuller received his diploma, doctoral degree, and habilitation, and was appointed Adjunct Teaching Professor in Machine Intelligence and Signal Processing, all in EE/IT at TUM in Munich/Germany. He is Full Professor of Artificial Intelligence and the Head of GLAM at Imperial College London/UK, Full Professor and Chair of Embedded Intelligence for Health Care and Wellbeing at the University of Augsburg/Germany, co-founding CEO and current CSO of audEERING, an Audio Intelligence company based near Munich and in Berlin/Germany, and permanent Visiting Professor at HIT/China, amongst other Professorships and Affiliations. Previous positions include Full Professor at the University of Passau/Germany and Researcher at Joanneum Research in Graz/Austria and at the CNRS-LIMSI in Orsay/France. He is a Fellow of the IEEE and Golden Core Awardee of the IEEE Computer Society, Fellow of the ISCA, Fellow of the BCS, Fellow and President-Emeritus of the AAAC, and Senior Member of the ACM. He (co-)authored 1300+ publications (50k+ citations, h-index 100+), is Field Chief Editor of Frontiers in Digital Health, and was Editor in Chief of the IEEE Transactions on Affective Computing, amongst manifold further commitments and service to the community. His 50+ awards include having been honoured as one of 40 extraordinary scientists under the age of 40 by the WEF in 2015. He served as Coordinator/PI in 15+ European projects, is an ERC Starting Grantee, and is a consultant to companies such as Barclays, GN, Huawei, and Samsung.
