Björn Schuller
[introductory/intermediate] Deep Multimedia Processing
Summary
This course will deal with deep learning for unimodal, multimodal, and multisensorial signal analysis and synthesis. Modalities mainly include audio, video, text, and physiological signals; the methods shown will, however, be applicable to a broad range of further signal types. We will first deal with pre-processing such as denoising, dereverberation, or packet loss concealment. This will be followed by representation learning, e.g., by convolutional neural networks or sequence-to-sequence encoder-decoder architectures, as a basis for end-to-end learning from raw signals or symbolic representations. Then, we shall discuss modelling for decision making, such as by recurrent neural networks with long short-term memory or gated recurrent units, including the handling of dynamics by connectionist temporal classification. This will also include a discussion of the use of attention at different levels. From there, we will move to transformers and their different variants. We will further elaborate on the impact of topologies, including multiple targets with shared layers, and on how to move towards self-shaping networks in the sense of Automated Machine Learning (AutoML). In the last part, we will deal with some practical questions. These include data efficiency, such as by weak supervision with the human in the loop, data augmentation (e.g., by diffusion models), active and semi-supervised learning, transfer learning, self-learning, or generative adversarial networks. Further, we will take a brief look at modelling efficiency, such as by squeezing networks. Privacy-, trustability-, fairness-, and explainability-enhancing solutions will include federated learning, confidence measurement, and diverse means of sonification and visualisation. The content shown will be accompanied by open-source implementations of the corresponding toolkits available on GitHub. Application examples will mainly come from the domains of Computer Audition, Affective Computing, and mHealth.
Syllabus
- Pre-Processing and Representation Learning (Signal Enhancement, Packet Loss Concealment, CNNs, S2S, end-to-end).
- Modelling for Decision Making (Attention, Feature Space Optimisation, RNNs, LSTM, GRUs, CTC, Transformers, AutoML).
- Data and Model Efficiency (GANs, Diffusion Models, Transfer Learning, Data Augmentation, Weak Supervision, Cooperative Learning, Self-Learning, Squeezing).
- Privacy, Trustability, Explainability (e.g., Federated Learning, Confidence Measurement, Visualisation).
References
The Handbook of Multimodal-Multisensor Interfaces. Vol. 2, S. Oviatt, B. Schuller, P.R. Cohen, D. Sonntag, G. Potamianos, A. Krüger (eds.), 2018.
https://github.com/N-HANS/N-HANS
https://github.com/end2you/end2you
https://github.com/openXBOW/openXBOW
https://github.com/auDeep/auDeep
https://github.com/DeepSpectrum/DeepSpectrum
https://www.audeering.com/opensmile/
Pre-requisites
Basic Machine Learning and Signal Processing knowledge.
Short bio
Björn W. Schuller received his diploma, doctoral degree, habilitation, and Adjunct Teaching Professorship in Machine Intelligence and Signal Processing, all in EE/IT, from TUM in Munich/Germany. He is Full Professor of Artificial Intelligence and the Head of GLAM at Imperial College London/UK, Full Professor and Chair of Embedded Intelligence for Health Care and Wellbeing at the University of Augsburg/Germany, co-founding CEO and current CSO of audEERING (an Audio Intelligence company based near Munich and in Berlin/Germany), and permanent Visiting Professor at HIT/China, amongst other professorships and affiliations. Previous positions include Full Professor at the University of Passau/Germany and Researcher at Joanneum Research in Graz/Austria and at CNRS-LIMSI in Orsay/France. He is a Fellow of the IEEE and Golden Core Awardee of the IEEE Computer Society, Fellow of the ISCA, Fellow of the BCS, Fellow and President-Emeritus of the AAAC, and Senior Member of the ACM. He has (co-)authored 1300+ publications (50k+ citations, h-index 100+), is Field Chief Editor of Frontiers in Digital Health, and was Editor-in-Chief of the IEEE Transactions on Affective Computing, amongst manifold further commitments and service to the community. His 50+ awards include having been honoured by the WEF in 2015 as one of 40 extraordinary scientists under the age of 40. He has served as Coordinator/PI in 15+ European projects, is an ERC Starting Grantee, and has been a consultant to companies such as Barclays, GN, Huawei, and Samsung.