DeepLearn 2022 Summer
6th International Gran Canaria School
on Deep Learning
Las Palmas de Gran Canaria, Spain · July 25-29, 2022

Louis-Philippe Morency

Carnegie Mellon University

[intermediate/advanced] Multimodal Machine Learning

Summary

Multimodal machine learning is a vibrant multi-disciplinary research field that addresses some of the original goals of AI by integrating and modeling multiple communicative modalities, including linguistic, acoustic, and visual messages. With the initial research on audio-visual speech recognition and more recently with language & vision projects such as image and video captioning, visual question answering, and language-guided reinforcement learning, this research field brings some unique challenges for multimodal researchers given the heterogeneity of the data and the contingency often found between modalities. This course will teach fundamental mathematical concepts related to Multimodal Machine Learning including multimodal representations, alignment, fusion, reasoning and quantification. We will also review recent papers describing state-of-the-art multimodal models and computational algorithms addressing these technical challenges.

Syllabus

Introduction

  • What is multimodal? Historical view, multimodal vs. multimedia.
  • Multimodal applications and datasets: image captioning, video description, AVSR, affect recognition, multimodal RL.
  • Core technical challenges: representation, alignment, reasoning, generation, co-learning, and quantification.

Unimodal representations

  • Language representations: Distributional hypothesis and text embeddings.
  • Visual representations: Convolutional networks, self-attention models.
  • Acoustic representations: Spectrograms, auto-encoders.

Multimodal representations

  • Representation fusion: visuo-linguistic spaces, multimodal auto-encoder, fusion strategies.
  • Representation coordination: similarity metrics, canonical correlation analysis, multimodal transformers.
  • Representation fission: factorization, component analysis, disentanglement.
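
As a rough illustration of the "fusion strategies" item above, the sketch below contrasts early fusion (concatenating per-modality features before a single model) with late fusion (combining per-modality predictions). It is not taken from the course materials; the function names, the toy feature vectors, and the uniform default weights are all assumptions made for this example.

```python
# Minimal sketch of two common fusion strategies (hypothetical helper names).

def early_fusion(visual_features, text_features):
    """Early fusion: concatenate modality features into one joint vector.

    A single downstream model would then be trained on the joint vector.
    """
    return list(visual_features) + list(text_features)


def late_fusion(scores_per_modality, weights=None):
    """Late fusion: weighted average of scores predicted per modality.

    Each score is assumed to come from a separate unimodal model.
    Weights default to a uniform average (an assumption of this sketch).
    """
    n = len(scores_per_modality)
    if weights is None:
        weights = [1.0 / n] * n
    return sum(w * s for w, s in zip(weights, scores_per_modality))


if __name__ == "__main__":
    joint = early_fusion([0.1, 0.9], [0.4])   # joint feature vector
    fused = late_fusion([0.2, 0.8])           # averaged unimodal scores
    print(joint, fused)
```

Early fusion lets one model learn cross-modal interactions directly, while late fusion keeps the unimodal models independent and is simpler when a modality is missing at test time.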

Modality alignment

  • Latent alignment approaches: Attention models, multimodal transformers, multi-instance learning.
  • Explicit alignment: Dynamic time warping.
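
To make the "explicit alignment" item above concrete, here is a minimal sketch of dynamic time warping between two 1-D sequences. It uses the textbook dynamic-programming recurrence with an absolute-difference local cost; the function name and the choice of local cost are assumptions for illustration, not the course's own implementation.

```python
# Minimal dynamic time warping (DTW) sketch; dtw_distance is a hypothetical name.

def dtw_distance(a, b):
    """Return the minimal cumulative alignment cost between sequences a and b.

    Local cost is |a[i] - b[j]|. Each step may advance in a, in b, or in both,
    so one element of a sequence can be matched to several elements of the other.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the best cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


if __name__ == "__main__":
    # The second sequence repeats one value, so warping absorbs the difference.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

The same recurrence applies to multimodal streams (for example, audio frames against video frames) once the local cost is replaced by a distance between feature vectors.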

Multimodal reasoning

  • Hierarchical and graphical representations.
  • Leveraging external data: external knowledge bases, commonsense reasoning.

Multimodal co-learning & generation

  • Modality transfer: Cross-modal domain adaptation, few-shot learning.
  • Compression (multimodal summarization), transduction (multimodal style transfer), and creation (multimodal conditional generation).

Multimodal quantification

  • Dataset biases: social biases, spurious correlations.
  • Model biases: modality collapse, robustness, optimization challenges, interpretability.

Future directions and conclusion

References

15-week course on Multimodal Machine Learning, including all video lectures:
https://cmu-multicomp-lab.github.io/mmml-course/fall2020/

Baltrušaitis, T., Ahuja, C., & Morency, L. P. (2018). Multimodal machine learning: A survey and taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(2), 423-443.
https://arxiv.org/abs/1705.09406

Reading list and lecture slides for the Fall 2021 edition of the CMU Multimodal Machine Learning course:
https://piazza.com/cmu/fall2021/11777/resources

Pre-requisites

We expect the audience to have an introductory background in machine learning and deep learning, including basic familiarity with commonly used unimodal building blocks such as convolutional, recurrent, and self-attention models. We also expect an understanding of math, CS, and programming at an introductory graduate level.

Short bio

Louis-Philippe Morency is an Associate Professor in the Language Technologies Institute at Carnegie Mellon University, where he leads the Multimodal Communication and Machine Learning Laboratory (MultiComp Lab). He was formerly research faculty in the Computer Sciences Department at the University of Southern California and received his Ph.D. from the MIT Computer Science and Artificial Intelligence Laboratory. His research focuses on building the computational foundations that enable computers to analyze, recognize, and predict subtle human communicative behaviors during social interactions. He has received several awards, including AI’s 10 to Watch by IEEE Intelligent Systems, the NetExplo Award in partnership with UNESCO, and 10 best paper awards at IEEE and ACM conferences. His research has been covered by media outlets such as the Wall Street Journal, The Economist, and NPR. He is currently chair of the advisory committee for the ACM International Conference on Multimodal Interaction and an associate editor at IEEE Transactions on Affective Computing.

Other Courses

  • Wahid Bhimji
  • Joachim M. Buhmann
  • Kate Saenko
  • Arindam Banerjee
  • Pierre Baldi
  • Mikhail Belkin
  • Arthur Gretton
  • Phillip Isola
  • Mohit Iyyer
  • Irwin King
  • Tor Lattimore
  • Vincent Lepetit
  • Dimitris N. Metaxas
  • Sean Meyn
  • Wojciech Samek
  • Clarisa Sánchez
  • Björn W. Schuller
  • Jonathon Shlens
  • Johan Suykens
  • A. Murat Tekalp
  • Alexandre Tkatchenko
  • Li Xiong
  • Ming Yuan

CO-ORGANIZERS

Universidad de Las Palmas de Gran Canaria

Universitat Rovira i Virgili

Institute for Research Development, Training and Advice – IRDTA, Brussels/London

© IRDTA 2021. All Rights Reserved.