Arthur Gretton
[intermediate/advanced] Probability Divergences and Generative Models
Summary
Probability divergences are at the heart of much modern machine learning, from training generative adversarial networks, to obtaining disentangled representations of complex scenes, to self-supervised learning. We will introduce two major classes of probability divergences: the integral probability metrics, and phi (or f-) divergences. We then go on to apply these divergences in machine learning settings. Our first application will be in two-sample testing, where we determine whether two samples are from the same distribution: this is a helpful diagnostic when evaluating whether the dataset used to train a model is from the same distribution as the one on which it is being deployed. Our second application will be in training generative adversarial networks, where the divergence serves as a critic function. We will go on to explore some advanced applications of divergences: measuring and testing statistical dependence, and evaluating goodness-of-fit for probabilistic models.
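To make the two-sample-testing application concrete, here is a minimal sketch (not course material, just an illustration) of the unbiased squared-MMD estimator from the Gretton et al. (2012) reference below, with a Gaussian kernel; the bandwidth `sigma=1.0` and the toy Gaussian data are arbitrary choices for the example:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of x and y
    sq = np.sum(x**2, 1)[:, None] + np.sum(y**2, 1)[None, :] - 2 * x @ y.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_unbiased(x, y, sigma=1.0):
    # Unbiased estimate of the squared maximum mean discrepancy:
    # E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)], with diagonal terms
    # excluded from the within-sample averages for unbiasedness.
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, sigma)
    kyy = gaussian_kernel(y, y, sigma)
    kxy = gaussian_kernel(x, y, sigma)
    term_x = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * kxy.mean()

rng = np.random.default_rng(0)
# Two samples from the same distribution: estimate near zero
same = mmd2_unbiased(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
# Mean-shifted sample: estimate clearly positive
diff = mmd2_unbiased(rng.normal(size=(200, 2)),
                     rng.normal(2.0, 1.0, size=(200, 2)))
```

In a real test one would calibrate a rejection threshold, e.g. by permuting the pooled sample; the course covers learned neural-net kernel features for this step.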
Syllabus
- Introduction to probability divergences: phi (f-) divergences and integral probability metrics
- A deep dive into integral probability metrics: varieties of IPM, with emphasis on the maximum mean discrepancy (MMD)
- MMD for two-sample testing, using learned neural net features: application to testing CIFAR10 vs CIFAR10.1
- Probability divergences as critic functions in a generative adversarial network. Generalised energy-based models
- Advanced topics: measuring and testing statistical dependence, evaluating goodness-of-fit for probabilistic models
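As a pointer to the dependence-measurement topic above, the following is a hedged sketch of the (biased) Hilbert-Schmidt Independence Criterion estimator from the 2007 reference below, tr(KHLH)/n^2 with centering matrix H; the Gaussian kernels, bandwidth, and toy data are illustrative choices only:

```python
import numpy as np

def hsic_biased(x, y, sigma=1.0):
    # Biased HSIC estimate: trace(K H L H) / n^2, where K and L are
    # kernel matrices on x and y, and H centers in feature space.
    n = len(x)

    def rbf(a):
        # Gaussian kernel matrix on the rows of a
        sq = np.sum(a**2, 1)[:, None] + np.sum(a**2, 1)[None, :] - 2 * a @ a.T
        return np.exp(-sq / (2 * sigma**2))

    k, l = rbf(x), rbf(y)
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(k @ h @ l @ h) / n**2

rng = np.random.default_rng(0)
x = rng.normal(size=(300, 1))
# Independent y: HSIC close to zero (it is O(1/n) under independence)
indep = hsic_biased(x, rng.normal(size=(300, 1)))
# Nonlinearly dependent y = x^2 + noise: HSIC clearly larger,
# even though the linear correlation of x and x^2 is near zero
dep = hsic_biased(x, x**2 + 0.1 * rng.normal(size=(300, 1)))
```

The nonlinear example is the point of using a kernel criterion: a covariance-based measure would miss the x-versus-x² dependence entirely.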
References
Maximum mean discrepancy and two-sample testing:
https://jmlr.csail.mit.edu/papers/v13/gretton12a.html
https://arxiv.org/abs/2002.09116
GANs and generalized energy-based models:
https://arxiv.org/abs/1606.00709
https://arxiv.org/abs/2003.05033
Evaluating statistical dependence:
https://papers.nips.cc/paper/2007/hash/d5cfead94f5350c12c322b5b664544c1-Abstract.html
https://arxiv.org/abs/2106.08320
Evaluating model goodness-of-fit:
https://arxiv.org/abs/1602.02964
Pre-requisites
Linear algebra and statistics, ideally at an advanced undergraduate level or better.
Short bio
Arthur Gretton is a Professor with the Gatsby Computational Neuroscience Unit and director of the Centre for Computational Statistics and Machine Learning (CSML) at UCL. He received degrees in Physics and Systems Engineering from the Australian National University, and a PhD with Microsoft Research and the Signal Processing and Communications Laboratory at the University of Cambridge. He previously worked at the MPI for Biological Cybernetics and at the Machine Learning Department, Carnegie Mellon University. Arthur’s recent research interests in machine learning include the design and training of generative models, both implicit (e.g. GANs) and explicit (exponential family and energy-based models), causal modeling, and nonparametric hypothesis testing. Arthur was a program chair for AISTATS in 2016, a tutorials chair for ICML 2018, a workshops chair for ICML 2019, a program chair for the DALI workshop in 2019, and an organiser of the Machine Learning Summer School 2019 in London.