Thomas Breuel
[intermediate/advanced] Large Scale Deep Learning and Self-Supervision in Vision and NLP
Summary
Labeled training data has been the basis for many successful applications of deep learning, but such data is limited or unavailable in many domains. Furthermore, learning in natural systems requires the learner to build models from unlabeled training data with minimal prior domain knowledge. In this lecture, we first examine the statistical foundations of unsupervised learning and identify the techniques and principles by which these foundations are realized in deep learning systems. In the second part of the lecture, we look at successful deep learning approaches and techniques for self-supervised learning in vision and NLP applications.
Syllabus
- Concepts and tasks: self-supervised learning, weakly supervised learning, active learning, zero-shot learning, one-shot learning.
- Statistical theory and approaches to self-supervised learning (priors, clustering, latent variables, metric learning, subspaces, cross-domain learning, EM training).
- Information theoretic analysis of self-supervised learning (information sources, MDL, compression).
- Deep learning techniques: representation learning, pseudolabels, masking, prediction, contrastive learning (see the brief sketch after this syllabus), generative models, transformations, latent variables.
- Sample applications: Siamese networks, BERT, DINO, GroupViT.
- Practical considerations and scaling.
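To give a concrete flavor of the contrastive-learning techniques listed above, here is a minimal PyTorch sketch of an InfoNCE-style contrastive loss of the kind used in SimCLR-like systems. It is an illustrative example, not material from the lecture: the function name, the toy encoder, and the noise-based stand-in for data augmentation are all assumptions made for the sketch.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    """Contrastive (InfoNCE-style) loss for a batch of paired embeddings.

    z1, z2: (N, D) tensors; row i of z1 and row i of z2 are embeddings of
    two augmented views of the same input (the positive pair), and all
    other rows in the batch serve as negatives.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature            # (N, N) scaled cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    # Each view should be most similar to its own positive on the diagonal.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage: two "augmented" views of the same batch through an untrained encoder.
if __name__ == "__main__":
    encoder = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(),
                                  torch.nn.Linear(64, 16))
    x = torch.randn(8, 32)                        # a batch of inputs
    view1 = x + 0.1 * torch.randn_like(x)         # stand-in for data augmentation
    view2 = x + 0.1 * torch.randn_like(x)
    loss = info_nce_loss(encoder(view1), encoder(view2))
    print(loss.item())
```

In practice, the loss above would be minimized over large batches of augmented image pairs (or masked text, in the NLP setting), so that the encoder learns representations without any labels; the lecture covers these techniques and their scaling behavior in detail.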
References
LeCun, Y. and Misra, I., 2021. Self-supervised learning: The dark matter of intelligence. Meta AI Blog. https://ai.facebook.com/blog/self-supervised-learning-the-dark-matter-of-intelligence/
Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P. and Joulin, A., 2021. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 9650-9660).
Devlin, J., Chang, M.W., Lee, K. and Toutanova, K., 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Jaiswal, A., Babu, A.R., Zadeh, M.Z., Banerjee, D. and Makedon, F., 2020. A survey on contrastive self-supervised learning. Technologies, 9(1), p.2.
Pre-requisites
An understanding of common deep learning models and supervised training. Basic familiarity with statistical models for pattern recognition.
Short bio
Thomas Breuel works on deep learning and computer vision at NVIDIA Research. Prior to NVIDIA, he was a full professor of computer science at the University of Kaiserslautern (Germany) and worked as a researcher at Google, Xerox PARC, the IBM Almaden Research Center, and IDIAP in Switzerland, as well as a consultant to the US Bureau of the Census. He is an alumnus of the Massachusetts Institute of Technology and Harvard University.