Vincent Lepetit
[intermediate] Deep Learning and 3D Reasoning for 3D Scene Understanding
Summary
3D scene understanding is a fundamental problem in Computer Vision: from captured images, one wants not only to recognise the objects present in a scene but also to retrieve their 3D properties, including their poses and shapes. With the development of Deep Learning approaches, this field has made remarkable progress.
In this lecture, we will first review Deep Learning methods for 3D pose prediction, 3D shape estimation, and complete 3D scene inference. We will then present and discuss self-supervised approaches, more specifically auto-labelling methods that automatically create 3D annotations, which are likely to become one of the main future research directions in 3D scene understanding.
Syllabus
- 3D object pose estimation
- 3D hand pose estimation
- 3D scene understanding
- Self-supervised learning
- Monte Carlo Tree Search
References
Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image. Yinyu Nie, Xiaoguang Han, Shihui Guo, Yujian Zheng, Jian Chang, and Jian Jun Zhang. CVPR 2020.
AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation. Thibault Groueix, Matthew Fisher, Vladimir G. Kim, Bryan C. Russell, and Mathieu Aubry. CVPR 2018.
DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation. Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. CVPR 2019.
Monte Carlo Scene Search for 3D Scene Understanding. Shreyas Hampali, Sinisa Stekovic, Sayan Deb Sarkar, Chetan Srinivasa Kumar, Friedrich Fraundorfer, and Vincent Lepetit. CVPR 2021.
HOnnotate: A Method for 3D Annotation of Hand and Object Poses. Shreyas Hampali, Mahdi Rad, Markus Oberweger, and Vincent Lepetit. CVPR 2020.
Pre-requisites
Basic knowledge of Deep Learning applied to Computer Vision, and of 3D geometry.
Short bio
Vincent Lepetit has been a director of research at ENPC ParisTech since 2019. Prior to joining ENPC, he was a full professor at the Institute for Computer Graphics and Vision, Graz University of Technology, Austria, and before that, a senior researcher at the Computer Vision Laboratory (CVLab) of EPFL, Switzerland. His research interests lie at the interface between Machine Learning and 3D Computer Vision, and currently focus on 3D scene understanding from images. He often serves as an area chair for the major computer vision conferences (CVPR, ICCV, ECCV) and is an associate editor for PAMI, IJCV, and CVIU.