Yao Wang

New York University

[introductory/intermediate] Deep Learning for Computer Vision

Summary

This course is aimed at relative beginners in using deep learning to solve computer vision problems. We will start with the basics of supervised learning, then focus on convolutional networks for a variety of computer vision applications, and end with self-supervised learning as a way to overcome the challenge of limited annotated data. We will introduce various fundamental concepts in convolutional networks along the way.

Syllabus

Supervised learning basics: Neural Net classifier, training a neural net through minimizing a loss function, gradient descent through back-propagation, stochastic gradient descent, data preprocessing and regularization, training/validation/testing pipelines.
Convolutional networks for image recognition: why use 2D convolutions and many layers, multichannel 2D convolution, spatial dimension reduction through pooling, evolution of network structures (VGG, ResNet, DenseNet, attention, non-local networks, Vision Transformer). Data augmentation and transfer learning to handle limited data.
Convolutional networks for video and medical volumetric data: using 3D convolution layers.
Interpretation of trained networks: gradient-based methods and class activation maps (CAM).
Fully convolutional networks for image-to-image mapping: auto-encoders, multi-resolution auto-encoders (U-Net, V-Net). Applications in image denoising, segmentation, and super-resolution.
Convolutional networks for object detection (Faster R-CNN, YOLO), instance segmentation (Mask R-CNN), and object tracking.
Other computer vision tasks: body pose estimation (generating a body skeleton), depth estimation from binocular and monocular images, motion estimation, video prediction and interpolation.
Video processing through recurrent convolutional networks: convolutional LSTM, applications for action recognition, object tracking, video prediction.
Overcoming limited data through self-supervision: contrastive energy-based methods, non-contrastive energy-based methods, masked auto-encoders, and multi-modal supervision (image, text, and audio).
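To make the syllabus items on multichannel 2D convolution and pooling more concrete, here is a minimal pure-Python sketch. It is not taken from the course materials; the function names (`conv2d_multichannel`, `max_pool2d`) are hypothetical, and it assumes no padding, stride 1 for the convolution, and non-overlapping 2x2 max pooling. Like most deep learning libraries, it computes cross-correlation (the kernel is not flipped). Each output channel sums the contributions of all input channels, and pooling halves each spatial dimension.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # layout: [channels][height][width]

def conv2d_multichannel(x: Tensor3, kernels: List[Tensor3], bias: List[float]) -> Tensor3:
    """Hypothetical helper: valid (no padding), stride-1 multichannel 2D convolution.

    x:       C_in input channels, each H x W
    kernels: C_out filters, each with C_in channels of size K x K
    bias:    one scalar per output channel
    Returns C_out feature maps, each (H - K + 1) x (W - K + 1).
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    out = []
    for f, b in zip(kernels, bias):
        k = len(f[0])  # kernel height (square kernels assumed)
        fmap = []
        for i in range(h - k + 1):
            row = []
            for j in range(w - k + 1):
                s = b
                # Sum over every input channel and every kernel position.
                for c in range(c_in):
                    for u in range(k):
                        for v in range(k):
                            s += f[c][u][v] * x[c][i + u][j + v]
                row.append(s)
            fmap.append(row)
        out.append(fmap)
    return out

def max_pool2d(x: Tensor3, size: int = 2) -> Tensor3:
    """Hypothetical helper: non-overlapping max pooling (stride == size), applied per channel."""
    pooled = []
    for fmap in x:
        h, w = len(fmap), len(fmap[0])
        pooled.append([
            [max(fmap[i + u][j + v] for u in range(size) for v in range(size))
             for j in range(0, w - size + 1, size)]
            for i in range(0, h - size + 1, size)
        ])
    return pooled

# Usage: 3 input channels of size 6x6, 4 filters of size 3x3x3 (all ones, zero bias).
x = [[[1.0] * 6 for _ in range(6)] for _ in range(3)]
kernels = [[[[1.0] * 3 for _ in range(3)] for _ in range(3)] for _ in range(4)]
y = conv2d_multichannel(x, kernels, bias=[0.0] * 4)
p = max_pool2d(y)
print(len(y), len(y[0]), len(y[0][0]))  # 4 4 4
print(len(p), len(p[0]), len(p[0][0]))  # 4 2 2
print(p[0][0][0])                       # 27.0 (3 channels x 3 x 3 ones)
```

The loops are written for clarity rather than speed. In practice, a framework such as PyTorch would perform the same computation with a single vectorized convolution layer.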

References

Pre-requisites

Enrolled students should have basic knowledge of linear algebra, statistics, and probability. Prior exposure to classical image processing and computer vision is a plus but not required.

Short bio

Yao Wang is a Professor at the New York University Tandon School of Engineering (formerly Polytechnic University, Brooklyn, NY), with a joint appointment in the Departments of Electrical and Computer Engineering and Biomedical Engineering. She has also served as Associate Dean for Faculty Affairs at NYU Tandon since June 2019. Her research areas include video coding and streaming, multimedia signal processing, computer vision, and medical imaging. She is the lead author of the textbook Video Processing and Communications and has published over 250 papers in journals and conference proceedings. She received the New York City Mayor’s Award for Excellence in Science and Technology in the Young Investigator Category in 2000. She was elected Fellow of the IEEE in 2004 for contributions to video processing and communications. She received the IEEE Communications Society Leonard G. Abraham Prize Paper Award in the Field of Communications Systems in 2004, and the IEEE Communications Society Multimedia Communication Technical Committee Best Paper Award in 2011. She was a keynote speaker at the 2010 International Packet Video Workshop, the 2014 INFOCOM Workshop on Contemporary Video, the 2018 Picture Coding Symposium, and the 2020 ACM Multimedia Systems Conference (MMSys’20). She received the NYU Tandon Distinguished Teacher Award in 2016.
