EECS 542: Winter 2022

Schedule

Lecture	Date	Topic	Materials
Lec. 1	Wed, Jan. 5	Introduction
Lec. 2	Mon, Jan. 10	Optical flow	Jonschkowski et al., What Matters in Unsupervised Optical Flow; Sun et al., AutoFlow: Learning a Better Training Set for Optical Flow; Optional: Stone et al., SMURF: Self-Teaching Multi-Frame Unsupervised RAFT with Full-Image Warping (this was originally listed as a "main" paper, so you may review it if you'd like)
Lec. 3	Wed, Jan. 12	Grouping	Araslanov et al., Dense Unsupervised Learning for Video Segmentation; Jabri et al., Space-Time Correspondence as a Contrastive Random Walk
	Mon, Jan. 17	No class
Lec. 4	Wed, Jan. 19	Transformers	Dosovitskiy et al., An image is worth 16x16 words: Transformers for image recognition at scale; Liu et al., Swin Transformer: Hierarchical Vision Transformer using Shifted Windows; Optional: Vaswani et al., Attention Is All You Need
Lec. 5	Mon, Jan. 24	More transformers	Jaegle et al., Perceiver IO: A General Architecture for Structured Inputs & Outputs; Tolstikhin et al., MLP-Mixer: An all-MLP Architecture for Vision
Lec. 6	Wed, Jan. 26	CNNs	Brock et al., High-Performance Large-Scale Image Recognition Without Normalization; Designing Network Design Spaces; Optional: Goyal et al., Non-Deep Networks; Optional: Bello et al., Revisiting ResNets: Improved Training and Scaling Strategies
Lec. 7	Mon, Jan. 31	Video architectures	Fan et al., Multiscale Vision Transformers; Feichtenhofer et al., SlowFast Networks for Video Recognition
Lec. 8	Wed, Feb. 2	Object detection models	Carion et al., End-to-End Object Detection with Transformers; Kirillov et al., PointRend: Image Segmentation as Rendering; Optional: Zhou et al., Objects as Points
Lec. 9	Mon, Feb. 7	Long-tailed object detection	Gupta et al., LVIS: A Dataset for Large Vocabulary Instance Segmentation; Dave et al., Evaluating Large-Vocabulary Object Detectors: The Devil is in the Details
Lec. 10	Wed, Feb. 9	Tracking	Liu et al., Opening up Open-World Tracking; Zhou et al., Tracking Objects as Points; Optional: Do Different Tracking Tasks Require Different Appearance Models?
Lec. 11	Mon, Feb. 14	Neural radiance fields	Mildenhall et al., NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis; Sitzmann et al., Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering; Optional: Xie et al., Neural Fields in Visual Computing and Beyond
Lec. 12	Wed, Feb. 16	Neural fields	Tancik et al., Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains; Yu et al., Plenoxels: Radiance Fields without Neural Networks
Lec. 13	Mon, Feb. 21	Viewpoint prediction	Rockwell et al., PixelSynth: Generating a 3D-Consistent Experience from a Single Image; Rombach et al., Geometry-Free View Synthesis: Transformers and no 3D Priors
Lec. 14	Wed, Feb. 23	Pose estimation	Teed and Deng, DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras; Schönberger and Frahm, Structure-from-Motion Revisited
	Mon, Feb. 28	No class
	Wed, Mar. 2	No class
	Mon, Mar. 7	No class
Lec. 15	Wed, Mar. 9	Self-supervision without augmentation	He et al., Masked Autoencoders Are Scalable Vision Learners; Chen et al., Generative pretraining from pixels
Lec. 16	Mon, Mar. 14	Sound	Afouras et al., Self-supervised object detection from audio-visual correspondence; Chen et al., SoundSpaces: Audio-Visual Navigation in 3D Environments; Nagrani et al., Attention bottlenecks for multimodal fusion (optional); Lambeta et al., DIGIT: A Novel Design for a Low-Cost Compact High-Resolution Tactile Sensor with Application to In-Hand Manipulation (optional)
Lec. 17	Wed, Mar. 16	Language	Radford et al., Learning Transferable Visual Models From Natural Language Supervision; Nichol et al., GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models; Ramesh et al., Zero-Shot Text-to-Image Generation (optional)
Lec. 18	Mon, Mar. 21	Image generation	Karras et al., Alias-Free Generative Adversarial Networks; Razavi et al., Generating Diverse High-Fidelity Images with VQ-VAE-2
Lec. 19	Wed, Mar. 23	Image manipulation	Peebles et al., GAN-Supervised Dense Visual Alignment; Shen and Zhou, Closed-Form Factorization of Latent Semantics in GANs
Lec. 20	Mon, Mar. 28	Optimization	Choi et al., On Empirical Comparisons of Optimizers for Deep Learning; LeCun, Efficient Backprop; Bottou, Large-Scale Machine Learning with Stochastic Gradient Descent
Lec. 21	Wed, Mar. 30	Datasets	Grauman et al., Ego4D: Around the World in 3,000 Hours of Egocentric Video; Recht et al., Do ImageNet Classifiers Generalize to ImageNet?; Engstrom et al., Identifying Statistical Bias in Dataset Replication (optional)
Lec. 22	Mon, Apr. 4	Contrastive learning	He et al., Momentum contrast for unsupervised visual representation learning; Grill et al., Bootstrap your own latent: A new approach to self-supervised Learning
Lec. 23	Wed, Apr. 6	Lighting	Li and Snavely, Learning Intrinsic Image Decomposition from Watching the World; Liu et al., Learning to Factorize and Relight a City
Lec. 24	Mon, Apr. 11	Robotics	Fu et al., Coupling Vision and Proprioception for Navigation of Legged Robots; Loquercio et al., Learning High-Speed Flight in the Wild
Lec. 25	Wed, Apr. 13	Learning from (or not from) exploration	Du et al., Curious Representation Learning for Embodied Intelligence; Chen et al., Learning to drive from a world on rails
Lec. 26	Mon, Apr. 18	Ethics	Chai et al., What makes fake images detectable? Understanding properties that generalize; Bender et al., On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?; Optional: Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification

Course information

This is a seminar-style graduate-level class covering very recent advances in computer vision. The main focus of the class will be on reading and critiquing recent research papers. In each lecture, you will present and critique several recent research papers. You will also explore ideas covered in the course via a final self-directed project. This class is not intended to be an introduction to computer vision or deep learning. There will be no problem sets.

Lectures: Lectures will take place on Monday and Wednesday, 3:00 - 4:30pm, either over Zoom or in person. Since this is a discussion-based class, your attendance is required. Missing more than two classes without an excuse will negatively affect your grade. Recordings are available on Canvas. We will take attendance on randomly chosen days.

Attending: You are welcome to participate in the class either in person or over Zoom. Lecture recordings will be available until the end of the course.

Prerequisites: This is an advanced vision course. Students are expected to have taken an introductory vision course before enrolling (EECS 442, 504, or equivalent), so that they will be prepared to read and discuss recent research.

Paper reviews: You'll be required to submit short paper reviews each week (one per class), beginning the week of Lecture 2. Your review should be based on the paper itself, rather than the discussion. It is therefore due before the paper is presented in class (i.e., at 3pm on Monday or Wednesday). Each review should:

Summarize the paper. For most papers, this means explaining technical contributions, such as key mathematical insights, algorithms, and architectures.
Briefly explain how the paper relates to previous work, and why its contributions might be (or might not be) important.
Summarize the key experiments.
Discuss the paper's shortcomings: e.g. limitations to the methods, unconvincing aspects of experiments, presentation issues.

Reviews will be graded as: ✓+, ✓, ✓−, 0. We will not accept late submissions without a valid excuse. However, we will drop your 5 lowest review scores.

Q&A: This course has a Piazza forum, where you can ask public questions. We also appreciate it when you respond to questions from other students. If you have an important question that you would prefer to discuss over email, you may email the course staff (eecs542-w22-staff@umich.edu), or you can contact the instructor by email directly.

Textbooks: In this class, we'll mostly be reading research papers, rather than textbooks. The following might be useful as references, though:

Goodfellow, Bengio, Courville. Deep Learning. (available for free online)
Szeliski. Computer Vision: Algorithms and Applications, 2nd edition draft (available for free online)

If you have feedback for the author of the Szeliski book draft, please submit it here, and we'll pass it along!

Presentations: You will give 1-2 presentations throughout the course (depending on enrollment). We will distribute one list of papers during the first week of class, and another a few weeks into the semester. You will rank the list of papers, and we will assign one to you. Most classes will also contain a background talk, which will provide background material to help you understand the papers.

Peer reviews: Your presentation grade will in part be based on peer reviews from other students. You will be randomly assigned to do peer reviews for 3 papers.

GPU computing: For the project, you may require GPUs to train deep learning models. One common option is to use Google Colab. Please note, however, that the free version comes with significant limitations.

Grading: Final grades will be computed as follows:

Final project	45%
Reviews	25%
Class presentations	20%
Participation and peer reviews	10%
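
As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes each component has already been converted to a score on a 0-100 scale; that conversion, the component names, and the example scores are hypothetical and are not specified by the course.

```python
# Weights (in percent) copied from the grading table above.
WEIGHTS = {
    "final_project": 45,
    "reviews": 25,
    "class_presentations": 20,
    "participation_and_peer_reviews": 10,
}


def final_grade(scores):
    """Return the weighted average of per-component scores.

    Assumes (hypothetically) that every component score is on a 0-100 scale.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Made-up scores, for illustration only.
example = {
    "final_project": 90,
    "reviews": 80,
    "class_presentations": 85,
    "participation_and_peer_reviews": 100,
}
print(final_grade(example))  # prints 87.5
```

The sketch only shows the weighting step; how individual reviews, presentations, and participation are scored is up to the course staff.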

Academic integrity: While you are encouraged to discuss homework assignments with other students, your programming work must be completed individually. You must also write up your solution on your own. You may not search for solutions online, or use existing implementations of the algorithms. Please see the Michigan engineering honor code for more information.

Support: The counseling and psychological services center (CAPS) provides support for a variety of issues, including mental health and stress.

Presentation guidelines

You will be in charge of teaching one class, as part of a group of 3 people (starting Lec. 4). Each class will be organized around a topic of ongoing research. We'll send out a sign-up sheet after the first class, where you will rank

Organization: We suggest organizing most classes as follows:

Background (20 mins)
Paper 1 (20 mins)
Paper 2 (20 mins)
Discussion (20 mins)

The background section is usually the most important part of the class. It should resemble a mini-lecture, covering the "basics" that students will need to understand the paper presentations. For example, if the class is covering recent transformer papers, this section should review what a transformer is, and it should touch on any relevant findings that are necessary to understand the papers. Often, this will also involve describing prior attempts to solve the problems that the (much more recent) papers address.

Each paper section should be a critical presentation of the work in the paper. You should explain what problem the researchers were addressing, the motivation for their solution, and how well they succeeded at that goal. Unlike introductory courses, where methods are largely well-understood and have passed the "test of time", the papers in this class will often have important limitations. We therefore encourage you to take a critical approach to reading the papers, and to describe possible shortcomings. We also encourage you to discuss things in the paper that you do not think were well-justified, and choices by the authors that you did not understand.

Finally, for the (optional) discussion, you will lead a brief interactive session, where students can debate the issues at stake in the papers. For example, you might run a Q&A session where you ask: should we really consider language-based supervision to be "unsupervised", or do we need to interact with the world to learn good representations?

Slides: You are allowed to use existing slides and figures, but please clearly credit the authors. Please submit your slides to us in PDF form. By default, we will post your slides only on Canvas, so that they are only visible to those enrolled in the class. However, we'd also be happy to post them publicly if you'd like.

Signing up: We'll assign people to presentation timeslots in two phases (i.e., the first and second halves of the class). You'll fill out a questionnaire indicating which classes you'd like to participate in. If you happen to have a group of 3 in mind already, please indicate this on the form, and we will try to assign you to a single topic (we unfortunately cannot accommodate groups of other sizes).

Project guidelines

You'll do a self-directed group project, due at the very end of the course. Groups should be at most 4 students, unless you are given permission from the instructor. Deliverables include:

Project proposal
Report
Presentation (a 5-min talk)

Please sign up for a presentation timeslot here.

Staff & Office Hours

Andrew Owens

Instructor

Max Hamilton

GSI

Name	Office hours time
Andrew Owens	Tuesday 4:30 - 5:00pm
Max Hamilton	Thursday 3:00 - 3:30pm

Office hours will take place over video chat, using the same link as lecture.

EECS 542: Advanced Topics in Computer Vision

Instructor: Andrew Owens Winter 2022
