Jason J. Corso; EECS @ U of Michigan

Jason J. Corso

Professor
Electrical Engineering
and Computer Science
University of Michigan

Email:	jjcorso@eecs.umich.edu
Office:	4238 EECS
Phone:	734-647-8833
Bio:	[txt]
Vita:	[pdf]
Hours:	M-R 1330-1500 when door open. Appointment preferred: BOOK. When all else fails: eecs-corso-va@umich.edu
Cal:	Availability
Job Openings Prospective Students
Email Policy

Publication Tag Cloud

VQA action detection action prediction action segmentation active clustering activity recognition artificial intelligence attribute augmented reality autonomous driving belief propagation bioinformatics biomarkers biometrics braintumor cognitive systems computational finance computer forensics computer graphics computer vision computer-aided diagnosis control cosegmentation data mining deep learning deep reinforcement learning deformable dictionary transfer digital humanities document imaging domain adaptation dynamic linear models endoscopy evaluation event recognition facade detection face detection face recognition feature extraction frame interpolation fusion gesture recognition gpu grammar graph cuts graph-based graphical models haptics hierarchical higher-order human pose estimation human-computer interaction human-in-the-loop hybrid intelligence image captioning image denoising image processing image retrieval image understanding inference information fusion inpainting language grounding localization lung imaging machine learning mapping max-margin medical imaging metric learning mobile manipulation mobile robotics mosaicking motion estimation mrf multimedia natural language navigation neuroimaging object detection object grounding object-object interaction ontology particle filters pretraining probabilistic ontology protein structure prediction random forest reconstruction robotics segmentation semantic segmentation semi-supervised single-view depth estimation sketch generation slam spectral clustering spine imaging stereo streaming supervoxel surgical robotics tomographic reconstruction tracking video inpainting video object segmentation video prediction video saliency video segmentation video summarization video to text video understanding viewpoint estimation vision and language vision-based control visual psychophysics visual servo control volume rendering voxel maps weak supervision

Dr. Jason J. Corso is currently a Professor of Electrical Engineering and Computer Science at the University of Michigan. He received his Ph.D. in Computer Science from The Johns Hopkins University in 2005. He is a recipient of the NSF CAREER award (2009), the ARO Young Investigator award (2010), and a Google Faculty Research Award (2015), and is a member of the DARPA CSSG.

He is also the Co-Founder and CEO of Voxel51, a computer vision startup that is building the state-of-the-art platform for video- and image-based applications.

His main research thrust is high-level computer vision and its relationship to human language, robotics, and data science. He primarily focuses on problems in video understanding such as video segmentation, activity recognition, and video-to-text. From biomedicine to recreational video, imaging data is ubiquitous. Yet imaging scientists and intelligence analysts lack an adequate language and set of tools to fully tap information-rich images and video. He works to provide such a language; specifically, he studies the coupled problems of segmentation and recognition from a Bayesian perspective, emphasizing the role of statistical models in efficient visual inference. His long-term goal is a comprehensive and robust methodology for automatically mining, quantifying, and generalizing information in large sets of projective and volumetric images and video.

Selected Publications [complete list here] [google scholar]

[1]	H. Tang, D. Xu, Y. Yan, Y. Wang, J. J. Corso, and N. Sebe. Multi-channel attention selection GAN with cascaded semantic guidance for cross-view image translation. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2019. [ bib | .pdf ]
[2]	L. Zhou, Y. Kalantidis, X. Chen, J. J. Corso, and M. Rohrbach. Grounded video description. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2019. [ bib | .pdf ]
[3]	B. Griffin and J. J. Corso. BubbleNets: Learning to select the guidance frame in video object segmentation by deep sorting frames. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2019. [ bib | .pdf ]
[4]	L. Zhou, C. Xu, and J. J. Corso. Towards automatic learning of procedures from web instructional videos. In Proceedings of AAAI Conference on Artificial Intelligence, 2018. [ bib | code | data | http ]
[5]	L. Zhou, Y. Zhou, J. J. Corso, R. Socher, and C. Xiong. End-to-end dense video captioning with masked transformer. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2018. [ bib | code | .pdf ]
[6]	L. Zhou, N. Louis, and J. J. Corso. Weakly-supervised video object grounding from text by loss weighting and object interaction. In Proceedings of British Machine Vision Conference, 2018. [ bib | .pdf ]
[7]	R. Szeto and J. J. Corso. Click-here: Human-localized keypoints as guidance for viewpoint estimation. In Proceedings of IEEE International Conference on Computer Vision, 2017. [ bib | poster | code | project | data | .pdf ]
[8]	D. M. Johnson, C. Xiong, and J. J. Corso. Semi-supervised nonlinear distance metric learning via forests of max-margin cluster hierarchies. IEEE Transactions on Knowledge and Data Engineering, 28(4):1035-1046, 2016. [ bib | DOI | .pdf ]
[9]	C. Xu and J. J. Corso. LIBSVX: A supervoxel library and benchmark for early video processing. International Journal of Computer Vision, 119:272-290, 2016. [ bib ]
[10]	R. Xu, C. Xiong, W. Chen, and J. J. Corso. Jointly modeling deep video and compositional text to bridge vision and language in a unified framework. In Proceedings of AAAI Conference on Artificial Intelligence, 2015. [ bib | .pdf ]
[11]	C. Xu, S.-H. Hsieh, C. Xiong, and J. J. Corso. Can humans fly? Action understanding with multiple classes of actors. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2015. [ bib | poster | data | .pdf ]
[12]	P. Das, C. Xu, R. F. Doell, and J. J. Corso. A thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2013. [ bib | poster | data | .pdf ]
[13]	C. Xu, S. Whitt, and J. J. Corso. Flattening supervoxel hierarchies by the uniform entropy slice. In Proceedings of the IEEE International Conference on Computer Vision, 2013. [ bib | poster | project | video | .pdf ]

Code and Data Downloads

Publication code is linked from the papers in pubs.

ViP is a PyTorch-based video software platform that makes working with video models much easier for problems such as video object detection, activity recognition, and event classification. See the Technical Report for more information.

ActivityNet-Entities adds grounded bounding boxes to the ActivityNet dataset for the purposes of grounded video description. This was released with our CVPR 2019 paper.

M-PACT is a general purpose software framework for video understanding, including activity recognition, video classification, and others; it is based on TensorFlow. This technical report describes it in more detail.

YouCook2 is the largest task-oriented, instructional video dataset in the vision community. It contains 2000 long untrimmed videos from 89 cooking recipes. The procedure steps for each video are annotated with temporal boundaries and described by imperative English sentences. See the arXiv report for the paper/data.

Click-Here CNNs: This is the project page supplying code and data associated with our ICCV 2017 paper.

A2D: Actor-Action Dataset is a new dataset to support a broad class of video understanding problems: action recognition, actor-class recognition, multi-label actor/action recognition, and actor-action semantic segmentation. Data and evaluation code are available. This dataset was released with our CVPR 2015 paper.

Video2Text.net: A website and web service for automatic conversion of videos to natural language sentences based on the video content. This website showcases our work in the vision+language domain.

YouCook data set: 88 challenging, varied cooking videos (third-person viewpoint, different backgrounds, dynamic camera and person movement) with natural language annotations (about 8 per video) and object and action annotations. Includes a benchmark ROUGE scoring evaluation. The data set was published with our CVPR 2013 paper.

Hierarchy Agreement Index: implementation of our AAAI LBP 2013 cross-hierarchy evaluation tool for general use.

Random Forest Distance: tree-structured metric learning that implicitly adapts the metric over the sample space, based on our KDD 2012 paper. (Code updated 2/28/14)

Action Bank full code and processed data sets [direct link to code]

LIBSVX: A Supervoxel Library and Benchmark for Early Video Processing. Implements a suite of supervoxel video segmentation methods as well as a set of quantitative 2D and 3D metrics for assessing supervoxel quality.

Graph-Shifts Code (Java) and example data.

Video label propagation code and benchmark data set.

UB/College Park stereo building facade dataset. [more information].

Miscellaneous

Camera Calibration Lecture Notes from Computer Vision

News and Events

This list is maintained via Twitter: Tweets by _JasonCorso_. [Older pre-twitter news.]

DOWNLOAD PUBLIC KEY

-----BEGIN PGP PUBLIC KEY BLOCK-----
 Version: GnuPG v1.4.1 (Darwin)
 
 mQGiBEQq+U0RBACkCVEVG7xeQW2h3kua176/3+Ce4v42OE5205WW5lF+RMqjqZue
 2ZG9ip1QKBaS1qeAs11FJ0DxTDf6J7Yl7csksVBJpDvooJ5ktmOJb0NdvIx2+eqC
 3xOFRItRyts+tV53qtnO8XxOn//HZlReu0uswgSqulqf8ievg0XqPxgW5wCghZ4g
 TBM5klQXZ7axkTsJec//oz8D/ReFK9X9uYBU+8ytFyZA32diacCMvEbmF+cnvYfl
 5MdceJXgiBR/cwRpeCQvrkt4fdr3p1n74i9hwcRl2Uy1X6446qgZod1e9MM8WOKx
 ncStja2vMZvj9ewLE+pHUlf0Ij+hUAFGopU6p++h2kK7HJmHhbQ7WAQU8M68IY8a
 vIopA/4vzPQ0hYsaiUuZ2rAXAkPXFYtOmNEbiFUSwqWdhb3aftRxADaOguo2m1iH
 KoHC0KwKXiHWZHE4O+ge4KXsU+rPn8lgBCYZKvU8OIdIQIZT4Y+tHfSdHNTyX8uJ
 dee/hfSz0LOLTam5CpeMVSWEvrifx3bYEVyBd5coeIu8Aqvw7rQkSmFzb24gSi4g
 Q29yc28gPGpjb3Jzb0BtaWkudWNsYS5lZHU+iFsEExECABsFAkQq+U0GCwkIBwMC
 AxUCAwMWAgECHgECF4AACgkQtxRoTKSn+wJvAwCfX7Jff0w8zkBF9MaGRdLvtoHQ
 9gIAn3k2aoUzO1TUMMsFhqHtWxNZ64j0uQINBEQq+WcQCADYGvCxli/CTfTw5Fjv
 N/AI3x7kr2YI/uQBuwsmYMTYHhSyaxKhY7Xzl83y3PPgMzwX3Dr8Vn9JLrNN7f5d
 E24w+CYwzgRm4rHMx7oxUzj/aQuNNr7YTsrukZczgswy91uz/nBFUUZAZ6Ywc/Fa
 yi6E3a9Y94zWBPJKmA7Mf7+bvCnXwk7WIt2tBhAtVg3fRv1mzXyh9k9SJZ6jKBQK
 NvSzBV2J/+Dfo9MNI/Ar0uD+PsqLJX+qVDX2YSO9q91Vxfuiw1QH8CjyR1pMoRHa
 TD3GTCCXwb13iJPGq9eGjmdF2UlAhPpW7cL6zl0Gf9t7Ktfwt0J/IrmXkhGcU1k7
 z/kXAAQLB/9vquT3OTID9ucIjz50AldGDHYXAhzdsadqMD8PMcQIg6BBn92qrdqp
 PhVNFsNGmCLzpE6XLKDWYvmiJceJ8J/7puTh31HBP7qb9e+9gtVBbTHF03HZn84i
 2vacwJafkKSVDXdJ87w75jzRmOcTzFdSBK5zTwLQO/c+1yI4NFIf4+M1W16TwWEI
 N+9v7f4ptE95DlV4fXPEefD0ZxtNA8yZ2++CW9D4fhJ0dOu0N2igaAs2dNrB/IM+
 h1liDYCHo3ume2qO35kwj81NH4qn0v+RhHdz9PYEFxS3KDhtvTEituWNgN8N1Fud
 tLJR99WKCpeQcvF8ss5goi/xLLcseGTkiEYEGBECAAYFAkQq+WcACgkQtxRoTKSn
 +wIjBQCeMWoB4q0Tfh1eurPx6jZFSawbeJkAn1yx60QT1HQs1IW5VTeSGbQv4xcW
 =kAVQ
 -----END PGP PUBLIC KEY BLOCK-----