A2D: A Dataset and Benchmark for Action Recognition and Segmentation with Multiple Classes of Actors
Authors: Chenliang Xu and Jason J. Corso
Email: {cliangxu,jjcorso}@umich.edu
URL: http://web.eecs.umich.edu/~jjcorso/r/a2d
Date: 06/01/2015
Version: 1.0

If you use the dataset, cite the following reference:

[1] Can Humans Fly? Action Understanding with Multiple Classes of Actors.
    C. Xu, S.-H. Hsieh, C. Xiong and J. J. Corso
    IEEE Conference on Computer Vision and Pattern Recognition, 2015

----------------------------------------
LICENSE AND COPYRIGHT

This dataset has been acquired for scientific research purposes and may be used for these purposes, including processing the data and showing it in publications and presentations.  The data may not be used for training commercial systems.   The dataset may not be republished in any form without the written consent of the authors.

If a different license is requested, please contact the dataset authors.


The videos in this dataset were downloaded from YouTube.  If any copyright is believed to have been infringed, please contact the University of Michigan and the authors. 


----------------------------------------
DATA

videoset.csv
    Each row describes one video clip. The first four columns record how the clip was cut from the raw YouTube video.
    1. VID - The 11-character YouTube video ID.
    2. Label - The primary actor-action label in the video.
    3 & 4. StartTime & EndTime - The timestamps in the original video between which the clip was cut for our use (e.g. computation and annotation).
    The next five columns describe the cut clip and the training/testing split.
    5 & 6. Height & Width - All clips are resized to a fixed height of 320 pixels while keeping the original aspect ratio, so the width is computed accordingly.
    7. Number of frames contained in a cut clip.
    8. Number of annotated frames.
    9. Usage - 0 for training; 1 for testing.
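
The column layout above can be read with a short script. The following is a minimal Python sketch, not part of the dataset tools. It assumes that videoset.csv is comma-separated and has no header row, and it keeps Label, StartTime and EndTime as text because their exact format is not specified here. The names ClipInfo and read_videoset are hypothetical.

```python
import csv
import io
from typing import List, NamedTuple


class ClipInfo(NamedTuple):
    vid: str            # column 1: YouTube video ID
    label: str          # column 2: primary actor-action label (kept as text)
    start_time: str     # column 3: kept as text (format is an assumption)
    end_time: str       # column 4: kept as text (format is an assumption)
    height: int         # column 5
    width: int          # column 6
    num_frames: int     # column 7
    num_annotated: int  # column 8
    usage: int          # column 9: 0 = training, 1 = testing


def read_videoset(handle) -> List[ClipInfo]:
    """Parse rows in the nine-column order described above (assumes no header row)."""
    clips = []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        vid, label, start, end, height, width, frames, annotated, usage = row[:9]
        clips.append(ClipInfo(vid, label, start, end, int(height), int(width),
                              int(frames), int(annotated), int(usage)))
    return clips


# Made-up placeholder row for illustration only; it is not taken from the dataset.
sample = io.StringIO("XXXXXXXXXXX,12,00:00:00,00:00:05,320,480,120,3,0\n")
clips = read_videoset(sample)
training = [c for c in clips if c.usage == 0]
testing = [c for c in clips if c.usage == 1]
```

For the real file, something like "with open('videoset.csv', newline='') as f: clips = read_videoset(f)" should work under the same assumptions.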

clips320H
    This folder contains the 3782 cut clips. We extract frames with the command "ffmpeg -i Example.mp4 Example/%05d.png". We encourage you to compare the number of frames you extract against the frame counts in videoset.csv to verify your extraction. You can also download our extracted frames; see pngs320H below. Our frame numbering starts from 00001.
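
The frame-count check suggested above can be automated. The following is a minimal Python sketch, assuming frames were extracted into one directory per clip with the "%05d.png" pattern. The function name frame_count_matches is hypothetical, and the expected count is the value from column 7 of videoset.csv.

```python
from pathlib import Path


def frame_count_matches(frame_dir, expected_frames):
    """Return True if frame_dir holds exactly 00001.png .. <expected_frames>.png."""
    found = {p.name for p in Path(frame_dir).glob("*.png")}
    # Frame numbering starts from 00001, matching the ffmpeg pattern %05d.png.
    wanted = {f"{i:05d}.png" for i in range(1, expected_frames + 1)}
    return found == wanted
```

A mismatch usually means the extraction dropped or duplicated frames, so the annotation file names would no longer line up with the extracted frames.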

pngs320H (optional)
    This folder contains all PNG frames extracted from clips320H. It is an optional download (~119GB).

----------------------------------------
ANNOTATION

Annotations/mat
    This folder contains all the annotations you need as Matlab mat files. The files are named by frame and organized by video.
    class - A cell array containing the object-instance-level actor-action labels for the current frame. Its ordering also defines the occlusion priority: where objects overlap, a later object is always on top of an earlier one. Note that for actors performing the "none" action, only the actor name is kept. For example, "adult" simply means "adult-none".
    id - An array of the integer labels corresponding to the ordering in "class".
    reBBox - A matrix of the object bounding boxes. Rows are objects corresponding to the ordering in "class". Columns are "x_min, y_min, x_max, y_max", with the coordinate origin at the top-left corner of an image.
    reMask - An array of 2D binary segmentation masks corresponding to the ordering in "class".
    reS_id - Image segmentation ground truth in which each pixel holds its actor-action label. Occlusion is already resolved.
    reS_col - The RGB color rendering of reS_id; see the LABEL ID & COLOR MAP section below for the mapping.
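
The occlusion rule stated for "class" (a later object is on top of an earlier one) can be illustrated by overlaying per-object masks in order. The following is a minimal Python sketch of that rule only. It is not the code used to produce reS_id, it uses plain nested lists instead of the arrays stored in the mat files, and the function name compose_label_map is hypothetical.

```python
def compose_label_map(masks, ids, height, width):
    """Overlay binary masks in order; later objects overwrite earlier ones."""
    label_map = [[0] * width for _ in range(height)]  # 0 is the background label
    for mask, label in zip(masks, ids):
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    label_map[y][x] = label
    return label_map


# Two toy 2x3 masks that overlap at row 0, column 1.
mask_a = [[1, 1, 0],
          [1, 1, 0]]
mask_b = [[0, 1, 1],
          [0, 0, 0]]
label_map = compose_label_map([mask_a, mask_b], [19, 75], height=2, width=3)
# label_map is [[19, 75, 75], [19, 19, 0]]: the second object wins the overlap.
```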

Annotations/col
    This folder contains images of reS_col.


----------------------------------------
LABEL ID & COLOR MAP

We encode label strings as integers as follows. Each actor-action label is a two-digit integer, with the tens digit for the actor and the ones digit for the action. For example, adult-crawling is 12. Note that only 43 actor-action labels are valid; for example, the label adult-flying (14) is not. The none action covers either no action or any action outside the eight actions we consider. The background label is 0.

Actors: adult (1), baby (2), ball (3), bird (4), car (5), cat (6), and dog (7).
Actions: climbing (1), crawling (2), eating (3), flying (4), jumping (5), rolling (6), running (7), walking (8) and none (9).
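
The encoding above amounts to ID = 10 * actor + action. The following is a minimal Python sketch of that arithmetic, using the actor and action numbers listed above. The function names encode_label and decode_label are hypothetical, and neither function checks whether a combination is one of the valid labels.

```python
ACTORS = {"adult": 1, "baby": 2, "ball": 3, "bird": 4, "car": 5, "cat": 6, "dog": 7}
ACTIONS = {"climbing": 1, "crawling": 2, "eating": 3, "flying": 4, "jumping": 5,
           "rolling": 6, "running": 7, "walking": 8, "none": 9}

ACTOR_NAMES = {number: name for name, number in ACTORS.items()}
ACTION_NAMES = {number: name for name, number in ACTIONS.items()}


def encode_label(actor, action="none"):
    """Tens digit is the actor, ones digit is the action."""
    return 10 * ACTORS[actor] + ACTIONS[action]


def decode_label(label_id):
    """Split a label ID back into (actor, action); 0 is the background."""
    if label_id == 0:
        return ("background", None)
    actor_digit, action_digit = divmod(label_id, 10)
    return (ACTOR_NAMES[actor_digit], ACTION_NAMES[action_digit])
```

For example, encode_label("adult", "crawling") gives 12, and decode_label(79) gives ("dog", "none").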

The table below lists all label combinations with their RGB color encoding. Of the 64 labels (63 actor-action combinations plus background), only 44 are valid (43 actor-action labels plus background).

NAME	ID	Valid	R	G	B
Background	0	1	0	0	0
adult-climbing	11	1	52	1	1
adult-crawling	12	1	103	1	1
adult-eating	13	1	154	1	1
adult-flying	14	0	205	1	1
adult-jumping	15	1	255	1	1
adult-rolling	16	1	255	51	51
adult-running	17	1	255	103	103
adult-walking	18	1	255	154	154
adult-none	19	1	255	205	205
baby-climbing	21	1	52	46	1
baby-crawling	22	1	103	92	1
baby-eating	23	0	154	138	1
baby-flying	24	0	205	184	1
baby-jumping	25	0	255	230	1
baby-rolling	26	1	255	235	51
baby-running	27	0	255	240	103
baby-walking	28	1	255	245	154
baby-none	29	1	255	250	205
ball-climbing	31	0	11	52	1
ball-crawling	32	0	21	103	1
ball-eating	33	0	31	154	1
ball-flying	34	1	41	205	1
ball-jumping	35	1	52	255	1
ball-rolling	36	1	92	255	51
ball-running	37	0	133	255	103
ball-walking	38	0	174	255	154
ball-none	39	1	215	255	205
bird-climbing	41	1	1	52	36
bird-crawling	42	0	1	103	72
bird-eating	43	1	1	154	108
bird-flying	44	1	1	205	143
bird-jumping	45	1	1	255	179
bird-rolling	46	1	51	255	194
bird-running	47	0	103	255	210
bird-walking	48	1	154	255	225
bird-none	49	1	205	255	240
car-climbing	51	0	1	21	52
car-crawling	52	0	1	41	103
car-eating	53	0	1	62	154
car-flying	54	1	1	82	205
car-jumping	55	1	1	103	255
car-rolling	56	1	51	133	255
car-running	57	1	103	164	255
car-walking	58	0	154	194	255
car-none	59	1	205	225	255
cat-climbing	61	1	26	1	52
cat-crawling	62	0	52	1	103
cat-eating	63	1	77	1	154
cat-flying	64	0	103	1	205
cat-jumping	65	1	128	1	255
cat-rolling	66	1	154	51	255
cat-running	67	1	179	103	255
cat-walking	68	1	205	154	255
cat-none	69	1	230	205	255
dog-climbing	71	0	52	1	31
dog-crawling	72	1	103	1	62
dog-eating	73	1	154	1	92
dog-flying	74	0	205	1	123
dog-jumping	75	1	255	1	153
dog-rolling	76	1	255	51	174
dog-running	77	1	255	103	194
dog-walking	78	1	255	154	215
dog-none	79	1	255	205	235
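
The table above can be turned into lookups for decoding the color images. The following is a minimal Python sketch, assuming the table has been saved as whitespace-separated text with the six columns shown. The function name parse_color_table is hypothetical, and the sample text repeats three rows from the table.

```python
def parse_color_table(text):
    """Parse rows of NAME, ID, Valid, R, G, B into two lookups."""
    color_to_id = {}
    valid_ids = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6 or not fields[1].isdigit():
            continue  # skip the header row and blank lines
        name, label_id, valid, r, g, b = fields
        color_to_id[(int(r), int(g), int(b))] = int(label_id)
        if valid == "1":
            valid_ids.add(int(label_id))
    return color_to_id, valid_ids


# Three rows copied from the table above, preceded by its header.
sample = (
    "NAME\tID\tValid\tR\tG\tB\n"
    "Background\t0\t1\t0\t0\t0\n"
    "adult-crawling\t12\t1\t103\t1\t1\n"
    "adult-flying\t14\t0\t205\t1\t1\n"
)
color_to_id, valid_ids = parse_color_table(sample)
```

With the full table as input, color_to_id maps a pixel color in an Annotations/col image back to its label ID, and valid_ids holds the labels marked as valid.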