A2D: A Dataset and Benchmark for Action Recognition and Segmentation with Multiple Classes of Actors

Authors: Chenliang Xu and Jason J. Corso
Email: {cliangxu,jjcorso}@umich.edu
URL: http://web.eecs.umich.edu/~jjcorso/r/a2d
Date: 06/01/2015
Version: 1.0

If you use the dataset, please cite the following reference:

[1] Can Humans Fly? Action Understanding with Multiple Classes of Actors.
    C. Xu, S.-H. Hsieh, C. Xiong and J. J. Corso.
    IEEE Conference on Computer Vision and Pattern Recognition, 2015.

----------------------------------------
LICENSE AND COPYRIGHT

This dataset has been acquired for scientific research purposes and may be used for these purposes, including processing the data and showing it in publications and presentations. The data may not be used for training commercial systems. The dataset may not be republished in any form without the written consent of the authors. If you need a different license, please contact the dataset authors.

The videos in this dataset were downloaded from YouTube. If you believe any copyright has been infringed, please contact the University of Michigan and the authors.

----------------------------------------
DATA

videoset.csv
Each row describes one video/clip. The columns are organized as follows.

The first four columns describe how the raw YouTube videos were processed:
1. VID - The 11-character YouTube video ID.
2. Label - The primary actor-action label of the video.
3 & 4. StartTime & EndTime - The timestamps in the original video between which the clip was cut for our use (e.g., computation and annotation).

The next five columns describe the cut clip and the training/testing split:
5 & 6. Height & Width - All clips are resized to a fixed height of 320 pixels while preserving the original aspect ratio; the width is computed accordingly.
7. Number of frames in the cut clip.
8. Number of annotated frames.
9. Usage - 0 for training; 1 for testing.
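A minimal Python sketch for reading videoset.csv, assuming the file is plain comma-separated text with no header row and exactly the nine columns in the order listed. Label and the two timestamps are kept as raw strings because their exact format is not specified here. The Clip class and the read_videoset, split_by_usage and check_frame_count helpers are hypothetical names; check_frame_count assumes each clip's frames were extracted into their own folder, as the ffmpeg command in the clips320H description does.

```python
import csv
import os
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Clip:
    vid: str            # 11-character YouTube video ID (column 1)
    label: str          # primary actor-action label, kept as raw text (column 2)
    start_time: str     # StartTime in the original video, raw text (column 3)
    end_time: str       # EndTime in the original video, raw text (column 4)
    height: int         # 320 for every clip (column 5)
    width: int          # width after the aspect-preserving resize (column 6)
    num_frames: int     # frames in the cut clip (column 7)
    num_annotated: int  # annotated frames (column 8)
    usage: int          # 0 = training, 1 = testing (column 9)


def read_videoset(lines: Iterable[str]) -> List[Clip]:
    """Parse videoset.csv rows (assumed: no header row, 9 columns)."""
    clips = []
    for row in csv.reader(lines):
        if not row:
            continue  # tolerate blank lines
        vid, label, start, end, h, w, nf, na, usage = (c.strip() for c in row[:9])
        clips.append(Clip(vid, label, start, end,
                          int(h), int(w), int(nf), int(na), int(usage)))
    return clips


def split_by_usage(clips: List[Clip]) -> Tuple[List[Clip], List[Clip]]:
    """Return (training, testing) clips according to the Usage column."""
    train = [c for c in clips if c.usage == 0]
    test = [c for c in clips if c.usage == 1]
    return train, test


def check_frame_count(clip: Clip, frame_dir: str) -> bool:
    """Compare the number of extracted PNG frames with column 7."""
    n = sum(1 for name in os.listdir(frame_dir) if name.endswith(".png"))
    return n == clip.num_frames
```

With a file opened as open("videoset.csv", newline=""), read_videoset yields one Clip per row and split_by_usage separates the training and testing clips.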
clips320H
This folder contains the 3782 cut clips used in our work. We extracted frames with the following command:
    ffmpeg -i Example.mp4 Example/%05d.png
We encourage you to compare the number of extracted frames with the frame counts in videoset.csv (column 7) to make sure the extraction is correct. You can also download our extracted frames; see pngs320H below. Frame numbering starts at 00001.

pngs320H (optional)
This optional download package (~119GB) contains all PNG frames extracted from clips320H.

----------------------------------------
ANNOTATION

Annotations/mat
This folder contains all annotations as MATLAB .mat files. The files are named by frame and grouped by their corresponding videos. Each file contains the following fields:

class - A cell array of the instance-level actor-action labels in the current frame. Its order also defines the occlusion priority: when objects overlap, a later object is always on top of an earlier one. For actors performing the "none" action, only the actor part is kept; for example, "adult" means "adult-none".
id - An array of integer label IDs, in the same order as "class".
reBBox - A matrix of object bounding boxes. Each row is one object, in the same order as "class"; the columns are x_min, y_min, x_max, y_max, with the coordinate origin at the top-left corner of the image.
reMask - An array of 2D binary segmentation masks, in the same order as "class".
reS_id - The image segmentation ground truth, in which each pixel holds its actor-action label ID. Occlusion is already resolved.
reS_col - The RGB color rendering of reS_id; see LABEL ID & COLOR MAP below for the mapping.

Annotations/col
This folder contains the reS_col images.

----------------------------------------
LABEL ID & COLOR MAP

We encode label strings as integers as follows. Each actor-action label is a two-digit number: the tens digit identifies the actor and the ones digit identifies the action. For example, adult-crawling is 12.
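The encoding is simple arithmetic; a minimal Python sketch follows, using the actor and action indices listed in this section. The dictionaries and the encode, decode and parse_class_string helpers are hypothetical names. parse_class_string assumes the entries of the "class" cell array are plain strings of the form actor-action, and it applies the rule from the ANNOTATION section that a bare actor name means actor-none.

```python
from typing import Dict, Tuple

# Actor and action indices as listed in this README.
ACTORS: Dict[int, str] = {1: "adult", 2: "baby", 3: "ball", 4: "bird",
                          5: "car", 6: "cat", 7: "dog"}
ACTIONS: Dict[int, str] = {1: "climbing", 2: "crawling", 3: "eating",
                           4: "flying", 5: "jumping", 6: "rolling",
                           7: "running", 8: "walking", 9: "none"}
ACTOR_INDEX = {name: i for i, name in ACTORS.items()}
ACTION_INDEX = {name: i for i, name in ACTIONS.items()}


def encode(actor: str, action: str) -> int:
    """Actor index in the tens place, action index in the ones place."""
    return 10 * ACTOR_INDEX[actor] + ACTION_INDEX[action]


def decode(label_id: int) -> Tuple[str, str]:
    """Split a two-digit label ID back into (actor, action) names."""
    if label_id == 0:
        raise ValueError("0 is the background label, not an actor-action pair")
    actor, action = divmod(label_id, 10)
    return ACTORS[actor], ACTIONS[action]


def parse_class_string(name: str) -> int:
    """Map a 'class' entry such as 'adult-crawling' to its label ID.

    A bare actor name such as 'adult' is treated as '<actor>-none'.
    """
    actor, _, action = name.partition("-")
    return encode(actor, action or "none")
```

Note that encode does not reject invalid combinations such as adult-flying (14); checking the result against the Valid column of the table is left out of this sketch.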
Only 43 actor-action combinations are valid; for example, there is no adult-flying (14) label. The "none" action covers both the absence of action and any action outside the eight actions we consider. The background label is 0.

Actors: adult (1), baby (2), ball (3), bird (4), car (5), cat (6), and dog (7).
Actions: climbing (1), crawling (2), eating (3), flying (4), jumping (5), rolling (6), running (7), walking (8), and none (9).

The table below lists every label with its RGB color. Of the 64 labels (background plus all 63 combinations of 7 actors and 9 actions), only 44 are valid: the 43 valid actor-action labels plus background.

NAME             ID  Valid    R    G    B
Background        0      1    0    0    0
adult-climbing   11      1   52    1    1
adult-crawling   12      1  103    1    1
adult-eating     13      1  154    1    1
adult-flying     14      0  205    1    1
adult-jumping    15      1  255    1    1
adult-rolling    16      1  255   51   51
adult-running    17      1  255  103  103
adult-walking    18      1  255  154  154
adult-none       19      1  255  205  205
baby-climbing    21      1   52   46    1
baby-crawling    22      1  103   92    1
baby-eating      23      0  154  138    1
baby-flying      24      0  205  184    1
baby-jumping     25      0  255  230    1
baby-rolling     26      1  255  235   51
baby-running     27      0  255  240  103
baby-walking     28      1  255  245  154
baby-none        29      1  255  250  205
ball-climbing    31      0   11   52    1
ball-crawling    32      0   21  103    1
ball-eating      33      0   31  154    1
ball-flying      34      1   41  205    1
ball-jumping     35      1   52  255    1
ball-rolling     36      1   92  255   51
ball-running     37      0  133  255  103
ball-walking     38      0  174  255  154
ball-none        39      1  215  255  205
bird-climbing    41      1    1   52   36
bird-crawling    42      0    1  103   72
bird-eating      43      1    1  154  108
bird-flying      44      1    1  205  143
bird-jumping     45      1    1  255  179
bird-rolling     46      1   51  255  194
bird-running     47      0  103  255  210
bird-walking     48      1  154  255  225
bird-none        49      1  205  255  240
car-climbing     51      0    1   21   52
car-crawling     52      0    1   41  103
car-eating       53      0    1   62  154
car-flying       54      1    1   82  205
car-jumping      55      1    1  103  255
car-rolling      56      1   51  133  255
car-running      57      1  103  164  255
car-walking      58      0  154  194  255
car-none         59      1  205  225  255
cat-climbing     61      1   26    1   52
cat-crawling     62      0   52    1  103
cat-eating       63      1   77    1  154
cat-flying       64      0  103    1  205
cat-jumping      65      1  128    1  255
cat-rolling      66      1  154   51  255
cat-running      67      1  179  103  255
cat-walking      68      1  205  154  255
cat-none         69      1  230  205  255
dog-climbing     71      0   52    1   31
dog-crawling     72      1  103    1   62
dog-eating       73      1  154    1   92
dog-flying       74      0  205    1  123
dog-jumping      75      1  255    1  153
dog-rolling      76      1  255   51  174
dog-running      77      1  255  103  194
dog-walking      78      1  255  154  215
dog-none         79      1  255  205  235
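To turn an Annotations/col image back into label IDs, a lookup from RGB triple to ID can be built from the valid rows of the table. This is a sketch under assumptions: the reS_col pixels are taken to use exactly the RGB triples listed above (an exact-match lookup breaks if an image was saved with lossy compression, in which case reS_id from the .mat file is the safer source), only rows marked Valid are included, and images are represented as nested lists of (R, G, B) tuples to keep the example dependency-free. ID_TO_RGB, RGB_TO_ID and colors_to_ids are hypothetical names.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

# Valid labels only (Valid = 1 in the table), including background 0.
ID_TO_RGB: Dict[int, RGB] = {
    0: (0, 0, 0),
    11: (52, 1, 1), 12: (103, 1, 1), 13: (154, 1, 1), 15: (255, 1, 1),
    16: (255, 51, 51), 17: (255, 103, 103), 18: (255, 154, 154),
    19: (255, 205, 205),
    21: (52, 46, 1), 22: (103, 92, 1), 26: (255, 235, 51),
    28: (255, 245, 154), 29: (255, 250, 205),
    34: (41, 205, 1), 35: (52, 255, 1), 36: (92, 255, 51), 39: (215, 255, 205),
    41: (1, 52, 36), 43: (1, 154, 108), 44: (1, 205, 143), 45: (1, 255, 179),
    46: (51, 255, 194), 48: (154, 255, 225), 49: (205, 255, 240),
    54: (1, 82, 205), 55: (1, 103, 255), 56: (51, 133, 255),
    57: (103, 164, 255), 59: (205, 225, 255),
    61: (26, 1, 52), 63: (77, 1, 154), 65: (128, 1, 255), 66: (154, 51, 255),
    67: (179, 103, 255), 68: (205, 154, 255), 69: (230, 205, 255),
    72: (103, 1, 62), 73: (154, 1, 92), 75: (255, 1, 153), 76: (255, 51, 174),
    77: (255, 103, 194), 78: (255, 154, 215), 79: (255, 205, 235),
}

# Inverse mapping; the colors of the valid labels are all distinct.
RGB_TO_ID: Dict[RGB, int] = {rgb: label for label, rgb in ID_TO_RGB.items()}


def colors_to_ids(pixels: List[List[RGB]]) -> List[List[int]]:
    """Convert a reS_col-style image (rows of RGB tuples) into label IDs.

    Raises KeyError if a pixel color is not one of the valid label colors.
    """
    return [[RGB_TO_ID[tuple(p)] for p in row] for row in pixels]
```

Raising KeyError on an unknown color is deliberate: a color outside the table most likely signals a compressed or corrupted image rather than a real label.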