Action Bank™
Information and Code Download
Jason J. Corso
Action Bank™: A High-Level Representation of
Activity in Video
Human motion
and activity are extremely complex. Most promising recent approaches
are based on low- and mid-level features (e.g.,
local space-time features, dense point trajectories, and dense 3D
gradient histograms). In contrast, the Action Bank™ method is a new
high-level representation of activity in video. In short, it embeds a
video into an "action space" spanned by various action detector
responses, such as walking-to-the-left, drumming-quickly, etc. The
individual action detectors in our implementation of Action Bank™ are
template-based detectors using the action spotting work of Derpanis et
al., CVPR 2010. Each individual action detector correlation video
volume is transformed into a response vector by volumetric max-pooling
(3 levels, yielding a 73-dimensional vector); our library and methods
include 205 action detector templates in the bank, sampled broadly
in semantic and viewpoint space. Our paper
shows how a simple classifier like an SVM can use this high-dimensional
representation to effectively recognize realistic
videos of complex human activities.
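The 73-dimensional figure comes from pooling the maximum response over an octree of subvolumes: 1 cell at level 0, 8 at level 1, and 64 at level 2. Below is a minimal NumPy sketch of this idea; the function name and the even splitting scheme are illustrative, not the released implementation.

```python
import numpy as np

def volumetric_max_pool(response, levels=3):
    """Max-pool a 3D correlation volume over an octree of subvolumes.

    Level l splits each axis into 2**l parts, so 3 levels yield
    1 + 8 + 64 = 73 pooled values.
    """
    x, y, t = response.shape
    pooled = []
    for level in range(levels):
        parts = 2 ** level
        xs = np.array_split(np.arange(x), parts)
        ys = np.array_split(np.arange(y), parts)
        ts = np.array_split(np.arange(t), parts)
        # take the maximum response inside each subvolume
        for xi in xs:
            for yi in ys:
                for ti in ts:
                    pooled.append(response[np.ix_(xi, yi, ti)].max())
    return np.asarray(pooled)
```

The first entry of the pooled vector is simply the global maximum of the correlation volume; deeper levels add coarse spatial and temporal localization of where the detector fired.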
On this page, you will find downloads for our source
code, preprocessed versions of major vision
data sets, and a more detailed description of the method and the code.
News / Updates
Code / Download:
Action Bank™ Versions of Data Sets
Benchmark Results
We have tested Action Bank™ on a variety of activity recognition data sets. See the paper for full details. Here, we include a sampling of the results.
UCF Sports
UCF 50
HMDB 51
Publications:
FAQ / Help
Below we provide answers to frequent questions about running the code and using the output banked vectors.
Question 1:
I am running the software on a video and it hangs; what's going on?
The most likely answer is not that the system is
hanging but that it is still processing: the method is
relatively computationally expensive (especially in this pure
Python form).
Here, I run through an example to give you an idea of what you should
see...
I am processing the first video in the UCF50 BaseballPitch
class (named v_BaseballPitch_g01_c01.avi). This video is 320x240 and
has 107 frames; it is not a big video. I copied and renamed it to
/tmp/input.avi
python actionbank.py -s -c 2 -g 2 /tmp/input.avi /tmp/output
The -s flag means this is a single video and not a directory of videos. The -c 2 means use 2 cores for processing. The -g 2 means reduce the video by a factor of two before applying the bank detectors (but after featurizing).
Question 2:
I get this runtime error when I run the code:
actionbank/code/spotting.py:563: RuntimeWarning: invalid value encountered in divide
  Z = V / (V.sum(axis=3))[:,:,:,np.newaxis]
This warning means that there is no motion energy at all for a pixel in
the video, which is quite possible for typical videos. We
explicitly handle it in the subsequent lines of spotting.py by
checking for NaN and Inf. In other words, you can disregard the
runtime warning.
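To see why the warning is harmless, consider a standalone sketch of the normalization step (the array shapes here are toy values, not those used by spotting.py): a pixel whose motion energies sum to zero produces 0/0 = NaN, which is then replaced before further use.

```python
import numpy as np

# V: per-pixel oriented motion-energy measurements, shape (x, y, t, orientation)
V = np.random.rand(4, 4, 4, 7)
V[0, 0, 0, :] = 0.0  # a pixel with no motion energy at all

# normalize each pixel's energies to sum to one;
# the zero-energy pixel divides 0 by 0 and yields NaN
with np.errstate(invalid='ignore'):  # suppress the RuntimeWarning
    Z = V / (V.sum(axis=3))[:, :, :, np.newaxis]

# replace the NaN entries so downstream code sees finite values
Z[np.isnan(Z)] = 0.0
```

Wrapping the division in `np.errstate(invalid='ignore')` is optional; it only silences the warning, while the NaN replacement is what actually handles the degenerate pixels.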
Question 3:
The classify function call in ab_svm.py gives an
AttributeError. For example, when I run ab_kth_svm.py, I get the
following error:
Traceback (most recent call last): File "ab_kth_svm.py", line 99, in
This seems to be a change in the Shogun library interface. Our work
was performed with shogun version libshogun
(x86_64/v0.9.3_r4889_2010-05-27_20:52_4889). In newer
versions of Shogun, classify is replaced with apply. Note that we have
not yet tested this in-house, and results may vary.
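One way to work with both old and new Shogun builds is a small compatibility wrapper that prefers apply when it exists and falls back to classify otherwise. This is a sketch, not part of the released ab_svm.py:

```python
def classify_compat(svm, feats):
    """Call apply() on newer Shogun builds, classify() on older ones.

    svm   -- a trained Shogun classifier (or any object exposing
             either an apply() or a classify() method)
    feats -- the feature object to label
    """
    if hasattr(svm, 'apply'):
        return svm.apply(feats)
    return svm.classify(feats)
```

Replacing the direct `svm.classify(...)` call in ab_svm.py with `classify_compat(svm, ...)` should make the script version-agnostic, though, as above, we have not verified this against every Shogun release.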
We also want to point out that the ab_svm.py module is included
only as an example of how to use the Action Bank™ output for
classification. One can substitute a preferred classifier or
platform, such as Random Forests or Matlab.