Jason J. Corso
ACE - Active Clustering for Exploitation and Defense Forensics
People: Jason Corso (PI), Caiming Xiong, David Johnson
Past Members: Albert Chen
Funding: DARPA Computer Science Study Group (CSSG) (HR0011-09-1-0022 and N10AP20032). This project is kicking off in July 2010.
Objectives and Goals:
We propose a revolutionary new approach to data analysis and modeling for computer vision called Active Clustering. Whereas traditional machine learning methods typically require input from the user before computation begins and allow no subsequent interaction, our approach seeks dynamic input from the user during processing. Unlike traditional supervised approaches, which require extensive up-front effort from the user, our approach does not require the user to label large amounts of data. Rather, during processing we will ask the user simple questions that let us adapt our underlying representation of the sample space. Furthermore, in many defense settings, large amounts of data for a particular target of interest (e.g., the "black Mercedes that is pictured here") may not be available anyway.
In traditional unsupervised, or clustering, methods, the input from the user is
in the form of basic assumptions about the sample space. There are two relevant
problems with these methods. First, the assumptions typically require some
degree of technical know-how on the part of the user. However, many DoD/IC
end-user analysts would lack the necessary training to effectively map mission
sets to clustering assumptions. Second, without the correct feature space,
there is a disparity between the underlying distance function driving the
clustering and the user's semantics in most realistic settings. In other words,
the samples the clustering algorithm says are similar are in no way tied to the
semantics of the user. Our proposed Active Clustering methodology overcomes
both of these issues: the user is asked simple, intuitive questions about
grouping, which incorporates his or her semantics and requires no technical
knowledge of how the system works. These high-level questions are rigorously
tied to the underlying mathematics.
More recent methods that incorporate the user dynamically, such as Active
Learning methods, seek a classifier over a predefined set of classes, which
provides convenient mechanisms for selecting which samples the user should
label next. The same convenience does not exist for the clustering (i.e.,
generative) case because the estimate of uncertainty or information gain is not
as readily computed.
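To make the contrast concrete, the following is a minimal sketch of uncertainty sampling, one common selection mechanism in Active Learning. It assumes a classifier has already produced class posteriors for each unlabeled sample; the function names and the example posteriors are hypothetical and are not taken from this project.

```python
import math

def entropy(probs):
    """Shannon entropy of one class-posterior vector (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def most_uncertain(posteriors):
    """Index of the sample whose predicted class distribution is least certain."""
    return max(range(len(posteriors)), key=lambda i: entropy(posteriors[i]))

# Hypothetical posteriors over two predefined classes for three unlabeled samples.
posteriors = [[0.95, 0.05], [0.55, 0.45], [0.80, 0.20]]
query_index = most_uncertain(posteriors)  # the sample to show the user next
```

The selection rule depends on having a posterior over a fixed set of classes, which is exactly what is missing in the clustering setting.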
The main objective of this project is to develop the Active Clustering approach to video and image exploitation and forensics. The key questions to be answered in the new field of active clustering are (i) appropriate distance function formulation, (ii) clustering methodology, (iii) active user querying, and (iv) integration of user responses into learning. The inquiry will involve realistic data corpora and validation criteria.
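As an illustration only, the four questions above can be lined up in a small loop. The sketch below is not the project's method: it assumes a fixed weighted Euclidean distance for (i), a simple threshold-based merging scheme for (ii), a "pair closest to the merge threshold" query rule for (iii), and pairwise must-link / cannot-link constraints for (iv). All names, the threshold, and the query budget are hypothetical.

```python
import math
from itertools import combinations

def distance(a, b, weights):
    """(i) Weighted Euclidean distance; the weights stand in for a learned metric."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def cluster(points, weights, threshold, must, cannot):
    """(ii) Merge points closer than `threshold`, honouring user constraints.
    Returns one label per point (the index of the group's representative)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def conflicts(ra, rb):
        # True if merging the two groups would join a cannot-link pair.
        return any({find(i), find(j)} == {ra, rb} for i, j in cannot)

    def union(i, j):
        ra, rb = find(i), find(j)
        if ra != rb and not conflicts(ra, rb):
            parent[rb] = ra

    for i, j in must:  # pairs the user confirmed are merged first
        union(i, j)
    pairs = sorted(combinations(range(len(points)), 2),
                   key=lambda p: distance(points[p[0]], points[p[1]], weights))
    for i, j in pairs:  # then merge close pairs, nearest first
        if distance(points[i], points[j], weights) <= threshold:
            union(i, j)
    return [find(i) for i in range(len(points))]

def next_query(points, weights, threshold, asked):
    """(iii) Pick the unasked pair whose distance is closest to the threshold."""
    candidates = [p for p in combinations(range(len(points)), 2) if p not in asked]
    if not candidates:
        return None
    return min(candidates,
               key=lambda p: abs(distance(points[p[0]], points[p[1]], weights) - threshold))

def active_cluster(points, oracle, threshold=1.0, budget=3):
    """Ask up to `budget` same-group questions, then cluster under the answers."""
    weights = [1.0] * len(points[0])  # assumption: the metric itself is kept fixed
    must, cannot, asked = set(), set(), set()
    for _ in range(budget):
        pair = next_query(points, weights, threshold, asked)
        if pair is None:
            break
        asked.add(pair)
        # (iv) fold the yes/no answer back in as a pairwise constraint
        (must if oracle(*pair) else cannot).add(pair)
    return cluster(points, weights, threshold, must, cannot)

# Hypothetical data: four evenly spaced points that the user sees as two groups.
points = [(0, 0), (1, 0), (2, 0), (3, 0)]
truth = [0, 0, 1, 1]
labels = active_cluster(points, lambda i, j: truth[i] == truth[j],
                        threshold=1.5, budget=3)
```

With three questions, points 0 and 1 end up sharing one label and points 2 and 3 another; with `budget=0` the distance alone chains all four points into a single cluster. A fuller treatment would also use the answers to adapt the distance function rather than only constraining the merges.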
Defense Relevance.
Exploitation and forensics comprise the core defense relevance of our proposal, with broad applications such as persistent surveillance and urban C2. Our focus is the VIMEXF Problem: given a large corpus of video and image data, we want to allow the analyst (level 1, 2 or 3) to search through it quickly. Possible queries include searching for standard mission elements or selecting the set of clips containing a particular person or feature. Furthermore, the approach must scale well and adapt to new data on-line without full reindexing. We stress that the emphasis is on perceptual and semantic content rather than on existing metadata such as the geospatial coordinates of the field of view.
Publications: