7:00-7:25 PM Wednesday
Digital Watermarking
P. Gupta, D. Justice, M. Pfenninger

Digital watermarking deals predominantly with marking digital multimedia data in order to claim ownership. It has become important because the internet has greatly increased the amount and ease of public access to copyrighted materials, allowing protected materials to be accessed without the consent of the copyright holder. The focus of this project is on using block-based DCT and spectral methods to embed a white-noise watermark into test images, on using detection algorithms to verify the watermark's presence, and on using Wiener methods to attack the watermark. We expect that the spectral embedding methods will be more robust, and that the Wiener attack will be more successful against spectrally encoded images and less so against the frequency-embedded ones.

7:30-7:55 PM Wednesday
Evaluation of Search Algorithms for Block Matching in Motion Estimation
Eun Jung Kim, Yoon Chung Kim, Woojin Jeong

Motion estimation is an important technique in motion-compensated coding, which eliminates temporal redundancy for video compression. A block matching algorithm, which operates on a block-by-block basis, has been widely used because it is robust and relatively simple to compute; it has been adopted in the MPEG standards. The goal of the project is to evaluate a set of different search strategies for block matching used in motion estimation, such as the three step search, four step search, and hierarchical search strategies. The optimal, but time-consuming, full search algorithm will be included for comparison, and each of the above three strategies will be compared against it. In the evaluation, the above algorithms will be implemented and compared based on mean-squared error (MSE) and computational complexity. The experiments will be carried out for integer-pixel and half-pixel accurate motion compensation, using bilinear interpolation for sub-pixel positions. As test data, we will use several frames of a video sequence.
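As a concrete illustration of one of the strategies to be compared, here is a minimal sketch of the three step search on grayscale frames; the block size, initial step, and MSE cost function are illustrative choices, not the project's actual settings:

```python
import numpy as np

def mse(block, cand):
    # mean-squared error between two equal-size blocks
    d = block.astype(float) - cand.astype(float)
    return np.mean(d * d)

def three_step_search(cur, ref, y, x, bsize=8, step=4):
    """Estimate the motion vector for the bsize x bsize block of `cur`
    whose top-left corner is (y, x), by the three step search over `ref`."""
    block = cur[y:y + bsize, x:x + bsize]
    best_dy, best_dx = 0, 0
    while step >= 1:
        best_cost, move = None, (0, 0)
        # evaluate the 9 candidates around the current best displacement
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                cy, cx = y + best_dy + dy, x + best_dx + dx
                if cy < 0 or cx < 0 or cy + bsize > ref.shape[0] or cx + bsize > ref.shape[1]:
                    continue  # candidate falls outside the reference frame
                cost = mse(block, ref[cy:cy + bsize, cx:cx + bsize])
                if best_cost is None or cost < best_cost:
                    best_cost, move = cost, (dy, dx)
        best_dy += move[0]   # recenter on the best candidate
        best_dx += move[1]
        step //= 2           # halve the step: 4 -> 2 -> 1
    return best_dy, best_dx
```

The four step search and hierarchical search follow the same greedy pattern with different candidate sets; the full search simply evaluates every displacement in the window.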
8:00-8:25 PM Wednesday
Motion Estimation
O-Cheng Chang, Irving Chen, Charles Yana

Interested in motion estimation, we chose to study the influence of two parameters used in this technique: the block size B and the pixel accuracy A. Our aim is to become more familiar with motion estimation and to experimentally study the influence of both parameters on the performance of the overall system for a specific video sequence. This technique is applied to video compression using motion-compensated video coding. Motion-compensated prediction assumes that the current picture can be locally modeled as a translation of a picture at some previous time. We study the classical coding scheme, in which every block of a given frame has the same size BxB and all motion vectors are encoded with the same pixel accuracy A. We also fix the level of distortion D. We chose frame-adaptive coding, which means that we process one frame at a time. There exists an optimum B, say B*, and an optimum A (for fixed B), say A*. The aim of our project is to find B* and A* for a specific video sequence.

8:30-9:05 PM Wednesday
Face Detection in Color Images
Halim Elsaadi, Aniket Deepak Joshi, Ajay Paidi, Ankush Partop Soni, Vijay Swaminathan

Face detection in images is the first step in an intelligent system such as a face recognition system. In our project we propose an algorithm that finds the locations of a finite number of frontal human faces in a color image. Skin occupies a distinct and narrow spectrum in the YCbCr color domain. This property is used to segment skin from non-skin regions (background). The transformation to the YCbCr domain should also, in theory, remove the effects of differences in illumination from one image to another, although in our experience this is not true in all cases. The scope of the project is limited to group photographs taken at a constant distance from the camera.
This ensures the standardization of the size of each face, enabling us to use a standard face template. We also limit our database to images with similar illumination. Using a database (our training set), we create an 'average human face template'. The skin-segmented image will contain both face and non-face regions (such as exposed arms and legs). Standard noise removal techniques, such as median filtering, are used to improve the output of this stage. To eliminate the non-face regions, the human face template is correlated with the skin-segmented image. The hands and legs are expected to correlate poorly with the face template and can be eliminated by thresholding the correlated image. An area of high correlation can thus be hypothesized to be a face. An added advantage of correlation is that it allows us to estimate the center of each face by finding the local maxima in each region. However, if two or more faces are close to each other, the regions of high correlation will be connected and need to be separated. One solution to this problem is to take into account the size of the connected region: a size above a threshold can be regarded as "connected faces". Facial features such as eyes will be checked for in the regions hypothesized as faces, as a final check to eliminate false positives (i.e., false labeling of non-faces as faces). The regions that are confirmed to be faces are mapped back to the original image, and a mark is placed on each face. To reduce the computational load, we down-sample the original image to a standard size. The accuracy of this system can be measured statistically by comparing our estimate of the number of faces with the true number of faces in each image.

9:10-9:35 PM Wednesday
Optical Speech Recognition Using Lip Motion
Andrea Lo, Hirak Parikh, Sakina Zabuawala

Audio-visual communication systems are better than either speech or vision alone. However, both sensory channels are not always readily available.
For example, in noisy conditions, one would have to fall back on the visual system and try to recognize words by following the motion of the lips, as well as other cues such as facial expression and context. Lip reading in general improves the intelligibility of speech and can help listeners discern between spoken sounds that are easily confused in the audio domain (e.g., b vs. v, and m vs. n). This project exploits this fact in order to perform optical speech recognition using lip motion. Visual speech units, known as 'visemes', can be translated into sound units, or 'phonemes', based on a few factors. Articulation can be characterized by many parameters, but previous studies have shown that optical speech recognition can be effective using the following two features: a) the vertical separation between the two lips, and b) the horizontal elongation of the mouth. The proposed system will extract the vertical profile through the center of the lips and separate the optical flow (motion vectors) of the top and bottom lips. A waveform for mouth elongation as a function of time is generated by finding the corners of the mouth and computing the distance between them. A set of these three waveforms, representing the utterance of a word, is treated as a pattern for our classifier. Principal Component Analysis (PCA) is performed during the training phase to model each class of words in terms of a compact set of orthogonal basis vectors. In the recognition step, we generate the three characterizing waveforms for each new utterance. These features are then projected onto the basis and compared to the projection of the trained model, where a best fit is determined.

12:10-12:40 Thursday
Boundary Detection using Snakes
Jason Reynolds, Magnus Ulfarsson, Nicholas Wine, Timo Mayer

Boundary detection is important when properties like the shape, size, and orientation of an object are desired.
Since its introduction in 1987, the snake algorithm has proven to be one of the most powerful methods for solving this difficult problem. Geometrically, snakes are parametric contours within an image. The snake algorithm iteratively tries to find boundaries within an image by finding an energy minimum. In our project, we plan to experiment with the snake algorithm in order to determine how to maximize its effectiveness in boundary detection. One of our primary goals is to determine a general strategy that we can apply to images of a specific type, such as medical images, which will be a focus of our project. We will also test the capabilities of the snake algorithm when the image is pretreated with methods such as edge detection and median filtering.

12:45-1:15 Thursday
Fractal Image Compression
Hugo Shi, Srikanth Maddipati, Eric Drews, Rushali Parikh

Fractal image compression is a relatively recent technology based on the assumption that natural images contain considerable self-similarity, i.e., different parts of an image look alike. In a fractal-coding scheme, the encoder breaks an image into non-overlapping blocks called range blocks. The encoder then attempts to represent each range block as the image produced by iteratively applying a contractive transform. The decoder starts with an arbitrary initial image and iteratively applies the encoded transformations; the resulting image converges to an approximation of the original. This scheme can be intuitively viewed as creating a collage, in which we start with any initial image and then try to "cut" and "paste" portions of this initial image so that it looks like the original. This project will investigate the various parameters that go into a fractal-coding scheme, such as the block size of the partition, the types of transformations used, the convergence of the decoded image, and the bit allocation in the fractal code. Various initial images will be used for decoding, including the famous Dr. Neuhoff image.
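The encode/decode loop described above can be sketched as a toy fractal coder. The block sizes, the exhaustive domain search, and the clipping used to keep the transform contractive are illustrative choices here, not the project's actual parameters:

```python
import numpy as np

def downsample2(block):
    # average 2x2 neighbourhoods: an 8x8 domain block -> 4x4, matching the range size
    return block.reshape(4, 2, 4, 2).mean(axis=(1, 3))

def encode(img, rsize=4, dsize=8, dstep=8):
    """For each range block, find the domain block and affine map s*d + o
    (fit by least squares) that approximates it best."""
    h, w = img.shape
    code = []
    for ry in range(0, h, rsize):
        for rx in range(0, w, rsize):
            r = img[ry:ry + rsize, rx:rx + rsize].ravel()
            best = None
            for dy in range(0, h - dsize + 1, dstep):
                for dx in range(0, w - dsize + 1, dstep):
                    d = downsample2(img[dy:dy + dsize, dx:dx + dsize]).ravel()
                    A = np.stack([d, np.ones_like(d)], axis=1)
                    (s, o), *_ = np.linalg.lstsq(A, r, rcond=None)
                    s = np.clip(s, -0.9, 0.9)   # keep the transform contractive
                    err = np.sum((s * d + o - r) ** 2)
                    if best is None or err < best[0]:
                        best = (err, dy, dx, s, o)
            code.append(best[1:])
    return code

def decode(code, shape, n_iter=12, rsize=4, dsize=8):
    """Start from any image and iterate the stored transforms until convergence."""
    img = np.zeros(shape)
    for _ in range(n_iter):
        out = np.empty(shape)
        i = 0
        for ry in range(0, shape[0], rsize):
            for rx in range(0, shape[1], rsize):
                dy, dx, s, o = code[i]; i += 1
                d = downsample2(img[dy:dy + dsize, dx:dx + dsize])
                out[ry:ry + rsize, rx:rx + rsize] = s * d + o
        img = out
    return img
```

Because every stored map is contractive, the decoder converges to (approximately) the same image regardless of the image it starts from, which is what makes decoding from arbitrary initial images possible.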
1:20-1:45 Thursday
Image Impulse Noise Removal
Chun, Se Young; Nam, Gunwoo; Shin, Jaemin

Images are often corrupted by impulse noise. Nonlinear filters, such as median filters, have demonstrated good performance in removing impulse noise. However, since these filters are typically applied uniformly across an image, they tend to modify pixels undisturbed by noise. New, efficient algorithms have been developed to improve the performance of impulse noise removal based on a detection-estimation strategy. The signal-dependent rank-order mean (SD-ROM) filter is one of them. The filtering operation in the SD-ROM filter is conditioned on a state variable that indicates how much the given pixel is corrupted. A simple two-state approach is described in which the algorithm switches between the output of an identity filter and a ROM filter. This simple strategy is generalized into a multistage approach using weighted combinations of the identity and ROM filters. Another interesting algorithm is the progressive switching median (PSM) filter. We will show how this algorithm performs better than traditional methods, such as the median filter, the adaptive Wiener filter, and the out-range-smoothing filter, by simulating it on several images. The differences between these algorithms will be shown. Finally, we will address optimization methods for determining the weights in the multistage SD-ROM filter.

1:50-2:15 Thursday
Super-Resolution Image Reconstruction
Joel LeBlanc, Travis Smith, and Richard Spangler

We demonstrate how a sequence of low-resolution images with relative motion between the scene and the sensor can be processed into a single high-resolution image. Resolution improvement is achieved by iteratively refining a high-resolution estimate of the analog scene from a sequence of degraded, low-resolution frames. The final high-resolution image is viewed as a maximum likelihood estimate based on the low-resolution frames and a priori knowledge of the imaging process.
The super-resolution process involves estimating the relative motion between each of the low-resolution frames and a user-selected reference frame. This reference frame is converted into an initial high-resolution estimate, which is then iteratively improved by passing the estimate through the model of the imaging process and updating it based on the differences between the predicted and measured low-resolution frames. We demonstrate the super-resolution capability of this method using a movie collected from a digital camera. In addition, we quantify the method's performance by evaluating the mean squared error of the high-resolution estimate on a synthetic data set.
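The predict-compare-update loop described above can be sketched as a simplified iterative back-projection. This sketch assumes known integer shifts on the high-resolution grid, periodic boundaries, and 2x2 block-average downsampling as the imaging model; all of these are simplifying assumptions, since a real system must also estimate sub-pixel motion:

```python
import numpy as np

def downsample(x, f=2):
    # f x f block averaging: the assumed imaging (blur + decimation) model
    h, w = x.shape
    return x.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def upsample(y, f=2):
    # nearest-neighbour upsampling back to the high-resolution grid
    return np.kron(y, np.ones((f, f)))

def super_resolve(frames, shifts, f=2, n_iter=50, lr=1.0):
    """Iterative back-projection: frames[k] = downsample(roll(x, shifts[k])),
    with known integer high-resolution shifts (dy, dx)."""
    x = upsample(frames[0], f)          # initial high-resolution estimate
    for _ in range(n_iter):
        grad = np.zeros_like(x)
        for y, (dy, dx) in zip(frames, shifts):
            pred = downsample(np.roll(x, (dy, dx), (0, 1)), f)  # forward model
            err = upsample(y - pred, f) / f**2                  # back-project the residual
            grad += np.roll(err, (-dy, -dx), (0, 1))            # undo the shift
        x = x + lr * grad / len(frames)  # update the estimate
    return x
```

Each iteration pushes the predicted low-resolution frames toward the measured ones; with frames shifted by distinct sub-grid offsets, the combined measurements constrain detail that no single frame contains.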