University of Michigan
Computer Science & Engineering
Ann Arbor, MI 48109-2121
The CHAI lab focuses on behavior recognition from audio-visual speech. We have two main areas of study:
Emotion profiles (EPs) describe the emotions present in an utterance. This makeup is characterized not with hard semantic labels (e.g., "the speaker is angry"), but through an estimate of the degree of presence or absence of multiple emotional components. These components can be defined either by conventional semantic labels (e.g., angry, happy, neutral, sad) or by unsupervised clustering of the feature space. The resulting representations, which we refer to as profiles, are a multi-dimensional description of the emotional makeup of an utterance.
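The idea can be illustrated with a small sketch (the scores and the softmax mapping are assumptions for illustration, not the lab's actual method): per-class classifier confidences for one utterance are mapped to a graded profile whose components sum to one, so each component reflects a degree of presence rather than a hard label.

```python
import numpy as np

# Hypothetical raw confidence scores from four emotion classifiers
# (angry, happy, neutral, sad) for a single utterance. The numbers
# are illustrative only.
labels = ["angry", "happy", "neutral", "sad"]
scores = np.array([1.2, -0.4, 0.3, -1.1])

def emotion_profile(scores):
    """Map raw scores to a graded emotion profile via softmax, so each
    component expresses degree of presence rather than a hard label."""
    exp = np.exp(scores - scores.max())  # subtract max for stability
    return exp / exp.sum()

profile = emotion_profile(scores)
for lab, p in zip(labels, profile):
    print(f"{lab}: {p:.2f}")
```

An utterance that is mostly angry but partly neutral would thus yield a profile such as (0.6, 0.1, 0.2, 0.1) instead of the single label "angry".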
Our work focuses on methods to estimate the natural dynamics underlying emotional speech. We study utterance-level patterns and investigate methods to identify salient local dynamics. Our recent work (best student paper at ACM MM 2014) has focused on methods to characterize the dynamics of emotional facial movement in the presence of continuous speech.
Engineering models provide an important avenue through which to develop a greater understanding of human emotion. These techniques enable quantitative tests of current theories, highlighting features common to specific types of emotion perception and patterns that hold across emotion classes. Such computational models can inform the design of automatic emotion classification systems from speech and other forms of emotion-relevant data.
Research in emotion recognition seeks to develop insights into the temporal properties of emotion. However, automatic emotion recognition from spontaneous speech is challenging due to non-ideal recording conditions and highly ambiguous ground-truth labels. Further, emotion recognition systems typically work with noisy, high-dimensional data, making it difficult to find representative features and train an effective classifier. We tackle this problem using Deep Belief Networks (DBNs), which can model complex, non-linear, high-level relationships between low-level features. We build suites of hybrid classifiers that combine Hidden Markov Models (HMMs) and DBNs.
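One common way such hybrids are wired together (a minimal sketch of the general hybrid recipe, not necessarily the lab's exact architecture) is to have the network emit per-frame state posteriors, divide by the state priors to obtain scaled likelihoods, and let an HMM decode the temporal sequence with Viterbi. Random posteriors stand in for the DBN output below; the transition matrix and priors are assumed values.

```python
import numpy as np

def viterbi(scaled_lik, log_trans, log_init):
    """Most likely hidden-state sequence given per-frame scaled
    likelihoods (posterior / prior) and HMM log-transition scores."""
    T, S = scaled_lik.shape
    log_obs = np.log(scaled_lik + 1e-12)
    delta = log_init + log_obs[0]          # best log-score ending in each state
    back = np.zeros((T, S), dtype=int)     # backpointers
    for t in range(1, T):
        cand = delta[:, None] + log_trans  # (from_state, to_state)
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + log_obs[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

rng = np.random.default_rng(0)
T, S = 8, 3                                     # frames, hidden emotion states
posteriors = rng.dirichlet(np.ones(S), size=T)  # stand-in for DBN frame output
priors = posteriors.mean(axis=0)
scaled = posteriors / priors                    # hybrid trick: posterior / prior
log_trans = np.log(np.full((S, S), 0.1) + 0.7 * np.eye(S))  # sticky states
log_init = np.log(np.full(S, 1.0 / S))
states = viterbi(scaled, log_trans, log_init)
print(states)
```

The self-transition bias in the transition matrix encourages temporally smooth state sequences, which matches the intuition that emotional states evolve slowly relative to the frame rate.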
The proper design of affective agents requires an understanding of human emotional perception. Such an understanding provides designers with a method through which to estimate how an affective interface may be perceived given intended feature modulations. However, human perception of naturalistic expressions is difficult to predict. This difficulty is partially due to the mismatch between the emotional cue generation (the speaker) and cue perception (the observer) processes and partially due to the presence of complex emotions, emotions that contain shades of multiple affective classes.
An understanding of the mapping between signal cue modulation and human perception can facilitate design improvements for both emotionally relevant and emotionally targeted expressions in human-computer and human-robot interaction. This understanding will further the human-centered design necessary for widespread adoption of affective technology.
Aphasia is a common language disorder that can severely affect an individual's ability to communicate with others. Aphasia rehabilitation requires intensive practice accompanied by appropriate feedback, and such feedback is difficult to obtain outside of therapy sessions. We investigate methods to advance the development of intelligent systems capable of providing automatic feedback to patients with aphasia. We have collected (and continue to collect) a speech corpus in collaboration with the University of Michigan Aphasia Program (UMAP), and we study methods to automate transcription and to estimate speech intelligibility based on human perceptual judgment.
Speech patterns are modulated by the emotional and neurophysiological state of the speaker. There exists a growing body of work that examines this modulation in patients with depression, autism, and post-traumatic stress disorder. However, the majority of the work in this area focuses on the analysis of structured speech collected in controlled environments. Here we expand on the existing literature by examining bipolar disorder (BP). BP is characterized by mood transitions, varying from a healthy euthymic state to states characterized by mania or depression. The speech patterns associated with these mood states provide a unique opportunity to study the modulations characteristic of mood variation. We explore methods to collect unstructured speech continuously and unobtrusively by recording day-to-day cellular phone conversations, and to model these data to estimate the mood of individuals.
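A pipeline of this kind typically summarizes each call segment as a fixed-length feature vector before mood modeling. The sketch below is illustrative only (the specific features are assumptions, not the study's actual feature set): it computes simple frame-level energy and zero-crossing statistics, which are crude proxies for loudness and articulation-rate cues that are often mood-sensitive.

```python
import numpy as np

def segment_features(signal, frame=256):
    """Summarize a speech segment with simple frame statistics.
    Illustrative features only: energy (loudness proxy) and
    zero-crossing rate (crude voicing / articulation-rate proxy)."""
    n = len(signal) // frame * frame
    frames = signal[:n].reshape(-1, frame)
    energy = (frames ** 2).mean(axis=1)
    zcr = (np.diff(np.sign(frames), axis=1) != 0).mean(axis=1)
    return np.array([
        energy.mean(), energy.std(),  # overall level and its variability
        zcr.mean(), zcr.std(),        # rough rate cues and their variability
    ])

# Synthetic one-second 200 Hz tone at 8 kHz as a stand-in recording.
sr = 8000
sig = np.sin(2 * np.pi * 200 * np.arange(sr) / sr)
feats = segment_features(sig)
print(feats.shape)  # one fixed-length vector per segment
```

Each call thus yields one vector regardless of its duration, which makes the segments directly comparable inputs for a downstream mood classifier.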