Workshop Chairs

If you have any questions, please contact us!

Talk preparation

Most attendees will give a talk on their current research. Faculty, Postdocs, and PhD students will give 10-minute talks. Masters and Undergraduate students will give 2-minute lightning talks. All talks will be timed, with an iPad displaying your time remaining.

We will be using a single laptop on the day for all talks to avoid losing time between speakers. The computer will be a MacBook Pro with macOS 10.12, Keynote 7.0.5, and Microsoft Office 2016 for Mac. Please set the filename of your presentation to start with your name, e.g. Jane_Doe.key, and provide us with BOTH the presentation file and a version that has been exported to PDF. We have prepared several talk templates:

Lightning talk specific guidelines

All lightning talks will be presented as PDFs and contain a title slide and either 1 or 2 content slides.

This will allow us to combine them all into one long presentation, saving time on the day. The advantage of PDF is that you can use any slide creation program you want, but the downside is that you cannot have complex animations. We recommend the three slides be: a title slide, a project description slide, and a results slide. If simple animations cause these three slides to become more than three when exported to PDF, that's fine.


Start Time Event Speaker Title
13:10 Welcome
13:20 Faculty talks: Dragomir Radev, Rada Mihalcea, Steve Abney
NLP Projects in CLAIR
I will briefly introduce some of the current research projects in the CLAIR (Computational Linguistics And Information Retrieval) lab such as NLP for collective discourse, graph-based NLP, signed network analysis, sentence similarity clustering, NLP for scientometrics, and dialogue systems for student advising. I will also talk about some current educational activities such as the NACLO competition, the Intro to NLP Coursera MOOC, and the All About NLP web site.
Research in the LIT Lab
I will briefly describe the current research projects in the Language and Information Technologies lab, mostly falling under one of the three main research thrusts in the group: Computational Social Science, Multimodal Behavior Analysis, and Conversational Technologies.
13:55 PhD talks: Shibamouli Lahiri, MeiXing Dong, Aparna Garimella, Catherine Finegan-Dollak, Laura Wendlandt
Identifying Usage Expression Sentences in Product Reviews
Usage of a product is intimately related to the psychology of a consumer, and it has been observed in the psychology literature that a consumer's self-image correlates with product use. We created a human-annotated gold standard dataset of 565 Amazon reviews spanning five distinct product categories. Our dataset consists of more than 3000 annotated sentences. We introduce a system that classifies sentences as "usage" or otherwise. After extensive feature tuning, we came up with a set of simple, robust, scalable, and language-independent features that beat other more complex and language-sensitive features, as well as a hand-crafted decision list baseline, by a wide margin. We showed the efficacy of our approach using importance ranking of features.
Connecting Alumni and Initiatives
Alumni donations are an important source of funding for causes and initiatives throughout campus. However, alumni may not be motivated to contribute when they receive too many solicitations, and campus groups do not know who is most likely to contribute. We explore the factors that correlate with alumni giving to help connect alumni and causes.
Identifying cultural differences in word usage
Personal writings have inspired researchers in the fields of linguistics and psychology to study the relationship between language and culture to better understand the psychology of people across different cultures. In this paper, we explore this relation by developing cross-cultural word models to identify words with cultural bias – i.e., words that are used in significantly different ways by speakers from different cultures. Focusing specifically on two cultures, the United States and Australia, we identify a set of words with significant usage differences, and further investigate these words through feature analysis and topic modeling, shedding light on the attributes of language that contribute to these differences.
Seq2Seq Semantic Parsing for Natural Language to SQL
A dialog system for student advising can store helpful information in a relational database. How should we translate English questions into SQL queries to access that information? We propose building upon a sequence-to-sequence semantic parser by adding attention over the database schema.
Inferring User Attributes from Images
In this talk, I will discuss the relationship between image attributes and other latent user dimensions, including personality and gender. From images, we extract a wealth of attributes, including which objects are in the image, how many people are present, what colors are the strongest, and the scene of the image. We then use correlation techniques to analyze the relationship between these attributes and user dimensions, concluding that images do indeed provide us substantial information about the people who post them.
14:50 Tea Break
15:20 Faculty talks: Berrin Yanikoglu, Vinod Vydiswaran
Domain Adaptation of Sentiment Polarities
I will cover two pieces of previous work: 1) adapting a general-purpose sentiment lexicon to a specific domain, and 2) exploiting sentence-level features in document-level sentiment analysis.
Natural Language Processing for Health-related Insights
There is a growing interest and need for deeper natural language processing over health related corpora -- from clinical notes, biomedical research articles, online health resources, to patient-authored messages and peer-to-peer communication. I will briefly motivate the need and introduce some of the recent research projects being done by my research team in health-related text mining and natural language processing over the last couple of years.
15:45 Postdoctoral talk: Veronica Perez-Rosas
Modeling counselor behavior in psychotherapy interventions through language and speech analysis
Behavioral interventions are a promising approach to address public health issues such as smoking cessation, increasing physical activity, and reducing substance abuse. This talk presents an overview of our project on modeling counselor behavior in health care interventions. I will focus on our recent work on the analysis of counselor empathy. The proposed approach includes the use of linguistic and acoustic features as well as measures for verbal and linguistic accommodation during counselor and client interactions.
16:00 PhD talks: Charles Welch, Cheng Li, Steven Wilson
AMR Semantic Parsing with World Knowledge
Semantic parsers have recently been used to generate graph representations from sentences using abstract meaning representation (AMR). One of the highest performing systems over the past few years, CAMR, uses a transition-based method to convert a dependency parse into an AMR graph. Our work focuses on improving this system in two ways. First, we have changed the learning method from an averaged perceptron to a logistic regression model trained with a max-margin objective function, which has been shown to improve performance in previous parsing work. Second, we intend to incorporate features in transition classification that contain world knowledge. This entails the generation of these features from various corpora and a modification to the model to allow it to take advantage of continuous-valued features.
Deep Memory Networks for Attitude Identification
We consider the task of identifying attitude towards a given set of entities from text. Conventionally, this task is decomposed into two separate subtasks: target detection, which identifies whether each entity is mentioned either explicitly or implicitly in the text, and polarity classification, which classifies the exact sentiment towards an identified entity (the target) as positive, negative, or neutral. Instead, we show that attitude identification can be solved with an end-to-end machine learning architecture, with the two subtasks interleaved by a deep memory network. In this way, signals produced in target detection provide clues to polarity classification, and conversely, polarity provides feedback to the first subtask. Moreover, the treatments of the set of targets also influence each other -- the learned representations may share the same semantics for many targets but vary for some targets. The proposed deep memory network outperforms models that do not consider the interactions between the subtasks or among the targets, including conventional methods and state-of-the-art deep learning models.
Cultural Influences on the Measurement of Personal Values through Words
The ever-growing collection of publicly accessible web data continually provides new and exciting opportunities to study how people are thinking, behaving, and feeling. We can now study psychological traits and their links to behaviors on a larger scale than ever before through the analysis of social media data. However, it is important to consider that when collecting samples from diverse people groups, culture may have a strong influence on the results and their interpretation. Cultural groups differ not only in the way that they use language, but also in their personal values and everyday behaviors. We use a computational content analytic approach to examine the connections between people’s core values and everyday behaviors with the aim of exploring the role a person’s cultural background plays in these value-behavior relationships. Our results show that a topic modeling approach can be used to investigate the role that geographic location plays in value-behavior relationships, and how the cultural background of writers can affect the conclusions being made in computational sociolinguistic studies.
16:35 Undergraduate talks: Marissa Inga, Joseph Peper, Sesh Sadasivam, Thomas Searle, Yulin Xie, Harry Zhang, Noriyuki Kojima
Mass / Count Problem
This presentation addresses the problems a program has regarding count and mass nouns and some of the methods already used to fix this issue. The human annotation statistics are also addressed.
Exploring the Ubuntu IRC Dataset – Modelling Dialogue
The Ubuntu IRC chat log is a massive dataset consisting of conversations between users of the Ubuntu Linux distribution. The goal of the project is to use supervised learning to model dialogues within this dataset and to generate responses to previous dialogue utterances.
SQL Equivalence
While translating natural language queries to SQL (a part of the IBM Project Sapphire), we needed a method for checking the equivalence of two SQL queries, so that we could verify that the machine-generated SQL queries were equivalent to the correct SQL queries in our training and testing sets. Although it may be possible to translate SQL queries into Relational Algebra and then prove their equivalence, it is much more straightforward to simply execute the queries and compare their results. The latter approach is not scalable, but is fast enough for the purposes of this project.
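The execute-and-compare idea can be illustrated with a minimal Python sketch. This is not the project's actual code: the `course` table, its rows, the three queries, and the helper name `results_match` are all hypothetical, and SQLite stands in for whatever database the project uses. Note that matching results on one database only show that two queries agree on that data, not that they are equivalent in general.

```python
import sqlite3
from collections import Counter


def results_match(conn, query_a, query_b):
    """Return True if both queries yield the same multiset of rows.

    Row order is ignored, so queries that differ only in ORDER BY match.
    """
    rows_a = conn.execute(query_a).fetchall()
    rows_b = conn.execute(query_b).fetchall()
    return Counter(rows_a) == Counter(rows_b)


# Hypothetical toy database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (id INTEGER PRIMARY KEY, name TEXT, credits INTEGER);
    INSERT INTO course VALUES (1, 'NLP', 4), (2, 'IR', 3), (3, 'ML', 4);
""")

gold = "SELECT name FROM course WHERE credits = 4"
generated = "SELECT name FROM course WHERE credits > 3 ORDER BY name DESC"
wrong = "SELECT name FROM course WHERE credits >= 3"

same = results_match(conn, gold, generated)   # agree on this data
different = results_match(conn, gold, wrong)  # extra row, so no match
print(same, different)
```

Comparing multisets rather than lists is a design choice made here so that row order does not matter; if ordering is part of the expected answer, the row lists could be compared directly instead.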
Image Funny Quotes
Image Funny Quotes is a course project for EECS498 taught by Professor Rada Mihalcea. Three teammates and I are working together to explore how to classify humor for image-quote pairs. We collect and annotate image-quote pairs and experiment with different features and classification methods. We look forward to presenting a working system by the end of the semester that can select an image for a user-provided quote to create a funny effect.
ACL Anthology Network (AAN)
The ACL Anthology Network (AAN) is a well organized corpus consisting of decades of ACL papers from conferences, journals, and workshops. Additionally, AAN is being expanded to include learning materials and a hierarchy of topics to expedite learning in NLP and related subtopics. Our work ranges from creating meta-lists for new papers to fixing bugs as we update the website.
(combined with previous talk)
16:55 Group Photo
17:00 Tea Break
17:30 Postdoctoral talks: Jonathan Kummerfeld, Tanmay Basu, Mohamed Abouelenien
Crowd Paraphrasing
Multi-Formalism Parsing
This talk covers two projects. First, data collection can be expensive and difficult, particularly when experts are required. We are exploring a way to expand a dataset using non-experts to paraphrase an initial set of data. Second, I will describe work on extending my graph parsing algorithm beyond constituency parsing to cover AMR, which involves novel approaches to learning without an alignment between the sentence and the abstract structure.
Identifying Severity in Neuropsychiatric Clinical Notes
In this work, we studied the problem of classifying symptom severity from neuropsychiatric clinical records using supervised machine learning approaches. The data is released as part of the Neuropsychiatric Genome-Scale NLP challenge 2016 organized by i2b2 (Informatics for Integrating Biology and the Bedside), an NIH-funded national center at Partners Health Care System. The participants of the shared task are provided with the raw clinical notes from the initial psychiatric evaluation of patients. The patient records in the training data are categorized into four classes, viz., absent, mild, moderate, and severe. As part of our investigation, we explored features derived from the template-based text in the clinical notes, a medical thesaurus such as UMLS, and corpus-based weighting in the training set. A Random Forest classifier is trained on these feature sets to classify the test samples. The experimental results suggest that the proposed framework achieves high accuracy.
Multimodal Analysis of Human Behavior
Human behavior is a complex matter and difficult to model. In this talk, I will describe our research in this area for a variety of applications such as gender-based deception detection, stress detection, multimodal sensing of thermal discomfort, students’ behavior, and driver’s alertness. We follow a multimodal approach in order to integrate discriminative features that are more capable of indicating the corresponding behavior. I will briefly describe the data collection process as well as some interesting observations and results.
18:05 PhD talks: Nikita Bhutani, Mahmoud Azab, Luke Brandl, Rui Zhang
Domain-independent framework for extracting, representing and querying knowledge from text
Traditional information retrieval and query mechanisms are insufficient to meet today’s complex information needs and unlock the value in the unprecedented volume of text data on the web today. We need semantic technologies capable of identifying and organizing knowledge encoded in the text to support these complex needs. We need to develop structured representations that go beyond current efforts to curate knowledge about entities and relations. We also need effective and expressive query mechanisms that provide declarative access to this knowledge.
Structured Matching for Phrase Localization
We introduce a new approach to phrase localization: grounding phrases in sentences to image regions. We propose a structured matching of phrases and regions that encourages the semantic relations between phrases to agree with the visual relations between regions. We integrate structured matching with neural networks to enable end-to-end training. We evaluate the model on the Flickr30K Entities dataset.
The ACL Anthology Network: A Corpus and Learning Tool
The ACL Anthology Network (AAN) is a well organized corpus consisting of decades of ACL papers from conferences, journals, and workshops. Additionally, AAN is being expanded to include learning materials and a hierarchy of topics to expedite learning in NLP and related subtopics.
Interleaving Speaker Thoughts via RNNs in Multi-Turn Dialogs
We focus on the task of response selection for retrieval-based conversation agents by modeling unstructured, multi-turn, two-party dialogs. Current response selection approaches are limited to small contexts and view the multi-turn context as a long sequence of words. Motivated by this observation, we propose the Interleaving-Thoughts Recurrent Neural Network (IT-RNN). Our model constructs representations for individual utterances via low-level networks, and then high-level layers produce the intrinsic dynamics of conversations via intention flows and information exchanges. This enables neural networks to deal with arbitrary lengths of conversation history and to capture discourse information and utterance dependency between the two speakers. We build IT-RNN on top of vanilla RNN, GRU, and LSTM units. We evaluate our model on two publicly available multi-turn dialog corpora. Experimental results show that our system significantly outperforms traditional Information Retrieval methods and other neural network baselines.
18:50 Masters talks: Karthik Ramanathan, Xinyan Zhao
A template-based system for converting Natural Language Queries to MySQL queries
IBM Project Sapphire is a joint project between IBM and the University of Michigan to build an automated course advising system that provides human-like advising to students. The template-based system for converting NL queries to MySQL queries is being built as part of IBM Project Sapphire, for scenarios where the student asks questions whose answers can be looked up in a database. For example, the answer to a question like "What courses are being offered next semester?" can be looked up in a database and given to the student. The system was originally built by the research group of Jagadish H.V. In this project, we are working on adapting the system for the course advising domain and adding statistical methods to improve the performance of the system in this domain.
Hybrid Clinical Named Entity Recognition System
We are developing a hybrid clinical NER system that identifies Protected Health Information (PHI) with four components: a preprocessing component, a regular expression component, a CRF component, and a combiner.
18:55 Dinner