Graphs naturally represent information ranging from links between webpages, to friendships in social networks, to connections between neurons in our brains. These graphs often span billions of nodes and interactions between them. Within this deluge of interconnected data, how can we summarize their patterns and understand them? How can we efficiently visualize them? How can we explore multiple graphs at the same time, and understand their differences?
Find Out More About the Current ProjectsPhD, Purdue
Google, Mountain View
Amazon
Palantir
The GEMS Lab is recruiting motivated and hard-working students interested in graph mining, and large-scale data analytics!
If you are interested in joining the group, please apply to CSE at the University of Michigan, Ann Arbor. If you are an undergrad or grad student at the University of Michigan, and you are interested in any of the papers or projects listed on this page, send an email with your interests and CV to
gemslab-opportunities@umich.edu.
Recent advances in computing resources have made processing enormous amounts of data possible, but human ability to quickly identify patterns in such data has not scaled accordingly. Thus, computational methods for condensing and simplifying data are becoming an important part of the data-driven decision making process. We focus on succinctly describing million-node graphs with just a few "important" or "interesting" structures. We are also exploring connections between summarization and clustering methods and how to summarize complex data consisting of multiple modalities (e.g., graph structure and time series of non-structural node behavior).
Yike Liu, Abhilash Dighe, Tara Safavi, Danai Koutra (2016). A Graph Summarization: A Survey. In: Arxiv.
Yike Liu, Tara Safavi, Neil Shah, Danai Koutra (2016).
Reducing Million-Node Graphs to a Few Structural Patterns: A Unified Approach. In: KDD Workshop on Mining and Learning with Graphs (MLG).
Shah, Neil et al. (2015). TimeCrunch: Interpretable Dynamic Graph Summarization. In: KDD.
Liu, Yike, Neil Shah, and Danai Koutra (2015).
An Empirical Comparison of the Summarization Power of Graph Clustering Methods. In: NIPS Networks in the Social and Information Sciences Workshop.
Yike Liu
Di Jin
Tara Safavi
Neil Shah (CMU)
Past Contributors: Abhilash Dighe, Santosh Mohan
Zhiyuan He (anomaly detection)
Di Jin
Past Contributors: Paige Frederick, George Jiang (UG-summer intern), Xiaochen Yu, David Zhang (UG)
The initial steps of data exploration often include visual and statistical analysis, but the size of the data makes exploration time-consuming and often unrealistic, requiring scalable tools and methods. In many scientific domains the deluge of the collected data often leads to semi-arbitrary selection of a small set of its aspects to explore. We focus on two main directions: (i) designing new, efficient exploratory methods for reducing the dimensionality of graph data by leveraging domain knowledge and expectations; and (ii) developing a visualization-based platform that allows users to interact with their graph data and guides them to explore outlying observations. Another direction of interest is crowd-assisted graph analytics.
Di Jin, Christos Faloutsos, Danai Koutra, Ticha Sethapakdi (2016).
PERSEUS3: Visualizing and Interactively Mining Large-Scale Graphs. In: KDD (MLG).
Danai Koutra, Di Jin, Yuanchi Ning, Christos Faloutsos (2015). Perseus: An Interactive Large-Scale Graph Mining and Visualization Tool. In: VLDB (demo).
Sai Gouravajhala, Danai Koutra, Walter S. Lasecki. (2016).
Towards Crowd-Assisted Data Mining. In: CHI (HCML).
U Kang, Jay-Yoon Lee, Danai Koutra, Christos Faloutsos (2014). Net-Ray: Visualizing and Mining Web-Scale Graphs. In: PAKDD; Tainan, Taiwan.
Many mining tasks for large-scale graphs involve solving iterative equations efficiently. For example, classifying entities in a network setting with limited supervision, finding similar nodes, and evaluating the importance of a node in a graph, can all be expressed as linear systems that are solved iteratively. The need for faster methods due to the increase in the data that is generated has permeated all these applications, and many more. Our focus is on speeding up such methods for large-scale graphs both in sequential and distributed environments. Among other directions, we explore the trade-off between computational complexity and accuracy, and extensions of the methods to time-evolving networks.
Koutra, Danai et al. (2011).
Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms. In: ECML PKDD.
Wolfgang Gatterbauer, Stephan Guennemann, Danai Koutra, Christos Faloutsos (2015).
Linearized and Single-Pass Belief Propagation. In: VLDB.
Di Jin
Yujun Yan
Past Contributors: Charles Wang
Paige Frederick
Di Jin
Lisa Jin
Tara Safavi
Farah Tahmasbi
Junhao Wang
Advances in magnetic resonance imaging and diffusion tensor imaging, which track neural oscillations and white matter tractography in the brain respectively, have led to the generation of brain maps that describe the brain's network organization. The resulting networks (brain maps or connectomes) consist of hundreds of thousands or even millions of connections. Abnormalities in the connectomes reveal disorders (e.g., depression, Alzheimer's disease). Thus, the detection of abnormalities is a promising direction for defining objective measures for diagnosis and treatment in psychiatric neuroscience, which lacks such measures. We focus on scalable methods for summarizing abrupt transitions in connectivity profiles at micro-scales (minutes or even seconds), which are thought to correspond to configurations of psychological variables involved in cognition, attention, memory, and mood. We are also exploring novel, efficient and robust methods for inferring networks from multiple time series (e.g., fMRI signals).
Danai Koutra, Neil Shah, Joshua T. Vogelstein, Brian Gallagher, Christos Faloutsos (2016).
DeltaCon: A Principled Massive-Graph Similarity Function with Attribution. In: TKDD.
Danai Koutra, Joshua Vogelstein, Christos Faloutsos (2013). DeltaCon: A Principled Massive-Graph Similarity Function. In: SDM.
Danai Koutra, Yu Gong, Sephira Ryman, Rex Jung, Joshua Vogelstein, Christos Faloutsos. (2013).
Are all brains wired equally?. In: OHBM.
Graph similarity or network alignment are core tasks for various data mining tasks, such as anomaly detection, classification, clustering, transfer learning, sense-making, de-identification, and more. Graph similarity methods quantify changes in networks (e.g., network traffic in computer network that might signify an attack) or differences between graphs (e.g., differences between connectomes of healthy and non-healthy subjects). Graph alignment aims to find the correspondence bewteen the nodes of two graphs. Our current focus is on scalable methods for aligning multiple networks with side information, such as node and edge attributes.
Danai Koutra, Neil Shah, Joshua T. Vogelstein, Brian Gallagher, Christos Faloutsos (2016).
DeltaCon: A Principled Massive-Graph Similarity Function with Attribution. In: TKDD.
Danai Koutra, Joshua Vogelstein, Christos Faloutsos (2013). DeltaCon: A Principled Massive-Graph Similarity Function. In: SDM.
Danai Koutra, Hanghang Tong, David Lubensky. (2013).
BIG-ALIGN: Fast Bipartite Graph Alignment. In: ICDM.
Michele Berlingerio, Danai Koutra, Tina Elliasi-Rad, Christos Faloutsos (2013). Network Similarity via Multiple Social Theories. In: ASONAM.
Kuan-Yu Chen
Wei Lee
Shengjie Pan
Maryam Davoodi
Jiajian Zhou
Past Contributors: Shuo Chen (UG-summer intern), Chengkai Hu (UG-summer intern), Jiabin Zhu (UG-summer intern), Abhilash Dighe, Fan Yang
The large amounts of online user information (e.g., in social networks, online market places, streaming music and video services) have made possible the analysis of user behavior over time at a very large scale. Analyzing the user behavior can lead to better understanding of the user needs, better recommendations by service providers that lead to customer retention and user satisfaction, as well as detection of outlying behaviors and events (e.g., malicious actions or significant life events). We have focused on several directions, such as (i) modeling user preferences with the goal of designing new, successful services or products, and (ii) understanding information consumption for controversial topics, such as gun control. Currently we are interested in scalable methods for exploring changes in employment and predicting job transitions, by aggregating individual career paths in a large transition graph.
Danai Koutra, Abhilash Dighe, Smriti Bhagat, Udi Weinsberg, Stratis Ioannidis, Christos Faloutsos and Jean Bolot (2016). PNP: Fast Path Ensemble Method for Movie Design. In: Arxiv.
Danai Koutra, Paul N. Bennett, Eric Horvitz (2015). Events and Controversies: Influences of a Shocking News Event on Information Seeking. In: WWW; Florence, Italy [14.1% acceptance].
Pravallika Devineni, Danai Koutra, Michalis Faloutsos, Christos Faloutsos (2015). If walls could talk: Patterns and anomalies in Facebook wallposts. In: IEEE/ACM (ASONAM).