GEMSlab Graph Exploration & Mining at Scale

Graphs naturally represent information ranging from links between webpages, to friendships in social networks, to connections between neurons in our brains. These graphs often span billions of nodes and interactions between them. Within this deluge of interconnected data, how can we summarize their patterns and understand them? How can we efficiently visualize them? How can we explore multiple graphs at the same time, and understand their differences?

Find Out More About the Current Projects

Faculty


Danai Koutra

Danai Koutra

Assistant Professor

Skylo

Skylo

Affiliate Miner

Graduate Students


Haoming Shen

Haoming Shen (MS)

Undergraduate Students


Aristotelis Leventidis

Aristotelis Leventidis

Harry Wang

Harry Wang


Natalia Jenuwine

Natalia Jenuwine

Past Members


Zhaojun Chen

Zhaojun Chen (MS)

Maryam Davoodi

Maryam Davoodi

PhD, Purdue

Abhilash Dighe

Abhilash Dighe (MS)

Google, Mountain View

Zhiyuan He

Zhiyuan He

Brigid Johnson

Brigid Johnson (UROP)

Lisa Jin

Lisa Jin

PhD, Rochester

Yike Liu

Yike Liu (PhD)

Santosh Mohan

Santosh Mohan (BS)

Amazon

Chalse Okorom

Chalse Okorom (UROP)

Charles Wang

Charles Wang

Palantir

Fan Yang

Fan Yang

Xiaochen Yu

Xiaochen Yu

David Zhang

David Zhang (BS)

Kuan-Yu Chen

Kuan-Yu Chen (MS)

Yaohua Shi

Yaohua Shi (MS)

Johnson Cai

Johnson Cai

Paige Frederick

Paige Frederick

Il Jae

Il Jae Lee

Shengjie Pan

Shengjie Pan

Cole Hudson

Cole Hudson (HS)


The GEMS Lab is recruiting motivated and hard-working students interested in graph mining, and large-scale data analytics!

If you are interested in joining the group, please apply to CSE at the University of Michigan, Ann Arbor. If you are an undergrad or grad student at the University of Michigan, and you are interested in any of the papers or projects listed on this page, send an email with your interests and CV to
gemslab-opportunities@umich.edu.


1. Scalable Graph Summarization
for understanding and visualization


Recent advances in computing resources have made processing enormous amounts of data possible, but human ability to quickly identify patterns in such data has not scaled accordingly. Thus, computational methods for condensing and simplifying data are becoming an important part of the data-driven decision making process. We focus on succinctly describing million-node graphs with just a few "important" or "interesting" structures. We are also exploring connections between summarization and clustering methods and how to summarize complex data consisting of multiple modalities (e.g., graph structure and time series of non-structural node behavior).

pict Yike Liu, Abhilash Dighe, Tara Safavi, Danai Koutra (2016). A Graph Summarization: A Survey. In: Arxiv.

pict Yike Liu, Tara Safavi, Neil Shah, Danai Koutra (2016). Reducing Million-Node Graphs to a Few Structural Patterns: A Unified Approach. In: KDD Workshop on Mining and Learning with Graphs (MLG).

Demo

pict Shah, Neil et al. (2015). TimeCrunch: Interpretable Dynamic Graph Summarization. In: KDD.

Code

pict Liu, Yike, Neil Shah, and Danai Koutra (2015). An Empirical Comparison of the Summarization Power of Graph Clustering Methods. In: NIPS Networks in the Social and Information Sciences Workshop.

Show more


condense

Students

  • Yike Liu

  • Di Jin

  • Tara Safavi

  • Neil Shah (CMU)

  • Past Contributors: Abhilash Dighe, Santosh Mohan



perseus_overview

Students

  • Zhiyuan He (anomaly detection)

  • Di Jin

  • Past Contributors: Paige Frederick, George Jiang (UG-summer intern), Xiaochen Yu, David Zhang (UG)

2. Scalable interactive analytics
for graphs

The initial steps of data exploration often include visual and statistical analysis, but the size of the data makes exploration time-consuming and often unrealistic, requiring scalable tools and methods. In many scientific domains the deluge of the collected data often leads to semi-arbitrary selection of a small set of its aspects to explore. We focus on two main directions: (i) designing new, efficient exploratory methods for reducing the dimensionality of graph data by leveraging domain knowledge and expectations; and (ii) developing a visualization-based platform that allows users to interact with their graph data and guides them to explore outlying observations. Another direction of interest is crowd-assisted graph analytics.

pict Di Jin, Christos Faloutsos, Danai Koutra, Ticha Sethapakdi (2016). PERSEUS3: Visualizing and Interactively Mining Large-Scale Graphs. In: KDD (MLG).

pict Danai Koutra, Di Jin, Yuanchi Ning, Christos Faloutsos (2015). Perseus: An Interactive Large-Scale Graph Mining and Visualization Tool. In: VLDB (demo).

pict Sai Gouravajhala, Danai Koutra, Walter S. Lasecki. (2016). Towards Crowd-Assisted Data Mining. In: CHI (HCML).

pict U Kang, Jay-Yoon Lee, Danai Koutra, Christos Faloutsos (2014). Net-Ray: Visualizing and Mining Web-Scale Graphs. In: PAKDD; Tainan, Taiwan.


Show more

3. Distributed graph methods

Many mining tasks for large-scale graphs involve solving iterative equations efficiently. For example, classifying entities in a network setting with limited supervision, finding similar nodes, and evaluating the importance of a node in a graph, can all be expressed as linear systems that are solved iteratively. The need for faster methods due to the increase in the data that is generated has permeated all these applications, and many more. Our focus is on speeding up such methods for large-scale graphs both in sequential and distributed environments. Among other directions, we explore the trade-off between computational complexity and accuracy, and extensions of the methods to time-evolving networks.

pict Koutra, Danai et al. (2011). Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms. In: ECML PKDD.

pict Wolfgang Gatterbauer, Stephan Guennemann, Danai Koutra, Christos Faloutsos (2015). Linearized and Single-Pass Belief Propagation. In: VLDB.

Code
distributed computations

Students

  • Di Jin

  • Yujun Yan

  • Past Contributors: Charles Wang



brain_graphs

Students

  • Paige Frederick

  • Di Jin

  • Lisa Jin

  • Tara Safavi

  • Farah Tahmasbi

  • Junhao Wang


4. Mining large-scale brain networks
to understand how the brain works

Advances in magnetic resonance imaging and diffusion tensor imaging, which track neural oscillations and white matter tractography in the brain respectively, have led to the generation of brain maps that describe the brain's network organization. The resulting networks (brain maps or connectomes) consist of hundreds of thousands or even millions of connections. Abnormalities in the connectomes reveal disorders (e.g., depression, Alzheimer's disease). Thus, the detection of abnormalities is a promising direction for defining objective measures for diagnosis and treatment in psychiatric neuroscience, which lacks such measures. We focus on scalable methods for summarizing abrupt transitions in connectivity profiles at micro-scales (minutes or even seconds), which are thought to correspond to configurations of psychological variables involved in cognition, attention, memory, and mood. We are also exploring novel, efficient and robust methods for inferring networks from multiple time series (e.g., fMRI signals).

pict Danai Koutra, Neil Shah, Joshua T. Vogelstein, Brian Gallagher, Christos Faloutsos (2016). DeltaCon: A Principled Massive-Graph Similarity Function with Attribution. In: TKDD.

Code

pict Danai Koutra, Joshua Vogelstein, Christos Faloutsos (2013). DeltaCon: A Principled Massive-Graph Similarity Function. In: SDM.

Code

pict Danai Koutra, Yu Gong, Sephira Ryman, Rex Jung, Joshua Vogelstein, Christos Faloutsos. (2013). Are all brains wired equally?. In: OHBM.

5. Graph similarity and alignment
in networked data

Graph similarity or network alignment are core tasks for various data mining tasks, such as anomaly detection, classification, clustering, transfer learning, sense-making, de-identification, and more. Graph similarity methods quantify changes in networks (e.g., network traffic in computer network that might signify an attack) or differences between graphs (e.g., differences between connectomes of healthy and non-healthy subjects). Graph alignment aims to find the correspondence bewteen the nodes of two graphs. Our current focus is on scalable methods for aligning multiple networks with side information, such as node and edge attributes.

pict Danai Koutra, Neil Shah, Joshua T. Vogelstein, Brian Gallagher, Christos Faloutsos (2016). DeltaCon: A Principled Massive-Graph Similarity Function with Attribution. In: TKDD.

Code

pict Danai Koutra, Joshua Vogelstein, Christos Faloutsos (2013). DeltaCon: A Principled Massive-Graph Similarity Function. In: SDM.

Code

pict Danai Koutra, Hanghang Tong, David Lubensky. (2013). BIG-ALIGN: Fast Bipartite Graph Alignment. In: ICDM.

pict Michele Berlingerio, Danai Koutra, Tina Elliasi-Rad, Christos Faloutsos (2013). Network Similarity via Multiple Social Theories. In: ASONAM.



enron_similarity

Students

  • Kuan-Yu Chen

  • Wei Lee

  • Shengjie Pan

tripartite

Students

  • Maryam Davoodi

  • Jiajian Zhou

  • Past Contributors: Shuo Chen (UG-summer intern), Chengkai Hu (UG-summer intern), Jiabin Zhu (UG-summer intern), Abhilash Dighe, Fan Yang

6. User Modeling

The large amounts of online user information (e.g., in social networks, online market places, streaming music and video services) have made possible the analysis of user behavior over time at a very large scale. Analyzing the user behavior can lead to better understanding of the user needs, better recommendations by service providers that lead to customer retention and user satisfaction, as well as detection of outlying behaviors and events (e.g., malicious actions or significant life events). We have focused on several directions, such as (i) modeling user preferences with the goal of designing new, successful services or products, and (ii) understanding information consumption for controversial topics, such as gun control. Currently we are interested in scalable methods for exploring changes in employment and predicting job transitions, by aggregating individual career paths in a large transition graph.

pict Danai Koutra, Abhilash Dighe, Smriti Bhagat, Udi Weinsberg, Stratis Ioannidis, Christos Faloutsos and Jean Bolot (2016). PNP: Fast Path Ensemble Method for Movie Design. In: Arxiv.

pict Danai Koutra, Paul N. Bennett, Eric Horvitz (2015). Events and Controversies: Influences of a Shocking News Event on Information Seeking. In: WWW; Florence, Italy [14.1% acceptance].

pict Pravallika Devineni, Danai Koutra, Michalis Faloutsos, Christos Faloutsos (2015). If walls could talk: Patterns and anomalies in Facebook wallposts. In: IEEE/ACM (ASONAM).

List of publications