ICDM 2014 Tutorial
Node and graph similarity: Theory and Applications

Danai Koutra, Tina Eliassi-Rad, and Christos Faloutsos

Abstract

Given a graph how can we find similarities among its nodes? Given a sequence of graphs, can we spot similarities or differences between them and detect temporal anomalies? When the node correspondence between two graphs is unknown, how can we match the nodes in order to reveal similarities between them? In a microscopic level, the nodes can be similar in terms of their proximity or their structural features and roles. In a macroscopic level, two graphs are similar if the structures within them are similar, or equivalently their nodes are connected in similar ways. How can we find the similarity between two graphs when the node correspondence is known or unknown? The objective of this tutorial is to provide a concise and intuitive overview of the most important concepts and tools, which can detect affinities between nodes of a static graph, as well as similarities between different snapshots of dynamic graphs or distinct networks.


Topic Review

We review the state of the art in four related fields: (a) role discovery, (b) node proximity, (c) graph similarity with known node mapping, and (d) graph similarity with unknown node mapping. The emphasis of this tutorial is to give the intuition behind these powerful mathematical concepts and tools, as well as case studies that illustrate their practical use in data mining.


Motivation

We review methods that calculate similarities at both the node- and graph-levels. Computing similarity between two objects is a fundamental data mining problem and has numerous applications that interest the data mining community, including role extraction in static and dynamic networks, community detection, static and temporal anomaly detection, label propagation, classification and clustering, link prediction, protein-protein alignment, de-anonymization and more. The tutorial is based on a variety of papers published in top tier data mining and machine learning conferences including KDD, ICDM, SDM, PKDD, CIKM, WWW, WSDM, as well as top journals such as TKDD, TPAMI and TKDE.


Slides

Part 1:         Node Roles                                  Node Proximity
Part 2:         Graph Similarity (Known corr.)     Graph Similarity (Unknown corr.)


Outline


Presenters' bios

Danai KoutraDanai Koutra is a final-year Ph.D. candidate at the Computer Science Department at Carnegie Mellon University. Her research interests include large-scale graph mining, graph similarity and matching, graph summarization, and anomaly detection. Danai's research has been applied mainly to social, collaboration and web networks, as well as brain connectivity graphs. She holds 1 "rate-1" patent and has 6 (pending) patents on bipartite graph alignment. Danai has multiple papers in top data mining conferences, including 2 award-winning papers, and her work was covered by popular press, such as MIT Technology Review.


Tina Eliassi-Rad Tina Eliassi-Rad is an Associate Professor of Computer Science at Rutgers University. She earned her Ph.D. in Computer Science (with a minor in Mathematical Statistics) at the University of Wisconsin- Madison. Within data mining and machine learning, Tina's research has been applied to the World- Wide Web, text corpora, large-scale scientific simulation data, complex networks, and cyber situational awareness. She has published over 60 peer-reviewed papers (including a best paper runner-up award at ICDM 2009 and a best interdisciplinary paper award at CIKM 2012); and has given over 100 invited presentations. In 2010, she received an Outstanding Mentor Award from the US DOE Office of Science.


Christos Faloutsos Christos Faloutsos is a Professor at Carnegie Mellon University. He has received the Research Contributions Award in ICDM 2006, the Innovations award in KDD'10, 18 "best paper" awards, and several teaching awards. He has given over 30 tutorials and over 10 invited distinguished lectures. His research interests include data mining for graphs and streams, fractals, and database performance.