Tutorial - ICDM 2018



Recent advances in computing resources have made possible collecting enormous amounts of interconnected data, such as social media interactions, web activity, knowledge bases, product and service purchases, autonomous vehicle routing, smart home sensor data, and more. However, the massive scale and complexity of the data surpasses not only the human processing power, but also the computing power. There is an urgent need for methods and tools that summarize large interconnected data to enable faster computations, storage reduction, interactive large-scale visualization and understanding, and pattern discovery. Network summarization, which aims to find a small representation of an original, larger graph dataset, features a variety of methods with different goals and for different input data representations (e.g., attributed graphs, time-evolving or streaming graphs, heterogeneous graphs). The objective of this tutorial is to give a systematic overview of methods for summarizing and explaining graphs at different scales: the node-group level, the network level, and the multi-network level. We emphasize the current challenges, present real-world applications, and highlight the open research problems in this vibrant research area. This tutorial is partially based on the survey paper: Yike Liu, Tara Safavi, Abhilash Dighe, and Danai Koutra. Graph Summarization Methods and Applications: A Survey. ACM Comput. Surv. 51, 3, Article 62 (June 2018). Target audience: The target audience consists of researchers and practitioners in academia and industry who want to get up to speed with the theory, fast methods and applications in the area of large-scale graph summarization. Prerequisites: Although we will provide a high-level introduction, some knowledge of linear algebra will be helpful. The emphasis will be on the intuition behind all the formal concepts, methods and tools. All the non-trivial concepts will be introduced and defined. Overview In our tutorial, we aim to give a holistic overview of techniques to explain a graph through summarization. We will do so in 3 main thrusts: Summarizing and explaining selected (groups of) nodes; Summarizing a network as a whole; and Summarizing multiple networks simultaneously. Part I. Network-level Summaries [60 min + 5 min Q&A] The second part will focus on methods for summarizing a single graph (e.g., a snapshot or aggregate network). We will start with methods that use solely the structure of a graph and continue with methods that leverage both the structure and side information, such as node and edge attributes. In both cases, we will categorize the approaches based on their key methodological ideas (e.g., group-based vs. influence-based vs. pattern-based) and present representative algorithms. Moreover, we will provide a taxonomy of the methods based on their output (e.g., supergraph vs. sparsified graph vs. compressed graph) and main objective (e.g., storage, efficiency, visualization). Methods for graph structure summarization Methods for collective summarization of structure and side information Part II. Multinetwork-level Summaries [45 min + 5 min Q&A] The third part will start with summarization of large-scale networks that evolve over time. As in Part II, we will categorize the algorithms based on their key technical ideas, their output type and main objective. We will then discuss methods that involve summarizing multiple (disparate) networks simultaneously.This part will conclude with the major open problems. Methods for dynamic graph summarization Methods for summarizing multiple disparate networks Open research problems Part III. Node-level Summaries [30 min + 5 min Q&A] In the first part we will focus on an important, though under-studied topic, namely that of summarizing and explaining a subset of nodes in a larger graph, rather than the graph as a whole. These nodes can either be given us beforehand, e.g. hand-picked or discovered by an independent algorithm, and we are asked to summarize these using the graph. Alternatively, we can also integrate the task of discovering those sets of nodes; that is, we are interested in subgroup or bump discovery. We want to discover descriptions (e.g. graph queries) that identify subsets of nodes in our graph that we can summarize well given the graph structure. Both these approaches, and especially their combination, are of particular interest in interactive systems where users want the system to explain that part of the graph that is currently to their interest. Methods for explaining a given set of nodes Methods for discovering sets of nodes that stand out, and allow for summarization Methods for interactive local graph exploration and summarization Slides [Download Introduction] [Download Part I: Network-level Summaries] [Download Part II: Multinetwork-level Summaries] [Download Part III: Node-level Summaries] For more details, check out our survey paper! Yike Liu, Tara Safavi, Abhilash Dighe, and Danai Koutra. 2018. Graph Summarization Methods and Applications: A Survey. ACM Comput. Surv. 51, 3, Article 62 (June 2018). (Some) Relevant References Biographical information for the presenters Danai Koutra is an Assistant Professor in Computer Science and Engineering at University of Michigan, where she leads the Graph Exploration and Mining at Scale (GEMS) Lab. Her research interests include large-scale graph mining, analysis of multi-source network data, summarization, similarity and matching, and anomaly detection. She won an ARO Young Investigator award in 2018, the 2016 ACM SIGKDD Dissertation award, and an honorable mention for the SCS Doctoral Dissertation Award (CMU). She holds one 'rate-1' patent and six (pending) patents on bipartite graph alignment; has multiple papers in top data mining conferences, including 5 award-winning papers. She is the Program Director of the SIAG on Data Mining and Analytics, an Associate Editor of ACM TKDD, a Demo co-chair for ICDM'18. She was SIGKDD Cup co-chair in '17, Ph.D. Forum co-chair for ICDM'17, publicity co-chair for SDM'17, and has co-organized 3 tutorials and 3 workshops. Jilles Vreeken leads the independent research group on Exploratory Data Analysis at the DFG cluster-of-excellence on Multimodal Computing and Interaction at Saarland University, and is a Senior Researcher at the Max Planck Institute for Informatics. He is one of the founding members of the new CISPA Helmholtz Center on Information Security, which he will shortly join as tenured Senior Faculty. His current research interests are exploratory data mining and causal inference. He has published over 80 peer-reviewed conference and journal papers (10 of which at ICDM), 3 book chapters, was awarded two best paper awards, and received the ACM SIGKDD 2010 Best Dissertation Runner-Up Award. He is panel chair of SDM 2019, was tutorial chair of SDM 2017, program co-chair of ECML PKDD 2016, publicity co-chair of IUI'15, sponsorship co-chair of ECML PKDD'14, workshop co-chair of ICDM'12, and co-organized 7 workshops and 4 tutorials. Francesco Bonchi is Research Leader at the ISI Foundation in Turin, Italy. He is also Research Director for Big Data and Data Science at Eurecat in Barcelona, Spain. Before, he was Director of Research at Yahoo Labs in Barcelona, where he was leading the Web Mining Research group. He obtained his PhD in 2003 from the University of Pisa, on the topic of pattern mining. Since then, he contributed to privacy-preserving data mining, graph mining and social network analysis, influence maximization, and algorithmic fairness. He published over 180 peer-reviewed papers at top venues, including 12 at ICDM, in these areas. He is the general co-chair of DSAA 2018, program co-chair of ECMLPKDD 2018, and was program co-chair of HT 2017, ICDM 2016, and ECMLPKDD 2010, and co-organized 4 workshops and 5 tutorials.
Copyright © Danai Koutra Last Modified on: 09/21/2018 at 1:30pm