Recent advances in computing resources have made possible collecting enormous amounts of interconnected data, such as social media interactions, web activity, knowledge bases, product and service purchases, autonomous vehicle routing, smart home sensor data, and more. However, the massive scale and complexity of the data surpasses not only the human processing power, but also the computing power. There is an urgent need for methods and tools that summarize large interconnected data to enable faster computations, storage reduction, interactive large-scale visualization and understanding, and pattern discovery.
Network summarization, which aims to find a small representation of an original, larger graph dataset, features a variety of methods with different goals and for different input data representations (e.g., attributed graphs, time-evolving or streaming graphs, heterogeneous graphs). The objective of this tutorial is to give a systematic overview of methods for summarizing and explaining graphs at different scales: the node-group level, the network level, and the multi-network level. We emphasize the current challenges, present real-world applications, and highlight the open research problems in this vibrant research area.
This tutorial is partially based on the survey paper:
Target audience: The target audience consists of researchers and practitioners in academia and industry who want to get up to speed with the theory, fast methods and applications in the area of large-scale graph summarization.
Prerequisites: Although we will provide a high-level introduction, some knowledge of linear algebra will be helpful. The emphasis will be on the intuition behind all the formal concepts, methods and tools. All the non-trivial concepts will be introduced and defined.
In our tutorial, we aim to give a holistic overview of techniques to explain a graph through summarization. We will do so in 3 main thrusts:
[Download Part I: Network-level Summaries]
[Download Part II: Multinetwork-level Summaries]
[Download Part III: Node-level Summaries]
For more details, check out our survey paper!
(Some) Relevant References
Biographical information for the presenters
Danai Koutra is an Assistant Professor in Computer Science and Engineering at University of Michigan, where she leads the Graph Exploration and Mining at Scale (GEMS) Lab. Her research interests include large-scale graph mining, analysis of multi-source network data, summarization, similarity and matching, and anomaly detection. She won an ARO Young Investigator award in 2018, the 2016 ACM SIGKDD Dissertation award, and an honorable mention for the SCS Doctoral Dissertation Award (CMU). She holds one 'rate-1' patent and six (pending) patents on bipartite graph alignment; has multiple papers in top data mining conferences, including 5 award-winning papers. She is the Program Director of the SIAG on Data Mining and Analytics, an Associate Editor of ACM TKDD, a Demo co-chair for ICDM'18. She was SIGKDD Cup co-chair in '17, Ph.D. Forum co-chair for ICDM'17, publicity co-chair for SDM'17, and has co-organized 3 tutorials and 3 workshops.
Jilles Vreeken leads the independent research group on Exploratory Data Analysis at the DFG cluster-of-excellence on Multimodal Computing and Interaction at Saarland University, and is a Senior Researcher at the Max Planck Institute for Informatics. He is one of the founding members of the new CISPA Helmholtz Center on Information Security, which he will shortly join as tenured Senior Faculty. His current research interests are exploratory data mining and causal inference. He has published over 80 peer-reviewed conference and journal papers (10 of which at ICDM), 3 book chapters, was awarded two best paper awards, and received the ACM SIGKDD 2010 Best Dissertation Runner-Up Award. He is panel chair of SDM 2019, was tutorial chair of SDM 2017, program co-chair of ECML PKDD 2016, publicity co-chair of IUI'15, sponsorship co-chair of ECML PKDD'14, workshop co-chair of ICDM'12, and co-organized 7 workshops and 4 tutorials.
Francesco Bonchi is Research Leader at the ISI Foundation in Turin, Italy. He is also Research Director for Big Data and Data Science at Eurecat in Barcelona, Spain. Before, he was Director of Research at Yahoo Labs in Barcelona, where he was leading the Web Mining Research group. He obtained his PhD in 2003 from the University of Pisa, on the topic of pattern mining. Since then, he contributed to privacy-preserving data mining, graph mining and social network analysis, influence maximization, and algorithmic fairness. He published over 180 peer-reviewed papers at top venues, including 12 at ICDM, in these areas. He is the general co-chair of DSAA 2018, program co-chair of ECMLPKDD 2018, and was program co-chair of HT 2017, ICDM 2016, and ECMLPKDD 2010, and co-organized 4 workshops and 5 tutorials.