
EECS 598: Special Topics, Fall 2016
Mining Large-scale Graph Data


Graphs naturally represent information ranging from links between webpages to friendships in social networks, to connections between neurons in our brains. These graphs often span millions or even billions of nodes and interactions between them. Within this deluge of interconnected data, how can we extract useful knowledge, understand the underlying processes, and make interesting discoveries?

This course will cover recent methods and algorithms for exploring and analyzing large-scale networks, as well as applications in various domains (e.g., web, social science, computer networks, neuroscience). The focus will be on scalable and practical methods, and the students will have the chance to analyze large datasets. The topics that we will cover include: ranking, classification, clustering and community detection, summarization, similarity, anomaly detection, node representation and deep learning in the graph setting.

Objectives

This course aims to introduce students to graph mining. Students will become familiar with the challenges of processing large amounts of data, state-of-the-art methods and algorithms for analyzing graphs, and applications of graph mining in various domains. We expect that by the end of the course, students:

  • will have a thorough understanding of the graph mining foundations, and
  • will be able to:
    • critique graph mining methods,
    • formulate and solve graph-related problems, and
    • analyze large-scale datasets (in distributed and other settings).

Prerequisites

Students are expected to (1) have basic knowledge of linear algebra, (2) be familiar with probability theory and statistics, and (3) have good programming skills (e.g., Python, Java, C, MATLAB, R, or any programming language of their preference). Basic knowledge of machine learning is helpful.

** Advanced-standing undergraduates or other students who do not meet the prerequisites may enroll with permission of the instructor.


Instructor: Danai Koutra
Office Hours: by appointment

Teaching Assistant: Yike Liu
Office Hours: Monday 2-3pm @ BBB 4957

Lectures & Discussion:
When? Tue/Thu 1:30-3:30pm
Where? BBB 1690

Email: eecs598mining-f16@umich.edu

Schedule (tentative)

!! The topics and dates of the lectures are subject to change. The following schedule outlines the topics that we will be covering in this course.



Readings Per Topic

Static graphs: laws and patterns

Dynamic graphs: laws and patterns

Random Walks, Pagerank, HITS

Node Classification: Belief Propagation

Node Representation and Classification

Node Similarity

Graph Similarity

Graph Alignment

Graph Clustering and Communities

Graph Summarization

Anomaly Detection

Link Analysis

Deep learning for graphs

Streaming graphs and algorithms

Recommendation Systems

Interpretability

Large-scale social science


Course Structure

Resources

Check the course website on Canvas to find pointers to datasets, code, and tools that will be useful for your assignments and projects.

Assignments

The coursework will comprise two short, practical assignments that will familiarize the students with the challenges of large-scale graph analysis. Each assignment will be done individually.

Semester-long Project

The most important component of this course is a semester-long project (related to topics discussed in class) that will be selected by students. The projects will be done in groups of 3-4 students. We will arrange brainstorming sessions to facilitate group formation. Feel free to use Canvas to pitch ideas and find groupmates.

For the project deliverables, each group is required to make only one submission. To post on Canvas on behalf of a group, first go to the "People" tab, then to the Group tab, and then search for the relevant homework or project. Make sure that you and your groupmates join the same group.

Ideas for Projects:

You might find ideas for your projects by exploring the topics of various data science competitions:

Project Deliverables:

  • Survey (4-5 pages in PDF format, 15% of the project grade).
    You will need to pick a research topic for your project and read 6-8 relevant papers. Ideally the survey will help you identify the specific problem you want to address, and will lead to the project proposal naturally. The survey will be part of your final report. Your survey should provide answers to the following questions:
    • What is the common theme of the papers you read? Give the problem definition(s).
    • What are the challenges of the area?
    • How do the papers relate to each other?
    • Are they solving a new problem or improving an existing method?
    • What are the main techniques that they are using?
    • What are 3 strengths and 3 weaknesses of each paper?
    • What are the limitations of each method?
    • Think about some future directions. What would you do better? Think about scalability issues, generality (e.g., weighted, directed, time-evolving, attributed networks), applicability to various domains.
    >> Don't forget to include the names of all the group members in the pdf. If you want to submit a longer survey, please ask me first.
  • Project Proposal (2-3 pages in PDF format, 15% of the project grade).
    Your proposal should include the following sections:
    • Problem definition
    • Challenges
    • Most related prior work and its shortcomings
    • Proposed approach
    • Data that you will use
    • Evaluation plan
    >> Don't forget to include the names of all the group members in the pdf.
  • Mid-term Report (4-5 pages, ACM format, 20% of the project grade).
    See below for the sections that your final report should have. At this point, for your midterm report, you should start editing the following sections:
    • Section 2. Data: Describe the synthetic and real data that you will use, and explain the data collection process (if applicable).
    • Section 3. Proposed Method: Introduce the method that you propose, give the necessary definitions, potentially give proof of concept.
    • Section 4. Experiments: Give some preliminary experiments (on synthetic or real data).
    • Section 5. Progress and Next Steps (temporary section): Outline your next steps and whether you are on track. Now that you have had time to work on your projects, if anything has changed with respect to your proposal, mention it.
    • Section 6. Division of work (your grade will depend on your contribution to the project)
    >> Don't forget to include the names of all the group members in the pdf.
  • Final Report (10 pages including citations, ACM format + CODE, 50% of the project grade).
    A. Report Structure: Your report should have the form of a paper with (at least) the following sections:
    • Section 0. Abstract
    • Section 1. Introduction
    • Section 2. Data
    • Section 3. Proposed Method
    • Section 4. Experiments
    • Section 5. Related Work
    • Section 6. Conclusions (include what you learned)
    • Section 7. Division of work (your grade will depend on your contribution to the project)
    B. Code: Organize your code in a folder called "CODE". Include a README file and a Makefile. Your code should run on horton.eecs.umich.edu.
    >> Submit a zip file with the pdf and the CODE/ folder. >> Don't forget to include the names of all the group members in the pdf.
For more information, look out for the announcements on Canvas.

Grading

Class Participation 15%
Class Presentations: 1 presentation 15%
Project: 1 in a group of 3-4 students 50%
Short Assignments: 2 assignments, 10% each 20%
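
As a rough illustration of how these weights combine, here is a minimal Python sketch. It assumes every component is scored on a 0-100 scale and that the final grade is a plain weighted sum; the function names and dictionary keys are hypothetical and are not part of any official grading tool. The project weights are the deliverable percentages listed under Project Deliverables.

```python
# Hypothetical sketch: weights are taken from this syllabus; the 0-100 scale
# and the plain weighted sum are assumptions made for illustration only.
COURSE_WEIGHTS = {
    "participation": 0.15,
    "presentation": 0.15,
    "project": 0.50,
    "assignment_1": 0.10,
    "assignment_2": 0.10,
}

PROJECT_WEIGHTS = {
    "survey": 0.15,
    "proposal": 0.15,
    "midterm_report": 0.20,
    "final_report": 0.50,
}


def weighted_sum(scores, weights):
    """Combine per-component scores (0-100) using the given weights."""
    return sum(weights[name] * scores[name] for name in weights)


def project_score(deliverable_scores):
    """Project score (0-100) from the four deliverable scores."""
    return weighted_sum(deliverable_scores, PROJECT_WEIGHTS)


def course_grade(component_scores):
    """Overall course score (0-100) from the five component scores."""
    return weighted_sum(component_scores, COURSE_WEIGHTS)
```

For example, a student with 100 in participation, 80 in the presentation, 90 in the project, and 70 and 100 in the two assignments would end up with 15 + 12 + 45 + 7 + 10 = 89 under these assumptions.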

Policies

Late Days

For the assignments and project submissions, check out the schedule on the website.

For assignments, you will have 4 late days in total (no questions asked). If needed, you can use all the late days for one assignment or split them between the two assignments. Late days are rounded up to the nearest integer. For example, a submission that is 4 hours late will count as one day. Beyond that, you will get a zero for that assignment.
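
The rounding rule above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes lateness is measured in hours past the deadline and that one late day equals 24 hours.

```python
import math
from typing import Sequence

TOTAL_LATE_DAYS = 4  # shared across the two assignments, per the policy above


def late_days_used(hours_late: float) -> int:
    """Round lateness up to whole days; on-time submissions use none."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def remaining_late_days(hours_late_per_assignment: Sequence[float]) -> int:
    """Late days left after the given submissions.

    A negative result means the 4-day budget has been exceeded.
    """
    used = sum(late_days_used(h) for h in hours_late_per_assignment)
    return TOTAL_LATE_DAYS - used
```

Under these assumptions, `late_days_used(4)` gives 1, matching the example in the policy, and submitting one assignment 4 hours late and the other 30 hours late uses 1 + 2 = 3 late days, leaving 1.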

Since projects require coordination of 3-4 students, you are advised to submit them on time. Even if only one student fails to finish their part by the deadline, the whole group will be penalized. For the project deliverables, there will be a 5% penalty for each late day (up to 2 days, if possible). The latest that you can submit a report and its corresponding slides is the night (11:55pm) before the class that is dedicated to your work-in-progress presentations. Beyond 2 days, or if you fail to submit the deliverable before the in-class presentations, you will get a zero on that component of the project. Please submit at least 30 minutes before the regular deadline as a safety measure. As with assignments, late days are rounded up to the nearest integer.

We have run into rare situations in the past where a student missed the regular deadline for a project by 2-3 minutes and incurred a 5% penalty. Sometimes this is because of last-minute project work or slow servers. We will give a one-time waiver of the 5% penalty if you miss the regular submission deadline for a project by 5 minutes or less (i.e., if you submit at 12:00 AM or earlier). Once that waiver has been used, the 5% penalty will apply even if you miss the deadline by 1 minute. Don't forget that this is less strict than what happens with conference deadlines; if you miss one of those even by a few seconds, you will need to submit to another conference or wait for a year until the next submission cycle :)
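
To make the project late policy concrete, here is a minimal Python sketch of one possible reading of it. It is not an official calculator. It assumes that the 5% penalty is taken off the deliverable's score for each late day, that lateness is measured in minutes past the regular deadline, and that the one-time waiver is tracked by the caller; the function name and parameters are hypothetical. The separate cutoff on the night before in-class presentations is not modeled.

```python
import math

MINUTES_PER_DAY = 24 * 60


def project_penalty_factor(minutes_late, waiver_available=True):
    """Return (fraction of the score kept, whether the waiver was consumed).

    Assumed reading of the policy: 5% off per late day, late days rounded
    up, zero beyond 2 days, and a one-time waiver for lateness of 5 minutes
    or less.
    """
    if minutes_late <= 0:
        return 1.0, False  # on time
    if minutes_late <= 5 and waiver_available:
        return 1.0, True  # one-time waiver applies
    late_days = math.ceil(minutes_late / MINUTES_PER_DAY)
    if late_days > 2:
        return 0.0, False  # beyond 2 days: zero on this deliverable
    return 1.0 - 0.05 * late_days, False
```

Under this reading, a first submission that is 3 minutes late keeps its full score and consumes the waiver, while a second submission that is 3 minutes late counts as one late day and keeps 95% of its score.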

For extreme circumstances, like medical emergencies, no-penalty extensions will be granted. Email eecs598mining-f16 [AT] umich.edu with written documentation (e.g. doctor's note).


Honor Code

All students (including LS&A and Engineering) are required to observe the Engineering Honor Code in all assignments. A copy of the honor code can be found here. Please make sure that you clearly understand what constitutes cheating. If you are not sure in any specific case, you should ask the teaching staff. The University takes honor code violations seriously, and penalties can be severe. You are not allowed to make use of assignment solutions by others, including solutions from previous semesters.

Any suspected violations of the honor code will be reported.


Disabilities and Conflicts

Students with disabilities that are documented with the Services for Students with Disabilities (SSWD) Office should contact the professor during the first three weeks of class to make appropriate arrangements.