EECS 598-008: Special Topics, Winter 2019
Advanced Data Mining
This course will cover a number of advanced topics in data mining. A mix of lectures and readings will familiarize the students with recent methods and algorithms for exploring and analyzing large-scale data and networks, as well as applications in various domains (e.g., web science, social science, neuroscience). The focus will be on scalable and practical methods, and the students will have the chance to analyze large datasets. The advanced topics will include: ranking, classification, clustering and community detection, summarization, similarity, anomaly detection, node representation and deep learning in the graph setting.
This course aims to introduce students to advanced data mining, with emphasis on interconnected data, i.e., graphs or networks. Students will become familiar with the challenges of processing large amounts of data, state-of-the-art methods and algorithms for analyzing them, and applications of data mining in various domains. We expect that by the end of the course, students:
Students are expected to (1) have basic knowledge of linear algebra, (2) be familiar with probability theory and statistics, and (3) have good programming skills (e.g., Python, Java, C, MATLAB, R, or any programming language of their preference). Basic knowledge of machine learning is helpful.
** Advanced-standing undergraduates or other students who do not meet the prerequisites may enroll with permission of the instructor.
Lectures: Thu 4-7pm @ EECS 1303
!! The topics and dates of the lectures are subject to change. The following schedule outlines the topics that we will be covering in this course. The paper readings have been updated!
Readings Per Topic
Static Graphs: Laws and Patterns
Dynamic Graphs: Laws and Patterns
Link Analysis & Node Classification
Community Detection & Role Discovery
Similarity & Fusion
Computational Social Science
Other topics that may be of interest (not covered in class, but potentially related to your projects)
Streaming Graph Algorithms
Check the course website on Canvas to find pointers to datasets, code, and tools that will be useful for your assignments and projects.
The coursework will comprise at most three short, practical assignments that will familiarize the students with the challenges of large-scale graph analysis. Each assignment will be done individually.
The most important component of this course is a semester-long project (related to topics discussed in class) that will be selected by students. The projects will be done in groups of 3-4 students. We will arrange brainstorming sessions to facilitate group formation. Feel free to use Piazza to pitch ideas and find groupmates.
For the project deliverables, each group is required to make only one submission. To post on Canvas on behalf of a group, first go to the "People" tab, then to the Group tab, and then search for the relevant homework or project. Make sure that you and your groupmates join the same group.
Ideas for Projects:
You might find ideas for your projects by exploring the topics of various data science competitions:
You will need to pick a research topic for your project and read 6-8 relevant papers. Ideally the survey will help you identify the specific problem you want to address, and will lead naturally to the project proposal. The survey will be part of your final report. It should be a well thought-out synthesis of the papers that you read, not just a repetition of the papers' abstracts / introductions. Your survey should provide answers to the following questions:
Project Proposal (2 pages in PDF format, 15% of the project grade).
Your proposal should include the following sections:
Mid-term Report (4-5 pages, ACM format, 20% of the project grade).
See below for the sections that your final report should have. At this point, for your midterm report, you should start editing the following sections:
Final Report (8 pages excluding citations, ACM format + CODE, 50% of the project grade).
A. Report Structure: Your report should have the form of a paper with (at least) the following sections:
>> Submit a zip file with the PDF and the CODE/ folder.
>> Don't forget to include the names of all the group members in the PDF.
For the assignments and project submissions, check out the schedule on the website.
For assignments, you will have 4 late days in total (no questions asked). If needed, you can use all the late days for one assignment or split them between the three assignments. Lateness is rounded up to the next whole day. For example, a submission that is 4 hours late will count as one day. Once your late days are used up, a late submission will get a zero for that assignment.
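As a rough illustration of the late-day arithmetic for assignments, here is a minimal Python sketch. The function names and the example dates are hypothetical, and the acceptance rule (a submission counts only if the late days it needs fit within what remains of the 4-day budget) is one reading of the policy above, not an official grading script.

```python
import math
from datetime import datetime, timedelta

TOTAL_LATE_DAYS = 4  # total budget across all assignments, per the policy


def late_days_used(deadline: datetime, submitted: datetime) -> int:
    """Late days charged for one submission; any partial day rounds up."""
    delay = submitted - deadline
    if delay <= timedelta(0):
        return 0  # on time or early
    return math.ceil(delay / timedelta(days=1))


def assignment_accepted(deadline: datetime, submitted: datetime,
                        days_already_used: int) -> bool:
    """Assumed rule: accepted only if the budget covers the days needed."""
    needed = late_days_used(deadline, submitted)
    return days_already_used + needed <= TOTAL_LATE_DAYS


# Hypothetical deadline, for illustration only.
deadline = datetime(2019, 2, 1, 23, 59)
four_hours_late = deadline + timedelta(hours=4)
print(late_days_used(deadline, four_hours_late))          # 1
print(assignment_accepted(deadline, four_hours_late, 4))  # False
```

Under this reading, a submission that is 4 hours late costs one late day, and it would not be accepted if all 4 late days had already been spent on earlier assignments.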
Since the projects require coordination of 3-4 students, there will be NO late days. If you submit AFTER the deadline, you will get a zero on that component of the project. Please submit at least 30 minutes before the regular deadline as a safety measure.
In rare cases in the past, students have missed the regular project deadline by 2-3 minutes, sometimes because of last-minute project work or slow servers. We will give a one-time waiver of the penalty if you miss the regular submission deadline for a project by 5 minutes or less. Beyond that, your project submission will not be graded and you will receive a zero. Don't forget that this is less strict than what happens with conference deadlines: if you miss a conference deadline even by a few seconds, you will need to submit to another conference or wait for a year until the next submission cycle :)
For extreme circumstances, like medical emergencies, no-penalty extensions will be granted. Email eecs598dm-w19 [AT] umich.edu with written documentation (e.g., a doctor's note).
All students (including LS&A and Engineering) are required to observe the Engineering Honor Code in all assignments. A copy of the honor code can be found here. Please make sure that you clearly understand what constitutes cheating. If you are not sure in any specific case, you should ask the teaching staff. The University takes honor code violations seriously, and penalties can be severe. You are not allowed to make use of assignment solutions by others, including solutions from previous semesters.
Any suspected violations of the honor code will be reported.
Disabilities and Conflicts
Students with disabilities that are documented with the Services for Students with Disabilities (SSWD) Office should contact the professor during the first three weeks of class to make appropriate arrangements.