Mike Cafarella
Associate Professor
Computer Science and Engineering
2260 Hayward St.
University of Michigan
Ann Arbor, MI 48109-2121
Office: 4709 Beyster
Phone: 734-764-9418
Fax: 734-763-8094
Send email to me at michjc, found at umich dot edu
Hi. I am an associate professor in
Computer Science and Engineering at the
University of Michigan. My research interests include databases, information extraction, data integration, and data mining. I'm a member of the
Software Systems Lab and the
Michigan Database Group.
My students and I currently work on projects in four areas of data management:
- Tools for Dataset Construction, including information extraction (from tables, spreadsheets, and text of various kinds) and data transformation.
- Data-Intensive Programming and Debugging, such as creating data transformation programs, exploiting code corpora, building large-scale debugging systems, or debugging in the face of data-quality tradeoffs.
- Data Management for Economics, such as data systems for managing raw nowcasting evidence, using information extraction to attack trafficking crimes, or investigating data integration issues in macroeconomic statistics.
- System Support for Machine Learning Development, such as systems for feature engineering, efficiently querying image corpora for training set construction, and even some hardware ideas (and, a while ago, Hadoop).
My group publishes primarily in database conferences, but we also make contributions to systems-, AI-, and Econ-related venues.
You can read more about our research here
You can read the press about our work here
In addition to our intellectual contributions, we are proud of the practical impact of our work. Some relevant examples include the
Hadoop open-source project;
various search engine features based on the WebTables work; and Lattice Data, a startup based around information extraction. More exciting stuff to come.
Some recent news
Older news:
- I've been appointed the Morris Wellman Faculty Development Assistant Professor of Computer Science and Engineering. Many, many thanks to the Wellman family and to the University of Michigan.
- Our 2008 paper on WebTables has been included as a reading in Chapter 10 of the 5th Edition of Readings in Database Systems, more popularly known as The Red Book. This was the first database text I ever encountered (back in 2005), so I'm gratified our work has become part of it.
- A big ICDE season for us: Three research papers, a demo, and a panel! Special congrats to my students and collaborators Mike Anderson, Yongjoo Park, Prateek Tandon, Faissal Sleiman, Tom Wenisch, and Barzan Mozafari.
- An almost hour-long interview with me about research, Hadoop, and many other topics. Ben Lorica from O'Reilly did a nice job.
- WOW! Our work on DeepDive and the DARPA MEMEX project was on 60 Minutes!!!
- Our paper on Neighbor-Sensitive Hashing was accepted to PVLDB! Congratulations to my student Yongjoo Park and colleague Barzan Mozafari.