Mike Cafarella
Associate Professor
Computer Science and Engineering
2260 Hayward St.
University of Michigan
Ann Arbor, MI 48109-2121

Office: 4709 Beyster
Phone: 734-764-9418
Fax: 734-763-8094
Send email to me at michjc, found at umich dot edu

Research Overview

I do research in three main areas of data management.
  1. Systems and algorithms for "messy" data management. This includes work on information extraction (from spreadsheets or from Web pages of different kinds), data integration (whether of data from Web pages or from more traditional sources), machine learning workloads (such as feature engineering), and top-k ranking.
  2. Novel data applications, especially for social science use cases such as economics and the fight against human trafficking (a technical paper is coming soon; in the meantime, read this article in Scientific American).
  3. Data systems infrastructure. This includes systems work that can undergird very general-purpose data management methods. My work on Hadoop is the best-known example, but this area also includes research into optimization for MapReduce programs and hardware support for text analytics (accepted for ICDE 2016).


In addition to writing papers, we build real systems that aim to have a large, concrete, real-world impact:

Data, code, and other resources


We are grateful to many different organizations for helping to fund our research:
  • The Census Bureau
  • Dow
  • General Electric
  • Google
  • The National Science Foundation
  • Yahoo!