Mike Cafarella
Assistant Professor
Computer Science and Engineering
2260 Hayward St.
University of Michigan
Ann Arbor, MI 48109-2121

Office: 4709 Beyster
Phone: 734-764-9418
Fax: 734-763-8094
Send email to me at michjc, found at umich dot edu

2016
PDF TK Manish Singh, Michael Cafarella, H.V. Jagadish: DBExplorer: Exploratory Search in Databases. EDBT 2016.
PDF TK Prateek Tandon, Faissal M. Sleiman, Michael Cafarella, Thomas F. Wenisch: HAWK: Hardware Support for Unstructured Log Processing. ICDE 2016.
PDF Zhe Chen, Michael Cafarella, H.V. Jagadish: Long-tail Vocabulary Dictionary Extraction from the Web. WSDM 2016.
PDF TK Dolan Antenucci, Michael R. Anderson, Penghua Zhao, Michael Cafarella: A Query System for Social Media Signals. Demonstration system, ICDE 2016.
PDF TK Michael R. Anderson, Michael Cafarella: Input Selection for Fast Feature Engineering. ICDE 2016.
PDF TK Yongjoo Park, Michael Cafarella, Barzan Mozafari: Visualization-Aware Sampling for Very Large Databases. ICDE 2016.
2015
PDF Christopher Re, Divy Agrawal, Magdalena Balazinska, Michael Cafarella, Michael Jordan, Tim Kraska, Raghu Ramakrishnan: Machine Learning and Databases: The Sound of Things to Come or a Cacophony of Hype? SIGMOD Panel Discussion, 2015.
PDF Dolan Antenucci, Michael Anderson, Michael Cafarella: Raccoon: A Query System for Social Media Signals. Symposium on Cloud Computing (SoCC) Poster, 2015.
PDF Zhe Chen, Michael Cafarella, Eytan Adar: DiagramFlyer: A Search Engine for Data-Driven Diagrams. World Wide Web (WWW) Conference Demonstration, 2015.
PDF Jaeho Shin, Christopher Re, Michael Cafarella: A Demonstration of Data Labeling in Knowledge Base Construction. VLDB Demo, 2015.
PDF Yongjoo Park, Michael Cafarella, Barzan Mozafari: Neighbor-Sensitive Hashing. 3rd Workshop on Web-scale Vision and Social Media (VSM) at ICCV 2015.
PDF Yongjoo Park, Michael Cafarella, Barzan Mozafari: Neighbor-Sensitive Hashing. PVLDB 9(3), 2015.
2014
PDF Chun-Hung Hsiao, Michael Cafarella, Satish Narayanasamy: Using Web Corpus Statistics for Program Analysis. OOPSLA 2014.
PDF Michael R. Anderson, Michael Cafarella, Yixing Jiang, Guan Wang, and Bochun Zhang: An Integrated Development Environment for Faster Feature Engineering. VLDB Demo 2014.
PDF Zhe Shirley Chen and Michael Cafarella: Integrating Spreadsheet Data via Accurate and Low-Effort Extraction. KDD 2014.
NBER site, PDF Dolan Antenucci, Michael Cafarella, Margaret C. Levenstein, Christopher Re, and Matthew D. Shapiro: Using Social Media to Measure Labor Market Flows. NBER Working Paper No. 20010. March, 2014

Note: this paper is targeted to an Economics audience, but computer scientists will find most of it easy to understand. This is a so-called "working paper" for which there is no real equivalent in Computer Science. Working papers in Economics are a commonplace method for sharing scholarly information and are usually of a very high standard. However, this document has not gone through a formal peer review process.

2013
PDF Jacob Goldsmith, Antek G. Wong-Foy, Michael J. Cafarella, and Donald J. Siegel: Theoretical Limits of Hydrogen Storage in Metal-Organic Frameworks: Opportunities and Trade-Offs. Chemistry of Materials, July 2013.
PDF Zhe Chen, Michael Cafarella: Automatic Spreadsheet Data Extraction. Third International Workshop on Semantic Search over the Web (SSW), 2013.
PDF Zhe Chen, Michael Cafarella, Jun Chen, Daniel Prevo, Junfeng Zhuang: Senbazuru: A Prototype Spreadsheet Database Management System. VLDB Demo 2013.
PDF Dolan Antenucci, Erdong Li, Shaobo Liu, Bochun Zhang, Michael J. Cafarella, Christopher Ré: Ringtail: A Generalized Nowcasting System. VLDB Demo 2013.
PDF Dolan Antenucci, Michael J. Cafarella, Margaret C. Levenstein, Christopher Ré, Matthew D. Shapiro: Ringtail: Feature Selection for Easier Nowcasting. WebDB 2013.
PDF Matthew Burgess, Alessandra Mazzia, Eytan Adar, Michael Cafarella: Leveraging Noisy Lists for Social Feed Ranking. ICWSM 2013.
PDF Prateek Tandon, Michael J. Cafarella, Thomas Wenisch: Minimizing Remote Accesses in MapReduce Clusters. International Workshop on High Performance Data Intensive Computing (HPDIC) 2013.
PDF Michael Anderson, Dolan Antenucci, Victor Bittorf, Matthew Burgess, Michael Cafarella, Arun Kumar, Feng Niu, Yongjoo Park, Christopher Ré, Ce Zhang: Brainwash: A Data System for Feature Engineering. CIDR Conference 2013.
2012
PDF Li Qian, Michael J. Cafarella, H.V. Jagadish: Sample-Driven Schema Mapping. SIGMOD Conference 2012.
2011
PDF Michael J. Cafarella, Alon Y. Halevy: Web Data Management (tutorial). SIGMOD Conference 2011: 1199-1200.
PDF Eaman Jahani, Michael J. Cafarella, Christopher Re: Automatic Optimization for MapReduce Programs. PVLDB 4(6): 385-396 (2011).
PDF Michael J. Cafarella, Alon Y. Halevy, Jayant Madhavan: Structured Data on the Web. Communications of the ACM 54(2): 72-79.
2010
PDF Michael J. Cafarella, Christopher Re: Relational Optimization for Data-Intensive Programs. WebDB 2010.
2009
PDF Michael J. Cafarella: Extracting and Querying a Comprehensive Web Database. CIDR 2009.
PDF Michael J. Cafarella, Alon Y. Halevy, Nodira Khoussainova: Data Integration for the Relational Web. PVLDB 2(1): 1090-1101 (2009).
2008
PDF Michael J. Cafarella, Alon Y. Halevy, Yang Zhang, Daisy Zhe Wang, Eugene Wu: Uncovering the Relational Web. WebDB 2008.
PDF Michael J. Cafarella, Alon Y. Halevy, Yang Zhang, Daisy Zhe Wang, Eugene Wu: WebTables: Exploring the Power of Tables on the Web. PVLDB 1(1): 538-549 (2008).
PDF Luke McDowell, Michael J. Cafarella: Ontology-Driven Unsupervised Instance Population. Journal of Web Semantics 6(3): 218-236 (2008).
PDF Michael J. Cafarella, Edward Y. Chang, Andrew Fikes, Alon Y. Halevy, Wilson C. Hsieh, Alberto Lerner, Jayant Madhavan, S. Muthukrishnan: Data Management Projects at Google. SIGMOD Record 37(1): 34-38 (2008).
PDF Michael J. Cafarella, Jayant Madhavan, Alon Y. Halevy: Web-Scale Extraction of Structured Data. SIGMOD Record 37(4): 55-61 (2008).
2007
PDF Michael J. Cafarella, Dan Suciu, Oren Etzioni: Navigating Extracted Data with Schema Discovery. WebDB 2007.
PDF Michael J. Cafarella, Christopher Re, Dan Suciu, Oren Etzioni, Michele Banko: Structured Querying of Web Text: A Technical Challenge. CIDR 2007.
PDF Michele Banko, Michael J. Cafarella, Stephen Soderland, Matthew Broadhead, Oren Etzioni: Open Information Extraction from the Web. IJCAI 2007.
2006
PDF Michael J. Cafarella, Dan Suciu, Oren Etzioni: Structured Queries Over Web Text. IEEE Data Engineering Bulletin, December 2006, 29(4).
PDF Luke McDowell, Michael J. Cafarella: Ontology-Driven Information Extraction with OntoSyphon. ISWC 2006.
PDF Oren Etzioni, Michele Banko, Michael J. Cafarella: Machine Reading. Proceedings of AAAI 2006.
2005
PDF Michael J. Cafarella, Doug Downey, Stephen Soderland, Oren Etzioni: KnowItNow: Fast, Scalable Information Extraction from the Web. HLT/EMNLP 2005.
PDF Michael J. Cafarella, Oren Etzioni: A Search Engine for Natural Language Applications. WWW 2005.
PDF Oren Etzioni, Michael J. Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates: Unsupervised named-entity extraction from the Web: An experimental study. Artif. Intell. 165(1): 91-134 (2005).
2004
PDF Oren Etzioni, Michael J. Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates: Methods for Domain-Independent Information Extraction from the Web: An Experimental Comparison. AAAI 2004: 391-398.
PDF Oren Etzioni, Michael J. Cafarella, Doug Downey, Stanley Kok, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates: Web-scale information extraction in KnowItAll: (preliminary results). WWW 2004: 100-110.
PDF Michael J. Cafarella, Douglas R. Cutting: Building Nutch: Open-Source Search. ACM Queue 2(2): 54-61 (2004).
That's it!