Mike Cafarella
Associate Professor
Computer Science and Engineering
2260 Hayward St.
University of Michigan
Ann Arbor, MI 48109-2121

Office: 4709 Beyster
Phone: 734-764-9418
Fax: 734-763-8094
Send email to me at michjc, found at umich dot edu

2018
PDF Gregory DeAngelo, Jacob N. Shapiro, Jeffrey Borowitz, Michael Cafarella, Christopher Re, Gary Shiffman: Rational Pricing in Prostitution: Evidence from Online Sex Ads. Under submission. Note to computer scientists: Economists take a long time to publish papers formally, so it is common to distribute high-quality drafts. This is such a draft. The gritty details of publishing mean that you will find the figures at the end of the document.
PDF Michael R. Anderson, Michael Cafarella, German Ros, Thomas F. Wenisch: Physical Representation-based Predicate Optimization for a Visual Analytics Database. ICDE 2019
PDF Andrew Quinn, Jason Flinn, Michael Cafarella: Sledgehammer: Cluster-Fueled Debugging. OSDI 2018.
PDF Dolan Antenucci, Michael Cafarella: Constraint-based Explanation and Repair of Filter-Based Transformations. PVLDB 11(9): 947-960 (2018).
PDF Michael Cafarella, Alon Halevy, Hongrae Lee, Jayant Madhavan, Cong Yu, Daisy Zhe Wang, Eugene Wu: Ten Years of WebTables. PVLDB 11(12): 2140-2149 (2018).
PDF Zhongjun Jin, Christopher Baik, Michael Cafarella, H.V. Jagadish: Beaver: Towards a Declarative Schema Mapping. HILDA 2018.
2017
PDF Zhe Chen, Sasha Dadiomov, Richard Wesley, Gang Xiao, Daniel Cory, Michael Cafarella, Jock Mackinlay: Spreadsheet Property Detection With Rule-assisted Active Learning. CIKM 2017.
PDF Yongjoo Park, Ahmad Shabab Tajik, Michael Cafarella, Barzan Mozafari: Database Learning: Toward a Database that Becomes Smarter Every Time. SIGMOD 2017.
PDF Zhongjun Jin, Michael R. Anderson, Michael Cafarella, H.V. Jagadish: Foofah: Transforming Data by Example. SIGMOD 2017.
PDF Zhongjun Jin, Michael R. Anderson, Michael Cafarella, H.V. Jagadish: Foofah: A Programming-By-Example System for Synthesizing Data Transformation Programs. SIGMOD Demo 2017.
2016
PDF Matthew Burgess, Eytan Adar, Michael Cafarella: Link-prediction enhanced consensus clustering for complex networks. May 20, 2016, PLoS ONE.
PDF Manish Singh, Michael Cafarella, H.V. Jagadish: DBExplorer: Exploratory Search in Databases. EDBT 2016.
PDF Yongjoo Park, Michael Cafarella, Barzan Mozafari: Visualization-Aware Sampling for Very Large Databases. ICDE 2016.
PDF Prateek Tandon, Faissal M. Sleiman, Michael Cafarella, Thomas F. Wenisch: HAWK: Hardware Support for Unstructured Log Processing. ICDE 2016.
PDF Zhe Chen, Michael Cafarella, H.V. Jagadish: Long-tail Vocabulary Dictionary Extraction from the Web. WSDM 2016.
PDF Dolan Antenucci, Michael R. Anderson, Penghua Zhao, Michael Cafarella: A Query System for Social Media Signals. Demonstration system, ICDE 2016.
PDF Dolan Antenucci, Michael R. Anderson, Michael Cafarella: A Declarative Query Processing System for Nowcasting. PVLDB 10(3), 2016.
PDF Michael R. Anderson, Dolan Antenucci, Michael Cafarella: Runtime Support for Human-in-the-Loop Feature Engineering Systems. IEEE Data Engineering 39(4), December 2016.
PDF Michael R. Anderson, Michael Cafarella: Input Selection for Fast Feature Engineering. ICDE 2016.
PDF Michael Chow, Kaushik Veeraraghavan, Jason Flinn, Michael Cafarella: DQBarge: Improving Data Quality Tradeoffs in Large-Scale Internet Services. OSDI 2016
PDF Vaibhav Gogte, Aasheesh Kolli, Michael J. Cafarella, Loris D'Antoni, Thomas F. Wenisch: HARE: Hardware Accelerator for Regular Expressions. MICRO 2016
PDF Ce Zhang, Jaeho Shin, Christopher Re, Michael Cafarella, Feng Niu: Extracting Databases from Dark Data with DeepDive. SIGMOD 2016.
2015
PDF Christopher Re, Divy Agrawal, Magdalena Balazinska, Michael Cafarella, Michael Jordan, Tim Kraska, Raghu Ramakrishnan: Machine Learning and Databases: The Sound of Things to Come or a Cacophony of Hype? SIGMOD Panel Discussion, 2015.
PDF Dolan Antenucci, Michael Anderson, Michael Cafarella: Raccoon: A Query System for Social Media Signals. Symposium on Cloud Computing (SoCC) Poster, 2015.
PDF Zhe Chen, Michael Cafarella, Eytan Adar: DiagramFlyer: A Search Engine for Data-Driven Diagrams. World Wide Web (WWW) Conference Demonstration, 2015.
PDF Jaeho Shin, Christopher Re, Michael Cafarella: A Demonstration of Data Labeling in Knowledge Base Construction. VLDB Demo, 2015.
PDF Yongjoo Park, Michael Cafarella, Barzan Mozafari: Neighbor-Sensitive Hashing. 3rd Workshop on Web-scale Vision and Social Media (VSM) at ICCV 2015.
PDF Yongjoo Park, Michael Cafarella, Barzan Mozafari: Neighbor-Sensitive Hashing. PVLDB 9(3), 2015.
2014
PDF Chun-Hung Hsiao, Michael Cafarella, Satish Narayanasamy: Using Web Corpus Statistics for Program Analysis. OOPSLA 2014.
PDF Michael R. Anderson, Michael Cafarella, Yixing Jiang, Guan Wang, and Bochun Zhang: An Integrated Development Environment for Faster Feature Engineering. VLDB Demo 2014.
PDF Zhe Shirley Chen and Michael Cafarella: Integrating Spreadsheet Data via Accurate and Low-Effort Extraction. KDD 2014.
NBER site, PDF Dolan Antenucci, Michael Cafarella, Margaret C. Levenstein, Christopher Re, and Matthew D. Shapiro: Using Social Media to Measure Labor Market Flows. NBER Working Paper No. 20010. March, 2014

Note: this paper is targeted to an Economics audience, but computer scientists will find most of it easy to understand. This is a so-called "working paper" for which there is no real equivalent in Computer Science. Working papers in Economics are a commonplace method for sharing scholarly information and are usually of a very high standard. However, this document has not gone through a formal peer review process.

2013
PDF Jacob Goldsmith, Antek G. Wong-Foy, Michael J Cafarella, and Donald J. Siegel: Theoretical Limits of Hydrogen Storage in Metal-Organic Frameworks: Opportunities and Trade-Offs. Chemistry of Materials, July 2013
PDF Zhe Chen, Michael Cafarella: Automatic Spreadsheet Data Extraction. Third International Workshop on Semantic Search over the Web (SSW), 2013.
PDF Zhe Chen, Michael Cafarella, Jun Chen, Daniel Prevo, Junfeng Zhuang: Senbazuru: A Prototype Spreadsheet Database Management System. VLDB Demo 2013.
PDF Dolan Antenucci, Erdong Li, Shaobo Liu, Bochun Zhang, Michael J. Cafarella, Christopher Ré: Ringtail: A Generalized Nowcasting System. VLDB Demo 2013.
PDF Dolan Antenucci, Michael J. Cafarella, Margaret C. Levenstein, Christopher Ré, Matthew D. Shapiro: Ringtail: Feature Selection for Easier Nowcasting. WebDB 2013.
PDF Matthew Burgess, Alessandra Mazzia, Eytan Adar, Michael Cafarella: Leveraging Noisy Lists for Social Feed Ranking. ICWSM 2013.
PDF Prateek Tandon, Michael J. Cafarella, Thomas Wenisch: Minimizing Remote Accesses in MapReduce Clusters. International Workshop on High Performance Data Intensive Computing (HPDIC) 2013.
PDF Michael Anderson, Dolan Antenucci, Victor Bittorf, Matthew Burgess, Michael Cafarella, Arun Kumar, Feng Niu, Yongjoo Park, Christopher Ré, Ce Zhang: Brainwash: A Data System for Feature Engineering. CIDR Conference 2013.
2012
PDF Li Qian, Michael J. Cafarella, H.V. Jagadish: Sample-Driven Schema Mapping. SIGMOD Conference 2012.
2011
PDF Michael J. Cafarella, Alon Y. Halevy: Web Data Management (tutorial). SIGMOD Conference 2011: 1199-1200.
PDF Eaman Jahani, Michael J. Cafarella, Christopher Re: Automatic Optimization for MapReduce Programs. PVLDB 4(6): 385-396 (2011).
PDF Michael J. Cafarella, Alon Y. Halevy, Jayant Madhavan: Structured Data on the Web. Communications of the ACM 54(2): 72-79.
2010
PDF Michael J. Cafarella, Christopher Re: Relational Optimization for Data-Intensive Programs. WebDB 2010.
2009
PDF Michael J. Cafarella: Extracting and Querying a Comprehensive Web Database. CIDR 2009.
PDF Michael J. Cafarella, Alon Y. Halevy, Nodira Khoussainova: Data Integration for the Relational Web. PVLDB 2(1): 1090-1101 (2009).
2008
PDF Michael J. Cafarella, Alon Y. Halevy, Yang Zhang, Daisy Zhe Wang, Eugene Wu: Uncovering the Relational Web. WebDB 2008.
PDF Michael J. Cafarella, Alon Y. Halevy, Yang Zhang, Daisy Zhe Wang, Eugene Wu: WebTables: Exploring the Power of Tables on the Web. PVLDB 1(1): 538-549 (2008).
PDF Luke McDowell, Michael J. Cafarella: Ontology-Driven Unsupervised Instance Population. Journal of Web Semantics 6(3): 218-236 (2008).
PDF Michael J. Cafarella, Edward Y. Chang, Andrew Fikes, Alon Y. Halevy, Wilson C. Hsieh, Alberto Lerner, Jayant Madhavan, S. Muthukrishnan: Data Management Projects at Google. SIGMOD Record 37(1): 34-38 (2008).
PDF Michael J. Cafarella, Jayant Madhavan, Alon Y. Halevy: Web-Scale Extraction of Structured Data. SIGMOD Record 37(4): 55-61 (2008).
2007
PDF Michael J. Cafarella, Dan Suciu, Oren Etzioni: Navigating Extracted Data with Schema Discovery. WebDB 2007.
PDF Michael J. Cafarella, Christopher Re, Dan Suciu, Oren Etzioni, Michele Banko: Structured Querying of Web Text: A Technical Challenge. CIDR 2007.
PDF Michele Banko, Michael J. Cafarella, Stephen Soderland, Matthew Broadhead, Oren Etzioni: Open Information Extraction from the Web. IJCAI 2007.
2006
PDF Michael J. Cafarella, Dan Suciu, Oren Etzioni: Structured Queries Over Web Text. IEEE Data Bulletin, December 2006, 29(4).
PDF Luke McDowell, Michael J. Cafarella: Ontology-Driven Information Extraction with OntoSyphon. ISWC 2006.
PDF Oren Etzioni, Michele Banko, Michael J. Cafarella: Machine Reading. Proceedings of AAAI 2006.
2005
PDF Michael J. Cafarella, Doug Downey, Stephen Soderland, Oren Etzioni: KnowItNow: Fast, Scalable Information Extraction from the Web. HLT/EMNLP 2005.
PDF Michael J. Cafarella, Oren Etzioni: A Search Engine for Natural Language Applications. WWW 2005.
PDF Oren Etzioni, Michael J. Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates: Unsupervised named-entity extraction from the Web: An experimental study. Artif. Intell. 165(1): 91-134 (2005).
2004
PDF Oren Etzioni, Michael J. Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates: Methods for Domain-Independent Information Extraction from the Web: An Experimental Comparison. AAAI 2004: 391-398.
PDF Oren Etzioni, Michael J. Cafarella, Doug Downey, Stanley Kok, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates: Web-scale information extraction in KnowItAll (preliminary results). WWW 2004: 100-110.
PDF Michael J. Cafarella, Douglas R. Cutting: Building Nutch: Open-Source Search. ACM Queue 2(2): 54-61 (2004).
That's it!