Gregory DeAngelo, Jacob N. Shapiro, Jeffrey Borowitz, Michael Cafarella, Christopher Re, Gary Shiffman: Rational Pricing in Prostitution: Evidence from Online Sex Ads. Under submission. Note to computer scientists: Economists take a long time to publish papers formally, so it is common to distribute high-quality drafts. This is such a draft. The gritty details of publishing mean that you will find the figures at the end of the document. | |
Michael R. Anderson, Michael Cafarella, German Ros, Thomas F. Wenisch: Physical Representation-based Predicate Optimization for a Visual Analytics Database. ICDE 2019. | |
Andrew Quinn, Jason Flinn, Michael Cafarella: Sledgehammer: Cluster-Fueled Debugging. OSDI 2018. | |
Dolan Antenucci, Michael Cafarella: Constraint-based Explanation and Repair of Filter-Based Transformations. PVLDB 11(9): 947-960 (2018). | |
Michael Cafarella, Alon Halevy, Hongrae Lee, Jayant Madhavan, Cong Yu, Daisy Zhe Wang, Eugene Wu: Ten Years of WebTables. PVLDB 11(12): 2140-2149 (2018). | |
Zhongjun Jin, Christopher Baik, Michael Cafarella, H.V. Jagadish: Beaver: Towards a Declarative Schema Mapping. HILDA 2018. |
Zhe Chen, Sasha Dadiomov, Richard Wesley, Gang Xiao, Daniel Cory, Michael Cafarella, Jock Mackinlay: Spreadsheet Property Detection With Rule-assisted Active Learning. CIKM 2017. | |
Yongjoo Park, Ahmad Shabab Tajik, Michael Cafarella, Barzan Mozafari: Database Learning: Toward a Database that Becomes Smarter Every Time. SIGMOD 2017. | |
Zhongjun Jin, Michael R. Anderson, Michael Cafarella, H.V. Jagadish: Foofah: Transforming Data by Example. SIGMOD 2017. | |
Zhongjun Jin, Michael R. Anderson, Michael Cafarella, H.V. Jagadish: Foofah: A Programming-By-Example System for Synthesizing Data Transformation Programs. SIGMOD Demo 2017. |
Matthew Burgess, Eytan Adar, Michael Cafarella: Link-prediction enhanced consensus clustering for complex networks. PLoS ONE, May 20, 2016. | |
Manish Singh, Michael Cafarella, H.V. Jagadish: DBExplorer: Exploratory Search in Databases. EDBT 2016. | |
Yongjoo Park, Michael Cafarella, Barzan Mozafari: Visualization-Aware Sampling for Very Large Databases. ICDE 2016. | |
Prateek Tandon, Faissal M. Sleiman, Michael Cafarella, Thomas F. Wenisch: HAWK: Hardware Support for Unstructured Log Processing. ICDE 2016. | |
Zhe Chen, Michael Cafarella, H.V. Jagadish: Long-tail Vocabulary Dictionary Extraction from the Web. WSDM 2016. | |
Dolan Antenucci, Michael R. Anderson, Penghua Zhao, Michael Cafarella: A Query System for Social Media Signals. Demonstration system, ICDE 2016. | |
Dolan Antenucci, Michael R. Anderson, Michael Cafarella: A Declarative Query Processing System for Nowcasting. PVLDB 10(3) 2016. | |
Michael R. Anderson, Dolan Antenucci, Michael Cafarella: Runtime Support for Human-in-the-Loop Feature Engineering Systems. IEEE Data Engineering Bulletin 39(4), December 2016. | |
Michael R. Anderson, Michael Cafarella: Input Selection for Fast Feature Engineering. ICDE 2016. | |
Michael Chow, Kaushik Veeraraghavan, Jason Flinn, Michael Cafarella: DQBarge: Improving Data Quality Tradeoffs in Large-Scale Internet Services. OSDI 2016. | |
Vaibhav Gogte, Aasheesh Kolli, Michael J. Cafarella, Loris D'Antoni, Thomas F. Wenisch: HARE: Hardware Accelerator for Regular Expressions. MICRO 2016. | |
Ce Zhang, Jaeho Shin, Christopher Re, Michael Cafarella, Feng Niu: Extracting Databases from Dark Data with DeepDive. SIGMOD 2016. |
Christopher Re, Divy Agrawal, Magdalena Balazinska, Michael Cafarella, Michael Jordan, Tim Kraska, Raghu Ramakrishnan: Machine Learning and Databases: The Sound of Things to Come or a Cacophony of Hype? SIGMOD Panel Discussion, 2015. | |
Dolan Antenucci, Michael Anderson, Michael Cafarella: Raccoon: A Query System for Social Media Signals. Symposium on Cloud Computing (SoCC) Poster, 2015. | |
Zhe Chen, Michael Cafarella, Eytan Adar: DiagramFlyer: A Search Engine for Data-Driven Diagrams. World Wide Web (WWW) Conference Demonstration, 2015. | |
Jaeho Shin, Christopher Re, Michael Cafarella: A Demonstration of Data Labeling in Knowledge Base Construction. VLDB Demo, 2015. | |
Yongjoo Park, Michael Cafarella, Barzan Mozafari: Neighbor-Sensitive Hashing. 3rd Workshop on Web-scale Vision and Social Media (VSM) at ICCV 2015. | |
Yongjoo Park, Michael Cafarella, Barzan Mozafari: Neighbor-Sensitive Hashing. PVLDB 9(3), 2015. |
Chun-Hung Hsiao, Michael Cafarella, Satish Narayanasamy: Using Web Corpus Statistics for Program Analysis. OOPSLA 2014. | |
Michael R. Anderson, Michael Cafarella, Yixing Jiang, Guan Wang, and Bochun Zhang: An Integrated Development Environment for Faster Feature Engineering. VLDB Demo 2014. | |
Zhe Shirley Chen and Michael Cafarella: Integrating Spreadsheet Data via Accurate and Low-Effort Extraction. KDD 2014. | |
NBER site, PDF | Dolan Antenucci, Michael Cafarella, Margaret C. Levenstein, Christopher Re, and Matthew D. Shapiro: Using Social Media to Measure Labor Market Flows. NBER Working Paper No. 20010. March 2014. Note: this paper is targeted at an Economics audience, but computer scientists will find most of it easy to understand. This is a so-called "working paper," for which there is no real equivalent in Computer Science. Working papers in Economics are a common way of sharing scholarly information and are usually of a very high standard. However, this document has not gone through a formal peer review process. |
Jacob Goldsmith, Antek G. Wong-Foy, Michael J Cafarella, and Donald J. Siegel: Theoretical Limits of Hydrogen Storage in Metal-Organic Frameworks: Opportunities and Trade-Offs. Chemistry of Materials, July 2013. | |
Zhe Chen, Michael Cafarella: Automatic Spreadsheet Data Extraction. Third International Workshop on Semantic Search over the Web (SSW), 2013. | |
Zhe Chen, Michael Cafarella, Jun Chen, Daniel Prevo, Junfeng Zhuang: Senbazuru: A Prototype Spreadsheet Database Management System. VLDB Demo 2013. | |
Dolan Antenucci, Erdong Li, Shaobo Liu, Bochun Zhang, Michael J. Cafarella, Christopher Ré: Ringtail: A Generalized Nowcasting System. VLDB Demo 2013. | |
Dolan Antenucci, Michael J. Cafarella, Margaret C. Levenstein, Christopher Ré, Matthew D. Shapiro: Ringtail: Feature Selection for Easier Nowcasting. WebDB 2013. | |
Matthew Burgess, Alessandra Mazzia, Eytan Adar, Michael Cafarella: Leveraging Noisy Lists for Social Feed Ranking. ICWSM 2013. | |
Prateek Tandon, Michael J. Cafarella, Thomas Wenisch: Minimizing Remote Accesses in MapReduce Clusters. International Workshop on High Performance Data Intensive Computing (HPDIC) 2013. | |
Michael Anderson, Dolan Antenucci, Victor Bittorf, Matthew Burgess, Michael Cafarella, Arun Kumar, Feng Niu, Yongjoo Park, Christopher Ré, Ce Zhang: Brainwash: A Data System for Feature Engineering. CIDR Conference 2013. |
Li Qian, Michael J. Cafarella, H.V. Jagadish: Sample-Driven Schema Mapping. SIGMOD Conference 2012. |
Michael J. Cafarella, Alon Y. Halevy: Web Data Management (tutorial). SIGMOD Conference 2011: 1199-1200. | |
Eaman Jahani, Michael J. Cafarella, Christopher Re: Automatic Optimization for MapReduce Programs. PVLDB 4(6): 385-396 (2011). | |
Michael J. Cafarella, Alon Y. Halevy, Jayant Madhavan: Structured Data on the Web. Communications of the ACM 54(2): 72-79 (2011). |
Michael J. Cafarella, Christopher Re: Relational Optimization for Data-Intensive Programs. WebDB 2010. |
Michael J. Cafarella: Extracting and Querying a Comprehensive Web Database. CIDR 2009. | |
Michael J. Cafarella, Alon Y. Halevy, Nodira Khoussainova: Data Integration for the Relational Web. PVLDB 2(1): 1090-1101 (2009). |
Michael J. Cafarella, Alon Y. Halevy, Yang Zhang, Daisy Zhe Wang, Eugene Wu: Uncovering the Relational Web. WebDB 2008. | |
Michael J. Cafarella, Alon Y. Halevy, Yang Zhang, Daisy Zhe Wang, Eugene Wu: WebTables: Exploring the Power of Tables on the Web. PVLDB 1(1): 538-549 (2008). | |
Luke McDowell, Michael J. Cafarella: Ontology-Driven Unsupervised Instance Population. Journal of Web Semantics 6(3): 218-236 (2008). | |
Michael J. Cafarella, Edward Y. Chang, Andrew Fikes, Alon Y. Halevy, Wilson C. Hsieh, Alberto Lerner, Jayant Madhavan, S. Muthukrishnan: Data Management Projects at Google. SIGMOD Record 37(1): 34-38 (2008). | |
Michael J. Cafarella, Jayant Madhavan, Alon Y. Halevy: Web-Scale Extraction of Structured Data. SIGMOD Record 37(4): 55-61 (2008). |
Michael J. Cafarella, Dan Suciu, Oren Etzioni: Navigating Extracted Data with Schema Discovery. WebDB 2007. | |
Michael J. Cafarella, Christopher Re, Dan Suciu, Oren Etzioni, Michele Banko: Structured Querying of Web Text: A Technical Challenge. CIDR 2007. | |
Michele Banko, Michael J. Cafarella, Stephen Soderland, Matthew Broadhead, Oren Etzioni: Open Information Extraction from the Web. IJCAI 2007. |
Michael J. Cafarella, Dan Suciu, Oren Etzioni: Structured Queries Over Web Text. IEEE Data Engineering Bulletin 29(4), December 2006. | |
Luke McDowell, Michael J. Cafarella: Ontology-Driven Information Extraction with OntoSyphon. ISWC 2006. | |
Oren Etzioni, Michele Banko, Michael J. Cafarella: Machine Reading. Proceedings of AAAI 2006. |
Michael J. Cafarella, Doug Downey, Stephen Soderland, Oren Etzioni: KnowItNow: Fast, Scalable Information Extraction from the Web. HLT/EMNLP 2005. | |
Michael J. Cafarella, Oren Etzioni: A Search Engine for Natural Language Applications. WWW 2005. | |
Oren Etzioni, Michael J. Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates: Unsupervised named-entity extraction from the Web: An experimental study. Artif. Intell. 165(1): 91-134 (2005). |
Oren Etzioni, Michael J. Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates: Methods for Domain-Independent Information Extraction from the Web: An Experimental Comparison. AAAI 2004: 391-398. | |
Oren Etzioni, Michael J. Cafarella, Doug Downey, Stanley Kok, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates: Web-scale information extraction in KnowItAll (preliminary results). WWW 2004: 100-110. | |
Michael J. Cafarella, Douglas R. Cutting: Building Nutch: Open-Source Search. ACM Queue 2(2): 54-61 (2004). |