My students and I currently work on projects in four areas of data management:
- Tools for Dataset Construction, including information extraction (from tables, spreadsheets, and text of various kinds) and data transformation.
- Data-Intensive Programming and Debugging, such as creating data transformation programs, exploiting code corpora, building large-scale debugging systems, and debugging in the face of data-quality tradeoffs.
- Data Management for Economics, such as data systems for managing raw nowcasting evidence, using information extraction to attack trafficking crimes, or investigating data integration issues in macroeconomic statistics.
- System Support for Machine Learning Development, such as systems for feature engineering, efficiently querying image corpora for training set construction, and even some hardware ideas (and, a while ago, Hadoop).
In addition to writing papers, we build real systems that aim to have a large, concrete, real-world impact.
Data, code, and other resources
We are grateful to many different organizations for helping to fund our research:
- The Census Bureau
- General Electric
- The National Science Foundation