I am an assistant professor in the Computer Science and Engineering Division at the University of Michigan, Ann Arbor. My research leverages artificial intelligence (AI) and novel hardware (GPUs) to make database management systems (DBMSs) simpler and more efficient. I obtained my PhD from CMU, where I was fortunate to be advised by Andy Pavlo. My PhD research focused on the architecture design of autonomous DBMSs, implemented in a DBMS prototype built at CMU. I worked on the Delta Lake/Lakehouse at Databricks before joining UMich.
Depending on funding availability, I may recruit up to one PhD student to join my group at UMich in fall 2025. If you are interested, please apply to our CSE PhD program and mention my name in your application. You can also email me if you have specific interests or questions. I may not have time to reply to every email, but I will read all of them and try to respond.
Haotian (Jack) Gong (co-advised with Barzan Mozafari)
Baoqing Cai (PhD)
Justin So (MS)
Assistant Professor
University of Michigan, Ann Arbor, August 2023 –
Software Engineer
Databricks, Inc., July 2022 – August 2023
Postdoctoral Fellow
Carnegie Mellon University, September 2021 – July 2022
Research Intern
Microsoft Research, Data Management, Exploration, and Mining Group, May 2018 – August 2018
UMich EECS 484 Database Management Systems
Instructor [Fall 2024]
UMich EECS 484 Database Management Systems
Instructor [Winter 2024]
UMich EECS 584 Advanced Database Management Systems
Instructor [Fall 2023]
CMU 15-445/645 Database Systems
Instructor [Fall 2021]
CMU 15-721 Advanced Database Systems
Head Teaching Assistant [Spring 2019]
CMU 15-445/645 Database Systems
Teaching Assistant [Fall 2018]
Hit the Gym: Accelerating Query Execution to Efficiently Bootstrap Behavior Models for Self-Driving Database Management Systems
Wan Shen Lim, Lin Ma, William Zhang, Matthew Butrovich, Samuel Arch, and Andrew Pavlo
VLDB 2024
[pdf]
Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMs
Naihao Deng, Zhenjie Sun, Ruiqi He, Aman Sikka, Yulong Chen, Lin Ma, Yue Zhang, and Rada Mihalcea
ACL Findings 2024
[pdf]
Database Gyms
Wan Shen Lim, Matthew Butrovich, William Zhang, Andrew Crotty, Lin Ma, Peijing Xu, Johannes Gehrke, and Andrew Pavlo
CIDR 2023
[pdf]
Tastes Great! Less Filling! High Performance and Accurate Training Data Collection for Self-Driving Database Management Systems
Matthew Butrovich, Wan Shen Lim, Lin Ma, John Rollinson, William Zhang, Yu Xia, Andrew Pavlo
SIGMOD 2022
[pdf]
MB2: Decomposed Behavior Modeling for Self-Driving Database Management Systems
Lin Ma, William Zhang, Jie Jiao, Wuwen Wang, Matthew Butrovich, Wan Shen Lim, Prashanth Menon, Andrew Pavlo
SIGMOD 2021
[pdf]
[short video]
[long video]
[code]
Make Your Database System Dream of Electric Sheep: Towards Self-Driving Operation
Andrew Pavlo, Matthew Butrovich, Lin Ma, Prashanth Menon, Wan Shen Lim, Dana Van Aken, William Zhang
VLDB 2021
[pdf]
Filter Representation in Vectorized Query Execution
Amadou Ngom, Prashanth Menon, Matthew Butrovich, Lin Ma, Wan Shen Lim, Todd C Mowry, Andrew Pavlo
DAMON 2021
[pdf]
Everything is a Transaction: Unifying Logical Concurrency Control and Physical Data Structure Maintenance in Database Management Systems
Ling Zhang, Matthew Butrovich, Tianyu Li, Andrew Pavlo, Yash Nannapaneni, John Rollinson, Huanchen Zhang, Ambarish Balakumar, Daniel Biales, Ziqi Dong, Emmanuel J Eppinger, Jordi E Gonzalez, Wan Shen Lim, Jianqiao Liu, Lin Ma, Prashanth Menon, Soumil Mukherjee, Tanuj Nayak, Amadou Ngom, Dong Niu, Deepayan Patra, Poojita Raj, Stephanie Wang, Wuwen Wang, Yao Yu, William Zhang
CIDR 2021
[pdf]
Active Learning for ML Enhanced Database Systems
Lin Ma, Bailu Ding, Sudipto Das, Adith Swaminathan
SIGMOD 2020
[pdf][slides]
[poster][video]
Permutable Compiled Queries: Dynamically Adapting Compiled Queries without Recompiling
Prashanth Menon, Amadou Ngom, Lin Ma, Todd C. Mowry, Andrew Pavlo
VLDB 2020
[pdf]
External vs. Internal: An Essay on Machine Learning Agents for Autonomous Database Management Systems
Andrew Pavlo, Matthew Butrovich, Ananya Joshi, Lin Ma, Prashanth Menon, Dana Van Aken, Lisa Lee, Ruslan Salakhutdinov
TCDE Bulletin 2019
[pdf]
Query-based Workload Forecasting for Self-Driving Database Management Systems
Lin Ma, Dana Van Aken, Ahmed Hefny, Gustavo Mezerhane, Andrew Pavlo, Geoffrey J. Gordon
SIGMOD 2018
[pdf][slides]
[code]
[poster][video]
Self-Driving Database Management Systems
Andrew Pavlo, Gustavo Angulo, Joy Arulraj, Haibin Lin, Jiexi Lin, Lin Ma, Prashanth Menon, Todd C Mowry, Matthew Perron, Ian Quah, et al.
CIDR 2017
[pdf]
Larger-than-Memory Data Management on Modern Storage Hardware for In-Memory OLTP Database Systems
Lin Ma, Joy Arulraj, Sam Zhao, Andrew Pavlo, Subramanya R. Dulloor, Michael J. Giardino, Jeff Parkhurst, Jason L. Gardner, Kshitij Doshi, Stanley Zdonik
DAMON 2016
[pdf][slides][code]
Reducing the Storage Overhead of Main-Memory OLTP Databases with Hybrid Indexes
Huanchen Zhang, David G Andersen, Andrew Pavlo, Michael Kaminsky, Lin Ma, Rui Shen
SIGMOD 2016
[pdf]
PAGE: A Partition Aware Engine for Parallel Graph Computation
Yingxia Shao, Bin Cui, Lin Ma
TKDE 2015
[pdf]
Parallel Subgraph Listing in a Large-Scale Graph
Yingxia Shao, Bin Cui, Lei Chen, Lin Ma, Junjie Yao, Ning Xu
SIGMOD 2014
[pdf]
PAGE: A Partition Aware Graph Computation Engine
Yingxia Shao, Junjie Yao, Bin Cui, Lin Ma
CIKM 2013
[pdf]
Vortex: Overcoming Memory Capacity Limitations in GPU-Accelerated Large-Scale Data Analytics
Meta, TBD
Microsoft Gray Systems Lab, October 22, 2024
Voltron Data, October 15, 2024
Self-Driving Databases and the Relevant Research Forefront
Salesforce (Distinguished AI Speaker Series), April 1, 2024
Putting Your Database on Autopilot: Self-Driving Database Management Systems
Cornell University, September 26, 2023
Google, June 9, 2022
Microsoft Research, March 29, 2022
University of Michigan, Ann Arbor, March 22, 2022
IBM Research, March 10, 2022
University of Maryland, College Park, March 7, 2022
Columbia University, March 2, 2022
Northwestern University, February 23, 2022
Oracle Labs, February 21, 2022
NoisePage: The Self-Driving Database Management System
ByteDance, December 13, 2021
Ahana, October 19, 2021
University of California, San Diego, October 6, 2021
[video]
Facebook, June 4, 2021
Harvard University, May 28, 2021
Columbia University, April 13, 2021
Stanford University (MLSys Seminar), April 8, 2021
[video]
Oracle, April 6, 2021
Carnegie Mellon University, March 22, 2021
[video]
Centrum Wiskunde & Informatica, March 19, 2021
The University of Chicago, March 17, 2021
University of Washington, March 3, 2021
University of California, Berkeley, February 23, 2021
University of California, Santa Cruz (CSE 215), February 19, 2021
Technical University of Munich, February 18, 2021
Brown University, January 27, 2021
MB2: Decomposed Behavior Modeling for Self-Driving Database Management Systems
SIGMOD, June 2021
Active Learning for ML Enhanced Database Systems
SIGMOD, June 2020
Self-Driving Databases: It All Starts with Workload Forecasting
Percona Live, May 2019
Efficiently Leveraging B-Instances for Query Plan Predictions
Microsoft Research, August 2018
Query-based Workload Forecasting for Self-Driving DBMSs
SIGMOD, June 2018
Microsoft Research, May 2018
PDL Retreat, October 2017
Larger-than-Memory Data Management on Modern Storage Hardware for In-Memory OLTP Database Systems
SIGMOD, June 2016
The Self-Driving DBMS
PDL Retreat, October 2016
Multi-Level Anti-Caching for NVM+SSD in H-Store
PDL Retreat, October 2015
Finalist Presentation of Programming Contest
SIGMOD, June 2014
Using Less to Do More With Anti-Caching in OLTP Database Systems
Carnegie Mellon University, August 2014