Lin Ma

I am an assistant professor in the Computer Science and Engineering Division at the University of Michigan, Ann Arbor. My research leverages artificial intelligence (AI) and novel hardware (GPUs) to make database management systems (DBMSs) simpler and more efficient. I obtained my PhD from CMU, fortunately advised by Andy Pavlo. My PhD research focused on the architectural design of autonomous DBMSs, implemented in a DBMS prototype built at CMU. I worked on Delta Lake/Lakehouse at Databricks before joining UMich.

Depending on funding availability, I may recruit up to one PhD student to join my group at UMich in fall 2025. If you are interested, please apply to our CSE PhD program and mention my name in your application. You can also email me if you have specific interests or questions. I may not have time to reply to every email, but I will read all of them and try to respond.

Download: CV | Research Statement | Teaching Statement

Students

Current

Visiting

Experience

Working


  • Assistant Professor
    University of Michigan, Ann Arbor, August 2023 –

  • Software Engineer
    Databricks, Inc., July 2022 – August 2023

  • Postdoctoral Fellow
    Carnegie Mellon University, September 2021 – July 2022

  • Research Intern
    Microsoft Research, Data Management, Exploration, and Mining Group, May 2018 – August 2018

Teaching


  • UMich EECS 484 Database Management Systems
    Instructor [Fall 2024]

  • UMich EECS 484 Database Management Systems
    Instructor [Winter 2024]

  • UMich EECS 584 Advanced Database Management Systems
    Instructor [Fall 2023]

  • CMU 15-445/645 Database Systems
    Instructor [Fall 2021]

  • CMU 15-721 Advanced Database Systems
    Head Teaching Assistant [Spring 2019]

  • CMU 15-445/645 Database Systems
    Teaching Assistant [Fall 2018]

Service


To the Profession

To the University

  • UMich CSE PhD Admissions Committee – 2024
  • UMich CSE PhD Admissions Committee – 2023
  • CMU CSD Faculty Search Committee – 2020
  • CMU CSD MS Admissions Committee – 2018
  • CMU CSD Graduate Student Recruitment (Open House) Committee – 2018

Publications


  • Hit the Gym: Accelerating Query Execution to Efficiently Bootstrap Behavior Models for Self-Driving Database Management Systems
    Wan Shen Lim, Lin Ma, William Zhang, Matthew Butrovich, Samuel Arch, and Andrew Pavlo
    VLDB 2024 [pdf]

  • Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMs
    Naihao Deng, Zhenjie Sun, Ruiqi He, Aman Sikka, Yulong Chen, Lin Ma, Yue Zhang, and Rada Mihalcea
    ACL Findings 2024 [pdf]

  • Database Gyms
    Wan Shen Lim, Matthew Butrovich, William Zhang, Andrew Crotty, Lin Ma, Peijing Xu, Johannes Gehrke, and Andrew Pavlo
    CIDR 2023 [pdf]

  • Tastes Great! Less Filling! High Performance and Accurate Training Data Collection for Self-Driving Database Management Systems
    Matthew Butrovich, Wan Shen Lim, Lin Ma, John Rollinson, William Zhang, Yu Xia, Andrew Pavlo
    SIGMOD 2022 [pdf]

  • MB2: Decomposed Behavior Modeling for Self-Driving Database Management Systems
    Lin Ma, William Zhang, Jie Jiao, Wuwen Wang, Matthew Butrovich, Wan Shen Lim, Prashanth Menon, Andrew Pavlo
    SIGMOD 2021 [pdf] [short video] [long video] [code]

  • Make Your Database System Dream of Electric Sheep: Towards Self-Driving Operation
    Andrew Pavlo, Matthew Butrovich, Lin Ma, Prashanth Menon, Wan Shen Lim, Dana Van Aken, William Zhang
    VLDB 2021 [pdf]

  • Filter Representation in Vectorized Query Execution
    Amadou Ngom, Prashanth Menon, Matthew Butrovich, Lin Ma, Wan Shen Lim, Todd C Mowry, Andrew Pavlo
    DAMON 2021 [pdf]

  • Everything is a Transaction: Unifying Logical Concurrency Control and Physical Data Structure Maintenance in Database Management Systems
    Ling Zhang, Matthew Butrovich, Tianyu Li, Andrew Pavlo, Yash Nannapaneni, John Rollinson, Huanchen Zhang, Ambarish Balakumar, Daniel Biales, Ziqi Dong, Emmanuel J Eppinger, Jordi E Gonzalez, Wan Shen Lim, Jianqiao Liu, Lin Ma, Prashanth Menon, Soumil Mukherjee, Tanuj Nayak, Amadou Ngom, Dong Niu, Deepayan Patra, Poojita Raj, Stephanie Wang, Wuwen Wang, Yao Yu, William Zhang
    CIDR 2021 [pdf]

  • Active Learning for ML Enhanced Database Systems
    Lin Ma, Bailu Ding, Sudipto Das, Adith Swaminathan
    SIGMOD 2020 [pdf] [slides] [poster] [video]

  • Permutable Compiled Queries: Dynamically Adapting Compiled Queries without Recompiling
    Prashanth Menon, Amadou Ngom, Lin Ma, Todd C. Mowry, Andrew Pavlo
    VLDB 2020 [pdf]

  • External vs. Internal: An Essay on Machine Learning Agents for Autonomous Database Management Systems
    Andrew Pavlo, Matthew Butrovich, Ananya Joshi, Lin Ma, Prashanth Menon, Dana Van Aken, Lisa Lee, Ruslan Salakhutdinov
    TCDE Bulletin 2019 [pdf]

  • Query-based Workload Forecasting for Self-Driving Database Management Systems
    Lin Ma, Dana Van Aken, Ahmed Hefny, Gustavo Mezerhane, Andrew Pavlo, Geoffrey J. Gordon
    SIGMOD 2018 [pdf] [slides] [code] [poster] [video]

  • Self-Driving Database Management Systems
    Andrew Pavlo, Gustavo Angulo, Joy Arulraj, Haibin Lin, Jiexi Lin, Lin Ma, Prashanth Menon, Todd C Mowry, Matthew Perron, Ian Quah, et al.
    CIDR 2017 [pdf]

  • Larger-than-Memory Data Management on Modern Storage Hardware for In-Memory OLTP Database Systems
    Lin Ma, Joy Arulraj, Sam Zhao, Andrew Pavlo, Subramanya R. Dulloor, Michael J. Giardino, Jeff Parkhurst, Jason L. Gardner, Kshitij Doshi, Stanley Zdonik
    DAMON 2016 [pdf] [slides] [code]

  • Reducing the Storage Overhead of Main-Memory OLTP Databases with Hybrid Indexes
    Huanchen Zhang, David G Andersen, Andrew Pavlo, Michael Kaminsky, Lin Ma, Rui Shen
    SIGMOD 2016 [pdf]

  • PAGE: A Partition Aware Engine for Parallel Graph Computation
    Yingxia Shao, Bin Cui, Lin Ma
    TKDE 2015 [pdf]

  • Parallel Subgraph Listing in a Large-Scale Graph
    Yingxia Shao, Bin Cui, Lei Chen, Lin Ma, Junjie Yao, Ning Xu
    SIGMOD 2014 [pdf]

  • PAGE: A Partition Aware Graph Computation Engine
    Yingxia Shao, Junjie Yao, Bin Cui, Lin Ma
    CIKM 2013 [pdf]

Talks


  • Vortex: Overcoming Memory Capacity Limitations in GPU-Accelerated Large-Scale Data Analytics
    Meta, TBD
    Microsoft Gray Systems Lab, October 22, 2024
    Voltron Data, October 15, 2024

  • Self-Driving Databases and the Relevant Research Forefront
    Salesforce (Distinguished AI Speaker Series), April 1, 2024

  • Putting Your Database on Autopilot: Self-Driving Database Management Systems
    Cornell University, September 26, 2023
    Google, June 9, 2022
    Microsoft Research, March 29, 2022
    University of Michigan, Ann Arbor, March 22, 2022
    IBM Research, March 10, 2022
    University of Maryland, College Park, March 7, 2022
    Columbia University, March 2, 2022
    Northwestern University, February 23, 2022
    Oracle Labs, February 21, 2022

  • NoisePage: The Self-Driving Database Management System
    ByteDance, December 13, 2021
    Ahana, October 19, 2021
    University of California, San Diego, October 6, 2021 [video]
    Facebook, June 4, 2021
    Harvard University, May 28, 2021
    Columbia University, April 13, 2021
    Stanford University (MLSys Seminar), April 8, 2021 [video]
    Oracle, April 6, 2021
    Carnegie Mellon University, March 22, 2021 [video]
    Centrum Wiskunde & Informatica, March 19, 2021
    The University of Chicago, March 17, 2021
    University of Washington, March 3, 2021
    University of California, Berkeley, February 23, 2021
    University of California, Santa Cruz (CSE 215), February 19, 2021
    Technical University of Munich, February 18, 2021
    Brown University, January 27, 2021

  • MB2: Decomposed Behavior Modeling for Self-Driving Database Management Systems
    SIGMOD, June 2021

  • Active Learning for ML Enhanced Database Systems
    SIGMOD, June 2020

  • Self-Driving Databases: It All Starts with Workload Forecasting
    Percona Live, May 2019

  • Efficiently Leveraging B-Instances for Query Plan Predictions
    Microsoft Research, August 2018

  • Query-based Workload Forecasting for Self-Driving DBMSs
    SIGMOD, June 2018
    Microsoft Research, May 2018
    PDL Retreat, October 2017

  • Larger-than-Memory Data Management on Modern Storage Hardware for In-Memory OLTP Database Systems
    SIGMOD, June 2016

  • The Self-Driving DBMS
    PDL Retreat, October 2016

  • Multi-Level Anti-Caching for NVM+SSD in H-Store
    PDL Retreat, October 2015

  • Programming Contest Finalist Presentation
    SIGMOD, June 2014

  • Using Less to Do More With Anti-Caching in OLTP Database Systems
    Carnegie Mellon University, August 2014