North Michigan, 2018

Wenjia He
Ph.D. student at University of Michigan

4957 BBB Building
2260 Hayward Street
University of Michigan, Ann Arbor
Ann Arbor, MI 48109

Email: wenjiah [at]

I am a Ph.D. student in Computer Science and Engineering at the University of Michigan, Ann Arbor, advised by Prof. Michael Cafarella. My research interests lie in database management systems for video streams.

Prior to UMich, I received my B.S. in Mathematics and Applied Mathematics from the School of the Gifted Young, University of Science and Technology of China (USTC), in 2018.

You can find my CV here.

What's New

June 2020
I gave a talk on our paper at the 2020 ACM SIGMOD/PODS Conference.
March 2020
Our paper was accepted to SIGMOD '20: A Method for Optimizing Opaque Filter Queries.
January 2020
I passed the prelim exam and became a Ph.D. candidate.
August 2018
I moved to Ann Arbor and began my Ph.D. at UMich.
June 2018
I received the Excellent Graduation Thesis Award at USTC (top 5%) and was named an Outstanding Graduate of USTC.
April 2018
Our paper was accepted to USENIX ATC '18: Metis: Robustly Optimizing Tail Latencies of Cloud Systems.
September 2017
I was awarded the National Scholarship, Ministry of Education of China (top 1% nationwide).
July 2017
I started an internship in the Systems and Networking Research Group at Microsoft Research Asia (MSRA), supervised by Lead Researcher Chieh-Jan Mike Liang.
August 2014
I began my undergraduate studies at the University of Science and Technology of China.

Research Projects

Voodoo Indexing
Voodoo indexing is an efficient two-phase mechanism for optimizing opaque filter queries, i.e., queries whose selection predicates are implemented as user-defined functions (UDFs). Before any query arrives, it builds a hierarchical index structure that groups similar objects together; at query time, it builds a map of how well each group satisfies the predicate and exploits this map to avoid processing irrelevant data.
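To make the two phases concrete, here is a minimal Python sketch under simplifying assumptions that are not from the paper: objects are already feature vectors, the hierarchical index is flattened to a single level of k-means groups, and the query-time policy is a simple greedy rule over Laplace-smoothed per-group pass rates. The names `build_index` and `query` are hypothetical, and this is an illustration of the idea, not the published voodoo indexing algorithm.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_index(data: Sequence[Vector], n_groups: int, iters: int = 10,
                seed: int = 0) -> List[List[int]]:
    """Phase 1, before any query: group similar objects with plain k-means.

    Hypothetical simplification: one level of groups instead of a hierarchy.
    Returns non-empty groups, each a list of indices into `data`.
    """
    rng = random.Random(seed)
    centers = [data[i] for i in rng.sample(range(len(data)), n_groups)]
    groups: List[List[int]] = []
    for _ in range(iters):
        groups = [[] for _ in centers]
        for i, v in enumerate(data):
            nearest = min(range(len(centers)), key=lambda c: _dist2(v, centers[c]))
            groups[nearest].append(i)
        # Move each center to the mean of its members; keep it if the group is empty.
        centers = [
            tuple(sum(data[i][d] for i in g) / len(g) for d in range(len(data[0])))
            if g else centers[c]
            for c, g in enumerate(groups)
        ]
    return [g for g in groups if g]


def query(data: Sequence[Vector], groups: List[List[int]],
          predicate: Callable[[Vector], bool], k: int,
          seed: int = 0) -> Tuple[List[int], int]:
    """Phase 2, at query time: return up to k indices that satisfy `predicate`.

    Maintains a per-group map of (passes, trials) and always evaluates the next
    object from the group with the highest smoothed pass rate, so groups that
    rarely satisfy the predicate are visited less. Also returns the number of
    (expensive, opaque) predicate calls made.
    """
    rng = random.Random(seed)
    remaining = [list(g) for g in groups]
    for r in remaining:
        rng.shuffle(r)
    stats = [[0, 0] for _ in groups]  # [passes, trials] per group
    results: List[int] = []
    calls = 0
    while len(results) < k and any(remaining):
        # Laplace-smoothed pass rate: an untried group starts at 1/2.
        best = max((g for g in range(len(remaining)) if remaining[g]),
                   key=lambda g: (stats[g][0] + 1) / (stats[g][1] + 2))
        idx = remaining[best].pop()
        calls += 1
        stats[best][1] += 1
        if predicate(data[idx]):
            stats[best][0] += 1
            results.append(idx)
    return results, calls
```

Because an untried group scores 1/2, a group that has just failed once (score 1/3) loses to any untried group, which gives the greedy rule a little exploration; a real system would need a more principled exploration policy and a genuine multi-level hierarchy.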

Metis
Metis is an effective service for robustly auto-tuning the configurations of modern cloud systems and is used by several Microsoft services. It implements a customized Bayesian optimization method, including diagnostic models and novel acquisition functions, to optimize tail latencies.
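The following Python sketch shows, under heavy simplifying assumptions, how a Bayesian-optimization tuner might combine an exploitation suggestion, an exploration suggestion, and a re-sampling suggestion for a suspected noisy measurement. It assumes a single numeric configuration knob, a Gaussian-process surrogate with a fixed RBF kernel (length scale 1, noise variance 0.1) fitted in pure Python, and a leave-one-out z-score as a stand-in "diagnostic model". The function names and thresholds are hypothetical; Metis's actual diagnostic models and acquisition functions are not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rbf(a: float, b: float, length: float = 1.0) -> float:
    """Squared-exponential kernel on a single (1-D) configuration knob."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(m: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L @ L^T == m (m must be positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve (L L^T) x = b by forward substitution, then back substitution."""
    n = len(low)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_posterior(xs: List[float], ys: List[float], queries: Sequence[float],
                 noise: float = 0.1) -> List[Tuple[float, float]]:
    """Gaussian-process posterior (mean, variance) at each query point."""
    mu = sum(ys) / len(ys)  # constant prior mean = sample mean
    kmat = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
            for i, a in enumerate(xs)]
    low = cholesky(kmat)
    alpha = solve(low, [y - mu for y in ys])
    out = []
    for q in queries:
        kq = [rbf(q, x) for x in xs]
        mean = mu + sum(k * a for k, a in zip(kq, alpha))
        v = solve(low, kq)
        var = max(rbf(q, q) - sum(k * w for k, w in zip(kq, v)), 0.0)
        out.append((mean, var))
    return out


def propose(xs: List[float], ys: List[float], candidates: List[float],
            noise: float = 0.1, outlier_z: float = 3.0) -> Dict[str, float]:
    """One suggestion per strategy: exploit, explore, and optionally re-sample."""
    post = gp_posterior(xs, ys, candidates, noise)
    idx = range(len(candidates))
    suggestion = {
        # Exploit: lowest predicted tail latency.
        "exploit": candidates[min(idx, key=lambda i: post[i][0])],
        # Explore: where the surrogate model is least certain.
        "explore": candidates[max(idx, key=lambda i: post[i][1])],
    }
    # Stand-in diagnostic model: leave-one-out z-score of every observed latency.
    best_z, best_x = 0.0, None
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        m, v = gp_posterior(rest_x, rest_y, [xs[i]], noise)[0]
        z = abs(ys[i] - m) / math.sqrt(v + noise)
        if z > best_z:
            best_z, best_x = z, xs[i]
    if best_x is not None and best_z > outlier_z:
        suggestion["resample"] = best_x
    return suggestion
```

A caller would re-measure the `resample` configuration when it is present, since its latency looks like a noisy outlier, and otherwise choose between `exploit` and `explore`; deciding how to weigh these options is the job of an acquisition-function mixture.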

Publications

A Method for Optimizing Opaque Filter Queries SIGMOD '20

Wenjia He, Michael R. Anderson, Maxwell Strome, Michael Cafarella.
Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (SIGMOD '20)

An important class of database queries in machine learning and data science workloads is the opaque filter query: a query with a selection predicate that is implemented with a UDF, with semantics that are unknown to the query optimizer. Some typical examples would include a CNN-style trained image classifier, or a textual sentiment classifier. Because the optimizer does not know the predicate's semantics, it cannot employ standard optimizations, yielding long query times. We propose voodoo indexing, a two-phase method for optimizing opaque filter queries. Before any query arrives, the method builds a hierarchical "query-independent" index of the database contents, which groups together similar objects. At query-time, the method builds a map of how much each group satisfies the predicate, while also exploiting the map to accelerate execution. Unlike past methods, voodoo indexing does not require insight into predicate semantics, works on any data type, and does not require in-query model training. We describe both standalone and SparkSQL-specific implementations, plus experiments on both image and text data, on more than 100 distinct opaque predicates. We show voodoo indexing can yield up to an 88% improvement over standard scan behavior, and a 79% improvement over the previous best method adapted from research literature.
@inproceedings{he2020method,
  title={A Method for Optimizing Opaque Filter Queries},
  author={He, Wenjia and Anderson, Michael R and Strome, Maxwell and Cafarella, Michael},
  booktitle={Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data},
  pages={1257--1272},
  year={2020}
}

Metis: Robustly Optimizing Tail Latencies of Cloud Systems USENIX ATC '18

Zhao Lucis Li, Chieh-Jan Mike Liang, Wenjia He, Lianjie Zhu, Wenjun Dai, Jin Jiang, Guangzhong Sun.
Proceedings of the 2018 USENIX Annual Technical Conference (USENIX ATC '18)

Tuning configurations is essential for operating modern cloud systems, but the difficulty arises from the cloud system’s diverse workloads, large system scale, and vast parameter space. Building on previous space exploration efforts of searching for the optimal system configuration, we argue that cloud systems introduce challenges to the robustness of auto-tuning. First, performance metrics such as tail latencies can be sensitive to nontrivial noises. Second, while treating target systems as a black box promotes applicability, it complicates the goal of balancing exploitation and exploration. To this end, Metis is an auto-tuning service used by several Microsoft services, and it implements customized Bayesian optimization to robustly improve auto-tuning: (1) diagnostic models to find potential data outliers for re-sampling, and (2) a mixture of acquisition functions to balance exploitation, exploration and re-sampling. This paper uses Bing Ads key-value store clusters as the running example – compared to weeks of manual tuning by human experts, production results show that Metis reduces the overall tuning time by 98.41%, while reducing the 99-percentile latency by another 3.43%.
@inproceedings{li2018metis,
  title={Metis: Robustly tuning tail latencies of cloud systems},
  author={Li, Zhao Lucis and Liang, Chieh-Jan Mike and He, Wenjia and Zhu, Lianjie and Dai, Wenjun and Jiang, Jin and Sun, Guangzhong},
  booktitle={2018 {USENIX} Annual Technical Conference ({USENIX} {ATC} 18)},
  pages={981--992},
  year={2018}
}