Review for Paper: 18-Exploring Query Execution Strategies for JIT, Vectorization and SIMD

Review 1

This paper proposes a design for evaluating queries efficiently. Previous approaches include block-at-a-time execution (vectorization) and data-centric JIT compilation. The proposed approach is to use compact data types, use SIMD to achieve data parallelism, and use in-register aggregation to exploit the benefits of compact data types.

The paper first uses TPC-H Query 1 (Q1) to show the issues that combined vectorized and JIT compilation faces when executing Q1; the aspects discussed include efficient evaluation on compact versus full data types, hash optimizations, and the aggregation overflow problem.

Then, the paper introduces the proposed Q1 implementation, mainly its in-register aggregation technique. In-register aggregation prevents conflicts and reduces loads and stores compared with array-based aggregation, which improves performance; it is also adaptive, in the sense that when the partial shuffle fails, it falls back to normal vectorized aggregation.
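To make the "conflicts" of array-based aggregation concrete, the following Python sketch emulates a SIMD-style gather, add, and scatter over one vector. It is a simplified model under an assumption (a scatter in which the last lane writing a slot wins); the function names are hypothetical and this is not the paper's code.

```python
def simd_style_scatter_add(table, gids, vals):
    """Emulates one gather -> add -> scatter step over a whole vector.

    Assumption: scatter semantics where the last lane writing a slot wins.
    """
    gathered = [table[g] for g in gids]                # gather current aggregates
    updated = [a + v for a, v in zip(gathered, vals)]  # lane-wise add
    for g, u in zip(gids, updated):                    # scatter: later lanes overwrite
        table[g] = u


def scalar_add(table, gids, vals):
    """Reference: one tuple at a time, so repeated groups accumulate correctly."""
    for g, v in zip(gids, vals):
        table[g] += v


gids, vals = [0, 0, 1, 0], [1, 1, 1, 1]
t_simd, t_ref = [0, 0], [0, 0]
simd_style_scatter_add(t_simd, gids, vals)
scalar_add(t_ref, gids, vals)
print(t_simd, t_ref)  # prints [1, 1] [3, 1]: two updates to group 0 were lost
```

When several lanes of one vector target the same group, their updates overwrite each other; clustering tuples by group first, as in-register aggregation does, means each group's total is written to the table only once.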

The evaluation results show that, on different hardware and across different implementations of Q1, the proposed technique outperforms both HyPer-like and X100-like systems.

What I don't like about this paper is that it focuses too much on Q1, which limits how well its results generalize. Since only Q1 performance is compared, it is hard to see an average performance improvement over existing techniques.



Review 2

Single instruction, multiple data (SIMD) capabilities increase the efficiency of query processors. Two approaches have been proposed to make query processors more efficient. Block-at-a-time processing reduces interpretation overhead and enables a direct mapping onto SIMD instructions. Just-in-time (JIT) compilation of database query pipelines into low-level instructions provides an alternative way to reduce interpretation overhead. Previous work proposed the HyPer system, which implements the second approach. However, data-centric compilation leads to tight tuple-at-a-time loops, which do not exploit SIMD because they process only one tuple at a time. This paper explores several designs and improvements for efficient query execution strategies that combine JIT, vectorization and SIMD.
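The structural difference between the two approaches can be sketched with a Q1-style expression, `price * (1 - discount)`, assuming prices and discounts are scaled by 100 so that 1.00 becomes 100. `vectorized` mimics interpreted block-at-a-time execution (one primitive call per operator per vector, with materialized intermediates), while `fused_loop` mimics what data-centric compilation would emit (one tight per-tuple loop, no intermediates). All names and the vector size are hypothetical; Python shows only the structure, not the performance.

```python
VECTOR_SIZE = 1024  # hypothetical block size


def prim_sub_const_col(c, col):
    """Vectorized primitive: c - col[i] for every element of one vector."""
    return [c - x for x in col]


def prim_mul_col_col(a, b):
    """Vectorized primitive: a[i] * b[i] for every element of one vector."""
    return [x * y for x, y in zip(a, b)]


def vectorized(price, discount):
    """Block-at-a-time: the interpreter calls one primitive per operator per vector."""
    out = []
    for start in range(0, len(price), VECTOR_SIZE):
        p = price[start:start + VECTOR_SIZE]
        d = discount[start:start + VECTOR_SIZE]
        one_minus_d = prim_sub_const_col(100, d)  # materialized intermediate vector
        out.extend(prim_mul_col_col(p, one_minus_d))
    return out


def fused_loop(price, discount):
    """Data-centric style: all operators applied to one tuple before the next."""
    return [p * (100 - d) for p, d in zip(price, discount)]


print(vectorized([1000, 2000], [10, 0]))  # prints [90000, 200000]
```

Both functions compute the same result; the design question the paper studies is how to keep the SIMD-friendly per-vector primitives while removing interpretation overhead.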

The paper explores the execution options by generating different implementation variants and benchmarking them. It discusses TPC-H Q1 execution, including operating on compact data types, the perfect identity hash optimization, and guarding against numerical overflow in aggregates. After that, the paper presents the different implementation flavors the authors created for Q1 and ends with a discussion of future work.

Some of the strengths of this paper are:
1. The paper pointed out that compact data types significantly benefit query executors using SIMD instructions by maximizing SIMD data parallelism.
2. In-register aggregation makes use of partial aggregates, which may use smaller types and can be vectorized easily, to exploit the benefits of compact data.

Some of the drawbacks of this paper are:
1. The paper does not give a solution for how a system can both keep most of the data represented tightly and handle updates.
2. Overflow prevention as an optimization for aggregate calculations has a significant performance cost.



Review 3

Two of the most well-known query processing architectures, interpreted block-at-a-time execution (a.k.a. “vectorization”) and “data-centric” JIT compilation, do not fully explore the design space in terms of granularity of execution and unit of compilation, in particular when it comes to exploiting SIMD. Therefore, this paper partially explored the design space for efficient query processors on future hardware rich in SIMD capabilities.

By focusing on a restricted set of operations (TPC-H Q1), this paper explored the execution options that combining vectorization with JIT brings, specifically also trying to use SIMD instructions, by generating many different implementation variants and benchmarking them. It also explained in detail the considerations regarding operating on SQL data in compact types, and the system features that could help use data types that are as compact as possible. In addition, this paper proposed an “in-register” aggregation strategy, which not only reduces memory pressure but also allows computing on more compact, SIMD-friendly data types. Finally, the paper gave an in-depth discussion of detecting numeric overflows, making a case for numeric overflow prevention rather than detection.

The main contributions of this paper are as follows:
1. It showed that there is a large design space for combined vectorized and JIT compilation.
2. It showed that query executors on SIMD significantly benefit from using compact data types.
3. It discussed design options for database storage that maximize opportunities for using thin data types. Specifically, this involves both compact storage and associated statistics, but also what policy is used for numeric overflow.
4. It proposed a new adaptive “in-register” aggregation strategy that adaptively triggers in aggregations with high group locality (few neighboring distinct groups in the tuple stream).

The main advantage of this paper is that it emphasizes the importance of SIMD capabilities and therefore argues that there is a design space worth exploring beyond the most popular query processing architectures. Hence, it can inspire future database query engine designers to propose innovative vectorized query compilers.

The main drawbacks of this paper are as follows:
1. It did not fully explore all compact data representations (such as compressed execution) or methods to take full advantage of them during query evaluation.
2. It only proposed an idea of how to explore the design space instead of giving a new architecture for data processing.


Review 4

Problem & Motivation:
In the past two decades, SIMD instructions in common x86 processors have evolved from 64-bit width to the current 512-bit width. Therefore, database design should not ignore the possibility of building the entire database infrastructure on SIMD. In addition, current database designs mainly use one of two approaches: block-at-a-time execution (or “vectorization”) and “data-centric” JIT compilation. The space between these two approaches has not been explored well. Hence, the authors study the detailed considerations regarding operating on SQL data.

Main Achievement:
It proposes in-register aggregation. In-register aggregation speeds up query processing because it allows operating on compact data types and is SIMD-friendly. It can also be used with both JIT compilation and vectorization. Another important optimization is the perfect identity hash, whose idea is similar to PSMA: it reserves only 1 + Max - Min slots and uses the delta value (V - Min) as the slot index in the table. One problem that aggregation on compact types can cause is numeric overflow. The strategy for this issue is to prevent possible overflows rather than check whether each value has overflowed.
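The "1 + Max - Min slots" idea can be sketched in a few lines of Python. This is a minimal illustration, assuming the key domain [lo, hi] is known from min/max statistics; the function name and parameters are hypothetical, not taken from the paper.

```python
def identity_hash_sum(keys, values, lo, hi):
    """Group-by SUM using a perfect identity hash.

    Assumes every key lies in [lo, hi] (e.g. known from min/max statistics),
    so key - lo is a collision-free slot index into a dense array.
    """
    n_slots = 1 + hi - lo          # one slot per possible key value
    sums = [0] * n_slots
    seen = [False] * n_slots       # marks slots whose key actually occurs
    for k, v in zip(keys, values):
        slot = k - lo              # the "hash" is just the delta to the minimum
        sums[slot] += v
        seen[slot] = True
    # Slots of keys that never occur stay unused and are skipped here.
    return {lo + s: sums[s] for s in range(n_slots) if seen[s]}


# Example: single-character flags such as l_returnflag ('A', 'N', 'R'),
# interpreted as their byte codes.
flags = [ord(c) for c in "ANRNAR"]
print(identity_hash_sum(flags, [1, 2, 3, 4, 5, 6], ord("A"), ord("R")))
# prints {65: 6, 78: 6, 82: 9}
```

No hashing or collision handling is needed, at the price of a few unused slots for characters that never occur.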

Drawback:
This paper does not provide a complete overview of a DB design based on SIMD and “in-register aggregation”. Some parts, such as indexing and joins, are missing, because otherwise the scope of the paper would be too ambitious. So, this paper does not propose a mature or well-established data structure, but rather a stepping stone for later researchers.




Review 5

For many decades, query processing was based on tuple-at-a-time iterator models, but advances have led to the increased use of block-at-a-time (vectorized) execution, which operates on multiple tuples at a time. This results in lower interpretation overhead as well as a direct mapping onto SIMD instructions, making parallelization easier. Combining just-in-time (JIT) compilation, vectorization, and SIMD to optimize query execution introduces many additional design considerations. The paper “Exploring Query Execution Strategies for JIT, Vectorization and SIMD” explores parts of this design space in order to identify potential trade-offs and draw lessons for future designs. To do so, the authors look at Query 1 of the widely used TPC-H benchmark and measure computational efficiency for different combinations of vectorized and JIT compilation.

They start off by comparing the HyPer system to VectorWise in order to confirm the previously reported result that JIT tuple-at-a-time compilation in HyPer outperforms VectorWise's vectorized execution. After confirming this, they show a new “in-register” aggregation strategy that puts vectorized execution back on top in terms of performance. The authors note, however, that their “in-register” method could also be applied to JIT, and use this to demonstrate the large design space for query compilation that has yet to be comprehensively explored. Following this examination, the authors show that many of the additions and subtractions in Query 1 can be done with single-byte logic, which suggests that implementations often use excessively wide data types that only increase SIMD work without improving results. They also identify challenges in recognizing and optimizing for the most compact data types, such as possible overflow. For example, all it takes is one large value to force all computation to use a much wider data type. Following this discussion, the authors present their particular “implementation flavor”, based on in-register aggregation. The main difference from array-based aggregation is that their method virtually reorders the tuples of each vector so that all tuples belonging to the same group appear consecutively, which reduces the number of loads and stores to the aggregation table in memory.
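The observation that much of Q1's arithmetic fits in narrow types can be illustrated with simple interval (range) propagation: given min/max bounds for the inputs, derive bounds for each intermediate result and pick the narrowest signed integer type that holds it. This is a minimal sketch; the helper names and the example bounds (decimals scaled by 100) are hypothetical and chosen only for illustration.

```python
# Signed integer widths in bytes and their value ranges.
WIDTHS = [(1, -2**7, 2**7 - 1), (2, -2**15, 2**15 - 1),
          (4, -2**31, 2**31 - 1), (8, -2**63, 2**63 - 1)]


def narrowest_width(lo, hi):
    """Return the smallest signed integer width (in bytes) that holds [lo, hi]."""
    for nbytes, tmin, tmax in WIDTHS:
        if tmin <= lo and hi <= tmax:
            return nbytes
    raise OverflowError("range exceeds 64-bit integers")


def add(a, b):
    return (a[0] + b[0], a[1] + b[1])


def sub(a, b):
    return (a[0] - b[1], a[1] - b[0])


def mul(a, b):
    # The extremes of a product lie at the corners of the two intervals.
    corners = [x * y for x in a for y in b]
    return (min(corners), max(corners))


# Hypothetical bounds, scaled by 100 (1.00 -> 100).
one = (100, 100)
discount = (0, 10)
tax = (0, 8)
price = (90_000, 10_494_950)

disc_price = mul(price, sub(one, discount))
charge = mul(disc_price, add(one, tax))
print(narrowest_width(*sub(one, discount)))  # prints 1
print(narrowest_width(*disc_price))          # prints 4
print(narrowest_width(*charge))              # prints 8
```

Under these assumed bounds, `1 - discount` fits in one byte, while only the final product needs 64 bits; a single outlier in `price` would widen every downstream type.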

The main contribution of this paper is an in-depth look into the design considerations of using and/or combining JIT with vectorized execution and SIMD for query processing. Many of the principles they identify, such as keeping data in registers as long as possible to reduce memory traffic and using the most compact data types possible, are well-known principles used elsewhere, but this paper is still important in showing directly how they can be applied to database query processing. As their results show, their optimizations yield some performance gains and point the way toward further work in this field.

As for weaknesses, this paper could probably have covered a larger scope, though that may have been limited by length restrictions or by the fact that this analysis is still in its early stages. The authors examined only a single query, which is relatively simple and does not include more complex operations such as JOIN. A good follow-up topic would be to investigate how well SIMD works with JOIN operations.


Review 6

This paper’s goal is to educate the reader on data processing in “modern, heterogeneous, computing architectures.” This is important because readers can potentially make their own future contributions to the spaces outlined in the paper. The paper mainly explores columnar (vectorized) execution versus data-centric execution. Its main contributions are exploring how compact data types greatly influence query executors using SIMD, exploring the potential of combining vectorized execution with JIT compilation, and finally the “in-register” aggregation strategy, which, if implemented, would improve on the fastest existing execution strategy, that of the HyPer architecture.

In order to compare the data processing approaches, the paper uses TPC-H Query 1 (Q1). This query originally demonstrated the large interpretation overhead of tuple-at-a-time execution for analytical queries and motivated the VectorWise system. The paper also distinguishes between compact and full data types: a few values of larger size can force computations into the largest integer width, which reduces SIMD data parallelism. It then looks into the perfect identity hash optimization, which exploits the limited domain of the group-by columns by using the integer encoding of their single-character values directly as group IDs. It also looks into guarding against and preventing overflows. For some operators, the domains of the inputs guarantee that no overflow can occur, so no checks are needed, but for other operators, like sum(), bounds on the results of the operations must be established.

The authors explain their proposed “in-register” aggregation method, which is essentially a partial shuffle that clusters the group IDs in the selection vector, followed by two steps: storing the consecutive per-group positions and then computing the aggregates per group using ordered aggregation, an algorithm outlined in the paper. The authors also detail a hand-written AVX-512 implementation of Q1 that uses SIMD and is exposed to write conflicts, which they work around by controlling the degree of data parallelism. Finally, the runtimes of many implementation flavors of Q1 are compared in the evaluation section.
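The three steps described above can be sketched in scalar Python. This is a hedged illustration, not the authors' implementation: the partial shuffle is modeled as a stable sort of the selection vector (a real engine clusters positions cheaply with SIMD instructions), a local variable stands in for a register, and the fallback threshold `max_groups` and the function name are hypothetical.

```python
def in_register_aggregate(gids, vals, table, max_groups=8):
    """Aggregate one vector of (group id, value) pairs into `table` (a dict).

    Returns True if the clustered ("in-register") path was used, False if it
    fell back. `max_groups` is a hypothetical threshold for when clustering
    is worthwhile.
    """
    if len(set(gids)) > max_groups:
        # Fallback: plain per-tuple (array-based) aggregation.
        for g, v in zip(gids, vals):
            table[g] = table.get(g, 0) + v
        return False

    # Step 1 ("partial shuffle"): a selection vector in which the positions of
    # each group are consecutive; the data itself is not moved.
    sel = sorted(range(len(gids)), key=lambda i: gids[i])  # stable sort

    # Step 2: record where each group's run starts and ends in `sel`.
    bounds = []
    start = 0
    for i in range(1, len(sel) + 1):
        if i == len(sel) or gids[sel[i]] != gids[sel[start]]:
            bounds.append((gids[sel[start]], start, i))
            start = i

    # Step 3 (ordered aggregation): sum each run in a local accumulator and
    # touch the aggregation table once per group instead of once per tuple.
    for g, run_start, run_end in bounds:
        acc = 0
        for j in range(run_start, run_end):
            acc += vals[sel[j]]
        table[g] = table.get(g, 0) + acc
    return True


table = {}
used = in_register_aggregate([2, 0, 2, 1, 0, 2], [10, 1, 20, 5, 2, 30], table)
print(used, table)  # prints True {0: 3, 1: 5, 2: 60}
```

The fallback branch reflects the adaptivity mentioned in several reviews: with many distinct groups per vector, clustering no longer pays off and ordinary vectorized aggregation is used instead.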

I liked how this paper made an explicit improvement on the existing HyPer implementation and stated that readers could improve on its query execution strategy themselves. The “in-register” aggregation method was explained in detail and left open for the reader to implement. I did, however, find that section somewhat non-linear in its description of the method, which made it convoluted to understand.



Review 7

This paper aims at efficient query processors on hardware that is rich in SIMD capabilities. The authors focus on TPC-H (specifically Q1), providing implementation alternatives and benchmarking them on different architectures. This paper also discusses different implementations of aggregation and proposes "in-register" aggregation, which reduces memory pressure and allows processing on compact data types.

The background is that query processing architecture has changed from the "tuple at a time" iterator model to the "block at a time" model. The advantages of this are 1) reduced interpretation overhead and 2) a direct mapping onto SIMD instructions. In the introduction, the authors also discuss other ways to avoid interpretation overhead and explain why a mix of vectorization and JIT is needed. An introduction to SIMD and its history is also given. The experiments compare multiple implementations of TPC-H Q1, including HyPer-inspired, X100-inspired, and hand-written ones, on different hardware, and show that 1) using compact data types is important for efficiency and 2) among generic execution strategies, "in-register" aggregation performs best.

The "in-register" aggregation is shown to be fast because it allows operating on compact data types and is SIMD-friendly. Execution on compact data types can be seen as a case of compressed execution, where compression schemes store outliers in a separate location. Besides the implementations mentioned above, the Q1 implementation flavors also include an implementation in Weld. The variations cover data types, aggregation storage layout, overflow checking and prevention, and aggregation algorithms.
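To illustrate how a compression scheme keeps most values in a compact type while storing outliers separately, here is a minimal sketch of frame-of-reference encoding with exceptions ("patching"). The function names, the 8-bit code width, and the choice of reference value are assumptions for illustration, not the specific scheme used in the paper or in VectorWise.

```python
def for_encode(values, ref, bits=8):
    """Frame-of-reference encoding with exceptions ("patching").

    Values whose delta to `ref` fits in an unsigned `bits`-bit code are stored
    compactly; the rest are recorded separately as (position, value) pairs.
    """
    limit = (1 << bits) - 1
    codes, exceptions = [], []
    for pos, v in enumerate(values):
        delta = v - ref
        if 0 <= delta <= limit:
            codes.append(delta)
        else:
            codes.append(0)                # placeholder, patched on decode
            exceptions.append((pos, v))
    return codes, exceptions


def for_decode(codes, exceptions, ref):
    out = [ref + c for c in codes]         # same operation for every position
    for pos, v in exceptions:              # then patch the few outliers
        out[pos] = v
    return out


data = [1000, 1003, 1010, 999_999, 1001]
codes, exc = for_encode(data, ref=1000)
print(codes, exc)  # prints [0, 3, 10, 0, 1] [(3, 999999)]
```

One outlier no longer forces the whole column into a wide type; the compact codes can be processed uniformly and the exception list is handled on the side.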

The main contributions of this paper are: 1) showing that there is a large design space for combined vectorized and JIT compilation; 2) showing that query executors on SIMD benefit from using compact data types; 3) proposing a new adaptive "in-register" aggregation strategy, which triggers in aggregations with high group locality; and 4) inspiring future database query engines built around innovative vectorized query compilers.

This paper is relatively hard to understand, especially the details of execution. Also, as the conclusion notes, the work could be improved by exploring more compact data representations and how to take full advantage of them.


Review 8

This paper addresses the problem of how to design efficient query processors for future hardware that is rich in SIMD capabilities. There are two previously proposed methods: block-at-a-time execution and JIT compilation. However, the authors believe there is a lot to be explored in between, especially regarding SIMD.

The paper tries to find the trade-offs between these two extremes, so it focuses on a restricted set of operations (TPC-H Q1). In the paper, the authors discuss three aspects of TPC-H Q1: operating on compact data types, the perfect identity hash optimization, and guarding against numerical overflow in aggregates. Then the authors discuss their different implementations, including in-register aggregation and a fully vectorized implementation. “In-register” aggregation is an adaptive way of handling aggregates: if there are too many distinct group values, the partial shuffle can fail, and execution falls back to normal vectorized aggregation.

The main contributions of this paper are as follows. First, the authors show that there is a large space to explore between the designs of JIT and vectorization. Second, the paper shows that query execution on SIMD benefits from compact data types and suggests design choices for database storage. Moreover, this paper proposes a new adaptive “in-register” aggregation strategy. Finally, it is worth mentioning that this paper can inspire future researchers to design novel data processing architectures.

I think one weak point is that this paper does not provide a new architecture for data processing. Though the discussion and claims in this paper make sense, it would be much better if the paper proposed and implemented a modern data processing architecture.



Review 9

This paper by Gubner and Boncz tries to imagine a middle ground between JIT and vectorization, two popular modern approaches to query execution. The authors emphasize the use of compact data types, which can be used to take advantage of SIMD. The idea is that even for data that is stored in 64-bit integers (or equivalent floats), the data frequently takes on a small set of values and does not need much space. Knowledge of the true range of the data allows for advanced optimizations. Some assumptions do seem closely tied to Q1 of TPC-H, although it seems plausible that they are relevant for real-world datasets as well.

The authors additionally dive into hash optimization, which takes advantage of the limited range of values discussed in the section about compact data types. Overflow protection is also explored as an area for potential improvement; it becomes a key bottleneck with SIMD, where overflow flags are not provided the way they are for traditional scalar CPU instructions. In-register aggregation is presented as an efficient way to perform grouping operations that takes advantage of SIMD. Overall, the paper explores many areas for improvement and potential optimizations for query execution.
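Because x86 SIMD integer additions do not set a per-lane overflow flag, one common branch-free alternative is to compare sign bits: signed addition overflows exactly when both operands have the same sign and the result's sign differs. The sketch below emulates 32-bit wrap-around lanes in Python; it shows a general technique and is not necessarily what the paper's implementations do. The names `to_signed` and `lane_add_with_overflow` are hypothetical.

```python
BITS = 32
MASK = (1 << BITS) - 1


def to_signed(x):
    """Interpret the low BITS bits of x as a two's-complement integer."""
    x &= MASK
    return x - (1 << BITS) if x >> (BITS - 1) else x


def lane_add_with_overflow(a, b):
    """Lane-wise wrapping add plus a per-lane overflow mask.

    A lane overflowed iff a and b share a sign and the sum's sign differs,
    i.e. ((a ^ s) & (b ^ s)) < 0; the check itself needs no branch.
    """
    sums = [to_signed(x + y) for x, y in zip(a, b)]
    overflow = [((x ^ s) & (y ^ s)) < 0 for x, y, s in zip(a, b, sums)]
    return sums, overflow


sums, ovf = lane_add_with_overflow([2**31 - 1, 5, -2**31, -1], [1, 7, -1, -1])
print(sums)  # prints [-2147483648, 12, 2147483647, -2]
print(ovf)   # prints [True, False, True, False]
```

Even branch-free, such a check adds several instructions per addition, which is one reason to prefer preventing overflow when bounds allow it.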

Compared to other papers, I found that the introduction was very well done and provided sufficient background to truly understand the problem. JIT and vectorization are both very well explained, as is the rationale for focusing on Q1 of TPC-H. The authors don’t just say that this will be their focus; they explain clearly why they think that this is a good benchmark for the problem that they aim to solve, and they write out the full query as a figure in the paper. Then, they constantly refer to the query when explaining various concepts. I would look to this as a model for how to write a thorough introduction.

Unfortunately, I didn’t find the quality of other parts of the paper to be quite as high as the quality of the introduction. Notably, the graphs in the results section were extremely disappointing. Figure 3, for instance, plots 6 lines and pairs of them have both the same color and symbol, making them indistinguishable. Of course, some context is given in the text of the paper itself, but ultimately it almost feels as if the reader is left to make some assumptions, which shouldn’t be the case in a results section. While figures 3 and 4 were the most egregious, figures 5 and 6 were also quite busy and hard to read. Finally, I felt as though the conclusion was mostly well done, but I do always find it a bit silly when authors focus on the fact that they are not quite at the level of hand-optimized query plans. This almost seems to be a given in database systems.


Review 10

This paper explores the performance of different query processing strategies on the TPC-H Q1 workload. The goal here is to illustrate the large design space for query processing when combining vectorized and JIT compilation techniques. Also, the paper shows that use of SIMD instructions further enlarges the design space.

Besides these traditional techniques, the paper also raises some other considerations when designing query processing strategy:
Compact data types: The schema of a table gives only coarse data type information. However, if more accurate statistics can be obtained, the processing engine can use this information to choose more compact data types and representations, thus enhancing performance.
Overflow prevention: Traditional methods use overflow checking, which incurs at least some overhead. If we know, for example, the number of tuples and the data domain, then in some cases we can be sure that no overflow will occur, so no checking is needed and no overhead is incurred.
In-register aggregation: When using SIMD instructions for aggregation, special care is needed so that there are no lost updates in the aggregation table. The authors thus suggest a new algorithm based on virtually reordering tuples so that tuples belonging to the same group are consecutive. They claim this method also reduces the number of load and store operations.
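The overflow-prevention point in the list above can be made concrete with a simple bound. Assuming every input of a SUM lies in [lo, hi] and at most n tuples are aggregated, every partial sum of k values lies in [k*lo, k*hi], which is contained in [min(0, n*lo), max(0, n*hi)]; if that interval fits the accumulator type, no per-addition check is needed. The helper below is a hypothetical sketch; `max_tuples` would come from table statistics or an imposed limit.

```python
def sum_needs_no_check(lo, hi, max_tuples, acc_bits=64):
    """True if SUM over at most `max_tuples` values in [lo, hi] cannot
    overflow a signed `acc_bits`-bit accumulator.

    Every partial sum of k <= max_tuples values lies in
    [min(0, max_tuples*lo), max(0, max_tuples*hi)], so checking that interval
    against the accumulator range covers every intermediate result.
    """
    acc_min = -(1 << (acc_bits - 1))
    acc_max = (1 << (acc_bits - 1)) - 1
    low = min(0, max_tuples * lo)
    high = max(0, max_tuples * hi)
    return acc_min <= low and high <= acc_max


# 32-bit inputs, up to 2**32 tuples: a 64-bit sum is provably safe.
print(sum_needs_no_check(0, 2**31 - 1, 2**32))  # prints True
# Twice as many tuples: checks (or a wider accumulator) would be needed.
print(sum_needs_no_check(0, 2**31 - 1, 2**33))  # prints False
```

The bound is decided once per query from metadata, so the aggregation loop itself can run without any overflow branches.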

The paper conducts experiments to show the performance of different processing strategies. Based on the experiment, they noticed that the combination of vectorized execution, compact data types, overflow prevention, and in-register aggregation can achieve the best performance on their test workload (TPC-H Q1).

In my opinion, one main drawback of this paper is that the experiments are performed on only one workload. Thus, I am not sure whether the combination the paper suggests would also outperform other strategies on a different workload, so the general usefulness of overflow prevention and in-register aggregation remains unclear.


Review 11

In the paper "Exploring Query Execution Strategies for JIT, Vectorization and SIMD", Tim Gubner and Peter Boncz explore the design space for efficient query processors on hardware that is rich in SIMD capabilities. More specifically, they examine the space between interpreted block-at-a-time execution and data-centric JIT compilation in conjunction with SIMD. On one end of the spectrum, block-at-a-time execution comes with two major advantages: reduced interpretation overhead and a direct mapping onto SIMD instructions. On the other end, JIT compilation of database query pipelines into LLVM code reduces interpretation overhead, works well for both analytical and transactional queries, and is well suited to mixed workloads. However, JIT compilation produces tight tuple-at-a-time loops, which do not exploit SIMD. Prior research has studied the potential of SIMD, but focused on SIMD-izing individual operations rather than evaluating database architectures for future hardware. Thus, rather than proposing a new architecture, this paper sheds light on the trade-offs between these two extremes by combining vectorization with JIT. With so many "design flavors", Gubner and Boncz show that this is a problem worth investigating.

Using a TPC-H query, Q1, as a guideline for discussion, Gubner and Boncz discuss aspects that are vital to implementation flavors:
1) Compact vs. Full Data Types: Observing Q1, they decide to use a 64-bit integer to represent l_tax. However, there are more subtleties: l_returnflag and l_linestatus are represented as single-byte chars, l_extendedprice has a domain that fits 32-bit integers, and additions, subtractions, and multiplications need results of varying byte widths. The difficulty arises when a system must handle updates that go beyond the current domain or must enforce tight bounds. Even though the authors do not give an exact answer, they urge research in this area.
2) Perfect Identity Hash Implementations: Current systems use hash-based group IDs for standard aggregations. We can exploit the fact that the group-by columns have a limited domain: a specific optimization uses their single-byte char values directly as integers. Thus, a single column value can be used as an array index (group ID), leaving the slots of non-occurring characters in the aggregation table unused. Even though this costs extra memory, it is not much in the grand scheme of things.
3) Guarding against Overflow in Aggregates: Current systems written in C or C++ use conditionals that check for overflow at each operation, which can lead to large overheads. LLVM-based checking is faster but still becomes the performance bottleneck. Prior knowledge of the domain can rule out overflows: we can impose a maximum tuple count that never occurs in practice and have the system return a runtime error in case it is ever exceeded.
4) In-register aggregation: This is presented as an optimization that eliminates conflicts, reduces the number of loads/stores, and uses a virtual reordering of tuples to speed up aggregation.

Even though the paper gives great insight into, and opportunities for, theoretical advances in SIMD parallelism, it still has some drawbacks. One major drawback is the way the paper is presented. When designing a problem statement, there needs to be a good balance between generality and interestingness. If there are too many constraints, the paper suffers from a lack of application; if there are too few constraints, it is infeasible to even discuss the problem. I felt that the paper suffered from the latter. It gives little guidance to the audience about current approaches and leaves them in the dark about how they could actually apply this in a real setting. Another drawback is that the authors did not discuss hardware branch prediction and whether it keeps the overhead of overflow checks low.


Review 12

This paper describes the improvements that can be made with several different query execution strategies. First, it compares vectorized query execution with standard tuple-at-a-time execution. While vectorized execution allows data to be loaded more naturally, effective pipelining of tuple-wise execution can make it faster. Second, it describes compacting data types: choosing data types based on the actual range of the values shrinks them and so improves data-parallel (SIMD) performance. Third, it discusses aggregation strategies. Using larger data types can prevent overflow, at the cost of worse parallel performance. In-register aggregation can also be used to speed up aggregation, particularly when a vector contains few distinct groups.

This paper does promote methods of query execution that support SIMD parallelism. This is most notably done by using compact data types, such as smaller integers, that better lend themselves to improved parallel execution. Due to the increase in use of highly parallel computers, this should allow greater efficiency on these machines.

This paper was extraordinarily difficult to follow. I couldn’t really tell what its main purpose was; it just seemed to go over how various execution structures performed on a single TPC-H query. The introduction and conclusion did not seem to have a clear purpose statement either. The paper seemed to switch from comparing vectorized versus tuple-at-a-time execution, to SIMD improvements, to aggregation strategies without explaining any of them very well or showing how they fit together.



Review 13

The paper explored the design space for efficient query processors between two main approaches: interpreted block-at-a-time execution (“vectorization”) and “data-centric” JIT compilation. The design space is based on two aspects: granularity of execution and unit of compilation. The paper argued for redesigning database systems to allow using more compact data types during query evaluation than those naturally provided by the schema. This maximizes SIMD data parallelism and leads to more efficient processing. Further, the authors presented in-register aggregation, an efficient aggregation technique that further exploits the benefits of compact data types, as partial aggregates may use smaller types themselves and can be vectorized easily.

The main contributions of this paper can be divided into three parts. First, they show that there is a large design space for combined vectorized and JIT compilation, in the form of different implementation flavors for Q1. Second, they show that query executors on SIMD significantly benefit from using compact data types, and they discuss design options for database storage that maximize opportunities for using thin data types; specifically, this involves both compact storage and associated statistics, but also the policy used for numeric overflow. Third, they contribute a new adaptive “in-register” aggregation strategy that adaptively triggers in aggregations with high group locality.

As for drawbacks, I’m confused about why the authors adopt a pessimistic aggregation strategy optimized for scenarios with compact data types, since the paper's stated purpose is to explore the design space between vectorization and JIT compilation. The work should not rest on the assumption of compact and contentious scenarios; it should be more general.




Review 14

This paper explores the design space for efficient query processors. The main approaches are block-at-a-time execution and “data-centric” just-in-time (JIT) compilation. The advantages of block-at-a-time execution are reduced interpretation overhead and a direct mapping onto SIMD instructions. Since the tuple-at-a-time approach can spend more than 90% of CPU time on interpretation overhead, block-at-a-time execution delivers much better performance. JIT compilation, on the other hand, compiles database query pipelines into low-level instructions. This approach leads to tight tuple-at-a-time loops, which is exactly the scenario the first approach tries to avoid. The paper therefore explores execution options that combine vectorization with JIT. It first uses Q1 to illustrate that there is a large design space for combined vectorized and JIT compilation. For “compressed execution”, VectorWise uses “patching”, while a JIT compiler can generate specific code for each compressed representation. The perfect identity hash optimization in Q1 also provides fertile ground for SIMD execution. Overflow prevention for aggregate calculations comes at a significant cost; however, the “in-register” aggregation of JIT-compiled code reduces memory accesses, so overflow prevention can be applied much more aggressively. The paper then shows that query executors on SIMD benefit significantly from compact data types. The “in-register” aggregation is adaptive: it can fail if a vector contains too many distinct group values, in which case the normal vectorized aggregation primitives serve as a fallback. The evaluation examines in which situations in-register aggregation excels compared with standard array-based aggregation; with few distinct group values, in-register aggregation outperforms array-based aggregation. The paper then compares the response times of multiple versions of Q1, and the versions without overflow detection outperform those with it. In conclusion, more compact data types maximize SIMD data parallelism and lead to more efficient processing, so the design space is worth exploring further.
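The perfect identity hash and the array-based aggregation that the in-register strategy is compared against can be pictured with a minimal C sketch. It assumes that Q1’s two single-byte group keys (l_returnflag and l_linestatus) are concatenated into a 16-bit index, so every key pair owns its own slot and no collision handling is needed. The struct, the function names, and the 65,536-slot table are hypothetical illustrations, not the paper’s code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical aggregate slot for a Q1-like query: a sum and a count. */
typedef struct {
    int64_t sum_qty;
    int64_t count;
} agg_slot;

/* Perfect identity hash (assumed form): the two single-byte keys are
   concatenated into a 16-bit index, so distinct key pairs never collide.
   The table must therefore have 1 << 16 slots. */
static inline uint16_t identity_hash(uint8_t returnflag, uint8_t linestatus) {
    return (uint16_t)(((uint16_t)returnflag << 8) | linestatus);
}

/* Array-based aggregation over one vector of n tuples: every tuple loads,
   updates, and stores its group's slot in memory. */
void aggregate_array(agg_slot *table, const uint8_t *returnflag,
                     const uint8_t *linestatus, const int32_t *quantity,
                     size_t n) {
    for (size_t i = 0; i < n; i++) {
        agg_slot *s = &table[identity_hash(returnflag[i], linestatus[i])];
        s->sum_qty += quantity[i];
        s->count += 1;
    }
}
```

Every tuple here performs a load and a store on its group’s slot; that per-tuple memory traffic is what in-register aggregation tries to avoid when a vector contains only a few distinct groups.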

The advantage of this paper is that it uses the same example, Q1, to illustrate all of its points. Although the technical parts are quite hard to understand, using one example does help clarify things. The disadvantage of this paper is that it gives no definite answers on almost any of the topics it covers. Perhaps introducing some current approaches would help.


Review 15

This paper is a novel kind of paper for me; I haven't read one like it before. In my view, the whole paper is built upon a single TPC-H benchmark query. The authors look in great detail into the execution of that query under many different flavors. The reason for this research is that they observed a large design space to explore between interpreted block-at-a-time execution and data-centric JIT compilation. What's more, they show that the SIMD capabilities of modern processors should be taken into account when evaluating database architectures, which makes for a different and much broader evaluation than was typical in the past. The execution of TPC-H Q1 is discussed in detail, covering the choice of compact data types, the identity hash optimization, and guarding against numerical overflow in aggregates. The paper also presents its own flavor of the Q1 implementation, which uses in-register aggregation and takes advantage of AVX-512. In my view, the evaluation includes enough experiments, and state-of-the-art implementations of the different flavors are used in them. The paper shows that there is truly a large design space still to be explored.

To me, the strong part of the paper is that it looks into the benchmark query in great detail; every part of the query execution is taken into account. The writing is also satisfactory. Many figures and tables are used to illustrate the ideas and show the experimental results, which is really helpful for readers since the paper contains a large number of experiments.

The paper's weakness is the flip side of its strength: it only looks at one benchmark query, so I wonder whether its observations are limited or biased by this specific query.


Review 16

In this paper, the authors explore the design space for efficient query processors that exploit SIMD capabilities, spanning vectorization and data-centric JIT compilation. They also propose a novel adaptive aggregation strategy. Work on query processing architectures is clearly meaningful: as the paper mentions, over the past decades these architectures have evolved quickly from the traditional tuple-at-a-time iterator to vectorization and JIT compilation. At the same time, the SIMD instructions in common x86 processors have evolved from MMX to AVX-512. Given these new techniques, it is worthwhile to reconsider the old architecture and propose a query processor that builds on them. It is also valuable to survey existing work and identify the remaining design space. Next, I summarize the crux of this paper as I understand it.

As the paper states, the advantages of vectorization are that it reduces interpretation overhead and enables a direct mapping onto SIMD instructions. JIT compilation also avoids interpretation overhead, by compiling database query pipelines into low-level instructions. HyPer is an example of JIT compilation: its data-centric compilation leads to tight tuple-at-a-time loops, which, however, do not exploit SIMD. An advantage of JIT is that it benefits not only analytical but also transactional queries. JIT compilation also makes it possible to compile and optimize user code and database system operations together in a single framework. Mixing vectorization and JIT leverages the advantages of both; a representative example is the Data Blocks paper. This paper first discusses TPC-H Q1 in detail, covering three main aspects: operating on compact data types, the perfect identity hash optimization, and guarding against numerical overflow in aggregates. The Q1 implementation flavors vary in the data types used, the aggregation storage layout, overflow checking/prevention, and the aggregation algorithm. The authors’ own Q1 is a hand-written AVX-512 implementation that uses overflow prevention and pushes 16 tuples at a time through its pipeline. The experimental results indicate that using compact data types has become important for efficiency, and the flavors that use them consistently improve performance. Also, on modern architectures there is still considerable headroom beyond HyPer, the fastest system to date.
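The difference between overflow checking and overflow prevention can be sketched in C as follows. This is an illustration under stated assumptions, not the paper’s implementation: it assumes the column’s value bounds are known in advance (for example from min/max statistics, a hypothetical input here) and uses the GCC/Clang builtin `__builtin_add_overflow` for the checked variant. All function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Overflow prevention: decide once, from assumed column bounds and the
   maximum number of tuples summed, whether a 32-bit accumulator can never
   overflow. Integer division keeps the check itself overflow-free.
   Assumes max_tuples > 0. */
bool sum_fits_int32(int64_t col_min, int64_t col_max, int64_t max_tuples) {
    return col_max <= INT32_MAX / max_tuples &&
           col_min >= INT32_MIN / max_tuples;
}

/* Overflow checking: every addition is tested (GCC/Clang builtin).
   Returns false as soon as an addition would overflow. */
bool sum_checked(const int16_t *v, size_t n, int32_t *out) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        if (__builtin_add_overflow(acc, v[i], &acc))
            return false;
    *out = acc;
    return true;
}

/* Prevented path: no per-addition test. Only valid when sum_fits_int32
   returned true for the bounds of v and for n <= max_tuples. */
int32_t sum_unchecked(const int16_t *v, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}
```

The design choice is to move the test out of the hot loop: one bounds check per vector or per query replaces one branch per addition, which also keeps the loop simple enough for the compiler to vectorize.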

There are several technical contributions in this paper. First, the authors show that there is still a large design space for combined vectorized and JIT compilation, so researchers can explore this field further. Second, they show that query executors on SIMD benefit significantly from compact data types, and they discuss design options for database storage that maximize opportunities for using thin data types. Third, they contribute a new adaptive “in-register” aggregation strategy that is triggered in aggregations with high group locality. To sum up, this paper not only performs novel experiments on existing techniques but also introduces something new, with the aim of inspiring future database query engine designers to build innovative vectorized query compilers.
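A quick back-of-the-envelope calculation shows why thin data types matter for SIMD: with 512-bit AVX-512 registers, the number of values one instruction processes is the register width divided by the element width $w$ in bits.

$$\text{lanes}(w) = \frac{512}{w}, \qquad \text{lanes}(64) = 8,\quad \text{lanes}(32) = 16,\quad \text{lanes}(16) = 32,\quad \text{lanes}(8) = 64.$$

Halving the width of a column’s type therefore doubles the data parallelism of each instruction, provided the values and intermediate results still fit in the narrower type.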

The drawbacks of this paper are minor. The authors consider only a restricted set of operations and focus on the well-known TPC-H Q1, so joins are not considered at all (Q1 contains none). If they included more operations, their experiments would be more convincing.



Review 17

This paper’s main goal is to argue that if DBMSs can choose more compact data types than those specified in the schema, several things can be done to improve performance. For example, SIMD data parallelism can be exploited further if data types are as compact as possible. The paper also provides ideas for optimizing aggregation, such as keeping aggregates in SIMD registers rather than in a memory-resident aggregation table, since scattering SIMD lanes into the table causes conflicts when several lanes update the same group. These ideas are based on analyzing Q1 from the TPC-H benchmark and comparing HyPer with hand-written AVX-512 implementations.
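The idea of keeping aggregates in registers rather than updating a memory-resident table for every tuple can be illustrated with a simplified scalar C sketch. It is only an analogue of the technique, not the authors’ AVX-512 algorithm: small local arrays stand in for registers, `MAX_LOCAL_GROUPS` is a hypothetical threshold, and all names are hypothetical. When a vector contains more distinct groups than there are local accumulators, the sketch gives up and falls back to plain array-based aggregation, mirroring the adaptive behavior the reviews describe.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_LOCAL_GROUPS 4 /* hypothetical number of local accumulators */

typedef struct {
    int64_t sum;
    int64_t count;
} group_agg;

/* Aggregates one vector of n tuples into local accumulators and flushes
   each group to the table once. Returns 1 on success, or 0 (leaving the
   table untouched) if the vector has too many distinct groups.
   The table must have a slot for every possible group id. */
int aggregate_in_register(group_agg *table, const uint16_t *gid,
                          const int32_t *val, size_t n) {
    uint16_t keys[MAX_LOCAL_GROUPS];
    int64_t sums[MAX_LOCAL_GROUPS] = {0};
    int64_t counts[MAX_LOCAL_GROUPS] = {0};
    size_t k = 0; /* distinct groups seen so far */

    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        while (j < k && keys[j] != gid[i])
            j++;
        if (j == k) {
            if (k == MAX_LOCAL_GROUPS)
                return 0; /* too many distinct groups: caller falls back */
            keys[k++] = gid[i];
        }
        sums[j] += val[i];
        counts[j] += 1;
    }
    /* Flush: one memory update per distinct group instead of per tuple. */
    for (size_t j = 0; j < k; j++) {
        table[keys[j]].sum += sums[j];
        table[keys[j]].count += counts[j];
    }
    return 1;
}

/* Adaptive driver: try the local-accumulator path, else update the
   table directly for every tuple (array-based aggregation). */
void aggregate_vector(group_agg *table, const uint16_t *gid,
                      const int32_t *val, size_t n) {
    if (!aggregate_in_register(table, gid, val, n)) {
        for (size_t i = 0; i < n; i++) {
            table[gid[i]].sum += val[i];
            table[gid[i]].count += 1;
        }
    }
}
```

Because the local path writes nothing until the whole vector has been consumed, a failed attempt can simply be redone by the fallback without double counting.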
The paper’s strength is that it does a good job relating most of its points to HyPer and AVX-512, which provides good contrasts to the reader. It also seems to achieve its goal of providing motivation for future research in this area.
There are several weaknesses I can see in this paper. The first is that all of its points are motivated and evaluated using only a single query from the TPC-H benchmark, which I find strange. Another weakness is that the authors argue that data should be compacted more than schemas suggest so that their optimizations apply, but do not explain how to make this compaction possible in systems that must be updatable, which to me seems like the more important issue. In general, the conclusions of the paper seem quite weak compared to the other papers we have read. As a final note, the paper claims that HyPer does not leverage SIMD capabilities, but the Data Blocks paper we read claims that HyPer does provide SIMD-related optimizations, and this paper cites the Data Blocks paper, so I find that discrepancy strange as well.