Review for Paper: 9-Automated Selection of Materialized Views and Indexes for SQL Databases

Review 1

The execution speed of database queries depends greatly on what indexes and materialized views are present. This is because index scans and lookups in materialized views are much faster than the sequential scans, aggregations, or joins that would be required to perform certain queries without the help of the index or materialized view. Because there are exponentially many possible indexes and materialized views for a large set of tables, the choice of which ones to generate must be made carefully, based on the likely workload of future queries.

Agrawal, Chaudhuri, and Narasayya present a system for jointly optimizing the selection of indexes and materialized views, which they have incorporated into Microsoft SQL Server 2000. This system efficiently narrows the search space of potential indexes and materialized views to a set of candidates. The candidate set includes materialized views that are semantically related to frequent, expensive queries, and also merged views that are useful for multiple queries. A greedy selection method is used to choose a subset of the candidates, based on cost estimates for the expected workload.

The authors present an implementation of their index and materialized view selection system, which works in a production database system with the full range of SQL queries. Experiments show the candidate selection component eliminates roughly half of syntactically relevant materialized views, with only a slight increase in expected query cost. Most important, the joint selection of indexes and materialized views results in better query runtimes on benchmark data than selecting indexes first, then materialized views, or vice versa, when storage space is bounded. This shows that significant improvements can be made by jointly selecting indexes and materialized views.

The plan presented for selecting indexes and materialized views outperforms a sequential planner using the same methods to choose the components independently. But the method is compared to a restricted version of itself, so this is not surprising. The algorithms presented for candidate selection and configuration enumeration and simple heuristics, such as taking the total cost of queries using a set of tables as an indication that the set of tables should be included in a materialized view. It is likely that these algorithms could be improved on, perhaps through a machine learning approach, leading to a better index and materialized view selection module.


Review 2

The paper sheds light on the concept of automated selection of materialized views and indexes for SQL Databases which are usually reviewed in isolation of each other. There is a need for automatic tuning tool since if this selection is done manually, it is prone to error, time consuming and might not be able to handle continuously changing workload. A Materialized view is a view that stores physically the data from the query defined by a view which is a virtual table derived from the real tables of a database.

The authors explain the architecture employed for index and materialized view selection. It is based on a Syntactic structure selection and does configuration enumeration, simulation and cost estimation. It also does candidate selection which is identifying a set of indexes and materialized views for a given workload. Related work to this has also been discussed in brief.

The paper discusses how a candidate materialized view is selected via finding interesting table-susbsets, with help from the query optimizer to gain a syntactically relevant materialized views and merging of views. Sequence of pair-wise merges is done with the parent view being merged to generate a single merged view. Algorithms for merging are included in the paper. A greedy algorithm is used for selection of indexes. However, the paper doesn’t discuss issues related to selection of indexes on materialized views.

Together the interaction between indexes and materialized views improve the performance of a database. They have a greater impact in the presence of storage constraints and updates. Some experiments have been performed to evaluate the quality and running time of the authors’ algorithm for candidate materialized view selection. Another expirement on architecture issues has also been done.

The paper is able to provide fundamental knowledge of the physical database design which is robust. However, this concept can’t be used for SQL workloads since it will make the system overly complex. The indexes and materialized views discussed are only a part of physical design space and required in-depth knowledge of surrounding components as well. But the assumptions made in the paper are all supported by relevant experiments later on.



Review 3

What is the problem addressed?
This paper proposes a way to automatically selecting an appropriate set of materialized views and indexes for SQL databases.
Why important?
Besides indexes, the presence of the right materialized views can improve performance which make a judicious selection of materialized views is crucial problem to solve. In general, both indexes and materialized views are physical structures that can accelerate performance, and design tool should consider these two together to optimize the physical design for the workload on the system, but recent papers fail to do that and consider the problem of index selection and materialized view selection in isolation. Moreover previous paper focus on search problem of picking good materialized view form a given set which is not salable.
1-­‐2 main technical contributions? Describe.
The paper present an architecture and algorithm for address above problems. Firstly the paper show it's critical to search over the combined space of indexes and materialized views at the same time. Secondly, it provide a principled way to identify a smaller set of candidate materialized view. The task of candidate materialized view selection using three steps:
(1) From the large space of all possible table-subsets for the workload, we arrive at a smaller set of interesting table-subsets. A contribution they made is define a metric that capture the relative importance of a table-subset. (2) we can further reduce the space by exploiting the query optimizer and the algorithm is based on the intuition that if a materialized view is not part of the best solution for even a single query in the workload, then it is unlikely to be part of the best solution for the entire workload.(3) generate an additional set of “merged” materialized views in a controlled manner such that the merged materialized views can service multiple queries in the workload. Because step two only consider materialized views that are optimal for individual query, step 3 extend the solution space and build additional "merged" views on them and this approach is preferable for scaling.
1-­‐2 weaknesses or open questions? Describe and discuss
It will be good to develop a theoretical framework and abstraction for physical database design to compare alternative algorithm, and indexes and materialized view are merely part of the physical design space there are lots of physical design tool to explore.



Review 4

This paper presents a research result for automated selection of materialized views and indexes for SQL databases, including an architecture and novel algorithms, and a principled way to identify a small set of candidate materialized views. For the scalable approach for candidate materialized view selection, there are two important techniques: identify interesting set of tables (we can do further materialized views only over this set), and view merging technique (identify candidate materialized view that might be optimal for multiple queries but not for single query). Also, this paper provides discussion about trading choices of indexes and materialized views, and detailed performance experiments.

The general problem here is that there is no industry-strength tool for automated selection of materialized views and indexes for SQL workloads. The reason why we need to consider materialized views is that it sometimes can improve performance, particularly for decision support applications. The paper also lists some concrete examples to show the importance of materialized views. Since most of previous work in this area consider the index selection and materialized views separately, this paper provides an innovative way to combine both of them.

One of the major contributions in this paper is that it provides a way to eliminate lots of “disinteresting” set of candidate materialized views without throwing the “interesting” ones. This idea is very important in the method presented in this paper, because we don't need to search over all syntactically relevant indexes and materialized views (infeasible in practice) and we can focus on a small interesting subset. And this paper presents an efficient selection of candidate materialized views, which contains 3 main steps:


1. obtain a smaller set of interesting table-subsets from the large space of all possible ones.
2. select a best configuration for each query (cost-based analysis) in a set of materialized views
3. generate additional set of “merged” materialized views

One interesting observation: I think this paper is very good for presenting the architecture and novel algorithms for automated selection of materialized views and indexes for SQL databases, and it also provides detailed performance experiments. It seems to work well, and it has been embedded into Microsoft SQL Server 2000. However, it would be more convincing if this paper provides more mathematical or theoretic proof of some concepts, such as why the algorithm works.


Review 5

In this paper proposes a novel algorithm for the problem of selecting a good set of indexes and materialized views for a database. Before this paper, researchers usually considered indexes and materialized views separately. But an index can be seen as a case of materialized view. Also both indexes and materialized views are important parts of database physical designers so considering them together seems a wise decision.

The algorithm suggested and developed, based on a given workload, first identifies syntactically relevant indexes, materialized views and indexes on materialized views. Then, by using a cost estimation technique, it removes the candidates that are not worth further exploration. Finally, it searches among combinations of the remaining items and finds the best one for physical design.

In contrast with the novel architecture, there are some weaknesses in the paper, which are as follows:
-The paper does not cover the issue of selecting the indexes and materialized views and just refers to another paper.
-The paper could be better organized with a better structure. As a reader, I felt they are jumping to another issue while some issue is not closed yet.
-It is not clear that what happens if the workload changes after sometime. Is the selection of proper indexes and materialized views repeated or what?


Review 6

This article presents a system to automatically select which materialized views and indexes to use to speed up workloads. They present the idea that you should not consider index and materialized views separately, otherwise you are missing out on the possibility to dramatically improve performance. However, considering these two ideas together results in a much larger plan space, so something must be done to reduce this space. They propose a cost metric that, given a workload of queries, will allow finding interesting table-subsets. They also use the assumption that if a single materialized view is not at part of the best solution to some query, than it is very unlikely that it will be in the best solution for any query on the workload. They then consider merging views together, to further increase performance. They then discuss how they consider indexes and materialized views at the same time, and perform experiments to verify that their method performs better than building the indexes and materialized view in isolation.

The authors do a good job of presenting the reasoning behind their idea, and then verifying that their idea did what they wanted it to do. They showed that they were able to dramatically cut down on the number of materialized views to consider, while at the same time were able to keep the performance very near to the performance of an exhaustive algorithm. Their join selection of indexes and views at the same time yields better performance for the workload as a whole.

One weakness of this paper is the amount of knowledge that it already assumes the reader has. I had to brush up on the topic of materialized views, but that's hardly the authors problem. I did find that experimental setup could have been expanded upon. They said themselves that they limited their experiments to relatively small workloads - I would really liked to have seem how the performance of this algorithm scaled. The paper does also read a bit like a sales pitch - they really seem to be trying to convince me to buy some Microsoft software.


Review 7

Though the materialized view selection has been proposed before, no related work automated selection of materialized view and indexes for SQL has been done. So the paper shows the solution of selecting materialized view and index and provides evaluation of the solution. The solution is proposed by showing the architecture and algorithms.

The paper is good at providing pseudo code and detailed evaluation, which make the solution more convincing.


Review 8

This paper goes over the automated material views and index selection algorithm that is used in Microsoft’s SQL server 2000. Material view is structure richer than an index that provider better context to the data addressing. Material view selection is conducted when analyzing and optimizing the existing data structure before processing requests on them.
The algorithm described by the paper has the following steps:

1) Given a workload, get all the possible indexes and materialized views.

2) Identify the syntactically relevant index and materialized views, based on the queries that are likely to be requested.

3) Given the candidate indexes and materialized views, perform a greedy configuration enumeration to determine the ideal configuration to choose the appropriate indexes and materialized views by numerating through the possible choices and evaluating the cost of each.

In the evaluation, the proposed algorithm showed strong performance. The algorithm is able to significantly reduce the search space, with minimum reduction in quality of the materialized views found. Their JOINTSEL enumeration algorithm is able to show consistent improvement over existing solutions. Some of the things lacking in the paper are any potential drawbacks or limitations of the algorithm and design choices that were made, which is not addressed at all in the paper.



Review 9

This paper introduces a new technique that automatically select both materialized views and indexes for query optimization. The paper shows the underlying mechanism and the related work that supported this approach.Then it introduce experiments result.

This approach takes the indexes and materialized views into consideration and significantly expand the search space. Hence it implements several strategies to limit the set of all the candidate(index or materialized view). Several metrics are designed to evaluate the cost or cost-saving contribution of the materialized view and are used to filter out the candidate views. This paper also present a view merging technique to benefit the multiple queries.

Consider the experiments, there is a drawback: the experiments in this paper shows a relatively good performance but the workloads tested are relatively small, It would be more convincing to try heavy workload queries.



Review 10

Problem:
The main problem addressed by this paper is the index and materialized view selection problem. In a database, these are used to optimize queries drastically. However, because there are a very large number of possible indexes and views, choosing which ones to build based on a set of queries (the workload) is a hard problem. Algorithms before this paper also had several shortcomings, which the authors point out. Previous algorithms performed exhaustive searches of all indexes and views that were syntactically relevant to any query in the workload, in order to find an optimal solution. Because the search space grows very quickly with respect to the number of tables, this is not scalable. Previous algorithms also only considered indexes and materialized views separately, not comparing them on equal footing.

The authors of this paper do not perform an exhaustive search, but first limit the search space of possible indexes and views into a set of candidates. Candidates for views are found first by finding “interesting subsets” of tables- i.e. subsets on which views could be constructed that would cut the cost of the workload. Then, the query-optimizer is used to determine what view would be the most useful for each query in the workload based on the table subsets. Then views are merged to attempt to save space and to be able to serve multiple queries, and these become the candidate views. Candidate index selection is not discussed. A subset of the candidate views and indexes are then selected using a greedy algorithm.

Strengths:
The strength of this paper comes in identifying that exhaustive searches are not feasible, and coming up with a solution to limit the search space. It also questions the previous assumption that views and indexes should be evaluated separately, and instead develops a way to choose between them that adapts to each situation. The results of this paper are also very strong, showing very large performance speedups and very small quality drops.

Weaknesses/Remaining questions:
Some components (like index candidate selection) are omitted for space reasons.
Performance tests were run on small databases. Although the results are good, they would be stronger if they were run on large workloads.
This algorithm doesn’t seem to adapt to changes in workload.



Review 11

The paper illustrates algorithms to select beneficial materialized views and indexes automatically. Both indexes and materialized views can significantly improve the performance of database and it is necessary to create an effective physical database design for the workload of the system in order to maximize the performance. While it is tractable to generate an effective physical design manually for a handful of simple SQL queries, it becomes exponentially difficult as the number and the complexity of queries increases even for the most experienced database administrators. Therefore, automatic creation of physical design has been a very interesting topic, especially for many of commercial DBMS vendors to automatically tune their system and the paper is an example of such work.

Before the paper, there has been previous work to automatically select indexes by same authors. A materialized view is much richer in structure than an index since a materialized view can be defined over multiple tables and can involve join and aggregate operations. This characteristic makes automatic selection of materialized views more complex than the automatic selection of indexes. Also, query performance of the given workload is affected by both materialized views and indexes. This makes it necessary to consider both physical structures.

To briefly sum the algorithm in the paper, their algorithm uses several heuristics to obtain only interesting “table-subsets” from a given query. From these interesting “table-subsets”, the algorithm generates syntactically relevant materialized views, which are then pruned with cost estimation from the query optimizer. The algorithm then merges these views to identify materialized views that may not be optimal for a single query, but can be beneficial for multiple queries in the workload.

The paper provides a very valuable read to its readers as it demonstrates algorithms used in an industry-strength physical database designer, which is a part of a real-world commercial DBMS. The paper demonstrates a good example of database research having a meaningful contribution to the real-world, commercial DBMS product.



Review 12

This paper introduces a new approach to increase database system performance through the joint use of index and materialized views. Previous work in the field considered index selection and materialized view selection to be two individual problems. However, when the two are considered jointly, the quality of the recommendations improves and thus results in an overall improvement in the database.

The approach is compromised of four components: syntactic structure selection, candidate selection, configuration enumeration, and configuration
simulation and cost estimation. Some pruning is done when identifying possible candidates in the first three steps due to the infeasibility of searching the entire candidate space. The final component is responsible for estimating the impact of possible configurations when applied to the workload.

One of the main contribution of the paper is in efficient candidate selection. The candidate selection module produces set of candidate indexes and materialized views, which the configuration modules chooses from. However, unlike previous work in field, the paper performs a joint enumeration of the two candidate sets, but it also looks at two other approaches to the problem and the problems with each. Finally, the paper uses the TPC-H benchmark and two custom workloads to illustrate the performance improvements of the approach in the paper compared to exhausted methods.

Although I appreciate the fact that the authors presented a joint solution to improving database performance, I feel like the experimental results were lacking. In their results, it seems that they compared their approach to exhaustive methods, and had to limit the size of the workloads as a result. I would have liked to see how their approach performed with a larger workload and against other work in the field in relation to index and materialized view selection. The purpose of the paper was intended to demonstrate how previous work sacrificed database performance improvements due to identifying index selection and materialized view selection as two separate problems, but there seems to be no results showing how much was sacrificed.


Review 13

Indexes and materialized views are both physical structures that can significantly improve workload performance. The authors mention that most prior work considers index selection problem and materialized view selection problem separately, which does not take into account the interaction between indexes and materialized views. In this paper, they propose algorithms for selecting interesting materialized views, merging views to generate views that are useful for multiple queries, and enumerating a joint space of indexes and materialized views to find their final recommendation.

1. Candidate materialized view selection
The purpose is to identify a much smaller set of candidate materialized views but also preserve most of the results of searching the entire space. The basic idea is that they first select a configuration that is best for each query in the workload, and then generate an additional set of merged materialized views which can service multiple queries using their merging algorithm.

2. Joint enumeration over indexes and materialized views
Traditional approaches have following problems which their joint enumeration approach does not have:
a) Ordering problem
It must either choose the indexes first and then choose the materialized views later (INDFIRST) or the reverse of it (MVFIRST). Both approaches have their drawbacks and might not be able to find the optimal recommendation.
b) Storage fraction problem
Given the global storage bound, how to pick the fractions of the total storage bound to be allocated to indexes or materialized views? According to this paper, the fraction highly depends on the workload.

The main contribution of this paper is that they propose an idea that it is better to perform enumeration over the joint space of indexes and materialized views instead of enumerating individually. The pseudo codes are clear and the experiment results in the paper prove that their algorithms do improve the performance in terms of the cost of workloads.

There is one thing in their experiment results that are not discussed closely in the paper. In section 6.2.1, they use three different workloads to test the performance of their algorithm. We can see that the performance improvement is most significant in the WKLD-8TBL workload (largest reduction of materialized views and smallest drop in quality) and least significant in WKLD-4TBL. It would be better if they can discuss about under which workload characteristics is their approach more useful.



Review 14

This paper discusses the architecture and algorithms for automated selection of materialized views and indexes for SQL databases. The motivation for this study is that performance can be significantly improved because today’s commercial SQL database systems support the use of materialized views; however, an accurate and efficient selection process for materialized views is needed. The paper describes architecture and algorithms that can select from a set of all potentially useful materialized views for a workload in a scalable matter and take into account the complexities of SQL as a query language.

The key components of the architecture are:
Syntactic structure selection – identify indexes that are syntactically relevant
Candidate selection – identifies a set of traditional indexes, materialized views, and indexes on materialized views for the given workload that are candidates for an optimal solution
Configuration enumeration – over the joint space of indexes and materialized views, this algorithm returns a configuration consisting of a total of k indexes and materialized views, numerates all configurations of size up to n <= k, and chooses the left over k-n configurations with the greedy algorithm.
Configuration simulation and cost elimination – compares the configurations considered by the configuration enumeration model by examining the expected impact on workload queries costs from the proposed configurations

The steps towards candidate materialized view selection are:
1. Finding smaller set of interesting table sub-sets of the large space of all table sub-sets for the workload
2. Propose a set of materialized views for each query in the workload based on the table sub-sets
3. From the set of materialized views, generate an additional set of merged materialized views that can answer multiple queries

Overall, I enjoyed this paper’s explanation of materialized views and indexes, and felt that it provided a very clear and comprehensive description of the architecture and algorithms. I also enjoyed the real world examples given on the use of materialized views and indexes on real-world databases.


Review 15

This paper focuses on automatically selecting materialized views and indexes on database tables which have previously been studied but often separately, rather than at the same time. The authors describe a technique for selecting syntactic structure, selecting candidates for materialized views, as well as indexes, then finding a configuration that is ideal and simulating and estimating the cost of configurations. These configurations are defined as physical designs that could include candidate indexes, materialized views or indexes on materialized views.

The realization that indexes and materialized views are similar in that they are both redundant structures that aim to lower query execution time, compete for storage, and incur maintenance overhead for updates and the decision to examine and develop these database objects jointly is the key contribution of this paper. It is strong in it's explanation of MVFIRST and INDFIRST and motivation for joint enumeration. It effectively places it's contribution in the context of related work.

The drawbacks of the paper lie in the empirical evaluation. The authors test their method on a small data set and with small numbers of queries. Their largest workload has 125 queries. I would like to know if their improvement over the benchmark is statistically significant. I'm not convinced that the sample size for queries is large enough to find the effect they claim. They do show consistent improvement with JOINTSEL and it may be the case that by varying the storage constraint there is a point at which the improvement becomes statistically significant. This would have been an insightful analysis


Review 16

Part 1: Overview

This paper proposes a solution to automatically selection of materialized views and indexes for SQL Databases. At the moment SQL databases provides materialized views and indexes but lack automated tuning. Based on previous work of candidate indexes and materialized view selection, configuration enumeration, and cost estimation, this paper presents a new algorithm to choose the view to materialize. It first break down the possible views into table subsets, then merge some of the materialized views and finally choose the view to materialize in the configuration enumeration phase.

They provided algorithms for finding the interesting table subsets, reducing similar choices, as well as generating merged views. Actually they are doing a for loop to try to find the optimal merged views that can be candidate for materialization. Also they extend view candidate to index candidates and vice versa. Greedy enumerating algorithm is used for selecting indexes as well as materialized views. In the evaluation section, they employ TPCH-22, WKLD-4TBL, and WKLD-8TBL real life workloads to the experiment.

Part 2: Contributions

This paper’s results are implemented as a tuning wizard binded with Microsoft SQL Server 2000. This pushes the academic results into real industry and this is highly valuable.

This paper combines the automated selection of materialized views and indexes into a similar problem, divides this problems into several procedures and provides solutions to each of the procedures.

Part 3: Possible ways to improve

Maybe they should have shot for some theoretical proof of some worst case bounds for their algorithm. There would always be a question mark that “is this algorithm good for all SQL workloads?” Noted that they only tried three SQL workloads in their simulation.

Optimal performance measure is omitted due to the complexity of computation. Maybe they can provide some approximation of the optimal results and do some comparison with their own results. Here the experiment settings lack some good benchmarks.

This paper provides a lot of algorithms for different procedures including finding interesting table subsets, pruning similar views, merging views, enumerating choices. However there should be some relations between these procedures, that is, the settings of these sequential procedures could affect the performance of each other. There should be some experiment to test and try to catch the optimal settings. And maybe there could also be some theoretical proof to show the best configurations.



Review 17

The paper addressed the problem of automatically selecting appropriate set of materialized views and indexes while considering their interaction and complexity of SQL workloads. This problem is important because having the right set of materialized views and indexes can significantly improve performance of database applications. In order to address this problem, the paper introduced two main techniques. The first technique is identifying “interesting set of tables”. These set of tables are identified based on estimation of potential reduction in cost of the workload by materializing one or more of the tables in the set. Specifically, a table-subset is selected based on the cost of all queries and its size, and the total size of tables in the queries which occur during a particular workload. This technique is important as the search space of relevant materialized views for a workload could explode otherwise. Although significantly reduced, many of the tables might not be useful for answering any queries. This is because the query optimizer using cost estimation might opt not to select the particular tables. Consequently, the authors suggest an algorithm which prune the “interesting set of tables” further by removing those tables which are not chosen by the query optimizer.

The second technique is called “view merging” which allows to merge two or more number of views, though not optimal for a single query might be useful for multiple queries and end up being optimal for the whole workloads. This is particularly true in a storage constraint environment which cannot accommodate all individual views. Specifically, the technique looks for a merged view which is storage efficient, has a cost not significantly higher than the parent views, and is able to answer all queries that can be answered by the parent views. In addition, the paper explores the interaction of materialized views and indexes and the resulting trading choices between them. In this regard, their technique, joint enumeration approach, is interesting as it handles complex interaction of candidate indexes and materialized views for a particular query. In this approach, both indexes and materialized views are treated in the same footing and the benefit gained by each of them is individually evaluated using a greedy algorithm.

The main strength of the paper is its identification of materialized views dynamically based on the workloads rather than based on static search algorithms in a search space. In addition, the paper presented the techniques in simple explanation and also present its claims with extensive experimental evaluation.

The paper has some limitation in explaining several decision for some of the algorithms. For example, while finding “interesting table-subsets”, the authors proposed to use the weightage of the size of the tables to discriminate between table-subsets which occur in the same number in the queries of the workloads. However, they fail to support the correlation of this relative weight value with an optimal materialized view. Furthermore, the authors relies on a greedy algorithm in selecting between indexes and materialized views for optimal configuration. However, greedy algorithms while simple are found to give sub-optimal solutions. It would be interesting to look for other algorithms which might lead to a better optimum configuration.


Review 18

In this paper, a cost-driven materialized views and indexes auto selecting method is presented. This industry-strength tool for automated selection is part of the tuning wizard in Microsoft SQL server 2000. This paper focuses on the class of single-block materialized views consisting of selection, join, grouping and aggregation. The key contribution of this paper is to idea of configure enumeration over the joint space of indexes and materialized views.

Two key observations are discussed in this paper. First, it is not feasible to try to find a candidate materialized view matching the query exactly. Second, some certain table-subsets can only lead to a small reduction in cost for the workload. Therefore, set size of interesting table-subsets can be further reduced.

The paper then presents the algorithm that can recommend both indexes and materialized views. As mentioned in the paper, future algorithms can be developed and evaluated with more infrastructure support. Indexes and materialized views are only part of the physical design space, which means that more key factors should be considered to further shrink the search space and improve set quality.


Review 19

This paper is described an innovative algorithm for selecting materialized views and indexes. It solved problems faced by previous algorithms: 1) isolated consideration of index and materialized view selection, 2) only select materialized view from a given set.
To solve the problem of nearly infinite set of possible materialized view, this algorithm take advantage of some key observation: 1) some of queries can be grouped and benefit from a single materialized view, 2) inexpensive queries doesn't really benefit from materialized views, 3) table size will affect the result.
The algorithm works in three steps to take advantage of above observations: 1) using some evaluation to derive a small interesting table-subset. This enables us to enumerate materialized views later. 2) For each query workload on table-subset, derive a best configuration. 3) Merge some materialized views. The merged views, together with views from 2) will be sent for evaluation. After evaluation, recommendation might be given.

Contribute:
As mentioned above, this paper proposed and implemented a novel way of generating materialized views and indexes. It combined the generation of indexes and materialized views, and it could generate any materialized views instead of just a set of given ones.
Also it provides some key observation that will benefit later researches on optimization.

One thing that I like the best in this paper is that the algorithm has a very clear structure, which makes it relatively easy to understand and to implement.

Weakness:
Although it says that it combines the selection of indexes and materialized views, this paper talks mainly above materialized view selection. That makes the relation between indexes selection and materialized view selection to remain unclear.
Also, there are several times in the paper that his previous paper is mentioned, so I think it might be better if the author can provide the main idea of that paper.



Review 20

This paper proposes a method that considers optimizing the physical design for the workload on the system by considering materialized views and indexes together.

The method proposed in brief is as follows
1. There is a candidate selection module that identifies a set of traditional indexes, materialized views and indexes on materialized views that it deems worthy of further exploration.
2. Amongst these designs is the ideal physical design, they refer to as a configuration. Since naïve searching among this candidate selection would result in a huge state space, they have used a specific Greedy algorithm that picks optimal configuration of size upto m (m <=k) and the remaining (k-m) configurations are picked greedily.
3. The next step is to choose a table subset (a subset of tables that are present in the query) where materializing one or more views on this subset can reduce the cost of the workload significantly; above a given threshold.
4. Even though the goal is to select syntactically relevant materialized views that are not used in answering any query, it is shown to result in suboptimal results over a period of workload or consequent queries.
5. So, the original candidate selection of materialized views is maintained and used as a starting point for generating additional ‘merged’ materialized views.
6. The authors also explore the differences that arise from creating the relevant indexes first followed by the materialized views and vice-versa.
7. Therefore, the method of joint enumeration of candidate indexes and materialized views is recommended so that it weighs each option on the same ground.

In the initial section of the paper, there is too much repetition about how this paper deals with using materialized views and indexes together. I also thought the graphs that quantify the quality of using view merging and reduction in materialized views are not very clear.

However, I definitely think a lot of the concepts in this paper were well explained and well described with examples, therefore the algorithm was easy to follow.



Review 21

This paper introduced a new problem of automatically selecting materialized view and indexes for SQL database. Inspired by the fact that both materialized view and indexes are physical structures that can largely boost query performance, the tactic of using the rich structure of materialized view together with index together is proposed.

The architecture of the jointly selection of index and materialized view firstly start by taking in the workload with the help of log tracing. It identifies the syntactically relevant index, materialized views and indexes on the materialized view from a large range of choices. In order to have the search space decreased to a manageable scale, it further utilizes traditional tools to select a interesting subset of candidate index and materialized views. The focus of this paper is a greedy algorithm to prune syntactically relevant materialized views. It uses the query optimizer to locate a subset of tables that are used for answer queries frequently, those less used ones will be not be added to the configuration enumeration. Also the indexes on those table-subset will also be added into the final configuration. The last but not least, is the view-merging step to generate a new merged materialized view that can better serve all queries in the workload instead of just getting optimal solution on one single query. Hence the final set of materialized views vary largely from the ones in the candidate set, but can achieve lower cost in answering than them.

To sum up, this paper gives a architecture of new DB tool to recommend the combination of indexes and materialized views. So to speak, some drawbacks in this paper are:

1. potentially the workload selected at the beginning as the test sample may be too small to represent the real world workloads.

2. Also it didn’t provide enough details in how to choose index on the merge views, and how would that affect the final performance.

3. In terms of test results, there is no comparison against other products mentioned in the paper like Redvrick/Informix and Oracle 8i.



Review 22

This paper introduces architecture and algorithms for automated selection of materialized views and indexes.
At first, the author points out that there are some problems with current research about this topic. One is previous work didn’t consider the interaction between indexes and materialized views. Another is using syntactic materialized view to search which is large amount of work. The method that presented in this paper emphasize on dealing with these problems.
The architecture of index and materialized view selection is: systematic structure selection, choose a set of candidate indexes and candidate materialized views then configuration to find the recommendation. The method in this paper divided into three steps:
1. Choosing a smaller set of interesting table-subsets from all possible table-subsets for the workload.
2. For each query, from the interesting table-subsets, using cost-based analysis to get a configuration that best fit that query.
3. From the result of 2, generate an additional set of merged materialized views that can serve multiple queries, and doesn’t have significantly higher cost. The paper talks about joint enumeration to deal with the interaction among indexes and materialized views.
In the end, the paper provide experiment result to support their claim that the candidate selection, configuration enumeration and joint enumeration have better performance than other methods.
This paper first explicitly point out the problems that current index and materialized view selecting strategies ignore and then the method is to improve selecting strategy on these problems. My concern is, as the new selecting strategy exclusively dealing with such problems, is there any other problems be introduced? I.e, the cost of selecting interesting table-subsets may exceed the benefit.
Another drawback is : Although it is reasonable to think that there are some interaction between indexes and materialized views, the author doesn’t explain that how this interaction exist and in which circumstances the old selecting method is not good because of the interaction.


Review 23

Selecting material views and index for databases to make queries more efficient is still an open problem in the database community. This paper outlines an approach to automatically select materialized views for SQL databases given a specific workload. By doing so, the query optimizer can then take into account the materialized views to create a better, more efficient plan, making all future queries faster.

To create a set of materialized views, the algorithm goes through the workload to find the TS-Cost of specific table subsets. Since the number of table-subsets is large, we can use this cost to limit all materialized views to subsets that are interesting. With these implementations, the author produced experimental results that significantly reduced the amount of materialized views needed.

However, I believe there are many weaknesses with the paper because the authors do not account for many complexities in a practical scenario.

1. The algorithm needs a workload to work off of, but in real life, workloads can vary and queries may change depending on the time of day or even time of year. How does the algorithm account for this or is it the programmer’s responsibility to give the algorithm a complete set of cases?

2. How often is this algorithm supposed to run and what is the performance impact when it is running while users are still querying for information? The trend of queries change over time and the schema itself may change frequently as well, so is there an optimal way to keep the materialized views updated?

3. Sometimes programmers would rather have queries take a longer amount of time because they are memory bound and other times they want queries to run even faster because they are CPU bound. Would it be useful to be able to fine tune the algorithm so that it can create fewer or more materialized views based on the situation?

4. In the experiment results, many of the workloads were not real world examples or benchmarks. We also do not know what the database schema consisted of so I still wonder how the algorithm will perform practically in a production DBMS.



Review 24

This paper discusses the development of an automated tool for the selection of which materialized views and indexes to use in a database configuration. As with many other problems we have read about, the main problem here is that the search space of views and indexes that are syntactically relevant to a workload is exponentially large. Therefore, exhaustive algorithms are infeasible for these purposes. In this paper, the Microsoft team describes some of the heuristics and algorithms that they used to efficiently prune the search space and estimate the benefits of interesting views and indexes.

The selection tool uses several core modules. The candidate selection module analyzes the workload and selects table-subsets that are useful for a large number of queries and are large enough to significantly decrease the cost of relevant queries. Microsoft uses cost estimation techniques that rate tables not only by how many queries they are used in, but by the fraction of rows they represent relative to all tables referenced by a given query. This allows for the exclusion of tables that, while syntactically relevant, are too small to produce significant benefits by their inclusion in a materialized view. These selection criteria prune the search space enough that Microsoft's configuration enumeration module is able to examine all possible interesting table subsets for a given query that occur in the set produced by the candidate selection module.

This contributions of this paper are quite significant. Where, before, it was difficult to choose even a set of beneficial indexes (which essentially represent a small subset of materialized views) for a workload, Microsoft has developed an efficient method to find not only a set of useful materialized views, but a set of indexes on those views. As shown by the data in section 6, the reduction in search space they were able to achieve on benchmark workloads ranged from 40-75%, while maintaining similar quality as an exhaustive evaluation of syntactically relevant views.

My chief complaint about this paper is the lack of detail in some sections. The authors used about as much space saying that their discussion was limited by space as it would have taken to go into more detail on an interesting topic such as how they manage the selection of indexes on materialized views. Furthermore, some of their algorithms seemed a bit abstract to me, and I would have like a little more detail or an example explaining them.


Review 25

The paper proposes an industry-strength tool for selecting materialized views and indexes for SQL workloads, which then is implemented as a robust physical database design tool for Microsoft SQL Server 2000. To accelerate performance, both indexes and materialized views are essential physical structures. The importance of this paper is that it ensures that the solution is robust and takes into account the complexities of full query language. On the other hand, previous paper focuses only on the search problem in a given set.

First, the paper shows how to identify interesting set of tables that we need to consider materialized view. It introduces the architecture for materialized view selection. The enumeration is not scalable since the space of relevant materialized views is very large. Therefore, it proposes an algorithm, for finding interesting table subsets in the workload.

Second, the paper presents a view merging technique that identifies candidate materialized views that can be beneficial to multiple queries while not optimal for single query. It also provides an algorithm for this problem. The biggest difference of the work in this paper from previous work is that it considers indexes and materialized views selection as a similar problem. On the other hand, previous work in physical database design considered these as two isolated problems. The paper has many experimental results that support their algorithms for selection.

In conclusion, the paper proposes the architecture and algorithms of a robust physical database design tools for Microsoft SQL Server 2000. In my viewpoint, the paper is convincing because it provides concrete algorithms for the indexes and materialized views selection problems, and sufficient experimental results that can support its ideas.



Review 26

This paper provides algorithms and methods for picking indexes as well as materialized views to make in SQL databases to increase query performance. Previous attempts to select materialized views and indexes looked at them as separate entities but this paper looks at how they can work in tandem to be more effective.

To find a materialized view 3 main steps are taken:
1.) Determine a small subset of interesting table subsets. This set is crucial for reducing the search space from all possible table-subsets, which can drastically increase performance.
2.) Prune syntactically relevant materialized views. This is done with the help of the query optimizer and evaluates what table-subsets are used to answer queries and if it is not it is pruned away.
3.) Merging views together when it is beneficial. This can serve the purpose of having a merged view that will be helpful for multiple queries not just one.

The paper does a pretty good job of explaining the algorithm and went in depth in the techniques used. It discussed tradeoffs from losing optimal answers to limit search space and improve speed, as well as provided metrics for its results. I think the psuedocode blocks were helpful in understanding the algorithms as well, it was the first paper that’s really used those (that we’ve read and I can recall) and I thought it was a nice touch.

I can’t think of any real weakness’ of the paper other than maybe the fact that it only talked about SQL. I would imagine these techniques work for other systems but I’m not sure and it would have been nice to at least hear about that and if they don’t work what about them would fail that doesn’t with SQL. Also the paper was slightly hard to read just because it was all text blocks without many visuals until the end and it was all just very technical description of an algorithm, but that is a personal preference and I’m sure some other students prefer that. I prefer to break text up occasionally with a descriptive image (but in their defense it is tough to have images for algorithms)



Review 27

The purpose of this paper is to discuss a new methodology for selecting the most appropriate materialized views or indexes (which can be seen as extensions of the concept of materialized views). In addition to describing a novel architecture and algorithms to address the problem for automatically generating materialized views, an end-to-end solution is offered in addition to a set of algorithms for the selection of materialized views and indexes. While most of the prior work discussed presents the problems of selecting indexes and materialized views separately, this paper approaches both in tandem.

The most convincing part of the referenced work was the implementation of their methods into an industry-standard DBMS application, allowing them to concretely compare their algorithms/infrastructure to existing benchmarks. Specifically, the fact that their methodology is scalable and robust to many different kinds of data (artificial and “real-world”) coupled with the experimental comparisons to existing alternative approaches makes their work very convincing in its advantage over competing architectures.

It would have been interesting to see some discussion about their scheme choice as it would be implemented in systems beyond Microsoft SQL, though this is a Microsoft research paper so their description of only one tool is understandable. However, a good use of this would be a discussion of failure modes/constraints introduced by using only the MS SQL server. In addition, index selection is not specifically discussed in detail, probably because it is classified as a special case of materialized views. There are probably specific algorithms or methods for selecting candidate indexes that could have been discussed.


Review 28

Since databases used at enterprise scale encounter such high volume workloads, having a DBA customize materialized views and indexes can be a daunting task. This paper proposed novel algorithms for automatically selecting useful materialized views and indexes. The paper also introduces a new architecture for view and index selection to optimize the entire process.

The architecture proposed breaks the process down in to a set of discrete steps composing of syntactic structure selection, selection selection, configuration enumeration and a final recommendation is made. In order to reduce the search space of all syntactically relevant materialized views, an algorithm is proposed for identifying interesting table-subsets which vastly reduce the number of syntactically relevant materialized views (this occurs in the candidate materialized view selection phase). The view merging algorithm presented allows for materialized views that are not optimal for any individual query, but are useful for multiple queries to be considered. This novel intuition allows for the algorithm to discover materialized views to be discovered that may be optimal for the workload.

While explaining the algorithm for excluding non-useful syntactically relevant materialized views, the “Find Best Configuration” algorithm is simply assumed to exist. Since this is such an important piece of the algorithm proposed, it seems odd that no explanation is given to the intuition that the “Find Best Configuration” algorithm uses. Another deficiency in the paper was its lack of explanation regarding the results for elimination of useless materialized views. A nice graph is presented showing the percentage reduction in syntactically relevant materialized views for the given workload, but there is no explanation as to how it is know that the discarded views are not useful. There should be some explanation/verification that the discarded views were indeed useless.


Review 29

The paper discusses an alternative solution to solve the materialized view and index automated-selection problem for SQL workloads. Here, the writers see materialized view as a more complex version of index with the additional restriction on SQL language usage. Thus, materialized view selection is a more complex process than index selection. The paper tries to propose solution that would pick the interesting materialized view based not only on the given set of existing materialized view, but also from exploring the possibility of other materialized views that would accommodate multiple queries in the workload.

The paper established that in order to pick a physical design consisting of indexes and materialized views it is critical to search over the combined space of indexes and materialized views. Fortunately, the paper also sees it important to have a smaller set of candidate materialized views to help reduce the enumeration cost. It presents two techniques: (1) how to identify interesting sets of tables such that we need to consider materialized views only over such sets of tables (cost-based and weight-based) and (2) a view merging technique that identifies candidate materialized view that while suboptimal for a single query, could be beneficial to multiple queries in the workload.

One apparent contribution of the paper is that it proposes a merged solution for index and materialized view selection, while prior works consider the problem separately. Since indexes and materialized views are fundamentally similar, they can interact with each other. Therefore, in considering the enumeration algorithm, the paper proposes a joint enumeration using greedy algorithm. There is comparison charts for the implementation of joint enumeration vs. MVFIRST/INDFIRST, showing that joint enumeration performs well and faster. Another contribution is that the paper tries to find the optimal solution of indexes and materialized views selection that would work for the whole workload instead of just for one query. The Cost-based pruning of syntactically relevant materialized view algorithms helps trim down the list of candidate materialized view, thus helps reducing cost of examining materialized views during view merging.



Review 30

The purpose of this paper is to present a novel method for automatic selection of materialized views and indexes for SQL databases (specifically Microsoft SQL Server 2000). Given that materialized views and indexes are so often dependent on one another, the authors present a novel approach for selection of both concurrently. The authors also seek to do their own pruning of the set of candidate materialized views and indexes during the configuration enumeration phase. It is the case that previous methods had assumed that they are provided with a set of relevant materialized views and/or indexes and then work from their. The authors push back against this and insist that automatically generating this set is an important part of the automated selection process.

The technical contributions of this paper are the algorithms presented in the inner sections along with experimental evaluations that do indeed show that their method greatly outperforms previous methods that focused on only selecting indexes or materialized views independent of one another. Additionally, as alluded to above, the authors present algorithms not only for the simultaneous selection of materialized views and indexes, but also they present algorithms that allow for the selection of the candidate materialized views and indexes. They create their own configuration enumeration phase that first finds interesting table-subsets from among all possible for the workload in consideration, then they prune syntactically relevant materialized views based on data received from the query optimizer, and finally they perform view merging to round out this procedure.

I think that this paper actually has few weaknesses. The paper is presented in a methodical manner, with the authors addressing any concerns and justifying design decisions and choices of tunable parameters with empirical evaluation. There are a few points at which the discussion is lacking, but in those moments, the authors either state that the section is omitted due to space restrictions or point the reader in the direction of a reference. I do think the conclusion is a bit lacking. I wish there had been an additional future works section, because the authors hint at interesting extensions, but do not take the time or space to discuss them further.



Review 31

Title: Automated Selection of Materialized Views and Indexes for SQL Databases
This paper proposes an end-to-end solution to the problem of selecting materialized views and indexes while taking the interaction between these two into consideration. While the previous research work focusing on the search problem which picks and attractive set of materialized views and implicitly assuming that the picked set is of all potentially interesting candidates which are inappropriate with respect to the large workload of SQL, the proposed method will rather make selection by judging on the interaction between the two. Experiments conducted in this paper demonstrate higher robustness and reliability of the proposed method.

As a brief summary, the algorithm first identifies indexes, materialized views and indexes on materialized views as candidate answers to the query. It than compares the preferences over the candidates by judging jointly on the properties of each candidates either of materialized views or of indexes.

In section 4.3 the authors state that a sub-optimal recommendations can be achieved under some circumstances. However this statement comes without a solid proof and can potentially raises some doubts. Besides, in the cases of compromising to a sub-optimal solution, the paper does not provide any information of whether the sub-optimal solution is bounded, how strong the assumption of sub-optimization is and how likely the required assumptions can be fulfilled which leaves this “observation” not that persuasive and the advantage of such compromise less attractive.

Another interesting question to ask is, whether could there be another way of such atomization? For example if there are some benchmark datasets available, is it possible that come machine learning/data-driven algorithm can do a better job in such selections since many similar tasks in fields such as operating systems are taking advantages of machine learning techniques.


Review 32

This paper identify the non-trivial task of selecting materialized views and indexes together, and addresses this issue by proposing an end-to-end physical design framework with detailed algorithms of implementing it. The framework was implemented as part of the tuning wizard of Microsoft SQL Server 2000. it can be decomposed as the following components:
1. A syntactic structure selection module identifies the syntactically relevant indexes, materialized views that can potentially be used to answer the problems.
2. A candidate index selection module and a candidate materialized view selection module that together form a candidate selection module. This module provides the worthy-explored indexes and materialized view together for the following module to configure.
3. A configuration Enumeration module that find the ideal physical design.
4. A configuration simulation and cost estimation module that is part of SQL server to provide cost estimation support for view selection as well as configuration enumeration.

This paper has an emphasis on candidate materialized view selection since index selection is introduced by the authors’ previous work. This paper introduce the selection process as:
1. Finding interesting table subsets.
2. Exploiting the query optimizer to prune syntactically relevant materialized views
3. Merging views with a goal of both maximizing potential support for queries and minimizing storage.
For each of these steps, it gives a comprehensible algorithm to help its reader to understand the process.

Besides that, the paper also talks about trading choices of indexes and materialized views and a batch of related works to provide the background of this research. Moreover, a comprehensive experiments is given at the end of the paper. Experiments are implemented on a modern commercial database, and the results show the framework has archived its design goals.

Strengths:
1. This paper proposed a framework that can automatically select materialized views and indexes to prune the database for better performance as well as less cost.
2. This paper have a complete set of experiments that implemented on modern commercial database that make its results more than convincing.
3. This paper illustrated its points in section 4 with a set of examples, which I found is very precise and helpful for understanding the problems.
4. The pseudo code of the algorithms in section 4 is very clear and helpful to understand the implementation behind the concepts.

Weaknesses:
1. Though materialized view and indexes selections are for sure can improve the performance with the given or analyzed queries, it is not necessary true for other queries. In fact, if the future queries’ pattern changes a lot, it is a potential concern that this whole process may start over to take advantage of the queries and the workload is not trivial.
2. Though it is understandable that the indexes selection part has been discussed in previous work, it would be better if this paper could provides at least an short overview of the indexes selection to help its reader to understand.



Review 33

This paper proposed a way of automatically select materialized view and indexes for SQL databases. The main contribution of this paper is a set of algorithms that prune the search space of materialized views, as well as the idea of selecting materialized view and index at the same time.
The author presented an architecture of automatic materialized view selection system. The key idea is that, by analyzing query workload, syntactically related views are first filtered by candidate selector. The selector drops most of the searched views, only leave a limited amount of “interesting materialized views”. A materialized view can be proposed on any subset of tables in the query. First the selector scans through the large space of table-subsets. It evaluates the importance of table-subsets by a slightly modified sum of cost of queries where the table-subset occurs. It unions all the table-subset that meet an importance threshold into a set, further prunes this set with the help of query optimizer. All the left tables are candidates of materialized views. Lastly the merged views of these candidates are also considered as candidate. This is because that some candidate materialized views are suboptimal, and it implies that some other materialized views are suboptimal for single query but they are optimal considering all queries.
The author also claimed that they create indexes even on materialized views, thus further improve performance.
The method proposed in this paper is clear and effective logic wise. The evaluation part shows that it works well in production. However, one weakness of this paper is the lack of illustration of the method to combine selection of materialized views and indexes. It’s not clear to readers how this co-selection benefits the whole system and whether it’s practical to apply to all DBMSs.