This paper proposes algorithms to select materialized views and indexes for SQL databases. They bring up an end-to-end architecture for index and materialized view selection tool, with the following considerations: - Candidate Materialized View Selector: exhaust selection of syntactically relevant materialized view isn't scalable, so the authors propose algorithms to select table-subsets of interest to get the sub-tables which have both higher costs and weights compared to others. Merging view is also beneficial since the space of materialized views is limited, so the authors propose algorithms to merge views for good performance-space trade-off. - Index and Materialized View balance: the author holds the view that MVFIRST and INDFIRST both have drawbacks, where the fraction of storage is hard to decide, redundant recommendations can occur, and the first selection would preclude useful second selection; on the other side, joint enumeration of index/materialized view choice is desired, since the storage bounds is no longer a problem, and interactions between index and materialized views are considered. The authors provide some experimental results to support their algorithms. The results show that their materialized view selector is fast with little performance degradation, their view merging algorithm provides good performance increase when storage is bounded, and their joint enumeration method outperforms both MVFIRST and INDFIRST. In general, this paper is well structured, with concrete algorithms and examples for the materialized view selection/merge, and complete experimental results for the system built. One small drawback is the paper doesn't introduce the algorithm involved to do the JOINTSEL process of index and materialized view. |
Materialized views and indexes for SQL database are important features in today’s commercial DBMS. A desired materialized view enhances the performance of the product significantly. Therefore, it’s important to have an efficient automated materialized view selection tool. This paper makes use of a previous index selection for Microsoft SQL server and builds innovative architecture for picking physical design of indexes and materialized views on that. Then, the paper presents the method for identifying a smaller subset of materialized views as potential candidates, which saves a lot of time from searching the entire possible choices. Some of the strengths of this paper are: 1. The paper pointed out that many prior work separate index selection and materialized view. However, they are both essential for effectiveness, so putting them together can optimize the physical design for the workload on the system. 2. Finding interesting table-subsets, prune syntactically relevant views and view merging techniques significantly reduce the size of materialized view candidates. 3. The experiment on a MS SQL server 2000 proves the theory presented in the paper improves the recommendation quality of the premise that useful materialized views won’t be eliminated from the candidate sets. Some of the drawbacks of this paper are: 1. The range that this paper can be implemented to is limited to single-block materialized views with selection, join, grouping and aggregation. 2. Take reinforcement learning into consideration may be a good idea to further enhance the materialized view recommendation system. |
This paper presented an end-to-end solution to the problem of selecting materialized views and indexes, to solve the problem of building an industry-strength tool for automated selection of materialized views and indexes for SQL workloads. By considering the interaction between indexes and materialized views, we can optimize the physical design for the workload on the system, and improve the performance and scalability. This paper proposed an architecture of index and materialized view selection, and introduced two key techniques for candidate materialized view selection: 1)finding interesting sets of tables algorithm 2)Cost-based pruning of syntactically relevant materialized view algorithm 3)view merging algorithm. The paper has several technical contributions and strengths: 1).It took into account the interaction between indexes and materialized views by considering them together to optimize the physical design for the workload on the system. 2).It gave innovative techniques for dealing with the large space of potentially interesting materialized views that are possible for a given set of SQL queries and updates over a large schema. 3). It ensured that the solution to this problem is robust and takes into account the complexities of full SQL as a query language, as well as pragmatic issues. 4).It built an industry-strength physical database design tool that can determine an appropriate set of indexes, materialized views (and indexes on materialized views) for a given database and workload consisting of SQL queries and updates. 5).The algorithms proposed by the paper are effective, scalable and of high quality, as well as offer better recommendations. 5)It is now used as a part of Microsoft SQL Server 2000’s upcoming release, which means the solution proposed by the paper can be applied directly into industrial field and proved useful and efficient. This paper also has some drawbacks: 1).It didn’t solve the problem of dynamically determine which materialized views should be maintained for SQL workloads. 2).It didn’t develop a theoretical framework and appropriate abstractions for physical database design that is able to capture complexities of the physical design problem. 3).It only gave solution to part of physical design, so the general problem of physical design is still not solved. |
I hoe this time works |
For commercial-sized SQL database systems that have to handle large volumes of data and queries, proper use of materialized views and indexes can greatly increase performance. The difficult part is often in selecting appropriate materialized views and/or indexes in a timely manner, due in part to the potentially enormous search space. To address this problem, the paper “Automated Selection of Materialized Views and Indexes for SQL Databases” by Agrawal, Chaudhuri, and Narasayya (all from Microsoft Research) presents an end-to-end method for automatically selecting materialized views and indexes. This entails a novel architecture and algorithms for selecting and culling candidate materialized views and other tasks. As detailed by the authors, the architecture’s key components are: 1. Identifying syntactically relevant indexes and/or materialized views useful for the query 2. Selecting from this pool of candidate views/indexes for efficient ones 3. Determining the ideal physical design, i.e. configuration enumeration 4. Simulating the configuration and estimating its costs Item 2 is the topic of focus in this paper, with previous papers addressing the other items like Topic 1 in greater detail. Interesting table subsets are selected based on query cost, which is estimated by a cost estimation section already present in the SQL server. Syntatically relevant materialized views that are never used by any query are eliminated, and from here, the system tries to see if views can be merged together to support multiple similar queries. Algorithms are provided for these major steps. The paper then discusses the interaction between indexes and materialized views, which were often analyzed in isolation in previous work in the field. Here they talk about the merits of jointly enumerating the index and materialized views spaces, as well as some potential downsides of picking materialized views or indexes first. The authors conclude with a section detailing their empirical results and improvements achieved using their system. This paper’s main contribution is the development of a successful automatic system for identifying and selecting materialized views and indexes to improve the performance of SQL queries. This effort can be considered successful because it was included as a tuning wizard in Microsoft SQL Server 2000. In terms of the paper itself, the authors did a good job of showing the overall picture and where their system sits, and the experimental results lends further empirical support for their original claims. Examples were clear and helped to improve understanding, such as the ones explaining the motivation for merging views. This paper was well written, and the one weakness that comes to mind is the lack of detail in some sections, probably owing to the length restrictions. One example of this is in the description of indexes; while the authors advocate doing them tandem with materialized views, the paper mostly focuses on the latter. A follow-up could help in clarifying how the indexes interact with materialized views in this system. |
This paper presents an end to end solution for selecting materialized view and indexes. This problem is important because though there were works in materialized view selection in OLAP databases, no work has dealt with building industry-strength tool for automated selection of materialized views and indexes for databases. Also, the presence of the right materialized views can significantly improve performance. This paper proposed a new architecture of index and materialized view selection and several new algorithms in finding interesting table-subsets in the workload, merging view pair, generation of a set of merged views from a given set of views, and algorithms form cost-based pruning of relevant materialized views. There are several conclusions that this paper drew which serve as contributions. 1)searching over the combined space of indexes and materialized views before pick a physical design consisting of indexes and materialized views is important. 2) this paper presents a way of identifying a much smaller set of candidate materialized views such that searching over the reduced space of candidate materialized views preserves most of the gains of searching the entire space of possible materialized views. Two techniques which form the basis of candidate materialized view selection: 1) shows how to identify interesting sets of tables. 2)presents view merging technique which is beneficial to multiple queries. Extensively experiments show the effectiveness of the proposed architecture and algorithms. The performance test are not so convincing for the reason that those tests run on small dataset. Hope the author of the paper chooses to use large dataset in testing. This is the drawback of the paper. |
In this paper, three authors presented an end-to-end solution to the problem of selecting materialized views and indexes. The author also tested their result and ship it with SQL Server 2000. This is the problem that solved by this paper, by its algorithms that can prune the search space of materialized views and figure out the index with it at same time. As the result, the author present an architecture and novel algorithms. The main steps of this architecture can be summarized as followed: 1. Find a representative work load for which we need to select views, this can be done by look into the log of the database. 2. Given the workload, we need to figure out syntactically relevant indexes, materialized views and indexes on materialized views that can be potentially used 3. Then, the algorithm need to select the candidate index and matirialized views. The index selection is introduced in previous work. The author focus on introducing the selection of materailized views. We first need to find the interesting table subset, then exploit the query optimizer to prune syntactically relevant materialized views and last merge views to maximize support for queries and minimize storage. 4. Then it will need to use the previous greedy algorithm to configure the enumeration. 5. Final recommendation get The authors has done a thorough experiment to show that their algorithms did provide performance boost. The main contributino of this paper is that 1.Pointing out that selecting a material view and index is an important task that can boost database performance. 2. Presenting an architecture and novel algorithms as the foundation of a robust physical database design tool for Microsoft SQL Server that recommend both indexes and materialized views. I 3. Extending a technique that dynamically determine which materialized views should be maintained to SQL 4. Developing a theoretical framework and appropriate abstractions for physical database design The weakness of this paper are listed as following, 1. The author did not assume a condition that the RAM is limited. 2. The workload that is picked in first place may be bias, do we need an algorithm? |
This paper covers a method for selecting materialized views and indexes for RDBMSs. It builds on previous work for OLAP databases, however due to larger workloads, the authors do not want to merely choose from all syntactically relevant materialized views. Instead, they add a pruning phase, which is shown to save time while also keeping high quality candidates. Additionally, they propose a view merging phase, which is useful in environments without much storage space. Algorithms from previous work, such as Greedy(m, k) for configuration enumeration, are used for other pieces of the process. The overall process for the selection is shown in a flow chart on page 2: workload -> syntactic selection -> candidate index/view selection -> configuration enumeration -> recommendation. One main innovation that separates this paper from previous papers is that the process of selecting materialized views and indexes at the same time (JOINTSEL). The previous literature had used an INDFIRST or MVFIRST approach, which led to suboptimal selections. The authors performed experiments showing that the JOINTSEL approach is superior (a) when space is an issue and (b) when there are many updates. Additionally, JOINTSEL allows the interaction of indexes and materialized views to be considered, which naturally leads to more optimal selection. This paper is just part of the AutoAdmin project at Microsoft. The project aims to make all databases self-tuning, and the methods detailed in this particular paper were implemented in SQL Server 2000. I would be interested in knowing if these methods have been adopted in other commercial databases like Oracle, and what the current state of the AutoAdmin project is. Finally, I would be curious to hear about how well adopted these features are by software companies, and how/if human DBAs can still be a useful resource in defining indexes and materialized views for their specialized workloads. Generally I thought that the paper itself was clear and well written, but some information about expected usage would have been appreciated. |
Problem Addressed: The problem the author of the paper trying to address is to come up with an efficient algorithm to consider Materialized views and indexes at the same time when a database system planing a workload. Fundamentally, materialized view and indexes are the same. They create some additional physical structure to accelerate the performance. Consider materialized views and indexes at the same time and make recommendation can achieve better performance than only consider one aspect. The problem is, materialized views is much richer and has a much larger search spaces then indexes. Exhausting the possible space of materialized view for a workload is time-consuming and not scalable. Main approach: The author purposed an architecture and novel algorithm to do efficient search in space of possible materialized views that is syntactically relevant to find materialized views recommendation that is close to the optimal choice. The module Candidate Materialized View Selection is in charge of doing this. Within this module, the algorithm first find interesting Table-subsets, then prune the syntactically relevant materialized views using query optimizer. For the result set of materialized views, the view merging algorithm will generate the final merged views. Final step consider the indexes and merged materialized views from the first step at the same time using JOINTSEL approach. Strength and technical contribution: The approach described in this paper can efficiently recommend materialized views and indexes by consider them both at the mean time. According to their experiment, the algorithm used to come up with merged materialized views recommendation is much faster (10x) than exhausting possible search space with the hurt in performance of less than 10%. View merging algorithm has better performance when storage bound is smaller. Hence the entire architecture is both time-efficient and memory efficient hence scalable for large workload. Drawbacks of the paper: Overall, the paper give the description of their architecture with appropriate detail for readers to understand. If it can have more examples to explain some of the concepts like “pair-wise view merging” in the paper, it would be better. Problem Addressed: The problem the author of the paper trying to address is to come up with an efficient algorithm to consider Materialized views and indexes at the same time when a database system planing a workload. Fundamentally, materialized view and indexes are the same. They create some additional physical structure to accelerate the performance. Consider materialized views and indexes at the same time and make recommendation can achieve better performance than only consider one aspect. The problem is, materialized views is much richer and has a much larger search spaces then indexes. Exhausting the possible space of materialized view for a workload is time-consuming and not scalable. Main approach: The author purposed an architecture and novel algorithm to do efficient search in space of possible materialized views that is syntactically relevant to find materialized views recommendation that is close to the optimal choice. The module Candidate Materialized View Selection is in charge of doing this. Within this module, the algorithm first find interesting Table-subsets, then prune the syntactically relevant materialized views using query optimizer. For the result set of materialized views, the view merging algorithm will generate the final merged views. Final step consider the indexes and merged materialized views from the first step at the same time using JOINTSEL approach. Strength and technical contribution: The approach described in this paper can efficiently recommend materialized views and indexes by consider them both at the mean time. According to their experiment, the algorithm used to come up with merged materialized views recommendation is much faster (10x) than exhausting possible search space with the hurt in performance of less than 10%. View merging algorithm has better performance when storage bound is smaller. Hence the entire architecture is both time-efficient and memory efficient hence scalable for large workload. Drawbacks of the paper: Overall, the paper give the description of their architecture with appropriate detail for readers to understand. If it can have more examples to explain some of the concepts like “pair-wise view merging” in the paper, it would be better. |
This paper concerns the problem of building a tool for automated selection of materialized views and indexes for SQL workloads, which is crucial for improve DBMS performance. Previous paper in this area either considered materialized views and indexes in isolation or ignore the problem of large space of potentially interesting materialized views. These methods will lead to a suboptimal solution and poor performance. To solve these problems, this paper proposed: 1) a principled way to identify a much smaller set of candidate materialized views and 2) a new way that searches over the combined space of indexes and materialized views. The goal of candidate selection is quickly eliminated materialize views that are never used when answering any query. To achieve that goal, three steps are needed: 1. Find interesting table-subsets using a new metric called TS-Weight, which can discriminate between subsets even if they appear in the exactly the same queries. 2. Utilize query optimizer to eliminate the set of materialized views that are not the best solution for even one query. Materialized views are purposed based on “pure-join” and possible selection conditions and group by statements for each query in the workload. 3. Merge the views got from step 2 to generate new views that are useful for multiple queries. This step is very useful when the whole system is under memory pressure. Authors of the paper compared their method to exhaustive proposal method and found that their method significantly improves running time with only minor sacrifice in quality. For configuration enumeration, the paper purposed a method that jointedly searches in the space of indexes and materialized views using the Greedy(m, k) algorithm. Compared to methods that either select index or materialized views first, this method achieves better quality and eliminate the need of configuring storage partitioning parameter which typically varies with storage constraint and workload. In my opinion, the method purposed in the paper heavily depends on the cost estimation module. Under the situation where the database is large and complicated, and therefore the cost estimation becomes inaccurate, I wonder the quality of the method purposed compared to other methods. |
In the paper "Automated Selection of Materialized Views and Indexes for SQL Databases", Sanjay Agrawal, Surajit Chaudhuri and Vivek Narasayya propose a method to automatically select materialized views and indexes for SQL workloads. When considering indexes and materialized views at face value, they are both physical structures that can enable better execution speed within a database. At the cost of physical space, index scans and materialized views are much faster than sequential scans and aggregations. Thus, it seems clear that one must consider them together in order to optimize the physical scheme for the workload. Yet, prior works in this field have quarantined these two structures separately. Agrawal notes that indexes are simply a special case of a single-table, projection-only materialized view. Therefore, if innovative techniques are created to deal with materialized views, we end up killing two birds with one stone. In order to combat the problem of choosing from multiple generated materialized views, Agrawal presents an architecture that optimally selects these indexes and materialized views and (as Microsoft employees) integrates them into Microsoft SQL Server 2000. This system has a syntactic structure selection, candidate selection, configuration enumeration, configuration simulation and cost estimation which effectively reduces the pool of candidates for indexes and materialized views. This pool only contains candidates that are "interesting". By targeting queries that are used routinely, queries that are expensive, and the overlap between queries, the pool of candidates is narrowed down even further. They then implement and use a greedy algorithm that chooses the best configurations based on how expensive the estimated workload is. Finally, merges are applied to the parent view, creating a single merged view that contains all the proper information for optimal materialized view lookup. One problem that we face when reading this paper is the lack of technical details on how to effectively select the indexes and materialized views. Most of the details are either referenced or the base knowledge of the reader is assumed to be moderate in this field. Although redundancy is bad, shedding some light on index candidate selection could pave the way for more open questions. Furthermore, the experiments were tested on small databases - usually not a good indicator of the limits of the approach. I would have liked it if they attempted to use larger workloads in order to simulate what would happen in a real world setting. Lastly, I'm curious about whether their algorithm could be modified so that the amount of merges can be controlled by the user. Sometimes, due to memory constraints, multiple materialized views are not feasible and thus only a certain amount can be created. What are the consequences to that approach? Could machine learning be implemented in this approach somehow? |
This paper focused on detailing an effective algorithm for generating a useful configuration of materialized views and indexes for any arbitrary set of queries on any arbitrary dataset. Having good views and indexes can significantly reduce the cost of a workload; as such, it would be useful to have a way to automatically generate useful indexes and views ahead of time. The first step to getting a good configuration of views is determining syntactically relevant views, which are those that can actually be used to answer any query, and so have columns and tables used for some query. The next step is selecting candidate views, or views that might actually be used in a query, which are those views that cause a noticeable improvement when running the full set of queries. Views can also be merged in certain circumstances to save space. After this, the best configuration of views is generated. This paper uses an algorithm that fully enumerates all combinations of a limited number of views, and greedily picks the rest. This is much faster than enumerating all possible views, and has nearly the same performance. The paper provides a useful, practical means of generating indexes and materialized views for an arbitrary workload on an arbitrary dataset, and it does so without requiring a specific setup. Parts of it, such as the configuration enumeration, are entirely replaceable, so the results of this paper are general enough to work with many different existing implementations for generating indexes and views. The paper interesting does make a distinction between indexes on tables, materialized views, and indexes on materialized views. However, it declares that the differences are negligible. While this is useful for picking a configuration, it does leave out how to compare indexes to views when making a selection between them. |
The paper mainly discuss the issue of automatically selecting appropriate index and materialized views for SQL database in industry thinking. The question is important as in nowadays SQL database systems, the utilization of materialized view is widespread, therefore a judicious selection of materialized view is of vital importance. Different from previous papers, this paper considers the interaction of indexes and materialized view as they are similar physical structures containing useful information. In addition, several techniques have been adopted to accelerate the search process including reducing the search space by pruning candidate materialized views and view merging The approaches are experimented and demonstrated to have faster speed and better solutions However, materialized views and indexes are still part of physical design tools and this innovative industry design tool need to be completed |
“Automated Selection of Materialized Views and Indexes for SQL Databases” by Sanjay Agrawal, Surajit Chaudhuri, and Vivek Narasayya discusses work on how enumerating over the combined space of candidate indexes and materialized views can find helpful indexes and materialized views that could not have been found if searched for separately. Prior work had considered separate searching approaches, but never for searching indexes and materialized views together. As a DBMS uses both indexes and materialized views together to perform operations, it would make sense that we should ideally find the combination of indexes and materialized views that makes the DBMS most efficient. |
In this paper, the authors focus on purposing a novel solution to the problem of selecting materialized views and indexes automatically. It’s an important issue since selecting materialized views and indexes judiciously can significantly improve the performance of modern DBMS. They solve this problem by purposing a novel architecture and algorithms. In the architecture of selection tool, given a representative workload, indexes and materialized views will be recommended. This framework includes 5 components: syntactic structure selection, candidate selection, configuration enumeration, configuration simulation and cost estimation. First of all, they give a big picture of the system architecture and a brief introduction of each component. Then in the related work part, they summarize several papers on selecting of materialized views. They point out potential issues in previous work and how their method overcome them. The “Candidate Materialized View Selection” section is the core of this paper. This is important because filter out unnecessary views can reduce the space of enumeration for search. They introduce 3 steps for their selection algorithm: 1. Table-Subsets elimination for finding interesting subsets. 2. Cost based pruning of syntactically relevant materialized views and selection of best configuration. 3. Views merging for avoiding sub-optimal recommendations. Next, the authors discuss the interaction between indexes and materialized views, they purpose their approach JOINTSEL and make a comparison between INDFIRST and MVFIRST to illustrate why their joint enumeration approach outperform than two other alternatives. Then, they talk about the experiments done with their architecture. From the experiments, they give the following conclusions: 1. Interesting table-subsets finding algorithm performs well, it eliminates unnecessary views while keeping useful views. 2. The view merging algorithm significantly improves the quality of the recommendation. 3. The candidate selection component significantly reduces the running while maintaining high-quality recommendations. 4. Configuration enumeration module Greedy(m,k) gives good results compared to complicated algorithms while running significantly faster. 5. The joint enumeration – JOINTSEL gives significantly better solutions than MVFIRST or INDFIRST. The main technical contribution of this paper is that it is a pioneering work builds an industry-strength tool for automated selection of materialized views and indexes for SQL workloads. Just as they said in their paper, despite some work in materialized view selection in the context of multidimensional (OLAP) databases, most of the prior work considers the problems of index selection and materialized view selection in isolation. However, their solution considers the interaction between them in configuration enumeration thus make a significant improvement. In their solution, they explore the richness of structure of materialized views and not only focus on a given set. A key contribution is that they provide efficient algorithms for selecting candidate materialized views. In various experiments, they prove that their approach is robust and efficient. This is an innovative paper at that time and I don’t find any main drawbacks. I think maybe they can discuss more about syntactic structure selection and how configuration simulation and cost estimation work in their system to make a better selection of materialized views and indexes. |
This paper does two main things: first, it makes an important observation about the relationship between indexes and materialized views—namely, that the former is a special case of the latter—and second, then uses this observation as a key component of an automated selection algorithm for materialized views and indices. The automated selection process has several steps—it starts by identifying possible indexes / materialized views that are syntactically relevant, while pruning as early as possible configurations that can be seen from heuristics to be not worth pursuing. This process is called configuration enumeration, and there are several sub-algorithms that are used to make this as accurate and efficient as possible. Several specific metrics and algorithms are given and evaluated in the context of choosing configurations to enumerate, such as exploiting the query optimizer and merging views. The detail that seems to be the most important to the authors, however, is the claim that index enumeration and materialized view enumeration should happen jointly, rather than serially; this is because of the fundamental relationship between the two that was discussed early in the paper. Experimental evidence is given which supports this idea, using the implementation given in the Microsoft SQL Server 2000. One of the strengths of the paper is that its authors immediately established credibility by showcasing that their contributions were featured in an industry setting (Microsoft SQL Server 2000). They also exploited a fundamental comparison of indexes & materialized views to create an algorithm that leverages this and improves efficiency of enumeration. One weakness of the paper is that although empirical evidence is given to support the paper’s ideas, the experiment wasn’t on a workload big enough to really confirm that the joint index/MV enumeration is definitively better, and more experiments would likely be necessary to truly confirm this. |