Review for Paper: 5-Self-selecting, self-tuning, incrementally optimized indexes

Review 1

This paper proposes adaptive merging, which is an adaptive, incremental and efficient technique for index creation, and compares its performance with Database Cracking (another index adapting technique).

The motivation for this index selection method is that a good set of indexes lets queries avoid scanning large parts of the database at a reasonable update cost. Database cracking proposed incremental, automatic optimization for index selection, but it requires many steps to reach steady-state optimized performance, is inefficient, and does not work well on block-access storage.

The proposed adaptive merging method uses a partitioned B-tree: it initializes the partitioned tree when a new index is first needed, and then optimizes it incrementally by merging records from different partitions whenever a query arrives and uses the index.

This method works well for query streams, applies to both disk-based and in-memory databases, is useful for both efficient search and query execution, can optimize itself according to past queries without new queries, and can tune the key range of each merge to achieve simpler space management.

Adaptive merging and database cracking differ fundamentally: database cracking relies on partitioning, while adaptive merging focuses on merging. Their performance is compared under multiple sets of queries, and the results show that adaptive merging is optimized faster (though the advantage decreases when query ranges are small) and can yield better performance than database cracking.

In general, this paper is well structured and clearly illustrates the algorithm it proposes. One drawback is that the queries used to test and compare performance are just a self-designed query set (though the set makes the illustrations easy to understand). Comparing performance on more widely accepted benchmarks or real-world cases would be better.

Review 2

Automatic index tuning is necessary in a relational DBMS because the number of possible indexes is usually very large. Current techniques cannot meet the demands of index optimization. Database cracking is inefficient when applied to block-access devices. Traditional index creation sorts data efficiently but is neither adaptive nor incremental. Therefore, this paper proposes adaptive merging, an adaptive, incremental, and efficient technique for index creation.

Some strengths of the paper are:
1. The adaptive merging technique combines the efficiency of B-tree creation with the adaptive and incremental behavior of database cracking, which addresses the weaknesses of using either method alone.
2. Using replacement selection in initial index creation step avoids the potential memory leak if the process exceeds the allocated memory.
3. Adaptive merging combined with partitioned B-trees provides the option to trade off current query cost against the optimized key range for subsequent queries by reducing the fan-in of merge steps.
4. Adaptive merging adapts to changes in query patterns much faster than database cracking because of the limited fan-out of the partitioning used in database cracking.

Some drawbacks of the paper are:
1. The paper says little about how adaptive merging can be applied to the materialized views of a relational DBMS.
2. In the performance evaluation, there is no experiment comparing adaptive merging with other index tuning techniques.

Review 3

This paper proposes adaptive merging, an adaptive, incremental, and efficient technique for index creation. Index selection is a central, classic, and very hard problem in physical database design, and existing tools have weaknesses: a long interval between monitoring and index creation, wasted data accesses, difficulty in predicting the key ranges to focus on, slowness and high expense, unstable efficiency, and low performance on block-access storage. The newly proposed technique is efficient, adaptive, and incremental, with high performance.

This paper introduces the adaptive merging technique to combine efficient merge sort with adaptive and incremental index optimization. The essence of adaptive merging is to exploit partitioned B-trees to focus merge steps on those key ranges that are relevant to actual queries, to leave records in all other key ranges in their initial places, and to integrate the merge logic as a side effect into query execution.

The paper has several strengths and technical contributions:
1).Adaptive merging fully optimizes a B-tree faster, with smaller memory allocation, higher efficiency, and less overhead. It can be extended to block-access storage, and it yields a complete, coherent, and searchable B-tree index even before it is fully optimized.
2).Adaptive merging requires less overall effort and adapts to changes in the query pattern much more rapidly.
3).Adaptive merging is widely applicable: it can be used in all cases in which traditional B-trees can be used.
4).All the conclusions were drawn from extensive experiments comparing adaptive merging with database cracking only, so it is safe to say that adaptive merging has these advantages over database cracking.

The paper also has some drawbacks:
1).The experiments compared adaptive merging with database cracking only, but more techniques should be considered to establish adaptive merging’s advantages.
2).The paper didn’t apply the algorithm to a real-life database to demonstrate its industrial usefulness.

Review 4

Indexes perform a vital role in relational databases by speeding up the performance of queries. In the modern world, with our huge amounts of data, however, it is impractical to choose the most appropriate index by hand, which necessitates the development of automatic index selection and tuning. The paper “Self-selecting, self-tuning, incrementally optimized indexes” by Graefe and Kuno (both from Hewlett-Packard Laboratories) proposes a new method called adaptive merging that improves on contemporary methods in automatic index management. As the authors mention, one previous method, “database cracking,” which relies on partitioning data based on the predicate, can suffer from inefficient sorting, especially when used on block-access devices. Adaptive merging, on the other hand, aims to overcome this block-access limitation by using partitioned B-trees. Each partition contains a sorted array; given a query, multiple partitions are scanned in an interleaved fashion and merged together into a single sorted stream. In essence, adaptive merging mixes the partitioning method of “database cracking” with the B-tree data structure to create a partitioned B-tree. Beyond explaining this adaptive merging method, the authors also discuss various considerations like transaction support, and present experimental results comparing their method to database cracking, where they found adaptive merging to be more efficient in the task of index optimization, especially as the workload increases.

The main contribution of this paper is the development of a new method for automatic index selection and tuning that is an improvement over the previous widely-used method at the time (database cracking). In their experimental evaluations database cracking might move data records up to 5 to 10 times more often than adaptive merging, so there is a clear performance improvement achieved by this new method. This paper did a good job in clearly explaining the concepts associated with database cracking and adaptive merging, especially through the frequent use of examples and diagrams.

The first weakness that comes to mind is that the paper could have included a more comprehensive experimental evaluation. After all, one of their experiments only simulated queries against a random permutation of the first 10 million integers to demonstrate performance improvement. While a useful starting step, it probably does not adequately simulate the complexity of queries and data that are being used in the real world. If there is no widely used benchmark for this type of testing, it would be understandable, and in fact, the authors mentioned the inclusion of more detailed experiments as one of their future directions. Also, the authors could have gone into further detail in their explanation of some special phenomena in the experimental results. In particular, the adaptive merging shows a lot of randomly occurring sharp peaks, even as the overhead continues its downward trend. All things considered, this paper still has a great deal of merit and brings an important contribution to the important problem of automatically managing database indexes.

Review 5

Index selection is a hard problem in physical database design. Too few or the wrong indexes force many queries to scan large parts of the database, which is costly. One approach is to focus on very fast scans; another is to tune indexes to the actual workload. Database cracking is one such adaptive indexing technique. However, database cracking has some drawbacks: 1) it requires many steps to reach the final representation for a key range, so it adapts slowly to newly loaded data; 2) the efficiency of transforming an initial data representation into a fully optimized one depends on the query pattern; 3) search efficiency never reaches that of a traditional index if cracking leaves unsorted minimal partitions. The paper's proposal resolves these problems.

To address these problems, this paper introduces adaptive merging, which combines the efficiency of traditional B-tree creation with the adaptive and incremental behavior of database cracking. The performance gains appear both during adaptation and in individual queries. Adaptive merging uses B-trees in which 1) an artificial leading key field permits creation and removal of partitions; 2) index creation is divided into run generation and merging; 3) query execution may optimize such an index by merging the key ranges required to answer actual queries. Substantial experiments show that the algorithm performs better, more stably, and much faster than database cracking.

One drawback of this algorithm is that it must start by building a partitioned B-tree, which is relatively costly at the beginning compared with database cracking.

Review 6

In this paper, the authors try to solve the problem of how to automatically select and tune indexes. As relational databases have grown larger and larger, it is beyond human ability to choose indexes by hand.
The authors propose an adaptive, incremental, and efficient technique for index creation. The technique overcomes the weaknesses of database cracking, such as:
1. It requires many steps to reach the final representation.
2. The efficiency of optimizing an initial data representation depends on the query pattern.
3. Search efficiency never reaches that of a traditional index if cracking leaves unsorted partitions.

Adaptive merging combines efficient merge sort with adaptive and incremental index optimization. The essence of partitioned B-trees is to use standard B-trees to persist intermediate states during an external merge sort. Adaptive merging builds on partitioned B-trees by focusing merge steps on those key ranges that are relevant to actual queries and leaving records in all other key ranges in their initial places.

Adaptive merging is based on previous work on index selection, namely database cracking. Both offer a promising alternative to traditional index tuning that relies on monitoring. Both create a new index when a column is first used in a predicate. The difference is that database cracking behaves like quicksort: it sorts the index by partitioning on the query key range. Adaptive merging, in contrast, is based on merging: it merges the records in the queried key range out of the sorted partitions. The authors ran many experiments to show that their algorithm outperforms database cracking.

The main contribution of this paper is a novel algorithm named adaptive merging. The paper also provides a thorough comparison between adaptive merging and database cracking, both in theory and in comprehensive experiments. The authors use the idea of merge sort instead of quicksort, which is quite interesting. Their careful examination of previous work and detailed comparison make the paper even stronger.

One small thing I would like to point out is that it would be better to describe the experimental setup.

Review 7

This paper describes a method to efficiently create new indexes automatically. With large databases, it is impossible for humans to monitor all of the queries and add indexes that are necessary as queries are coming in. However, having these indexes is necessary for effective database performance. Therefore, the ability of a database system to automatically create indexes based on the workload is invaluable.

The method proposed in the paper aims to improve on two types of previous methods:
1. Traditional monitoring and index creation utilities. This is a slow process, which means that indexes can’t adapt to the current workload (and might not be needed by the time they are adopted), and the utilities can be resource intensive. They also cannot be optimized for certain key ranges.
2. Database cracking, a newer and better method that is able to create indexes “on the fly” that are optimized only for the necessary key ranges, but is still inefficient.
The novel method, adaptive merging, primarily builds on database cracking.

Adaptive merging takes on all of the advantages of database cracking: it is able to create indexes automatically when a column is used for selection for the first time, and it creates an index on that column in a way that optimizes only the key ranges that are used in the workload. By using a different data structure, the partitioned B-tree, adaptive merging is able to adapt more efficiently to new query patterns (key ranges queried). The number of steps that it takes to go from the initial data structure to an optimized version is much smaller, which is the key contribution of this approach when compared to database cracking. The performance evaluation section does a good job describing the improvements made by the new algorithm.

I personally felt as though the diagrams in the paper weren’t very well thought out. There were both text and graph figures, and I think that a combination of the two into a clear example would have improved the effectiveness of the diagrams overall.

Review 8

Problem Addressed & Why it is so important:
The problem the authors address is how to create self-selecting, self-tuning indexes. This problem is important because, in today’s data warehouses, the number of possible indexes exceeds human ability to analyze them one by one. The number and magnitude of tables and queries are both beyond human capability to choose indexes manually. Hence automated self-tuning indexes are in demand. Prior work on this problem includes a method called database cracking. The authors of this paper propose a better method called adaptive merging, which is more efficient than database cracking on both in-memory storage and block-access storage.

Main approach:
The main approach proposed in this paper relies on a data structure called the partitioned B-tree. The difference between database cracking and adaptive merging is that database cracking is based on partitioning whereas adaptive merging is based on merging. Partitioning has a fan-out of only 2 or 3 each time, but merging can have a fan-in that is an order of magnitude larger than the fan-out of partitioning. This lets adaptive merging copy each record fewer times and converge faster to the final optimal index.

Strength and technical contribution:
The method proposed in this paper has the same objective as database cracking, which is the only other proposal the authors have seen for self-tuning optimized indexes. The main strength of this approach is that it is more efficient than database cracking and converges faster to the optimal index. The optimal index of adaptive merging is the same as a traditional B-tree. The authors ran many experiments comparing the convergence curves of query overhead for database cracking and adaptive merging under different workloads. It turns out that adaptive merging achieves faster convergence and lower overhead whether the query range is large or small and whether the storage is in-memory or block-access. Adaptive merging can also take better advantage of larger memory. The other strength is that the authors give concrete examples of how a partitioned B-tree functions and how it achieves self-tuning. This lets a reader with no prior knowledge of partitioned B-trees learn the data structure quickly.

Drawbacks of the paper:
The experiment part could be better organized. It would be better to separate the section comparing database cracking with adaptive merging from the section studying adaptive merging’s behavior under different circumstances. The authors could also group the circumstances into categories such as query range, query focus skewness, storage type, and storage space. This would make the experiment section more organized and clear.

Review 9

For a relational data warehouse with a large number of tables and columns, the problem of how to automatically select indexes is crucial, as too few indexes lead to poor performance, while too many indexes force high update and storage costs. One solution is to periodically monitor database requests and create or remove indexes. However, this method does not consider partial indexes or the problem of missing indexes during the update interval. Another approach is “database cracking”, which is an adaptive and incremental method, but it is very inefficient when applied to block-access devices.

This paper proposes “adaptive merging”, which overcomes these weaknesses. It combines the advantage of fast creation of traditional B-trees with the adaptive and incremental behavior of database cracking. The major difference between database cracking and adaptive merging is that the former relies on partitioning while the latter relies on merging.

The key idea of adaptive merging is based on a data structure called the partitioned B-tree. A partitioned B-tree adds an artificial leading key field to a traditional B-tree. The value of this field defines partitions within the tree. A partition is similar to a run in external merge sort: data within a partition are sorted, while key ranges may overlap between partitions. The whole method works as follows:
- When a column is used in a predicate, create an initial index for that column. The initial index is created using run generation algorithms such as quicksort or replacement selection. Each run becomes a partition in the partitioned tree.
- Subsequent queries find the required records within each partition, merge them into one or more new partitions, and return them.
- Repeat this process until there’s only one partition.
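
The steps listed above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the authors' implementation: plain sorted lists stand in for B-tree partitions, quicksort-style run generation is replaced by Python's built-in sort, and the names `generate_runs`, `AdaptiveMergeIndex`, and `memory_size` are hypothetical.

```python
import heapq
from bisect import bisect_left, bisect_right


def generate_runs(values, memory_size):
    """Run generation: sort memory-sized chunks; each sorted run is one partition."""
    return [sorted(values[i:i + memory_size])
            for i in range(0, len(values), memory_size)]


class AdaptiveMergeIndex:
    """Toy model: sorted lists stand in for the partitions of a partitioned B-tree."""

    def __init__(self, values, memory_size):
        self.runs = generate_runs(values, memory_size)  # initial partitions
        self.final = []                                 # merged partition, always sorted

    def range_query(self, low, high):
        """Return all keys in [low, high], merging them out of the initial runs."""
        pieces = []
        for run in self.runs:
            lo, hi = bisect_left(run, low), bisect_right(run, high)
            if lo < hi:
                pieces.append(run[lo:hi])  # qualifying keys leave their run
                del run[lo:hi]
        self.runs = [r for r in self.runs if r]  # drop partitions that became empty
        lo, hi = bisect_left(self.final, low), bisect_right(self.final, high)
        if pieces:
            # Merge the extracted pieces with keys already in the final partition.
            pieces.append(self.final[lo:hi])
            self.final[lo:hi] = list(heapq.merge(*pieces))
            hi = bisect_right(self.final, high)
        return self.final[lo:hi]
```

Keys outside the queried range never move, and a second query over the same range finds everything in the merged partition without touching the initial runs.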

The advantage of adaptive merging over database cracking comes from the fact that during merging, fan-in can easily exceed 100, while in database cracking the fan-out is typically only 2 or 3. This means database cracking needs a large number of updates (queries) to reach its final state (each update typically incurs high overhead), while adaptive merging can reach its final state quickly.

However, the paper never discusses the performance of adaptive merging under memory pressure, which would reduce the fan-in of each merge step. It would be better to see a comparison between adaptive merging and database cracking under such circumstances.
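
A back-of-the-envelope calculation makes the gap concrete. The sketch below assumes a simplified model in which each level of merging or partitioning moves a record once; the function name `levels_needed` and the figure of 10,000 pieces are illustrative assumptions, not numbers from the paper.

```python
def levels_needed(num_pieces, factor):
    """Levels required to combine (or split into) num_pieces with the given fan-in/fan-out."""
    levels, reach = 0, 1
    while reach < num_pieces:
        reach *= factor
        levels += 1
    return levels


# Illustrative figure: 10,000 runs to merge, or 10,000 final pieces to crack into.
print(levels_needed(10_000, 100))  # merging with fan-in 100
print(levels_needed(10_000, 3))    # partitioning with fan-out 3
print(levels_needed(10_000, 2))    # partitioning with fan-out 2
```

Under this model a record is moved about 2 times with a fan-in of 100, versus 9 or 14 times with a fan-out of 3 or 2.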

Review 10

In the paper "Self-selecting, self-tuning, incrementally optimized indexes", Goetz Graefe and Harumi Kuno tackle the problem of optimization of indexes in data warehousing. Index selection is a classical problem in physical database design that forces developers to face space-time constraints. Having very little indexing can make many queries scan too many parts of the database, while having lots of indexing is associated with heavy update costs. They propose a new approach for index creation called adaptive merging - an adaptive, incremental, and efficient technique used for index creation. To further support their technique, they compare it to concepts such as "database cracking" and B-tree creation to reveal their shortcomings with respect to adaptive merging. Adaptive merging takes the best of both worlds, and the paper also offers exhaustive performance evaluations and optimizations.

Database cracking initially involves keys within a table having a static location - existing in the order that they originally started. Once a range is queried, a pivot is used to separate all the keys that are lower than it from all the keys that are higher than it. These keys exist to the left and right of the pivot respectively. Now, we can think of either side as a "search space" and can effectively eliminate extra search time because the DBMS has a good indicator of where it should search. This technique can be compared to quicksort - picking a pivot near the middle gives the best efficiency, on average. In contrast, adaptive merging divides the keys into many subdivisions and sorts each of them. These subdivisions are then placed in a B-tree and are easily retrieved when something is queried. Going through them in parallel, the sought records are retrieved sequentially. This technique can be compared to mergesort, which in the worst case is better than quicksort. It is clear that adaptive merging is better in terms of the overhead incurred per query.

Although this paper gave an objective view on performance through experimental evidence, it was quite misleading. First, it ignored the initial cost of generating the sorted runs for adaptive merging, which costs the authors' claims some credibility. Additionally, comparing multiple indexing techniques to adaptive merging could identify what it excels at. This way, we would have a general indicator of our current best option. Lastly, one thing that I felt was out of place in the paper was the section on extensions of adaptive merging. They chose to include this in the bulk of their research, but admit to only pondering about it. This does not contribute anything meaningful unless it is placed near the end of the paper and posed in the form of open-ended questions.
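
The pivot-based reorganization described above can be sketched as follows. This is a minimal in-memory illustration, assuming each range query cracks the column at its two bounds; the names `crack_in_two` and `CrackedColumn` are hypothetical and the code is not taken from the paper.

```python
from bisect import bisect_left


def crack_in_two(column, start, end, pivot):
    """Partition column[start:end] in place: keys < pivot first, keys >= pivot after.
    Returns the index of the first key >= pivot."""
    i, j = start, end - 1
    while i <= j:
        if column[i] < pivot:
            i += 1
        else:
            column[i], column[j] = column[j], column[i]
            j -= 1
    return i


class CrackedColumn:
    def __init__(self, values):
        self.column = list(values)  # keys start in their original order
        self.pivots = []            # sorted pivot values seen so far
        self.positions = []         # positions[k]: first index holding a key >= pivots[k]

    def _crack(self, pivot):
        k = bisect_left(self.pivots, pivot)
        if k < len(self.pivots) and self.pivots[k] == pivot:
            return self.positions[k]  # this boundary already exists
        # Only the piece that contains the pivot is reorganized.
        start = self.positions[k - 1] if k > 0 else 0
        end = self.positions[k] if k < len(self.pivots) else len(self.column)
        pos = crack_in_two(self.column, start, end, pivot)
        self.pivots.insert(k, pivot)
        self.positions.insert(k, pos)
        return pos

    def range_query(self, low, high):
        """Return the keys with low <= key < high (not sorted within the piece)."""
        lo = self._crack(low)
        hi = self._crack(high)
        return self.column[lo:hi]
```

Each query splits at most two pieces in two, which is the low fan-out that makes convergence to a fully sorted column slow.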

Review 11

This paper details an efficient method for automatically generating and simplifying indexes for a database. These indexes can be generated on columns of any table whenever a query requires them. If useful indexes can be generated quickly and automatically, queries will be significantly sped up. However, generating traditional indexes is slow, and requires indexing of all rows, even those that aren’t used. Database cracking is a dynamic means of building indexes that doesn’t require much initial setup, but it has a limit to how well sorted the index can be, and it moves data entries around more than necessary.

The proposed solution is adaptive merging using a partitioned B+ tree. This is a B+ tree with an extra field added to its key to represent partitions. A single sorting pass over a dataset can create sorted partitions, and therefore construct the initial tree. Range-based queries on the tree will, in addition to answering the query, take their results from the original set of partitions and put them in a new partition. Once all of the data has been transferred to new partitions, the partitions can be easily merged to form a normal B+ tree. Any data that is not queried will never be moved, and even data that is queried will only be moved a few times (often only once).
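
The "extra field added to its key" can be pictured with a composite key. In the sketch below a single sorted list of `(partition, key)` pairs stands in for the B+ tree; the class name `PartitionedIndex`, its methods, and the use of partition 0 as the merge target are assumptions made for illustration only.

```python
from bisect import bisect_left, bisect_right, insort


class PartitionedIndex:
    """One sorted list of (partition, key) pairs standing in for a B+ tree."""

    def __init__(self):
        self.entries = []

    def insert(self, partition, key):
        insort(self.entries, (partition, key))

    def scan(self, partition, low, high):
        """Keys in [low, high] within one partition, found by ordinary ordered search."""
        lo = bisect_left(self.entries, (partition, low))
        hi = bisect_right(self.entries, (partition, high))
        return [key for _, key in self.entries[lo:hi]]

    def move(self, partition, low, high, target):
        """Move keys in [low, high] from one partition to another (delete, then insert)."""
        lo = bisect_left(self.entries, (partition, low))
        hi = bisect_right(self.entries, (partition, high))
        moved = self.entries[lo:hi]
        del self.entries[lo:hi]
        for _, key in moved:
            insort(self.entries, (target, key))


index = PartitionedIndex()
for key in (3, 5, 8):
    index.insert(1, key)        # first sorted run
for key in (1, 2, 9):
    index.insert(2, key)        # second sorted run
index.move(1, 2, 6, target=0)   # a query on [2, 6] pulls keys out of run 1
index.move(2, 2, 6, target=0)   # and out of run 2
```

Because partitions are just a prefix of the key, creating or removing one needs no catalog change: inserting or deleting records with that prefix is enough.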

One of the most useful attributes of this solution is that it uses almost the exact structure of a standard B+ tree, so algorithms for updating the tree and managing transactions are the same, and infrastructure for implementing these indexes can be relatively easy. In addition, having a fast, reliable way to automatically generate indexes means that users don’t have to worry about deciding which indexes they’ll need for future queries.

This paper, however, doesn’t seem to include much analysis as to the storage cost of creating these indexes whenever needed, and there’s an open question of how effective creating indexes as needed is compared to automatically or manually generating indexes ahead of time that could be useful for a known future workload.

Review 12

The paper mainly focuses on optimizing automatic index tuning, which is important because the possible indexes in relational database systems are too numerous for humans to analyze.
The paper adopts an approach named adaptive merging. It combines the efficiency of traditional B-tree creation using partitioned B-trees with the adaptive and incremental behavior of database cracking. The essence of adaptive merging is to exploit partitioned B-trees in a novel way, namely to focus merge steps on those key ranges that are relevant to actual queries, to leave records in all other key ranges in their initial places, and to integrate the merge logic as a side effect into query execution.
In terms of performance, the experiments show adaptive merging to be faster than database cracking, both in memory and on block-access storage. The main difference between the two approaches is the reliance on partitioning in database cracking and on merging in the new technique.
However, the evaluation part of the paper is trivial and not well organized.

Review 13

In this paper, the authors propose a novel approach for creating indexes adaptively, incrementally, and efficiently. The adaptive merging algorithm makes index creation self-selecting, self-tuning, and incrementally optimized. The index selection problem is important because in a modern DBMS, especially a data warehouse, the number of possible indexes is huge, and selecting the appropriate ones can greatly improve performance. To generate better indexes, the authors build on database cracking and B-tree creation and introduce an innovative idea, adaptive merging, which combines the efficiency of traditional B-tree creation with the adaptive and incremental behavior of database cracking. Experiments show that this idea is promising and outperforms previous approaches.

Database cracking tries to solve this problem by combining automatic index selection and partial indexes. Although it is pioneering work, it suffers from several problems, such as inefficient transformation and search and poor performance on block-access storage. Building on this work, the authors incorporate the traditional B-tree generation process and propose adaptive merging. In adaptive merging, data is first separated into several sorted partitions, which form the initial partitioned B-tree index. Given a query, the method searches every partition and merges the accessed keys into a new partition in the B-tree.

The main technical contribution of this paper is the concept of adaptive merging. It is pioneering work that combines database cracking and B-tree generation, using merging rather than partitioning. The authors point out four potential problems of database cracking, introduce their idea through detailed comparisons between adaptive merging and database cracking, and show how adaptive merging overcomes those problems. Because merging is used, it reduces overhead and works better in the block-access storage case.

Generally, it’s a great paper with innovative ideas, and I don’t find any major drawbacks. The authors could give more concrete examples when describing how adaptive merging works, which would help readers understand their ideas better. Besides, the paper spends almost all of its time on comparisons with database cracking; it should include some other index selection techniques and show the advantages or drawbacks of adaptive merging compared to them.

Review 14

The paper introduces a technique called “adaptive merging” which was designed to enable automatic creation & tuning of database indexes. This technique was based on an earlier successful technique called database cracking, which leveraged partitioning to attain performance benefits that manual index creation techniques do not afford. Database cracking’s major weakness, however, was that it was designed to perform well on in-memory arrays, and does not succeed outside of this environment. Adaptive merging adopts most of database cracking’s fundamental strategy; its key difference is the use of partitioned B-trees as the central data structure, which (along with a few minor algorithm changes) allows it to perform much better on block-storage databases.
The most important strength of adaptive merging is that it performs well on block storage, which was one of database cracking’s biggest weaknesses. Experimental results also showed that adaptive merging has much lower average overhead than database cracking. Both of these techniques offer benefits over manual index creation, such as the ability to avoid manual monitoring, manual (offline) analysis and (sometimes) additional overhead.
One weakness of adaptive merging is that according to the experimental data, although average overhead is much lower, it is also slightly less consistent than database cracking’s overhead. Also, the experiment itself was not exhaustive or complex enough to be ideal, which was noted by the authors themselves. Also, several variations to the adaptive merging algorithm are given, which is a strength, but none are analyzed, which is a slight weakness.