Review for Paper: 5-Self-selecting, self-tuning, incrementally optimized indexes

Review 1

Database indexes facilitate faster selection of tuples in a given range of values, but the indexes themselves take time to create. A traditional B-tree index takes at least as many steps to produce as are required to sort the indexed data. Adaptive indexes use information gleaned from the real query workload on a table to decide which columns to index and which parts of each column to keep sorted.

In "Self-selecting, self-tuning, and incrementally optimized indexes," Goetz Graefe and Harumi Kano propose a novel method called adaptive merging that indexes only those columns that are queried, preferentially devotes effort to sorting frequently-queried ranges of values, and eventually can yield a conventional B-tree index. Their method improves on a similar approach called database cracking, which uses partitioning of an unordered list rather than merging of sequences in a partitioned B-tree, to generate its index. Unlike database cracking, adaptive merging can produce a completely sorted index after just a few queries, limited only by the query workload and the merge fan-in that available memory allows.

The main contribution of the paper is a novel method for dynamically generating and improving a partitioned B-tree index for a table, with little time overhead. When a column of a table is queried for the first time, the column's values are copied to a new data structure, where they are sorted in place in groups of a fixed length, to make a partitioned B-tree in linear time. Each subsequent time the column is queried over some range, the overlapping sequences of all matching partitions are merged, to form a new partition. This process tends to produce larger partitions over time, eventually leading to a unified B-tree if all the data are queried often, or to a partitioned B-tree where all heavily queried ranges are in unified partitions.
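The process described in this review can be sketched in a few lines of Python. This is an illustrative in-memory toy with hypothetical names, not the paper's disk-based implementation; real partitions live as runs inside a partitioned B-tree.

```python
# Toy sketch of adaptive merging: partitions are sorted Python lists;
# a real system stores them as runs inside a partitioned B-tree on disk.
import bisect

class AdaptiveIndex:
    def __init__(self, values, run_size=4):
        # Initial index creation: sort fixed-size groups into runs.
        # Sorting bounded-size groups costs linear time overall.
        self.runs = [sorted(values[i:i + run_size])
                     for i in range(0, len(values), run_size)]

    def query(self, lo, hi):
        """Return all values in [lo, hi], merging the matching pieces
        of every partition into one new sorted partition."""
        merged, survivors = [], []
        for run in self.runs:
            left = bisect.bisect_left(run, lo)
            right = bisect.bisect_right(run, hi)
            merged.extend(run[left:right])      # records in the query range
            rest = run[:left] + run[right:]     # records that stay behind
            if rest:
                survivors.append(rest)
        merged.sort()                           # merge of the sorted streams
        if merged:
            survivors.append(merged)            # the new, fully sorted partition
        self.runs = survivors
        return merged
```

Each query both answers itself and leaves behind a larger sorted partition, so a heavily queried key range converges toward a single run after only a few queries.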

As demonstrated in the paper's experiments, adaptive merging offers little benefit relative to the older method of database cracking if queries are over point values or very small ranges, and uniformly distributed over the range of values. This is because under such a workload, each merge step combines only a small number of entries into a new partition, so the progress toward a sorted index is very slow. Fortunately, the performance of adaptive merging in such cases can be improved by automatically merging over a wider range than the query range, for very narrow queries.

Review 2

This paper studies the optimization of indexes in a data warehouse and provides a new alternative for improving them: self-selecting, self-tuning, incrementally optimized indexes. The paper sheds light on the concept of database cracking, which offers adaptive and incremental indexing but falls short of the authors' proposed adaptive merging technique. A comparison between the two is made, and performance tests are conducted to substantiate the claim.

Automatic index selection has long been needed to improve query performance. It also led to the creation of partial indexes, since it focused on recognizing which data items are queried most. The current trend, however, takes a different path: recent approaches do not reason about the contents of an index in advance but instead strive to create new indexes as a side effect of query execution.

Database cracking is one existing technique that combines automatic index selection with partial indexes. When a column is used in the predicate of a query for the first time, it creates a cracker index, which is then refined each subsequent time the column is used in the user's queries. It uses partitioning similar to that in quicksort.

Partitioned B-trees, on the other hand, align with the authors' suggestions, making use of merging instead of the partitioning used in database cracking. The adaptive merging technique combines merge sort with adaptive and incremental index optimization. Query performance is much higher with this technique, since the B-tree partitions are always sorted before a query executes against them.

The paper outlines three techniques: index selection, which derives from the selection process in database cracking; initial index creation, which relies on merge techniques after partitions have been created in the B-tree; and incremental index optimization, which scans multiple partitions in an interleaved way, merges them into a new sorted partition, and places the resulting records within the partitioned B-tree. Since the structure in question is a B-tree, traditional methods for concurrency control, logging, and recovery apply. Many variations of these techniques are also discussed.

The authors finally give a performance evaluation of database cracking and adaptive merging. Many factors are tested, from very small query ranges to long query sequences and even small memory allocations. The results confirm the authors' claim that adaptive merging offers a better alternative to traditional index tuning.

Review 3

This paper describes a new technique for index creation called adaptive merging, which is self-selecting, self-tuning, and incrementally optimized. It combines the strengths of both traditional B-tree creation (efficiency, merging) and database cracking (adaptive, incremental behavior). The paper describes index selection, initial index creation, incremental index optimization, the table of contents, transaction support, and updates for adaptive merging, with plenty of comparisons against database cracking throughout. It also provides ideas for variations and optimizations, and a detailed performance evaluation (again, largely against database cracking).

The problem is that automatic index tuning is widely needed because index selection has become a very hard problem. For example, too few or wrong indexes require queries to scan large parts of the database, but too many indexes force high update costs. Database cracking is one technique that attempts to solve the problem; however, it has weaknesses, including inefficient sorting (especially on block-access devices). Therefore, the authors present a better solution, adaptive merging.

The main contribution of this paper is the concept of adaptive merging. The paper provides a detailed comparison between adaptive merging and database cracking, and shows where adaptive merging has advantages over database cracking. It provides automatic creation and incremental optimization of indexes in large data warehouses with "external" block-access storage. It is clever that adaptive merging makes use of partitioned B-trees: partitioned B-trees are very similar to traditional B-trees, and the authors point out that there is a high probability that partitioned B-trees and adaptive merging can be applied in all cases where traditional B-trees are used.

An interesting observation: in general, this paper provides a clear and detailed description of adaptive merging with plenty of performance evaluation. It would be better if more details of the construction of adaptive merging were provided. It would also be helpful to have more examples of how adaptive merging works; the idea is a little hard to understand without some detailed examples.

Review 4

In this paper, Graefe and Kuno propose a novel approach to automatic index creation and index tuning called "adaptive merging". Compared to contemporary index selection tools, which rely on ongoing actual workloads and occasionally invoke creation and deletion of indexes, this method not only avoids long-running index creation processes that can disturb processing of the current workload, but also has the ability to focus on the rows that are queried most and index them instead of the entire table.

Furthermore, even in comparison with "database cracking", adaptive merging seems much more promising in many respects. Specifically, adaptive merging works well both on block-access storage and in memory, while database cracking does not seem to work well in a block-access storage environment. Also, database cracking needs more steps to reach the optimized state than the adaptive merging method.

The interesting idea behind adaptive merging, and its key difference from database cracking, is its use of merging (as in merge sort) as opposed to database cracking's partitioning (as in quicksort).

One important flaw of this research, in my opinion, is that it relies on a limited type of randomly generated queries that might not be close to actual workloads. The performance results would be more reliable if a well-known benchmark had been used.

Review 5

This paper proposes an index type that the authors think will yield the best of both worlds: it tunes itself, like database cracking, but does not require large numbers of queries to build the index, like standard B+-trees. The data is still kept "sorted", yet the database is able to adapt more quickly to new data and queries than a system using database cracking. They also claim that this system, called "adaptive merging", works well with both in-memory and on-disk storage. Their idea also allows for "partial indexes", in response to real-world applications; for example, in business applications there is almost no need to query data from last year, while data from this year will be queried very frequently.

Rather than come up with some dramatically new idea, the authors have proposed a slight modification to an existing idea. They wanted some features of B-Trees, and some features of database cracking, and so they invented a data structure + algorithm that combines both ideas, and yields the benefits of both. This lets DB practitioners continue to use the same systems for transactions, concurrency control, logging, and recovery since the B-Tree data structure is still being used. The adaptive merging technique is also well suited for block-access devices, such as disks, and is very flexible. It can be used everywhere B-Trees can be used, including "compound" B-Trees, and B-Trees on computed columns.

Though the authors seem to think that it does not require many queries to reach efficiency, it seems, especially for queries where only a small percentage of the total data is returned, that it can take quite a few queries for the database to tune itself. However, the heuristic they propose near the end of the paper seems to alleviate this problem to some degree. I would also have liked to see how well the adaptive merging technique performs in comparison to plain B-trees. While the authors claimed that performance was comparable, I would have liked to see a graph comparing the two index types.

Review 6

Index selection is important and hard, since either too many or too few indexes can slow down the performance of the database. To improve efficiency, indexes should be tuned at runtime. The traditional way is to select indexes up front according to the expected workload, which is not flexible. Another way is database cracking, which is not efficient enough and does not work well for block-access storage. This paper proposes a new algorithm called "adaptive merging" for dynamic index selection to improve performance.

The adaptive merging algorithm adopts several prior algorithms.
i) Automatic index selection: at runtime, some data may be queried more than other data, which makes it more valuable to index. This mechanism observes the running workload to find out which indexes should be created or tuned.
ii) Database cracking: this algorithm modifies the indexes dynamically in response to the actual queries. It is based on partitioning, similar to quicksort. But it is designed for in-memory indexing; for block-access storage, it needs other algorithms to work well.
iii) Partitioned B-trees: they work on block-access storage, which makes them complementary to database cracking.

As for the adaptive merging algorithm itself, it is similar to database cracking but based on merging instead of partitioning, which is like the difference between merge sort and quicksort. It can be considered an improvement on database cracking that enables it to work on block-access storage with higher performance. Its index selection works the same way as database cracking's, and when creating an initial index the algorithm adopts the method of partitioned B-trees. For updating the data structure, traditional search and update methods apply. One trick for deletion is to insert a "negative" record that counteracts the original record to be deleted.
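The "negative" record trick for deletion can be illustrated with a small sketch. The representation of records as (key, sign) pairs and the function name are my own assumptions, not the paper's format:

```python
# Toy sketch of "anti-matter" deletion: instead of hunting through every
# partition for a record, a deletion inserts a matching record with a
# negative sign; the pair cancels out when a merge brings both together.

def apply_antimatter(run):
    """Cancel (key, +1) / (key, -1) pairs in a run of (key, sign) records."""
    counts = {}
    for key, sign in run:
        counts[key] = counts.get(key, 0) + sign
    # Keep only the surviving copies, in sorted order.
    return [(key, 1) for key, n in sorted(counts.items()) for _ in range(n)]

run = [(3, 1), (5, 1), (5, -1), (7, 1)]   # key 5 was deleted via anti-matter
print(apply_antimatter(run))               # [(3, 1), (7, 1)]
```

The appeal of this design is that deletion becomes an ordinary insertion, so it costs no extra random I/O; the cleanup happens for free during the next merge pass.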

The paper also provides a detailed performance evaluation to show that the adaptive merging algorithm is effective. The evaluation compares the performance of database cracking and adaptive merging. The experiments simulate large random data in the database with different numbers of queries. They show that, on the same database, the more queries run, the greater the advantage the new algorithm shows over database cracking. All of these tests show that though the performance of adaptive merging is not uniform, it is still a significant improvement.

Review 7

This paper proposes a kind of self-tuning, incrementally optimized index: adaptive merging. In relational databases, index selection is critical: too few indices may lead to lots of scanning, while too many indices lead to high update costs. One approach to this problem is to tune indexes according to the queries.

The database cracking method, which combines automatic index selection and partial indexes, tries to solve this problem by partitioning the data into groups according to the predicate and leaving each partition unsorted. The main drawbacks of this method are that the quicksort-like idea does not perform well on a block-access device, and that leaving each partition unsorted increases search time within a partition to O(n), which is undesirable.

The basic idea of the adaptive merging method is based on the partitioned B-tree, which is like a traditional B-tree with an artificial leading key field in each record. Each partition is already a sorted run. For each range-search query, it searches in each partition and, at the same time, merges the elements in that range into a new sorted partition.
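The artificial leading key field can be illustrated with a toy sketch. Here a sorted Python list stands in for the B-tree, and the function names are hypothetical:

```python
# Sketch of the "artificial leading key" idea behind partitioned B-trees:
# prefixing each key with a partition number keeps many independent sorted
# runs inside one ordinary ordered structure.
import bisect

btree = []  # holds (partition_id, key) pairs in B-tree key order

def insert(partition_id, key):
    bisect.insort(btree, (partition_id, key))

def scan_partition(partition_id):
    """All keys of one partition are contiguous, so a range scan finds them."""
    lo = bisect.bisect_left(btree, (partition_id,))
    hi = bisect.bisect_left(btree, (partition_id + 1,))
    return [key for _, key in btree[lo:hi]]

for pid, key in [(0, 9), (1, 4), (0, 2), (1, 8)]:
    insert(pid, key)
print(scan_partition(0))   # [2, 9]
```

Because partitions are just key ranges under distinct prefixes, they "appear and disappear" as records are inserted and deleted, with no catalog changes.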

The paper also compared performance between adaptive merging and database cracking, concluding that adaptive merging outperforms database cracking in most cases, especially with long query sequence.

The paper concluded that database cracking is more suitable for in-memory arrays while adaptive merging works better for external storage. Adaptive merging also has other advantages, such as the efficient search it inherits from B-trees. The main reason for the efficiency advantage of adaptive merging is that partitioning fan-out is limited to 2-3, whereas merge fan-in can be much larger, so database cracking must move data records more times to sort them.

Review 8

This paper proposes an adaptive merging algorithm for index creation and optimization in a database. The algorithm is similar to "database cracking" in that it creates and optimizes indexes adaptively to provide efficient sorting. However, it addresses the performance and scalability limitations of "database cracking" by relying on merging rather than partitioning, through the use of partitioned B-trees.

The main steps of the algorithm are the following:
Data is separated into partitions, and each partition is sorted, creating the initial indexes and partitioned B-tree.
When a query is processed, the keys it accesses are merged out of their partitions to form a new partition in the B-tree.

Adaptive merging allows indexes to be dynamically updated and optimized based on the query history. Because the algorithm relies on merging, it reduces the performance and memory overhead present in database cracking and works better for block-access storage. As a computer architect, it makes me wonder whether hardware support for this sort of index optimization technique is possible, so that these operations can be done quickly and efficiently in the background without relying on software.

Review 9

This paper introduces an adaptive way of automatically tuning indexes, called "adaptive merging". The paper tries to solve the high cost of keeping too many indexes in a database such as a relational data warehouse, and the inefficiency of current automatic index optimization methods.

Adaptive merging combines the self-tuning idea of database cracking with the partitioned B-tree structure. It initializes the index by sorting the data (e.g., with an in-memory sort) into several partitions, and for each later range query it creates a new partition by moving the index entries that fall within the current search predicate out of the existing partitions.

This approach has several main advantages compared to the database cracking method:
index entries are not moved when they are not in the key range of the current query;
it has a high merge fan-in, as it implements an idea similar to merge sort;
it needs only a single pass of run generation to produce a usable search index;
it performs well over block access storage

Compared to the database cracking method, there may be one drawback:
adaptive merging has a relatively high initial cost, since it must construct a partitioned B-tree on the first run, while database cracking simply copies the column to memory.

Review 10

This paper presents Adaptive Merging, a way of automatically and incrementally optimizing database indexes. Indexes are used in databases for efficient retrieval of data based on certain key fields which can be set by the user. The problem is that for very large databases (“data warehouses”), there are too many tables to determine indexes by hand. In addition, indexing might only need to be done on a portion of highly queried data in a table, and further indexing would be wasted work. Adaptive merging can be used to automatically and incrementally index a database based on usage.

A previously existing alternative to Adaptive Merging called Database Cracking is presented. In Database Cracking, initially all keys for a given table column are arranged in the order they exist in the database. Upon querying a range, a strategy similar to the “partition” function of quicksort is applied to these keys: the keys in the range are collected together, and all keys lower will be to the left of these keys and all higher keys will be to the right. This creates two additional unsorted partitions. By accumulating these partitions, the DBMS can reuse the known range of these partitions to limit search space for queries.
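The quicksort-style partition step described above can be sketched as follows. This is an illustrative toy; a real cracking implementation three-way-partitions the array in place rather than building new lists:

```python
# Toy sketch of one database-cracking step: a range query [lo, hi)
# partitions an unsorted piece into keys below the range, keys inside it,
# and keys at or above it. Each piece stays unsorted internally; only the
# boundary values are recorded in the cracker index.

def crack(column, lo, hi):
    """Partition `column` around [lo, hi); return the three unsorted pieces."""
    below = [k for k in column if k < lo]
    inside = [k for k in column if lo <= k < hi]
    above = [k for k in column if k >= hi]
    return below, inside, above

pieces = crack([7, 2, 9, 4, 6, 1, 8], lo=4, hi=7)
print(pieces)   # ([2, 1], [4, 6], [7, 9, 8])
```

Future queries that fall inside a known boundary only need to scan the matching piece, which is how the search space shrinks as queries accumulate.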

In contrast, Adaptive Merging begins by dividing all keys into partitions and sorting them- the range of each partition is then known. These partitions are then placed into a B Tree. When a query occurs, the DBMS can use the B Tree to find out which partitions contain desired records. It then scans multiple partitions simultaneously, and pulls out the records that are required by the query in sorted order. In other words, it performs a “merge” (from mergesort) on these partitions, except that it only pulls out the records matching the query in the result. These sorted records are then put into a new partition in the B-tree.

When this “merge” is performed, the number of partitions that can be scanned simultaneously is limited by the memory of the system X. If it will take nX memory to store all partitions (where n is an integer), then this will result in n sorted partitions. This means that future queries will only need to look at these n partitions in order to find the records queried by the first query.

Adaptive Merging has superior performance to Database Cracking. Because it gathers all queried records into n sorted ranges, the index is much more optimized for future queries after just one query. In contrast, after one query Database Cracking has only two boundaries between its unsorted partitions that it can use to optimize future queries. Adaptive Merging also has better performance on index sets that can’t completely fit in memory because it leverages a B Tree data structure.

Review 11

The paper proposes an incremental index creation technique, called “adaptive merging”, that utilizes partitioned B-trees to achieve adaptive and incremental index creation. It uses the term “adaptive” because it takes actual queries as a basis for creating and tuning the index. Every time a new query is executed, the index structure “adapts” to the query. There is a similar technique, which is known as “database cracking”. The authors mainly compare their technique with “database cracking” in this paper.

The difference between "adaptive merging" and "database cracking" is similar to the difference between the two sorting algorithms "mergesort" and "quicksort". As queries are executed, "database cracking" sorts and optimizes the index in each query's key range, just as quicksort divides partitions for sorting. Similarly, "adaptive merging" generates sorted runs, or partitions, when it first sees a column used in query predicates, and as new queries with different key ranges are executed, it merges the sorted runs to optimize the index for those key ranges as well, as in mergesort. These techniques are effective especially when only a part of a large table is frequently accessed. In such cases, maintaining a traditional index structure (e.g., a B-tree) can be needlessly expensive and inefficient.

The paper demonstrates that "adaptive merging" is better than "database cracking" in terms of overhead per query, but that is the only evaluation done for the two techniques. There is no comparison of the query processing performance achieved using the incremental indexes the two techniques create. The overhead per query can also be misleading, because "adaptive merging" must generate sorted runs for all of the records in the index, which "database cracking" does not require. The experimental results are somewhat biased, neglecting the initial cost of generating sorted runs in "adaptive merging". When the paper emphasizes how many fewer queries its technique requires to incrementally optimize the index until it matches a traditional B-tree, I asked myself, "why not just build a traditional B-tree from the beginning if that is so important?" Surely the technique has objectives and applications that differ from those of traditional B-trees, but I think the experiment and evaluation setup was dubious in that sense.

In conclusion, the paper proposes an interesting technique to create and tune an index incrementally, adapting to user queries. The automation of efficient physical database design has been an important research topic as transaction workloads and the complexity of database systems increase. Manually tuning the database has its own limitations, requiring consistent effort from DBAs, who are a very scarce resource. The paper exemplifies what has been studied previously in this line of work.

Review 12

This paper introduces an improved version of database cracking known as adaptive merging. In database cracking, an index on a column is only created when the column is referenced in a query. When queries across a key range in the index are made, the column index is optimized over the range queried. Although database cracking works well with in-memory data, it performs poorly with block-access devices and takes many steps to reach the optimal key ranges, hence the need for adaptive merging.

Adaptive merging combines the pros of database cracking with the merge sort used in B-tree creation. In adaptive merging, the underlying storage structure is a partitioned B-tree. On initialization, the partitioned B-tree is created by running a sort algorithm on the data, creating several sorted partitions. When a column is referenced again, each partition in the B-tree is scanned in parallel for the records within the query range. The matching records are extracted and put into a new partition. Subsequent queries within the same range need only look at the new partition.

Adaptive merging's performance overall is better than database cracking. It converges to optimal key ranges much more quickly. Additionally, since adaptive merging utilizes B-trees, there are many efficient search algorithms for B-trees that can be used.

If I understand correctly, it seems as if partitioned B-trees were a convenient but accidental discovery on the authors' part, who were simply looking to improve on database cracking. As stated in the paper, the idea of adaptive merging works extremely well with partitioned B-trees, but I wonder what the result would have been had partitioned B-trees not already been introduced. The algorithm in the paper relies on partitioned B-trees, so much so that I think even if the authors had not known of the structure, the core principles of adaptive merging would have led them to invent partitioned B-trees in order to create an exploitable storage structure.

Review 13

This paper proposes an adaptive, incremental technique for index creation, "adaptive merging", and compares it with an existing similar technique, "database cracking". Adaptive merging focuses on merging while database cracking focuses on partitioning. Search is as efficient as B-tree access, since adaptive merging uses a B-tree as its underlying data structure (a partitioned B-tree), with an artificial leading key field included for the partitioning. The experiments in this paper mainly compare the two techniques, and the results show that adaptive merging outperforms database cracking.

The interesting point is that adaptive merging makes use of each query execution, and that it applies in all cases where traditional B-trees can be used.

Review 14

This paper discusses methods for adaptive indexing, which adapt the physical database layout to actual queries, over traditional, static indexing. This is important because in relational databases with many tables, it is impossible for humans to determine the optimal indexes, so automatic index tuning is required. Graefe and Kuno present adaptive merging as an adaptive, incremental, and efficient indexing technique that adapts more quickly to new data and query patterns than database cracking, has better performance both in memory and on block-access storage than database cracking, and has sort efficiency comparable to B-tree creation.

Adaptive merging is based on prior work on adaptive index selection. Previous automatic index selection approaches focus on automating decisions about which indexes to drop, create, or merge, but creation and deletion costs add to the database workload. Database cracking is analogous to quicksort, and its main advantage is that key ranges never queried are never partitioned or optimized. Individual data records are moved during each transformation step from the initially unoptimized representation toward the fully optimized representation. Database cracking performs well for in-memory databases but does not work well with block-access devices like disk and flash storage. Partitioned B-trees, analogous to merge sort, are based on merging rather than on partitioning. Partitioned B-trees, unlike traditional B-trees, include an artificial leading key, and depending on record insertions and deletions, partitions appear and disappear without catalog modification. Prior work on partitioned B-trees suggests self-tuning with better query execution performance during index optimization and when querying the final data structure, but does not suggest optimizing key ranges as a result of query execution.

Adaptive merging has the goal of combining efficient merge sort with "adaptive and incremental index optimization" (Graefe). Index selection in this design copies that of database cracking: the first use of a column in a predicate creates a new index by copying the appropriate values. Initial index creation, like that of the partitioned B-tree, uses an in-memory sort algorithm and yields many partitions, which are then merged to bring the B-tree closer to a single sort sequence in a single partition. Incremental index optimization happens when a predicate uses a column that has been used before, so an appropriate index already exists. The query must find the required records across all partitions. In adaptive merging, the query will "scan multiple partitions in an interleaved way, merge these multiple sorted streams into a single sorted stream, write those records into a new partition within the partitioned B-tree, and return those records as the query result" (Graefe). Adaptive merging maintains a table of contents to hold information about completed reorganization steps. Traditional transaction support methods (concurrency control, logging, and recovery) apply. For updates, insertions are either placed in the final target partition or gathered in a new partition specifically for insertions. Deletions either search for the appropriate record in the index or insert "anti-matter" records. Modifications are handled as updates or as pairs of deletion and insertion.
The drawbacks of the paper are that there is little discussion of how easy or difficult it would be to implement adaptive merging on existing database management systems and operating systems. There is also little discussion of real-world applications that could use this design, and of how it would perform on them.

Review 15

This paper addresses the situation of a data warehouse with so many tables that human beings cannot optimize them all with indices. In order to query all of this data efficiently, it is necessary to automatically create and tune indices on these tables. Some approaches have been proposed to manage these indices. Database cracking is one method that creates indices only when a specific column is queried, and it also optimizes over key ranges when those key ranges are used repeatedly. However, this automatic process is slow compared to the traditional method of index creation. The authors propose a method called adaptive merging that is more efficient than database cracking.

This paper is important because in these data warehousing situations it isn't possible to figure out manually which indices are best for all tables; an automated process is required to create and tune them. The efficiency of this process matters because generating or tuning indices at a later time could significantly affect access performance. The existing method, database cracking, had several weaknesses: it requires many steps to reach a final key range, its initial data transformation depends on the query pattern, its search efficiency never reaches that of a traditional index, and it only works well for in-memory databases. This paper addresses these issues with its new method.

The paper does a good job of motivating its improvements over database cracking and describing how adaptive merging works. The drawbacks of the paper are in its evaluation. I think that evaluating on a permutation of the first one million integers doesn't give much of an idea of how these techniques would compare in the real world. The authors also mention in their future work that they should compare against traditional index tuning, which I think they should have done in this paper. The evaluation would be improved by using a real-world data set or some sort of benchmark.

Review 16

Part 1: Overview

This paper introduces adaptive merging, an adaptive, incremental, and efficient index creation method. The technique is as efficient as merge-based index creation and as adaptive and incremental as database cracking.

The central idea is to combine the advantages of two different index creation methods. Building on prior research results, the authors at HP Labs come up with a new index-selecting and optimizing algorithm.

Part 2: Contributions

Adaptive merging, a new index creation method, is introduced. Unlike the quicksort-like partitioning of the earlier "database cracking" technique, it exploits B-tree partitioning in a new way: the B-tree persists intermediate sorted states at all times.

Partitioned B-trees can actually be scanned in parallel and then merged into outputs. If later queries are subsets of previous ones, they can be resolved efficiently.

Part 3: Possible drawbacks

The performance of adaptive merging may depend on the data set. If all queries are independent, it should perform like database cracking and traditional B-tree merging. The evaluation should perhaps include more test cases beyond a random permutation of 0 to 9,999,999.
Pseudocode could be included to better convey the partitioning idea.

Long, dense paragraphs may lead readers astray; the figures and examples, by contrast, are quite intuitive.

There is not much innovation regarding updates, though there may be future work on that.

Review 17

The paper notes that in a typical relational data warehouse, the number of possible indexes is too large for humans to reasonably design an efficient indexing structure, and it stresses the importance of automatic index tuning. While solutions exist that tune indexes in response to actual workloads, such techniques provide no index support during the monitoring interval and do not prioritize rows by frequency of access. To address these issues, the paper proposes a technique called adaptive merging, which combines the efficiency of traditional B-tree creation with the adaptive and incremental behaviour of database cracking.

Instead of the quicksort-like partitioning used in database cracking, adaptive merging uses merging based on merge sort. The difference is that partitioning is inherently limited to a fan-out of 2 or 3, whereas the merge fan-in can be considerably larger, limited only by the available memory. Adaptive merging, like database cracking, requires a flexible B-tree structure called a partitioned B-tree.
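
The fan-out versus fan-in point can be made concrete with a small sketch; the run counts and fan values below are illustrative assumptions, not figures from the paper:

```python
import math

def passes(num_runs: int, fan: int) -> int:
    """Number of passes needed to combine num_runs sorted runs
    when each pass can combine at most `fan` of them."""
    p = 0
    while num_runs > 1:
        num_runs = math.ceil(num_runs / fan)
        p += 1
    return p

# Partitioning splits with a fan of 2-3 per step, while merging can
# combine on the order of 100 runs per pass, memory permitting.
print(passes(1000, 2))    # 10 passes with a fan of 2
print(passes(1000, 100))  # 2 passes with a merge fan-in of 100
```

With a fan of 2, a thousand runs take ten passes; with a fan-in of 100, just two, which is the asymmetry the review describes.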

The paper elaborates that adaptive merging excels because it moves data records less often and adapts to changes in the query pattern much more rapidly than database cracking.

Review 18

In this paper, a new technique, adaptive merging, is proposed to make index creation more effective. The algorithm creates indexes automatically, so the indexing adapts quickly to new data and new query patterns. It offers better query performance both in memory and on disk.

Some existing indexing tools are not flexible. First, automatic index selection does not consider partial indexes; it focuses on index selection, creation, and dropping decisions. Database cracking, in turn, works very well for in-memory queries but, compared to the method proposed in the paper, is not well suited to block-access devices.

Adaptive merging, introduced in this paper, is self-tuning like database cracking but with better query execution performance not only for in-memory cases but also for block-access ones.

The test results seem a little odd (the data points for their model are sparse and the curves are dramatic), which might be why, in the conclusion, the authors list more detailed tests as future work.

Review 19

This paper proposes adaptive merging, an adaptive, incremental, and efficient indexing technique that focuses on the key ranges appearing in queries. Data volumes today are exploding. Database cracking optimizes the representation of keys that are queried often, so the index becomes self-selecting, but it still has significant weaknesses in efficiency and in handling different storage types. Adaptive merging overcomes those problems and optimizes the index. As in database cracking, key ranges that are never queried are never partitioned or optimized, which improves efficiency without slowing down in-memory scans. Adaptive merging exploits partitioned B-trees within the cracking model.

The performance of the two algorithms is compared in the paper. For queries in sequence, adaptive merging converges much faster than database cracking; in the long-query-sequence case, adaptive merging reaches a fully optimized B-tree after about 1% of the queries. Its efficiency is not guaranteed, however, when queries cover very small ranges, as the small-query-range results show. Adaptive merging works well if the key ranges to be merged are rounded outward, which the authors propose as an obvious improvement; the very-small-range case exposes this characteristic. Adaptive merging is designed for large memories, whereas database cracking is not.
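
The rounding idea mentioned above can be sketched as follows; the granule size and the helper name `widen` are hypothetical, chosen only for illustration:

```python
# For very narrow queries, round the key range to be merged out to a
# coarser granule so each merge step still makes useful progress.
def widen(lo: int, hi: int, granule: int = 100):
    """Round [lo, hi] outward to whole granules of the key domain."""
    return (lo // granule) * granule, ((hi // granule) + 1) * granule - 1

print(widen(1234, 1236))  # (1200, 1299): a point-like query merges a full granule
```

A query touching only keys 1234-1236 would then merge the whole range 1200-1299, so progress toward a sorted index does not stall on narrow queries.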

Review 20

This paper explores a method that helps build adaptive, incrementally optimized indexes for relational data warehouses. The method creates a partial index for a given key range and is optimized for block-access devices as well.

Index selection is a classic and inherently hard problem. One approach is to focus on enabling fast scans using shared scans and columnar storage formats, which works well for traditional disk drives and arrays. The other approach is to tune indexes in response to the actual workload; however, the interval between monitoring and modifying the indexes might be long enough for the request pattern to change.

Database cracking, on the other hand, is a focused, incremental, and automatic optimization of the data representation; it is a combination of automatic index selection and partial indexes. A key point about cracking is that key ranges that are never queried are never partitioned or optimized, which is a crucial advantage of adaptive indexing over traditional indexes. However, it works best for in-memory databases and not so well on block-access storage.

Partitioned B-trees are a significant structure the authors use in this method. They apply to traditional disks and flash storage as well as in-memory indexes. A partitioned B-tree differs from a traditional B-tree in that it adds an artificial leading key field; the distinct values in this field define the partitions.
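
As a rough illustration of the artificial leading key field (the data and the flat-list representation here are assumptions, not the paper's implementation), a partitioned B-tree can be mimicked by a single ordered sequence keyed on (partition number, key):

```python
from bisect import bisect_left, bisect_right

# The artificial leading field keeps each partition's keys contiguous,
# so one ordered structure holds many independently sorted runs.
entries = sorted([(0, 'd'), (0, 'a'), (1, 'c'), (1, 'b'), (2, 'a')])

def scan_partition(entries, pno, lo, hi):
    """Return the keys in [lo, hi] from partition pno only."""
    start = bisect_left(entries, (pno, lo))
    end = bisect_right(entries, (pno, hi))
    return [key for _, key in entries[start:end]]

print(scan_partition(entries, 1, 'a', 'z'))  # ['b', 'c']
```

Because the partition number sorts first, a range scan within one partition is just an ordinary key-range probe, which is why a standard B-tree implementation can host all the partitions.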

Adaptive merging aims to combine efficient merge sort with adaptive and incremental index optimization. The essence of partitioned B-trees is that standard B-trees persist the intermediate states, providing efficient search at all times, even before optimization is complete. Index selection uses the same heuristic as database cracking. By the second time a column is used in a predicate, an index exists, though it is not completely optimized. Instead of scanning the desired key range one partition at a time, a query can scan multiple partitions in an interleaved way and merge these sorted streams into a single sorted stream. Once all records within a key range are optimized, the index behaves like a traditional B-tree for that range.
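
A minimal sketch of one such query step, assuming in-memory Python lists stand in for B-tree partitions (the function name and data are hypothetical):

```python
import heapq
from bisect import bisect_left, bisect_right

def query_and_merge(partitions, lo, hi):
    """Answer a range query and, as a side effect, merge the matching
    records out of every partition into one new sorted partition."""
    pieces = []
    for part in partitions:
        i, j = bisect_left(part, lo), bisect_right(part, hi)
        pieces.append(part[i:j])
        del part[i:j]                    # the initial runs shrink
    merged = list(heapq.merge(*pieces))  # fan-in limited only by memory
    if merged:
        partitions.append(merged)        # new, fully sorted partition
    return merged

parts = [sorted("bcaa"), sorted("edgf"), sorted("hjii")]
print(query_and_merge(parts, "c", "g"))  # ['c', 'd', 'e', 'f', 'g']
```

Subsequent queries over c through g now hit the single merged partition, which is the "efficient search at all times" property the review describes.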

One advantage of this method is that the index optimization can be tailored to one's requirements; it is possible at any time to defer optimization of the remaining key ranges. The index is then only partially populated, but query execution still speeds up.

The authors performed multiple tests, and their algorithm performs much better, or converges to the final index much faster, than plain database cracking. However, for a workload in which the complete column is referenced many times, rather than just a range, it may not be possible to reap the gains of this algorithm. For columns that cannot be sorted easily, such as dates and text, the algorithm may not perform at its best.

On the whole, this algorithm seems to be a good replacement for traditional index optimizers and more stable than database cracking.

Review 21

This paper introduces a new self-tuning, incrementally optimized index that provides the functionality of both automatic index selection and database cracking. The new index is built upon a partitioned B-tree, which supports persistent intermediate partitions. As more user queries come in, the index is incrementally optimized, exploiting the machinery of external merge sort.

The key advantage is that, instead of quicksort-like partitioning, it uses partitions within a B-tree index. Parallelism is applied more effectively, and partitions produced by past user queries can assist the index construction for the current query.

A possible drawback is that, since the B-tree partitions are exploited incrementally, the first few queries may not achieve high performance; some ramp-up time is still needed, so performance can depend on the query repetition ratio.

A second drawback is that no pseudocode is provided, so readers may not fully understand how parallelization over the intermediate partitions is done and how it really helps optimize query processing. It is also unclear whether the algorithm needs additional hardware support.

Review 22

As database sizes grow, indexing becomes a very important technology. Traditional indexing treats all records equally and is therefore not adaptive. Database cracking is a technique for adaptive indexing, but it has weaknesses such as inefficiency. The authors propose adaptive merging, an adaptive, incremental, and efficient index creation technique that performs better than database cracking.

Database cracking's idea is that the more often a key range is queried, the more its representation is optimized. It is an adaptive approach to indexing that automatically creates and refines indexes and supports partial indexing. But it has several weaknesses. First, both its indexing efficiency and its search efficiency fall short of traditional indexing. Second, because database cracking resembles the partitioning in quicksort, it is not well suited to block-access devices such as disks and flash storage.
Adaptive merging applies efficient merge sort to adaptive and incremental index optimization using a partitioned B-tree. When fully optimized, it has better search performance than database cracking, and because it uses external sorting, it suits block-access devices well. The authors explain the idea through index selection, initialization, and incremental optimization, and also explain the data structure behind this indexing method.
Finally, the authors present several experiments comparing the performance of database cracking and adaptive merging under different situations.

In the paper, the authors clearly explain adaptive merging using examples and provide experimental results to support the idea. They also present some ideas that are not fully analyzed, as future work. Still, it would be good to see more details in the experiments, such as different data types for the indexed column or multi-column compound searches.

Review 23

The authors introduce an index auto-creation and update mechanism, heavily inspired by database cracking. The intention is for indexes to be created as queries are made, rather than tuned by the database administrator or created proactively by other techniques. The authors note that their mechanism is complementary to systems others have created, specifically techniques that decide when a new index should be built. In their performance evaluation, they show lower overhead than database cracking.

An unfortunate side effect of using adaptive indexes is that the variation in performance overhead is very high, so performance predictability is decreased.

Review 24

This paper introduces a new way to create partial indexes that is more efficient for block-access storage. With existing methods such as database cracking, if the table is large enough or block-access storage is used, the slowdown comes not from the algorithm itself but from the page faults the OS incurs. Adaptive merging, proposed in this paper, takes the advantages of database cracking, such as incremental indexing, and applies them to block-access databases by using a form of merge sort rather than a form of quicksort.

The main concept behind adaptive merging is to start with a set of sorted partitions. One can think of this as the step just before merging in merge sort, where there are partitions that are sorted but not yet merged. Then, with each query, the algorithm looks through the partitions for the specified range and creates a new sorted partition or updates an existing, already sorted one.

The paper does a good job of illustrating the advantages that adaptive merging has over database cracking. It describes when database cracking performs very well and when it does not. Furthermore, the concept proposed is not just theoretical. The authors ran a prototype and compared the performance of adaptive merging with the performance of database cracking. However, this paper does have a few drawbacks:

1. It does not thoroughly explain how inserting and updating rows will affect the performance of adaptive merging. Two possible ways to insert values are proposed, but the upsides and drawbacks of each method are not analyzed.

2. For both adaptive merging and database cracking, nothing was explained regarding empty tables. If a table is currently empty and an index is placed on some column, what would be the best algorithm to use on that column as items get inserted? Should the DBMS switch algorithms after the table reaches a certain number of rows?

Review 25

Data warehouses contain an immense amount of data, such that it is beyond human capacity to easily select the most efficient indexing strategy; indexes must therefore be selected and tuned automatically. Past approaches to automatic tuning included analyzing incoming queries and then creating an appropriate index, or reviewing past queries and using system downtime to create new indexes. This paper introduces adaptive merging as a way to index records efficiently and incrementally as sequences of queries are run against the database.

The benefits of adaptive merging are:

1) Indexes can be fully optimized over a much smaller number of queries than needed in a scheme like database cracking.

2) Only records that are actually queried will ever be moved within the underlying structure, eliminating overhead from moving untouched records.

3) The fan-in to the merge step is limited only by the number of runs that can fit in memory, so the number of steps required to get each record to its final place is very low (often just 1). This provides a significant improvement over database cracking, which was limited by the number of new partitions added during each new query.

4) Adaptive merging can be implemented using partitioned B-trees, which can be used in all cases where a traditional B-tree can be used.

This paper makes a strong case for the use of adaptive merging as a method of automatic index tuning. The authors include a great deal of data that shows the performance benefits of adaptive merging over database cracking when varying the number of queries in a sequence, percentage of records touched by each query, memory allocation, and query focus. They show that in all of these cases, adaptive merging creates a fully optimized index much faster than database cracking, suggesting that this is a much more efficient and adaptive strategy.

While the data in this paper offers a great deal of evidence that adaptive merging is a more powerful strategy than database cracking, this evidence is based on simulations rather than actual comparisons between the two systems using a representative database design and workload. Judgment on the benefits of adaptive merging should be reserved until such a time as these comparisons can be made. Additionally, I would have liked to see how other strategies for index tuning compared to the proposed method. While the similar goals of adaptive merging and database cracking make for easy comparisons, it would have been nice to see some data about the objective efficiency of query sets using adaptive merging.

Review 26

The paper proposes an adaptive, incremental, and efficient technique for index creation called adaptive merging. Relational databases nowadays are likely to contain many tables, so indexing is increasingly important. Adaptive indexing focuses on adapting the physical database layout to the actual queries, and database cracking is one such technique. Based on numerous experimental results, the paper concludes that database cracking and adaptive merging offer a promising alternative to traditional index tuning that relies on monitoring.

Review 27

This paper describes a database optimization called “adaptive merging” which is an index creation technique that has strong performance and is better than “automatic index selection” and “database cracking”.

Adaptive merging takes database cracking's technique of updating as you go and mixes it with merging in partitioned B-trees to allow faster optimization. Merging has a high fan-in (in the hundreds) while partitioning has a low fan-out (2 or 3), so a fully sorted, optimized index is created faster. The algorithm initially builds a B-tree partitioned so that each partition is sorted. Then, as queries come in, the selected tuples from each partition are merged into a new sorted partition covering that range. As more and more queries execute, the sorted partitions grow until there is a fully sorted index. I think this is a good idea, but if the end goal is a sorted index, that feels like a task that could be done in the background, yielding a fully sorted index while requiring fewer queries. Lastly, adaptive merging of course supports transactions and updates, as you would expect.

The performance of adaptive merging was, unsurprisingly, better than database cracking's: significantly better record counts across the board for all scenarios tested. This makes sense to me, as it is smart to merge after having already partitioned and to take advantage of those merges. I see no drawbacks to adaptive merging; it seems a clear upgrade from database cracking and something that should be adopted for all databases.

The one main negative I noticed is that the paper fails to point out any downsides of its algorithm. Surely there is a downside (I did not spot it, however), and that is something that should be acknowledged in a paper. There is likely much more good than bad in this algorithm, but the bad should at least be mentioned somewhere for increased credibility.

Review 28

The process of creating indices (e.g., with respect to certain computed columns) for databases with large amounts of data remains a critical problem for DBMSs. To utilize the power of efficient indexing, one must carefully consider the context in which indices are used, to minimize the number of queries or scans over large regions of data, and to discern queries that lend themselves to index optimization from so-called "ad-hoc queries." Adaptive merging is an incremental method to efficiently create indices on large database structures, based on the existing structure of partitioned B-trees.

The optimization used in adaptive merging relies on the actual keys resulting from actual queries to the dataset. This way, when query patterns begin to arise, adaptive merging exploits the fact that it has been incrementally building the index from previous queries, making it more responsive to new input data. Compared to database cracking, the overhead incurred by the adaptive merging process is orders of magnitude smaller in most cases, and is convenient for externally disk-stored data.

While the algorithm itself is presented to be useful in many ways, there are several things that would have made the paper even more convincing than it already is:

(1) Discuss failure modes of the algorithm; I would be very impressed and surprised if there were no corner cases or specific scenarios in which this algorithm was not the best option.

(2) Comparisons beyond just database cracking versus adaptive merging should be shown, i.e., comparisons to traditional index tuning.

(3) Descriptions of the overhead incurred by generating multiple B-trees and by the table of contents. Much of the description of the inefficiencies of adaptive merging is glossed over in this paper.

Review 29

This paper proposes an alternative for solving the index tuning problem in relational data warehouses using an adaptive merging technique, and compares it with the more well-known "database cracking" technique. Because a data warehouse is subject to many different kinds of search queries for data analysis, it typically has a large number of indexes. Despite their helpfulness, too many indexes result in high update costs, so index tuning is needed to find the best combination of indexes for faster, easier search. Adaptive indexing is considered the best approach, with database cracking among the most commonly practiced techniques. This paper proposes a new approach to adaptive index tuning, adaptive merging, and reports performance test results for the two techniques.

Although the two techniques share the objective of automatic and adaptive index selection (and both are based on B-trees), database cracking relies on partitioning while adaptive merging uses merging. In database cracking, keys in the cracker index are partitioned into unsorted, disjoint key ranges; every time a search arrives for a certain range of keys, the cracker index is further partitioned around the search bounds, and so on, resulting in a cracker index with many partitions. In contrast, in adaptive merging, for every query the matching key ranges are merged out of their partitions, so in the end there are not many partitions in the index.
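
A minimal sketch of the cracking side of this contrast, under the simplifying assumption that the column is a plain Python list cracked around a query's two bounds (names and data are illustrative, and this version copies rather than pivoting truly in place):

```python
def crack(column, lo, hi):
    """One cracking step: rearrange the column so values < lo,
    in [lo, hi], and > hi form three contiguous, unsorted pieces.
    Returns the slice bounds of the queried range."""
    less = [v for v in column if v < lo]
    mid = [v for v in column if lo <= v <= hi]
    more = [v for v in column if v > hi]
    column[:] = less + mid + more
    return len(less), len(less) + len(mid)

col = list("noyulzutwvokmrpxsi")
i, j = crack(col, "p", "t")
print(col[i:j])  # all values in ['p'..'t'], contiguous but still unsorted
```

Each query only carves out one more range, which is why cracking needs many queries to approach a sorted index, whereas a merge step produces a fully sorted partition at once.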

What I like about this paper is that it highlights the suitable environment for each technique. Adaptive indexing is tempting and very useful, but the paper reminds us that database cracking is designed for in-memory arrays and is thus a poor fit for a data warehouse, which typically consists of large data volumes on "external" block-access storage. Using database cracking in this environment requires more data moves than adaptive merging. Since adaptive merging relies on merge operations, the data is sorted and indexed per query; if an index range is used a second time, only the previously merged key range needs to be accessed (no joining of partitions required).

However, records in a data warehouse, although they keep growing, are usually of the same nature. In data analysis, while there is initially some trial and error in looking at the data from many perspectives, the workload usually settles into a pattern after some time. In that case, would the performance of database cracking and adaptive merging still differ very much?

Review 30

The main contribution of this paper is to introduce a modification to optimized indexing that is more efficient than the well-explored database cracking methods, and that is suitable for use in very large databases and in databases that utilize block-access storage.

The authors introduce adaptive merging, which differs from database cracking much as mergesort differs from quicksort (adaptive merging relies on merges rather than partitioning around pivot values). The purpose of the paper is to introduce a new method for index optimization: the authors iterate through the weaknesses of database cracking and other older methods, then improve upon those earlier methods by overcoming the enumerated weaknesses.

The strengths of this paper are due to its careful examination of previous work and developed methods and its subsequent incorporation of the components of these methods that work well and the exclusion of those that the authors do not believe work well to achieve their specific goal. I also think that the authors do a satisfactory job in justifying their design choices, including choices of structures. They offer a thorough examination of partitioned B-trees and explain why they are a very logical (and efficient) choice to exploit in the performance of adaptive merging that is necessary for their new indexing optimization.

As far as weaknesses go, one thing I will quibble about is the “Variations” section. This is really a re-branded future works/possible extensions section, and I was very thrown off by its placement in the paper. This section should, as it typically does in such papers, come at the end of the paper to not interrupt the presentation of the core material of the paper. Why throw in future possibilities before even discussing the empirical evaluations of the basic structures?

Additionally, in the results section (for example, in the discussion of Figure 8), there is no explanation of the large spikes in overhead present in that graph, nor of the spikes on the adaptive merging curves in the following graphs. For all of the graphs, the titles are more or less uninformative, and the reader has to hunt down the figure reference in the text to gain any insight into how to interpret the graphs and the results they depict.

Review 31

This paper points out that index selection for a very large relational data warehouse is a very hard physical design problem, and proposes a self-tuning indexing technique, adaptive merging, to address it. Adaptive merging is an adaptive, incremental, and efficient technique that uses the side effects of external sorting to auto-generate the index from incoming queries. It has many advantages over its predecessors, especially compared with database cracking:
1) It is a self-tuning, low-overhead, incrementally optimized, fast-converging indexing technique.
2) It is not constrained to in-memory indexing (as database cracking is), but can also be applied to many block-access indexes, including B-trees and their variants in multidimensional indexing as well as hash-based indexing.
3) It is perfectly compatible with existing transaction techniques, making it a universally applicable optimization.

This paper pays special attention to its novelty: it compares adaptive merging to database cracking in many respects and reviews previous research on indexing to show how it differs. It also designs a complete experiment to measure the performance gain over database cracking, further demonstrating its novelty and effectiveness.

1. This paper proposes a novel and effective incrementally optimized indexing method: adaptive merging.
2. It introduces database cracking and adaptive merging with a consistent running example, which helps the reader's conceptual understanding.
3. It designs complete experiments that show great improvements of adaptive merging over database cracking.

1. Although this paper provides a complete comparison between database cracking and adaptive merging, it would be more convincing if it also compared against ordinary indexing techniques. This weakness can be forgiven if that comparison was already done in previous research, i.e., in the database cracking work.
2. There might be a minor mistake in the database cracking example of Figure 2, based on my understanding. In the bottom box, since the last query's key range is f through j, I think the g in the second partition should be placed with the f in the third partition, since it comes alphabetically after f. So instead of "bcaa, edge, f, hjii, noyulzutwvokmrpxsi" it should be "bcaa, ede, fg, hjii, noyulzutwvokmrpxsi".

Review 32

This paper proposes adaptive merging, an efficient technique for adaptive and incremental index creation. It shares database cracking's idea of incrementally building the index and moving records accordingly, but it outperforms database cracking because it adapts more quickly and works well in disk-based databases.
Database cracking incrementally constructs an adaptive index by partitioning any column that appears in a query predicate. The keys in the cracked index are partitioned into disjoint key ranges and left unsorted within each range. Every time a query arrives, the index is further partitioned at the end points of the queried range. Though it performs well in memory-based databases, database cracking takes many steps to reach the final representation of a key range, and it has the fundamental problem, when applied to disk-based systems, that partitioning requires the whole column to fit in memory.
To overcome the weaknesses of database cracking, the authors developed a technique that achieves the same goal from the opposite direction. Instead of making the index partially sorted through partitioning, adaptive merging incrementally merges partitions with the help of partitioned B-trees. This merge-sort-like, bottom-up approach has a higher fan-in than the fan-out of database cracking and thus reaches the final state more quickly.
The first step of adaptive merging is to divide the whole index into partitions and sort each internally. Once a query arrives with a key-range predicate, the entries in that key range are picked from all partitions and combined into a new partition; newly added partitions are likewise combined when necessary. In this way the initial runs shrink, while the more informed new partitions, holding all values in specific key ranges, grow quickly.
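
These steps can be sketched with a toy example (assumed data; in-memory lists stand in for B-tree partitions): repeated queries drain the initial runs into new, fully sorted partitions.

```python
import heapq
from bisect import bisect_left, bisect_right

def run_query(partitions, lo, hi):
    """One query-driven step: move records in [lo, hi] out of every
    partition and merge them into a single new sorted partition."""
    grabbed = []
    for part in partitions:
        i, j = bisect_left(part, lo), bisect_right(part, hi)
        grabbed.append(part[i:j])
        del part[i:j]
    partitions.append(list(heapq.merge(*grabbed)))
    partitions[:] = [p for p in partitions if p]  # drop drained runs

parts = [sorted([5, 1, 9]), sorted([2, 8, 4]), sorted([7, 3, 6])]
for lo, hi in [(1, 3), (4, 6), (7, 9)]:  # queries covering the whole domain
    run_query(parts, lo, hi)
print(parts)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]: every record moved once
```

After queries covering the whole domain, the initial runs are gone and each record reached its sorted partition in a single move, illustrating the fast convergence the review describes.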
This paper is a very smart piece of theoretical research, as it solves the same problem from the opposite direction compared to database cracking and eliminates some of its weaknesses. However, it lacks experiments on a real system that could justify its claimed advantage for disk-based databases.