Review for Paper: 4-R-Trees: A dynamic index structure for spatial searching

Review 1

Traditional, single-dimensional indexing methods are poorly suited for geographic data or computer-aided design data, which are often queried for overlapping spatial regions. Single-dimensional indexes, such as the B-tree index, require time-consuming methods like inner loop programming to find rectangles that overlap a target rectangle in multi-dimensional space. In "R-Trees: A Dynamic Index Structure for Spatial Searching," Antonin Guttman proposes a data structure called the R-tree as a solution.

The R-Tree is similar to a B-tree in topology, but each node is keyed by a bounding rectangle instead of a scalar value, and child nodes have bounding rectangles contained in their parent's. Guttman's suggested algorithms for inserting and deleting from an R-tree keep the tree depth-balanced, and encourage the tree to store similar bounding rectangles near each other. As an example of a node-grouping heuristic, Guttman suggests that when a tuple and its rectangle are inserted into an R-tree, the insertion descend from the root, choosing at each level the child whose rectangle requires the least enlargement to contain the new rectangle. This greedy method for choosing the parent node for a new entry tends to produce trees where similar bounding rectangles are grouped together. Similar rules are used for splitting R-tree nodes.
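The least-enlargement rule described above can be sketched in a few lines. This is only an illustration, not Guttman's code: the 2-D tuple layout, the function names, and the tie-break on smaller area are assumptions of the sketch.

```python
from typing import List, Tuple

# 2-D rectangles as (xmin, ymin, xmax, ymax); the paper allows n dimensions.
Rect = Tuple[float, float, float, float]

def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def enlargement(cover: Rect, new: Rect) -> float:
    """Extra area `cover` would need in order to contain `new`."""
    return area(union(cover, new)) - area(cover)

def choose_child(covers: List[Rect], new: Rect) -> int:
    """Index of the child needing least enlargement (ties: smaller area)."""
    return min(range(len(covers)),
               key=lambda i: (enlargement(covers[i], new), area(covers[i])))

# Hypothetical covering rectangles of two children of one node.
covers = [(0, 0, 4, 4), (10, 10, 12, 12)]
print(choose_child(covers, (11, 11, 13, 13)))  # the second child needs far less growth
```

Applying choose_child at every level from the root down to a leaf gives the greedy descent that the review describes.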

The main contribution of the paper is the novel data structure for multi-dimensional spatial data, the R-tree, with associated algorithms for updating an R-tree. R-trees allow rectangles overlapping a target rectangle to be found in logarithmic time in some cases, similarly to how B-trees allow efficient search for scalar values. Difficult issues regarding how to decide where to insert a new entry, or how to split a node when it would otherwise contain too many entries, are dealt with using suitable heuristics.

Unfortunately, R-trees offer fewer guarantees than their one-dimensional cousins, B-trees. In an R-tree, it may take linear time to find all entries whose rectangle overlaps a given rectangle, even if there is only one such entry, because the search must check all overlapping child nodes of a given node for matches, rather than following just one path from root to leaf. In addition, there is no guarantee that an R-tree will produce the optimal grouping of entries, such as the grouping that minimizes the area of the union of its internal nodes' bounding rectangles. The suggested R-tree algorithms use heuristics that may produce sub-optimal tree structures.



Review 2

This paper introduces a new structure, the R-tree, for storing spatial data objects efficiently. Spatial objects such as map features are multi-dimensional and require a data structure that can retrieve all objects within a given range quickly. Several data structures had been proposed for this purpose, such as cell methods, quad trees, K-D-B trees, corner stitching, and grid files, but according to the paper none of them adequately addresses the problems of spatial object storage.

The R-tree is a height-balanced tree whose leaf nodes contain the index records. The paper explains its notation in full detail, with a pictorial depiction of the resulting structure. It then describes the algorithms required for data management tasks such as inserting, deleting, updating, and searching records. The algorithms used for accessing and modifying records are as follows:

a. Algorithm Search – This algorithm descends the tree from the root to find all index records whose rectangles overlap a search rectangle.
b. Algorithm ChooseLeaf – This algorithm selects the leaf node in which to place a new index entry.
c. Algorithm AdjustTree – This algorithm adjusts the covering rectangles on the path from a leaf to the root and propagates node splits upward after an insertion.
d. Algorithm Delete – This algorithm removes an index record from its leaf.
e. Algorithm FindLeaf – This algorithm finds and returns the leaf node that contains a given index entry.
f. Algorithm CondenseTree – This algorithm eliminates a node that has too few entries after a deletion and relocates its entries.

Node splitting, which is needed when an entry is added to a full node, is handled by one of three algorithms:

a. Exhaustive algorithm – Generates all possible groupings of the entries and chooses the best one.
b. Quadratic-Cost algorithm – Attempts to find a small-area split but is not guaranteed to find the smallest one; its cost is quadratic in the number of entries.
c. Linear-Cost algorithm – Also looks for a small-area split, but its cost is linear in the number of entries.
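To make the exhaustive option concrete, here is a minimal sketch that tries every two-way grouping and keeps the one with the smallest total covering area. The rectangle layout (xmin, ymin, xmax, ymax), the function names, and the minimum group size m are assumptions of this sketch, not the paper's code.

```python
from itertools import combinations

def mbr(rects):
    """Smallest rectangle covering all of `rects`."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def exhaustive_split(rects, m=1):
    """Return (total_area, group_a, group_b) with the smallest total covering area."""
    n = len(rects)
    best = None
    # Entry 0 always goes in group A, so mirrored groupings are not tried twice.
    for k in range(m - 1, n - m):
        for extra in combinations(range(1, n), k):
            a = (0,) + extra
            b = tuple(i for i in range(n) if i not in a)
            cost = area(mbr([rects[i] for i in a])) + area(mbr([rects[i] for i in b]))
            if best is None or cost < best[0]:
                best = (cost, a, b)
    return best

# Hypothetical entries: two small clusters far apart.
rects = [(0, 0, 1, 1), (1, 1, 2, 2), (8, 8, 9, 9), (9, 9, 10, 10)]
print(exhaustive_split(rects))
```

With m = 1 the loops examine 2^(n-1) - 1 groupings for n entries, which is why this approach stops being practical as nodes get larger.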

After describing the R-tree algorithms, the author supports the structure with performance tests. The tests were run on a VAX 11/780, a widely used minicomputer of the time, running UNIX, with the R-tree implemented in C. The CPU cost of inserting and deleting records, the number of pages touched and the CPU cost of searches, and storage efficiency were measured for the three node-splitting algorithms. The author concludes from the results that R-trees are practical and useful, and that they deserve consideration by DBMS implementers.


Review 3

This paper aims to handle spatial data efficiently, with support for search, insert, and delete. Traditional indexing methods are not well suited to data objects in multidimensional space, such as those in geographic information systems, circuit design, or even image processing.

The intuition behind the R-tree is to group nearby objects and to represent a set of multidimensional geometric objects by a collection of minimum bounding rectangles. Each entry in an internal node holds a rectangle that bounds its children, and the leaves of the tree contain pointers to objects. As in a B-tree, the nodes are implemented as pages (the author suggests a capacity of about 50 entries per node) designed for storage on disk, and the R-tree is also a balanced search tree. It is designed so that a spatial search can be fast.

As with most trees, the search algorithm is simple. However, efficiency is not guaranteed, because a spatial search may have to visit multiple subtrees to confirm whether a match exists. This issue stems from two problems: bounding rectangles may overlap, and they may cover too much empty area. In this paper, the issue is "controlled" in the insertion algorithm. The author proposes a greedy insertion algorithm: an entry is always inserted into the subtree that requires the least enlargement of its bounding box. To add a new entry to a full node, the entries must be divided between two nodes. The division should make it unlikely that both new nodes will need to be examined in subsequent searches. The paper presents algorithms for partitioning the entries into two nodes. Besides exhaustively searching for the optimal partition, whose cost depends only on the node size but grows exponentially with it, the paper provides a quadratic-cost algorithm: pick the two entries that are "farthest" apart as the seeds of the two groups, then assign the remaining entries one at a time, keeping the area expansion small at each step.
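The quadratic-cost split described above can be sketched as follows. This is my reading of the idea rather than the paper's code: the function names, the 2-D tuple layout (xmin, ymin, xmax, ymax), and the exact tie-breaking order are assumptions. "Farthest" is taken to mean the pair that would waste the most area if both were placed in the same node.

```python
from itertools import combinations

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def pick_seeds(rects):
    """Pair of indices whose joint cover wastes the most area."""
    def waste(p):
        i, j = p
        return area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])
    return max(combinations(range(len(rects)), 2), key=waste)

def quadratic_split(rects, m=1):
    """Split entry indices into two groups, each with at least m members if possible."""
    s1, s2 = pick_seeds(rects)
    groups, boxes = [[s1], [s2]], [rects[s1], rects[s2]]
    rest = [i for i in range(len(rects)) if i not in (s1, s2)]

    def growth(i, g):
        return area(union(boxes[g], rects[i])) - area(boxes[g])

    while rest:
        # If a group needs every remaining entry to reach m, give it all of them.
        short = [g for g in (0, 1) if len(groups[g]) + len(rest) <= m]
        if short:
            groups[short[0]].extend(rest)
            break
        # Next entry: the one with the strongest preference for one group.
        nxt = max(rest, key=lambda i: abs(growth(i, 0) - growth(i, 1)))
        # Put it where it causes least enlargement (ties: smaller area, fewer entries).
        g = min((0, 1), key=lambda k: (growth(nxt, k), area(boxes[k]), len(groups[k])))
        groups[g].append(nxt)
        boxes[g] = union(boxes[g], rects[nxt])
        rest.remove(nxt)
    return groups

# Hypothetical entries: two small clusters far apart.
rects = [(0, 0, 1, 1), (1, 1, 2, 2), (8, 8, 9, 9), (9, 9, 10, 10)]
print(pick_seeds(rects), quadratic_split(rects))
```

Seed selection looks at every pair of entries, which is where the quadratic cost comes from.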

The rest of the paper gives a series of performance tests to show that the structure performs well. However, it gives no theoretical arguments, which makes the paper feel incomplete to me. It is not hard to see that there should be many ways to improve on the two problems above, overlap and empty area. Moreover, there should be some theoretical guarantee.



Review 4

Blah


Review 5

This paper describes a dynamic index structure for spatial searching called the R-Tree, which helps handle spatial data efficiently. The paper covers the structure, the algorithms for searching and updating, and performance tests. The test results show that the R-Tree performs well, and it is highly recommended for use in current database systems for spatial applications.

The problem is that there were no good structures for spatial data, and therefore no efficient way to perform operations such as searching within an area. Spatial search is very important in computer-aided design (CAD) and geo-data applications. Many structures exist, but each has its own problems. For example, classical one-dimensional database indexing structures are not efficient for searching multi-dimensional spatial data, and structures based on exact matching, such as hash tables, are not suitable for range searches. Therefore, finding a structure for spatial data that supports efficient search is very important, and this paper presents such a structure, the R-Tree.

An R-Tree is a height-balanced tree (similar to a B-Tree) with all index records in the leaf nodes. Each index record contains an n-dimensional rectangle and a tuple identifier that points to a data object. Non-leaf nodes contain entries consisting of a rectangle that covers all rectangles in the child node's entries and a child pointer to the lower node.

The main contribution of this paper is the design of the R-Tree, an efficient structure for spatial data. The paper describes the structure and algorithms in detail, clearly and in a way that is easy for readers to understand. It also provides plenty of performance tests, which show that the structure works in practice and give an analysis of efficiency, memory cost, CPU cost, etc.

An interesting observation: overall, this paper does a good job of describing a useful structure, the R-Tree, and provides detailed performance tests. One suggestion for the performance tests is that more data sets with different characteristics should be tested. The paper runs nearly all of its tests on a single data set; the R-Tree performs well on it, but the data set is not representative of all kinds of data. It would also be valuable to have a detailed investigation of how easily R-trees can be added to an RDBMS.




Review 6

In this paper, Guttman introduces the R-tree, a dynamic index structure that provides efficient mechanisms for storing, searching, updating, and deleting spatial data objects. Spatial data objects are multidimensional objects of non-zero size (like countries on a map), which are common in applications such as CAD tools.

The R-tree is a height-balanced tree whose leaf nodes contain pointers to the actual data. Although the objects stored in an R-tree are relatively complicated because they can be multidimensional, it is interesting that the structure is completely dynamic and does not need periodic reorganization. Moreover, the R-tree is designed so that a spatial search needs to visit only a small number of nodes.

A considerable part of the paper is devoted to describing the algorithms. I did not go into the details of the algorithms, but they do not seem very straightforward (and we should not expect them to be!). For some operations, more than one algorithm is suggested. The performance of the different operations, and of the alternative algorithms for them where applicable, is studied in several settings, and the overall performance seems good.

The issue with this data structure, as I understand it, is that the parameters “m” and “M” have no prescribed values, and their optimal values for different situations and applications can only be determined by experiment. This might limit the usability of the index structure.




Review 7

This paper focuses on the fact that the relational model is not very good at doing 2-dimensional (or higher-dimensional) search. Instead of throwing out the relational model entirely and proposing a new model, the author realized that the real problem is the lack of indexes that allow quick retrieval of such data. The two most used indexing schemes in most databases are B+Trees and hash maps. Hash maps are only helpful for equality searches, so they are useless for GIS applications. B+Trees allow indexing on one dimension, but can't speed up 2-dimensional queries. The author proposes a new type of index, the R-Tree, to solve the problem. The idea is very similar to B+Trees in that the index is a height-balanced search tree. However, instead of the internal nodes keeping [*pointer*, value, *pointer*, value, *pointer*...], they keep a "rectangle" that contains all the objects underneath them in the tree. This means that each parent entry has a rectangle that contains all of its children. So, if a query matches an entry in a leaf node, it must match that node's parent entry as well. This allows geographic data to be queried by searching the R-Tree recursively, until there are no more nodes to examine.

The paper did a great job of explaining what options were available at the time, and why they did not work for this use case. From there, it showed how this new type of index could be added to the relational model to get the desired performance. The author did a great job of extending instead of reinventing: he stuck with the relational model and created an index very similar to B-Trees, modifying it to fit his needs instead of scrapping all the work done by previous engineers. He also thoroughly tested and verified the idea, demonstrating that R-Trees are very effective and even testing different tuning parameters to maximize performance.

I found the node-splitting portion of this paper to be the weakest. This is also one of the areas that would affect the performance of the system the most. The author shows the difference between good and bad splits, and argues that the quadratic and linear splits work out to roughly the same overall performance. However, later methods, such as R*-trees, have shown much better results at limiting page overlap. In all fairness, though, this was discovered after Guttman published the original paper.



Review 8

The problem addressed by the paper is that traditional indexing can handle only one-dimensional data, not data in multi-dimensional spaces, so a new index structure is needed for spatial data. The problem is important because quick retrieval of spatial data is needed in fields like CAD and geo-data applications. An index that supports spatial data can improve performance considerably.

The main approach proposed by the paper is a dynamic index structure called the R-tree, designed to meet the needs of spatial applications.
i) The R-tree is similar to a B-tree in that a node can have more than two children. Unlike in a B-tree, the regions covered by different nodes may overlap.
ii) Since it is a tree, the logic of basic operations like search, insertion, and deletion is similar to that of a regular search tree. The main difference is that an R-tree does not compare key values, but checks whether the rectangles of index entries overlap the search rectangle.
iii) The special part of the R-tree is node splitting. A good split minimizes the total area of the two covering rectangles, so that later searches are less likely to have to examine both nodes. The paper provides three ways to find a good split. The exhaustive algorithm is the easiest to understand; it finds the best split but is not efficient at all. The quadratic-cost and linear-cost algorithms aim to find a “good” split rather than the “best” one. Both assign the entries to the two groups one at a time, growing the covering rectangles as they go.

The main technical contribution of this paper is a new data structure that solves the indexing problem for spatial data efficiently. The paper presents the algorithms for all the related operations step by step, which makes them very clear for readers.

This paper presents a new data structure, related algorithms, and detailed performance tests. It is rich in content, and I did not find any drawbacks.


Review 9

The paper proposes the R-tree index structure for spatial searching in database systems. Traditional index structures such as B-trees only work with one-dimensional data and are not appropriate for spatial searching. Other structures, such as cell methods and quad trees, have their own disadvantages: they do not handle dynamic data well or do not take paging of secondary memory into account.

Structurally, the R-tree is like a B-tree: it has two kinds of nodes, leaf nodes and non-leaf nodes. A leaf node entry stores an n-dimensional rectangle as its key, with a tuple identifier referring to the tuple in the database. A non-leaf node entry also stores a rectangle as its key, with a child pointer to the child node. Just like a B-tree, an R-tree places maximum and minimum limits on the number of entries in each node, and has split and condense mechanisms.

The paper gives all the algorithmic details of the R-tree, including search, insertion, and deletion, and proposes three algorithms for node splitting, one of which has cost linear in the number of entries and in the number of dimensions.

A performance comparison of the three algorithms is also given. It shows that the linear algorithm offers the best trade-off between splitting speed and search performance: it is fast and does not hurt search performance much.

One drawback of this structure is that sibling nodes in an R-tree can cover overlapping areas, which may result in redundant searching.


Review 10

This paper introduces R-trees, a dynamic index structure for efficient spatial searching. An R-tree is a height-balanced tree that stores bounding information for each dimension of the data objects in its intermediate nodes to help organize the data spatially. The main goal is to manage and index data in a way that supports quick spatial searches.

Each leaf node of an R-tree contains pointers or identifiers for the tuples it refers to, as well as the smallest bounding box for each n-dimensional data object it represents. Each non-leaf node above the leaves contains pointers to its children and, for each child, the smallest bounding box that spatially contains the entries below it. It is therefore possible to narrow down the search space as you move down the tree from the root.
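The two kinds of bounding box described above can be computed in a few lines for any number of dimensions. This is a sketch with assumed names and data: objects are given as lists of points, and a box is a tuple of (lo, hi) intervals, one per dimension.

```python
def mbr_of_points(points):
    """Smallest box containing an object given by its points (leaf level)."""
    dims = range(len(points[0]))
    return tuple((min(p[d] for p in points), max(p[d] for p in points)) for d in dims)

def mbr_of_boxes(boxes):
    """Smallest box containing all child boxes (non-leaf level)."""
    dims = range(len(boxes[0]))
    return tuple((min(b[d][0] for b in boxes), max(b[d][1] for b in boxes)) for d in dims)

def contains(outer, inner):
    """True if `inner` lies entirely within `outer` in every dimension."""
    return all(o[0] <= i[0] and i[1] <= o[1] for o, i in zip(outer, inner))

# Two hypothetical 2-D objects and the parent box that covers both.
triangle = mbr_of_points([(1, 1), (4, 2), (2, 5)])
square = mbr_of_points([(6, 0), (8, 0), (8, 2), (6, 2)])
parent = mbr_of_boxes([triangle, square])
print(triangle, square, parent)
```

Because every parent box contains its children's boxes, a query that misses the parent box cannot match anything below it, which is what lets the search discard whole subtrees.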

While the organization and the idea of the R-tree look promising, the evaluation in the paper is weak. It focuses only on comparing the performance of the different node-splitting algorithms. While it is important to identify the best node-splitting algorithm to get the most efficiency out of R-trees, it is disappointing that a paper introducing the R-tree does not evaluate its performance and memory overhead against other algorithms and organizations for spatial searching.



Review 11

Focusing on the problem that a database system needs an effective method to retrieve multi-dimensional data items, such as spatial locations, this paper proposes an index structure called the R-tree.

The R-tree resembles the B-tree in that they have a similar tree structure and the leaves contain pointers to the data. The main differences are the key and the corresponding search and update methods:
The key in a leaf node is an array of scalar ranges representing the n-dimensional bounding rectangle of the spatial object. In a non-leaf node, the key is the smallest n-dimensional rectangle that contains all the rectangles in the child node.
Search, insertion, deletion, and update are similar to those in a B-tree.
During node splitting, an algorithm is needed to minimize the total area of the covering rectangles of the two nodes after the split. The paper provides a linear-cost algorithm for this purpose.

The R-tree introduced in this paper has the following advantages:
It gives good performance for indexing spatial data objects.
It was easy to add to the relational databases of the time.
It can be combined with abstract data types and abstract indexes to streamline the handling of spatial data.

Some drawbacks of the R-tree are:
Splitting has a relatively high cost, and the structure could be enhanced to avoid splits where possible.
Covering rectangles in non-leaf nodes may overlap one another, which may make search inefficient since more than one subtree under a node may need to be searched.



Review 12

This paper presents a data structure called the R-tree, which indexes data using n-dimensional rectangles as keys. This is useful when the domain of the data maps well to n-dimensional areas or space, for example, when representing the areas occupied by buildings. Using an R-tree in this example allows quick searching for buildings that overlap or are contained in a target area. If a one-dimensional indexing method like a B-tree were used for this problem, it would be hard to eliminate many data points from the search, and so the search could not be made efficient.

The R-Tree is arranged like a tree (duh), where each node has up to M entries, based on how many entries can fit in a memory page. Every entry has a key which is an n-dimensional rectangle. Leaf entries also have a pointer to a datum, and inner entries also have a pointer to a child node. The basic invariant is that every entry in a node must be contained in the n-dimensional rectangle of the node’s parent entry (except the root, which has no parent). In order to maintain balance in the tree, each node is also required to have at least m number of entries, where m can be tuned for performance.

Searching for entries that meet a criterion (like intersecting an area) is done by choosing entries in the root node that fit the criterion, then recursively searching down those nodes of the tree.
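A minimal sketch of that recursive search, assuming a simple in-memory node type. The Node class, the 2-D tuple layout (xmin, ymin, xmax, ymax), and the optional `visited` list are my own illustration, not the paper's page layout.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

Rect = Tuple[float, float, float, float]

@dataclass
class Node:
    leaf: bool
    # (rect, datum) pairs in a leaf; (rect, child Node) pairs otherwise.
    entries: List[Tuple[Rect, Any]] = field(default_factory=list)

def overlaps(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def search(node: Node, query: Rect, visited=None) -> list:
    """Collect every datum whose rectangle overlaps `query`."""
    if visited is not None:
        visited.append(node)  # record which nodes were touched
    hits = []
    for rect, item in node.entries:
        if overlaps(rect, query):
            if node.leaf:
                hits.append(item)
            else:
                hits.extend(search(item, query, visited))
    return hits

# A hypothetical two-level tree whose two leaves cover overlapping regions.
leaf1 = Node(True, [((0, 0, 2, 2), "a"), ((1, 1, 3, 3), "b")])
leaf2 = Node(True, [((2, 2, 6, 6), "c"), ((5, 5, 7, 7), "d")])
root = Node(False, [((0, 0, 3, 3), leaf1), ((2, 2, 7, 7), leaf2)])

touched = []
print(search(root, (2.5, 2.5, 2.8, 2.8), touched), len(touched))
```

The small query falls in the region where the two covering rectangles overlap, so both leaves have to be visited; a query outside that region follows a single path.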

Insertion also proceeds down the tree, except that only one entry is selected per node: the entry whose rectangle would need to be expanded the least if the new data were inserted under it. After traversing to a leaf node, a new entry is made which points to the new data. If a node overflows with entries during an insertion, the node is split in two and a new parent entry is added in the parent node (and the addition/splitting propagates up the tree as needed).

During deletion, the leaf entry to be deleted is found and removed. Then if the node does not have at least m entries anymore, the node and its parent entry are removed (and this removal propagates up the tree as needed). At the end of the operation, any abandoned data are re-inserted into the tree.

Using the right algorithm for splitting nodes, R-trees provide fast operations for indexing by spatial data, which can be leveraged by database systems in geo-data applications among others.



Review 13

The paper proposes a new database indexing structure, called R-tree, which is specifically designed for spatial data objects. The motivation behind the proposal is that the traditional one-dimensional indexing structures, such as B-trees and ISAM indexes, are not appropriate for search operations on spatial objects. For instance, range searches have to be performed multiple times on each one-dimensional indexing structure of the object. This is inefficient and slow. The paper also mentions other indexing structures for spatial objects and claims that most of these structures are not good because 1) they are not suitable for dynamic structure (Cell methods); 2) they do not take paging of secondary memory into account (Quad trees, k-d trees); 3) they are only useful for point data (K-D-B trees). However, the paper does not contain any performance comparison with any of these aforementioned structures. The performance comparison could have been a possible improvement of this paper.

An R-tree in a nutshell is a modified version of B-trees. It is a height-balanced tree and uses n-dimensional rectangle as a bounding box of a spatial object for its indices. The algorithms for search, insertion, deletion and update are similar to those of B-trees, where necessary modifications are made to accommodate its bounding-box indices. Some of these modifications do not seem to be justified well. For example, the discussion of node splitting in Section 3.5 talks about “bad split” and “good split”. The example of “bad split” has a larger covering rectangle than the example of “good split”, but the example of “good split” has an overlap between two split covering rectangles. It might seem that range queries will always perform better with the “good split” case, but the performance of range queries could be worse in the “good split” case if the query is lower-dimensional than n and involves the overlapped region.
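The tension between covering area and overlap can be made concrete with a small calculation. The rectangles below are hypothetical and are not the ones in the paper's figure; the point is only that the grouping with the smallest total covering area is not necessarily the one with the least overlap.

```python
def mbr(rects):
    """Smallest rectangle (xmin, ymin, xmax, ymax) covering all of `rects`."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def overlap_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def split_stats(group1, group2):
    """Return (total covering area, overlap between the two covers)."""
    c1, c2 = mbr(group1), mbr(group2)
    return area(c1) + area(c2), overlap_area(c1, c2)

# Four hypothetical entries arranged like the ends of a plus sign.
A, B = (0, 4, 2, 6), (10, 4, 12, 6)   # left and right
C, D = (5, 0, 7, 2), (5, 8, 7, 10)    # bottom and top

print(split_stats([A, B], [C, D]))    # small total area, but the covers cross
print(split_stats([A], [B, C, D]))    # larger total area, no overlap
```

A query that falls inside the crossing region of the first grouping has to descend into both nodes, which is the kind of case the paragraph above is worried about.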

In conclusion, the paper presents a novel idea (at the time) for an indexing structure for spatial objects. The importance of handling spatial data has been ever increasing. The vast majority of people carry a smartphone with GPS, so the amount of available spatial data is huge and the need for indexing structures like the R-tree grows every day. We should also note how many current database systems still use the B-tree or its variants for their indexing structures. This simple fact should remind readers not to overlook the indexing structure proposed in this paper just because the paper is more than 30 years old.


Review 14

This paper introduced a new index structure called an R-tree, which is intended to be a more efficient indexing method when dealing with spatial data. Current indexing structures are not appropriate for dealing with queries on spatial data because either the search space for the data is multidimensional or because the structure uses exact value matching when a range search is necessary.

The paper first discusses several previously proposed structures for handling spatial data and points out the flaws of each. An R-tree consists of nodes, each of which contains a series of entries. Each entry stores a bounding rectangle and either a pointer to a child node or a reference to a tuple in the spatial database. The key idea is to represent objects by their minimum bounding rectangles and store them in the leaf nodes of the tree.

The paper also provides methods for searching, inserting, and deleting nodes as well as two different algorithms for splitting nodes when new entries are added to the tree. An interesting observation is the delete operation reuses the insert operation and the update operation is simply a delete followed by an insert.

The paper remarks at the end that the linear node-split algorithm used for splitting produced worse splits, but these splits did not seem to affect search performance that much. However, it seems researchers are always looking for ways to improve upon algorithms to get just a little bit better performance. The low quality of the splits when performing the algorithm proposed in the paper, may be detrimental to search performance as data sets continue to expand in size. Therefore, if it hasn't already occurred, I expect to see more complicated node-split algorithms that are at least as fast as the one proposed in the paper, but produce better quality splits thus resulting in improved performance. Maybe the new algorithms will even produce splits with the smallest area possible, a property that was not guaranteed from the algorithms used in the paper.


Review 15

This paper introduces a dynamic index structure called "R-tree" that handles multi-dimensional spatial data efficiently. It provides algorithms for searching, inserting, deleting and updating the tree; and the (exhaustive, quadratic, and linear) node-split algorithms. With R-tree, the data objects are represented by intervals in several dimensions. Traditional data structures are not useful because we need a multi-dimensional, range-based search.

The algorithms are clearly explained step by step and the examples of applications of R-tree are also given in this paper. Several experiments are conducted over different parameter settings. The idea of using boundary values in different dimensions to represent tree nodes is very novel and useful.





Review 16

This paper discusses the efficient handling of spatial data, which is required in computer aided design and geo-data applications. Traditional indexing methods don’t handle data objects of non-zero size located in multidimensional spaces well. Guttman suggests the R-tree index structure, which is easy to add to relational database systems that support conventional access methods and works well with abstract data types and indexes, to solve this issue of handling spatial data.

The R-Tree is a height-balanced tree, which, like a B-tree, has leaf nodes that contain index records with pointers to data objects. R-Trees satisfy 6 main properties:
1. Unless it is the root, every leaf node contains between m and M index records, where M is the maximum number of entries that fit in one node and m ≤ M/2 is a chosen minimum.
2. For each index record in a leaf node, the n-dimensional bounding rectangle is the smallest rectangle that spatially contains the n-dimensional data object represented by the indicated tuple.
3. Unless it is the root, every non-leaf node has between m and M children.
4. For each entry in a non-leaf node, the n-dimensional bounding rectangle is the smallest rectangle that spatially contains the rectangles in the child node.
5. Unless it is a leaf, the root node has at least 2 children.
6. All leaves appear on the same level.

Searching in an R-Tree descends the tree from the root, similarly to a B-Tree. Search in an R-Tree does not guarantee good worst-case performance because more than one subtree under a visited node may need to be searched. Insertion in an R-tree is also similar to that in a B-tree: new index records are added to the leaves, and nodes that overflow are split, with splits propagating up the tree. Deletion removes the index record from the R-tree. If a tuple is updated so that its covering rectangle changes, its index record must be deleted, updated, and re-inserted. Node splitting happens when a new entry is added to a full node, and is done so that the likelihood of having to search both new nodes in subsequent searches is small. The Exhaustive Algorithm, which generates all possible groupings and chooses the best, is one way to do this. The Quadratic-Cost Algorithm aims to find a small-area split but is not guaranteed to do so. The Linear-Cost Algorithm is similar to the Quadratic-Cost Algorithm but picks seeds differently.
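
As a rough illustration of the search described above, here is a minimal Python sketch. It is not the paper's pseudocode: the `Node` class, the `overlaps` helper, and the sample rectangles are assumptions made for this example, and a rectangle is taken to be a tuple of closed (low, high) intervals, one per dimension.

```python
from dataclasses import dataclass, field

def overlaps(a, b):
    """True if rectangles a and b intersect in every dimension."""
    return all(lo1 <= hi2 and lo2 <= hi1
               for (lo1, hi1), (lo2, hi2) in zip(a, b))

@dataclass
class Node:
    leaf: bool
    # Each entry is (rect, child); in a leaf, child is a tuple identifier.
    entries: list = field(default_factory=list)

def search(node, query):
    """Return the identifiers of all leaf entries whose rectangle overlaps query."""
    found = []
    for rect, child in node.entries:
        if not overlaps(rect, query):
            continue  # prune: nothing under this entry can match
        if node.leaf:
            found.append(child)
        else:
            # Several sibling rectangles may overlap the query, so the
            # search can descend into more than one subtree.
            found.extend(search(child, query))
    return found

# Hypothetical two-level tree with made-up rectangles.
leaf1 = Node(True, [(((0, 2), (0, 2)), "a"), (((3, 5), (1, 4)), "b")])
leaf2 = Node(True, [(((4, 8), (3, 9)), "c"), (((7, 9), (7, 9)), "d")])
root = Node(False, [(((0, 5), (0, 4)), leaf1), (((4, 9), (3, 9)), leaf2)])

print(search(root, ((4, 5), (3, 4))))  # visits both leaves
```

In this example both root entries overlap the query, so both leaves are visited. That is exactly the reason the worst case is not bounded the way a B-tree lookup is.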

One drawback of the paper is that it does not discuss the use of R-trees on real-world examples. I would’ve liked to see how prominently R-trees are used now, and how well they have been performing in real-world applications.




Review 17

This paper is about handling spatial data efficiently for applications such as computer-aided design and geo-data applications. I think this paper is important because the R-tree data structure that it develops for this purpose probably has other applications as well. It is not certain whether or not this method is still useful today, however, because the paper is 31 years old. The proposed R-tree is a height-balanced tree similar to a B-tree. The nodes are disk pages and the leaves point to data objects. Basically, each parent node spatially contains all of its children, and the tree otherwise functions much like a B-tree.

The data structure is a great idea for the applications the paper describes. The paper also mentions previous work that works for searching but not for dynamic applications that require lots of inserts and deletes. The algorithms are fairly straightforward and easy to understand if you understand B-trees, except for node splitting, which the author addresses with two algorithms that are evaluated: the linear-cost and quadratic-cost algorithms.

The drawbacks of the paper are again related to the evaluation. A common theme in the papers I've seen so far is that evaluation is difficult and what the authors attempt is not convincingly relevant to what would happen in the real world. They say that they evaluate the structure on the VLSI RISC-II chip layout and some other larger cell. This seems like a good application for CAD but is not an example related to the other application they mentioned (geo-data). They also talk about the number of searches they perform, and the number of inserts and deletes, but do not talk about why they did these operations the way they did. The deletions are described as every 10th item and the searches were described as using random numbers. It would be better, though I realize probably more difficult to obtain, if they could get a repository of edits made to these CAD diagrams and run performance tests on that series of operations.


Review 18

Part 1: Overview

This paper introduces the R-Tree, a dynamic index structure for spatial searching, for geo-data applications. Multi-dimensional indexing was hard for most database management systems at that time, as the boundaries had to be pre-defined. The R-Tree is good for dynamic, non-point, large-scale data. It represents data objects by intervals in several dimensions.

The major method here is to first address the problem in real application (geo data management) and then present a new algorithm and after that prove its upper and lower bound in terms of performance and finally do simulations to show its performance.

Part 2: Contributions

The main contribution is that the paper proposes a balanced tree structure that is good for dynamic indexing, where inserts and deletes can be intermixed with searches. Node splitting algorithms are also included to maintain a balanced tree for high search performance.

Various simulation settings are used in the performance test section. It shows that the linear splitting algorithm is fast and does not affect search performance noticeably.

Part 3: Possible drawbacks

There should also be simulations comparing the R-Tree with other proposed structures, such as K-D-B trees or grid structures. In the performance test section, all the figures compare only the three node splitting algorithms.

Some concepts should be defined more clearly. It is hard to quickly tell apart similar expressions like (I, tuple-identifier) and (I, child-pointer).



Review 19

This paper proposes a dynamic index structure called the R-tree for efficient handling of spatial data, which is required in geo-data applications and computer-aided design. Classical one-dimensional database indexing structures are not suitable for multi-dimensional spatial searching. Structures based on hashing are not useful because it is difficult to do range searches in multi-dimensional space with them. Structures such as B-trees do not work either, because they are designed on the assumption of a single-dimensional search space. Since these traditional indexing structures fail at handling multi-dimensional data, it is valuable to have an alternative indexing structure like the R-tree. The R-tree is a height-balanced tree, like the B-tree, with index records in its leaf nodes. It represents data objects by intervals in several dimensions. An index record stored in a leaf node has the form
(I, tuple-identifier)
where tuple-identifier is a pointer to the tuple in the database and I is an n-dimensional rectangle, the bounding box of the spatial object indexed:
I = (I0, I1, ..., In-1)
where n is the number of dimensions and Ii describes the extent of the object along dimension i.
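
To make the (I, tuple-identifier) form concrete, here is a small Python sketch that builds I for an object given by its vertices. The function name `bounding_box`, the triangle, and the identifier 42 are made up for illustration; the paper does not prescribe a representation, and each Ii is assumed here to be a closed (low, high) interval.

```python
def bounding_box(points):
    """Smallest axis-aligned rectangle I = (I0, ..., In-1) containing all points.

    Each Ii is returned as a closed interval (low, high) along dimension i.
    """
    by_dimension = zip(*points)  # regroup the coordinates by dimension
    return tuple((min(coords), max(coords)) for coords in by_dimension)

# A triangle in 2-D described by its vertices (illustrative data).
triangle = [(1.0, 2.0), (4.0, 6.0), (3.0, 1.0)]
I = bounding_box(triangle)

# Leaf index record: (I, tuple-identifier); 42 is a hypothetical identifier.
entry = (I, 42)
print(entry)
```

The same function works unchanged for any number of dimensions, since it only walks the coordinates one dimension at a time.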

The tree operations, including searching, inserting and deleting, are performed similarly to the corresponding B-tree operations. The difference is that the information stored in each leaf entry is multi-dimensional rather than a single key-value association.
One important technique described in the paper is node splitting. In order to add a new entry to a full node, it is necessary to divide the collection of entries between two nodes. An important observation made by the paper is that the node should be divided in a way that minimizes the probability of accessing both new nodes on subsequent searches. As the decision to visit a node depends on whether its covering rectangle overlaps the search area, the idea is to minimize the total area of the two covering rectangles after a split. The paper identifies several algorithms for efficient node splitting.
Generally, the algorithm is a good approach to handling the increasingly important geo-spatial data. In addition, the fact that it can be easily integrated with existing relational database systems like INGRES and System-R is a plus.
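
The splitting goal mentioned above (smallest total area of the two covering rectangles) can be sketched by brute force. This is only an illustration of the exhaustive idea under my own assumptions: the helper names, the minimum group size `m`, and the sample rectangles are hypothetical, and rectangles are tuples of (low, high) intervals.

```python
from itertools import combinations

def area(rect):
    """Area (volume in n dimensions) of a rectangle of (low, high) intervals."""
    result = 1.0
    for low, high in rect:
        result *= high - low
    return result

def cover(rects):
    """Smallest rectangle that encloses every rectangle in rects."""
    dims = range(len(rects[0]))
    return tuple((min(r[d][0] for r in rects), max(r[d][1] for r in rects))
                 for d in dims)

def exhaustive_split(rects, m):
    """Try every two-way grouping with at least m entries per group.

    Returns (total_area, group1, group2) with groups as index lists, or
    None when len(rects) < 2 * m. The cost is exponential in len(rects).
    """
    n = len(rects)
    best = None
    # Entry 0 is always placed in group1 so mirrored groupings are skipped.
    for k in range(m - 1, n - m):
        for rest in combinations(range(1, n), k):
            group1 = [0, *rest]
            group2 = [i for i in range(n) if i not in group1]
            total = (area(cover([rects[i] for i in group1]))
                     + area(cover([rects[i] for i in group2])))
            if best is None or total < best[0]:
                best = (total, group1, group2)
    return best

rects = [((0, 1), (0, 1)), ((1, 2), (0, 1)),
         ((8, 9), (8, 9)), ((9, 10), (8, 9))]
print(exhaustive_split(rects, m=2))
```

For these four rectangles the two nearby pairs end up together, because any other grouping forces a covering rectangle to span the whole diagonal.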


Review 20

R-tree, a dynamic index structure
In this paper, a tree structure and its algorithms are provided to make spatial searching faster, which is required in DBMSs designed for software like CAD. (My examples: eBay, Google Maps, craigslist, yelp...) One example of such a query would be "get all shops within 20 miles of this address".

Classic one dimensional data structures are not appropriate. The structure we use for this purpose should support range search and multidimensional search. R-tree is proposed in the paper. Corresponding algorithms and test results are also presented.

R stands for rectangular. The key idea of this structure is to group nearby objects into their bounding rectangle and keep aggregating rectangles into bigger sections. All those rectangles are multi-dimensional, and each node holds at most M entries. During search, one only needs to decide, for each rectangle, whether to search its subtree. The search algorithm is thus simple.

However, it is harder to keep the tree balanced and make the rectangles cover as little empty space as possible. For insertion, the paper proposes an algorithm in which, when a new entry could go into several rectangles, the one that needs the least enlargement is chosen. Splitting nodes on insertion is another key difficulty.
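
The least-enlargement rule for insertion can be sketched in a few lines of Python. This shows the choice made at a single node only, with hypothetical helper names and made-up rectangles; ties are broken by smaller area, which is how I read the paper's ChooseLeaf step.

```python
def area(rect):
    """Area (volume in n dimensions) of a rectangle of (low, high) intervals."""
    result = 1.0
    for low, high in rect:
        result *= high - low
    return result

def union(a, b):
    """Smallest rectangle that contains both a and b."""
    return tuple((min(lo1, lo2), max(hi1, hi2))
                 for (lo1, hi1), (lo2, hi2) in zip(a, b))

def enlargement(rect, new):
    """Extra area rect would need in order to also cover new."""
    return area(union(rect, new)) - area(rect)

def choose_entry(rects, new):
    """Index of the rectangle needing the least enlargement to include new.

    Ties are resolved in favour of the rectangle with the smaller area.
    """
    return min(range(len(rects)),
               key=lambda i: (enlargement(rects[i], new), area(rects[i])))

children = [((0, 4), (0, 4)), ((5, 10), (5, 10))]
print(choose_entry(children, ((4, 5), (1, 2))))  # index of the chosen child
```

Applied from the root downward, this greedy choice is what tends to keep similar rectangles under the same parent.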

This paper presents an effective data structure to accelerate spatial search. A lot of later work has improved the manipulation algorithms, building on this paper.



Review 21

This paper proposes an index structure called the R-tree, which works well for retrieval in multi-dimensional spaces. Traditional index methods, which rely on point locations, cannot represent objects in multi-dimensional space well. Subsequent structures failed because of memory allocation or storage types. The R-tree is an alternative structure that solves the location search problem.

Basically, the R-tree is a height-balanced tree. It stores pointers in its leaf nodes and represents its record entries as (I, tuple-identifier). The main search algorithm is based on spatial segmentation. The MBR (Minimal Bounding Rectangle) defines the space covered by each leaf node. A search for a data object starts at the root and then recursively descends to the leaves whose rectangles contain the shape. Insertion works together with the split operation when nodes overflow, similarly to the B+-tree operation. Deletion reverses the insertion operation and condenses the tree. Splitting happens when a node is full and a new entry needs to be inserted. The Exhaustive algorithm is obviously the most intuitive and the slowest one. The Quadratic-cost algorithm reduces the running time, but is not guaranteed to find the optimal solution. The Linear-cost algorithm is the same as the quadratic one except for how it picks seeds, which makes its cost linear in M.

The paper tests the three splitting algorithms. The Linear-cost algorithm works well with large pages for insertion and deletion. The Quadratic-cost algorithm has better performance with large M and big page sizes.

The R-tree structure is very effective and in widespread use. It has become the main index structure for searching shapes on websites. The paper has a convincing argument that database derived the data structure.

Figure 3.1(b) obscures the definition of the R-tree search algorithm. It confused me because each node has blank space that is not covered by its child nodes. Based on this figure, if an object lies in the blank space, the search algorithm will never find it.


Review 22

This paper explores a special indexing method for spatial databases since the traditional indexing methods are not well suited to data structures with multidimensional spaces. This dynamic structure called R-tree is explained and algorithms for searching and updating are specified as well.
The author defends this data structure R-tree with the idea that structures using one dimensional ordering of key values do not work because the search space in itself is multi-dimensional.
An R-tree is a height balanced tree similar to a B-tree with index records in its leaf nodes containing pointers to data objects.

Spatial databases consist of collections of tuples representing spatial objects. Each tuple has a unique identifier, tuple_identifier, and the bounding box of the spatial object, I = (I0, I1, ..., In-1), where either endpoint of an interval Ii may be infinite, indicating that the object extends outward indefinitely.

The search algorithm descends the tree from the root in a manner similar to a B-tree. The update algorithm maintains the tree in a form that allows the search algorithm to eliminate irrelevant regions of the indexed space and examine only data near the search area. This algorithm also seems to use dimensional spaces to store the indexes as well in order to search it efficiently. The algorithm is dynamic because with every insertion and deletion it checks if the nodes can be optimized by splitting or by joining under-full nodes together where the area will be increased least right until the root.

Since searching all groupings for the least wasted area when splitting a node is too costly, the author suggests a quadratic-cost algorithm that picks seeds by calculating the difference between the area of the rectangle enclosing a pair of entries and the areas of the entries themselves. The linear-cost algorithm is linear in M (the maximum number of entries per node); however, its search results were almost identical to those of the quadratic split.
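
The area difference used by the quadratic-cost algorithm can be written out for its seed-picking step. For a pair of entries, the waste is the area of the rectangle enclosing both minus the two individual areas, and the pair with the largest waste becomes the seeds. The Python below is a sketch with hypothetical names and sample data, covering only seed selection and not the full split.

```python
from itertools import combinations

def area(rect):
    """Area (volume in n dimensions) of a rectangle of (low, high) intervals."""
    result = 1.0
    for low, high in rect:
        result *= high - low
    return result

def union(a, b):
    """Smallest rectangle that contains both a and b."""
    return tuple((min(lo1, lo2), max(hi1, hi2))
                 for (lo1, hi1), (lo2, hi2) in zip(a, b))

def pick_seeds_quadratic(rects):
    """Return the index pair (i, j) that would waste the most area if grouped.

    Examines every pair, so the cost grows quadratically with len(rects).
    """
    def waste(pair):
        i, j = pair
        return area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])
    return max(combinations(range(len(rects)), 2), key=waste)

rects = [((0, 1), (0, 1)), ((1, 2), (1, 2)),
         ((9, 10), (9, 10)), ((0, 1), (9, 10))]
print(pick_seeds_quadratic(rects))
```

The two seeds start the two groups; the remaining entries are then assigned one at a time, which is where the greedy and non-optimal behaviour comes from.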

Search was insensitive to use of different node split algorithms and fill requirements so as a search structure, this structure succeeds for multidimensional database index searching. However, posing stricter node-fill requirements resulted in the nodes becoming under-full a lot more often and with more entries inserted.

More experimental data with a lot more ranges of values could have provided a better in-depth understanding of the performance of R-tree in terms of spatial data. It would also have been beneficial if the authors could have given more examples in terms of spatial data in the database. In summary, this algorithm definitely seems to be an effective way of being able to visualize, index and access multidimensional data a lot more intuitively.



Review 23

This paper introduces a new data structure called R-tree for DBMS to support search in spatial data. It is a height-balanced index tree, which is similar to b+tree. But each Node is representing a geometric space, with attributes specifying the boundaries of the geometric space and pointers to subnode which resides in the geometric space of its parent. Each node carries pointers to more than m subnode(m
According to the test data, this algorithm provides great performance in spatial data manipulation, and the paper believes that this structure should be easy to implement in a relational database and would work well in conjunction with abstract data types.

I think this is the first algorithm that efficiently manages spatial data in a scalable way, and it brings more functionality to the relational database. One concern about this algorithm is that the entries under the same parent node can intersect: if all of them overlap, would this affect the efficiency of the algorithm?



Review 24

As the traditional one-dimensional indexing methods are not well suited to data objects of non-zero size located in multi-dimensional spaces, the author introduces a dynamic index structure, the R-tree. The author also provides algorithms for searching and updating it.

First, the author introduces the idea of the R-tree, a balanced tree in which every node represents a rectangle in space and each parent node’s rectangle contains all the rectangles represented by its child nodes. The leaf nodes contain index record entries. This structure is designed to visit fewer nodes when searching multidimensional data.

Then the author talks about the algorithms for the R-tree, like searching, deletion and insertion, which I think are somewhat similar to those of the B-tree. As insertion runs into the problem of node splitting, the author then introduces three algorithms for it. The idea of the splitting algorithms is to minimize the total area of the two new rectangles.
(1) Exhaustive Algorithm: Brute-force way to split.
(2) Quadratic-cost Algorithm: Not guaranteed to find the best split; each step of the algorithm makes a greedy choice.
(3) Linear-cost Algorithm: Chooses as seeds the pair with the greatest normalized separation along any dimension.
Later, the author implements R-trees using these three splitting algorithms with different page sizes and page numbers for inserting, deleting, searching, etc. From the experiments we can conclude that the Linear-cost algorithm has the best quality-cost ratio for splitting.
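
A sketch of the "greatest normalized separation" rule from item (3) may help. For each dimension it takes the rectangle with the highest low side and the one with the lowest high side, divides their separation by the width of the whole set along that dimension, and keeps the best dimension. The function name, return shape, and data below are my own assumptions, and degenerate cases are simply skipped.

```python
def pick_seeds_linear(rects):
    """Pick two seed entries in one pass per dimension.

    Returns (separation, i, j) for the best dimension, or None if every
    dimension is degenerate (zero width, or one rectangle is both extremes).
    """
    n = len(rects)
    best = None
    for d in range(len(rects[0])):
        # Entry whose rectangle has the highest low side along dimension d.
        highest_low = max(range(n), key=lambda i: rects[i][d][0])
        # Entry whose rectangle has the lowest high side along dimension d.
        lowest_high = min(range(n), key=lambda i: rects[i][d][1])
        width = max(r[d][1] for r in rects) - min(r[d][0] for r in rects)
        if width == 0 or highest_low == lowest_high:
            continue  # simplification for this sketch
        separation = (rects[highest_low][d][0] - rects[lowest_high][d][1]) / width
        if best is None or separation > best[0]:
            best = (separation, lowest_high, highest_low)
    return best

rects = [((0, 2), (0, 1)), ((3, 5), (0, 1)), ((8, 10), (0.5, 1.5))]
separation, first, second = pick_seeds_linear(rects)
print(first, second, round(separation, 3))
```

Unlike the quadratic version, no pair of entries is ever compared directly, which is why the cost stays linear in the number of entries and in the number of dimensions.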

As the paper's goal is to introduce the R-tree for multi-dimensional indexing, the author doesn’t provide an experimental comparison with widely used indexes (e.g. the B-tree). From the paper, the tricky part of implementing the R-tree is reducing overlap and making the rectangles as small as possible, so I think more algorithms besides splitting should be designed to achieve this goal.


Review 25

The paper presents a data structure known as an R-tree. The purpose of the data structure is searching multi-dimensional data. The canonical example used is map data. Users will want to search for objects or points in space. Objects are represented as "boxes" in space (ranges in K dimensions). Updating the location of an object in an R-Tree requires removal of the corresponding entry and re-insertion with the updated data. For splitting nodes, the authors develop three schemes: an exhaustive algorithm, a quadratic-cost algorithm, and a linear-cost algorithm. The exhaustive algorithm is found to have bad scaling properties. When comparing the quadratic and linear-cost algorithms, the linear-cost algorithm is shown to have better overall performance properties. The authors close by asserting that the R-Tree is useful for storing multi-dimensional spatial datasets that require searching.

Overall, I found the paper to be compelling. However, I was hoping that there would be a comparison against other indexing methods. The authors make arguments for the strengths of the R-Tree but without a performance comparison they expect the user to believe this on trust.


Review 26

This paper proposes a new type of indexing for spatial data on relational databases. Spatial data is most commonly used in CAD programs and geolocation programs. This new type of indexing uses R-trees and can revolutionize the industry because of the following points:

1. There was no solution before R-trees to efficiently index spatial data. In order to index spatial data before R-trees, one would have to index each dimension with a B-tree index and do multiple checks. Some queries would also be impossible to do efficiently.

2. R-tree indexing can be added to any relational database system without rewriting any other part of the system. In other words, it does not cost companies much to include R-tree indexing and the benefits are non-trivial, so companies will be willing to adopt it. This is crucial because many new concepts are not implemented in industry because they are not cheap and practical, but this idea is.

The main idea and concept behind R-trees is very similar to that behind B-trees. Each child node is contained spatially within its parent node. Where it differs, though, is that each node can have multiple pointers to different child nodes, with at most M pointers in each node. Using this structure, we can efficiently search, insert, and delete items.

However, there is still one limitation with R-trees: the update function takes a delete and another insert to complete. This is also true with B-trees, but unlike B-trees, the insert and delete take longer in R-trees. This will be a problem if update will be used very regularly in the program. Unlike trees, heaps can have very efficient updates, such as Fibonacci heaps, but very poor searching. If there is a way to combine the two data structures to improve the update, programs with tons of updates will run a lot more efficiently.



Review 27

This paper introduces R-Trees as an efficient means of indexing spatial and multidimensional data. While traditional database structures such as B-Trees and ISAM indexes are very efficient means of storing data that can be sorted numerically, lexicographically, or by some other one-dimensional comparison, they are not well suited for multidimensional comparisons. Therefore, queries that were intended to find all objects that overlap a certain area of land or that contain a specific point were exceedingly inefficient when run against databases using the aforementioned indexing structures. This paper suggests the R-Tree as an efficient solution to this problem.

An R-Tree is similar in structure to a B-Tree, but each non-leaf node entry is represented as (I, child-pointer), where I represents the n-dimensional rectangle that contains all the bounding rectangles associated with the nodes in its subtree. Each leaf node entry contains I, which is the n-dimensional bounding rectangle for the object defined in the associated record, and a tuple-identifier that points to a tuple in the database.

This paper presents a very strong argument. It begins by providing references to past attempts to solve the problem of indexing multidimensional data, indicating the faults of each previous implementation. It then proceeds to describe algorithms for inserting, deleting, and searching for records stored in an R-Tree. These algorithms are elegant and are quite similar to those used for managing the records in a B-Tree. The paper then describes three different algorithms for splitting R-Tree nodes. Test data showing the relative efficiency of each splitting algorithm across a range of page sizes, record quantities, and node filling requirements are presented as evidence for the efficiency of R-Trees.

The primary fault that I found with this paper was that, despite its wealth of graphs and test data, it failed to provide a direct comparison between R-Trees and older indexing methods. The first section of the paper is devoted to decrying the inefficiencies of prior indexing structures, yet the author provides no test data to demonstrate the superiority of R-Trees, asking us to take his conclusion for granted.


Review 28

This paper proposes a dynamic index structure, R-Trees and its related search, insertion and deletion algorithms. Traditional indexing methods, such as B-trees and ISAM indexes, are not well suited to data in multi-dimensional space. Therefore, the paper introduces the structure of R-tree and its algorithms.

Leaf nodes in an R-tree contain index record entries of the form (I, tuple-identifier). The R-tree indexing method can support efficient search, insertion, deletion and updating algorithms.

The search algorithm does not guarantee good worst-case performance because more than one subtree under a visited node may need to be searched. However, in most cases, the tree will be maintained in a form that can eliminate irrelevant search regions. Insertion and deletion are similar to those of the B-tree. The paper compares the performance of the exhaustive, quadratic and linear algorithms. The linear node-split algorithm proved to be efficient in several test results.

The R-tree indexing method is useful for data in multi-dimensional space, which is not supported by other traditional indexing methods. Most importantly, it would be easy to add R-trees to any relational database systems. For the above reasons, the paper proposes the R-tree indexing methods.



Review 29

This paper describes a structure called an R-Tree and describes algorithms for searching and updating it. The purpose of the R-Tree is to contain spatial data in databases and provide an efficient way to access and query it. This type of data is extremely important in computer aided design as much of that is saying do x with anything in this spatial range y.

Searching an R-Tree is simple as each child is contained entirely inside of its parent’s n dimensional range. So just analyze the nodes at the leaf and see if they are inside what you need. However, more than one node in a subtree might need to be visited so a good worst-case can not be guaranteed.

Insertion is similar to that of a B-Tree: new entries are inserted into the leaves, and if a page fills up, the split propagates up a level and creates new pages that aren’t full. On insertion, if a node needs to be split (its page filled up), the entries are divided between two covering rectangles in a way that tries to minimize the sum of their areas. This algorithm runs in quadratic time relative to M, which is the maximum number of entries that will fit in a node. There is also an algorithm with linear cost in M; however, neither is guaranteed to find the smallest possible area.

Deletion works in the opposite way of insertion in that it will remove a node and condense the tree if need be. The condensation will need to do the reverse of the splitting.

The paper tested the performance of the structure and algorithms and found that for insertion the splitting of nodes did not contribute much to the run time (which was good). This was determined by CPU time hardly increasing as page size changed. For deletions, the cost was largely influenced by the minimum requirement for how full a node must be. The larger the requirement, the more often condensing and splitting was done (thus increasing run time), but the lower it is, the more memory might be wasted.

In summary the paper was successful in demonstrating the abilities of an R-Tree and convincing that it should be added to databases as it is very helpful for spatial data objects.

One change I would have liked to see in this paper is a little more explanation in section 3.5.3 (A Linear-Cost Algorithm). Taking an algorithm down from quadratic to linear cost is significant, and I didn’t fully understand how that was happening; a more in-depth explanation would have helped.



Review 30

This paper attempts to address a need for an efficient data structure for the purpose of storing and searching spatial data. In addition to “traditional” data types being stored, geolocation data and other data indicative of spatiality are being used more and more, and current methods of indexing, searching, and updating are not conducive to multi-dimensional (e.g. spatial) data. Many initial data structures proposed for handling multidimensional data are either inefficient or slow, so the self-balancing R-tree is shown to be more efficient and less costly by comparison.

The cleverness of the algorithm lies in the intuition of the 2-D ordering of objects. Instead of using some sort of direct hashing or indexing, the query element is compared to each node to see if there is overlap between the n-dimensional structures. An insightful figure is shown where each node is represented as a 2-dimensional rectangle, and each parent node’s rectangle either wholly or partially contains the rectangles corresponding to the child nodes. Many of the operations of the R-tree are similar to that of a B-tree’s structure, except they are tweaked for the comparison of multidimensional hyperrectangles. The ability to compute overlap efficiently is exploited in order to traverse the tree as needed for search, insertion, etc. Empirical tests were conducted to evaluate the comparative performance of their different algorithms on cycle costs.

One of the main drawbacks of the paper in our current context is that the sample data test size and scales for performance are nowhere near what would be standard in modern computing. For example, the number of rectangles considered is on the order of thousands, and the difference in storage space efficiency is roughly ten megabytes, which is practically negligible by today’s computing standards. It is within the realm of possibility that the implementation and architecture of the machines being used for analysis affected the results of the studies. It would be interesting to see a comparison of this structure against other structures on modern devices with larger datasets, or, on a different note, to see whether any of the methods are parallelizable and what effect that has on efficiency.


Review 31

This paper talks about the R-tree, a dynamic index structure which represents data objects by intervals in several dimensions. The discussion of the R-tree cannot be separated from the need for an index structure that accommodates multi-dimensional spatial searching. This paper discusses the R-tree’s structure and algorithms as well as the results of the R-tree performance tests. The structure of the R-tree is not that different from the B-tree, with the exception that the R-tree can accommodate spatial data as spatial objects. For example, in a traditional B-tree, 2-dimensional spatial data would be stored as 2 separate attributes, but in an R-tree it would be stored as one object that consists of 2 attributes. Further, this paper explains the R-tree algorithms for searching, inserting, deleting, and updating, along with more detailed algorithms for each (i.e.: ChooseLeaf, AdjustTree, CondenseTree, etc). Next, the paper explains node splitting using three types of algorithms: Exhaustive, Quadratic-Cost, and Linear-Cost. Last, the results of the performance tests give a rough proof of the effectiveness of each node splitting algorithm.

This paper provides a solution to the spatial data problem that was faced at that time. Using the R-tree, it is possible to search using a range as the predicate without all the hassles of 1-dimensional data structures. Indeed, the R-tree has proven to be useful for indexing spatial data objects and paved the way for more advanced improvements such as the R*-tree. In addition to that, this paper also mentions that the R-tree would be easy to add to any relational database system that supports conventional access methods.

Another important thing is the node-splitting algorithm. Node splitting is crucial in an R-tree because the tree needs to keep its nodes from overlapping as far as possible (the division should be done in a way that makes it as unlikely as possible that both new nodes will need to be examined on subsequent searches), which is also why the performance test focuses on the three node-splitting algorithms proposed earlier. The paper suggests the Linear-Cost algorithm as the best alternative for node splitting, since it is fast and does not noticeably affect search performance despite the lower quality of the split.
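To make the linear-cost idea concrete, here is a minimal Python sketch of a seed-picking step in the spirit of the paper's linear split: along each dimension, take the rectangle with the highest low side and the one with the lowest high side, normalize their separation by the total width of the set along that dimension, and keep the most separated pair. The `Rect` layout, the name `linear_pick_seeds`, and the handling of degenerate dimensions are assumptions made for illustration, not code from the paper.

```python
from typing import List, Tuple

# Assumed layout: a rectangle is (low corner, high corner), one value per dimension.
Rect = Tuple[Tuple[float, ...], Tuple[float, ...]]


def linear_pick_seeds(rects: List[Rect]) -> Tuple[int, int]:
    """Return the indices of two seed rectangles (needs at least two rectangles)."""
    dims = len(rects[0][0])
    best_sep = float("-inf")
    best_pair = (0, 1)  # fallback if every dimension is degenerate
    for d in range(dims):
        # Rectangle whose low side is highest, and rectangle whose high side is lowest.
        hi_low = max(range(len(rects)), key=lambda i: rects[i][0][d])
        lo_high = min(range(len(rects)), key=lambda i: rects[i][1][d])
        if hi_low == lo_high:
            continue  # the same rectangle cannot be both seeds
        width = max(r[1][d] for r in rects) - min(r[0][d] for r in rects)
        if width <= 0:
            continue  # all rectangles collapse to a point along this dimension
        # Separation normalized by the extent of the whole set along d.
        sep = (rects[hi_low][0][d] - rects[lo_high][1][d]) / width
        if sep > best_sep:
            best_sep = sep
            best_pair = (lo_high, hi_low)
    return best_pair


sample = [((0, 0), (1, 1)), ((0.5, 0), (1.5, 1)), ((9, 0), (10, 1)), ((4, 0.2), (5, 0.8))]
seeds = linear_pick_seeds(sample)
```

Each dimension is scanned a constant number of times, so the cost grows linearly with the number of rectangles, which is the trade the reviewer describes: a cheap choice of seeds that may be worse than what a pairwise comparison would find.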

However, the performance test uses three different M and m values. I can see that the value of m affects search performance. If so, how do we determine the optimal M and m values? Another issue is that the Linear-Cost algorithm, although fast, produces a worse node structure from the split. The performance test uses Circuit Cell CENTRAL, which consists of 1057 rectangles, but I assume it was static data. How does the Linear-Cost algorithm perform on dynamic data?



Review 32

The problem addressed in this paper is that, before its publication, there was no efficient way to handle multidimensional data for database queries. The paper runs through other structures that had been used to represent multidimensional data and rules them out one by one.

This paper is the seminal paper on R-trees. Its main purpose is to introduce a new data structure that allows for the efficient indexing and search of multidimensional (especially spatial) objects such as geographical coordinates or rectangles.

The contributions of the paper are a new R-tree structure and its accompanying basic algorithms. The paper presents algorithms for basic search, insertion, and deletion and thus basic manipulation of the R-trees. The author also mentions possible extensions of the search algorithm which could be useful for certain types of database queries.

The paper presents some empirical evaluation of the various components of the algorithms to identify the performance bottlenecks as well as to tune some of the parameters (e.g. m, the minimal branching factor). It also presents empirical evidence to accompany the basic theoretical evaluation of the algorithms. This is a good justification of the performance of the algorithms on real-world systems. The author presents the design specifications of the system as well as assumptions that are made in the evaluation stage very clearly.

I found there to be a few drawbacks in this paper. While I do think that the graphs presented are explained and the trends are justified, there are several design decisions in the algorithm sections that are presented without explanation or justification. One example is how the QuadraticSplit algorithm breaks ties between entries. There seems to be an arbitrary order of tie-breaks, and there is no discussion of these design choices in the body of the paper. The paper additionally presents both quadratic-cost and linear-cost algorithms for picking seeds. However, neither of these design decisions is justified before it is presented, though there is some discussion of the performance in the results section. Overall, I found the discussion of the algorithmic design choices in the paper to be lacking.



Review 33

This paper proposes a dynamic index structure called the R-Tree that can handle spatial data efficiently. Spatial data, especially multi-dimensional data, was poorly supported prior to the R-Tree, but has some very important applications such as geo-data or CAD. The R-Tree addresses this issue by using a structure similar to a B-Tree, but each entry in a non-leaf node stores not only a pointer to its child but also an n-dimensional rectangle, where each dimension holds the closed bounded interval describing the extent of the contained objects along that dimension. The paper also describes the algorithms for R-Tree operations, including searching, insertion, deletion, updates and node splitting. For node splitting, the paper describes an exhaustive algorithm, a quadratic algorithm and a linear algorithm, with several experiments that show their time and space costs.
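The per-dimension closed intervals mentioned above make the basic overlap test simple. A minimal Python sketch, assuming a rectangle is represented as a sequence of `(low, high)` pairs (the `Interval` alias and the function name `overlaps` are illustrative choices, not taken from the paper):

```python
from typing import Sequence, Tuple

# Assumed representation: one closed interval [low, high] per dimension.
Interval = Tuple[float, float]


def overlaps(a: Sequence[Interval], b: Sequence[Interval]) -> bool:
    """Two n-dimensional rectangles overlap iff their intervals intersect in every dimension."""
    return all(a_lo <= b_hi and b_lo <= a_hi
               for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))
```

Because the intervals are closed, rectangles that only touch along an edge count as overlapping in this sketch.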

Strengths:
1. This paper proposes an innovative index structure, the R-Tree, which addresses the inefficiency of searching spatial data.
2. This paper gives step-by-step descriptions of the algorithms for R-Tree operations. This helps readers understand the R-Tree more effectively.
3. This paper carries out several experiments on the performance of its different node-splitting algorithms, with results shown in graphs, which is both convincing and comprehensible.

Weaknesses:
1. Though this paper provides several comparisons among its proposed algorithms, it does not provide any comparison between the R-Tree and other index structures. I believe it would be more convincing if the paper provided results showing the improvement of the R-Tree over previous index structures on spatial data.
2. Though the algorithms of the R-Tree operations are detailed and clear, it would be better if they were in pseudocode.



Review 34

The paper proposes the R-tree, a dynamic index structure for spatial data objects that have non-zero size. It achieves fast spatial searching and updating in multi-dimensional spaces.

Traditional one-dimensional index structures are not suitable for multi-dimensional search. Structures based on exact matching cannot support range searching. Those based on one-dimensional ordering, such as the B+-tree, do not work in higher dimensions.

The R-tree is a height-balanced tree similar to a B-tree, with index records in its leaf nodes containing pointers to data objects. Instead of storing a one-dimensional key as in a B-tree, the R-tree uses an n-dimensional rectangle as its index key. A leaf entry uses the smallest rectangle that contains its spatial object as the key. Each internal node stores at most M (I, child_pointer) pairs, in which the key I is the smallest rectangle that contains all keys in the child, called the covering rectangle.
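The node layout described above can be sketched in Python as follows. This is only an illustration under assumed names (`Rect`, `Entry`, `Node`, `cover`); the paper does not prescribe this representation, and the limit of M entries per node is not enforced here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed layout: a rectangle is (low corner, high corner), one value per dimension.
Rect = Tuple[Tuple[float, ...], Tuple[float, ...]]


def cover(rects: List[Rect]) -> Rect:
    """Smallest rectangle that contains every rectangle in a non-empty list."""
    dims = range(len(rects[0][0]))
    lows = tuple(min(r[0][d] for r in rects) for d in dims)
    highs = tuple(max(r[1][d] for r in rects) for d in dims)
    return lows, highs


@dataclass
class Entry:
    rect: Rect     # the key I
    child: object  # a Node in internal entries, a data object id in leaf entries


@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)

    def covering_rect(self) -> Rect:
        """Key that the parent entry pointing at this node would store."""
        return cover([e.rect for e in self.entries])


leaf = Node(True, [Entry(((0, 0), (1, 1)), "a"), Entry(((2, 3), (4, 5)), "b")])
parent = Node(False, [Entry(leaf.covering_rect(), leaf)])
```

Here the parent entry for `leaf` carries the covering rectangle of the two leaf keys, which is the invariant the search relies on.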

To find the entries whose rectangles overlap a search rectangle S, the search starts from the root and recursively descends into every node whose rectangle overlaps S, until leaf nodes are reached. Insertion and deletion are handled similarly to a B-tree, by splitting nodes and condensing the tree. An update is done by deleting the entry, updating it and re-inserting it. An important step when splitting a node is dividing its rectangles into two groups. For best performance, the total area of the two covering rectangles after the split should be minimized. The exhaustive algorithm finds the minimum area with time complexity O(2^M). There are also two approximate algorithms that take quadratic and linear time.
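The recursive search can be sketched in a few lines of Python. This is a minimal illustration, assuming nodes are stored in memory and using hypothetical names (`Node`, `overlaps`, `search`); it is not the paper's code and ignores disk pages.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

# Assumed layout: a rectangle is (low corner, high corner), one value per dimension.
Rect = Tuple[Tuple[float, ...], Tuple[float, ...]]


@dataclass
class Node:
    is_leaf: bool
    # Each entry is (rectangle, child); child is a Node, or an object id in a leaf.
    entries: List[Tuple[Rect, Any]] = field(default_factory=list)


def overlaps(a: Rect, b: Rect) -> bool:
    """Closed rectangles overlap iff their extents intersect in every dimension."""
    return all(al <= bh and bl <= ah
               for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1]))


def search(node: Node, query: Rect) -> List[Any]:
    """Collect the ids of all leaf entries whose rectangle overlaps the query."""
    found: List[Any] = []
    for rect, child in node.entries:
        if not overlaps(rect, query):
            continue  # prune this subtree or entry
        if node.is_leaf:
            found.append(child)
        else:
            found.extend(search(child, query))
    return found


leaf1 = Node(True, [(((0, 0), (1, 1)), "a"), (((2, 2), (3, 3)), "b")])
leaf2 = Node(True, [(((8, 8), (9, 9)), "c")])
root = Node(False, [(((0, 0), (3, 3)), leaf1), (((8, 8), (9, 9)), leaf2)])
hits = search(root, ((0.5, 0.5), (2.5, 2.5)))
```

In this example the query overlaps only the first covering rectangle, so `leaf2` is never visited. When several covering rectangles overlap the query, every matching subtree has to be descended, which is why more than one path from the root may be followed.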

The evaluation showed that an R-tree configured with a reasonable disk page size of 1024 bytes and M=50 produces good performance. The approximate linear-time split algorithm performs quite well compared to the more expensive exhaustive algorithm. Overall, the investigation suggested that the R-tree would be suitable for spatial database systems.