This paper introduces skip lists, a data structure to use in place of balanced trees. Balanced trees are widely used, but they have drawbacks. If a plain binary tree is used, some input sequences (such as sorted insertion) give very poor performance; if a self-balancing tree is used, it incurs amortized but non-negligible rebalancing overhead. Skip lists, as a probabilistic alternative to balanced trees, aim to provide an easy-to-implement data structure with better performance than balanced trees. The paper first introduces skip lists and their search/insert/delete algorithms. The main idea of skip lists is the use of random levels for nodes: a node's level is generated by repeated coin flips so that a node reaches level n with probability approximately p^(n-1). The paper then analyzes the search complexity of a skip list and proves that the expected cost is O(log n). Compared with balanced trees, whose complexity is also O(log n), skip lists are easier to implement and have lower constant factors. Compared with plain binary trees, whose worst-case complexity can be O(n), skip lists are more suitable for on-line query processing since they provide more robust performance. The author acknowledges that skip lists have bad worst cases, but argues that their probability is so low that they can be neglected. Overall, this paper has a good structure: it first introduces how skip lists work, then analyzes their performance, and finally compares them with trees. The part I don't like about this paper is the analysis of the expected search cost, since it doesn't explain why the analysis traverses the search path backwards, which makes it confusing to read. The part I like is the author's claim that the idea of probabilistic balancing could be adapted to other problems in data structures; that idea is interesting. |
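To make the coin-flip level generation concrete, here is a minimal Python sketch (the function name, the cap of 16 levels, and p = 1/2 are my own illustrative choices, not the paper's pseudocode):

```python
import random

def random_level(p=0.5, max_level=16):
    """Flip a biased coin until it comes up tails: a node reaches
    level n with probability roughly p**(n-1), capped at max_level."""
    level = 1
    while random.random() < p and level < max_level:
        level += 1
    return level
```

With p = 1/2, about half the nodes end up at level 1, a quarter at level 2, and so on, which is what keeps the expected search cost logarithmic.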
Binary trees are widely used for representing abstract data types. However, they are not efficient under certain operation sequences, such as inserting elements in order, which produce degenerate data structures. The paper aims to introduce skip lists, a probabilistic alternative to balanced trees. After a brief overview, the paper presents the details of the skip list algorithms. Then alternative data structures such as AVL trees and self-adjusting trees are compared with skip lists in several aspects, including implementation difficulty, constant factors, performance bounds, and non-uniform query distributions. Some of the strengths of the paper are: 1. Skip lists are easier to implement than other algorithms, including balanced tree and self-adjusting tree algorithms. 2. Skip list algorithms are faster, having a smaller constant factor than algorithms with the same big-O complexity. 3. The paper uses many well-made figures to explain the algorithms clearly. Some of the drawbacks of the paper are: 1. The paper's structure is weak: there are no well-organized section titles and numbers, and the opening section does not walk through the logic of the following sections. 2. Skip lists do not achieve anything significantly better than balanced trees. 3. Skip lists have bad worst-case complexity. The paper states that no input sequence consistently produces the worst-case performance, but it does not provide enough support for that argument. |
Balanced tree algorithms maintain certain balance conditions and assure good performance, but their implementations are strict and complicated, and insertion and deletion are consequently quite slow. Therefore, this paper proposes skip lists, which are balanced by consulting a random number generator. Skip lists are linked lists with extra pointers that skip over intermediate nodes, so insertions and deletions require only local modifications. Hence the level of a node, chosen randomly when the node is inserted, need never change. Skip list algorithms consist of Search, Insert, and Delete. The Search operation returns the value associated with the desired key, or failure if the key is not present. The Insert operation associates a specified key with a new value (inserting the key if it is not already present). The Delete operation deletes the specified key. Additional operations such as "find the minimum key" or "find the next key" can also be supported easily. Besides, concurrent skip list algorithms are much simpler than concurrent balanced tree algorithms: they allow an unlimited number of readers and n busy writers in a skip list of n elements with very little lock contention. Using skip lists, it is easy to do most of the operations one would do with a balanced tree, such as using search fingers, merging skip lists, and supporting ranking operations. The advantages of skip lists are as follows: 1. Skip lists are a simple data structure and a more natural representation than trees. 2. Skip list algorithms are easy to implement, extend, and modify. 3. Skip lists are space-efficient. 4. Skip lists are about as fast as highly optimized balanced tree algorithms and substantially faster than casually implemented balanced tree algorithms.
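As a concrete illustration of the three operations, here is a compact Python sketch of a skip list (my own simplified rendering of the idea, using None in place of the paper's NIL sentinel and p = 1/2; it is not the paper's pseudocode):

```python
import random

class Node:
    def __init__(self, key, value, level):
        self.key = key
        self.value = value
        self.forward = [None] * level  # forward[i] -> next node at level i+1

class SkipList:
    MAX_LEVEL = 16
    P = 0.5

    def __init__(self):
        self.level = 1
        # Header has forward pointers at every level; None plays the role of NIL.
        self.header = Node(None, None, self.MAX_LEVEL)

    def _random_level(self):
        lvl = 1
        while random.random() < self.P and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def search(self, key):
        x = self.header
        for i in range(self.level - 1, -1, -1):  # top level down, never overshoot
            while x.forward[i] is not None and x.forward[i].key < key:
                x = x.forward[i]
        x = x.forward[0]
        return x.value if x is not None and x.key == key else None

    def insert(self, key, value):
        update = [self.header] * self.MAX_LEVEL  # rightmost node seen per level
        x = self.header
        for i in range(self.level - 1, -1, -1):
            while x.forward[i] is not None and x.forward[i].key < key:
                x = x.forward[i]
            update[i] = x
        x = x.forward[0]
        if x is not None and x.key == key:
            x.value = value  # key already present: overwrite
            return
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        node = Node(key, value, lvl)
        for i in range(lvl):  # splice in with purely local pointer updates
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def delete(self, key):
        update = [None] * self.MAX_LEVEL
        x = self.header
        for i in range(self.level - 1, -1, -1):
            while x.forward[i] is not None and x.forward[i].key < key:
                x = x.forward[i]
            update[i] = x
        x = x.forward[0]
        if x is not None and x.key == key:
            for i in range(len(x.forward)):  # unlink at every level it occupies
                if update[i].forward[i] is x:
                    update[i].forward[i] = x.forward[i]
```

Note how insert and delete touch only the nodes recorded in `update`, which is the locality property the concurrency argument rests on.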
The main contribution of this paper is that it proposes a new data structure with performance similar to the optimal solution, the balanced tree, but with much easier and simpler implementations and algorithms; skip lists are also more amenable to concurrent access and modification. The most frequently used implementation of a binary search tree is the red-black tree, and the concurrency problems arise because the tree often needs to rebalance when it is modified. The rebalance operation can affect large portions of the tree, which would require a mutex lock on many of the tree nodes. Inserting a node into a skip list is far more localized: only the nodes directly linked to the affected node need to be locked. The main drawbacks of skip lists are as follows: 1. Skip lists take more space than a balanced tree. 2. Skip lists are not cache friendly, because they do not optimize locality of reference, i.e., related elements tend to be far apart and not on the same page. |
Problems & Motivations: The B-tree is a ubiquitous data structure in databases and systems, and it is becoming ever more so. The reason tree indexes are so prevalent is that they efficiently support range queries and enable sort-based query execution algorithms, such as merge join, without an explicit sort operation. However, if a user inserts ordered data in sequence, an unbalanced binary search tree degenerates into a list and gives bad performance. Therefore, tree balancing is needed for this situation. Main Achievement: The author proposes the skip list, a data structure that achieves the effect of tree balancing probabilistically. Each element is represented by a node whose level is chosen randomly when the node is inserted, without regard for the number of elements in the data structure. It supports search, insertion, deletion, and random level choice. The main per-node data structure is the forward-pointer array. Drawbacks: There is no truly random function in computing. Therefore, if the random number generator is known to an adversary, that adversary could generate test cases that result in bad performance. |
Binary trees are a widely used data structure employed to represent dictionaries and ordered lists, among other abstract data types. While they work well in most cases, some insertion sequences, such as inserting in order, result in degenerate data structures that give very poor performance. Balancing the binary tree is the common answer to this drawback, but it introduces additional overhead. The paper "Skip Lists: A Probabilistic Alternative to Balanced Trees" proposes an alternative, probabilistically based data structure, skip lists, that is easier to implement and provides significant performance and space usage improvements over balanced tree implementations. While there are situations in which skip lists perform poorly, the author claims that they are rare and outweighed by the benefits the structure brings. The paper begins by describing the implementation of the various skip list methods. A skip list is composed of nodes linked by pointers in linked-list fashion, with each node occupying one or more levels. Each level of the list is terminated by an element NIL whose key is beyond the bounds of any legal key. To search for a value, we start on the highest level and traverse pointers until the element is found or the key is exceeded; in the latter case, we then try the next level below. One key feature is that when going down a level, we continue from the same node rather than going back to the first node of that level. This is repeated until the element is found or no more levels are left to search. To insert data, a search is done to find the position where the preceding key is less than the inserted node's key and the following key is greater than or equal to it, after which the element is spliced in. Deletion is done by searching for the value and unlinking the node once it is found.
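The "stay at the same node when dropping a level" detail is easiest to see in code. Below is a Python sketch of the search over a small hand-built three-level list (the keys, levels, and names are arbitrary examples of mine, not from the paper):

```python
class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # forward[i] -> next node at level i+1

# Hand-built 3-level list over keys 3 -> 6 -> 7 -> 9 -> 12;
# nodes 6 and 9 are taller, so upper levels skip ahead.
head = Node(float('-inf'), 3)
n3, n6, n7, n9, n12 = Node(3, 1), Node(6, 3), Node(7, 1), Node(9, 2), Node(12, 1)
head.forward = [n3, n6, n6]
n3.forward = [n6]
n6.forward = [n7, n9, None]
n7.forward = [n9]
n9.forward = [n12, None]
n12.forward = [None]

def search(head, key, levels=3):
    x = head
    for i in range(levels - 1, -1, -1):
        # Traverse forward at level i without overshooting the key,
        # then drop a level FROM x rather than restarting at head.
        while x.forward[i] is not None and x.forward[i].key < key:
            x = x.forward[i]
    x = x.forward[0]
    return x if x is not None and x.key == key else None
```

Searching for 9 here touches only head, 6, 7, and 9, instead of walking all five bottom-level nodes.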
Another important feature of skip lists is that the level at which to begin a search is determined probabilistically rather than always starting at the top or via other schemes. In the authors' analysis, the expected search cost turns out to be O(log n), in line with a balanced binary tree. After establishing the algorithms for their skip list, the authors present performance results comparing skip lists to other data structures such as self-balancing trees. In general, skip lists were very similar to the best trees in search time, and achieved much better results for insertion and deletion times. The primary strength of this paper is that it presents a new type of data structure based on probabilistic balancing that supports insertion, deletion, and search like balanced binary trees, but with significantly better performance and easier implementation. The key insight the authors note is that "good-enough" probabilistic balancing can often achieve the same search performance as explicitly balanced trees, without incurring the expensive computational overhead of rebalancing the tree every time a record is inserted or deleted. One weakness of this paper, probably common to most probabilistic methods, is that it cannot guarantee maximum bounds on worst-case performance, which may limit its applicability in certain mission-critical applications. Whether this matters in practice (especially if, as the authors claim, worst-case scenarios are very rare) is unknown. Also, a more comprehensive description of the timing results for the algorithm comparisons would have been helpful, particularly regarding the types of data being inserted; the timing section also does not address issues like scaling or worst-case scenarios, perhaps because of space considerations.
Including this would have been interesting in characterizing skip list performance across a variety of different workloads. |
This paper’s contribution is to explore the skip list data structure as opposed to the more common binary trees, which have slower and more complicated insert and delete algorithms. The probabilistic balancing of skip lists eases implementation and speeds up balancing compared to maintaining the balance of a balanced binary tree. The idea behind skip lists is that they are essentially linked lists with additional pointers at certain nodes that skip over other nodes. This skipping can bring search on a sorted skip list down to logarithmic time. A node with k forward pointers is a level k node, with pointers indexed 1 through k, and a node's level is randomly selected. The header of the skip list has forward pointers at levels 1 through MaxLevel, where MaxLevel caps the level of any node. The paper then looks at how the skip list algorithms are implemented. The first one the author explains is initialization: a new skip list is initialized by allocating the special NIL node, which has a key higher than any legal key, giving the new list level 1, and pointing all forward pointers in the list to NIL. Next is search, which skips ahead as long as the forward pointer does not point to a node whose value overshoots our goal; when we can no longer skip at a level, we drop down and traverse the pointers of the level below. For insertion and deletion, we search, make our modification, and update pointers as well as the list level, if necessary. The paper also covers how node levels are determined: a random level less than or equal to MaxLevel is generated by repeated coin flips. It then considers different places to start the search, settling on the current highest level of the list. Next the paper analyzes the algorithms. Search came out to be an O(log n) operation, where n is the size of the list, and the number of comparisons is 1 plus the length of the search path.
Next the paper compares the effects of using a value of ¼ for p versus ½, where the latter provides less variability in running times. The paper also notes that searching for the same element multiple times duplicates effort: we will see similar run times, with the variability that is built into this data structure. As this relates to databases, self-adjusting trees can adapt to non-uniform query distributions, so they are faster than skip lists in that sense; but this holds only for highly skewed query distributions, and otherwise skip lists may perform better on average. I like how this paper leverages probabilistic methods to provide an easily implementable balanced data structure compared to the balanced binary tree structures that predate it. Naturally, on the other hand, we trade this ease of implementation and speed for the chance that it can perform much worse than balanced binary trees, albeit with low probability. And that brings me to my critique of the paper: it acknowledges the very bad worst case of skip lists, but dismisses it as having low likelihood, just as it dismisses the best case. I do like, however, how later in the paper the numbers show that it is extremely unlikely, about 1 in 200 million, that a search path will be more than 3 times the expected length. |
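The p = ¼ versus p = ½ trade-off can be made concrete with the paper's expected pointer count of 1/(1-p) forward pointers per node (a back-of-envelope helper of my own; only the formula is from the paper):

```python
def avg_pointers(p):
    """Expected number of forward pointers per node in a skip list
    with promotion probability p: 1 / (1 - p)."""
    return 1 / (1 - p)
```

So p = ½ averages 2 pointers per node while p = ¼ averages only about 1.33, which is why the paper favors p = ¼ when space matters, at the cost of somewhat more variability in running times.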
This paper introduces a new data structure called the skip list. It is an alternative to previous tree-balancing algorithms, but with several differences. First, it uses a different balancing strategy: previous tree-balancing algorithms use strong constraints that enforce balance on the actual depth of each path of the tree, whereas skip lists are balanced probabilistically by consulting a random number generator. Second, from an implementation perspective, insertion and deletion are much faster and easier to implement. Finally, from a performance perspective, although the worst case of a skip list is worse than that of previous algorithms, the worst case is unlikely to occur, and the expected performance is good thanks to the looser balancing constraint. Skip lists are as easy to implement as a basic linked list, yet achieve the same performance as AVL trees and red-black trees; the overall idea of skip lists is to use extra space to improve performance. From this paper, the basic principle of the skip list is clear: the algorithm adopts a hierarchy of indexes, a little like a segment tree, consisting of many levels, each containing a sorted linked list. The linked list at the lowest level holds all the items. For each node there are, conceptually, two kinds of pointers: one points to the next item at the same level, and another leads down to the item at the lower level. Therefore we can identify some features of skip lists: 1) if an item x appears at level i, every lower level also contains x; 2) each level is a sorted linked list; 3) the list is bounded at each end by sentinels (a header, and a NIL whose key exceeds any legal key). As a data structure for storage, insert, delete, and search need to be considered; for all three basic operations, the time complexity is O(log n).
This is because insert and delete first need to find the place to insert or delete, and due to the hierarchical structure this find operation (the same as search) is O(log n); since the structure is a linked list, the splice itself takes only O(1). Therefore, these basic operations have the same performance as tree structures but take roughly twice the space: a classic trade of space for time. The main contribution of this paper is that the introduction of skip lists gives people a new, easy way to implement a balanced structure, and the algorithm is easy to modify and extend. As for drawbacks: first, skip lists are space-hungry, taking about twice the space of ordinary AVL trees; second, the paper implements rather few operations; third, as previous papers have stated, a new algorithm without a big change in performance or a dramatically easier implementation has a hard time succeeding in industry; finally, although the worst case seldom happens, it leaves people feeling uncertain. |
According to the title of the paper, the skip list was originally designed as an alternative to the balanced tree. AVL trees have strict O(log N) query efficiency, but because insertion may trigger multiple rotations, insertion is slow, which is why the red-black tree is more practical in the engineering field. However, red-black trees are inconvenient to use in a concurrent environment: when data needs to be updated, a skip list needs to update, and lock, fewer nodes, while a red-black tree's rebalancing involves more nodes, and more nodes need to be locked, which reduces concurrency. The author introduces how to construct a skip list: 1. Start with an ordered linked list. 2. Keep the largest and smallest elements, then select some of the remaining elements according to a certain (random) rule and form them into a new ordered list; this new linked list is called a layer, and the original linked list is the layer below. 3. Add a pointer field to each element just selected, pointing to the element with the same value in the layer below; a Top pointer points to the first element of the top layer. 4. Repeat steps 2 and 3 until no elements other than the largest and smallest can be selected. The author also shows how to insert an element, how to delete, and how to search, and analyzes the time complexity of those operations. The main contribution of this paper is that it proposes skip lists to replace balanced trees. Secondly, although the worst-case performance of a skip list is bad, no input sequence consistently produces the worst-case performance. The skip list also guarantees that the data structure is very unlikely to become significantly unbalanced.
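The layer-by-layer construction in steps 1-4 can be sketched as follows (a loose Python rendering of this review's description; the promotion probability, iteration cap, and names are my own assumptions, not the paper's algorithm):

```python
import random

def build_layers(sorted_keys, p=0.5, seed=0):
    """Build index layers bottom-up: each element of a layer is promoted
    to the layer above with probability p (sketch of steps 2-4)."""
    rng = random.Random(seed)
    layers = [list(sorted_keys)]
    for _ in range(32):  # safety cap on the number of layers
        top = layers[-1]
        if len(top) <= 2:
            break
        promoted = [k for k in top if rng.random() < p]
        if not promoted or len(promoted) == len(top):
            break  # no useful thinning happened this round
        layers.append(promoted)
    return layers
```

Each layer is a sorted sublist of the one below it, which is exactly the invariant the search procedure exploits.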
Skip lists achieve a similar balance property without requiring random insertion order. One thing worth mentioning is that a skip list is much easier to implement than a red-black tree. |
The paper describes skip lists, which as the title states are an alternative to balanced trees. They are a type of linked list that maintains additional pointers in order to increase the speed of lookups. The traversal mechanism allows nodes to be skipped over in an efficient way, providing for an average of O(logn) lookups. While an immutable data structure could easily be created that strictly provides the property of maintaining log(n) lookups with these additional pointers, probability is used in order to optimize for insertions and deletions. A level is randomly determined for each node, which corresponds to the number of forward pointers that node will maintain. The number of nodes decreases at each level. This structure allows for the level of a node to remain constant, even during insertions and deletions, which provides a significant boost in runtime. Skip lists can be compared to balanced and self-adjusting trees, although the type of time bounds for each are different. Skip lists have probabilistic time bounds. I thought that this paper, especially the beginning, was extremely well written. Figures 1-5 in particular complemented the text quite well. The fact that this paper is well written and easy to understand corresponds to the best thing about skip lists: they are simple. The simplicity and the author’s desire to maintain the simplicity is stated many times. One other property that was interesting was that skip lists are at least hard (if not impossible) to mess up with adversarial input. The order in which values are inserted does not matter for performance - the adversarial user would need to know the level of nodes to force worst-case runtimes. Presumably knowing this is unlikely, although I did wonder if running and timing a sufficiently large number of lookups could somehow expose the levels to an adversary. The authors themselves state on the last page “From a theoretical point of view, there is no need for skip lists. 
Balanced trees can do everything that can be done with skip lists and have good worst-case time bounds (unlike skip lists).” In the conclusion that follows, their justification for skip lists is primarily the simplicity of implementation vs. implementing and optimizing a balanced binary tree. While I find this to be a compelling paper, it seems as though the author’s primary justification is that the data structure is easier for an early career CS student to implement. That is hard to reconcile with any notion that this should be used in a high-performance system that presumably will be implemented by much more experienced programmers. Despite this, I found that the runtimes in the results section, particularly for insertion and deletion, presented a coherent argument for why skip lists are useful. I found myself wondering why the authors didn’t discuss this at all in the conclusion. |
This paper introduces a new data structure called the skip list, a probabilistic alternative to balanced trees. The problem with plain binary search trees is that for certain input sequences, for example ordered sequences, performance degenerates to linear. Self-adjusting trees enforce balance by adjusting the internal data arrangement (for example, rotating subtrees); this solution, however, makes the algorithm very difficult to implement and incurs a high constant factor. Skip lists find a balance between these two extremes: they are easy to implement but only ensure good performance most of the time (not always). However, we should note that it is very unlikely for a skip list to be significantly unbalanced. The main idea of the skip list is to insert additional pointers into a sorted linked list. The added pointers allow the search algorithm to skip several elements and thus improve performance. More specifically, every node in the ordered list has several forward pointers, each belonging to a certain level; a pointer at level L points to the first node to the right that also has a pointer at level L. The number of pointers per node is itself random. The paper shows that as long as the number of nodes with L pointers is a fixed fraction of the number of nodes with L-1 pointers, the expected cost of a search operation is O(log n), where n is the number of elements in the list. For insertion and deletion, once the insert location or the element to be deleted is found, only local operations are needed, so the time complexity is still sub-linear (some bookkeeping of the search path is required, but it has no effect on the overall complexity). In my opinion, the most important advantage of the skip list is still its ease of implementation. However, if self-balancing tree algorithms are highly optimized and provided as a library, will people still use skip lists? |
In the paper "Skip Lists: A Probabilistic Alternative to Balanced Trees", William Pugh designs a new data structure, the skip list, as an alternative to balanced binary trees. Plain binary trees suffer from issues rooted in their anatomy, most notably poor performance when items are inserted in sorted order; conversely, they perform better when items are inserted in random order. Naturally, one would think, "Why don't we just permute the list of items randomly before insertion?" Even though this is a valid idea, it is not possible to permute the items before insertion because most modern queries must be answered on-line. Thus, Pugh proposes skip lists, a structure that uses probabilistic balancing rather than strictly enforced balancing in order to reduce the overhead of insert and delete operations. Described as a more "natural representation" than a tree, skip lists are space-efficient, have simpler algorithm implementations, and balance more easily than their explicitly balanced counterparts. Furthermore, much like quicksort, skip lists have a terrible worst-case run time that is unlikely to occur. Thus, when weighing the benefits against a balanced binary tree, this is clearly an interesting topic to explore. Skip lists are implemented like linked lists with additional forward pointers; however, instead of giving every 2^i-th node a pointer 2^i nodes ahead, node levels are chosen randomly. The operations work as follows: 1) Initialization: NIL is given a key greater than any legal key, and all levels terminate with NIL. We create a list of level 1, with all forward pointers pointing to NIL. A good cap on levels is L(N) = log base 1/p of N, where N is an upper bound on the number of elements; e.g., when p = 1/2, MaxLevel = 16 suffices for up to 2^16 elements. 2) Search: Traverse forward pointers without overshooting the node you are looking for.
We can be optimistic and start the search at the highest level. When the search can make no more progress at a level, we simply move down to the next level. The total expected cost of a search is <= L(n)/p + 1/(1-p). 3) Insertion and Deletion: Using a search-and-splice method, we find the appropriate place for the element using (2) and then insert or delete while fixing up the pointers. Likewise, the cost of insertion and deletion depends heavily on the cost of search. Even with good benchmark results and easier algorithm implementations, there are still some drawbacks to the design of skip lists. The greatest observable flaw is the lack of demand for such a data structure: balanced binary trees can do everything a skip list can do and offer reasonably good worst-case time bounds. Furthermore, balanced binary trees behave in predictable ways, which makes analysis and debugging much simpler. One thing I found misleading was the assertion that search is much faster, on average, in skip lists than in balanced binary trees; in the reported results, search performed in about the same amount of time. Since search is used heavily in fields like data mining, there seems to be no reason for them to change their internal data structure, given that their workload revolves around search. |
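The bound L(n)/p + 1/(1-p) is easy to evaluate numerically (a small helper I wrote for illustration; only the formula itself comes from the paper):

```python
import math

def expected_search_cost(n, p=0.5):
    """Paper's upper bound on expected comparisons:
    L(n)/p + 1/(1-p), where L(n) = log base 1/p of n."""
    level = math.log(n, 1 / p)
    return level / p + 1 / (1 - p)
```

For p = ½, the bound grows by just 2 comparisons every time n doubles, which is the logarithmic behavior being claimed.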
This paper describes skip lists, an alternative to the binary trees used for quick access to database records. Binary trees are useful, but they have very poor performance on certain input orders, such as when all keys are inserted in order. Trees can be self-balanced to deal with this, but that requires extra overhead. Skip lists are a way to store keys that keeps the easy access of binary trees and does not have poor performance on any predetermined insertion order. The basic idea of a skip list starts with an ordered linked list, where each node has a pointer to the next node. This requires little overhead, but has linear complexity for searches, inserts, and deletes. If each node has additional pointers to nodes beyond the next one, then searches can "skip over" intermediate nodes and speed up. In the final setup, each node has a randomly assigned level. Each level 1 node has just one pointer, which points to the next node. Each level 2 node has a pointer like a level 1 node, plus an extra pointer that points to the next level 2 node. Each level 3 node has two pointers like a level 2 node, and one more pointing to the next level 3 node. This continues for arbitrarily high levels. When a node is created, it is given a random level according to a predetermined proportion p, usually ½ or ¼. Approximately a fraction 1-p of the nodes will be level 1, p(1-p) will be level 2, p^2(1-p) will be level 3, and so on. Most nodes are low-level, and the list has a header node that points to the first node of each level in the list. This approach has several advantages. Compared to a binary tree, it is easier to implement and searching is simpler; most nodes have only a single pointer. The complexity of searches, inserts, and deletes remains logarithmic in this model. Because node levels are randomly generated, no single insertion pattern is guaranteed to produce bad performance.
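Those level proportions can be tabulated directly (a one-line helper of my own; the (1-p)p^(k-1) fractions are exactly those stated above):

```python
def level_fractions(p, levels):
    """Expected fraction of nodes whose level is exactly k,
    for k = 1..levels: (1 - p) * p**(k - 1)."""
    return [(1 - p) * p ** (k - 1) for k in range(1, levels + 1)]
```

For p = ½ this gives 50% of nodes at level 1, 25% at level 2, 12.5% at level 3, and so on, confirming that most nodes stay low-level.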
On the downside, randomness can make the performance harder to evaluate, and it is more difficult to tell whether the structure is working properly. Also, when comparing skip lists to other data structures, the paper looks at performance on already constructed trees; it does not compare filling the structures under difficult insertion orders, even though that was one of the original reasons given for creating skip lists. |
The paper introduces the skip list, an alternative to balanced trees that is faster, more space-efficient, and easy to implement. A skip list is a data structure that allows fast search within an ordered sequence of elements. Fast search is made possible by maintaining a linked hierarchy of subsequences, with each higher subsequence skipping over more elements than the one below it. Searching starts in the sparsest subsequence and continues until two consecutive elements have been found, one smaller and one larger than or equal to the element searched for. Via the linked hierarchy, these two elements link to elements of the next, denser subsequence, where searching continues, until finally we are searching the full sequence. The elements that are skipped over may be chosen probabilistically or deterministically, with the former being more common. The first contribution of this paper is certainly the design of the skip list itself, which is easy to implement, fast, and competitive with balanced trees in many applications. Most importantly, it introduces the idea of using probabilistic approaches to design data structures; the machine-learning-based data structures that have emerged recently seem to inherit a bit from this idea. The drawback of the paper is, first of all, that it is quite terse and goes directly to the algorithms, which makes it hard to read. Drawbacks of the skip list mentioned in the paper are its bad worst-case performance, though no input sequence consistently produces the worst-case performance. |
Binary trees can be used to represent abstract data types. They work well if data is inserted randomly, since random input order tends to produce a balanced tree, and a balanced tree keeps the cost of searching, inserting, and deleting a node at O(log n). However, when elements are inserted in order, the result is a degenerate tree, as shown here, which is essentially a linked list; in this case, searching for a node takes O(n) instead of O(log n). In most cases, queries must be answered on-line, so randomly permuting the input is not practical; therefore, the plain binary tree is not an ideal data structure. Its main issue is that it cannot guarantee a balanced tree structure. Balanced trees solve this problem by rearranging the tree during operations to maintain certain balance conditions. Even if the elements are inserted in order, the tree will still be balanced, as shown here, which keeps searching efficient. The disadvantage of balanced trees is inefficient insert and delete operations: since a balanced tree must continuously rearrange nodes to maintain its balanced structure, any insertion or deletion can cause recursive node rearranging, and each node must also store balance information about its children, which is not space-efficient. Therefore, we need an alternative to balanced trees with more efficient insert and delete. This new data structure is called the skip list. The inspiration comes from the linked list: inserting a node into a linked list is efficient, but locating the insertion point or searching for a node is sequential. To find a specific node, we must examine each node starting from the head until it is found. But what if we add more pointers as shortcuts to speed up the searching process? As shown in the second list, if we give every other node in the sorted list a pointer to the node two ahead of it, we have to examine no more than about half of the nodes.
If we give every fourth node a pointer to the node four ahead, then we only need to examine approximately a quarter of the nodes. This structure allows fast searching, but it would complicate insertion and deletion if we tried to maintain this exact pointer pattern. What if the number of pointers per node is instead chosen randomly, but in the same proportions? Then we end up with something like list D shown here. Since this data structure is essentially a linked list with extra pointers that skip over intermediate nodes, it is named a skip list. The level of each node is chosen randomly during insertion; skip lists therefore use probabilistic balancing rather than strictly enforced balancing. Balancing a data structure probabilistically is much easier than explicitly maintaining balance. For the search operation, we traverse forward pointers that do not overshoot the node containing the element being searched for. When no more progress can be made at the current level, the search moves down to the next level. When no more progress can be made at the bottom level, we are immediately in front of the desired element (if it is in the list). This search takes expected O(log n) time because the number of candidates is roughly halved at each level. The paper derives the cost of searching by analyzing the search path backwards. Insertion is very similar to search; the only difference is that at each level we record, in an update array, the rightmost node preceding the insertion point, so the new node can be spliced in. Deletion begins the same way as search: once the element to be removed is found, removing it is exactly like deleting from a linked list at each of its levels. One thing to point out is that when an element is inserted, a random-level algorithm determines the level of the new node. Summing up the advantages of skip lists: they use probabilistic balancing instead of maintaining explicit structural balance, so the implementation is direct and easy.
The memory requirement is lower because no balance information needs to be stored in each node. Insertion and deletion do not require rebalancing, thanks to probabilistic balancing. And skip lists provide expected O(log n) time for all operations, as derived earlier, which is very efficient. Also, the level structure of a skip list is independent of the keys inserted, so there is no bad key sequence that leads to a degenerate skip list. The disadvantage of skip lists is potentially bad worst-case performance; however, such degenerate skip lists are very unlikely. In conclusion, skip lists are a simple data structure that can replace balanced trees because they are significantly faster and easy to implement, extend, and modify. From a purely theoretical point of view, however, there is no need for skip lists, because balanced trees can do everything skip lists can, and balanced trees have good worst-case time bounds. I like this paper because it is very concise and touches on the most important features of skip lists. However, the paper does not say whether skip lists have practical value or are actually used in reality, and the conclusions the author draws basically deny the value of both balanced trees and skip lists. I wonder what is being used in industry. |
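The insertion procedure summarized above, with its update array and random level choice, can be sketched as follows. This is a minimal Python sketch assuming p = 1/2 and a fixed maximum level; the class and names are my own, not the paper's:

```python
import random

class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # one forward pointer per level

class SkipList:
    MAX_LEVEL = 16  # enough headroom for ~2**16 elements at p = 1/2
    P = 0.5

    def __init__(self):
        self.head = Node(None, self.MAX_LEVEL)
        self.level = 1  # number of levels currently in use

    def random_level(self):
        # Coin flips: level n is chosen with probability about P**(n-1).
        level = 1
        while random.random() < self.P and level < self.MAX_LEVEL:
            level += 1
        return level

    def insert(self, key):
        # update[i] = rightmost node at level i that precedes the new key
        update = [self.head] * self.MAX_LEVEL
        x = self.head
        for i in range(self.level - 1, -1, -1):
            while x.forward[i] is not None and x.forward[i].key < key:
                x = x.forward[i]
            update[i] = x
        level = self.random_level()
        self.level = max(self.level, level)
        node = Node(key, level)
        for i in range(level):  # splice the node in at each of its levels
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def contains(self, key):
        x = self.head
        for i in range(self.level - 1, -1, -1):
            while x.forward[i] is not None and x.forward[i].key < key:
                x = x.forward[i]
        x = x.forward[0]
        return x is not None and x.key == key
```

Deletion would reuse the same update array: find the predecessors at each level, then unlink the target node from each level it appears on.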
This paper proposed a data structure, called skip lists, which can be used in place of balanced trees. The main idea of skip lists is that they are balanced by consulting a random number generator, and balancing a data structure probabilistically turns out to be easier than explicitly maintaining balance. Skip lists can also be implemented more easily and provide significant constant-factor performance improvements over balanced trees. The paper gives algorithms to search for, insert, and delete elements in a dictionary or symbol table. The choice of the probability parameter is also discussed in detail. I think the contribution of this paper is obvious: an alternative data structure to balanced trees is proposed, and its algorithms are presented and analyzed completely. There are also several drawbacks of the paper, or more precisely, of skip lists. Skip lists take more space than a balanced tree. Also, my online search suggests there is a lack of mature skip-list implementations, whereas balanced trees have more optimized implementations. |
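On the space point, the verdict depends on the probability parameter p: the paper's own estimate is that the average number of forward pointers per node is 1/(1 − p), the mean of the geometric level distribution. A quick check of that formula (my own illustration):

```python
def expected_pointers(p):
    """Average forward pointers per skip-list node: mean of the
    geometric level distribution, where P(level = n) = (1-p) * p**(n-1)."""
    return 1 / (1 - p)

print(expected_pointers(0.5))   # 2.0, same pointer count as a binary tree node
print(expected_pointers(0.25))  # ~1.33, fewer pointers than a binary tree node
```

So with p = 1/2 a node averages two pointers, like a tree node but without stored balance information; smaller p trades search constant for space.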
In this paper, the author proposed a novel probabilistic alternative to balanced trees. The problem to solve is to create a new data structure that can be used for range indexes in a DBMS. Although there are already mature data structures for DBMS indexes, like hash tables for hash indexes and B+ trees for range indexes, this is still a significant question in DBMSs because different index structures perform differently on different operations, so exploring novel structures for special workloads is valuable. Thus, in this paper, the author introduced skip lists, where insertion and deletion are much simpler and significantly faster than the equivalent algorithms for balanced trees; next I will go over the crux of this novel data structure. Compared to a traditional balanced search tree, the biggest difference is that skip lists use probabilistic balancing rather than enforced balancing. As the paper mentions, tree structures work well when elements are inserted in random order, but for some operation sequences they give very poor performance. Intuitively, one could randomly permute the list of insertions so that trees work well with high probability, but this is rarely possible on-line. Skip lists are multiple levels of linked lists with extra pointers that skip over intermediate nodes. They maintain keys in sorted order without requiring global rebalancing, which is achieved by using a random number generator. Although the worst-case time complexity of search can be O(n), it is very unlikely that a skip list will be significantly unbalanced, so the expected performance of skip lists is comparable to balanced search trees. The paper thoroughly discusses the search, randomLevel, insert, and delete operations and also provides a probabilistic analysis showing that the expected time complexity of search is O(log n).
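To get a feel for why significant imbalance is unlikely, one can simulate the level distribution. This is my own illustration assuming p = 1/2: about half the nodes land on level 1, a quarter on level 2, and so on, with the maximum level concentrating near log2 n rather than growing adversarially.

```python
import random

def random_level(p=0.5, max_level=32):
    # Coin flips: level n occurs with probability about p**(n-1).
    level = 1
    while random.random() < p and level < max_level:
        level += 1
    return level

random.seed(0)  # fixed seed so the illustration is reproducible
n = 100_000
levels = [random_level() for _ in range(n)]

frac_level1 = levels.count(1) / n          # expect close to 0.5
print(round(frac_level1, 2), max(levels))  # max level near log2(100_000) ~ 17
```

No insertion order can change this distribution, since levels depend only on the coin flips, which is exactly why there is no bad input sequence for skip lists.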
From the experiments, we can see that skip lists are promising for inserts and deletes compared to traditional tree indexes. This is definitely a successful paper, and skip lists are used by some commercial systems such as MemSQL, RocksDB, and WiredTiger. The biggest contribution of this paper is the introduction of the skip list data structure, in which insertions and deletions do not require rebalancing; this greatly reduces the overhead that a traditional B+ tree incurs on such operations. Besides, if reverse pointers are not included, the memory consumption of a skip list is lower than that of a B+ tree. Another big advantage is that balancing a data structure probabilistically is easier than explicitly maintaining balance as in a red-black tree or AVL tree. From an engineering perspective, the skip list algorithms are very easy to implement, extend, and modify, so they are much more flexible than traditional balanced search tree structures. The simplicity of the algorithms also makes it easier to obtain significant constant-factor speed improvements over balanced-tree and self-adjusting-tree algorithms. However, there are also some limitations of this data structure for database indexes. First, skip lists are not disk/cache friendly because they do not optimize locality of reference: an insertion randomly selects a level and splices pointers at each of those levels, which makes good cache utilization hard in such a random access pattern. Second, they invoke the random number generator on every insert, which adds overhead. |
This is a very straightforward paper that introduces a new data structure called a skip list. Skip lists are used in the same situations where you would use a balanced search tree; their primary advantage is that they are extremely simple compared to balanced search trees and are thus easier to optimize. They can also be faster in the expected case by constant factors, although they have a worse worst-case time complexity. Skip lists are basically linked lists where each node is assigned a "level" that defines additional pointers to other nodes in the list. Thus when searching, instead of following each node's incremental pointer, the algorithm can first follow the largest pointer skips, then narrow the search until single-node skips are performed and the target is found (or deemed not to be in the list), conceptually analogous to binary search. The benefits of skip lists are very intuitive: their relative simplicity allows for easier implementation, which leads to more motivation to optimize and improve the algorithm. Also, the expected-case runtime is good compared to balanced search trees. There are two main weaknesses, however. First, the worst-case runtime is worse than that of balanced search trees, due to the probabilistic nature of skip lists. Second, while simplicity is an advantage and the runtime beats balanced search trees by constant factors, there are no order-of-magnitude improvements, and skip lists do not solve any major problem that had been plaguing people, so the impact is not quite as big as some of the other papers we have read. |