This paper provides a survey of B-tree index concurrency control methods. The paper first introduces the two forms of concurrency control, concurrency among transactions and concurrency among threads, and explains that locks are used for the former and latches for the latter. The next part is a detailed survey of latches, which are used to coordinate multiple execution threads while they access and modify database contents and their representation. One particular situation to avoid is the case where root-to-leaf searches and leaf-to-root modifications happen concurrently, and several different methods are used to resolve such situations. In addition, latches are optimized for frequent use and should be held only for short periods. Then a survey of locks is presented. Locks provide concurrency control among multiple transactions, and they protect database contents. The main takeaways I got from this section are key range locking and the use of ghost records. Key range locking allows locks on both keys and the gaps between them, and reduces lock manager invocations by using additional lock modes; ghost records allow better concurrency and can improve insert/delete performance. I like this paper because it has a clear structure: it first introduces the difference between latches and locks, then illustrates each in detail. The idea I like best from this survey is the use of ghost records, since they help not only deletion but also insertion performance, which is interesting. |
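To make the ghost-record idea above concrete, here is a minimal sketch in Python. It is not code from the paper: the `Record` and `LeafPage` classes, the `ghost` flag, and the `cleanup` method are hypothetical names chosen for illustration, and the sketch ignores latching, logging, and locking entirely. It only shows that a delete can flip a flag instead of removing the record, that queries skip ghosts, and that a later insert of the same key can reuse the ghost's slot.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Record:
    key: int
    value: str
    ghost: bool = False  # True = logically deleted, still physically present


class LeafPage:
    """Toy leaf page (hypothetical); records are kept sorted by key."""

    def __init__(self):
        self.records = []

    def _position(self, key):
        # Index where `key` is stored or would be inserted.
        return bisect_left([r.key for r in self.records], key)

    def _lookup(self, key):
        i = self._position(key)
        if i < len(self.records) and self.records[i].key == key:
            return self.records[i]
        return None

    def delete(self, key):
        rec = self._lookup(key)
        if rec is None or rec.ghost:
            raise KeyError(key)
        rec.ghost = True  # logical delete only; undoing it just clears the flag

    def insert(self, key, value):
        rec = self._lookup(key)
        if rec is not None:
            if not rec.ghost:
                raise KeyError(f"duplicate key {key}")
            rec.value, rec.ghost = value, False  # revive the ghost in place
        else:
            self.records.insert(self._position(key), Record(key, value))

    def scan(self):
        # Queries skip ghost records.
        return [(r.key, r.value) for r in self.records if not r.ghost]

    def cleanup(self):
        # Deferred physical removal (assumed to run once no lock needs the ghost).
        self.records = [r for r in self.records if not r.ghost]
```

In this sketch, deleting a key and then calling `scan()` hides the record while `len(page.records)` stays the same until `cleanup()` runs, which mirrors the separation between logical deletion and physical removal.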
B-tree indexes have been applied in most DBMSs. The concurrency control of B-tree operations is not as well understood as the basic operations like search, insertion and deletion. The paper aims at clarifying, simplifying and structuring the topic of concurrency control in B-trees. The paper starts by clarifying and defining lock and latch techniques. After presenting the history and preliminaries, the paper discusses locks and latches in depth and presents a variety of B-tree index locking techniques. The paper ends with future research directions and a summary. Some of the strengths of this paper are: 1. The paper clarifies the difference between several commonly confused concept pairs, including protection of the B-tree contents versus protection of the B-tree structure, and separation of threads against one another versus separation of user transactions against one another. 2. The latch implementation that protects the B-tree structure is optimized for frequent use by avoiding modes beyond shared and exclusive and by relying on deadlock avoidance rather than deadlock detection. 3. The key range locking technique enables high concurrency and prevents phantoms for true serializability, while ghost records minimize the lock footprint in user transactions. Some of the drawbacks of this paper are: 1. The paper lacks examples while presenting the techniques, which makes the concepts abstract to readers. 2. There is no experiment conducted in the paper. Presenting some experiments or referenced experimental results would show the advantages of the locking techniques more intuitively. |
Database servers usually run many threads to serve many users as well as to exploit multiple processor cores and, using asynchronous I/O, many disks. Even for single-threaded applications, for example, on personal computing devices, asynchronous activities for database maintenance and index tuning require concurrent threads and thus concurrency control in B-tree indexes. Therefore, concurrency is important in databases. However, concurrency control of operations in B-trees, which are ubiquitous not only in database management systems but also in many other storage systems, is perceived as a difficult subject with many subtleties and special cases. Hence, this paper provides a survey of B-tree locking techniques to clarify, simplify, and structure the topic of concurrency control in B-trees. Concurrency control for B-tree indexes in databases can be separated into two levels: concurrent threads accessing in-memory data structures and concurrent transactions accessing database contents. These two levels are implemented with latches and locks, respectively. Latches support a limited set of modes such as shared and exclusive, do not provide advanced services such as deadlock detection or escalation, and can often be embedded in the data structures they protect. Therefore, their acquisition and release can be very fast, which is important as they implement short critical sections in the database system code. Locks support many more modes than latches and provide multiple advanced services. Management of locks is separate from the protected information, for example, keys and gaps between keys in the leaf page of a B-tree index. The hash table in the lock manager is in fact itself protected by latches such that many threads can inspect or modify the lock table as appropriate. The principal technique for concurrency control among transactions accessing B-tree contents is key range locking. Various forms of key range locking have been designed. 
The most recent design permits separate locks on individual key values and on the gaps between key values, applies strict multigranularity locking to each pair of a key and a neighboring gap, reduces lock manager invocations by using additional lock modes that can be derived automatically, enables increment locks in grouped summary views, and exploits ghost records not only for deletions but also for insertions. The main contribution of this survey is that it clarifies and simplifies the topic of concurrency control in B-trees by distinguishing protection of the B-tree structure from protection of the B-tree contents, and by distinguishing separation of threads against one another from separation of user transactions against one another, which benefits developer education, code maintenance, test development, test execution, and defect isolation and repair. In addition, the paper clarifies its assumptions and defines the two forms of B-tree locking that are often confused, followed by detailed discussions of a variety of locking techniques proposed for B-tree indexes. The main advantage of this paper is that it is well structured and clearly explained in considerable detail. The paper first gives background on why the concurrency problem is important in real-life applications, then discusses the two forms of B-tree locking in depth and emphasizes their differences in functionality and implementation, both during normal transaction processing and during “offline” activities such as crash recovery. Finally, the paper names the most urgently needed future direction: simplification. The main drawback of this paper is that it does not describe actual systems that implement concurrency control in B-trees, so it does not give the reader an industrial perspective. In addition, the paper leaves some implementation techniques for optimal use of modern many-core processors undiscussed. |
Problems & Motivations: The B-tree is a ubiquitous data structure in database and storage systems. The B-tree is so prevalent because it efficiently supports range queries and enables sort-based query execution algorithms such as merge join without an explicit sort operation. The basic data structure of the B-tree is well understood, as is the locking principle; however, for various reasons, no actual system implements the concurrency control model exactly as the paper describes it. Main Achievement: Firstly, the paper describes two tools to achieve concurrency control. One is the latch. The latch can be viewed as a “lightweight” lock (and it can be used to implement locks). Latches provide only basic operations such as acquisition and release, and are usually embedded in the data structures they protect. Therefore, they are simple and fast. In contrast, locks are more comprehensive but slower. Latches protect the B-tree pages cached in the buffer pool, management tables, and all other in-memory data structures shared among multiple threads. Locks separate transactions using read and write locks on pages, keys, or gaps between keys. In short, locks work at the transaction level, while latches protect in-memory structures such as caches and the tables of the lock manager. There are many other things to consider, for example, the role of latches and locks in recovery from a crash. During recovery without concurrent execution of new transactions, locks are not required, because concurrency control during forward processing prior to the system crash already ensured that active transactions do not conflict. Latches, however, are as important during recovery as during normal forward processing if recovery employs multiple threads and shared data structures such as the buffer pool. Drawbacks: Most parts are clear. No obvious drawbacks to me. |
B-trees are a very important class of data structure that forms the foundation of information access in database systems. They map search keys to values/information, and efficiently support range queries as well as sort-based query execution algorithms like merge join without an explicit sort operation. Given that one of the most important tasks in database systems (especially OLTP) is concurrency control, achieving it in B-trees is an important area of study. The methods for B-tree locking are numerous, including row-level locking, key value locking, key range locking, lock coupling, latching, latch coupling, and crabbing, and they can seem daunting to learn. This article, “A Survey of B-Tree Locking Techniques”, provides background knowledge on this topic, and discusses the considerations and features of these locking methods. It starts off by discussing what B-tree locking is. In short, it stands for concurrency control for concurrent database transactions and/or parallel threads that access the same information (stored in a B-tree representation). This is usually done by using locks or latches, where locks separate transactions by enforcing read and write locks, and latches separate threads accessing B-tree pages cached in memory shared by multiple threads. While their purposes are similar, the implementation details of locks and latches, as well as their effects on recovery, etc., can be quite different. The article further elaborates on this point for latches, especially as to how multiple threads need to be coordinated with each other when accessing common B-trees, as well as the challenges associated with maintaining concurrency (e.g. preserving B-tree structure) in this environment. For example, latches usually use only shared and exclusive modes, and they tend to favor avoiding deadlock rather than detecting and dealing with deadlock after the fact. 
After this, the paper discusses the use of key range locking for preserving the logical contents of B-trees. Topics discussed include how to select an appropriate lock scope, how to deal with ghost records, hierarchical locking schemes with multiple granularities, and the inclusion of more lock modes than just read and write, such as increment locks. Key range locking is widely used to enforce concurrency control, since it is adaptable, well understood, has a variety of useful features, and can be implemented efficiently. The strength of this paper is in providing a clear and concise summary of some important concepts in B-tree locking, namely latches and locks. While this article does not present new or original research findings, it performs an important role in summarizing the information and bringing readers up to date on current (and possible future) trends when it comes to B-tree based concurrency control. For the most part, this paper was well written, which made it easy to understand. There are no specific weaknesses with this particular paper, especially given that its role is to present past work rather than to propose a new system or method. One potential improvement, though, is the inclusion of references to and/or discussions of real-world database systems and the types of concurrency control schemes that they employ. Additionally, further discussion of even more types of B-tree locking methods, which the author touched on in the introductory segments, would be welcome to anyone wanting to gain more knowledge in this domain. |
The purpose of this article is to examine concurrency control in the widely used B-tree data structure. First the paper explores the background of the area, including historical methods of locking in B-trees during insertion and deletion of data. Next the author discusses some assumptions and distinguishes two forms of B-tree locking. The assumptions are the definition of B-trees and the common distinction between read transactions and write transactions. The definition of B-trees is that leaf nodes contain the data and interior nodes contain information for searching the data. The two forms of B-tree locking are (1) concurrency control among concurrent transactions that query or modify the database contents and their representation in B-tree indexes and (2) concurrency control among concurrent threads that modify the B-tree data structure in memory. Next the paper compares and contrasts locks and latches. Locks separate transactions acting on the same data concurrently, while latches separate threads accessing the same B-tree pages concurrently. The paper also discusses the roles of locks and latches during recovery after a system crash, when the history of transactions is rolled back. The physical structure of the B-tree must be preserved when multiple threads have access to the data structure. Next the paper explores issues that may arise when multiple threads have access to in-memory data structures, and some of the workarounds to those issues, like lock coupling. I liked how detailed this paper was and how logical the order of presentation was, from a history of B-trees in concurrency control to a more detailed look into each of the issues within the area. I did not like how this paper was not very visual. I find diagrams helpful, but there was a lot of text per table or diagram. The diagrams that were present were not very helpful, even when I knew what the diagram was trying to explain. |
The paper “A Survey of B-Tree Locking Techniques” gives an overall introduction to concurrency control in the B-tree data structure. B-tree techniques have been widely used for several decades. In practice, database servers often run many threads to serve multiple clients, so concurrency control is required. The terminology in this area includes row-level locking, key value locking, key range locking, lock coupling, latching and crabbing. This paper reduces these complex concepts to two aspects. B-tree locking has two meanings: 1) concurrency control among concurrent database transactions querying or modifying database contents and its representation in B-tree indexes; 2) concurrency control among concurrent threads modifying the B-tree data structure in memory, including in particular images of disk-based B-tree nodes in the buffer pool. Concurrent threads accessing in-memory data structures need latches, and concurrent transactions accessing database contents need locks. Latches support shared and exclusive modes, but do not provide advanced services such as deadlock detection or escalation. Their acquisition and release can be fast, which is important as they implement short critical sections in the database system code. Locks support more modes and advanced services than latches. Management of locks is separate from the protected information. The hash table in the lock manager is in fact itself protected by latches such that many threads can inspect or modify the lock table as appropriate. The main contribution of this paper is its comprehensive narrative of concurrency control in B-trees in a simplified manner, rather than merely introducing complex terms. The paper is well structured and neat. There are no obvious drawbacks to this paper. One thing that could be improved is to use figures more frequently so that readers can understand a paper of this length more quickly. |
B-trees and their variants have been used as database indexes for decades. Although such data structures are functionally simple, supporting queries, insertions, and deletions, their concurrency control is extremely complicated, especially when it comes to database transactions. This paper systematically summarizes concurrency control for database indexes based on B-trees. Concurrency issues in the database fall into two categories: synchronization of multi-threaded concurrent access to in-memory data, and synchronization of multiple transactions concurrently accessing database content. The basis of all latches is CAS (Compare and Swap), an atomic CPU instruction that, for a given memory address M, compares its value A with a given value B and writes a new value only if the two are equal. Database latch implementations include the following: mutexes provided by the operating system, reader-writer locks, test-and-set spin locks, and queue-based spin locks (MCS). Whether the database reads or writes, it may be necessary to traverse the index tree, so search, insert, and delete all need to use latches. Although latches guarantee thread safety when multiple threads modify critical memory, latches are released at the node level rather than at the transaction level, so using only latches may allow phantom reads. Unlike latches, locks resolve concurrency issues between database transactions, rather than concurrency issues with critical in-memory data structures. Locks come in the following types: key value locking, gap locking, key range locking, and hierarchical locking. MySQL's InnoDB engine supports row and table locks, where row locks are finer-grained and allow more concurrency than table locks. Row locks include key value locks, gap locks, and key range locks. 
The main contribution of this paper is that it gives a thorough overview of the main properties of B-tree locking techniques and discusses the existing designs and implementations in databases. The weak point of this paper is that it is a little too long; some length could perhaps be saved in the examples
As implied by the title, the paper describes locking techniques for B-Trees. The paper is separated into two main sections: protecting a B-tree’s structure and protecting a B-tree’s logical contents. The B-tree structure encompasses obvious things like the B-tree data structure itself, but also encompasses the lock table. For these applications, latches are used. These are lightweight (stored in the data structures they protect), and short-lived. They protect the data structures as concurrent threads try to modify them, preventing intermediate data from being seen. Meanwhile, the logical contents are actual data that is being stored in the database system. The logical contents are protected by heavy-weight locks, which are typically held for the entire transaction. These go hand-in-hand with transactions and serializability. One primary type of lock used for database contents is the key range lock. These locks are able to lock on the absence of data, preventing phantoms. The main contribution of the paper is the taxonomy presented for separating the use cases of latches and locks. The authors stay true to this goal successfully throughout the paper, and are therefore able to give clearer descriptions, because the reader will not confuse the two main areas being discussed. This is very well done, and is a good approach for a survey paper. I’m sure that this isn’t the first and won’t be the last paper to discuss B-Tree locking techniques, but this piece of the presentation is elegant and therefore provides value that previous papers likely did not. Perhaps this was just because the paper was a survey, but I found that it covered a very wide variety of topics without focusing much on the links between them, besides the main theme that I already mentioned. This means that much of the information presented feels disjoint, and does not always seem like it is contributing to an overall goal. 
I think that this was exacerbated by the fact that the authors rarely presented their own opinions at all; rather, they frequently laid out many ideas without stating which was superior. This can be a good technique, but it did not help the flow of the paper. |
This paper aims to clarify and structure the topic of concurrency control in B-trees. It explains some common confusions in this topic, for example, the difference between separation of threads and separation of user transactions, the difference between locks and latches, etc. The key idea is that latches are used to separate threads and protect in-memory data structures, while locks are used to separate transactions and protect database contents and their representation in B-tree indexes. Then the paper discusses in detail the noteworthy issues in the design of latches and locks. For latches, we need to make sure that while a thread follows a pointer in a B-tree index, the pointer is not invalidated by another thread. Also, since latches rely on deadlock avoidance, the design must ensure that no deadlock occurs during cascaded splits and index scans. The first issue can be solved through so-called “latch coupling”, which retains the latch on the parent until the child node is latched. For cascaded splits, we can retain latches on nodes along the root-to-leaf search path until we encounter a less-than-full node, which guarantees that a split will not propagate beyond it. For locks, the author focuses on the idea of key range locking and its variants. The idea of key range locking is simple: lock a key and the gap to its neighbor as a unit. This effectively locks the absence of data, so we can ensure the same result if a predicate is evaluated twice. The author then discusses ghost records, hierarchical locking, the increment lock mode, etc. All of these extensions give better performance. For example, ghost records avoid space allocation during transaction rollback and improve the performance and concurrency of insertions. Hierarchical locking allows us to use a single lock to lock a key range much larger than a single interval between neighboring keys. However, I found the paper a little hard to follow, especially in the fifth section. 
In many places, the authors just listed some facts and conclusions of a design without further explanation or justification. It’s probably fine for readers who are already familiar with the topic, but it’s quite difficult for the others to really understand. |
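As one example of the kind of illustration this review asks for, the latch coupling rule described above (retain the parent latch until the child is latched) can be sketched as follows. This is only a sketch under stated assumptions, not the paper's code: the `Node` class and `search` function are hypothetical, and Python's `threading.Lock` is exclusive-only, so it stands in for what would be a shared-mode latch in a real system.

```python
import threading
from bisect import bisect_right


class Node:
    """Toy B-tree node (hypothetical). Leaves have children=None."""

    def __init__(self, keys, children=None, values=None):
        self.latch = threading.Lock()  # exclusive-only stand-in for a latch
        self.keys = keys               # separator keys (interior) or record keys (leaf)
        self.children = children       # len(keys) + 1 children for interior nodes
        self.values = values           # one value per key for leaves


def search(root, key):
    node = root
    node.latch.acquire()
    while node.children is not None:
        # Keys equal to a separator are routed to the right child.
        child = node.children[bisect_right(node.keys, key)]
        child.latch.acquire()  # latch the child first ...
        node.latch.release()   # ... and only then release the parent
        node = child
    try:
        if key in node.keys:
            return node.values[node.keys.index(key)]
        return None
    finally:
        node.latch.release()
```

Because the parent latch is still held while the child latch is acquired, no other thread can modify the parent and invalidate the child pointer in between. At most two latches are held by a search at any time.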
In the paper "A Survey of B-Tree Locking Techniques", Goetz Graefe discusses two subtopics within B-Tree concurrency control: protection of the B-Tree structure and protection of the B-Tree contents. He explores each of these in depth and highlights the use cases. B-trees are a universal access path structure in databases and file systems, so much so that they have retained their number one spot for many decades despite improvements and other innovative proposals. They support a plethora of operations, and a great deal of research has been devoted to B-Tree concurrency control. However, even though research has come a long way, Graefe, like Gray and Reuter, believes that “the last word on how to control concurrency on B-trees optimally has not been spoken yet.” Since such a structure is widely used but not fully understood, it is clear that this is an important problem to analyze for the future of database architecture. The term B-Tree locking is noted as covering two different concepts. The first is concurrency control among concurrent database transactions that query or modify the database contents and their representation in B-Tree indexes. The second is concurrency control among concurrent threads that modify the B-Tree data structure in memory, including images of disk-based B-Tree nodes in the buffer pool. The paper is organized into several components that compare and contrast these two subtopics: 1) Locking vs Latching: locks separate transactions using read/write locks on pages, B-tree keys, or even gaps between keys. Latches separate threads accessing B-Tree pages cached in the buffer pool. The two use different implementation primitives, and the paper compares them in both normal transaction processing and offline activities. 2) Protecting a B-Tree's physical structure: Latches coordinate multiple execution threads while they access and modify the database contents and its representation. 
Latches are optimized for frequent use: modes beyond shared and exclusive are avoided, and deadlock avoidance is preferred over deadlock detection. Accordingly, latches are held only for a short amount of time to coordinate threads and protect the B-Tree data structure. 3) Protecting a B-Tree's logical contents: The most popular method for concurrency control among transactions is key range locking. It is well understood, enables high concurrency, permits ghost records to minimize effort in user transactions, adapts to any key type and key distribution, and prevents phantoms for true serializability. Even though the paper was quite informative, it still had several drawbacks. One drawback was that the paper did not attempt to discover something new; it was a retrospective study of B-Trees. Thus, the language and setting made it feel more like a textbook than a paper. Another drawback was the lack of examples to clarify some of the points made when describing different lock modes. I feel that, in general, humans learn from a concrete example and are able to make abstractions afterwards. Explaining a concept the other way around does not have a good success rate (especially for me). Lastly, I would have enjoyed seeing some modern techniques in the future work section. Even if B-Trees are very popular, that could change given enough time and research into other data structures. |
This paper describes the different types of locks used on B+ trees for concurrency control. Because multiple threads and transactions access B+ trees simultaneously, some form of locking is needed to keep the trees in a valid state. The paper makes a distinction between locks, which control concurrency between transactions, and latches, which control concurrency between threads. Separating the design and control of locks and latches helps simplify design and debugging. Latches, from a DBMS perspective, are what most other applications would call locks. They’re low-level, lightweight structures. They’re used frequently, with very high concurrency, and are generally held for short periods of time. Only two types of latches are implemented: shared (read) and exclusive (write) latches. They’re used to protect against 4 situations: (1) one thread reads a node while another thread writes to it; (2) a thread follows a pointer from parent to child, and the child is invalidated; (3) a thread follows a pointer to a sibling node, and the sibling is invalidated; (4) nodes are split. To protect against these, the entire B+ tree could be latched, or a thread could latch the root node and continue latching children down the tree, or it could latch nodes starting from any given node and going up to the root. The latching policy should avoid deadlocks, however. A way to make splits easier is to use a modified B+ tree called a B-link tree. In a B-link tree, any node can have a sibling that is not pointed to by the parent, effectively increasing that node’s size. As such, a split in a B-link tree can divide a node without immediately inserting the new node into its parent, which simplifies latching. Locks, on the other hand, are used to separate transactions from each other. Locks tend to have much more overhead than latches, and there are many different kinds as well. They ensure that different transactions can’t make conflicting reads and writes of database contents at the same time. 
The DBMS could lock individual key values, but this would allow the insertion of “phantom” records with values in between locked values. As such, the DBMS should implement key range locking, where any transaction can lock individual key values, as well as the empty ranges in between values. The transaction can also choose to lock either keys, or key-record combinations, in the case of multiple records that have the same key. This paper had the advantage of being very easy to read. The organization was very clear; there was a basic introduction, including the difference between locks and latches, and then distinct sections on the uses of locks and latches. Describing the difference between them is useful as well, as many readers are probably only familiar with locks from a lower-level perspective. On the negative side, the paper is more confusing when describing how the many types of locks interact with each other. There’s a table describing how 15 types of locks interact, and it’s not especially helpful for understanding how it all works. |
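The phantom problem and the key range remedy described in this review can be illustrated with a small sketch. Everything here is a simplifying assumption rather than the paper's design: `KeyRangeLockTable`, `lock_range`, `try_insert`, and `release` are hypothetical names, only two modes ("S" and "X") exist, a lock on a key is taken to cover that key plus the gap below it, lock requests fail immediately instead of waiting, and the gap check on insert is treated as an instant-duration test that is not retained.

```python
import math
from bisect import bisect_left, insort


class KeyRangeLockTable:
    """Toy lock table: a lock on key k covers k and the gap below it."""

    def __init__(self, keys):
        # The +inf sentinel covers the gap above the largest real key.
        self.keys = sorted(keys) + [math.inf]
        self.locks = {}  # key -> {transaction id: mode}

    def _compatible(self, key, txn, mode):
        # Only shared locks held by other transactions coexist with a shared request.
        return all(holder == txn or (mode == "S" and held == "S")
                   for holder, held in self.locks.get(key, {}).items())

    def lock_range(self, txn, lo, hi, mode="S"):
        # Lock every key in [lo, hi] plus the first key at or above hi,
        # so that all gaps inside the range are covered as well.
        first = bisect_left(self.keys, lo)
        last = bisect_left(self.keys, hi)
        needed = self.keys[first:last + 1]
        if not all(self._compatible(k, txn, mode) for k in needed):
            return False
        for k in needed:
            holders = self.locks.setdefault(k, {})
            if holders.get(txn) != "X":  # never weaken an existing X lock
                holders[txn] = mode
        return True

    def try_insert(self, txn, key):
        i = bisect_left(self.keys, key)
        if self.keys[i] == key:
            raise ValueError(f"duplicate key {key}")
        # The new key falls into the gap covered by the next higher key.
        if not self._compatible(self.keys[i], txn, "X"):
            return False  # another transaction protects this gap
        insort(self.keys, key)
        self.locks.setdefault(key, {})[txn] = "X"
        return True

    def release(self, txn):
        for holders in self.locks.values():
            holders.pop(txn, None)
```

With keys 10, 20, and 30, a scan by one transaction over the range 12 to 18 locks key 20 and the gap below it, so another transaction's attempt to insert 15 is refused until the scanner releases its locks, while an insert of 25 into an unprotected gap succeeds.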
The paper summarizes the locking of B-trees by introducing the difference between locking and latching, discussing their implementation scenarios, and introducing key range locking, the concurrency control technique for transactions accessing B-tree contents. B-tree locking, or locking in B-tree indexes, has two different meanings: concurrency control among threads and concurrency control among transactions. These two levels are implemented by latches and locks. Latches support a limited set of modes, such as shared and exclusive. They do not provide advanced services such as deadlock detection or escalation, and they can usually be embedded in the data structures they protect. Therefore, their acquisition and release can be very fast, which is important because they implement short critical sections in the database system code. Locks support more modes than latches and offer a variety of advanced services. The management of locks is separate from the protected information, for example, the keys and gaps between keys in the leaf pages of a B-tree index. In fact, the hash table in the lock manager is itself protected by latches, so many threads can inspect or modify the lock table as needed. The primary technique for concurrency control between transactions accessing B-tree contents is key range locking. Various forms of key range locking have been designed. The latest design allows separate locks on individual key values and on the gaps between key values, applies strict multi-granularity locking to each pair of a key and an adjacent gap, and reduces lock manager calls by using additional lock modes that can be derived automatically. It also enables increment locks in grouped summary views, and uses ghost records not only for deletions but also for insertions. The paper helps learners grasp a picture of B-tree locking by pointing out the difference between locks and latches and by introducing key range locking. It would be better if it gave more descriptive examples to help readers understand the confusing concepts. |
The basic function of an index is to map search keys to associated information. Indexes are therefore used to quickly locate data without having to search every row in a table every time the table is accessed. A B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, insertions, and deletions in logarithmic time. Combining the two, a B-tree index is a type of order-preserving index with a B-tree structure that maintains keys in sorted order and supports logarithmic-time searches. Database servers run many threads to serve many users concurrently. So intuitively, we need some type of locking in B-trees to support concurrent threads. We already know how to use locks to protect objects in the database. However, concurrency control in B-trees is more complex because not only the logical database content but also the physical structure can change. The main differences between locks and latches are as follows. Locks protect database content from other transactions, whereas latches protect the index’s internal data structure from other threads. To make this clear, one transaction may be processed by multiple threads. For example, an insertion transaction may involve one thread splitting nodes and another thread reading nodes. One thread can also serve multiple transactions. Locks are held for the entire duration of a transaction, while latches are held only for critical sections. For example, when a split operation on a tree node is done, the latch is released, even if the transaction has not finished yet. Locks have more modes than latches; the lock modes are discussed in more detail later. There are only two latch modes, read and write. A read latch is compatible with other read latches, but a write latch is exclusive. Lock information is kept in the lock manager’s hash table, while latches are embedded in the data structures they protect. 
The reason is that locks often protect data that is not present in the database (for example, a gap between keys), so they cannot be embedded in the data they protect. Latches, however, protect the actual data structures, which are always present, so they can be embedded in the structures they protect. The first challenge is that while a thread follows a pointer from a parent to a child, the pointer must not be invalidated by another thread. When traversing from a parent node to a child node, once the latch on the parent is released, another thread may modify the parent and invalidate the pointer to the child. To avoid this, we use a technique called latch coupling: the latch on the parent node is retained until the child node is latched. This ensures that the pointer to the child stays valid during the traversal. The second challenge is pointer chasing. Concurrent threads may perform ascending and descending index scans, which can lead to deadlocks. We illustrated an example of a deadlock situation earlier. Threads that traverse the tree in opposite directions can wait for one another to give up latches, and in that situation no thread can proceed. The solution is to have the latch acquisition code provide an immediate-failure mode. If such a failure occurs during forward or backward traversal, the scan must release its latch to let the conflicting scan proceed, and then reposition itself to restart the scan. The last challenge is that during a B-tree insertion, a child node may overflow and require an insertion into its parent node, which may also overflow and require an insertion into the child's grandparent node. In the worst case, the root node overflows and needs to be split. The solution is to retain latches on the nodes along the root-to-leaf search path until a lower, less-than-full node guarantees that split operations will not propagate up the tree beyond that node. 
In other words, the latch on a parent is released if the child node is safe. Safe means that the node is not full, so it will not split when the element is inserted. Aside from the standard B-tree, the paper presents an entirely different approach that relaxes the data structure constraints of B-trees by dividing a node split into two independent steps. This structure is called the B-link tree. When a node is split, the first step creates a new right neighbor. The second step places a child pointer to the new node into the parent and abandons the neighbor pointer. The advantage of this approach is that it makes the allocation of a new node a local step. However, multiple splits may produce long linked lists of neighbors. This can be prevented by restricting the split operation. For locking, the challenges are these: serializability requires locking not only the presence but also the absence of data; a transaction may require multiple lock modes; multiple row identifiers may be associated with each search key value in a non-clustered index; and a new type of lock is needed to let concurrent transactions increment and decrement counts. To deal with these issues, the paper presents key range locking, hierarchical locking, and increment lock modes. |
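The latch coupling rule and the "safe node" rule for insertions described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's implementation: `Node`, `find_leaf`, `descend_for_insert`, and the `capacity` parameter are hypothetical names, and a plain mutex stands in for a read/write latch.

```python
import bisect
import threading


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                 # separator keys (inner) or entries (leaf)
        self.children = children or []   # an empty list marks a leaf
        self.latch = threading.Lock()    # assumption: exclusive latch only


def child_index(node, key):
    """Index of the child whose key range contains key."""
    return bisect.bisect_right(node.keys, key)


def find_leaf(root, key):
    """Latch coupling: latch the child before releasing the parent."""
    node = root
    node.latch.acquire()
    while node.children:
        child = node.children[child_index(node, key)]
        child.latch.acquire()    # the child is latched first ...
        node.latch.release()     # ... and only then is the parent released
        node = child
    return node                  # still latched; the caller must release it


def descend_for_insert(root, key, capacity):
    """Keep ancestors latched until a child with spare room is reached."""
    root.latch.acquire()
    held = [root]
    node = root
    while node.children:
        child = node.children[child_index(node, key)]
        child.latch.acquire()
        if len(child.keys) < capacity:   # "safe": this insert cannot split it
            for ancestor in held:
                ancestor.latch.release()
            held = []
        held.append(child)
        node = child
    return held   # latched nodes, top-down; the last one is the leaf
```

With a full leaf, `descend_for_insert` returns the leaf together with every ancestor that a split might still reach; with a non-full leaf, only the leaf remains latched.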
This paper focuses on the concurrency control of operations in B-trees. It aims to simplify, clarify, and structure concurrency control in B-trees by dividing it into two sub-topics: concurrency control among concurrent database transactions querying or modifying database contents, and concurrency control among concurrent threads modifying the B-tree data structure in memory. The paper gives a detailed description of locks and latches, which are used to address these two sub-topics, respectively. The paper also points out the difference between latches and locks during recovery from a system crash. The protection of a B-tree's physical structure and of its logical contents is also discussed, including read and write latches, lock coupling, B-link trees, load balancing and reorganization, key range locking, ghost records, and hierarchical locking. The strong part of the paper, to me, is that it is a good survey of B-tree locking techniques. It gives a very detailed explanation of every aspect of the problem it surveys. The main drawbacks of the paper, to me, are: (1) there are many technical explanations but few concrete examples to help me understand the problem; (2) the figures in the paper are not well drawn and did not help me much when I read the paper. |
This paper presents a comprehensive survey of B-tree locking techniques. The basic structure and operations of B-trees are widely and well understood; however, the concurrency control of operations in B-trees is perceived as a difficult subject with many subtleties and special cases. To address this, the survey clarifies, simplifies, and structures the topic of concurrency control in B-trees. This problem is important because, although there have been many innovative proposals and prototypes for alternatives to B-tree indexes, B-trees are still the most important access path structure in DBMSs today. The traditional operations of B-trees are well understood, while it is far more challenging to enable correct multithreaded execution and even transactional execution of B-tree operations, so it is worthwhile to explore and summarize this field. The paper summarizes material related to the concurrency control of B-tree indexes, and I will reiterate its crux here. The paper distinguishes the protection of the B-tree structure from the protection of the B-tree contents, and the separation of threads against one another from the separation of user transactions against one another. First, it gives a thorough introduction to the history of B-tree locking. B-tree locking means two things: first, concurrency control among concurrent database transactions; second, concurrency control among concurrent threads. Locks and latches, respectively, are used to accomplish these two purposes. Locks separate transactions using read and write locks on pages, on B-tree keys, or even on gaps between keys. Latches separate threads accessing B-tree pages cached in the buffer pool, the buffer pool’s management tables, and all other in-memory data structures shared among multiple threads. 
The difference between latches and locks also becomes apparent when redundant indexes point into nonredundant storage structures. There, all inter-transaction concurrency control is achieved by locks on nonredundant data items; latches, however, are required for any in-memory data structure touched by multiple concurrent threads, including of course the nodes of a non-clustered index. Latches and locks also differ both during system recovery and while waiting for the decision of a global transaction coordinator. For protecting a B-tree’s physical structure, we learned that latches coordinate multiple execution threads while they access and modify database contents and its representation. Latches should be held only for very short periods, and none should be held during disk I/O. Thus, all required data structures should be loaded and pinned in the buffer pool before a latched operation begins. Key range locking is the technique of choice for concurrency control among transactions: it enables high concurrency, permits ghost records to minimize effort and “lock footprint” in user transactions, adapts to any key type and key distribution, and prevents phantoms for true serializability. By strict separation of locks and latches, of abstract database contents and in-memory data structures including the cached database representation, and of transactions and threads, previously difficult techniques become clear. Although this is a survey, the paper still makes some technical contributions. First, it clarifies and simplifies concurrency control in B-trees by drawing these distinctions, which resolve unnecessary confusion in developer education, code maintenance, test development, test execution, etc. Besides, the paper goes into great detail and makes a good comparison between locks and latches; this comparison helps people understand the difference between the two techniques and how each should be used in different scenarios. The downside of this paper is minor. 
Personally, I think the paper should include more figures or examples to illustrate its concepts and techniques. Reading it felt a little tedious because it is so text-heavy :( |
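Since the phantom-prevention claim for key range locking is easier to see with an example, here is a small Python sketch. It rests on loud assumptions: a lock on key k is taken to cover the half-open range from k up to the next existing key (only one of several designs), only a single shared range mode is modeled, keys are unique, and the names `KeyRangeLocks`, `lock_scan`, and `try_insert` are hypothetical.

```python
import bisect


class KeyRangeLocks:
    """Assumption: a lock on key k covers the range [k, next existing key)."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.holders = {}   # key (None = before the first key) -> set of txns

    def _predecessor(self, key):
        """Largest existing key <= key, or None if there is none."""
        i = bisect.bisect_right(self.keys, key) - 1
        return self.keys[i] if i >= 0 else None

    def lock_scan(self, txn, lo, hi):
        """Lock every range overlapping [lo, hi], including the gap below lo."""
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_right(self.keys, hi)
        for k in {self._predecessor(lo), *self.keys[i:j]}:
            self.holders.setdefault(k, set()).add(txn)

    def try_insert(self, txn, key):
        """Refuse an insert into a range that another transaction has locked."""
        owner = self._predecessor(key)
        current = self.holders.get(owner, set())
        if current - {txn}:
            return False                  # the new key would be a phantom
        bisect.insort(self.keys, key)
        if txn in current:                # the new key splits a range txn holds
            self.holders.setdefault(key, set()).add(txn)
        return True
```

The sketch also shows the coarseness of this simple variant: a scan ending exactly at an existing key still locks the gap above that key, which is the kind of over-locking that separate key and gap locks are meant to reduce.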
This paper is a survey of B-tree locking techniques. The paper first discusses the differences between locks and latches. The most important difference is that locks are transaction-level and latches are thread-level. Because of this, latches are in general more lightweight. The paper then splits into two sections; the first discusses the use cases and types of latches. Latches are primarily used when multiple threads may contend for pieces of data or parts of an index. Two advantages of latches are that they are much more lightweight than locks and that they can be embedded in the data structures they protect, since those structures are entirely in memory. There are several issues to consider, though, such as pointer chasing and overflow; there is a wide variety of techniques to deal with these issues, such as latch coupling and modified versions of B-trees called B-link trees. The second half of the paper covers different types of locks. Locks can be separated into classes of granularity: coarse-granularity locks cover more data but limit concurrency, and vice versa for fine-granularity locks. The paper pays specific attention to key range locks, which lock ranges of keys rather than just individual key values. These are good for increasing performance during ghost-bit-style deletion and permit more concurrency when open intervals are used in predicates. Increment locks were also discussed, which are for operations that increment attributes. Because increments commute, increment locks don’t block each other and so are good for increasing concurrency in workloads with many such operations. I think the paper’s biggest strength was its differentiation between locks and latches; I found it very clear, intuitive, and easy to understand. The paper also did a good job of highlighting common misunderstandings relating to locks vs. latches and differences between latch/lock types. 
The paper did not have many weaknesses in my opinion, but I did think that the “issues” with latches (i.e., those relating to latch coupling and B-link trees) weren’t explained particularly well. I had a very difficult time understanding the motivation for the issues, and I think figures would have been extremely helpful here, but none were present. |
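The point above that increment locks do not block each other can be written down as a small compatibility check. This is a hedged Python sketch: the mode name `INC`, the class `LockedCounter`, and its methods are hypothetical, and the only facts assumed are that shared locks are compatible with shared locks and increment locks with increment locks.

```python
# Pairs (held mode, requested mode) that two different transactions may
# hold at the same time. Assumption: "INC" names the increment mode.
COMPATIBLE = {
    ("S", "S"),
    ("INC", "INC"),
}


class LockedCounter:
    """A count (e.g. in a summary view) guarded by S, X, and INC locks."""

    def __init__(self, value=0):
        self.value = value
        self.holders = {}   # txn -> mode

    def try_acquire(self, txn, mode):
        for other, held in self.holders.items():
            if other != txn and (held, mode) not in COMPATIBLE:
                return False
        self.holders[txn] = mode
        return True

    def increment(self, txn, delta):
        """Apply a commutative update; requires an INC or X lock."""
        if self.holders.get(txn) not in ("INC", "X"):
            raise PermissionError("increment needs an INC or X lock")
        self.value += delta

    def release(self, txn):
        self.holders.pop(txn, None)
```

Two transactions can hold `INC` together and apply their deltas in either order, while a reader asking for `S` has to wait until both have released.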