To implement concurrency control, databases use various methods based on locking, timestamps, or private buffers. Each method has implications for the overhead the DBMS incurs, depending on its workload, and each has its own subtleties that developers must understand to ensure that theoretical guarantees are satisfied in implementation. As an example design trade-off, optimistic concurrency control is useful when conflicts are rare, because it has no locking overhead; but if transactions that write large amounts of data often conflict, two-phase locking is more efficient, because it requires less restarting of transactions and writing to buffers.
The authors of the chapter on “Concurrency Control” discuss mechanisms used by DBMSs to implement concurrency controls, including locking, timestamp-based, and optimistic. Strict two-phase locking requires each transaction to obtain a shared lock before reading an object, or an exclusive lock before writing to an object, releasing locks only on rollback or commit. Timestamp-based concurrency control assigns a timestamp to each transaction, and a read and write timestamp to each object. These timestamps are compared when a transaction wants to read or write to an object, after which the transaction either takes its desired action or aborts and tries again. Finally, optimistic concurrency control makes each transaction write to a private buffer instead of the database, then abort and restart if a potential conflict is detected, or else flush its buffer to the database for a successful commit.
This chapter does a good job of explaining different styles of concurrency management. The chapter addresses problems that can arise for concurrency managers, such as the phantom problem, deadlock, and convoys. There is a deeper discussion of deadlock detection and removal methods in this chapter than in the previous one. The authors describe how a waits-for graph is maintained, representing which transactions are waiting for locks held by which other transactions. When a cycle exists in this graph, its member transactions are in deadlock. One of these transactions must be aborted so it will release its locks, resolving the problem.
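As a rough illustration of the waits-for graph described above, here is a minimal Python sketch of cycle detection by depth-first search. The dictionary representation and the name `find_deadlock_cycle` are assumptions made for this example; they are not taken from the chapter.

```python
from typing import Dict, List, Optional, Set


def find_deadlock_cycle(waits_for: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle of transaction ids, or None if no deadlock exists.

    waits_for[t] is the set of transactions holding locks that t waits for
    (a hypothetical in-memory form of the waits-for graph).
    """
    WHITE, GRAY, BLACK = 0, 1, 2      # unvisited, on current path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(txn: str) -> Optional[List[str]]:
        color[txn] = GRAY
        path.append(txn)
        for holder in sorted(waits_for.get(txn, ())):
            state = color.get(holder, WHITE)
            if state == GRAY:
                # Back edge: the cycle is the part of the path from holder on.
                return path[path.index(holder):]
            if state == WHITE:
                cycle = visit(holder)
                if cycle is not None:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in sorted(waits_for):
        if color.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle is not None:
                return cycle
    return None
```

A lock manager could run a check like this periodically and abort one member of the returned cycle; which member to abort is a policy choice that the sketch leaves open.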
Unfortunately, the authors sometimes make unsupported claims or appear to contradict themselves. For example, on page 553 they note that “conflict serializability is sufficient . . . for serializability,” but earlier they stated that if tuples can be inserted into a table, a schedule can be conflict serializable but not serializable. No example is given, however, of how this can be.
The chapter extends the concept of transactions from Chapter 16 and highlights different lock mechanisms. It starts by explaining how two-phase locking ensures serializability and recoverability when conflicts arise between transactions. For this purpose, it introduces the precedence graph, which captures the dependencies among transactions. It also introduces 2PL, a variant of strict 2PL that allows a transaction to release locks before it commits or aborts. Latches (which make a physical read or write operation atomic) and convoys (when most of the CPU’s cycles are spent switching from one process to another) are also described.
Lock upgrades and downgrades are discussed, with the conclusion that it is better to obtain an exclusive lock first and then downgrade, because this prevents deadlocks. Deadlocks are usually rare but can still be a nuisance, and various techniques are described to prevent them. The chapter also addresses the phantom problem (the data a transaction is working with changes even though that transaction did not modify the table itself). Concurrency control in B+ trees is covered as well, with sufficient explanation and relevant diagrams to show how the issues are resolved. Locking can also be done at different levels of a hierarchy, such as only specific parts of a table. Concurrency control based on timestamps is also detailed within the chapter.
The chapter is successful in providing the readers with the concepts of locks in simple language. These are supported with variants which address different issues faced during conflict encounters.
The reading had some typos here and there. Apart from that, with the exception of concurrency control in B+ trees, no fully detailed examples are provided in support of the concepts.
In this chapter, we look at concurrency control in more detail. Section 17.1 begins by looking at how Strict 2PL ensures serialisability and recoverability.
Section 17.2 is an introduction to lock management. The lock manager keeps track of the locks issued to transactions in a lock table, which is a hash table with the data object identifier as the key. It also maintains a descriptive entry for each transaction in a transaction table, including a list of the locks the transaction holds. This list is checked before requesting a lock, to ensure that a transaction does not request the same lock twice.
Section 17.3 discusses the issue of lock conversions. A transaction may need to acquire an exclusive lock on an object for which it already holds a shared lock. The section covers three approaches: upgrading a lock, downgrading a lock, and introducing a new kind of lock, the update lock, to increase concurrency.
Section 17.4 covers deadlock handling. Deadlocks tend to be rare and typically involve very few transactions. In practice, therefore, database systems periodically check for deadlocks. The lock manager maintains a structure called a waits-for graph to detect deadlock cycles. An alternative detection method is a timeout mechanism. To prevent deadlock, we can give each transaction a priority and ensure that lower-priority transactions are not allowed to wait for higher-priority transactions.
Section 17.5 discusses three specialised locking techniques: locking sets of objects identified by some predicate (the phantom problem), locking nodes in tree-structured indexes (concurrency control in B+ trees), and locking collections of related objects (multiple-granularity locking).
Section 17.6 examines some alternatives to the locking approach: optimistic concurrency control, timestamp-based concurrency control, and multiversion concurrency control.
This paper (more precisely, a chapter from the book Database Management Systems by Ramakrishnan and Gehrke) provides a detailed description of concurrency control, which is one of the important elements of transaction management in a DBMS. In short, it takes a closer look at the details of concurrency control and answers related questions such as how a DBMS resolves deadlocks and how multiple-granularity locking works. It first introduces locking protocols, how they guarantee serializability and recoverability, and how they are implemented in a DBMS. Then it talks about how to deal with deadlocks, followed by a discussion of some specialized locking techniques, including techniques to solve the phantom problem. Finally, it discusses several methods for concurrency control without locking.
There is no single general problem statement here, since this is an instructional chapter that sets up further discussion of concurrency control. The motivation is that there are many problems in concurrency control and different ways to handle them, where each way might perform better in one respect but worse in another. For example, the traditional way to handle concurrency control is through locking, but there are other ways, such as the timestamp-based and optimistic concurrency control mentioned in this paper.
The major contribution of the paper is that it takes a closer look at concurrency control, with detailed explanations of several related issues. It provides detailed examples to introduce new concepts and techniques. The key components are summarized below:
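To make the lock table and transaction table more concrete, the following is a minimal single-threaded Python sketch. The class and method names (`LockManager`, `request`, `release_all`) are hypothetical, only shared (S) and exclusive (X) modes are modeled, and lock upgrades are deliberately left out. A real lock manager would also need latches to make these operations atomic.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class LockEntry:
    mode: str = ""                                  # "S", "X", or "" if free
    holders: set = field(default_factory=set)       # transactions holding it
    queue: deque = field(default_factory=deque)     # FIFO of (txn, mode)


class LockManager:
    def __init__(self):
        self.lock_table = {}    # object id -> LockEntry (the hash table)
        self.txn_table = {}     # txn id -> set of object ids it has locked

    def request(self, txn, obj, mode):
        """Return True if the lock is granted, False if the request is queued."""
        entry = self.lock_table.setdefault(obj, LockEntry())
        # Check the transaction's own lock list so it never asks twice.
        if obj in self.txn_table.get(txn, set()):
            if mode == "S" or entry.mode == "X":
                return True
            raise NotImplementedError("lock upgrades are outside this sketch")
        free = not entry.holders
        # A new reader may join other readers only if nobody is queued,
        # so that a waiting writer is not starved.
        shareable = mode == "S" and entry.mode == "S" and not entry.queue
        if free or shareable:
            entry.mode = mode
            entry.holders.add(txn)
            self.txn_table.setdefault(txn, set()).add(obj)
            return True
        entry.queue.append((txn, mode))
        return False

    def release_all(self, txn):
        """Release all locks of txn (commit or abort); return new grants."""
        granted = []
        for obj in sorted(self.txn_table.pop(txn, set())):
            entry = self.lock_table[obj]
            entry.holders.discard(txn)
            if entry.holders:
                continue
            entry.mode = ""
            while entry.queue:
                waiter, mode = entry.queue[0]
                if entry.holders and not (mode == "S" and entry.mode == "S"):
                    break
                entry.queue.popleft()
                entry.mode = mode
                entry.holders.add(waiter)
                self.txn_table.setdefault(waiter, set()).add(obj)
                granted.append((waiter, obj, mode))
        return granted
```

Queuing new readers behind a waiting writer is one possible fairness policy; it trades some read concurrency for freedom from starvation.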
1. Locking protocol and implementation
a. Strict 2PL, and its variant 2PL (relax the rule to allow transactions to release locks before commit or abort action)
b. precedence graphs (help capture potential conflicts between transactions)
c. atomicity of locking and unlocking
2. Deadlock (periodically check for cycles using the waits-for graph; wait-die; wound-wait)
3. Specialized locking techniques
a. concurrency control in B+ trees (lock a node if a split can propagate up to it from the modified leaf)
b. multiple-granularity locking (IX, IS, SIX locks)
4. Concurrency control without locking (optimistic concurrency control, timestamp-based concurrency control)
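The multiple-granularity lock modes listed above can be summarized by a compatibility table. The Python sketch below encodes the commonly presented matrix for IS, IX, S, SIX, and X, along with the usual rule about which lock must be held on the parent node. The function names are invented for this illustration.

```python
# For each requested mode, the modes that other transactions may already
# hold on the same node without blocking the request.
COMPATIBLE_WITH = {
    "IS": {"IS", "IX", "S", "SIX"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "SIX": {"IS"},
    "X": set(),
}


def compatible(requested: str, held: str) -> bool:
    """True if `requested` can coexist with a lock `held` by another txn."""
    return held in COMPATIBLE_WITH[requested]


def can_grant(requested: str, held_by_others) -> bool:
    """True if the request conflicts with none of the currently held modes."""
    return all(compatible(requested, h) for h in held_by_others)


def parent_lock_ok(requested: str, parent_mode: str) -> bool:
    """Usual rule: S or IS on a node needs IS or IX on its parent;
    X, IX, or SIX on a node needs IX or SIX on its parent."""
    if requested in ("S", "IS"):
        return parent_mode in ("IS", "IX")
    return parent_mode in ("IX", "SIX")
```

For example, a transaction that scans a file and updates a few records could take SIX on the file and X on just those records, and readers holding IS on the file would still be admitted.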
One interesting observation: I like the way the author introduces new concepts and techniques. The detailed examples are also very helpful. I found it very interesting that real systems are taking different methods to do concurrency control (IBM DB2: strict 2PL or variant, Oracle8: readers never wait). It seems that there is no “best” solution and it varies with different data or read/write operations.
This chapter deals with concurrency control. I must say, I got a lot more out of reading this chapter now than I did the first time while taking 484 - after having taken 482, many of these concurrency issues make more sense. The primary idea of 482, and this chapter, is to use locks to limit the possible interleavings of threads so that only correct outcomes are possible. However, there is another issue that we must consider - recovering the state of the database, should a crash occur. Concurrent schedules must be both serializable and recoverable - both of these things are guaranteed by strict two-phase locking. A lot of what this chapter covers should be review to those of us who have taken Operating Systems. It explains lock queues, and how transactions (threads) are put to sleep if they are waiting on a lock that is not available, as well as how locking and unlocking are implemented as atomic operations.
Deadlocks are another issue that must be considered. There are two popular ways to deal with this issue - try to prevent deadlocks using some avoidance techniques, or look for deadlocks and fix them after the fact. Since deadlocks happen to be very rare in practice, most DBMSs choose to look for deadlocks, rather than pay the computational cost of avoiding them. They generally do this using waits-for graphs, and use some policy to kill transactions that are causing deadlocks. There are various strategies that can be used to pick which transaction to abort, and these have their trade-offs.
The chapter also covers some more complicated issues, such as the "phantom problem" - what happens if a tuple is inserted into a table as the table is being queried? Running the query once, and then once again immediately after, will yield different results. This has led DBMS developers to use techniques such as locking the index, or perhaps the whole table, to prevent modification of the tuples the transaction is using. The chapter discusses in detail the complications of grabbing locks at different granularities - while it can improve performance, it can further complicate the code, and lead to potential bottlenecks. For example, the cost of obtaining an exclusive lock on the root node of a B+ tree is very high.
The chapter discusses some other options that are alternatives to lock-based concurrency control - timestamp-based and optimistic concurrency control. At the very end of the chapter, they mention the dominant method used in practice - MVCC, multi-version concurrency control. Each user sees a snapshot of the database at the time of their query - that is, reads are never blocked. The transaction may have to roll back, but it will never block waiting to read data. Since conflicts are rare, this works well in practice. The only negative thing I can say about this chapter is I wish it had spent more time on this protocol - it's what many databases use in practice, and is more powerful than many of the other methods discussed here.
This paper continues from the last one: Chapter 16 of the textbook gave an overview of concurrent transactions, and this chapter delivers more detailed information about concurrency control. Concurrency in a database is controlled by locks, so the paper discusses locking protocols, lock conversions, and deadlock handling, and presents three specialized locking protocols.
In the beginning the paper gives some definitions and basic ideas. Strict 2PL guarantees the serializability and recoverability of schedules. Two schedules are conflict equivalent if they involve the same set of actions of the same transactions and they order every pair of conflicting actions of two committed transactions in the same way. A conflict serializable schedule is also serializable. The strict 2PL protocol makes sure the schedules are conflict serializable. The 2PL protocol has a looser requirement: it only requires that a transaction cannot request additional locks once it releases any lock, so the number of locks held by a transaction keeps decreasing after it reaches a peak. View serializability is a more general condition for a serializable schedule, but it is not that valuable in practice.
Turning to locking protocols, locks are handled by the lock manager, which maintains a lock table. When a transaction aborts or commits, the lock manager releases all the transaction’s locks. A lock request must be an atomic operation to prevent conflicting locks from being granted. The idea of lock conversion is that when a SQL update statement is processed, the transaction obtains exclusive locks initially and downgrades to shared locks where possible, instead of upgrading from shared to exclusive locks. This approach reduces deadlocks.
Deadlocks do not happen frequently. The general approach is to detect them, for example by timeout, although a prevention mechanism can work better. Deadlock is prevented by assigning every transaction a rank and making a rule that lower-ranking transactions cannot wait for higher-ranking ones.
Then the paper talks about three different locking protocols. One is predicate locking. Its drawback is that it is hard to implement. Another is placing locks in a tree-structured index. If a transaction needs to access a leaf, it should acquire all the locks on the path from root to leaf. It is a good algorithm, but is not commonly used now. The other is multiple-granularity locking. It must be used with 2PL to ensure serializability.
The paper also provides some control methods that do not rely on locking protocols. One is optimistic concurrency control. It still has overhead and high cost, which makes it not very attractive. Another is timestamp-based concurrency control. In this method, actions are ordered by timestamps instead of locks. The other is multiversion concurrency control. It also uses timestamps: it creates multiple versions of an object with different timestamps to cut down the waiting time, so read operations are never blocked. It is less practical, however, because managing multiple versions costs a lot.
The strength of the paper is that it is very concentrated on the topic of concurrency control. It gathers very comprehensive and detailed knowledge about this topic. Without this paper, it would be hard for readers to know the topic thoroughly.
The biggest drawback of this paper is that its typesetting is terrible. A bunch of words are not typed correctly, and it sometimes takes time to figure out what a word means. That makes me feel that the source of the paper is not rigorous.
Overall, this paper is very good. I learned a lot about concurrency control, especially control without locking, which is hard to learn by myself. The knowledge is deep and professional.
This paper introduces the various aspects of concurrency control in DBMS; how the concurrency is managed, various locking techniques, alternative management schemes, etc. Concurrency control is important in any parallel computation, especially in large databases where efficient, low overhead parallelism is very important in the overall performance of the system.
The locking protocols in a DBMS guarantee serializability and recoverability in the system. A schedule is conflict serializable if its precedence graph is acyclic, and the Two-Phase Locking protocol (2PL) ensures this by requiring that a transaction cannot request additional locks once it releases any lock. To manage the various locks and transactions, the DBMS keeps a transaction table that contains pointers to a list of locks held by each transaction. Locks are used to control concurrency where necessary, and deadlock is prevented by giving each transaction a priority and using the priority to ensure that certain transactions will be favored over others. Other than standard locking, there is optimistic concurrency control, which tries to manage concurrency without locks by performing the operation first, then validating it later to check for any conflicts before committing. While it removes the locking overhead, that overhead is replaced by the read and write lists that need to be maintained for validation.
The paper provides good insights into the different ways concurrency can be managed and what properties need to be maintained for correct operation of the system. However, it does not give us much quantitative data on the performance implications of these various protocols, which can be very important in choosing one scheme over another.
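The "perform first, validate later" idea can be sketched in a few lines of Python. This is a deliberately simplified, single-threaded illustration under stated assumptions: validation and the write phase happen together inside `commit`, a transaction is rejected if any transaction that committed after it began wrote something it read, and the names `OptimisticStore` and `Txn` are hypothetical.

```python
class OptimisticStore:
    def __init__(self):
        self.data = {}          # the "database"
        self.clock = 0          # logical commit counter
        self.committed = []     # (commit_ts, frozenset of keys written)

    def begin(self):
        return Txn(self, self.clock)


class Txn:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts
        self.read_set = set()       # keys read from the database
        self.write_buffer = {}      # private workspace, invisible to others

    def read(self, key):
        if key in self.write_buffer:
            return self.write_buffer[key]
        self.read_set.add(key)
        return self.store.data.get(key)

    def write(self, key, value):
        self.write_buffer[key] = value

    def commit(self):
        """Validate, then install the buffered writes. Returns False if the
        transaction must be restarted."""
        store = self.store
        for commit_ts, written in store.committed:
            if commit_ts > self.start_ts and written & self.read_set:
                return False        # something we read was overwritten
        store.clock += 1
        store.data.update(self.write_buffer)
        store.committed.append((store.clock, frozenset(self.write_buffer)))
        return True
```

The `read_set` and `committed` list are exactly the bookkeeping the review mentions: the locking overhead is gone, but these lists must be kept and compared at validation time.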
This chapter discusses some issues of lock management, such as implementation, deadlock, and multi-granularity locking, and introduces several kinds of concurrency control without locking. The summary follows:
1. 2PL guarantees conflict-serializable schedules, as strict 2PL does, but 2PL does not guarantee recoverability.
2. Locking in the strict 2PL protocol is implemented similarly to a reader-writer lock, with FIFO order to ensure that no transaction starves. The lock and unlock operations themselves should be atomic, which is provided by semaphores and latches. In contrast to a reader-writer lock, strict 2PL locking is implemented in user processes, and a transaction holding a lock may be suspended (for example, while waiting on I/O). When much of the time is spent switching between processes blocked behind such a transaction, a convoy occurs.
3. Lock conversions are sometimes needed for performance. However, a lock upgrade may cause a deadlock, and a lock downgrade is consistent with 2PL only if the transaction has not modified the locked object. To increase concurrency, the update lock is introduced; it is similar to the downgrade approach, but it is compatible with shared locks when first acquired, which also avoids conflicts with readers.
4. There are two main approaches to dealing with deadlock: detection or prevention. a) In practice, the DBMS periodically checks for deadlocks by maintaining a waits-for graph, since deadlocks are rare and deadlock cycles in the graph are small. The alternative is a timeout mechanism. b) One way to prevent deadlock is to assign a priority to each transaction (perhaps by timestamp) and forbid low-priority transactions from waiting for higher-priority transactions (wait-die or wound-wait). The alternative is conservative 2PL, under which a transaction acquires all the locks it may need at once, or acquires nothing and waits. It brings a huge overhead and is never used in practice.
5. Multi-granularity locking is introduced, along with IS and IX locks. Before a node is locked in S (X) mode, all of its ancestors are locked in IS (IX) mode. This strategy lets a transaction lock objects that are as small as possible and reduces locking overhead considerably.
6. This chapter introduces several forms of concurrency control without locking:
a) Optimistic concurrency control. A transaction proceeds in three phases: read, validation, and write. It may come with considerable overhead in validation.
b) Timestamp-based concurrency control. Each transaction is assigned a timestamp when it starts. Each object keeps an RTS and a WTS, which store the maximum timestamp among the transactions that have read or written the object. The timestamps are compared on each operation to check serializability. The comparison is simple, but updating the RTS and WTS has too much overhead.
c) Multiversion concurrency control. This approach never blocks reads. It maintains several versions of each database object, each with a write timestamp, which may result in considerable overhead.
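The RTS/WTS comparison in item b) can be illustrated with a short Python sketch of the basic timestamp-ordering checks. The names `Obj`, `read`, and `write` are made up for this example, timestamps are plain integers, and the optional `thomas` flag shows the Thomas Write Rule variant, in which an obsolete write is skipped instead of aborting the writer.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Obj:
    value: Any = None
    rts: int = 0    # largest timestamp of any transaction that read the object
    wts: int = 0    # timestamp of the transaction that last wrote the object


def read(obj: Obj, ts: int) -> str:
    """Transaction with timestamp ts tries to read obj."""
    if ts < obj.wts:
        return "abort"      # a younger transaction already overwrote it
    obj.rts = max(obj.rts, ts)
    return "ok"


def write(obj: Obj, ts: int, value: Any, thomas: bool = False) -> str:
    """Transaction with timestamp ts tries to write obj."""
    if ts < obj.rts:
        return "abort"      # a younger transaction already read the old value
    if ts < obj.wts:
        # The write is obsolete. Plain timestamp ordering aborts; the
        # Thomas Write Rule ignores the write and lets the writer continue.
        return "skip" if thomas else "abort"
    obj.value = value
    obj.wts = ts
    return "ok"
```

Note that every successful read also updates `rts`, which is the source of the write overhead mentioned in the summary: even read-only transactions modify per-object metadata.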
In all, the strategy to be chosen is highly dependent on the workload.
This chapter goes more in depth on lock management issues in a DBMS. This is important because locks are used in 2-Phase Locking, which is a very common concurrency scheme for assuring the ACID properties for transactions in a database. Then the chapter talks about other concurrency schemes that do not involve locks.
Database locks are managed similarly to locks in operating systems. Transactions can acquire reader or writer locks, and are queued if they cannot immediately acquire the lock they want. Sometimes transactions will also want to upgrade their reader lock to a writer lock, which is vulnerable to deadlock. A solution to this is to have transactions grab writer locks first, and then downgrade to reader locks if writing is unnecessary.
Hierarchical structures like indexes can be traversed using hand-over-hand locking, where locks are only held if modifications need to be made. The organization of data inside the DBMS can also be seen as a hierarchical structure, where files contain pages, and pages contain records. Transactions can then choose to lock files, pages, or records depending on the granularity of data they wish to access.
Deadlocks can be dealt with by assigning a priority to each transaction, and then enforcing that low-priority transactions never wait on high-priority ones, or vice versa; this is done by aborting transactions that violate these constraints. Both schemes are safe from deadlock and starvation, but both waste work when transactions are aborted.
There are some schemes that do not use locks to enforce serializability. These schemes generally involve some way of knowing whether a transaction has conflicted or will conflict with another transaction, and restarting one of the transactions. This can be done by keeping track of what data items each transaction has read or written (Optimistic Concurrency Control, Improved Conflict Resolution), by assigning timestamps which act like priorities for each transaction, or by keeping multiple copies of data items around to keep transactions isolated.
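The two priority schemes can be written as tiny decision functions. In this Python sketch, priority is assumed to come from the start timestamp, with a smaller timestamp meaning an older and therefore higher-priority transaction. The function names and return strings are chosen only for illustration.

```python
def wait_die(requester_ts: int, holder_ts: int) -> str:
    """Wait-die: an older requester may wait; a younger requester aborts."""
    if requester_ts < holder_ts:
        return "requester waits"
    return "requester aborts"


def wound_wait(requester_ts: int, holder_ts: int) -> str:
    """Wound-wait: an older requester aborts the holder; a younger one waits."""
    if requester_ts < holder_ts:
        return "holder aborts"
    return "requester waits"
```

In both schemes waiting happens in only one direction of the priority order, so no cycle of waiting transactions can form. If an aborted transaction restarts with its original timestamp, it eventually becomes the oldest and cannot be aborted again, which is how starvation is avoided.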
This was from a textbook, so it is easy to read and provides helpful examples.
The chapter covered a broad range of techniques, but didn’t have as much depth about comparing the benefits and downsides to each. The chapter was a survey that jumped around different topics, instead of the logical progression of topics that most papers present.
Continuing from the previous paper, this paper is again a chapter from a database textbook, “Database Management Systems”. The paper discusses concurrency control in a DBMS. The topics include: how a DBMS implements locking protocols, lock-related issues such as lock conversion and deadlocks, and different types of locking protocols along with alternative approaches to locking.
The chapter begins by explaining the notions of conflict equivalence and conflict serializability. I was slightly confused by the description of the two terms in the paper, but in other words, if a schedule S can be transformed into a schedule S' by repeatedly swapping adjacent non-conflicting actions, then we say S and S' are conflict equivalent. Also, a schedule S is conflict serializable if it is conflict equivalent to a serial schedule. I personally think that the paper is inadequate in explaining the significance of these two terms. For me, it was difficult to see the significance of these two notions when discussing the other locking protocols and techniques mentioned in the later subchapters.
Nonetheless, the paper continues with topics related to lock management. In a nutshell, a DBMS has a component called the “lock manager”, which handles all lock requests. To do so, it maintains a lock table and a transaction table, which keep information on which transaction holds which type of lock on which object in the DBMS. The paper also discusses the issues with lock conversions, where a transaction requests to upgrade or downgrade its lock, and the handling of deadlocks with both detection-based and prevention-based schemes.
The paper then proceeds to talk about the locking protocols themselves and also other concurrency control methods that are not lock-based. All of these different concurrency control protocols have trade-offs between performance and the degree of serializability that the protocol ensures. It is interesting to see there are many concurrency control approaches that do not involve locking, even though locking is still the norm for concurrency control in most database systems.
The paper provides a good overview of concurrency control in a DBMS. It might seem that it has too much emphasis on locking, but that is understandable, considering most commercial database systems do utilize locking for their concurrency control. However, we need to be aware that the discussion of concurrency control goes beyond the paper, given the numerous non-traditional, non-transactional databases that exist today.
Chapter 17 of Database Management Systems discusses how database management systems support concurrency control. It mainly focuses on locking, which is a pessimistic approach to concurrency. The chapter discusses basic locking and lock management within the DBMS. It also introduces some more advanced lock types and techniques, such as intention and index locking, as solutions to the hierarchical nature of databases and the phantom problem.
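One way to see why these two notions matter is that conflict serializability can be tested mechanically with a precedence graph. The Python sketch below assumes a schedule is a list of `(transaction, action, object)` tuples with action `"R"` or `"W"`, and that every transaction in it commits. The function names are hypothetical.

```python
def precedence_edges(schedule):
    """Edge (Ti, Tj) means an action of Ti precedes and conflicts with an
    action of Tj: same object, different transactions, at least one write."""
    edges = set()
    for i, (ti, ai, oi) in enumerate(schedule):
        for tj, aj, oj in schedule[i + 1:]:
            if ti != tj and oi == oj and "W" in (ai, aj):
                edges.add((ti, tj))
    return edges


def equivalent_serial_order(schedule):
    """Return a serial order of transactions that is conflict equivalent to
    the schedule, or None if the precedence graph has a cycle."""
    txns = sorted({t for t, _, _ in schedule})
    edges = precedence_edges(schedule)
    indegree = {t: 0 for t in txns}
    for _, dst in edges:
        indegree[dst] += 1
    ready = [t for t in txns if indegree[t] == 0]
    order = []
    while ready:                      # Kahn's topological sort
        t = ready.pop(0)
        order.append(t)
        for src, dst in sorted(edges):
            if src == t:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    return order if len(order) == len(txns) else None
```

If the function returns an order, running the transactions serially in that order orders every conflicting pair the same way as the original schedule. A return value of `None` means no such serial schedule exists.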
After covering locks, the chapter finishes with some more optimistic concurrency techniques, namely optimistic concurrency control and timestamp-based concurrency control. Both techniques label database transactions with timestamps in order to establish a logical ordering of transactions for serialization.
The chapter, although giving a basic overview of concurrency control, could have been written a lot better. I was confused by the examples presented, especially regarding timestamp-based concurrency control and the Thomas Write Rule. I felt the example referenced in the book did not actually apply to the Thomas Write Rule.
I also disliked the fact that the chapter did not give me practical examples of systems that use the concurrency methods discussed in the book. For example, the chapter discusses timestamp-based concurrency control and then simply states that it is used in distributed databases and refers to chapter 22. One question I am left with is why it is more applicable to distributed databases than centralized ones. Furthermore, chapter 22 doesn't seem to talk about timestamps. Instead, it refers to locking methods to enforce concurrency. Finally, I think I would have liked a brief summary of locking and the optimistic concurrency techniques and their pros and cons at the end of the chapter. I think it would have wrapped the chapter up nicely and also reinforced the ideas in the chapter.
This chapter follows the previous chapter and focuses on the concurrency control for transactions. It is already mentioned in the previous chapter that the interleaving of actions in multiple transactions is inevitable and that it is critical for the DBMS to provide safe scheduling to preserve the ACID properties. This is why concurrency control is so important.
The main ideas presented in this chapter are:
1. How locking protocols guarantee serializability and recoverability?
- Precedence graph (Serializability graph)
- Both non-strict 2PL and strict 2PL guarantee that there is no cycle in the precedence graph; therefore the schedules must be conflict serializable (thus "serializable").
- Strict 2PL improves on 2PL by guaranteeing that every allowed schedule is "recoverable".
3. How to perform deadlock detection, resolution, and prevention?
- Detection: The lock manager maintains a "waits-for graph"; it is periodically checked for cycles, which indicate deadlock.
- Resolution: Abort transactions that result in a deadlock.
- Prevention: Wait-die and wound-wait
4. Other non-lock-based concurrency control methods.
- Optimistic Concurrency Control
- Timestamp-Based Concurrency Control
This chapter introduces many different approaches to provide concurrency control and gives several transaction/action orderings as examples, which is very clear and helpful for the readers to understand.
It would be better if the authors could also provide some information about which of these concurrency control techniques are actually employed in which DBMS.
This chapter in Database Management Systems discusses concurrency control. The motivation for including this topic in the text is because concurrency is vital for performance and reliability. The text approaches the topic by discussing lock management, lock conversion, deadlocks, specialized locking techniques, and concurrency control without locking.
2PL, under which a transaction cannot request additional locks once it releases any lock, is less strict than Strict 2PL; in both cases the precedence graph of any schedule that the protocol allows is acyclic. The difference is important because Strict 2PL doesn't allow other transactions to modify an object until the transaction currently working with the object is complete; this may block other transactions for longer than necessary. Locks are tracked by the lock manager, which keeps a lock table (a hash table keyed by the data object identifier) and a transaction table (which maintains a descriptive entry for each transaction). Latches are short-duration locks that ensure the physical read or write of a page is atomic. Lock conversions occur when a transaction needs to upgrade to write privileges, or when the transaction no longer needs exclusive write abilities and can downgrade to a read lock. Deadlock occurs when two transactions each wait for a resource that the other holds, and can be avoided with wait-die (a transaction is only allowed to wait if higher in priority, and dies if lower in priority) or wound-wait (a transaction with higher priority can abort another transaction, and waits if it is lower priority). Other techniques for locking include concurrency control in B+ trees, in which the higher levels of the tree only direct searches and a node must be locked on inserts only if a split has the possibility of propagating up to it from the modified leaf, and multiple-granularity locking, in which we set locks on objects that contain other objects. Concurrency can be managed without locking if we are optimistic about conflicts in transactions and pessimistic about allowing transactions to commit. To do this, a transaction reads the data it needs, validates whether there could be any possible conflicts with the concurrent transactions, and writes its changes to the database if there is no possibility of conflicts.
The time saved from removing blocking in the lock approach is replaced by the work wasted from restarting transactions.
One limitation of this chapter is that I would've liked to see more examples. For instance, when discussing 2PL vs Strict 2PL, it would've been beneficial to have a visual example of how a 2PL system would behave versus a Strict 2PL implementation. It would have also been helpful to see a visual example of how latches work.
This is the 17th chapter of Ramakrishnan and Gehrke's database textbook. This chapter explores concurrency control in more detail. It talks about locking protocols and issues with locks and deadlocking. Finally, it concludes with a discussion of locking alternatives.
They discuss many important points about locking. A schedule is conflict serializable iff its precedence graph is acyclic, and this will be the case if the DBMS follows strict two-phase locking. If you relax the strict rule and allow a transaction to release locks as it finishes with them, but not to acquire more after its first release, the schedule will still have an acyclic graph.
The DBMS keeps a table of locks issued to transactions. This lock manager has rules for issuing locks. It implements a queue for transactions for locks so that transactions don't starve. The lock manager has to implement atomic actions! The OS can cause the same problems that the DBMS causes. Lock conversions can occur and can cause deadlocks. They should be avoided by initially requesting exclusive locks and eventually downgrading to shared locks.
Deadlocks can be detected with a waits-for graph or a timeout mechanism. It is better to just prevent them though. They can be prevented often by implementing priority for older transactions. The paper then goes on to discuss B+ trees, phantoms, optimistic concurrency control and timestamp-based control.
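To make the waits-for graph concrete, here is a minimal sketch of cycle detection by depth-first search. The dictionary representation (each transaction id mapped to the set of transactions it is waiting for) and the name `find_deadlock` are assumptions made for illustration, not the book's implementation.

```python
from typing import Dict, List, Optional, Set


def find_deadlock(waits_for: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle of transaction ids in the waits-for graph, or None."""
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on the current path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(txn: str) -> Optional[List[str]]:
        color[txn] = GREY
        path.append(txn)
        for holder in sorted(waits_for.get(txn, ())):
            state = color.get(holder, WHITE)
            if state == GREY:
                # Back edge to a transaction on the current path: a cycle.
                return path[path.index(holder):]
            if state == WHITE:
                cycle = visit(holder)
                if cycle:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in sorted(waits_for):
        if color.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None


# T1 waits for T2, T2 waits for T3, T3 waits for T1; T4 merely waits for T1.
graph = {"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}, "T4": {"T1"}}
print(find_deadlock(graph))  # ['T1', 'T2', 'T3']
```

A DBMS would then pick one transaction from the returned cycle to abort; how the victim is chosen is a separate policy decision that this sketch leaves out.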
I thought the discussion of dynamic databases and the phantom problem was excellent and described problems I didn't understand before. The explanation of how locking individual pages fails to lock the whole set of rows you care about, and can therefore create phantoms, was clearly written, as were the two potential solutions. The section on concurrency control in B+ trees is really cool, and it makes clear the performance improvement over naive 2PL. This chapter effectively covers a broad range of important issues in some detail.
A few parts of the chapter were confusing. It was sometimes good that it covered a wide range of topics, but at points the coverage was too brief to follow. In particular, I did not like the discussions of view serializability, update locks, or conservative 2PL. The authors go on to say that conservative 2PL is not practical, so I don't think it needs to be in the chapter. The chapter also at one point tells the reader to go back to 16.3.3 and work out a problem to which the authors do not provide the answer. An example would have been more helpful on page 552.
Part 1: Overview|
This chapter discusses methods of concurrency control. Concurrency control is crucial in operating systems, database systems, and any other system that allows concurrent programs (transactions). The Strict 2PL locking scheme ensures serializability as well as recoverability, because it prevents conflicting actions of different transactions from being interleaved. Specifically, in 2PL a transaction cannot request any further lock after releasing a lock, so the order of conflicting actions is captured by an acyclic precedence graph. View serializability is also mentioned as a more general condition than conflict serializability.
Lock and unlock requests are implemented in much the same way as in operating systems. There are shared locks and exclusive locks, which are granted immediately on request when no conflicting lock is held. As in operating systems, reader-writer locks suit scenarios with two kinds of programs, one of which (the readers) can execute concurrently. Upgrades are also provided: a transaction can request an upgrade of a reader lock it currently holds to a writer lock. In practice the database system detects deadlock cycles; however, when contention becomes high, deadlock prevention is needed. Priorities are assigned to transactions, and the wait-die or wound-wait policy is applied. Specialized locking schemes are presented for the phantom problem and for concurrency control in B+ trees. For nested objects, a multiple-granularity locking scheme is introduced. In Microsoft SQL Server, for example, users can lock pages instead of tables.
When contention between transactions is light, locks may not be a good solution, because the overhead of a locking scheme may outweigh its benefit. The basic idea of optimistic concurrency control is that the database system lets transactions run without requesting locks and restarts them if conflicts occur. This mechanism can be improved by avoiding unnecessary restarts. Timestamp-based concurrency control and multiversion concurrency control are also introduced; both use timestamp ordering for transaction execution as well as for validation checks.
Part 2: Contributions
A variety of locking schemes are discussed in the first part of this chapter, including reader-writer locks, which gives a complete overview of the methods suited to different situations. The discussion implies an important idea: a good mechanism must be matched to the environment in which it works best. On the one hand, for databases with heavy contention, locks and deadlock prevention are crucial for concurrency control. On the other hand, when contention is low, we can reduce overhead by getting rid of locks.
Several lock-free concurrency control mechanisms are discussed, which complements the locking material by covering scenarios where contention is low. Notice that the section on timestamp-based concurrency control includes the Thomas Write Rule along with a justification, which makes an otherwise plain introduction much more convincing.
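As a rough illustration of the timestamp checks and of the Thomas Write Rule, the sketch below keeps a read timestamp and a write timestamp on each object and decides what a read or write request should do. The names `DataObject`, `ts_read`, and `ts_write`, and the string results, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class DataObject:
    value: object = None
    rts: int = 0  # largest timestamp of any transaction that has read the object
    wts: int = 0  # timestamp of the transaction that wrote the current value


def ts_read(obj: DataObject, ts: int) -> str:
    """Read by a transaction with timestamp ts."""
    if ts < obj.wts:
        return "abort"            # a younger transaction already overwrote it
    obj.rts = max(obj.rts, ts)
    return "read"


def ts_write(obj: DataObject, ts: int, value: object) -> str:
    """Write by a transaction with timestamp ts, using the Thomas Write Rule."""
    if ts < obj.rts:
        return "abort"            # a younger transaction already read the old value
    if ts < obj.wts:
        return "skip"             # Thomas Write Rule: the obsolete write is ignored
    obj.value = value
    obj.wts = ts
    return "write"


account = DataObject()
print(ts_write(account, 5, "a"))  # write
print(ts_write(account, 3, "b"))  # skip
```

Without the Thomas Write Rule, the second call would abort the older transaction instead of silently dropping its write.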
Part 3: Possible drawbacks
The scan of the book is really bad! Typos are almost everywhere!
Although the atomicity of locking and unlocking is mentioned, the book does not introduce the hardware implementations such as atomic test-and-set bits, which can guard the atomicity of locks. This should be mentioned as some background information for some curious readers.
Conversions between reader locks and writer locks are not discussed clearly, and no reference to the literature is provided.
The paper, Chapter 17 of the “Database Management Systems” book by Ramakrishnan and Gehrke, provides a detailed description of various kinds of concurrency control techniques. Concurrency control can be achieved with or without a locking scheme. A locking protocol is used to guarantee serializability. One way to show that a particular schedule is serializable is to check whether it is conflict serializable. Conflict serializability can be checked by drawing a serializability graph, which captures all potential conflicts between transactions. Two actions conflict if they operate on the same data object and at least one of them is a write. If there is no cycle in the graph, the schedule is conflict serializable. For example, the Strict 2PL protocol allows only conflict serializable schedules. In addition, the paper discusses lock conversion. For example, obtaining exclusive locks even for read access, and downgrading to a shared lock once it is clear that this is sufficient, improves throughput by reducing deadlocks. To handle deadlock, the paper suggests constructing a waits-for graph, which records which transactions are waiting for locks held by which others. The DBMS periodically checks for a cycle in the waits-for graph to detect a deadlock. A deadlock is resolved by aborting one of the transactions in the cycle and releasing its locks, which allows the other transactions to proceed. |
If the database table is not fixed but can grow and shrink while multiple transactions execute, the phantom problem can arise. It occurs because, while transaction 1 accesses a set of rows in a table, transaction 2 might add a new row to the same table without needing any lock held by transaction 1. When transaction 1 reads the rows again, it sees the newly added row, which causes inconsistency. One solution to this problem is index locking, which involves locking the index pages and so preventing any modification by another transaction until the corresponding lock is released. The paper also discusses concurrency control in B+ tree index structures. An important observation is that a transaction doesn’t need to lock all the nodes on its path through the B+ tree. For example, during a search a lock on a node can be released as soon as a lock on a child node is obtained, because the leaf nodes, not their ancestors, hold the real data. Another specialized locking strategy, called multiple-granularity locking, makes it possible to set locks efficiently on objects that contain other objects. For example, if a transaction requires access to only a few records of a page, it should lock just those records rather than the entire page, whereas if it requires many records, locking the entire page could be better.
For a system with light contention for database objects, following a locking protocol might hurt performance considerably. In this case, optimistic concurrency control can lead to better performance. In this approach, transactions execute without acquiring locks and write their intermediate results into private workspaces. Once a transaction finishes executing, it is validated; if it causes no conflicts, it is committed by writing the data objects from its private workspace to the database. Otherwise it is aborted. The paper describes other lockless concurrency protocols, such as timestamp-based concurrency control, which assigns a timestamp to each transaction and uses it to detect violations of the timestamp ordering during execution.
The main strength of the paper is its extensive coverage of different approaches to concurrency control. It discusses both lock-based and lockless concurrency control mechanisms. In addition, it identifies general lock-based techniques such as Strict 2PL and specialized locking techniques, including those applicable to B+ tree index structures. Furthermore, it discusses deadlock detection and recovery mechanisms.
The main drawback of the paper is that the practical implications and implementation of the different techniques are not thoroughly covered. I think it would have been better if most techniques had been discussed in terms of a particular SQL implementation, to give more practical insight. In addition, a more detailed explanation of which techniques are implemented in which DBMS would have been useful. I don’t think the brief, single-paragraph explanation at the end of the paper about “what real DBMS uses” is enough.
In this chapter, an overview of transaction management is provided. The concept of a transaction is critical to concurrent execution and system recovery in a database management system. While transactions execute concurrently, the state of the database is required to remain valid at all times. Being valid means the state is reachable through a serial sequence of actions. The properties of database transactions are summarized as ACID: atomicity, consistency, isolation, and durability. |
Surprisingly, the chapter says that users are responsible for ensuring transaction consistency. The DBMS is not able to detect inconsistencies caused by logic errors in a user’s program. For example, interactions between transactions are assumed not to happen, but the database cannot guarantee that; users must. Database consistency follows from transaction atomicity, isolation, and transaction consistency.
The paper then covers schedules (lists of actions) and locks, topics raised by concurrent execution. Locks are used to protect shared resources. Three kinds of anomalies must be handled with different kinds of locks: write-read, read-write, and write-write conflicts. Strict two-phase locking is then introduced. It is safe, but at the cost of performance. Also, by using locks, we have to face the problem of deadlocks. We should decide what to lock and which kind of lock (different isolation levels) to choose depending on the transactions we are performing.
As you might have noticed, it is a fairly old textbook. Many of the problems addressed here also arise in operating systems and parallel computing. The chapter is easy to understand and has a good flow, with plenty of examples and explanations.
This chapter discusses the problems that arise when transactions execute concurrently. It provides details on some of the concurrency topics from Chapter 16.|
This chapter starts with how locking protocols guarantee properties of schedules. Then lock management in the DBMS is discussed. The lock manager maintains a lock table and responds to lock and unlock requests. The atomicity of locking and unlocking is guaranteed with the help of operating system synchronization mechanisms.
Naturally, deadlock can happen once locks are introduced. In Chapter 16, only lock expiration is mentioned as a way to resolve deadlock; here, methods for preventing and resolving deadlocks are discussed. The DBMS checks periodically for deadlocks and resolves them. Policies such as wait-die and wound-wait help the lock manager prevent deadlock from happening.
If we do not treat the DB as a fixed collection of independent data objects, some consequences follow. The first is the phantom problem. The second is that we can exploit the B+ tree structure to reduce locking overhead. We can also use multiple-granularity locking to further improve performance.
Finally, this chapter covers concurrency control methods without using locks. This includes: 1) Optimistic Concurrency Control, 2) Timestamp-Based Concurrency Control and 3) Multi-version Concurrency Control.
This chapter provides an overview of the techniques used in concurrency control, and it mentions many specialized techniques that are useful for improving concurrent performance.
What I like about this paper is that it covers several concurrency control methods that do not use locks. It even discusses the strengths and weaknesses of such approaches.
It would be better if this chapter gave some measurement of the performance improvement when discussing specialized locking techniques. This would give a sense of how much we gain by implementing these techniques.
Chapter 17 covers more of the concepts and implementation details of concurrency control in a DBMS. It mainly discusses two categories of concurrency management: lock-based methods and analytical methods.|
In the first part, this chapter introduces Strict 2PL and lock management in the DBMS. A precedence graph is used to express the conflict relationships between concurrent transactions. A schedule is conflict serializable if and only if its precedence graph is acyclic. 2PL, under which a transaction cannot request additional locks once it releases any lock, allows only schedules with acyclic graphs, and so does Strict 2PL. One major drawback is that deadlocks can slow down a 2PL schedule.
In the DBMS, the lock manager is the component that issues locks to transactions. It uses lock table entries and lock request queues to support shared and exclusive locks. Often, though, a transaction needs to upgrade a lock from shared to exclusive. Some lock managers avoid upgrades by acquiring exclusive locks first and downgrading, while others provide a new kind of lock, the update lock. The lock manager keeps a waits-for graph to track which transactions are waiting on which others and to detect deadlocks. Alternatively, a lock manager can prevent deadlocks with the wait-die or wound-wait policy. In both policies, higher-priority transactions take precedence over lower-priority ones.
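The lock table and request queue described above can be sketched as follows. This is a deliberately simplified model under stated assumptions: the names `LockEntry` and `LockManager` are hypothetical, only "S" and "X" modes exist, and upgrades, the transaction table, and the latching needed to make these operations atomic are all left out.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Set, Tuple


@dataclass
class LockEntry:
    mode: str = "S"                                   # mode of the granted locks
    holders: Set[str] = field(default_factory=set)    # transactions holding the lock
    queue: Deque[Tuple[str, str]] = field(default_factory=deque)  # (txn, mode)


class LockManager:
    def __init__(self) -> None:
        self.table: Dict[str, LockEntry] = {}         # lock table keyed by object id

    def request(self, txn: str, obj: str, mode: str) -> bool:
        """Return True if the lock is granted now, False if the request is queued."""
        entry = self.table.setdefault(obj, LockEntry())
        if not entry.holders:
            entry.mode, entry.holders = mode, {txn}
            return True
        # A shared request joins the current readers only if nobody is queued,
        # so a waiting writer cannot be starved by a stream of new readers.
        if mode == "S" and entry.mode == "S" and not entry.queue:
            entry.holders.add(txn)
            return True
        entry.queue.append((txn, mode))
        return False

    def release(self, txn: str, obj: str) -> List[str]:
        """Release txn's lock on obj; return the transactions granted as a result."""
        entry = self.table[obj]
        entry.holders.discard(txn)
        granted: List[str] = []
        while entry.queue:
            next_txn, next_mode = entry.queue[0]
            if entry.holders and not (entry.mode == "S" and next_mode == "S"):
                break                                 # head of queue still conflicts
            entry.queue.popleft()
            entry.mode = next_mode
            entry.holders.add(next_txn)
            granted.append(next_txn)
        return granted
```

For example, if T1 and T2 hold shared locks on an object and T3 then asks for an exclusive lock, a later shared request from T4 is queued behind T3 rather than being granted immediately.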
If the DB is not treated as a fixed collection of data objects, the phantom problem can arise. For B+ tree structures, treating each node as a lockable object is preferred. Rather than relying on fine-grained locking alone, if a child node is full when it is locked during an insert, the lock on its parent must also be kept in case a split propagates upward. Another specialized locking strategy, multiple-granularity locking, which efficiently sets locks on objects that contain subobjects, is also covered in this chapter. It works by locking along the path from the root of the containment hierarchy down to the target object.
The last topic is concurrency control without locking. In this chapter, both optimistic concurrency control and timestamp-based concurrency control assume a workload of frequent, short transactions with little lock contention, and both treat locking overhead as the primary source of inefficiency. As in multiversion concurrency control, timestamps are used to prevent RW, WR, and WW conflicts, and abort-and-restart is the main way conflicts are handled.
In this chapter, the book discusses concurrency control in more detail. It covers the locking strategy, how locking is implemented, and strategies for preventing deadlock. It then describes three specialized locking protocols, and the last part covers concurrency control methods that do not use locking.|
In 17.1, the book first introduces the concept of conflict serializability: a schedule is conflict serializable if it involves the same actions as some serial schedule and orders every pair of conflicting actions in the same way. A conflict serializable schedule is always serializable, but a serializable schedule is not always conflict serializable. We can use the precedence graph to tell whether a schedule is conflict serializable.
Then it covers 2PL, whose only difference from Strict 2PL is that 2PL can release locks before the end of the transaction but, once it has released one, cannot obtain new locks. Strict 2PL avoids cascading aborts.
In 17.2, it covers lock management. Instead of keeping a lock queue inside each lock object as an OS does, the lock manager keeps a lock table and a transaction table. With mutex-like read/write locks starvation can occur, but with the locking scheme described in the book it cannot. Latches protect physical reads and writes, and convoys are a drawback of building a DBMS on top of an OS.
In 17.3, it first describes a situation in which read locks may need to be upgraded, which can lead to deadlock. One way to solve this problem is to obtain an exclusive lock initially and downgrade to a shared lock once it is clear that a shared lock is enough. Another is to initially obtain a new kind of lock, an update lock (compatible with shared locks but not with other update or exclusive locks).
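The compatibility rule for update locks stated above can be written down as a small table. The sketch assumes the rule is symmetric (a shared request is also allowed while an update lock is held), which is how the sentence above reads; the names `COMPATIBLE` and `can_grant` are hypothetical.

```python
from typing import Iterable

# Unordered pairs of modes that may be held on the same object at the same time.
# frozenset(["S"]) stands for the pair (S, S); every pair not listed conflicts.
COMPATIBLE = {frozenset(["S"]), frozenset(["S", "U"])}


def compatible(held: str, requested: str) -> bool:
    """True if a lock in mode `requested` can coexist with one in mode `held`."""
    return frozenset([held, requested]) in COMPATIBLE


def can_grant(requested: str, held_modes: Iterable[str]) -> bool:
    """True if `requested` is compatible with every lock currently held."""
    return all(compatible(held, requested) for held in held_modes)


print(can_grant("U", ["S", "S"]))  # True: an update lock can join readers
print(can_grant("U", ["S", "U"]))  # False: only one update lock at a time
```

Because at most one transaction can hold an update lock on an object, two readers can no longer both wait to upgrade on the same object, which is the deadlock the paragraph describes.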
In 17.4, it covers using the waits-for graph to detect deadlocks, and two deadlock prevention strategies: wait-die and wound-wait. Another approach is conservative 2PL, which acquires all the locks a transaction needs up front.
In 17.5, it covers some further locking techniques. To deal with the phantom problem, the book introduces index locking: if there is an index on the predicate column, the transaction locks the index page, so no new record matching the predicate can be inserted in the meantime. It then describes how to descend a B+ tree with better concurrency. Later, it introduces multiple-granularity locking, whose purpose is to lock objects that are as small as possible in order to provide better concurrency.
Finally, it covers some concurrency control strategies without locking. Optimistic concurrency control writes to a private workspace and checks for conflicts with other transactions before copying the changes into the database. The main idea of multiversion concurrency control is to maintain several versions of each database object, distinguished by timestamps.
The first several parts of the chapter present concurrency control issues in problem, solution, limitation order, and use examples and graphs to illustrate the ideas, which makes them easy to understand.
1. There are too many typos in the PDF, which hurts understandability.
2. In the last two parts, the author covers the problems only briefly, so they are a little harder to understand.
In the chapter, the author discusses the issue of serializability and ensuring conflict serializability through two phase locking, deadlock resolution, a number of concurrency control mechanisms used in databases, and a number of other issues that come up when dealing with concurrency in a database system. The material was presented clearly with examples demonstrating serializability violations and deadlocks with simple diagrams. The author closed with a quick mention of which mechanisms are used in practice.|
The author discusses that deadlock prevention can be useful in cases where there is high contention for locks but does not elaborate on when (if at all) this occurs in practice. More generally, I felt that more justification of when certain techniques are more useful would be appreciated.
This paper describes in more detail how databases deal with concurrent transactions. As described in Chapter 16, strict two-phase locking ensures conflict serializability, which can be proven using precedence graphs. If a precedence graph does not have any cycles, the conflicting actions can be ordered consistently with some serial schedule, and therefore the schedule is conflict serializable. The paper then mentions a variation of strict two-phase locking called two-phase locking that also ensures conflict serializable schedules. The only difference between the two methods is that in two-phase locking, the transaction is allowed to release locks in the middle of the transaction.|
For both of these methods, the implementation of lock and unlock requests is more complicated than one would think. We need to make sure that lock requests are not starved and that the locks themselves are atomic. Since acquiring a lock involves a multi-step process, the implementation requires the use of operating system synchronization mechanisms.
Next, the paper covers how databases deal with deadlocks. There are two ways to solve this problem: deadlock resolution and deadlock prevention. Deadlock resolution is generally easier to implement and involves the database aborting a transaction that is part of a deadlock cycle. Deadlock prevention involves the database assigning priorities to transactions and granting locks based on each transaction’s priority.
Finally, the paper covers specialized locking mechanisms and concurrency control without database locking. There are three special locking scenarios that we need to consider: locking sets that share some predicate, locking nodes in B+ trees, and locking sets of objects that contain other objects. The three non-locking concurrency controls are as follows: optimistic concurrency control, which does a read, validate, write for each transaction; timestamp based concurrency control, which does validation in the order of the timestamps; and multiversion concurrency control, which maintains multiple versions of a database object.
Overall, the paper does a great job of describing the specifics of how databases deal with concurrent transactions. It gives examples of each problem and also provides alternative solutions that are not necessarily used in mainstream database management systems. However, there are still a few weaknesses about the paper:
1. For each of the non-locking solutions, the performance should be better than locking solutions under specific circumstances. I would have liked to see real world examples with both systems and when one would outperform the other.
2. The paper presents a problem with locks called convoys, where a transaction holding a heavily used lock is suspended, and the queue of transactions waiting for that lock forms a convoy. However, it does not explain how convoys can be avoided when a DBMS is implemented on a general-purpose operating system. Many DBMSs, including MySQL, are implemented on general-purpose operating systems, so how do they deal with convoys?
This chapter discusses several forms of concurrency control and how they can be used to ensure serializability and recoverability in transactions. There are many ways to ensure that data remains consistent while still allowing for as much concurrency as possible in a DBMS. Schemes such as lock-based, timestamp-based, and optimistic concurrency control all provide different benefits and drawbacks that make them more suited for certain scenarios.|
A locking scheme of some sort is required in order to preserve consistency and atomicity in transaction execution. Strict 2-phase locking guarantees serializability and recoverability by releasing acquired locks only once a transaction commits. A relaxation of this scheme known as 2-phase locking eases this restriction, specifying only that, once a transaction has released a lock, it cannot acquire any other locks. This allows for higher concurrency, but also removes the guarantee of recoverability. All DBMSs must be careful to ensure that deadlocks do not cause transactions to block indefinitely. The two main approaches are deadlock detection and deadlock prevention. In deadlock detection, the DBMS will detect deadlocked transactions (generally by means of a timeout or by checking for cycles in a “waits-for graph”), abort one of the competing transactions and restart it. In deadlock prevention, the DBMS detects when a transaction requests a lock that conflicts with another action. Then, depending on whether the wait-die or wound-wait policy is implemented, the DBMS will either abort one of the transactions or allow it to wait, depending on its priority.
The paper also discusses how locking is handled in structures such as B-Trees and in nested objects such as pages and files. In each of these cases, we want to lock the structure in a way that allows for the most concurrency. For the example of inserting into a B-Tree, we would prefer not to acquire an exclusive lock on every node from the root to the affected leaf, as this would prevent other transactions from reading or writing any part of the tree until the first transaction completed. To accomplish this, many DBMSs implement a system in which shared locks are obtained on each node as the tree is traversed, and exclusive locks are acquired only for nodes which may be split during the insert.
Other DBMSs implement concurrency control using timestamps. Optimistic control assigns a timestamp to each transaction and does not implement any locking. Transactions undergo a read, validation, and write phase, and a transaction T2 is allowed to commit if, for every transaction T1 with an earlier timestamp, one of the following holds:
1) T1 completes all three phases before T2 begins
2) T1 completes before T2 starts its write phase, and T1 does not write any object read by T2
3) T1 completes its read phase before T2 completes its read phase, and T1 does not write any object read or written by T2
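The three validation conditions can be expressed as a single check. The sketch below is a simplification: it assumes every phase boundary is known as an integer time, whereas a real validator runs while T2 is still in progress, and the `Txn` class and its field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Txn:
    start: int        # beginning of the read phase
    read_end: int     # end of the read phase (validation begins)
    write_start: int  # beginning of the write phase
    finish: int       # end of the write phase
    read_set: Set[str] = field(default_factory=set)
    write_set: Set[str] = field(default_factory=set)


def valid_against(t1: Txn, t2: Txn) -> bool:
    """True if T2 passes validation with respect to the earlier transaction T1."""
    # 1) T1 completes all three phases before T2 begins.
    if t1.finish < t2.start:
        return True
    # 2) T1 completes before T2 starts writing, and wrote nothing T2 read.
    if t1.finish < t2.write_start and not (t1.write_set & t2.read_set):
        return True
    # 3) T1 finishes reading first, and wrote nothing T2 read or wrote.
    if t1.read_end < t2.read_end and not (
        t1.write_set & (t2.read_set | t2.write_set)
    ):
        return True
    return False


earlier = Txn(0, 2, 3, 4, read_set={"A"}, write_set={"A"})
overlapping = Txn(1, 5, 6, 7, read_set={"A"}, write_set=set())
print(valid_against(earlier, overlapping))  # False: T2 read an object T1 wrote
```

If the check fails for any earlier transaction, T2 is aborted and restarted; otherwise it proceeds to its write phase.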
My biggest complaint about this chapter is the lack of visual examples. I am a very visual learner, so it was frequently hard to follow the pseudocode-like examples intermingled with the text. The conflict graphs and transaction schedules were very helpful to me, but these were not used for some of the more complicated topics such as Timestamp-based concurrency control, which made it difficult for me to understand at times.
This article describes concurrency control in detail. The topic covers how locking protocols guarantee various important properties of schedules, how locking protocols are implemented, and deadlock. Concurrency control has been an important topic because DBMS handle interleaved actions of transactions for performance. Thus, managing the ACID properties of transactions becomes essential in DBMS, and concurrency control covered in this article provides an approach to deal with it.|
First, the article talks about how locking protocols guarantee various properties of schedules, including serializability and recoverability. It begins by introducing some important definitions. Two actions conflict if they operate on the same data object and at least one of them is a write action. A schedule is conflict serializable if it is conflict equivalent to some serial schedule. To capture all potential conflicts between transactions, we can use a precedence graph. The Strict 2PL protocol allows only conflict serializable schedules. Therefore, the lock-based protocol provides a useful approach to concurrency control.
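The definitions above translate almost directly into code. The following sketch builds a precedence graph from a schedule and tests it for cycles by repeatedly removing nodes that have no incoming edges; the tuple encoding of actions and the function names are assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple

Action = Tuple[str, str, str]  # (transaction, "R" or "W", object)


def precedence_graph(schedule: List[Action]) -> Dict[str, Set[str]]:
    """Add an edge Ti -> Tj when an action of Ti precedes and conflicts with one of Tj."""
    edges: Dict[str, Set[str]] = {txn: set() for txn, _, _ in schedule}
    for i, (t1, a1, o1) in enumerate(schedule):
        for t2, a2, o2 in schedule[i + 1:]:
            # Conflict: different transactions, same object, at least one write.
            if t1 != t2 and o1 == o2 and "W" in (a1, a2):
                edges[t1].add(t2)
    return edges


def is_conflict_serializable(schedule: List[Action]) -> bool:
    """A schedule is conflict serializable iff its precedence graph is acyclic."""
    edges = precedence_graph(schedule)
    indegree = {txn: 0 for txn in edges}
    for targets in edges.values():
        for txn in targets:
            indegree[txn] += 1
    ready = [txn for txn, degree in indegree.items() if degree == 0]
    removed = 0
    while ready:
        txn = ready.pop()
        removed += 1
        for nxt in edges[txn]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return removed == len(edges)   # leftover nodes can only lie on a cycle


interleaved = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
serial = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]
print(is_conflict_serializable(interleaved))  # False
print(is_conflict_serializable(serial))       # True
```

In the interleaved schedule, T1's read precedes T2's write and T2's write precedes T1's write, so the graph contains both T1 -> T2 and T2 -> T1.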
Second, the article talks more about how locks are implemented and about deadlock issues. There are two kinds of locks: shared locks and exclusive locks. Shared locks are used for reading data, while exclusive locks are used for writing data. In a DBMS, a lock manager manages the locks. If a requested lock cannot be granted immediately, the lock request is added to the queue of lock requests for the object. Deadlock may occur when two transactions each hold a lock and wait for the other to release its lock. Wait-die and wound-wait are two policies that can prevent deadlock. Thus, the second part of the article describes locks in more detail, since locks are one of the most important elements of concurrency control.
To sum up, this article covers how locking protocols guarantee properties of schedules, how locking protocols are implemented, and deadlock issues. Concurrency control is usually a difficult topic for readers to understand. This article provides many examples to illustrate these ideas, which helps the reader understand concurrency control.
This “paper” is an overview on concurrency control, what the issues are, and how they are solved. It focuses primarily on locking techniques and implementations of locks, and it is from a textbook so it is just used for teaching, not introducing any new ideas. |
The paper starts by recapping Strict 2PL from the previous paper and discusses ways to visualize conflicts better. It introduces precedence graphs, which are graphs where each node is a transaction, and there is an arrow from node A to node B if node B is dependent on something from node A happening first. You can use these graphs to determine if a schedule is conflict serializable, which will be the case if the graph is acyclic.
It then discusses how locks are implemented and describes how there is a lock manager that has a table of who has what locks, and in order to change this you must be given the lock to the lock manager. This is something that was likely discussed in undergrad operating systems courses in depth and it is no different for DBMS’s. Things that must be considered (that are also considered in OS implementations) include upgrading locks and proper ways to deal with deadlock. This paper gives a brief overview of those and explains how they are handled.
The two most popular deadlock management techniques are wait-die and wound-wait. To summarize both, suppose transaction A requests a lock and transaction B holds a conflicting lock: wait-die allows transaction A to wait if it has higher priority, and otherwise aborts it; wound-wait aborts transaction B if transaction A has higher priority, and otherwise makes transaction A wait.
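The two policies differ only in what happens in each priority case, which a pair of small functions makes explicit. The sketch assumes priority is given by start timestamp, with a lower timestamp meaning an older and therefore higher-priority transaction; the function names and result strings are hypothetical.

```python
def wait_die(requester_ts: int, holder_ts: int) -> str:
    """Wait-die: an older requester waits; a younger requester is aborted."""
    return "wait" if requester_ts < holder_ts else "abort requester"


def wound_wait(requester_ts: int, holder_ts: int) -> str:
    """Wound-wait: an older requester aborts the holder; a younger one waits."""
    return "abort holder" if requester_ts < holder_ts else "wait"


# Transaction A (timestamp 1, older) requests a lock held by B (timestamp 2).
print(wait_die(1, 2), "|", wound_wait(1, 2))  # wait | abort holder
# Now the younger B requests a lock held by the older A.
print(wait_die(2, 1), "|", wound_wait(2, 1))  # abort requester | wait
```

In both policies a transaction only ever waits for one on a fixed side of the priority order (lower priority under wait-die, higher priority under wound-wait), so no cycle of waiting transactions can form.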
It then gives an overview of the phantom problem, which occurs when a transaction updates a page that isn’t locked and could potentially make another query return the wrong values, and how locking is implemented in B+ trees. It then gives an overview of multiple granularity locking which introduces two new locks (intention shared and intention exclusive) to allow for more parallelism while still providing maximum protection.
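A small sketch may help with the intention locks mentioned above. It lists which lock modes are compatible and computes the locks a transaction would request, from the root of the containment hierarchy down, before locking a target object. The hierarchy in `PARENT` and the name `lock_plan` are hypothetical.

```python
from typing import Dict, List, Tuple

# For each held mode, the set of modes another transaction may also acquire.
COMPAT = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}

# Hypothetical containment hierarchy: each object maps to its parent.
PARENT = {"file": "database", "page1": "file", "record7": "page1"}


def lock_plan(target: str, mode: str, parent: Dict[str, str]) -> List[Tuple[str, str]]:
    """Locks to request, root first: intention locks on ancestors, then the target."""
    intention = "IS" if mode == "S" else "IX"
    ancestors: List[str] = []
    node = target
    while node in parent:
        node = parent[node]
        ancestors.append(node)
    return [(a, intention) for a in reversed(ancestors)] + [(target, mode)]


print(lock_plan("record7", "X", PARENT))
# [('database', 'IX'), ('file', 'IX'), ('page1', 'IX'), ('record7', 'X')]
```

Since IX is not compatible with S, a transaction that wants to read the whole file in S mode will conflict at the file node, without the lock manager having to inspect every record below it.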
Lastly, this paper talks about techniques to have concurrency without locking. Locking mechanisms have relatively costly overhead, and if there aren’t many conflicts it might be a good idea to remove that overhead by not having locks. One such technique is optimistic concurrency control, which has three phases: read, validation, and write. In the read phase, each transaction reads values from the database and does whatever it wants with them. In the validation phase, a transaction that wants to commit checks whether any other transaction could have made changes to its data. If another transaction could have made changes, the transaction in the validation phase fails, is aborted, and retries. If the validation phase passes, the write phase commences and does exactly what you would think. If there isn’t much contention for resources and validation mostly passes, this is a good setup, but if there is contention, so that validation fails and entire transactions must often be redone, it is not. Other techniques are timestamp-based concurrency control and multiversion concurrency control.
In general this was a good paper and it did a good job providing introduction and overview to concurrency control. I think it went decently in depth which was helpful for understanding how concurrency is actually implemented.
Similar to the last paper I can’t really think of a downside to this paper as it is just meant to be an overview and introduction to concurrency… I guess it doesn’t touch on current issues of concurrency, but it’s just an overview and introduction so I don’t blame it. In general I’d say this was a strong paper without a major downside.
This “paper” follows up the previous chapter on transaction management by giving a more in-depth description of the specific problem of concurrency control. In order to manage multiple operations on the same data, there are a variety of fundamental concepts that must be addressed with any DBMS, and, additionally, there are a multitude of options for handling concurrency control with respect to application-specific requirements.|
The many different kinds of concurrency control protocols, divided into lock-based and timestamp protocols, were fascinating to read about, as they each address different concurrency problems (lock-based protocols manage order between pairs of transactions, and timestamp protocols generally allow for immediate transaction execution). The manner in which topics were discussed reminds me of many OS-based problems, such as how race conditions are handled or how certain commands must be held or operated separately in order to avoid conflicts with other commands operating on the same data.
I was still unclear about the mechanism for multiversion concurrency control, and would have liked to see examples fleshing out the details a bit more thoroughly. On the whole, however, most of the paper was clear and easily presented, and builds off of many of the concepts presented in chapter 16.
This chapter covered the basics of concurrency control in a DBMS. This entailed detailed discussion of locks (strengths, weaknesses and applications) and various different types of concurrency control.|
Much of this chapter was dedicated to various locking implementation details and issues. There are two types of locks discussed, shared and exclusive locks, which can be thought of as read locks and write locks respectively. An effective locking scheme discussed is two-phase locking, which is defined by a growing phase and a shrinking phase. The basic rule is that once a transaction releases a lock, it cannot acquire any additional locks. Next, the issue of deadlocks is explained at length. A deadlock occurs when two or more transactions are each waiting for another to release a lock. There are different ways to avoid deadlocks: one simple scheme is to acquire locks in a predefined order, and another is to acquire all locks at the beginning, before starting the transaction. It is also possible to detect deadlocks by using a waits-for graph, which is commonly done in actual database systems. The issue of lock granularity is also introduced in this chapter. Fine-granularity locking (for example, at the level of tuples) offers high concurrency, but suffers from high overhead for lock management. Alternatively, coarse-grained locking has much less overhead, but suffers from false conflicts.
The other large theme in this chapter was concurrency control without locking. The first technique discussed is validation-based concurrency control. It has three phases: Read, Validate and Write. In the Read phase, a transaction reads from the database and writes to a non-shared area. In the Validate phase, the system performs a validation check at commit time. Finally, in the Write phase, if no conflicts exist, the transaction's changes are written into the database. Another technique discussed is timestamp-based. Every object in the database has a read and a write timestamp associated with it. Each transaction also has a timestamp associated with it. If a transaction wants to read a given object, it can only do so if its timestamp is at least as large as the write timestamp of the object. If a transaction wants to write to a given object, it can only do so if its timestamp is at least as large as both the read and write timestamps of the object.
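As a rough illustration of deadlock detection, the sketch below looks for a cycle in a waits-for graph with a depth-first search. The representation (a dict mapping each transaction to the set of transactions it is waiting on) and the function name `find_deadlock` are my own assumptions, not anything prescribed by the chapter.

```python
def find_deadlock(waits_for):
    """Return a list of transactions that form a cycle, or None.

    waits_for maps a transaction to the set of transactions whose locks
    it is waiting for (hypothetical representation).
    """
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(t):
        color[t] = GRAY
        path.append(t)
        for u in waits_for.get(t, ()):
            c = color.get(u, WHITE)
            if c == GRAY:
                # u is already on the current path, so we found a cycle.
                return path[path.index(u):]
            if c == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        color[t] = BLACK
        path.pop()
        return None

    for t in list(waits_for):
        if color.get(t, WHITE) == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None
```

A DBMS that found such a cycle would then pick one of the returned transactions as a victim and abort it so that its locks are released.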
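The timestamp rules are easier for me to follow as code, so here is a small sketch of the read and write checks. The names `Obj`, `ts_read`, `ts_write`, and `Abort` are hypothetical, and the optional `thomas_write_rule` flag (which silently skips an obsolete write instead of aborting) is included only as an illustration of that refinement.

```python
from dataclasses import dataclass


class Abort(Exception):
    """The transaction must abort and restart with a new, larger timestamp."""


@dataclass
class Obj:
    value: object = None
    rts: int = 0   # largest timestamp of any transaction that has read the object
    wts: int = 0   # timestamp of the transaction that wrote the current value


def ts_read(obj, ts):
    # A read is too late if a younger transaction has already written the object.
    if ts < obj.wts:
        raise Abort("read conflicts with a later write")
    obj.rts = max(obj.rts, ts)
    return obj.value


def ts_write(obj, ts, value, thomas_write_rule=False):
    # A write is too late if a younger transaction has already read the object.
    if ts < obj.rts:
        raise Abort("write conflicts with a later read")
    if ts < obj.wts:
        if thomas_write_rule:
            return False   # obsolete write: ignore it rather than abort
        raise Abort("write conflicts with a later write")
    obj.value, obj.wts = value, ts
    return True
```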
This chapter elaborates on concurrency control, which was mentioned in the previous chapter. First, it looks at locking protocols and how they guarantee various important properties of schedules; then it explains how locking protocols are implemented in a DBMS. Next, it covers the issue of lock conversions and deadlock handling. Then, it discusses three specialized locking techniques. Lastly, some alternatives to locking for concurrency control are discussed.|
After beginning with how locking protocols guarantee serializability and recoverability, the chapter introduces the precedence graph, which portrays the conflicts between transactions in a schedule, and the concept of 2PL. 2PL is a variant of the Strict 2PL protocol that allows transactions to release locks before commit/abort; it still ensures an acyclic precedence graph and thus allows only conflict serializable schedules. Next, it talks about deadlock: how to recognize it (with a waits-for graph) and how to prevent it (using the Wait-die or Wound-wait policy for conflicting transactions).
Next, the chapter discusses specialized locking techniques: (1) general predicate locking to deal with the “phantom problem” commonly found in non-fixed data collections/dynamic databases (but expensive to implement); (2) concurrency control in B+ trees using naive 2PL and/or lock conversions (to improve performance); and (3) multiple-granularity locking, which allows users to efficiently set locks on objects that contain other objects, and which requires that locks be released in leaf-to-root order.
The last topic is the implementation of concurrency control without locking. Here, the chapter covers three alternatives. The first is optimistic concurrency control, where the idea is to be as permissive as possible in allowing transactions to execute; the downside is that a transaction cannot know when an object was last written by another transaction. The second alternative, timestamp-based concurrency control, seems to address this by keeping a read timestamp (RTS) and a write timestamp (WTS) on each object and validating each action by comparing the RTS/WTS with the transaction’s timestamp. The last is multiversion concurrency control, in which several versions of each database object are maintained and each transaction reads the most recent version that is appropriate for its timestamp. However, there is a cost to maintaining the versions.
This chapter is important because concurrency control is one of the main concerns of transaction management (such that it deserves its own chapter!). Knowing the aspects of concurrency control is useful because those aspects are interrelated. When managing one aspect (e.g., deadlock), a DBA must be careful about how the decision affects the overall serializability of the transactions. Another important thing to know is lock granularity.
All of the concurrency control mechanisms discussed operate at the DBMS level. The chapter does not discuss how they interact with concurrency control at the operating system level. I think this should be covered since, in the end, concurrency (especially in multiuser applications) often involves the OS before it reaches the DBMS. Another thing is multiversion concurrency control, which somewhat reminds me of the Google File System. I think it is too bad that the chapter does not delve deeper into it.
The purpose of this chapter is to introduce the reader to basic concurrency control in a DBMS. Concurrency control was first touched on in the chapter 16 selection that we also read for class today, but this chapter offers a much more in depth look. |
The chapter walks through lock management and how it is executed in typical database systems. There are, as in the previous chapter, not really any technical contributions as this is a chapter in a textbook and not a publication of novel research. Nonetheless, I think this chapter is stronger than the previous one. I think that it marches through relevant concepts more clearly without being redundant. It introduces the reader to variations on the typical 2 Phase Locking protocol that is often used and presents pros and cons of each extension. The chapter also presents two special-case modifications to concurrency control and ends the chapter by discussing ways in which concurrency control can exist in a system without the use of locks. In each section, downsides to certain assumptions and requirements are discussed with references pointing the reader to the bibliography section in order to read more about state of the art systems.
I think this chapter does a good job of covering not only the basic concepts of concurrency control, but also covering some interesting cases that are applicable in specific real-world instances of databases. I think that the complete presentation of concurrency control, including those methods which do not need to employ locking (and again are really special cases of general databases) presents the reader with an extensive and complete idea of what concurrency control looks like in a variety of kinds of DBMS’s.
As far as weaknesses go, I must again state that it is hard to specify weaknesses in a book chapter rather than a novel research paper with specific procedures and results. I don’t think there’s much to complain about in this chapter, however whatever tool was used to translate this text into a PDF really could use some work. Some sections were more or less unreadable. I know this isn’t a weakness with the content, but as stated above, I found that the content for this structure was complete with reasonable examples and pointers to related and more extensive recent research-based work.
Paper Title: Chapter 17 Concurrency Control|
Reviewer: Ye Liu
This chapter focuses on how concurrency control is implemented in a DBMS. Specifically, it discusses the implementation and protocols of locking as a concurrency control method. It also covers how various issues that arise in such implementations are handled.
The chapter starts with explaining in what situation the order of actions may be permuted without changing the outcome and in what situation it will, which is the definition of conflict serializable. It also provides conditions for serializability being true.
The majority of the chapter describes how lock and unlock requests are implemented. The implementation details and the working mechanisms of the locks are quite similar to the ones introduced in Operating Systems.
One issue in using lock mechanisms is the occurrence of deadlock, a situation in which no action can take place because every waiting action is waiting to grab a lock that will never become available.
As details of implementations, the chapter covers locking techniques, B trees and multiple-granularity locking. It also mentions advanced techniques such as conflict solving, deadlock prevention, etc.
An interesting discovery from reading these two chapters, 16 and 17, is that this e-textbook was probably converted from a hard copy by some OCR system, because in many places the character “m” is misrendered as “rn”. Given the recent advances in OCR systems, redoing the electronicalization (yes, this is quite possibly a made-up word) of the textbook might improve the quality.
In this chapter the book goes further into the advanced topics of concurrency control, including a detailed treatment of Strict 2PL, deadlock detection and prevention, locking on index trees, as well as some other concurrency control techniques. The summary of this chapter is as follows:|
1. The book first gives an in-depth treatment of the concepts that were covered in chapter 16, and also introduces the precedence graph for reasoning about serializability and recoverability.
1) Precedence graph: A precedence graph represents each committed transaction as a node, and each directed edge represents a dependency between two transactions.
2) View serializable: A schedule is view serializable if it is view equivalent to some serial schedule.
3) Conflict serializable :
a) A schedule is conflict serializable if there is no cycle in the precedence graph of the schedule
b) Every conflict serializable schedule is also view serializable; a schedule that is view serializable but not conflict serializable contains a blind write.
4) Strict schedule: A schedule is strict if a value written by a transaction is not read or overwritten by other transactions until that transaction commits or aborts.
a) 2PL is a less restrictive version of Strict 2PL. Strict 2PL requires a transaction to hold all its locks until it commits or aborts, whereas 2PL only requires that a transaction not acquire any lock once it has begun to release locks.
b) 2PL guarantees conflict serializability, whereas Strict 2PL also guarantees strict schedules.
c) Strict schedules are recoverable. A conflict serializable or view serializable schedule is not necessarily recoverable unless it is also strict.
d) A strict schedule avoids cascading aborts, which means that aborting one transaction never forces other transactions to abort.
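The precedence-graph test for conflict serializability summarized in this first part can be sketched as follows. The schedule encoding (a list of `(transaction, action, object)` tuples with action `"R"` or `"W"`) and the function names are my own assumptions, and the sketch assumes every transaction in the schedule has committed.

```python
def precedence_graph(schedule):
    """Return the set of edges (Ti, Tj) where an action of Ti conflicts
    with a later action of Tj on the same object."""
    edges = set()
    for i, (ti, ai, oi) in enumerate(schedule):
        for tj, aj, oj in schedule[i + 1:]:
            # Two actions conflict if they come from different transactions,
            # touch the same object, and at least one of them is a write.
            if ti != tj and oi == oj and "W" in (ai, aj):
                edges.add((ti, tj))
    return edges


def is_conflict_serializable(schedule):
    """True if the precedence graph is acyclic (checked by topological sort)."""
    edges = precedence_graph(schedule)
    nodes = {t for t, _, _ in schedule}
    indegree = {n: 0 for n in nodes}
    for _, v in edges:
        indegree[v] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for u, v in edges:
            if u == n:
                indegree[v] -= 1
                if indegree[v] == 0:
                    ready.append(v)
    # If some node could never be removed, it lies on a cycle.
    return removed == len(nodes)
```

For example, the schedule R1(A), W2(A), W1(A) produces edges in both directions between T1 and T2, so it is not conflict serializable.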
2. Lock Management:
1) Data structures:
a) The lock manager maintains two hash tables: a lock table with an entry for each locked data object, and a transaction table with an entry for each ongoing transaction. Each transaction table entry holds a pointer to the list of locks that the transaction currently holds.
b) Each lock table entry records the number of transactions currently holding a lock on the object, the lock type, and a pointer to the queue of lock requests.
2) Lock and unlock requests: The book then introduces the locking mechanism in a DBMS. It is pretty similar to a reader/writer lock. However, instead of using condition variables and a mutex to implement the lock mechanism, most DBMSs use a request queue and semaphores, so that waiting requests are granted fairly.
3) Latches and Convoys:
a) Latches are used to ensure that physical read and write operations are atomic.
b) Because the operating system schedules preemptively, a convoy may arise. In a convoy, most CPU time is spent on context switching, because many transactions are queued up waiting for a single transaction that holds a heavily used lock.
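A toy version of the lock table described in this part might look like the sketch below. The class and field names are my own, and it leaves out a lot that a real lock manager needs (latching of the table itself, lock upgrades, the transaction table, re-entrant requests). Requests are granted in FIFO order, so a shared request that arrives behind a queued exclusive request waits, which is the fairness property mentioned above.

```python
from collections import deque


class LockManager:
    """Single-threaded sketch of a lock table with S/X locks (illustrative only)."""

    def __init__(self):
        # object -> {"mode": "S" | "X" | None, "holders": set, "queue": deque}
        self.table = {}

    def _entry(self, obj):
        return self.table.setdefault(
            obj, {"mode": None, "holders": set(), "queue": deque()}
        )

    @staticmethod
    def _compatible(held_mode, requested):
        return held_mode is None or (held_mode == "S" and requested == "S")

    def lock(self, txn, obj, mode):
        """Return True if granted now, False if the request was queued."""
        e = self._entry(obj)
        if self._compatible(e["mode"], mode) and not e["queue"]:
            e["mode"] = mode
            e["holders"].add(txn)
            return True
        e["queue"].append((txn, mode))
        return False

    def unlock(self, txn, obj):
        """Release txn's lock and return the transactions granted as a result."""
        e = self._entry(obj)
        e["holders"].discard(txn)
        granted = []
        if not e["holders"]:
            e["mode"] = None
            # Grant queued requests from the front while they are compatible.
            while e["queue"] and self._compatible(e["mode"], e["queue"][0][1]):
                t, m = e["queue"].popleft()
                e["mode"] = m
                e["holders"].add(t)
                granted.append(t)
        return granted
```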
3. Lock conversions:
In this section, the book introduces two techniques for handling a transaction that wants to change a shared lock into an exclusive lock, namely:
1) Lock upgrade:
a) If no other transaction holds a lock on the object, the shared lock is upgraded to an exclusive lock immediately; otherwise the upgrade request is inserted at the front of the queue.
b) However, this may lead to deadlock, for example when two transactions holding shared locks on the same object both request an upgrade.
2) Lock downgrade:
a) The transaction first acquires exclusive locks and then downgrades a lock to a shared lock once it is clear the exclusive lock is not needed.
b) This mechanism avoids the upgrade deadlock; however, it has more overhead, since exclusive locks are obtained even when they turn out not to be needed.
4. Dealing with deadlocks:
In this section, the book introduces several techniques to detect deadlock and also to prevent deadlock.
1) Deadlock detection: The book introduces two deadlock detection techniques:
a) Waits-for graph: Some database systems maintain a waits-for graph recording which transactions are waiting for locks held by which other transactions. The system periodically checks the graph for cycles. If there is a cycle, it aborts one of the transactions involved in the deadlock and removes that transaction's edges from the graph.
b) Timeout: A DBMS can also use a timeout mechanism for deadlock detection. If a transaction has been waiting too long for a lock, the system assumes there is a deadlock and aborts that transaction.
2) Deadlock prevention: The deadlock prevention schemes in this section work mainly by breaking one of the four conditions for deadlock: 1) mutual exclusion, 2) hold and wait, 3) no preemption, 4) circular wait. Wait-die and wound-wait break circular wait by ordering transactions by timestamp priority. Conservative 2PL breaks the hold-and-wait condition by acquiring all locks at the beginning. In more detail:
a) Wait-die: When a transaction requests a lock that conflicts with a lock held by another transaction on the same object, the requester waits if it has higher priority (it is older); otherwise the requester aborts.
b) Wound-wait: When a transaction Ti requests a lock that conflicts with a lock held by Tj on the same object, Tj is aborted if Ti has higher priority, so that Ti gets the lock; otherwise Ti waits.
c) Wait-die is nonpreemptive: only the requesting transaction can be aborted, so a younger transaction may be aborted several times. The downside of wound-wait is that a transaction may be aborted even though it already holds all the locks it needs, which also wastes a lot of work.
d) Conservative 2PL is also introduced: a transaction first acquires all the locks it needs and then releases them when it no longer needs them. The issues with this technique are that locks may be held longer and that a transaction may acquire more locks than it needs, both of which add overhead.
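The two timestamp-based prevention policies differ only in who gives way, which the small sketch below tries to make explicit. The function names and the returned strings are my own; the only assumption is the usual convention that a smaller timestamp means an older transaction with higher priority.

```python
def wait_die(requester_ts, holder_ts):
    """Nonpreemptive: only the requester can ever be aborted."""
    if requester_ts < holder_ts:      # requester is older (higher priority)
        return "requester waits"
    return "requester aborts"         # younger requester "dies"


def wound_wait(requester_ts, holder_ts):
    """Preemptive: an older requester aborts ("wounds") the younger holder."""
    if requester_ts < holder_ts:      # requester is older (higher priority)
        return "holder aborts"
    return "requester waits"          # younger requester waits for the older holder
```

In both policies the transaction that waits and the transaction it waits for are always in a fixed age order (old waits for young in wait-die, young waits for old in wound-wait), which is why no cycle of waiting transactions can form.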
5. Specialized Locking techniques:
In this section, the book introduces the phantom problem as well as a solution to it: predicate locking. It then gives an example of a special case of predicate locking: index locking on a B+ tree. It also gives an overview of multiple-granularity locking.
1) Locking on a B+ tree, steps:
a) For each node on the path through the B+ tree except the leaf node, the transaction acquires a shared lock.
b) Once the transaction has acquired the lock on the child node, it releases the lock on the parent.
c) For an update, deletion, or insertion, it acquires an exclusive lock on the leaf node.
d) If no entry satisfies a given predicate, the transaction locks the index page where such an entry would be placed, to prevent the phantom problem.
2) Multiple granularity locking:
a) Customized locks:
I) IX locks: Intention exclusive locks. IX locks conflict with shared locks and exclusive locks.
II) IS locks: Intention shared locks. IS locks conflict with exclusive locks.
b) Steps of multiple-granularity locking:
I) Acquire IX (respectively IS) locks on all ancestors of the node the transaction wants to modify (respectively read).
II) Acquire an X (respectively S) lock on the node itself.
III) Locks must be released in leaf-to-root order.
c) It must be used together with 2PL.
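The lock modes and steps above can be written down as a compatibility table plus a helper that lists the locks to request along a root-to-target path. This is only a sketch under my own naming (`COMPATIBLE`, `can_grant`, `lock_plan`), and it covers just the four modes IS, IX, S, and X.

```python
# For each requested mode, the set of held modes it is compatible with.
COMPATIBLE = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}


def can_grant(requested, held_modes):
    """True if `requested` is compatible with every mode already held by others."""
    return all(h in COMPATIBLE[requested] for h in held_modes)


def lock_plan(path, mode):
    """Locks to request, in root-to-target order, for an S or X lock on path[-1].

    path is a root-to-target list of node names (hypothetical example data).
    """
    intention = "IS" if mode == "S" else "IX"
    return [(node, intention) for node in path[:-1]] + [(path[-1], mode)]
```

For an exclusive lock on a page, `lock_plan(["db", "table", "page"], "X")` requests IX on the database and the table and then X on the page; the locks would later be released in the reverse (leaf-to-root) order.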
6. Concurrency control without locking:
In this section, the book introduces several concurrency control techniques without locking. It introduces the optimistic concurrency control and also the timestamp based concurrency control. Both of these control techniques have their strengths as well as their weaknesses.
1) Optimistic concurrency control: An optimistic concurrency control uses private workspace and validations to prevent conflicts of concurrently running transactions.
I) Read: The transaction reads data from the database and writes its changes to a private workspace.
II) Validation: The DBMS uses three rules to detect whether there is a conflict between transactions.
III) Write: If there are no conflicts, the transaction proceeds to write its changes back to the database.
b) Validation (edited version, from the improved conflict resolution section): For transactions Ti and Tj with TS(Ti) < TS(Tj), one of the following conditions must hold:
I) Ti completes all 3 phases before Tj begins.
II) Ti completes before Tj starts its write phase, and Ti does not write any object that is read by Tj.
III) Ti completes its read phase before Tj completes its read phase, and Ti does not write any object that is either read or written by Tj.
c) Pros and cons:
I) Optimistic concurrency control is good when there are only a few conflicts in the system, since locking mechanisms are heavy.
II) However, if lots of conflicts happen, the repeated restarts cause a lot of overhead for the system.
2) Timestamp-based concurrency control:
1) The DBMS maintains two timestamps for every database object: an RTS (read timestamp) and a WTS (write timestamp), each updated to the timestamp of the latest transaction that read or wrote the object.
2) Each transaction has its own timestamp TS. For every write, the DBMS checks whether:
I) TS(T) < RTS(O), which implies an RW conflict
II) TS(T) < WTS(O), which implies a WW conflict
3) In both of the scenarios in 2), the transaction is aborted and restarted with a new, larger timestamp.
b) Thomas write rule: If a write has only the TS(T) < WTS(O) conflict, the object has already been overwritten by a transaction with a later timestamp, so the write can simply be ignored instead of aborting T; skipping it has no side effect.
3) Multiversion concurrency control: This protocol uses timestamps and also maintains every version of each object, with each version tagged with its write timestamp. The rest is pretty similar to timestamp-based concurrency control: reads are never blocked, but when Ti wants to write O and TS(Ti) < RTS(O), Ti is aborted and restarted with a new timestamp. This technique has all the drawbacks of timestamp-based concurrency control, plus it uses a lot of extra storage for the versions of the data.
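Since multiversion concurrency control was the part I found hardest to picture, here is a small sketch of one common formulation (multiversion timestamp ordering). The class name `VersionedObject`, the list layout of a version, and the choice to check the read timestamp of the version visible to the writer are all my own assumptions and may differ in detail from the chapter's description. Timestamps are assumed to be positive integers.

```python
class Abort(Exception):
    """The writing transaction must abort and restart with a new timestamp."""


class VersionedObject:
    def __init__(self, initial):
        # Each version is [wts, rts, value]; the initial version has wts = 0.
        self.versions = [[0, 0, initial]]

    def _visible(self, ts):
        # The version a transaction with timestamp ts should see:
        # the one with the largest write timestamp not exceeding ts.
        return max((v for v in self.versions if v[0] <= ts), key=lambda v: v[0])

    def read(self, ts):
        # Reads never block or abort; they just pick the right version.
        version = self._visible(ts)
        version[1] = max(version[1], ts)
        return version[2]

    def write(self, ts, value):
        version = self._visible(ts)
        # A younger transaction already read the version we would supersede.
        if ts < version[1]:
            raise Abort("write arrives after a later read")
        if version[0] == ts:
            version[2] = value                     # overwrite our own version
        else:
            self.versions.append([ts, ts, value])  # create a new version
```

The extra storage cost mentioned above shows up directly here: `versions` only ever grows unless old versions are garbage collected, which this sketch does not attempt.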
1. This book gives a detailed overview of advanced concurrency control techniques. It covers a lot of different approaches to solving concurrency control problems in a DBMS, and also lists the pros and cons of each of them.
2. This book uses a lot of examples to illustrate the concepts as well as the algorithms, which gives its reader a clear understanding of the information.
1. It would be more interesting if this book gave some examples of industrial DBMSs that use the techniques discussed in this chapter.
2. Although this book gives a good review of the techniques as well as the pros and cons of each, it would be clearer and more convincing if it gave some quantitative experimental results for these techniques.
This article discusses concurrency control in detail.|
Two schedules are said to be conflict equivalent if they involve the (same set of) actions of the same transactions and they order every pair of conflicting actions of two committed transactions in the same way.
A schedule is conflict serializable if it is conflict equivalent to some serial schedule. The Strict 2PL protocol allows only conflict serializable schedules. Two-Phase Locking (2PL) relaxes the Strict 2PL by allowing transactions to release locks before the end. 2PL requires a transaction not request additional locks once it releases any lock.
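The 2PL rule stated above is simple enough to check mechanically. The sketch below takes the sequence of actions of a single transaction (an encoding I made up: the strings `"lock"` and `"unlock"`, with anything else treated as a read or write) and reports whether the sequence respects the rule.

```python
def obeys_2pl(actions):
    """True if no lock is requested after the first unlock (growing, then shrinking)."""
    released = False
    for action in actions:
        if action == "unlock":
            released = True          # the shrinking phase has begun
        elif action == "lock" and released:
            return False             # a lock request in the shrinking phase violates 2PL
    return True
```

Under Strict 2PL the question does not even arise, because every unlock happens at commit or abort, after the last lock request.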
In a DBMS, the facility that keeps track of the locks issued to transactions is called the lock manager. The lock manager maintains a lock table, which maps each data object to its lock. The locks are implemented as read-write locks, and a read lock can be upgraded to a write lock on request. As deadlocks are rare in practice, the DBMS usually handles them by detection rather than prevention: it periodically checks for deadlocks and aborts one of the transactions involved if it finds any.
In a more general model where record insertion and deletion can happen, there is a phantom problem even if Strict 2PL is followed. The reason is that Strict 2PL can only lock the data records that currently exist; it doesn’t prevent new records from being inserted. As a result, “phantom” records can show up, making the execution non-serializable. Index locking is introduced to fix the phantom problem. There are also specialized locking strategies for B+ tree structures, as well as multiple-granularity locking.
Though locking is the most widely used approach to concurrency control, there are alternatives. For example, optimistic concurrency control assumes that most transactions don’t conflict with each other, so the system executes each transaction in a private workspace, validates that no conflict exists, and then copies the results from the private workspace to the database. More advanced techniques include timestamp-based concurrency control and multiversion concurrency control. In the real world, most database systems use Strict 2PL or its variations.