Reviews for Paper 10: Rethinking serializable multiversion concurrency control

Review 1

This paper proposes BOHM, a multi-versioned database system. BOHM separates the concurrency control and transaction execution phases to make the system scalable while preserving a serialized order of execution (reads never block writes, though writes may block reads).

The paper first discusses the motivation for BOHM:
1. previous multi-versioned systems rely on a centralized timestamp counter, which becomes a point of contention as concurrency increases, hurting scalability
2. previous multi-versioned systems are limited by the requirements of guaranteeing serializability

The design of BOHM can be split into two parts:
1. Concurrency Control: analyzes each transaction's write-set and determines the serialization order of transactions (single thread), then allocates space (placeholders) for new versions (multiple threads)
2. Transaction Execution: executes transactions (multiple threads) and garbage-collects versions that are no longer needed (once after a batch is processed)
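As a rough illustration of this two-part split, here is a minimal single-threaded sketch; the names and data layout are my own assumptions, not the paper's actual code. The concurrency control step assigns timestamps by log position and appends placeholder versions, and the execution step fills them in.

```python
PENDING = object()  # placeholder value for a not-yet-executed write

store = {}  # key -> list of (timestamp, value) versions, oldest first

def concurrency_control(batch):
    """Assign timestamps by log position and allocate a placeholder
    version for every key in each transaction's write-set."""
    for ts, txn in enumerate(batch):
        txn["ts"] = ts
        for key in txn["writes"]:
            store.setdefault(key, []).append((ts, PENDING))

def execute(batch):
    """Fill in the placeholders created during concurrency control."""
    for txn in batch:
        for key, value in txn["writes"].items():
            versions = store[key]
            i = next(j for j, (vts, _) in enumerate(versions) if vts == txn["ts"])
            versions[i] = (txn["ts"], value)

def read(key, ts):
    """A reader with timestamp ts sees the latest completed version
    older than itself."""
    for vts, val in reversed(store.get(key, [])):
        if vts < ts and val is not PENDING:
            return val
    return None

batch = [{"writes": {"x": 10}}, {"writes": {"x": 20, "y": 1}}]
concurrency_control(batch)
execute(batch)
```

In the real system, placeholder allocation and execution are each spread over multiple threads, and garbage collection prunes versions no longer visible to any reader.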

Experiments suggest that BOHM's throughput scales with increasing thread counts (good scalability, though a good strategy is needed for dividing threads between concurrency control and transaction execution).

The main contribution of this paper is a highly scalable multi-versioned database system with relatively low-cost serializability. What I like about this paper is that it illustrates its two-part design very well, with a detailed explanation of how work is partitioned between the two phases and why the design yields good performance compared to other systems.



Review 2

The tradeoff between serializability and concurrency is an important problem. Serializable multi-versioned DBMSs usually constrain read-write concurrency to avoid conflicts, which significantly slows down the transaction processing rate. The paper proposes a new concurrency control protocol, BOHM, which ensures both serializability and freedom from read-write blocking.

BOHM determines the serialization order of transactions and performs version management before transactions actually execute. This design ensures that reads never block writes while achieving full serializability at the same time.

The new protocol avoids the serializability violations that afflict protocols such as snapshot isolation, which only approximate full serializability, without adding concurrency constraints or maintaining bookkeeping that harms performance. The design is scalable because it eliminates the central lock manager in favor of multiple threads whose data structures stay thread-local. By separating the concurrency control components from the transaction processing components, the DBMS code is also easier to maintain.

However, the disadvantages of this design are also clear. To determine the serialization order before execution, every transaction must be submitted to the system in full before transaction processing starts, so the design is incompatible with traditional cursor-oriented database access. In addition, identifying each transaction's write-set requires extra operations before processing, which introduces additional work.



Review 3

Multi-versioned database systems, which are used by the majority of database systems, perform poorly at providing transaction serializability while increasing concurrency. To get close to full serializability, one popular option is snapshot isolation, which is, however, vulnerable to serializability violations. Other solutions for making multi-versioned systems serializable either severely restrict concurrency in the presence of read-write conflicts (to the extent that they offer almost no additional logical concurrency compared to single-versioned systems) or require more coordination and bookkeeping, which results in poorer performance in main-memory multi-core settings.

Therefore, this paper proposes BOHM, a new concurrency control protocol for main-memory multi-versioned database systems. The key insight behind BOHM is that the complexity of determining a valid serialization order of transactions can be eliminated by separating concurrency control and version management from transaction execution. BOHM determines the serialization order of transactions and creates versions corresponding to transactions' writes prior to their execution.

BOHM's architecture consists of a concurrency control phase and a transaction execution phase. The concurrency control layer is responsible for (1) determining the serialization order of transactions, and (2) creating a safe environment in which the execution phase can run transactions without concern for other transactions running concurrently. The transaction execution layer executes transactions' logic and (optionally) incrementally garbage-collects versions that are no longer visible due to more recent updates.

The main contributions of the proposed model are multi-versioned serializability, multi-core scalability, and a clean, modular design that separates concurrency control from transaction processing using separate threads. This improves database engine code maintainability and reduces database administrator complexity. The advantages of this method are as follows:
1. It guarantees full serializability while ensuring that reads never block writes.
2. It does not require the additional coordination and book-keeping introduced by other methods for achieving serializability in multi-versioned systems.
3. It is a scalable (across multiple cores) concurrency control protocol.

The main disadvantages of this approach are as follows:
1. Entire transactions must be submitted to the database system before the system can begin to process them. Hence, this method does not support cursor-oriented database access.
2. The write-set of a transaction must be deducible before the transaction begins.


Review 4

Problem & Motivations
The multi-versioned database system is pervasive nowadays because of its high performance. By consuming additional space to store previous versions of records, the database allows write and read operations on the same record to happen simultaneously. However, this design can produce unserializable results in a situation called the "write-skew anomaly": two transactions have an overlapping read-set and disjoint write-sets, and the write-set of each transaction depends on its read-set. To overcome this issue, traditional databases sacrifice a great deal, and the rules they adopt greatly limit the performance of multi-versioned DBs. Therefore, a new multi-versioned DB design is needed that can deliver high performance under a high isolation level.
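To make the write-skew anomaly concrete, here is a small simulation of the classic "on-call doctors" scenario under snapshot isolation; the scenario and all names are illustrative assumptions, not taken from the paper.

```python
# Invariant: at least one doctor must remain on call.
db = {"alice_on_call": True, "bob_on_call": True}

def txn_take_off(me):
    # Each transaction reads from a snapshot taken when it starts;
    # here both start concurrently, so both see the initial state.
    snapshot = dict(db)
    if snapshot["alice_on_call"] and snapshot["bob_on_call"]:
        return {me: False}  # write-set touches only this doctor's row
    return {}

w1 = txn_take_off("alice_on_call")
w2 = txn_take_off("bob_on_call")

# The write-sets are disjoint, so snapshot isolation's write-write
# conflict check passes and both transactions commit.
db.update(w1)
db.update(w2)
# Result: both doctors end up off call, violating the invariant.
# No serial execution of the two transactions produces this outcome.
```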

Achievement
The authors have two main achievements. The first is that they identify the bottlenecks of multi-versioned DBs: centralized timestamp allocation and the cost of guaranteeing serializability. The second, and most important, is that they propose BOHM, which assumes all queries are known in advance and achieves efficient performance by building a total order of transactions and batching the parts of transaction processing that allow it.

Drawback
The main drawback of this paper, from my perspective, is that all algorithms in the paper rest on one assumption: you can get all queries in advance. This assumption is strong and usually not easy to satisfy, and because of it the range of applications for the proposed DB narrows considerably (basically, most OLTP DBs would not apply this strategy).


Review 5

Ensuring concurrency in transactions is a very important challenge facing database management systems, and historically, there have been two choices when doing record updates: update-in-place and multi-versioned systems. At the cost of additional storage overhead, among others, multi-versioned systems can ensure that reads do not block writes. Due to the falling cost of storage, multi-versioned systems have become more popular recently. When users ask for serializability in these types of systems, snapshot isolation is often used to achieve this requirement. Despite this, however, snapshot isolation implementations have to deal with potential serializability violations, such as the write-skew anomaly. Workarounds to these issues often take the form of either severely restricting concurrency in the presence of read-write conflicts (which takes away a lot of what multi-version systems can offer), or by increasing the number of checks and bookkeeping, which carries with it a performance penalty. In the face of these challenges, the authors of the paper “Rethinking serializable multiversion concurrency control” offer up a new concurrency control protocol, BOHM, that is able to guarantee serializability while ensuring that reads never block writes, all with scalability and improved performance. The key insight provided by the authors is that by separating concurrency control and version management from transaction execution, a great deal of the complexity in determining a valid serialization order of transactions is avoided.

BOHM addresses some sources of performance penalties in other multi-version systems, such as the need for a global timestamp counter to keep track of transactions, which is not very scalable since it creates a bottleneck when obtaining timestamps. To deal with this, BOHM gives transactions a timestamp based on their position in the total order of execution. In general, BOHM achieves performance gains by reducing the amount of overhead dedicated to coordinating database threads, especially coordination through writes to shared memory. BOHM does so by separating concurrency control logic from transaction execution logic, allowing better concurrency and scalability. The rest of the paper deals with how the serialization order is determined, how the actual transactions are executed, and the performance results from tests of their system.

The primary strength of this paper is that the authors managed to create a system that gets the best of both worlds: concurrency/serializability along with relatively good performance. Throughout the paper, they constantly referred back to their original design consideration of decreasing coordination overhead, and this common vision helped make the system design easier to understand and more compelling. Additionally, their results section and discussion were noticeably longer than in many other research papers of the same length, which shows the lengths they went to in testing and characterizing the system's capabilities.

Probably the only weakness of this system is that its architecture seems to have been designed from the ground up, which potentially hinders its adoption by companies and other clients, as compared to a simpler, incremental improvement on the multi-versioned systems already in use. Also, the use of timestamps based on a transaction’s relative position rather than a global timestamp suggests that the entire pipeline must be submitted prior to running. Whether this has an actual impact on operations may depend on the nature of the data/transactions being handled, though.


Review 6

This paper's main contribution is the BOHM concurrency control method, a serializable protocol that optimizes read transactions: reads require no bookkeeping and never block writes. BOHM is important because it also minimizes coordination across threads and is therefore very scalable across multiple cores. The article describes the limitation of global timestamps and how they don't scale well, and also how multi-version methods for concurrency control support concurrent reads and writes on the same object.

The primary design feature of BOHM was to minimize the amount of cross thread coordination, so the logic for scheduling a serializable order of transactions was separated from the logic on actually executing the transactions. Given a transaction log consisting of a bunch of read and write transactions, concurrency control threads will allocate space for the writes to create new versions of the data. A separate set of transaction execution threads then execute all the reads and fill the allocated space with written data. Optionally, any unused versions may be garbage collected. The article analyzes the performance of BOHM when the number of execution threads increases and we see that it is very scalable with a high number of concurrency control threads.

I think this reading did a good job of introducing the reader to all of the necessary concepts for understanding the contribution. It made sure, for example, to define multiversion systems as opposed to update-in-place systems and give a quick look at the tradeoffs of one versus the other. I liked how it gradually went deeper with the material by giving a system overview and then in the following sections breaking down all of the previously mentioned components.

I did not like how this article introduced new challenges in section 3 regarding the concurrency control and execution threads. Given the outline they previously gave us, I had to wrap my head around relatively new ideas as the material started getting deeper like the section on batching in 3.2.



Review 7

BOHM is presented as the first multi-versioned DBMS to achieve serializable concurrency control while still exploiting multiple versions to ensure that reads do not block writes on multi-core systems. Previous multi-versioned DBMSs compromise performance to maintain serializability, which clearly fails to take full advantage of modern multi-core environments. These points can be seen as the main contribution of this paper.

The paper first highlights the advantages of multi-versioned systems, which support parallel reads and writes, though this efficiency takes a toll in storage. Previous works have the drawback of either severely restricting concurrency in the presence of read-write conflicts or requiring more coordination and bookkeeping. The key insight of BOHM is to eliminate the complexity of determining the serialization order of transactions by separating concurrency control and version management from execution.

The paper then explains why the performance of modern multi-version concurrency control hits bottlenecks: 1) obtaining timestamps from global counters, and 2) the cost of guaranteeing serializable execution. This motivates the design of BOHM, which attempts to reduce coordination among database threads. BOHM separates concurrency control logic and transaction execution logic into two phases: 1) a concurrency control phase and 2) an execution phase. These phases bring the cost of extra requirements, such as knowing transactions' write-sets in advance.

The drawbacks of this paper are essentially the disadvantages it lists itself. Any newly proposed algorithm brings a tradeoff, and here the entire transaction must be submitted to the DBMS before processing begins. This may induce a big drop in efficiency, and in particular cases the impact can be large.


Review 8

In the paper Rethinking serializable multiversion concurrency control, the authors try to solve the problem that multi-versioned database systems are often outperformed by single-version systems when they try to achieve both heavy-load concurrency and serializability. To solve this problem, the paper proposes BOHM, a new concurrency control protocol that guarantees serializable execution and ensures reads never block writes. Also, this protocol does not require reads to perform any bookkeeping.

There are two layers in the design. The first layer is concurrency control. It determines the serialization order of transactions and assigns a unique timestamp to each of them. It then creates a safe environment in which the execution phase can run without any concern. This phase introduces intra-transaction parallelism: each thread is responsible for a partition of the data entries and can operate in parallel. Things then become quite easy in the execution phase, where each transaction is in one of three states: Unprocessed, Executing, and Complete. One thread is dedicated to executing a transaction; if a prerequisite is found to be unsatisfied, the transaction goes back to Unprocessed and executes later.
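The state transitions described above might be sketched as follows; the scheduling loop and names are my own simplification, not the paper's exact algorithm.

```python
from enum import Enum, auto
from collections import deque

class State(Enum):
    UNPROCESSED = auto()
    EXECUTING = auto()
    COMPLETE = auto()

def run_batch(txns, depends_on):
    """txns: id -> callable body; depends_on: id -> ids whose writes
    this transaction must read. A transaction whose prerequisites
    have not yet completed is sent back to Unprocessed and retried."""
    completed = set()
    state = {t: State.UNPROCESSED for t in txns}
    queue = deque(txns)
    while queue:
        t = queue.popleft()
        state[t] = State.EXECUTING
        if all(d in completed for d in depends_on.get(t, [])):
            txns[t]()                      # prerequisites satisfied: run it
            state[t] = State.COMPLETE
            completed.add(t)
        else:
            state[t] = State.UNPROCESSED   # back to Unprocessed
            queue.append(t)                # retry later
    return state

order = []
# Transaction 2 is picked up first but depends on 1, so it is deferred.
state = run_batch({2: lambda: order.append(2), 1: lambda: order.append(1)},
                  {2: [1]})
```

Because the serialization order fixed in the first layer is acyclic, the retry loop always terminates.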

There are several main contributions in this paper. First, it proposes a highly efficient multi-versioned serializable protocol. Second, this protocol is very suitable for scaling in multi-core settings. Third, the design completely separates concurrency control from transaction processing. Overall, the protocol achieves very good performance.

I would like to point out one weak point of this paper. It introduces parallel processing and claims that it boosts performance a great deal and is very suitable for scaling up. However, the time it takes to execute a transaction or a batch of transactions is determined by the slowest processing thread. As the paper itself notes, one thread might be processing the 100th transaction while others are still processing the 50th. This defect of the protocol is not well analyzed or optimized.



Review 9

This paper introduces BOHM, a new implementation of serializable isolation for multiversion database systems. The authors propose the solution to address two complaints that they argue apply to all existing serializable implementations for multiversion systems: concurrency ends up severely restricted, or there is a large amount of overhead. Fixing the concurrency problem is particularly important, as increased concurrency is one of the main reasons to use a multiversion system in the first place. The system is unique in that it never allows reads to block writes.

The approach taken by the authors is to add an extra phase to the transaction processing plan: first, transactions will go through a concurrency control phase, then the execution phase. In the concurrency control phase, a plan for executing the queries in a serializable manner is created. Then as expected, in the execution phase, that plan is executed. The authors do their best to reduce overhead by reducing the amount of communication needed between threads.

The authors state that the main contribution is that reads never block writes, which is enabled by the separation of concurrency control and transaction execution. The contributions are best demonstrated by the results section, however, where BOHM was shown to perform quite well. I appreciated how the authors showed results for 2PL and OCC in addition to SI and Hekaton. As the authors were primarily trying to prove improvements over previous multi-versioned protocols, it may have been reasonable to only include SI and Hekaton. However, the authors gave clear descriptions of why the results ended up the way they did for each system-benchmark combination. Even though BOHM was not always the top performer, those statistics made clear the ways in which BOHM is valuable.

The authors bring up two limitations of BOHM: the whole transaction must be submitted to the DBMS before any processing can begin, and the write-set must be known before the transaction can go through the concurrency control phase. Speculative techniques to deduce write-sets are discussed briefly, but I noted that there was not too much discussion about the first limitation. The authors simply stated that many new applications do this already, and that BOHM can be used with those applications. I would have liked to know if the authors did any more research about the feasibility of converting old applications to a state where they would be able to leverage the new system.
Additionally, one thing about the paper layout that really bugged me was that there were a few figures that were placed on one page but not referenced until the next page. This made it a lot harder for me to easily look at the figures as they were being explained.


Review 10

Multi-versioned database systems have the advantage of significantly increasing the degree of concurrency over lock-based single-version database systems. However, it's not an easy job to achieve a serializable isolation level in multi-versioned systems. Current solutions either severely restrict concurrency (during read-write conflicts) or require more coordination and bookkeeping. Both result in very poor performance.

To solve these problems, this paper proposes BOHM, a new concurrency control protocol for multi-versioned database systems. The core design idea of BOHM is the separation of concurrency control & version control from actual transaction execution. The whole system can be divided into three parts:
1. A log containing a list of all transactions: this log is maintained by a single thread which also assigns a timestamp to each transaction based on its position in the list.
2. Concurrency control threads: these threads examine every transaction in the log and create new record versions (with placeholder values) if the transaction's write-set overlaps with the partition this concurrency control thread is responsible for. Note that this processing does not require any communication between threads.
3. Transaction execution threads: once all concurrency control threads finish processing a batch of transactions, that batch can be executed by execution threads. Transactions are partitioned across threads and executed concurrently. This can be done since the order of record versions has already been determined in the second step. The only case that needs handling is read dependencies, where a transaction A tries to read a value x produced by transaction B, but B hasn't been executed yet. This problem can be easily fixed by recursively running any transactions the current transaction depends on.
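The read-dependency fix in point (3) can be sketched as a small recursive helper; the names and data shapes here are illustrative assumptions, not the paper's code.

```python
def ensure_executed(tid, txns, done):
    """Run transaction tid, first recursively running every
    transaction whose writes it needs to read. The serialization
    order fixed in step (2) guarantees the dependencies are acyclic,
    so the recursion terminates."""
    if tid in done:
        return
    for dep in txns[tid]["depends_on"]:
        ensure_executed(dep, txns, done)
    txns[tid]["body"]()
    done.add(tid)

order = []
txns = {
    "B": {"depends_on": [], "body": lambda: order.append("B")},
    "A": {"depends_on": ["B"], "body": lambda: order.append("A")},
}
done = set()
ensure_executed("A", txns, done)  # A needs B's write, so B runs first
```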

The whole design reflects the design philosophy of avoiding writing & reading shared data-structure and avoiding coordination among concurrency control threads.

The main drawback of the proposed method is that entire transactions must be submitted to the system before execution. Also, the write-set of a transaction must be deducible before the transaction begins. However, the authors claim that lots of applications already use stored procedures to submit transactions, so this won't be a huge problem.


Review 11

In the paper "Rethinking serializable multiversion concurrency control", Jose Faleiro and Daniel Abadi introduce BOHM, a new concurrency control protocol for main-memory multi-versioned database systems. BOHM was introduced because existing multi-version systems get hung up when a user requests full serializability. These systems have to constrain read-write concurrency due to potential conflicts and employ expensive synchronization patterns as part of their design. As a result, single-version systems typically perform much faster in these scenarios. This is analogous to a single person working alone to complete a complex task versus multiple people working to complete the same complex task: much more coordination is required for the latter, and it can often be slower if the correct techniques are not applied. BOHM solves these issues by guaranteeing that reads never block writes and that reads require no bookkeeping. We end up with a system that scales well, has great performance, and functions well under conditions of both high and low contention.

Serializability is an important concept that enables database users to get their desired result regardless of the order of the transactions. This is often obligatory, and since it shows up in daily interactions, it is a problem worth solving and optimizing. Older solutions to the problem at hand, such as snapshot isolation systems, severely restrict concurrency in the presence of read-write conflicts - a drawback for multi-version systems. Thus, BOHM is developed from scratch and has the previously mentioned benefits. However, along with the benefits come two main weaknesses: BOHM requires both entire transactions and the write-sets of transactions to be fed to the database before processing.

BOHM strives to reduce the overhead for coordination between database threads that synchronize based on writes to shared memory. This is done through amortization - a method of redistributing "expensive" operations over the many frequent cheap operations to get essentially constant run-time. Separating concurrency control logic from execution logic enables users to avoid scalability bottlenecks and achieve the amortized cost of coordination across multiple transactions. At the concurrency control logic level, transactions are given timestamps and inserts for new versions are made for every record in the transactions write set. Then, for each element in the transaction read set, BOHM attempts to identify the version that the transaction will read. It then batches and performs a hand-off to the execution layer using an amortized algorithm. The execution layer, while being mindful of dependencies, coordinates the threads with a single writer and multiple readers.

One drawback of the paper was the way it presented itself to the audience. I felt the presentation of the problem in the introduction was stale, with no real application to industry. The experimental results did not impress me either - the graphs were too hard to read and often misled me into believing there was not much of a performance improvement compared to other methods. I also wish they had given more concrete examples, as previous papers did, to help me understand the paper better. Serializability is not my favorite topic, and since it is categorized as a nasty problem, it is hard to wrap your head around.


Review 12

This paper describes an efficient way to use multiple versions of database records to process concurrent transactions. The standard method for processing multiple transactions in a serializable manner is locking: reading data prevents another transaction from writing it, and vice versa. However, the large amount of blocking causes inefficiency. Having multiple versions of records allows writes to happen on a different version than reads, preventing blocking while still allowing serialization. The paper describes the BOHM multiversion system.

In order to accomplish this, as each transaction is received by the DBMS, a single thread accepts it and assigns it a timestamp. This yields a total ordering of the transactions, which gives a serial ordering and tells each transaction which versions of records it can interact with: a transaction will only read versions whose timestamp precedes its own.
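The visibility rule above can be sketched in a few lines. This is my own minimal illustration, not the paper's code; the class and function names (`Version`, `visible_version`) are invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Version:
    begin_ts: int                      # timestamp of the writing transaction
    value: Any = None                  # filled in later by an execution thread
    prev: Optional["Version"] = None   # link to the next-older version

def visible_version(latest: Optional[Version], txn_ts: int) -> Optional[Version]:
    """Walk the chain to the newest version written strictly before txn_ts."""
    v = latest
    while v is not None and v.begin_ts >= txn_ts:
        v = v.prev
    return v

# A transaction with timestamp 10 sees the version written at ts 7,
# not the one written at ts 12.
old = Version(begin_ts=7, value="a")
new = Version(begin_ts=12, value="b", prev=old)
assert visible_version(new, 10) is old
```

Because the total order is fixed up front, this lookup never changes its answer, no matter when the read actually executes.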

After transactions have been received, a set of concurrency control threads analyzes them. For each write a transaction will perform, a new (blank) version of the written record is created. After this processing, the transactions are handed to execution threads, which actually run them. When a transaction has to read a version that hasn't been written yet, it blocks until the write occurs. This prevents serializability violations.
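The "blank version that readers block on" idea can be sketched as follows. This is a simplified illustration under my own assumptions (the paper's implementation does not use a Python `Event` per version); the `Placeholder` name is invented.

```python
import threading

class Placeholder:
    """A pre-allocated, initially empty version slot for one record."""
    def __init__(self, begin_ts: int):
        self.begin_ts = begin_ts
        self.value = None
        self._written = threading.Event()

    def write(self, value):
        # Executed by the transaction that owns this version.
        self.value = value
        self._written.set()

    def read(self):
        # Readers of an unwritten version block until the write occurs.
        self._written.wait()
        return self.value

ph = Placeholder(begin_ts=3)
threading.Timer(0.01, ph.write, args=("hello",)).start()
assert ph.read() == "hello"   # returns once the writer has filled the slot
```

Note the asymmetry the review describes: `write` never waits on anything, while `read` may wait on a pending `write`.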

The paper provides a (mostly) non-blocking, serializable system, which should have efficient runtimes at the cost of additional memory and the bookkeeping needed to handle multiple versions of records. It only blocks reads waiting for writes, never writes waiting for reads, which could be useful for specific workloads. It also seems to do well under high contention among threads for record access.

On the downside, there are circumstances under which multiversion serialization doesn't provide much of a performance improvement over standard locking, so the overhead of maintaining multiple versions can cause its own problems, such as having to keep track of old versions for a disproportionately long time.



Review 13

The paper mainly describes a novel way to achieve serializable execution, given advance knowledge of each transaction's read set and write set, by separating concurrency control from transaction execution. It performs considerably better than other multi-version DBMSs, and in some cases even single-version DBMSs.
The core of the design is the separation of concurrency control and transaction execution.
Because the write set (and optionally the read set) is known ahead of time, concurrency can be scheduled easily by creating placeholders for writes and references for reads in advance. There are no conflicts, since the order of transactions is fixed by a centralized timestamp. Data parallelism and batching accelerate the whole process.
There is no ordering constraint between concurrency control and transaction execution as phases, but several read and write dependencies must be satisfied; for example, a transaction cannot read data that has not yet been written. Moreover, since old versions become invisible to later batches, garbage collection can reclaim them to save memory.
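The batch-based garbage collection mentioned above can be sketched roughly as follows. This is a hedged simplification of my own (the paper's actual scheme may differ): once no live transaction has a timestamp at or below a `low_watermark`, every version older than the newest version at or below that watermark is unreachable and can be cut off the chain.

```python
class Version:
    def __init__(self, begin_ts, prev=None):
        self.begin_ts = begin_ts
        self.prev = prev   # next-older version in the chain

def collect(latest, low_watermark):
    """Drop versions that no live transaction can ever read again."""
    v = latest
    while v is not None:
        if v.begin_ts <= low_watermark:
            v.prev = None   # everything older than v is now garbage
            break
        v = v.prev
    return latest

# Example: chain of versions written at timestamps 3, 7, 12.
v3 = Version(3)
v7 = Version(7, prev=v3)
v12 = Version(12, prev=v7)
collect(v12, low_watermark=8)    # the ts-3 version can never be read again
assert v12.prev is v7 and v7.prev is None
```

Any remaining live transaction has a timestamp above 8, so it reads at least the ts-7 version; the ts-3 version is safely reclaimed.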
The biggest drawback is also the biggest assumption of the paper: the write set and read set of each transaction must be known ahead of time. But this is acceptable, since the system performs much better than other multi-version DBMSs.





Review 14

The paper takes up the topic of multiversion concurrency control because it has the potential to increase the amount of concurrency in transaction processing. However, that increase typically comes at the cost of transaction serializability, so multiversion systems are usually outperformed by single-version ones. This paper proposes a new multiversion concurrency control protocol called BOHM, which guarantees serializable execution and ensures that reads never block writes. BOHM has to overcome barriers faced by previous multiversion systems: the difficulty of determining a valid serialization order of transactions, and the inability to scale when using a global counter to obtain timestamps. BOHM solves the serializability problem by separating concurrency control and version management from transaction execution, determining the serialization order of transactions before they execute. The cost of this approach is requiring entire transactions, with deducible write sets, in advance. To avoid the global-counter bottleneck, BOHM assigns timestamps to all transactions before execution. The authors dive into the details of these two mechanisms. The key philosophy of BOHM is to eliminate coordination between database threads, which removes many bottlenecks previous multiversion systems experienced. However, this philosophy also introduces new bottlenecks. One example is intra-transaction parallelism: it improves transaction throughput and eliminates coordination between threads, but is hard to scale since each thread needs to examine every transaction in serial order.

What I like about this paper is that the authors use figures, which are much clearer than prose alone, to show how each mechanism works. The paper also points out both the advantages and disadvantages of BOHM. The evaluation experiments are very thorough, which makes the conclusion that BOHM performs well credible.

What I don't like about this paper is that it is quite repetitive about its concepts. For example, the fact that the serialization order is determined before execution is repeated in several sections.


Review 15

“Rethinking serializable multiversion concurrency control” by Jose M. Faleiro and Daniel J. Abadi presents a new approach (and system, BOHM) to serializable concurrency control in multiversion databases that separates the concurrency control and execution processes from each other in an effort to improve performance. The concurrency control process first determines a serialization order for a provided set of full transactions. It then lays out a series of versions for each record in the database, noting for what range of transactions each version is valid; this is done correctly because all the version placeholders for a given database record are created by a single (and always the same) concurrency control thread. The paper discusses how transactions are processed by the concurrency control layer in batches (and then sent to the transaction execution layer) in an effort to reduce the time concurrency control threads sit idle waiting for the remaining threads to finish processing the batch. The paper then discusses how execution is performed for a batch: each transaction is executed completely by a single execution thread, and execution threads can attempt a transaction but return it to the queue if the appropriate version of a data record is not yet available. Tracking record versions and the transactions for which they are valid also makes efficient garbage collection possible. BOHM makes the major assumption that all transactions, in full, are provided before any processing begins. It is therefore not appropriate for cursor-oriented database accesses, but could make sense for applications that use stored procedures.
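The "attempt a transaction and return it if a version isn't ready" scheduling described above can be sketched as a simple retry loop. This is my own hypothetical illustration (the names `NotReady` and `run_worker` are invented), not the paper's scheduler.

```python
from collections import deque

class NotReady(Exception):
    """Raised when a needed record version has not been written yet."""

def run_worker(queue: deque, execute) -> None:
    # Pop transactions off the batch queue; if one cannot make progress,
    # put it back and move on rather than blocking the thread.
    while queue:
        txn = queue.popleft()
        try:
            execute(txn)
        except NotReady:
            queue.append(txn)   # retry later, once the write has landed

done, attempts = [], {}
def execute(t):
    attempts[t] = attempts.get(t, 0) + 1
    if t == "t2" and attempts[t] == 1:
        raise NotReady          # t2's input version isn't written yet
    done.append(t)

run_worker(deque(["t1", "t2", "t3"]), execute)
assert done == ["t1", "t3", "t2"]   # t2 completes on its second attempt
```

The worker stays busy on other transactions instead of stalling, which matches the review's point that only genuinely unavailable versions delay a reader.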
The paper also presents experiments comparing BOHM against prior multiversion concurrency control protocols (an optimistic variant of Hekaton, and Snapshot Isolation) as well as single-version protocols (optimistic concurrency control (OCC) and two-phase locking (2PL)), finding that it performs better under both high- and low-contention workloads.

I thought that the ‘Motivation’ section did a good job of explaining prior approaches (centralized timestamps, and the ‘track reads’ and ‘validate reads’ options for handling anti-dependencies), the challenges with them, and giving a brief inline explanation of how BOHM overcomes those challenges. Also, as a non-domain-expert in DBMSs, I appreciated that certain information in the paper was reiterated multiple times in different places, e.g., that records are partitioned across concurrency control threads and a single thread (always the same one) updates the versions for a given record; seeing the same information a couple of times helped me internalize it.

The paper explains that BOHM requires full transactions to be available before it can begin to process them. This is of course a limitation of the work, but it is good that it is noted. I wonder how, or whether, BOHM could be adapted to support transactions that arrive at different times or in pieces, and I think it would have been nice if the paper discussed this a bit. I imagine that perhaps BOHM could be run every X units of time, or after a certain number of full transactions have arrived, or after a certain number of not-yet-executed read/write operations have accumulated; BOHM could then be applied to that set of transactions while new incoming transactions are collected for the next round.



Review 16

This paper proposes BOHM, a new concurrency control protocol for main-memory multi-versioned database systems.

The motivation of the paper lies in the fact that the additional constraints needed to make a multi-version database system serializable are so heavy that such systems are often outperformed by single-version systems.

There are several significant contributions in this paper. The key insight behind BOHM is to eliminate the complexity of determining a valid serializable order of transactions by separating concurrency control and version management from transaction execution. BOHM guarantees that reads never block writes. It proposes a scalable concurrency control protocol that contains no centralized lock manager, with all data structures kept thread-local. The cost of this approach is that entire transactions, with deducible write sets, must be available in advance.

From my perspective, the authors' candor is quite interesting. Few papers discuss their disadvantages in such a straightforward way, and here the disadvantage appears to be a deliberate trade-off in their proposal. I really like this manner of writing.


Review 17

In this paper, the authors propose a novel concurrency control protocol for main-memory multi-versioned database systems called BOHM. Multi-versioned systems are a classical way to handle record updates: although additional space is required to store extra versions, they greatly increase the concurrency of transaction processing. However, it is hard to achieve serializability in multi-versioned systems that allow reads and writes of the same record to occur concurrently. Snapshot isolation comes very close to full serializability, but it is vulnerable to serializability violations and subject to anomalies. There is prior work on making multi-versioned systems serializable, but it either severely restricts concurrency in the presence of read-write conflicts or requires extra coordination and bookkeeping, resulting in poor performance. Making multi-versioned systems serializable is an important issue because they are more scalable than update-in-place systems at handling large amounts of concurrent transaction processing. Nowadays, main-memory multi-core settings are very common, and this allows serializability to be provided with little burden. By making multi-versioned systems serializable, we can guarantee high performance with strong consistency. To solve this problem, the authors propose BOHM. Next, I will reiterate the crux of BOHM's design with my personal understanding.

The key design principle of BOHM is to reduce coordination among database threads. BOHM decouples concurrency control logic from transaction execution logic, which means a transaction is processed by two different sets of threads in two phases: 1. a concurrency control phase; 2. an execution phase. This separation improves concurrency and scalability. In BOHM, transactions entering the system are handled by a single thread, which logs each transaction's information to shared memory along with a single timestamp. These logs are then analyzed by several concurrency control threads, each of which checks whether the write set of a transaction intersects its partition of records. For the records in its partition, the concurrency control thread creates a placeholder for the new version and links it to the previous version. The two main functions of the concurrency control threads are determining the order of transactions and creating a safe environment for transaction execution. In the execution phase, each transaction execution thread reads the same logs, performs the reads associated with transactions, and fills in the pre-allocated spaces for writes. The two main functions of the execution layer are executing transaction logic and garbage collection.
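The per-thread partition check can be sketched as follows. This is my own hedged simplification with invented names (`owns`, `process_batch`, `N_CC_THREADS`): each concurrency control thread scans the whole batch but installs placeholders only for keys that hash into its own partition, so no two threads ever touch the same record's version chain.

```python
N_CC_THREADS = 4

def owns(thread_id: int, key: str) -> bool:
    # Each record key is statically assigned to exactly one CC thread.
    return hash(key) % N_CC_THREADS == thread_id

def process_batch(thread_id, batch, table):
    # Every CC thread scans every transaction in timestamp order, but
    # only acts on the keys in its own partition (purely thread-local).
    for txn in batch:
        for key in txn["write_set"]:
            if owns(thread_id, key):
                table.setdefault(key, []).append(
                    {"begin_ts": txn["ts"], "value": None})

table = {}
batch = [{"ts": 1, "write_set": ["a", "b"]},
         {"ts": 2, "write_set": ["a"]}]
for t in range(N_CC_THREADS):
    process_batch(t, batch, table)
# Exactly one placeholder per write; key "a" gets versions at ts 1 and 2.
assert sum(len(chain) for chain in table.values()) == 3
```

Because a given key is always handled by the same thread, the placeholders for each record are created in timestamp order with no locking at all.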

BOHM is indeed an innovative protocol with many strengths. First of all, it achieves both multi-versioned serializability and multi-core scalability. BOHM eliminates the complexity of computing a serialization order by separating concurrency control and version management from execution, which enables full serializability while ensuring that reads never block writes. Moreover, BOHM needs no additional coordination or bookkeeping, which greatly improves scalability. Another contribution is that, compared to the design of traditional database systems, BOHM decouples concurrency control from transaction processing; from an engineering perspective, this design is cleaner and makes the code more maintainable. In addition, the concurrency control phase uses a form of intra-transaction parallelism in which the concurrency control threads never interact with each other; these purely thread-local decisions reduce cache coherence traffic and increase multi-core scalability.

There are some drawbacks to their method. First, just as the authors say in the paper, the two-phase design means the concurrency control phase needs advance knowledge of each transaction's write set. Although there are techniques to work around this, the entire transaction is still required up front, so transactions submitted to the database in pieces, such as cursor-oriented accesses, are not supported. Second, BOHM's constraints can be avoided only in the main-memory multi-core scenario, which means BOHM may not perform very well without this assumption, since multi-versioned systems have more overhead than traditional update-in-place systems. Last, the authors could have given more concrete examples of how the garbage collection process is performed.



Review 18

The paper introduces BOHM, a concurrency control protocol for multi-versioned database systems that aims to provide serializability without the performance-related disadvantages that plagued previous systems attempting the same. BOHM's main strategy is to divide the normally monolithic concurrency control mechanism into two parts: a “concurrency control” layer and a “transactions” layer. The concurrency control layer determines a serializable order for transactions to be processed in (it does this by assigning a total order, rather than relying on global-counter-dependent timestamps), and once a serializable order is found, the transactions layer processes the transactions. This component-oriented architecture ensures that read requests don't block write requests, and is in general easier to maintain as well.
BOHM’s main contributions are that read requests don't block write requests, that no global counter needs to be maintained, that it prioritizes thread-local decision-making whenever possible, and of course its guarantee of serializability rather than a weaker consistency level such as Snapshot Isolation. Experimental results showed that, as a result, BOHM performed well in terms of throughput compared to other multi-version database systems.
As with many papers and projects, most of BOHM's features involve trade-offs, probably the most prominent of which is that before a transaction can be processed, several things must already be done, namely determining its read and write sets. Transactions also can't be processed in small pieces using cursors; they must be submitted all at once. The paper acknowledges these limitations, but also points out that many applications already use stored procedures, which would satisfy this kind of requirement. Finally, with respect to the paper itself, I found BOHM's high-level strategy fairly understandable, but found the details of each section more complicated than in some of the other papers we have read, and I am not completely confident in my understanding of them. I'm not sure whether this is due to the writing of the paper or simply the natural complexity of BOHM, however, so I can't really call it a weakness.