Review for Paper: 17-Hekaton: SQL Server's Memory-Optimized OLTP Engine

Review 1

Traditional commercial DBMSs are designed on the assumption that the database will reside on disk rather than in memory. Disk-based databases typically provide concurrency control via lock-based mechanisms such as strict two-phase locking, which ensures serializability. Today, memory has become less expensive to the point where most OLTP databases can reside entirely in memory rather than on disk. This makes it possible to redesign DBMSs (or table implementations), to provide orders-of-magnitude speedup under OLTP usage.

In “Hekaton: SQL Server's Memory-Optimized OLTP Engine,” the authors present a memory-based, high-speed table format that has been added to Microsoft SQL Server, called Hekaton. Hekaton tables are stored in main memory and provide concurrency control through an optimistic mechanism that combines snapshots and aborts of potentially conflicting transactions. Hekaton also allows the user to compile stored procedures into machine code (via a C code intermediate). OLTP data systems typically use stored procedures exclusively, instead of ad hoc queries, so making stored procedures faster is highly important. Hekaton's optimistic concurrency control allows Hekaton tables not to use latches or locks, allowing their throughput to scale up better as cores are added to a cluster than with S2PL, which is slowed by contention for locks.

The main contributions of the paper are the Hekaton implementation, which is a new table type optimized for main memory data, for use in SQL Server; a description of the optimizations that make Hekaton tables more efficient than regular tables; and performance tests showing the throughput gains of Hekaton tables. Results show that on a workload that mixes random row lookups or updates with table scans, Hekaton produces a 20x speedup over regular tables. This is largely because Hekaton does not use locks, but instead uses optimistic concurrency control, allowing reads to overlap with low overhead.

Hekaton has some drawbacks relative to conventional database tables. In order to determine if a concurrent transaction can commit safely, Hekaton repeats all reads before commit and checks for phantoms that were not present before; this could present significant overhead in some workloads. One downside of optimistic concurrency control is that it requires transactions that are ready to commit to be aborted in some cases, and the frequency of this issue would be workload-dependent. So Hekaton may not offer much speedup compared to regular tables for write-heavy workloads.


Review 2

The paper introduces a new database engine, Hekaton, which is optimized for memory-resident data and OLTP workloads. It discusses the architecture of the engine and describes its various components. It is integrated into SQL Server rather than being a separate system, which makes it simpler for existing users to adopt. Users only have to declare a table as memory optimized. The paper is motivated by the observation that tuning current database engines can improve throughput by only 3-4 times and not more. Current databases were designed at a time when main memory was too expensive. This is no longer the case, and OLTP databases can now fit entirely in memory.

The authors describe the architecture of Hekaton, which uses no latches or locks, compiles requests to native code, and optimizes indexes for main memory. It consists of a storage engine which manages user data and indexes, a compiler which converts queries into native code, and a runtime system which provides integration with SQL Server resources. The performance increase is largely attributed to the conversion of SQL statements and stored procedures into highly customized native code. The paper also explains Hekaton's transaction management and validation, its logging and recovery procedures, and its garbage collection.

The paper is successful in explaining the structure of Hekaton. It shows significant performance advantage over current database engines with speedups of 30 times in large collections. The paper also addresses various problems such as translation of T-SQL stored procedures into C code. The quantification of the premise with experimental results is also present.

However, the database engine has a long way to go if it is to be adopted completely. It doesn’t support all operators, such as MERGE, EXISTS, UNION, DISTINCT and more. The paper also does not discuss the isolation levels that the engine does not support.


Review 3

What is the problem addressed?
This paper gives an overview of the design of the Hekaton engine, a new database engine optimized for memory-resident data and OLTP workloads, and reports some experimental results.

Why important?
SQL Server and other major database management systems were designed assuming that main memory is expensive and data resides on disk. This assumption is no longer valid. Even though there are multiple main memory database systems, the Hekaton engine has several features that set it apart from the competition. It’s integrated into SQL Server. Hekaton tables can be queried and updated using T-SQL in the same way as regular SQL Server tables. Hekaton is designed for high levels of concurrency but does not rely on partitioning to achieve this. The engine uses latch-free (lock-free) data structures to avoid physical interference among threads and a new optimistic, multiversion concurrency control technique to avoid interference among transactions.

1-2 main technical contributions? Describe.
All Hekaton’s internal data structures are entirely latch-free (lock-free). There are no latches or spinlocks on any performance-critical paths in the system. Hekaton uses a new optimistic multiversion concurrency control to provide transaction isolation semantics; there are no locks and no lock table. The combination of optimistic concurrency control, multiversioning and latch-free data structures results in a system where threads execute without stalling or waiting.
While Hekaton is optimized for main-memory resident data, it must ensure transaction durability that allows it to recover a memory-optimized table after a failure. Hekaton achieves this using transaction logs and checkpoints to durable storage.
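
As a rough illustration of the recovery idea described above, the following Python sketch rebuilds an in-memory table from a checkpoint plus the newer part of a log. The record layout, the `recover` function and the `"upsert"`/`"delete"` operation names are assumptions made for this example; they are not Hekaton's actual file formats or APIs.

```python
from typing import Dict, List, Tuple

# Hypothetical log record: (commit_timestamp, operation, key, value).
LogRecord = Tuple[int, str, str, object]


def recover(checkpoint: Dict[str, object],
            checkpoint_ts: int,
            log: List[LogRecord]) -> Dict[str, object]:
    """Start from the checkpoint, then replay log records newer than it."""
    table = dict(checkpoint)  # copy so the checkpoint itself is untouched
    for commit_ts, op, key, value in sorted(log, key=lambda r: r[0]):
        if commit_ts <= checkpoint_ts:
            continue  # already reflected in the checkpoint
        if op == "upsert":
            table[key] = value
        elif op == "delete":
            table.pop(key, None)
    return table


checkpoint = {"a": 1, "b": 2}
log = [
    (5, "upsert", "a", 0),    # older than the checkpoint, skipped
    (11, "upsert", "a", 10),
    (12, "delete", "b", None),
    (13, "upsert", "c", 3),
]
recovered = recover(checkpoint, checkpoint_ts=10, log=log)
print(recovered)  # {'a': 10, 'c': 3}
```

The point of the sketch is only that nothing needs to be read from disk during normal processing: durable storage is touched when writing the log and checkpoints, and again when rebuilding the table after a failure.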

1-2 weaknesses or open questions? Describe and discuss
Restrictions on the use of indexes and constraints (no FOREIGN KEY) and on the range of data types supported for memory-optimized tables could be limitations.


Review 4

This paper discusses a new database engine called Hekaton, which is designed for memory-resident data and OLTP workloads. It was developed inside Microsoft, and it is not a separate system (it is fully integrated into SQL Server). This lets users adopt Hekaton easily and obtain its substantial performance improvements. Hekaton also has some features that differ from other existing main memory database systems, such as fully durable and transactional tables, the ability to be updated using T-SQL, and high levels of concurrency that do not rely on partitioning. This paper first describes the design considerations and principles, as well as an overview of the architecture. Then it discusses several detailed design issues, including how to handle data (store, index, update), and how stored procedures are compiled into native code. It also covers transaction management and concurrency control, as well as garbage collection. Finally, it provides the performance evaluation results.

The problem here is that the traditional RDBMS is no longer a good fit for every use on today’s hardware, so many new engines and systems have been developed for various environments. There are several main memory database systems, but they also have some disadvantages. Hekaton was created in response. The most important advantage of Hekaton is that it is fully integrated into SQL Server, which makes it easy for users to make use of Hekaton:

1. avoid hassle and cost for a new DBMS
2. not all tables need to be in main memory (only performance-critical ones)
3. stored procedures can be compiled into native machine code for further performance gain
4. user can gradually convert to Hekaton.

The major contribution of the paper is that it provides a detailed description about the new database engine Hekaton, including the design considerations and concerns. Besides the integration with SQL server, it also utilizes a new optimistic multiversion concurrency control (MVCC). It is an innovative idea and it provides better performance. Also, it gives detailed examples to show how multiversion with timestamp works (insert, update, delete).

One interesting observation: at the end of the paper, it describes how garbage collection is done, and I think the design is very impressive. Deleting records based on their timestamps is intuitive, but the paper addresses many more concerns, including making the collector non-blocking, cooperative, incremental, and parallelizable/scalable. I learned that in any design, comprehensive consideration is necessary in order to provide better performance and quality.


Review 5

This paper addresses a problem that many commercial databases have - they have been under development for 20 or more years, and the computer architecture landscape has changed dramatically since then. When these databases were in their infancy, RAM was incredibly expensive, and it was inconceivable that data would fit in memory - it would have to go on disk (or tape!). Since then, hardware has gotten much faster and cheaper - however, disk speeds have not improved at the same rate as the other components. The cost of a disk access is *more* than it was in the past - the actual time is less, but more cycles are wasted waiting for the data. Because of this, the authors needed to do a complete rewrite of the database engine to optimize for in-memory operation.

The system that the authors propose has several differences from a traditional DBMS. For one, they choose not to use B-trees, a textbook example of indexes, as they are designed for block access devices and require a high number of instructions for key lookup (using their "novel" Bw-trees instead). They also eliminate locks, opting to use an optimistic multi-version concurrency system. They discuss how stored procedures are converted from SQL to PIT, and then further compiled to machine code. They then move on to a discussion of transaction management, and how this is achieved in the absence of locks. By looking at read times, commit end times, and checking for dependencies, transactions can determine if they need to roll back. They also check for phantoms by rescanning before commit, rather than locking. They cover recovery, durability, checkpoints, logging, and garbage collection before getting to the experiments.

The results are impressive. The authors claim a 20X improvement, and a large portion of this improvement seems to come from compiling to native code. The improvements possible from in-memory optimization are promising.

I'm skeptical of these results, primarily because they are comparing a B-tree to a hash index - to make this comparison fair, they should be comparing a hash index to a hash index. I think this is one of the primary weaknesses of the paper. Other than that, I would really have liked to see more experimental results in general.

This paper did a great job of showing how many database principles are outdated in the face of new architecture. The authors presented a dramatically different architecture and explained how it works at a high level.



Review 6

The problem put forward in the paper is that a new database engine is needed for large memories and many-core CPUs, given advances in hardware and the low price of memory. The solution proposed by the paper is to create a database engine named Hekaton for data in main memory. The engine is integrated into SQL Server to provide greater flexibility to users. Its tables can be queried and updated in the same way as regular SQL Server tables.

The approach to design Hekaton is to reduce the number of executed instructions since it is the only way to improve the current engine by 10-100x. The engine has three parts: the storage engine, the compiler and the runtime system. The storage engine manages everything about the data, including the real data, metadata, checkpoints, logs and so on and carries out the operations like I/Os, updates, garbage collections. The compiler translates stored procedure and converts the statements and queries into machine code to reduce the run time and improve the performance. And the runtime system provides support to the engine during the execution by managing resources and libraries.

The strength of the paper is that it provides a lot of pseudocode and flow charts to explain the ideas, which is more accessible to the reader than lengthy prose. For example, the flow chart of the Hekaton compiler architecture clearly shows the compiler's components and procedures. It helps the reader understand the text.

The drawback is that the paper does not mention the shortcomings of Hekaton, which may make it difficult for future developers to improve the engine efficiently or to understand the tradeoffs between this engine and others. Since the paper mentions that there are several existing main memory database systems, it would be better to include a comparison among them.


Review 7

This paper describes the Hekaton database engine, a memory-optimized OLTP engine that runs within SQL Server rather than acting as a separate DBMS. This engine takes advantage of the fact that the majority of OLTP databases can now fit within the main memory of the server. The paper describes the basic organization and high-level architecture of Hekaton, and how it performs compared to the regular SQL Server engine.

The high-level components of Hekaton are:
1) Storage engine: manages the user data and indexes itself.
2) Compiler: compiles the abstract tree representation of a T-SQL stored procedure into native code that uses the storage engine.
3) Runtime system: provides integration with SQL Server resources and serves as a common library of functionality.
Hekaton achieves high performance by using very efficient latch-free data structures in its storage engine. Also, Hekaton uses optimistic multiversion concurrency control (MVCC) to manage its transactions without locking through snapshot isolation. The engine ensures that a transaction is serializable by checking that read stability and phantom avoidance hold for it. Overall, Hekaton shows very good performance compared to the regular SQL Server engine, with good performance scaling for larger systems.

The paper does a good job of giving us a comprehensive view of the Hekaton architecture, along with the details involved in its design and various components. The performance analysis is quite thorough, with a good comparison to the existing system along with a scalability analysis. The paper could have addressed in further detail any drawbacks of Hekaton and any aspects of the engine that need to or can be improved.




Review 8

This paper introduces a new database engine in SQL Server, called Hekaton. Hekaton supports tables in main memory and corresponding compiled procedure calls with high concurrency performance. It implements latch-free data structures and a new optimistic multi version concurrency control technique.

One of the main contributions of Hekaton is that it brings main-memory DBMS features to SQL Server. SQL Server is a traditional relational database designed for data on disk, and it does not make much use of main memory even though main memory has become cheaper and much larger. Support for tables stored in main memory helps it adapt to these changes. Meanwhile, the new features are still integrated into the SQL Server engine, since the Hekaton engine communicates with SQL Server and the T-SQL syntax is unchanged. Hence, customers may still choose SQL Server because of the new features and the lower training cost for programmers and analysts.

This paper also suggests that the traditional relational DBMS can survive the technological changes of the modern era. In recent years, researchers have been seeking new DBMS designs that better fit current hardware and markets. One main cause is the growth of main memory: new OLTP databases may fit entirely in main memory, where the performance of a traditional RDBMS pales. Hekaton, introduced in this paper, may be considered an alternative for the new era.

Though Hekaton equips SQL Server with a new weapon, it may still carry heavy baggage: its transactions still rely on logging to ensure durability, and the optimizer is still the traditional SQL query optimizer, which does not consider the cost of tables in main memory. Its performance compared with other in-memory OLTP databases remains unknown. In addition, Hekaton might not scale very well, since it does not solve the network communication problems of a multi-site deployment.


Review 9

Problem:

Like many other database systems with a long lineage, the SQL Server was originally designed with the assumption that the database could not fit in main memory. This assumption does not apply anymore, and so the authors of this paper developed a new database engine called Hekaton optimized for in-memory operations. Hekaton allows tables to be stored in memory, and also compiles stored procedures to provide more speedup. It also uses an optimistic concurrency technique which does not use latches.

Tables that are created with a new option specified will be stored in memory. The table has indexes like normal, but the indexes are lock-free, which allows high concurrency. Concurrency management is done using multiversioning: the table may contain many different versions of each record, and each version has a “Begin” and “End” timestamp. Each transaction also has a timestamp, which allows it to know which version of the data it should be seeing in the global order. Even though the concurrency scheme doesn’t use locks, a transaction T1 may still need to block when it reads the changes made by another transaction T2. In this case T1 may not commit until T2 does, and if T2 aborts this creates a cascaded abort as T1 must also abort. Transactions also may not return results until they commit.
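
To make the timestamp idea above concrete, here is a minimal Python sketch of how a reader could pick the version it is supposed to see. The `Version` class, the `visible_version` function and the use of infinity as an open-ended "End" timestamp are assumptions for illustration, not the engine's actual data layout.

```python
from dataclasses import dataclass
from typing import List, Optional

INFINITY = float("inf")  # stands in for "no End timestamp yet"


@dataclass
class Version:
    begin: float   # commit time of the transaction that created this version
    end: float     # commit time of the transaction that replaced or deleted it
    payload: str


def visible_version(versions: List[Version], read_time: float) -> Optional[Version]:
    """Return the version whose valid interval [begin, end) contains read_time."""
    for v in versions:
        if v.begin <= read_time < v.end:
            return v
    return None


# One record updated at time 20: the old version ends where the new one begins.
history = [Version(10, 20, "balance=100"), Version(20, INFINITY, "balance=150")]
print(visible_version(history, 15).payload)  # balance=100
print(visible_version(history, 25).payload)  # balance=150
print(visible_version(history, 5))           # None
```

Because the valid intervals of a record's versions do not overlap, at most one version is visible to any given read time, which is what lets readers proceed without taking locks.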

Stored procedures are compiled by assembling them into a tree as in any query plan. Each operator is represented by a chunk of code with an interface, used to connect to other operators. Instead of making each operator a function or a class in the compiled code, the operators break every programming rule by connecting to each other using goto and labels. This reduces function call overhead in the compiled code.

Hekaton’s performance is much better than that of regular SQL Server, a speed up of 20X.

Strengths:

A big strength is that Hekaton is fully integrated into SQL Server, so that users will not have to be convinced to ditch their current solution. However, even if this paper were urging users to ditch their old technology, the performance results would probably still be convincing.

Weaknesses:

I would have liked to see some discussion of why they chose optimistic concurrency control instead of locks. Both types of schemes require blocking at some point: is OCC better because tables are in memory?



Review 10

The paper can be regarded as a response from the industry to the previous paper, “The End of an Architectural Era (It’s Time for a Complete Rewrite)”. The paper introduces Hekaton, which is a new database engine optimized for memory-resident data and OLTP workload. The engine is integrated into SQL Server.

The paper admits that current database designs are based on assumptions that date back more than 30 years. Main memory was expensive, and data had to reside on disk since it could not fit into memory. To address this, people at Microsoft designed a new database engine, Hekaton. Hekaton allows users of SQL Server to selectively declare a table to be memory-optimized without needing to make a complete transition to another in-memory database. This hybrid approach is both good and bad. The main upside is that existing customers of SQL Server can gradually migrate their data from a disk-based system to a memory-based system. The downside is that an implementation constrained by compatibility with an existing system is bound to prevent a new system from taking full advantage of new technologies.

The authors mainly describe the internal design of Hekaton throughout the paper. The main techniques involve 1) optimization of indexes for main memory data; 2) eliminating latches and locks; 3) pre-compilation of T-SQL statements and stored procedures. However, their implementation does not take the full advantage of the proposed new architecture. For example, the Hekaton compiler works on top of the existing SQL engine. Here, the performance has been sacrificed for the compatibility. Hekaton also fails to provide a full support of T-SQL stored procedures.

In summary, the paper demonstrates a common approach among large software vendors dealing with new technologies. Those vendors want to bring new technologies into their systems, but the product still has to be compatible with older versions of the software. In most cases, a gradual change is necessary to retain current customers, and this results in a hybrid system. The hybrid system can be perceived as a good thing, since it supports two different kinds of applications. However, more often than not, such a system may lose its strengths and become incompetent in both areas.



Review 11

Main memory has become cheaper and bigger. As a result, many papers are looking into designing main memory optimized databases for OLTP workloads. This paper describes Hekaton. Hekaton's design is guided by three ideas: optimize indexes for main memory, eliminate pessimistic concurrency control, and compile requests into native code. All three of these ideas work towards the main goal of decreasing the number of instructions executed per request in order to improve performance.

The first idea is obvious in that if entire databases can be stored in main memory, then an indexing structure that refers to pages on disk is obsolete. Hekaton supports two indexing schemes, hash indexes and range indexes. The second idea influenced Hekaton to use multiversion optimistic concurrency control. In Hekaton, whenever a transaction modifies a tuple in the database, a new version of the tuple is made. Every tuple contains a begin and end timestamp to indicate when the tuple was valid. If a transaction is currently modifying the tuple, it is made known to the other transactions. In order to prevent blocking, Hekaton allows transactions to read uncommitted data, but only commit once the data read is committed. This allows for cascading aborts, but allows for better performance if aborts do not occur. The third idea is implemented in the Hekaton compiler, which is responsible for converting stored procedures and statements into C code, which is later translated to machine code. Since SQL contains datatypes not used in C such as timestamps, allows for null values which are not allowed in C, and does not fail silently, the Hekaton compiler first translates the SQL statements into an intermediate representation called PIT before generating C code.
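
The "read uncommitted data, but commit only once that data is committed" rule above can be sketched as a small dependency graph. This is a toy Python model, not the engine's mechanism: the `Txn` class and its method names are hypothetical, and a real system would make the dependent transaction wait rather than have `commit` return `False`.

```python
from typing import List, Set


class Txn:
    """Toy transaction that tracks commit dependencies (hypothetical model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = "active"  # becomes "committed" or "aborted"
        self.depends_on: Set["Txn"] = set()  # writers whose uncommitted data we read
        self.dependents: List["Txn"] = []    # readers of our uncommitted data

    def read_uncommitted_from(self, writer: "Txn") -> None:
        self.depends_on.add(writer)
        writer.dependents.append(self)

    def commit(self) -> bool:
        # Commit only if still active and every writer we depend on has committed.
        if self.state != "active":
            return False
        if any(w.state != "committed" for w in self.depends_on):
            return False
        self.state = "committed"
        return True

    def abort(self) -> None:
        if self.state != "active":
            return
        self.state = "aborted"
        for reader in self.dependents:
            reader.abort()  # cascade to everyone who read our data


t2, t1, t0 = Txn("T2"), Txn("T1"), Txn("T0")
t1.read_uncommitted_from(t2)   # T1 read T2's uncommitted write
t0.read_uncommitted_from(t1)   # T0 read T1's uncommitted write
blocked = t1.commit()          # False: T2 has not committed yet
t2.abort()                     # cascades to T1, then to T0
print(blocked, t1.state, t0.state)  # False aborted aborted
```

In the common case where the writer commits, the reader's dependency simply clears and it commits without ever having waited on a lock, which is the performance bet this scheme makes.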

One possible problem with Hekaton was how to get rid of old versions of data. Hekaton implements a garbage collector, which keeps track of the oldest transactions and removes any data that is older than it based on the timestamps. The garbage collector is designed to run concurrently with transactions so as not to block and can be parallelized or throttled to improve performance.
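
A minimal Python sketch of the timestamp rule described above: a version is garbage once its end timestamp is no later than the begin time of the oldest active transaction, because no current or future reader can see it. The tuple layout and the `collect_garbage` function are assumptions for illustration; the sketch ignores the cooperative and parallel aspects of the real collector.

```python
from typing import List, Sequence, Tuple

INFINITY = float("inf")

# Hypothetical version record: (begin_ts, end_ts, payload).
Version = Tuple[float, float, str]


def collect_garbage(versions: Sequence[Version],
                    active_begin_times: Sequence[float],
                    current_time: float) -> Tuple[List[Version], List[Version]]:
    """Split versions into (keep, garbage) using the oldest active transaction."""
    # With no active transactions, only the current time bounds what can be read.
    oldest_reader = min(active_begin_times, default=current_time)
    keep = [v for v in versions if v[1] > oldest_reader]
    garbage = [v for v in versions if v[1] <= oldest_reader]
    return keep, garbage


versions = [(10, 20, "v1"), (20, 30, "v2"), (30, INFINITY, "v3")]

keep, garbage = collect_garbage(versions, active_begin_times=[25, 40], current_time=50)
print([v[2] for v in garbage])  # ['v1']

# A transaction that began at time 15 can still see v1, so nothing is reclaimed.
pinned_keep, pinned_garbage = collect_garbage(versions, [15, 40], 50)
print(len(pinned_garbage))  # 0
```

The second call shows the flip side of the rule: a single old reader is enough to keep every version it might still see alive.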

I feel Hekaton has a glaring weakness, though I may have missed it in the reading. There was no consideration of long running transactions. As well as supporting ad-hoc queries, database design should also consider long running transactions such as those involving slow user input or system maintenance. The reason I feel this is a huge problem is the MVCC and garbage collection logic used. Whenever a new version of a tuple is made, the older versions still exist and are not thrown out until no other transaction can see them. However, if a long running transaction begins, it preserves old data until it finishes. This wastes memory and, if the transaction runs long enough, may present a problem to Hekaton since nothing is getting garbage collected. Long running transactions can also present a problem if at least one transaction is read dependent on them. Any transaction that is read dependent on the long running transaction, or on a dependent of the long running transaction, is forced to wait until it commits. It can be even worse if for some reason the transaction aborts, since a huge number of cascading aborts could occur, wasting a lot of resources.


Review 12

In contrast to the assumption made by most DBMSs that the database is stored on disk, recent hardware improvements allow the database to be stored entirely in main memory. This paper discusses a database engine called Hekaton, integrated into SQL Server, which is optimized for memory-resident data and OLTP workloads. Since Hekaton is not a separate DBMS, some tables can remain unchanged while others become memory optimized.

The authors state that in order to achieve significant efficiency improvement, minor optimizations do not suffice; the engine must reduce the number of instructions. This is done by special designs such as lock-free data structures, compiled stored procedures, and so on. Besides the new architectures, since Hekaton is integrated in SQL Server, it also leverages existing services provided by SQL Server.

As for concurrency control, Hekaton uses Optimistic Multiversion Concurrency Control (OMCC), so there is no locking. Since lock or latch contention is often the scalability bottleneck in conventional DBMSs, Hekaton is expected to scale much better as the number of processor cores grows.

The main contributions of this work are:

1. Specialized memory-resident design significantly reduces the number of instructions per transaction.
2. Lock/latch-free data structures allow much better scalability by eliminating the lock/latch contention issue.
3. Proves these two main improvements in experiment results.

An interesting result in their experiment is that the update speedup is more significant than the lookup speedup. It would provide more insights if the cause can be explained in detail.



Review 13

This paper introduces Hekaton, a new database engine optimized for memory resident data and OLTP workloads. Hekaton has high concurrency with latch-free data structures and new optimistic, multiversion concurrency control. The motivation for this is that, as discussed in the "OLTP through the Looking Glass, and What We Found There" and "The End of an Architectural Era" papers, OLTP engines were outdated and needed to be optimized for the new OLTP databases that can fit in-memory.
Hekaton indexes are optimized for memory-resident data, and Hekaton ensures durability by logging and checkpointing records to external storage and rebuilding tables and their indexes from the latest checkpoints and logs. Latches and locks are eliminated in Hekaton, which instead uses an optimistic multiversion concurrency control to provide transaction isolation semantics. This results in a system with threads that execute without stalls or waits. Run time performance is maximized by converting statements and procedures in T-SQL to very efficient machine code. Hekaton does not partition because that approach is not robust for the variety of workloads handled by SQL Server. The Hekaton storage engine manages user data and indexes, and provides transactional operations on tables of records, hash and range indexes on the tables, and base mechanisms for storage, checkpointing, recovery, and high availability. The Hekaton compiler takes an abstract tree representation, queries, and table and index metadata, and compiles them to machine code that executes over tables and indexes managed by the Hekaton storage engine. The Hekaton runtime system integrates with SQL Server resources and is also a library of additional functionality for compiled stored procedures.

Hekaton ensures serializability with read stability, which is implemented by validating that a version is not updated before a transaction commits, and phantom avoidance, which is implemented by rescanning before commit to look for new versions. Hekaton's garbage collection is non-blocking (never stalls because it runs concurrently with the regular transaction), cooperative (garbage removed proactively when it gets in the way of a scan), incremental (easily throttled and started and stopped to avoid consuming too many CPU resources), and parallelizable and scalable (multiple threads work in parallel and in isolation on different stages of garbage collection). Overall, Hekaton improves performance by 20x and is scalable because of efficient latch-free structures, multiversioning, optimistic concurrency control scheme, and compiling T-SQL procedures into machine code.
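
The two validation checks described above can be sketched in a few lines of Python. This is an illustrative model only: `Version`, `is_visible` and `validate` are hypothetical names, scans are represented as plain predicates, and the transaction's own writes are ignored.

```python
from dataclasses import dataclass
from typing import Callable, List

INFINITY = float("inf")


@dataclass
class Version:
    key: int
    begin: float  # commit time of the creating transaction
    end: float    # commit time of the replacing/deleting transaction


def is_visible(v: Version, t: float) -> bool:
    return v.begin <= t < v.end


def validate(read_set: List[Version],
             scans: List[Callable[[Version], bool]],
             table: List[Version],
             begin_time: float,
             commit_time: float) -> bool:
    """Return True if the transaction may commit as serializable."""
    # Read stability: every version read must still be visible at commit time.
    for v in read_set:
        if not is_visible(v, commit_time):
            return False
    # Phantom avoidance: repeat each scan and look for versions that are
    # visible at commit time but were not visible when the transaction began.
    for predicate in scans:
        for v in table:
            if predicate(v) and is_visible(v, commit_time) and not is_visible(v, begin_time):
                return False
    return True


def small_keys(v: Version) -> bool:
    return v.key < 5


r1, r2 = Version(1, 10, INFINITY), Version(2, 10, INFINITY)
table = [r1, r2]

clean = validate([r1], [small_keys], table, begin_time=20, commit_time=30)

# Another transaction inserted key 3 at time 25: the rescan finds a phantom.
phantom = validate([r1], [small_keys], table + [Version(3, 25, INFINITY)], 20, 30)

# Another transaction replaced key 1 at time 25: the version we read has ended.
r1_old = Version(1, 10, 25)
stale = validate([r1_old], [], [r1_old, Version(1, 25, INFINITY)], 20, 30)

print(clean, phantom, stale)  # True False False
```

Note that both checks cost extra work at commit time in proportion to what the transaction read and scanned, which is the overhead the optimistic scheme trades for never blocking readers.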

This paper was very well written, and I enjoyed the clear comparisons of the magnitude of the speedup from Hekaton. Limitations of this paper are that it does not discuss how widely Hekaton is used today, or what new abilities and functionality Hekaton has brought to database applications.





Review 14

This paper presents Hekaton. This is a new database engine that is integrated into SQL Server and is optimized for memory-resident data and OLTP workloads. This paper discusses the improvements that can be made to OLTP workloads on DBMS systems. It starts by analyzing the potential gains from improving scalability, improving CPI, or reducing the number of instructions executed per request. The last of the three was found to offer a much more significant improvement than the other two. In an attempt to reduce the number of instructions dramatically, the authors propose an architecture that optimizes indices for main memory, eliminates latches and locks, and compiles requests to native code. The paper then goes on to discuss its implementation and evaluation.
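
The instruction-count argument above comes down to simple arithmetic, sketched here in Python. The function name is made up for this example, and the model assumes that CPI and core count stay fixed, so that throughput is inversely proportional to the number of instructions executed per request.

```python
def instructions_to_remove(speedup: float) -> float:
    """Fraction of instructions to eliminate for a given speedup,
    assuming throughput is proportional to 1 / (instructions per request)."""
    if speedup < 1:
        raise ValueError("speedup must be at least 1")
    return 1.0 - 1.0 / speedup


for target in (10, 100):
    fraction = instructions_to_remove(target)
    print(f"{target}x faster -> remove {fraction:.0%} of instructions")
# 10x faster -> remove 90% of instructions
# 100x faster -> remove 99% of instructions
```

Under this simplified model, incremental tuning that trims a few percent of the instruction path cannot get anywhere near a 10x gain, which is why the design goes after the whole execution path at once.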

This paper has a few strengths. It is another paper which recognizes the potential for optimization over OLTP workloads. We saw this previously in the paper that uses Shore to isolate sources of overhead in similar transactions. The overlap with this paper in terms of components that could be optimized does not cover all components in either one of the papers. The Hekaton paper mentions compiling to native code and optimizing for main memory, which appear to be improvements over this previous work from 5 years earlier. The paper is strong in its discussion of implementation as well.

The paper has some drawbacks in the evaluation section. The paper discusses experiments first in terms of CPU efficiency for lookups and updates. These appear to be entirely artificial and randomly constructed. This in no way represents how a typical table might be used in the real world. The table structure has a small schema as well, that is not varied in their experiments. I’m also somewhat confused when they mention scaling under contention. They say that they have a workload that represents an online market. I’m not sure if they are trying to say they are simulating this data or using a real data set. It would be useful to know if these are real queries and schema. An insight into where this data came from and a redesign of their CPU efficiency experiment would provide a more meaningful evaluation.



Review 15

Part 1: Overview

This paper proposes a new database engine, Hekaton, which is optimized for OLTP workloads and memory-resident data. Hekaton tables are accessed through T-SQL, and stored procedures are compiled for further performance improvements. Earlier versions of SQL Server were designed for disk storage and multi-threaded execution with latches; however, these components can be rebuilt in a latch-free way for better performance. Because OLTP workloads are lightweight and small in size, and memory is now cheap and hardware more reliable, we can build an in-memory database and avoid much of the overhead of locks, logs, and buffer management.

This paper provides an analysis indicating that the existing SQL Server cannot achieve 10-100X throughput improvements through incremental optimization, and thus concludes that the only way to achieve such improvements is to change the engine design. To implement Hekaton, the in-memory database engine, the authors optimized the indexes for main memory and call their design the Hekaton index. By compiling requests to machine code, Hekaton can be highly efficient. Hekaton does no partitioning and therefore saves the cost of constructing, sending, and receiving requests between partitions. Implementing a database engine is extremely hard, since it requires a query processor, including a query optimizer, and must deal with all possible corner cases that come from arbitrary user input. Transactions are logged to a single log. Checkpoints are used for durability.

Part 2: Contributions

Implemented a new database engine, Hekaton, that is integrated with SQL Server instead of being built separately. Hekaton tables can be accessed through T-SQL and coexist with regular tables. New data structures are used to prevent interference among transactions without using locks. Argued analytically that, if we want to improve performance substantially, the existing engine structure is no longer the right choice.

No partitioning. Hekaton can therefore work with workloads that are not partitionable.

T-SQL commands are actually compiled into machine code, which is extremely challenging. The Hekaton design is very flexible and can support all query operators. Even with some restrictions, Hekaton is a brilliant product.

Part 3: Possible Drawbacks

Logs are still used for durability, which compromises the purity of in-memory processing, and thus Hekaton may not be suitable for machines without external storage. There are still some cases in which Hekaton cannot generate correct machine code for the user's input.



Review 16

The paper proposes a memory-resident OLTP engine called Hekaton, which is integrated into the existing disk-based SQL Server. A user can create both memory-based tables, managed by Hekaton, and disk-based tables, which are managed by the conventional SQL Server engine. One important feature of the system is that it can handle queries that involve both kinds of tables. To achieve a large performance improvement, the authors propose various optimizations for memory-resident tables while addressing concerns such as concurrency control and durability for those tables. One optimization is to build indexes on lock-free data structures that are optimized for main memory. To make its data structures lock free, Hekaton uses optimistic multiversion concurrency control (MVCC), as opposed to locks, to provide serializable isolation. This allows it to scale with the number of CPU cores without a bottleneck due to locking.

Another optimization is that, instead of using interpreter-based execution as in most conventional DBMSs, Hekaton compiles SQL statements into native machine code. This leaves room for many kinds of optimization, specific to the machine being used, at compile time. In addition, I like how the authors maintain transaction durability by making it possible to recover a memory-optimized table after a failure. To achieve this, they store transaction logs and checkpoints on external disk. They keep transaction logging efficient by generating log records at transaction commit time and trying to group multiple log records into one large I/O.
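
To make the group-commit idea concrete, here is a minimal Python sketch. It is a toy model under my own assumptions, not the paper's implementation: the class name `GroupCommitLog`, the `batch_size` knob, and the in-memory list standing in for disk writes are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GroupCommitLog:
    """Toy log: one record per committed transaction, several records per I/O."""
    batch_size: int = 3                              # hypothetical tuning knob
    pending: list = field(default_factory=list)      # records not yet written
    io_calls: list = field(default_factory=list)     # each entry models one I/O

    def commit(self, txn_id: int, changes: list) -> None:
        # The log record is generated only at commit time, not per update.
        self.pending.append((txn_id, tuple(changes)))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Write all pending records with a single (simulated) large I/O.
        if self.pending:
            self.io_calls.append(list(self.pending))
            self.pending.clear()


log = GroupCommitLog(batch_size=3)
for txn in range(1, 8):
    log.commit(txn, [f"row{txn}"])
log.flush()  # write out the final partial batch
print(len(log.io_calls), "I/Os for 7 transactions")
```

With a batch size of 3, seven commits are written in three I/Os instead of seven, which is the saving the grouping is after.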

The main strength of the paper is its approach of augmenting the existing SQL Server with memory-resident tables. This allows customers to move gradually to a complete in-memory database as the cost of memory decreases further. This approach is more pragmatic and cost efficient than a complete rewrite as an in-memory OLTP database system. Furthermore, the possibility of using the services already available in SQL Server is a plus. For example, Hekaton leverages the “AlwaysOn” feature, which provides it with high availability. This enriches Hekaton at no additional development cost.

Although I like their approach of gradual adoption of memory-resident tables, I don’t think that it will persist in the future. The emergence of flash-based memory is now making disk-based DBMSs obsolete. Even Hekaton has to rely on disk to maintain logs for durability and concurrency control. I think that we have to move to a completely memory-based database design and discuss how features like concurrency control and durability can be achieved in this new medium, rather than trying to solve the problem partially. The approach taken by Hekaton may not bring the advertised performance gain when handling join queries that involve both memory-resident and disk-based tables. In this case, overhead due to locking in the conventional engine undermines the promised performance benefits of Hekaton.



Review 17

Purpose of the paper
This paper provides an overview of Hekaton, a new database engine for SQL Server. It is integrated into SQL Server and is not a separate database. The authors' analysis shows that it is impossible to obtain a 10-100x throughput improvement by optimizing existing SQL Server mechanisms. There are two options available:
1. either reduce CPI
2. or reduce the number of instructions executed dramatically (by 90% or more), which requires a more efficient way to store and process data.
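
The arithmetic behind the second option can be checked with a few lines of Python. This is a sketch under a simplifying assumption of mine, namely that throughput is inversely proportional to the number of instructions executed per transaction when CPI and scalability are held constant; the function name is hypothetical.

```python
def required_instruction_reduction(speedup: float) -> float:
    """Fraction of instructions that must be eliminated to reach `speedup`,
    assuming throughput is inversely proportional to instruction count."""
    if speedup < 1:
        raise ValueError("speedup must be >= 1")
    return 1.0 - 1.0 / speedup


for target in (10, 100):
    fraction = required_instruction_reduction(target)
    print(f"{target}x needs {fraction:.0%} fewer instructions")
```

Under that assumption a 10x gain needs 90% fewer instructions and a 100x gain needs 99% fewer, which is why incremental tuning of the existing engine is not enough.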

Design overview
Hekaton is optimized for memory-resident data and OLTP workloads. Although it is a memory-based database engine, only the performance-critical tables need to be placed in main memory. Transactions that access only Hekaton tables can be compiled into native machine code, which is faster. It supports gradual conversion from regular database tables to Hekaton tables. Latch-free data structures and a new optimistic multiversion concurrency control technique are used to prevent physical interference among threads and interference among transactions.

The 3 major components in Hekaton are:
1. storage engine
2. compiler
3. runtime system
This paper provides a detailed introduction to each part. An evaluation based on CPU cycles is provided, which I think is a fair test because measuring run time would require us to consider the difference between memory and disk.

Weakness and limitations
A. The new design does impose some limitations. For example, multiversioning requires garbage collection.
B. To minimize the number of run-time checks, compiled stored procedures are restricted:
1. compiled stored procedures support only a limited set of options
2. their code can only run in a predefined security context
3. stored procedures are schema bound
4. compiled stored procedures must execute in the context of a single transaction



Review 18

This paper describes a SQL Server OLTP engine called Hekaton.
Hekaton takes advantage of the dramatically increased main memory sizes of current database servers, and puts some tables into memory to improve query execution performance. But it is only an engine within SQL Server rather than a full DBMS.
It has several design considerations:
1) Optimize indexes for main memory
Because tables and indexes are now stored in main memory instead of disk, some optimization must be done to utilize this I/O performance improvement.
2) Eliminate latches and locks
This is needed because of the nature of OLTP.
3) Compile requests to native code
OLTP workloads are likely to run the same queries over and over again, so compilation will save time.
4) No partitioning
Some data are not partitionable.
Then the high-level architecture of Hekaton is shown. It contains three major components: storage engine, compiler and runtime system. Then each of these three major components is discussed.
Finally, experimental results are given.

Contribution:
I think the main contribution of this paper is the idea that we can keep the good parts of the existing OLTP system and move some tables into main memory to improve performance. This does not require a complete rewrite of the system, and many existing optimizations can still be used.
Another contribution in my point of view is that it uses query compilation instead of interpretation.



Review 19

The purpose of this paper is to discuss Hekaton, the memory-optimized database engine introduced in SQL Server. The engine was built to take advantage of the fact that memory has become a lot cheaper over the years, so that whole databases can fit into main memory. Besides that, the system relies on a few major concepts to provide high concurrency and performance.

Hekaton can be summarized using these points:
1. Memory-optimized tables are managed by Hekaton and stored entirely in main memory.
2. There are special SQL stored procedures for Hekaton tables that are compiled into native machine code, thereby increasing the speed at which queries can be executed.
3. The system uses multiversion concurrency control, which relies on versioning to determine the validity of a value in the database. It therefore avoids the overhead of locks and latches while preserving atomicity, resulting in high concurrency.
4. The range indexes for Hekaton tables are built using Bw-trees, i.e., a lock-free version of B-trees. A combination of lock-free hash tables and range indexes is used to access the rows of Hekaton tables.
5. Hekaton tables are made durable using the normal SQL Server transaction log, which is stored on disk rather than in memory so that recovery is possible in case of failure.


One of the advantages that I perceive is that Hekaton is completely integrated with SQL Server. Not only is the syntax similar to T-SQL, which reduces the learning curve for a customer trying to use Hekaton, but Hekaton tables can also be used in conjunction with normal SQL Server tables. Using multiversion concurrency control removes the need for locks and latches, which are the biggest blocking factors in a database engine. The authors have also defended their decision not to use partitioning, unlike other memory-resident databases, in order to support a variety of OLTP workloads. Hekaton tables can have high availability when used with AlwaysOn, which also helps preserve the durability property of the database.

Even though the authors have shown a performance improvement from stored procedures that are compiled to machine code, these procedures can only work with Hekaton tables. This may result in less impressive performance when executing transactions over a combination of Hekaton and regular tables. Hekaton only supports SELECT, INSERT, UPDATE and DELETE, which means that tables cannot be altered without being dropped and recreated. Since it supports only inner joins and a couple of other aggregate operators, queries over Hekaton tables are quite restricted. The experiment was run on a table with only three columns, which felt like too few for a typical OLTP table, even though the performance improvement is definitely spectacular.



Review 20

This paper introduces a database engine that is optimized for in-memory data. This engine has higher performance than a traditional DBMS and is fully integrated into SQL Server.
As the price of main memory goes down, it becomes possible to keep the working set in memory. The Hekaton engine is optimized for in-memory databases and multi-core CPUs. Only the most performance-critical tables need to be put in memory and managed by Hekaton. Hekaton uses multiversion concurrency control instead of locks and latches to provide better concurrency.
In the paper the authors introduce the three components of the Hekaton engine: storage engine, compiler and runtime system. They also describe how multiversion concurrency control handles reads and updates, and how the indexes of Hekaton-managed tables are used and maintained.
One reason Hekaton can provide better run-time performance is that it converts SQL statements and stored procedures into highly customized native code. The paper describes how the engine compiles schemas and stored procedures. Later, the paper describes how the engine uses the log and checkpoints to provide durability for in-memory tables and how recovery works after a failure. Then the paper describes how garbage collection reclaims data that is no longer visible to any active transaction.
In the end, the paper uses experimental results to show that Hekaton has much better performance for lookups and updates, and much better scalability, than a traditional database.

Strength:
Given that in-memory databases are now feasible, the authors introduce their database engine and fully explain the details of every part of the implementation. This engine differs from other in-memory databases and, most importantly, it is integrated into SQL Server.

Weakness:
As we know, in a normal database engine the buffer manager plays an important role in the DBMS and its performance. The authors should also explain how the buffer manager in SQL Server changes when Hekaton is added, since the system must now manage both in-memory tables and traditional tables.


Review 21

The authors of the paper added the ability to keep tables in a relational, SQL database in-memory for performance. This has the advantage that from a user's perspective, the only difference between in-memory tables and regular tables is a configuration option. The system (Hekaton) has several interesting components: optimized indexes for main memory, lock-free data structures, and JIT compilation. Persistence across failures is handled by writing logs and checkpointing to durable storage. In an evaluation, Hekaton is shown to scale well and have high throughput (with a very wide margin between it and non-memory resident data).

I was surprised to see that Hekaton produced C code which was fed into a standard C compiler. I was expecting to see some kind of intermediate representation like LLVM IR or CLR IL (the .NET intermediate representation). These representations have mature JIT compilers whereas C compilers are usually built for producing fast code (putting far less emphasis on quickly producing code). On the other hand, I was glad to see that the authors compared their design to previous work. For instance, the authors noted a number of database designs in previous research that use partitioning in their discussion about why they chose not to use it.


Review 22

This paper proposes a new DBMS called Hekaton that is integrated in SQL Server and is in memory. Over the past few decades, main memory has gotten cheaper and more abundant, but DBMS are still optimized to use disk. Hekaton provides the ability for database administrators to still use SQL Server, but have specific tables residing in memory for optimized performance. The main differences in Hekaton are that it uses lock-free data structures and a new type of concurrency control.

The main concerns for the designers of Hekaton were optimizing how indexes work in memory, eliminating the need for locks, and increasing performance by compiling stored procedures into machine code. The first two concerns are addressed by the innovative multiversion concurrency control algorithm. Hekaton uses lock-free hash tables and lock-free B-trees for indexing, and for each update, a new row version is created instead of updating the existing row in place. The major performance increase, though, comes from compilation. T-SQL queries are still run through the query optimizer, but they are then combined with table and index metadata to produce C code and finally machine code.

Even though Hekaton makes practical innovations by integrating with SQL Server, I still have the following questions and concerns:

1. Many of the new optimizations in Hekaton were not individually tested to see what the benefits of each were. For example, does the new multi-version concurrency control mechanism really outperform other types of optimistic concurrency control?
2. When a transaction has a commit dependency, it has to wait until the other transaction has committed. This could create chains of commit dependencies and one abort might create an abort chain. Is there any way to avoid this chain?
3. Similar to the previous issue, all transactions in the commit chain will have to abort and restart, but there is no guarantee that it will succeed the next time around. How are transactions guaranteed to not starve?
4. The paper explains that lock free hashes and lock free B-trees are used, but the implementations of these structures are never explained.
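
The abort chain raised in questions 2 and 3 can be illustrated with a small Python sketch. This is a toy model of cascading aborts under my own assumptions, not the paper's actual mechanism; the class `DependencyTracker` and its methods are hypothetical names.

```python
from collections import defaultdict


class DependencyTracker:
    """Toy model: a transaction that speculatively read another transaction's
    uncommitted data must abort if that transaction aborts."""

    def __init__(self):
        self.dependents = defaultdict(set)  # txn -> txns that depend on it
        self.state = {}                     # txn -> "active" or "aborted"

    def begin(self, txn):
        self.state[txn] = "active"

    def add_dependency(self, txn, depends_on):
        # `txn` may commit only if `depends_on` commits.
        self.dependents[depends_on].add(txn)

    def abort(self, txn):
        # Abort `txn` and, transitively, everything that depends on it.
        aborted, stack = [], [txn]
        while stack:
            current = stack.pop()
            if self.state.get(current) == "aborted":
                continue
            self.state[current] = "aborted"
            aborted.append(current)
            stack.extend(self.dependents[current])
        return aborted


tracker = DependencyTracker()
for name in ("T1", "T2", "T3", "T4"):
    tracker.begin(name)
tracker.add_dependency("T2", depends_on="T1")
tracker.add_dependency("T3", depends_on="T2")
print(sorted(tracker.abort("T1")))  # T4 has no dependency and stays active
```

In this model one abort at the head of the chain takes down every transaction behind it, which is exactly the behavior I would like the paper to bound or rule out.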



Review 23

Hekaton is a DBMS optimized for databases that fit entirely in main memory. It uses only latch-free data structures and uses a multiversion concurrency control technique in order to provide high concurrency for transactions. Hekaton, like the DBMS we read about in previous papers, revisits some assumptions that were made when relational databases were designed in the 70s and 80s and reevaluates them. By eliminating latches and locks, compiling requests to native code, and optimizing indexes for main memory, it attains throughput 10-100 times that of SQL Server. Hekaton is built on top of SQL Server and fully integrated with it. This allows it to take advantage of many of the optimizations that have been built in SQL Server over the years while also implementing new strategies and optimizations that are more suited to today's technology.

Hekaton uses continuous checkpointing in order to avoid the problems associated with hyper-active checkpoint schemes. This relies on streaming I/O. Hekaton maintains delta files to store information about which versions contained in a data file have subsequently been deleted, and a checkpoint file inventory tracks references to all the data and delta files that make up a complete checkpoint.
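
The relationship between data files, delta files, and the inventory can be sketched in Python as follows. This is only an illustration under assumed representations: data files are modeled as lists of `(version_id, row)` tuples, delta files as lists of deleted version ids, and the function name is hypothetical.

```python
def recover_live_versions(inventory):
    """Rebuild the live row versions from a checkpoint.

    `inventory` is a list of (data_file, delta_file) pairs:
      data_file  - list of (version_id, row) tuples that were inserted
      delta_file - ids of versions in that data file that were later deleted
    """
    live = {}
    for data_file, delta_file in inventory:
        deleted = set(delta_file)
        for version_id, row in data_file:
            if version_id not in deleted:   # skip versions the delta file filters out
                live[version_id] = row
    return live


inventory = [
    ([(1, "alice"), (2, "bob")], [2]),      # version 2 was later deleted
    ([(3, "bob-v2"), (4, "carol")], []),    # nothing deleted yet
]
print(recover_live_versions(inventory))
```

The point of the design is that data files are only ever appended to, and deletions are recorded separately, so both kinds of file can be written with sequential streaming I/O.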

Hekaton also implements garbage collection. Garbage collection is non-blocking and cooperative. This saves processing time, as garbage is removed proactively whenever it is blocking a scan. However, collection can also be throttled in order to avoid consuming excessive CPU resources.


Review 24

This paper presents the Hekaton database engine, which is optimized for large main memories and many-core processors. The motivation for main-memory databases is that the assumption of expensive main memory no longer holds. The trend for SQL Server is to build a database engine optimized for large main memories and many-core CPUs. In addition, Hekaton has other desirable features. For example, it is integrated into SQL Server, which gives it flexibility. The paper therefore introduces the Hekaton database engine and compares its performance with the regular engine.

First, the paper describes the key advantages of Hekaton. It is integrated into SQL Server, which provides flexibility: the customer can place only the most performance-critical tables in main memory, while leaving other tables unchanged. Another property of Hekaton is that it uses lock-free data structures in order to avoid physical interference. Given these circumstances and assumptions, the design considerations of Hekaton are optimizing indexes for main memory, eliminating latches and locks, compiling requests to native code, and avoiding partitioning so as to avoid request communication overhead.

After describing the advantages of the Hekaton database engine, the paper goes into more detail about the implementation. Hekaton maximizes run-time performance by converting SQL statements and stored procedures into native code. This is one of the most important implementation choices and improvements in Hekaton. To deal with transaction issues, the paper introduces transaction commit processing and durability, including logging and checkpoints. In the comparison with the regular engine, Hekaton does better in two respects. The first is CPU efficiency: because Hekaton converts SQL statements into native code, it improves CPU efficiency. In addition, Hekaton improves the scalability of the DBMS since it does not use locks and latches. Thus, the paper verifies the advantages of Hekaton by describing its implementation and providing experimental results.

The strength of this paper is its organization. The paper first discusses the motivation for main-memory databases, then the design and implementation of Hekaton, and last presents a performance comparison. This gives readers a complete design flow of Hekaton. In addition, it discusses transaction issues in detail, which is important in database design nowadays.

The weakness of this paper is that it gives few examples when discussing transactions. When it covers the transaction issues of Hekaton, it only describes the rules. Providing some examples would help readers grasp the main ideas of this topic.

To sum up, the paper presents a main-memory database engine called Hekaton, which is optimized for large main memories and many-core CPUs. It achieves high performance and scalability by using efficient latch-free data structures and compiling SQL stored procedures into efficient machine code.


Review 25

This paper is an introduction to Hekaton, which is another memory-optimized OLTP engine. Similar to other recent OLTP papers, the Hekaton paper recognizes that the old assumption, that databases need to read from disk because there isn’t enough memory to hold all of the data, is no longer true. In fact most if not all OLTP tables, or at least the entire working set, can be stored in memory on a modern system. Because of this, Hekaton was able to remove some overhead and increase performance.

Hekaton was able to remove overhead in logging, latching, and locking, similar to the other papers in this field. The other main thing Hekaton did to improve performance was to compile queries to native code rather than interpreting them. This produced the largest speedup, as interpretation greatly hurts performance.

The main trick to how Hekaton works is its creative use of timestamping. Each record has a begin and end timestamp for when it was created and when it was made obsolete. This is useful because you never overwrite a record; you just insert a new one and update the old one to have an ending timestamp, thus making it obsolete. This is also important for garbage collection and rolling back, which work as you would imagine by checking the timestamps. One neat thing about their garbage collection is that it is multithreaded and non-blocking, so garbage collection can be done while other code is running without limiting access to data.
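
The timestamp scheme described above can be sketched in a few lines of Python. This is a simplified, single-threaded illustration under my own assumptions (integer timestamps, one version chain per record, hypothetical names such as `Version` and `collect_garbage`), not the latch-free implementation in the paper.

```python
from dataclasses import dataclass

INFINITY = float("inf")


@dataclass
class Version:
    value: str
    begin: float            # commit time of the transaction that created it
    end: float = INFINITY   # commit time of the transaction that replaced it


def visible(version, read_time):
    # A version is valid for read times in the half-open range [begin, end).
    return version.begin <= read_time < version.end


def update(versions, new_value, commit_time):
    # Never overwrite: close the latest version and append a new one.
    versions[-1].end = commit_time
    versions.append(Version(new_value, begin=commit_time))


def read(versions, read_time):
    for version in versions:
        if visible(version, read_time):
            return version.value
    return None  # the record did not exist yet at read_time


def collect_garbage(versions, oldest_active_read_time):
    # A version can go once no active transaction could still see it.
    return [v for v in versions if v.end > oldest_active_read_time]


chain = [Version("A", begin=10)]
update(chain, "B", commit_time=20)
print(read(chain, 15), read(chain, 25), read(chain, 5))
chain = collect_garbage(chain, oldest_active_read_time=30)
print([v.value for v in chain])
```

A reader at time 15 still sees the old value while a reader at time 25 sees the new one, and neither has to take a lock; the old version is dropped only after every active reader has moved past its end timestamp.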

There were a few weaknesses I noticed with this paper:
1.) They didn’t compare themselves to other in-memory OLTP systems (unless I am mistaken). Their results were in comparison to current OLTP systems, making their 20-35 times improvements seem very impressive. However, having just read another in-memory OLTP paper, I know that H-Store was able to achieve an 82 times performance improvement, and I would have liked to hear more about what they left out that was put into H-Store and why they made those decisions.
2.) They briefly had a section on query processing restrictions (a pretty important subject) and stated “compiled stored procedures support a limited set of options and the options that can be controlled must be set at compile time only”. This felt like an important sentence to just slide in there, as it is very significant how limited those options are and why they are limited that way. I would have liked more discussion on that as well.
3.) This is a smaller issue, but I felt the last graph was poorly set up and didn’t do a good job explaining itself. The main issue I had was that the “number of cores” series should have just been the X-axis of the graph, not a data field plotted against the Y-axis, especially when its max was 12 and the minimum step of the graph was 5,000, so a difference between 2 and 12 is unnoticeable.

Other than that I think it was a good paper. I am a strong supporter of current in-memory OLTP systems and think they should definitely be adopted. They offer strong performance increase and no significant downside. Because of that I like this paper and others like it because all innovations in this field I view as important and helpful. Also this paper did a great job of describing Hekaton so that readers could get a clear understanding. Overall, definitely a good paper and worth the read.



Review 26

As hardware has changed over recent years (including the cheapening of RAM/disk storage, Moore’s law applying to processors, etc.), a different DBMS design has also become necessary. Hekaton is, specifically, an OLTP engine optimized for modern hardware in the SQL Server environment. Since the market calls for high throughput and low latency, the computer that works best for Hekaton is one with large memory, many cores, and fast disk access.

Most data lives in memory with Hekaton; there are pointers directly addressing row values, index values living in memory, stream-storage, and no write-aheads or buffer pooling. Since most actions can be done in memory, the creators of Hekaton were able to implement a lock/latch-free system, with (ACID) multiversioned concurrency control. The idea is to provide an optimized system that has a hybridized engine without changing the interface so it can be accessed just like a normal table. Performance critical data can now fit in memory, so the speed of a cache is achieved without sacrificing any of the capabilities of a “traditional” database. In addition, to properly recover from failures, a checkpointing system including a version history of data changes and logs was also incorporated into the Hekaton pipeline.

The accomplishment of a highly concurrent OLTP execution system is impressive, but I am curious about the design choice for latch-free/MVCC/speculative reading. This means that read-only events can occur without waiting, but speculative reads might force a wait in primary transactions. Another disadvantage to the system is that stored procedures (which are, by nature, in-memory) can only access tables which are also in-memory, requiring more care to determine which procedures to place in memory (in-memory tables that reference disk-based tables will be slow due to additional overhead).


Review 27

With increasing RAM sizes, fitting large tables entirely in memory is now feasible and has many performance gains to be realized. This paper presents Hekaton, an in-memory database engine that is an integrated part of SQL Server. Hekaton is architected specifically to deal with in-memory workloads and is easily integrated into an existing SQL Server system.

Hekaton’s unique architecture focuses on three core principles to achieve large performance gains:

1. Tables can reside in memory simply by marking a table as memory optimized. Hekaton does away with traditional B-tree style indexing and instead opts for lock-free hash tables and Bw-trees, which are lock-free versions of B-trees.

2. Hekaton does away with all locks and latches and instead opts for an optimistic multiversion concurrency control system. Thus, the system can execute without any kind of busy waits or stalls.

3. Hekaton has a compiler that translates T-SQL stored procedures into C code, which is then compiled to native machine code. Native code offers obvious speed gains that can heavily improve performance for frequently used stored procedures.
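
As a rough illustration of point 2, here is a minimal sketch of how a write-write conflict can be detected without holding a lock: the first transaction to claim a row wins, and a later writer aborts immediately instead of waiting. This is a single-threaded simulation under my own assumptions; the names `Row`, `try_update`, `commit`, and `WriteConflict` are hypothetical, and a real engine would perform the claim with an atomic hardware instruction rather than a plain assignment.

```python
from dataclasses import dataclass
from typing import Optional


class WriteConflict(Exception):
    """Raised when another in-flight transaction has already claimed the row."""


@dataclass
class Row:
    committed: str                 # last committed value
    writer: Optional[int] = None   # id of the transaction with a pending update
    pending: Optional[str] = None  # uncommitted new value, if any


def try_update(row: Row, txn_id: int, value: str) -> None:
    # First writer wins: a second writer does not block, it aborts.
    # (Assumption: in a real engine this check-and-claim is one atomic step.)
    if row.writer is not None and row.writer != txn_id:
        raise WriteConflict(f"row already claimed by transaction {row.writer}")
    row.writer = txn_id
    row.pending = value


def commit(row: Row, txn_id: int) -> None:
    # Install the pending value and release the claim.
    if row.writer == txn_id:
        row.committed, row.writer, row.pending = row.pending, None, None


row = Row(committed="qty=5")
try_update(row, txn_id=1, value="qty=4")
try:
    try_update(row, txn_id=2, value="qty=3")
    second_writer_aborted = False
except WriteConflict:
    second_writer_aborted = True
commit(row, txn_id=1)
```

The point of the sketch is only that the losing transaction never waits: it fails fast and can retry later, which is what lets the system avoid stalls.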

While Hekaton has many upsides, it still has many limitations. Hekaton cannot service queries across databases, which can be an issue for many large corporations. Another large drawback of Hekaton is that databases with Hekaton objects cannot have snapshots. Hekaton also only supports row sizes of up to 8060 bytes, which again can be very problematic depending on your use case.


Review 28

This paper talks about Hekaton, a new database engine optimized for memory-resident data and OLTP workloads, which is fully integrated into SQL Server. It is very easy to use via SQL Server. The paper gives an overview of the design of the Hekaton engine and reports some experimental results.

The paper starts with the design of Hekaton: the architectural principles (optimize indexes for main memory, eliminate latches and locks, compile requests to native code) and the “no partitioning” policy. Hekaton has three major components: the Hekaton storage engine, the Hekaton compiler, and the Hekaton runtime system. The paper explains how these components interact with SQL Server and use facilities in SQL Server, such as the catalog, query optimizer, high availability, and transaction management. In Hekaton, a table is stored entirely in memory, and records are accessed via index lookup. The paper explains the storage and indexing (hash and range indexes) in Hekaton, including how CRUD operations work. Next, it discusses query processing, in which Hekaton maximizes run-time performance by converting SQL statements and stored procedures into highly customized native code. The paper then explains the limitations of the current implementation. Hekaton uses optimistic multiversion concurrency control to ensure read stability and avoid phantom reads. Instead of locks, it relies on timestamps to let the system know which versions are visible and which are not. Then the paper explains the technical operation of the commit process and transaction durability. Garbage collection in Hekaton is also explained. Last, of course, are the experimental results, in which Hekaton is shown to improve performance.
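
To make the timestamp idea concrete, here is a minimal sketch of a visibility check, assuming each version carries a begin and an end timestamp and a reader sees a version only if its read time falls inside that interval. The names `Version`, `is_visible`, and `read` are my own, and the sketch ignores in-flight transactions, whose uncommitted versions need extra handling.

```python
import math
from dataclasses import dataclass

INFINITY = math.inf  # stands in for "this version has not been replaced yet"


@dataclass
class Version:
    payload: str
    begin_ts: float             # commit time of the transaction that created it
    end_ts: float = INFINITY    # commit time of the transaction that replaced it


def is_visible(version: Version, read_ts: float) -> bool:
    # Visible when the read time lies in the half-open interval [begin_ts, end_ts).
    return version.begin_ts <= read_ts < version.end_ts


def read(chain: list, read_ts: float):
    # Return the payload of the single version visible at read_ts, if any.
    for version in chain:
        if is_visible(version, read_ts):
            return version.payload
    return None


chain = [
    Version("balance=100", begin_ts=10, end_ts=20),
    Version("balance=150", begin_ts=20),
]
old_reader = read(chain, read_ts=15)   # reader that started before the update
new_reader = read(chain, read_ts=25)   # reader that started after the update
too_early = read(chain, read_ts=5)     # reader older than the row itself
```

Because the intervals do not overlap, each reader sees exactly one version (or none), and no reader ever has to wait for a writer.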

The main contribution of this paper is the explanation of a new database engine which is memory-centric but is easy to use since it is integrated with a more mature, widely known database engine. There has been discussion of “new” database engine concepts that optimize for memory instead of disk, but usually the research involves a standalone database engine. This work could very well open the door to other implementations integrated with another mature, widely used DBMS, since the new engine would then be able to use facilities that are already built into the host DBMS. Another contribution is showing that compiling queries and procedures into native machine code helps improve performance.

However, for all the performance benefits, I think the paper does not explain how to move the OLTP data into a more sophisticated data warehouse. Would all the data remain in memory? In the end, those transactional data would be used for analysis, wouldn’t they? Does this mean that the user has to move the data manually from memory to disk?



Review 29


The purpose of this paper is to introduce a new database engine, integrated with the SQL Server DBMS, that provides further optimizations for databases that live entirely in memory instead of on disk and that also optimizes for online transaction processing (OLTP) workloads. It shows how current DBMS utilities don’t do so well with these kinds of workloads (pointing mostly to previous research, some of which we read for this week), and then runs through how Hekaton, their system, runs alongside SQL Server and how it is and isn’t integrated with existing SQL Server features.

The technical contributions of this paper are mainly in the end product: Hekaton is integrated with an existing SQL Server and provides lots of migration support (i.e. any existing database that is implemented in SQL Server can be migrated to Hekaton). Hekaton tables are compatible with normal SQL Server tables, as are the compiled transactions that Hekaton produces. This paper presents a compiler structure that compiles queries to be run on Hekaton tables into C before it uses a C compiler to translate the query execution instructions (now in C) into machine instructions for extreme levels of optimization. I think this is a main technical contribution, because I can see how this pipeline would be useful in adapting a Hekaton-like system to popular relational DBMSs other than SQL Server.

I think this paper is strong in its experimental evaluation and in its completeness. I think the authors do a good job in the structure of the paper in laying out all of the key ideas that will be presented in a section before elaborating much more fully in subsections. Additionally, there is a complete and very detailed discussion of results, which I really appreciated.

As far as weaknesses go, I think there are some confusing sections of the paper where they seem to state that their system does one thing, and then contradict this statement in a later section. For example, this happens in section 6 when discussing Commit logging and transaction rollback. One section states that transactions are irreversible after being successfully logged, but the subsequent subsection discusses the method for rolling back transactions. I don’t believe there are any other fundamental weaknesses of this paper.




Review 30

Paper Title: Hekaton: SQL Server’s Memory-Optimized OLTP Engine

Paper Summary:
This paper presents an overview of the Hekaton engine design and reports some experimental results benchmarking its performance. The Hekaton engine is a database engine that’s optimized for memory-resident data and OLTP workloads. The motivation for this design, similar to the previous papers, is the recent advance in hardware technology. To take advantage of advances such as larger and cheaper memory and multi-core architectures, database management algorithms have to be adjusted accordingly. Hekaton is such a modification.

Paper Details:
This paper provides details of the Hekaton implementation including its engine, compiler, runtime system, etc. While there are many aspects of the Hekaton architecture, the paper focuses on its storage, query processing and transaction management. Compared to the old-fashioned design, an interesting point is that Hekaton, by taking advantage of the large memory space now available, stores almost the entire system in memory and thus avoids the overhead of reading from and writing to disk. While this is a common thing nowadays, it was probably a significant improvement at that time.

In the experiment section the paper demonstrates some performance comparisons, and a general observation is that the Hekaton system outperforms the baseline model by orders of magnitude. However, this is probably not that surprising since it is a comparison between a developed model using advanced hardware and a simple/outdated baseline model.


Review 31

Summary:
This paper introduces a new database engine named Hekaton that is optimized for memory-resident data and OLTP workloads. The system is integrated into SQL Server and can be easily queried using T-SQL. Moreover, Hekaton implements lock-free concurrency control. It uses multiversion transaction control along with timestamp synchronization, transaction logs and checkpoints, and garbage collection to achieve a fully functional and durable transaction control mechanism. It also uses a no-partitioning physical design as well as memory-optimized indexes to improve its performance. Based on the experimental results, it improves the performance of OLTP transactions by an order of magnitude.

Strengths:
1. It implements the Hekaton engine, which is an order of magnitude faster on OLTP transaction operations and also easy to use since it is integrated into SQL Server. Moreover, it can be accessed through T-SQL; the user just needs to declare a table as memory optimized.
2. This paper gives a complete step-by-step introduction to the system as well as the design principles behind it, which helps readers understand it clearly.
3. It gives detailed examples with illustrative figures, which makes the concepts more comprehensible for readers.

Weakness:
1. It uses latch-free multiversion concurrency control to improve the performance of the system. However, a multiversion concurrency control system needs to keep multiple versions of the same data, which may lead to wasted memory, especially for heavy transactional workloads.
2. It uses a garbage collection algorithm to clean up versions that will no longer be needed. However, since these procedures run in the background, heavy transactional workloads could also generate a lot of overhead because of the GC. Moreover, if the GC doesn't clean up stale versions promptly, the system may run short of memory even though there is potentially enough memory available.
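
To illustrate the memory concern in both points, here is a minimal sketch of the usual rule for reclaiming old versions, assuming a version can be dropped once its end timestamp is older than the begin timestamp of the oldest active transaction. The function name `collect_garbage` and the tuple layout are my own choices for illustration, not taken from the paper.

```python
import math

INFINITY = math.inf  # end timestamp of a version that is still current


def collect_garbage(versions, active_begin_timestamps):
    """Split versions into (kept, reclaimed).

    Each version is a tuple (payload, begin_ts, end_ts). A version is
    reclaimable when no active transaction can still see it, i.e. when its
    end_ts is smaller than the oldest active transaction's begin timestamp.
    """
    oldest_active = min(active_begin_timestamps, default=INFINITY)
    kept, reclaimed = [], []
    for version in versions:
        _, _, end_ts = version
        (reclaimed if end_ts < oldest_active else kept).append(version)
    return kept, reclaimed


versions = [
    ("v1", 10, 20),
    ("v2", 20, 30),
    ("v3", 30, INFINITY),
]

# Only a recent transaction is active: v1 can no longer be seen by anyone.
kept_recent, reclaimed_recent = collect_garbage(versions, [25])

# One long-running transaction (begin timestamp 5) pins every old version.
kept_pinned, reclaimed_pinned = collect_garbage(versions, [5, 25])
```

The second call shows the scenario I worry about: a single long-running transaction prevents any stale version from being reclaimed, so memory use grows with the update rate until that transaction finishes.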