Main memory database systems take advantage of the fast, random access of physical memory. However, mainstream databases still use disk storage and SQL servers designed for disk-resident data. The Hekaton engine gives the current database system an option to move performance-critical parts into main-memory Hekaton tables in order to increase performance.
Hekaton is designed to optimize main-memory-resident data, with special attention to concurrency control, access methods, query processing, and the method used to compile stored procedures into native machine code.
Some of the strengths of the paper are:
1. Hekaton uses flexible indexing for different tables: B-tree indexes for regular tables, which suit the block-oriented nature of disk storage, and hash indexes for Hekaton tables, which take full advantage of random access.
2. Hekaton is fully integrated into SQL Server instead of being a new main-memory server, so the existing database structures remain compatible with the new main-memory tables.
3. Letting worker threads check for garbage versions makes garbage collection parallel, which increases efficiency.
Some of the drawbacks of the paper are:
1. The paper does not give enough background on the theory behind the advantages of main memory database systems.
2. The query processing section is vague; some of the main concepts are not clearly illustrated. For instance, the critical pure imperative tree (PIT) structure is not well presented or analyzed.
The traditional assumption that main memory is expensive and data resides on disk no longer holds due to the decreasing price of memory. Nowadays the majority of OLTP databases fit in memory, and therefore it is time to design database engines optimized for large main memories and many-core CPUs. This paper proposes a new design called Hekaton, targeted at OLTP workloads.
Hekaton consists of three major components: a storage engine, a compiler, and a runtime system. Because Hekaton aims to achieve 10-100X higher throughput, the paper proposes three design principles:
1. Indexes are designed and optimized for memory-resident data to reduce instructions per transaction. Durability is ensured by logging and checkpointing records to external storage; index operations are not logged. During recovery Hekaton tables and their indexes are rebuilt entirely from the latest checkpoint and logs.
2. Using latch-free (lock-free) internal data structures and multiversion concurrency control to provide transaction isolation semantics.
3. Converting statements and stored procedures written in T-SQL into machine code to maximize run time performance.
The main contributions of this design are as follows:
1. The Hekaton engine is fully integrated into SQL Server. Not being a separate DBMS offers four advantages:
1. It is simple to use this functionality without the hassle and expense of another DBMS.
2. Only the most performance-critical tables need to be in main memory; other tables can be left unchanged.
3. Stored procedures accessing only Hekaton tables can be compiled into native machine code for further performance gains.
4. Conversion can be done gradually, one table and one stored procedure at a time.
2. Hekaton tables are fully transactional and durable and accessed using T-SQL in the same way as regular SQL Server tables. A query can reference both Hekaton tables and regular tables and a transaction can update data in both types of tables.
3. T-SQL stored procedures that reference only Hekaton tables can be compiled into machine code for further performance improvements.
4. Hekaton uses only latch-free data structures and a new optimistic, multiversion concurrency control technique to provide high levels of concurrency without partitioning, and therefore achieves much better performance.
By using these techniques, Hekaton achieves 10-100X higher throughput.
The main drawback of this design is that the primary goal of Hekaton is efficient execution of compile-once-and-execute-many-times workloads; when the workload consists mainly of ad hoc queries, this design may not be suitable.
Problem & Motivation
Traditional DBMSs are based on the assumption that main memory is expensive and data resides on disk; this assumption is no longer valid. The authors therefore propose Hekaton, which should satisfy the following requirements:
1. Optimized for memory resident data and OLTP workloads.
2. Should be integrated into SQL Server (instead of being a separate DBMS).
From my understanding, Hekaton has many impressive outcomes. First, it uses lock-free data structures, made possible by optimistic multiversion concurrency control. Second, it is integrated into SQL Server, and a SQL Server table can be converted into a Hekaton table by user command; this way it can reuse the original data structures and retain the original customers, and Hekaton and SQL Server together can be treated as one database. Third, the query processing structure, which combines hash indexes and range indexes, is impressive; the example the authors give of how the structure implements reads and writes is clear and vivid.
The paper lacks enough background on the workflow of SQL Server; as a result, I did not get a general sense of the SQL surface it uses even after reading Section 5.
For many decades, database systems were designed with the assumption that data resides on disk, due to the relatively high cost of main memory. Market realities have changed in recent years, however; memory prices have dropped by a factor of 10 every five years over the past 30 years. In response, SQL Server, a very widely used relational DBMS, has added a new database engine, Hekaton, designed to run with main memory as the primary form of storage. The paper “Hekaton: SQL Server’s Memory-Optimized OLTP Engine” details the implementation as well as some performance results of Hekaton.
As motivation for the project, the authors cite the difficulty of achieving 10-100x speedups solely by optimizing scalability and reducing the number of cycles per instruction (which together yield only a 3-4x improvement). As such, reducing instructions per transaction, while remaining competitive in the other two aspects, is their goal for Hekaton. They do so by following three principles in designing the architecture: optimizing indexes for main memory, eliminating the use of locks and latches, and compiling queries to machine code. Additionally, Hekaton differs from other memory-resident database systems (MMDBs) in that it does not partition the database, the reasoning being that the workload is often not partitionable, in which case partitioning actually increases the work done due to the overhead involved.

Hekaton is composed of three components: the storage engine, compiler, and runtime system. The storage engine manages data and indexes, and provides transactional support and other fundamental database features like checkpointing and recovery. The paper also describes the timestamp system employed for concurrency control, as well as how garbage collection is handled. Checkpointing uses data and delta files, which store inserted/updated versions and deleted versions, respectively, along with methods to prevent the number of files from growing out of control, such as merging old data/delta files. The compiler, on the other hand, converts SQL queries and procedures (in the form of an abstract tree representation) into native code. Finally, the runtime system provides integration with other SQL Server resources.
This system has a number of notable strengths besides being a viable MMDB implementation. For one, Hekaton is directly integrated into SQL Server, and all the user has to do in order to use Hekaton is to declare one or more tables as memory optimized. This makes it very accessible in comparison to other stand-alone systems, and especially appealing to users who already have a lot of data stored in conventional disk-based DBMSs like SQL Server. The Hekaton design also manages to achieve good scaling as well as speedups of 10-30x, depending on the number of transactions and the type of task (lookup vs. update). In fact, the performance advantage of Hekaton increases as the number of transactions increases.
One weakness of this paper is that the authors do not formally address the shortcomings of Hekaton, only hinting at some throughout the paper. For example, under their optimistic concurrency control, a conflicting transaction must wait for the preceding transaction to complete, which leads to the possibility of cascading aborts (e.g. transaction T3 aborts, causing T2 to abort, which causes T1 to abort, etc.). Also, a direct comparison of performance results with other memory-resident systems, i.e. an apples-to-apples comparison, would have been helpful in seeing how good Hekaton is compared to other in-class systems.
The purpose of this paper is to introduce Hekaton, a database engine optimized for memory-resident rather than disk-resident data. This is significant because, as the paper shows with a brief history, many prominent DBMSs were designed under the assumption that main memory was expensive and data was stored on disk, an assumption that quickly became untrue with improvements in technology. Although other MMDB implementations exist, this one is valuable because it is integrated directly into SQL Server, which was built on that outdated assumption.
The Hekaton system is built on three design principles: optimizing memory access for main memory as opposed to disk, using multiversion concurrency control (MVCC) to eliminate locks on shared resources that might lower transaction throughput, and compiling functions to optimized machine code. Hekaton also does not partition its data, saving time and complexity: a lookup issues one query against one table instead of multiple queries against different partitions that must all return. The three main components of the Hekaton system are the storage engine, the compiler, and the runtime system. The storage engine handles indexing of data and transaction management, as well as updating checkpoints and executing data recovery. The compiler turns T-SQL functions into optimized machine code to be executed on the storage engine. The runtime system integrates with the SQL Server system. When plotting transaction throughput versus number of cores, Hekaton showed better scaling with increased computational resources than latch-based systems, due to less lock contention.
I liked how this paper was organized similarly to much of what we have read in the past; this made it easier to follow and locate information, since the pattern is familiar: detailing the considerations that went into the design and any necessary background, then describing the architecture, and finally taking a closer look at each of the main contributions. I also appreciate the iterative deepening of information, like referencing the system architecture in Section 3 before elaborating on it in more depth later in the paper. The graphics were generally helpful, detailed, and clean, and they appeared often to complement the text, which I appreciated.
I did not like how much the paper assumed about the reader's prior knowledge. I'm sure this was to cover ground, and I did not mind much, since we read the main memory database paper before this one; a less familiar reader, however, might need more time to get up to speed on the new contribution. Concepts like T-SQL stored procedures were much less familiar to me.
There are several main memory databases out there, but this paper introduces Hekaton because it is somewhat unique. To start with, Hekaton is optimized for memory-resident data and OLTP workloads. It is not a separate DBMS but part of SQL Server. Hekaton tables are fully durable and transactional. T-SQL stored procedures that reference only Hekaton tables can be compiled into machine code. It is also designed for high levels of concurrency.
Hekaton is designed for a 10-100X improvement in throughput, which could in principle be achieved by: 1) improving scalability, 2) improving cycles per instruction (CPI), and 3) reducing the number of instructions executed per request. The last is shown to be the only viable direction for such an improvement. It is pursued through index optimization for main memory, lock elimination, and conversion of requests into highly efficient machine code.
The paper also introduces the overall architecture of Hekaton, comprising the storage engine, compiler, and runtime system. For storage and indexing, Hekaton supports hash indexes and range indexes, and adopts a multiversion strategy in which any update creates a new version. To maximize runtime performance, Hekaton reuses the existing SQL Server compilation stack: it generates a mixed abstract tree (MAT), transforms it into a pure imperative tree (PIT), converts that to C code, and finally produces a DLL. For transaction management, it uses MVCC to provide snapshot isolation without locks. For transaction safety, Hekaton logs transactions and checkpoints to durable storage. As an MVCC system, Hekaton adopts a garbage collection mechanism that is non-blocking, cooperative (with other subsystems), incremental, and parallel. The experimental results show its efficiency and scalability.
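The multiversion update rule (an update creates a new version rather than modifying data in place) can be sketched in Python. This is a simplified, single-threaded model with invented names; real Hekaton buckets are lock-free and timestamps come from transactions:

```python
from collections import defaultdict

INF = float("inf")

# Each bucket chains versions of all records hashing there; an update
# never modifies a payload in place: it ends the old version and
# appends a new one.
buckets = defaultdict(list)

def insert(key, payload, now):
    buckets[hash(key) % 8].append({"key": key, "begin": now,
                                   "end": INF, "payload": payload})

def lookup(key, read_time):
    # Return the payload of the version visible at read_time, if any.
    for v in buckets[hash(key) % 8]:
        if v["key"] == key and v["begin"] <= read_time < v["end"]:
            return v["payload"]
    return None

def update(key, payload, now):
    for v in buckets[hash(key) % 8]:
        if v["key"] == key and v["end"] == INF:
            v["end"] = now            # retire the current version
    insert(key, payload, now)         # install the new version

insert("cust1", {"balance": 100}, now=1)
update("cust1", {"balance": 80}, now=5)
assert lookup("cust1", 3) == {"balance": 100}   # old snapshot
assert lookup("cust1", 9) == {"balance": 80}    # current state
```

Note that a reader at time 3 still sees the old payload even after the update at time 5, which is what gives snapshot reads without locks.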
Hekaton is Greek for one hundred; the engine is designed to reach a 100X performance improvement. The original SQL Server is disk optimized. Hekaton, the new database engine proposed by the authors, is optimized for memory-resident data and OLTP workloads and is applied at the table level. Its main properties are: indexes optimized for main memory, elimination of latches and locks, and compilation of requests to native code.
The paper points out several distinctive aspects of Hekaton. Hekaton is optimized for byte-addressable memory instead of block-addressable disk. A table must have at least one index, and all indexes must be created before the table is created; in this way, records are always accessed via an index. Hekaton has no latches or locks: transactions are processed under optimistic multiversion concurrency control. Hekaton is also designed to support multiple concurrently generated log streams. The engine has two modes, interop mode and native mode. Interop mode goes through the query interpreter and can query both disk data and memory data, so it supports all operations. Native mode is optimized for compile-once-and-execute-many workloads; the query is compiled, can only be applied to Hekaton tables, and performs no runtime checks.
The main contribution is that this OLTP engine was added to Microsoft SQL Server 2014. It is fully embedded in SQL Server, so no additional license is needed and there is no need to maintain two databases with different queries. The engine compiles database queries into C code and addresses several potential problems along the way.
However, the paper has weak points as well. I did not find a discussion of how the isolation levels are implemented. I also think it would have been better if the authors had compared this model to other existing main memory models.
Hekaton is a new engine built into SQL Server for memory-resident data. As memory has gotten cheaper, it has become more viable to store tables completely in memory, leading to the potential for significantly better performance. Hekaton is unique in that it is not a completely new database; users can simply specify that a table is memory optimized, and from there it will be stored in main memory. This is useful for a number of reasons: a) it allows users to write transactions that access tables both in main memory and on disk, and b) it allows users to convert some tables gradually without committing to a main memory database for all tables.
Hekaton uses optimistic MVCC for concurrency control. Before committing, transactions confirm that the data that has been read has not been updated and that no phantoms exist. If there is a problem, the transaction is aborted. Because locks are not used, query processing can be faster under low contention - note that this architecture is a bit different from what was mentioned in “Main Memory Database Systems: An Overview,” where the authors seem to have little concern about locks for main memory databases (compared to traditional systems).
The authors make it clear that they aim to increase throughput 10-100X, and that they cannot do so through simple optimizations. To hit this goal, one of the overhauls is the addition of compiled stored procedures: traditional T-SQL stored procedures compiled into C. This does mean that, like many advanced systems, Hekaton is optimized for workloads that are known in advance. It is shown that this yields great results, however; on page 1252, compiled procedures are shown to be significantly more efficient than interpreted ones in terms of CPU usage. Their use is well explained in the paper.
One piece of the paper that I thought could have been stronger was the experimental results section. Regarding the experiments for CPU efficiency, it seemed as though the authors did not use an existing benchmark and instead made up their own. I did not feel as though there was quite enough information about size, etc., to be able to replicate the experiment exactly. Additionally, it seemed as though the comparisons were only against traditional SQL servers, not other main memory databases. I think that comparing to other main memory databases would have added credibility to the results section.
This paper introduces Hekaton, a new database system optimized for memory-resident data and OLTP workloads. Though several main memory database systems already exist, Hekaton proposes several novel features which haven't been achieved by other systems. One of the main goals of Hekaton is to dramatically reduce the number of instructions executed per request, which the authors think is the only hope of significantly increasing system throughput.
One method used by Hekaton is to allow any thread to access any record without acquiring latches or locks. To achieve this, it uses latch-free hash and range indexes. Hekaton also uses multiversioning for its data records: whenever a request needs to read a record, it checks the begin/end timestamps of each version to determine whether the version is visible to it, and an update creates a new version. Of course, this alone can lead to nonserializable results. To eliminate that problem, Hekaton uses optimistic concurrency control: when a transaction commits, Hekaton verifies that the versions it read have not been updated and that no phantoms have appeared; otherwise, the transaction is aborted.
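The begin/end-timestamp visibility check can be sketched as follows; this is a simplified model with invented names, not Hekaton's actual record layout:

```python
INFINITY = float("inf")

class Version:
    """One version of a record: valid from `begin` until `end`."""
    def __init__(self, begin, end, payload):
        self.begin = begin
        self.end = end
        self.payload = payload

def visible(version, read_time):
    """A version is visible to a reader iff the read time falls
    inside the version's valid interval [begin, end)."""
    return version.begin <= read_time < version.end

# A record with two versions: the old one was superseded at time 50.
old = Version(begin=10, end=50, payload="v1")
new = Version(begin=50, end=INFINITY, payload="v2")

# A transaction reading at time 30 sees only the old version.
assert visible(old, 30) and not visible(new, 30)
# A transaction reading at time 70 sees only the new version.
assert visible(new, 70) and not visible(old, 70)
```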
Another major contribution of the paper is compiled stored procedures. Traditionally, database systems use interpreter-based execution mechanisms and perform runtime checks; it is much faster to compile statements into native machine code. Hekaton first translates stored procedures into C code and then uses a C/C++ compiler to convert that into machine code. Notably, the interface between operators is not a function interface; instead, labels and gotos are used. The authors claim that this design can support any query operator (blocking or nonblocking) and results in the fewest number of instructions.
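As a loose analogue of compile-once-and-execute-many-times (in Python rather than the C-and-DLL pipeline the paper describes; all names here are invented), a predicate can be specialized into code once and then run many times without re-interpretation:

```python
# Toy analogue of compile-once-execute-many: instead of re-interpreting
# a filter predicate on every call, generate specialized source once
# and compile it. (Hekaton emits C and builds a DLL; this sketch only
# illustrates the idea.)

def compile_filter(column, op, constant):
    src = f"lambda row: row[{column!r}] {op} {constant!r}"
    return eval(compile(src, "<generated>", "eval"))

rows = [{"id": 1, "qty": 5}, {"id": 2, "qty": 12}, {"id": 3, "qty": 9}]

# Compiled once...
pred = compile_filter("qty", ">", 8)
# ...executed many times without re-parsing the predicate.
assert [r["id"] for r in rows if pred(r)] == [2, 3]
```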
From an engineering perspective, one contribution of Hekaton is that it is fully integrated into SQL Server. It allows users to gradually convert tables and applications to take advantage of the performance improvements Hekaton provides. The only downside is that currently the query surface area of compiled stored procedures is limited: they can only query memory-optimized tables (those residing in main memory).
In the paper "Hekaton: SQL Server’s Memory-Optimized OLTP Engine", Cristian Diaconu and colleagues discuss Hekaton, a new database engine optimized for memory-resident data and OLTP workloads. Observing current market values for main memory, we can notice a gradual drop in prices for reasonable amounts of storage. Thus, there is greater justification for fitting OLTP databases in main memory; most of them fit within 1TB, a relatively affordable amount of storage. Furthermore, Hekaton does not exist as a separate system, but as an integration into SQL Server. Thus, there are four main benefits to using it compared to its competitors:
1) Customers don't need to buy/learn a new system
2) Customers can indicate performance-critical tables that they want in main memory
3) Stored procedures accessing Hekaton tables can be compiled into native machine code
4) Conversion only needs to be done gradually - one table at a time
As a consequence of storing tables in main memory, Hekaton provides durable and transactional tables that support both hash indexes and range indexes. These tables are updated with T-SQL, and stored procedures are compiled into native machine code. For concurrency, it avoids lock-based data structures and instead opts into multiversion concurrency control techniques to avoid interference among transactions. It seems clear that these greatly support commercial use.
Through analysis, the authors show that performance can only be improved by improving scalability, improving CPI, and reducing the number of instructions executed per request, and that the first two alone cannot deliver the required gains; the bulk must come from reducing the number of instructions per transaction, which requires a main-memory design. This is done through index optimization for main memory (rather than disk-oriented B-trees), a lock-less optimistic multiversion concurrency system, and conversion of statements and stored procedures into machine code. Additionally, since partitioning did not fit their customers' use cases, they did not implement support for it.
Hekaton is discussed and evaluated in several aspects:
1) Storage and Indexing: This is implemented with either hash indexes using lock-free hash tables or range indexes using Bw-trees (a lock-free variant of B-trees). When reading data, there is always at least one version of a record visible to the reader. A lookup involves scanning a hash bucket for a version whose valid time contains the read time. Updates (as with inserts and deletes) create new versions, which are linked into the appropriate buckets; older versions are handled by garbage collection.
2) Programmability and Query Processing: Rather than using interpreter-based execution mechanisms that perform runtime checks, statements and stored procedures are converted to native code. The targeted workloads are "compile once and execute many times". To stay consistent with SQL Server, ad hoc access to Hekaton tables remains available through the interpreter. The compiler converts a mixed abstract tree (MAT) into a pure imperative tree (PIT), going from declarative to imperative syntax, in order to compile effectively.
3) Transaction Management: Hekaton uses optimistic multiversion concurrency control to provide snapshot, repeatable read, and serializable transaction isolation without locking, built on snapshot isolation. The system ensures both read stability and phantom avoidance. Each version carries begin and end timestamps. Once a transaction has completed normal processing, it begins commit processing: it validates that the versions it read were not updated and that no phantoms appeared. Once successfully logged, its changes are irreversibly committed.
4) Transaction Durability: To recover after a failure, transaction logs and checkpoints are kept on disk so the database can be reassembled. Recovery starts from the latest checkpoint; the tail of the log is then replayed, in parallel, from the checkpoint's timestamp to restore the state of the database before the crash. It is noted that logging can become a scaling bottleneck; this is true for any main memory database system.
5) Garbage Collection: This addresses how Hekaton disposes of record versions that are no longer needed. Rather than defining garbage by reachability of a pointer from any location, it is defined by visibility: a version is garbage when it is no longer visible to any active transaction. The collector never stalls transactions, removes garbage versions encountered during scans, processes work incrementally, and is both parallelizable and scalable.
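The visibility-based reclamation rule can be sketched as a one-line check (a simplified model with hypothetical names):

```python
INF = float("inf")

def reclaimable(version_end, active_begin_timestamps):
    """A sketch of the GC rule: an old version can be reclaimed once
    its end timestamp precedes the begin timestamp of every active
    transaction, i.e. no active or future transaction can ever see it.
    (Simplified; names are invented.)"""
    oldest_active = min(active_begin_timestamps, default=INF)
    return version_end < oldest_active

# Versions retired at times 20 and 45; active transactions began at 30 and 60.
assert reclaimable(20, [30, 60])        # invisible to everyone: collect
assert not reclaimable(45, [30, 60])    # the txn from time 30 may still read it
```

With no active transactions at all, every retired version is immediately reclaimable, which is why the `default=INF` fallback is used.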
Much like the other paper about MMDBs, Hekaton has drawbacks that accompany it. In the experimental results, the authors use the wrong type of graph to represent their data: a bar graph does not work well when trying to show the relationship between three variables (the number of cores, the transaction count, and transactions per second for each method). They could have used a stacked bar chart to show percentage increases, numbers that are much easier for humans to comprehend. Another drawback of this paper is the lack of exploration at higher volumes of data; I would be interested to see how far Hekaton can be pushed.
This paper describes Hekaton, a main-memory DBMS implementation inside of SQL Server. Since the entire implementation is part of SQL Server, individual tables and procedures can be created in or transferred to main memory, which makes transitioning existing databases easier. Hekaton stores its tables in main memory and is designed for an OLTP workload.
Hekaton uses optimistic multiversion concurrency control to handle several transactions at once. The important aspect of this is that it doesn't use any form of locks or latches for concurrency control, which increases speed as long as few transactions are aborted. In addition, for the sake of speed, tables are not partitioned; tests showed that partitioning incurred more overhead than the time it saved.
Concurrency is implemented with multiple versions of records: every update creates a new version stamped with its creation time. Each transaction has a begin timestamp, and can only read record versions whose valid time contains that timestamp. However, it is possible for other transactions to create new versions of records the original transaction read, between its begin time and its commit. As such, at commit time, the transaction must verify that no such new versions have appeared, and must be rolled back if this verification fails.
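That commit-time verification can be sketched as follows; this is a simplified Python model with invented names, not Hekaton's exact validation algorithm:

```python
INF = float("inf")

def validate(txn, commit_time):
    """Simplified optimistic validation at commit: every version the
    transaction read must still be visible at commit time (read
    stability), and every index scan, if repeated, must see the same
    keys (phantom check)."""
    for version in txn["read_set"]:
        if not (version["begin"] <= commit_time < version["end"]):
            return False          # the version was updated or deleted
    for rescan, keys_seen in txn["scans"]:
        if set(rescan(commit_time)) != keys_seen:
            return False          # a phantom appeared or vanished
    return True

# One record version, read by T1 while it was still current.
v = {"begin": 10, "end": INF}
t1 = {"read_set": [v], "scans": []}

# A concurrent writer retires the version at time 40.
v["end"] = 40

assert validate(t1, commit_time=30)       # commits before the update
assert not validate(t1, commit_time=50)   # read version no longer current
```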
To recover from failures, Hekaton keeps a transaction log as well as periodic checkpoints. Usually, a single log can be a bottleneck for transactions; however, since the entire database is in memory, no log records need to be written for dirty pages, which reduces contention on the log tail. Checkpoints record which record versions were created (in data files) and which were deleted (in delta files). When the database crashes, the most recent checkpoint can be loaded and the log replayed from that point to recover the database.
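The checkpoint-plus-log-replay idea can be sketched with a toy, single-threaded model (invented record format; real Hekaton uses data/delta file pairs and parallel replay):

```python
def recover(checkpoint, log, checkpoint_time):
    """Rebuild table state: start from the checkpoint image, then
    replay only log records newer than the checkpoint."""
    table = dict(checkpoint)
    for ts, op, key, value in log:
        if ts <= checkpoint_time:
            continue                  # already reflected in the checkpoint
        if op == "put":
            table[key] = value
        elif op == "delete":
            table.pop(key, None)
    return table

checkpoint = {"a": 1, "b": 2}
log = [(5, "put", "a", 1), (5, "put", "b", 2),   # before the checkpoint
       (12, "put", "c", 3), (15, "delete", "b", None)]
assert recover(checkpoint, log, checkpoint_time=10) == {"a": 1, "c": 3}
```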
The paper gives an overview of the design of Hekaton, a new engine optimized for memory-resident data and OLTP workloads. The engine is designed for high concurrency, using latch-free data structures and a new optimistic MVCC technique that differs from the MVCC papers we discussed previously.
Some characteristics of Hekaton: it is integrated into SQL Server; memory-optimized tables are managed by and stored entirely in main memory; Hekaton tables can be queried and updated using T-SQL in the same way as regular SQL Server tables; and Hekaton is designed for high levels of concurrency but does not rely on partitioning to achieve it.
When designing Hekaton for better performance, the authors follow three principles: optimize indexes for main memory; eliminate latches and locks; and compile requests to native code. They also adopt a no-partitioning strategy.
Hekaton mainly consists of three components: the Hekaton storage engine, the Hekaton compiler, and the Hekaton runtime system.
Hekaton achieves its high performance and scalability by using very efficient latch-free data structures, multiversioning, a new optimistic concurrency control scheme, and by compiling T-SQL stored procedures into efficient machine code. Transaction durability is ensured by logging and checkpointing to durable storage. High availability and transparent failover are provided by integration with SQL Server's AlwaysOn feature.
The drawback of Hekaton, discussed in previous papers, is that the concurrency control strategy used to ensure serializable execution adds many constraints and requires even reads to write to shared memory, which decreases concurrency.
This paper proposes Hekaton, a memory-optimized OLTP engine that is integrated into SQL Server directly.
I think the most important contribution of the paper is exactly this integration into SQL Server. The only effort a user must make to take advantage of the MMDB is declaring one or more tables in a database as memory optimized. This is genuinely attractive to customers: they don't need to purchase another database, and they don't need to hire an engineer to take care of a new MMDB system. Another reason is that not every part of a database needs to run in main memory if it is not time sensitive or "hot" (visited frequently).
The technical contribution of this paper is a latch-free data structure and a new multiversion concurrency control technique. The architectural principles are useful: optimizing indexes, eliminating latches and locks, and compiling requests to native code. Another distinctive choice is that Hekaton does not use partitioning across CPUs, which is very different from many MMDB systems. The paper also gives a detailed description of its query processing plan and transaction management.
Overall, I like this paper since it seems to make MMDBs easy to use from a user's perspective.
This paper proposes a novel database engine integrated into SQL Server called Hekaton. Hekaton is optimized for memory-resident data and OLTP workloads, and it features high performance and high concurrency. In this paper, the authors discuss the design of their Hekaton engine in detail. Nowadays, hardware is much cheaper, so one machine with 1TB of RAM is no longer a big deal; the larger RAM makes it possible for entire OLTP databases to fit into main memory. Against this background, it makes sense for DBMS vendors to design a new in-memory OLTP engine to achieve better performance. So, Microsoft introduced the Hekaton engine and integrated it into SQL Server.
In order to achieve 10-100X higher throughput, the engine must be carefully designed: it must execute drastically fewer instructions per transaction, achieve a low CPI, and have no bottlenecks. Based on this, the authors introduce several design principles, including indexes optimized for main memory, elimination of latches and locks, and compilation of requests to native code. Unlike other main memory DBMSs mentioned in the paper, there is no partitioning mechanism in Hekaton, due to its higher overheads and non-partitionable data. There are three major components of Hekaton: storage engine, compiler, and runtime system. These components leverage several existing services of SQL Server. Hekaton provides both hash indexes and range indexes, and one table can have multiple indexes. Another important feature is that, unlike traditional interpreter-based execution, Hekaton focuses on efficient execution of compile-once-and-execute-many-times workloads, realized by converting SQL statements and stored procedures into highly customized native code that maximizes runtime performance. For transaction management, Hekaton uses optimistic multiversion concurrency control to provide the snapshot, repeatable read, and serializable transaction isolation levels. To guarantee transaction durability, it writes transaction logs and checkpoints to durable storage. Since Hekaton is a multiversion system, a garbage collection mechanism is necessary; in Hekaton, GC is non-blocking, cooperative, incremental, parallelizable, and scalable, making the process much more efficient.
This is a nice paper describing SQL Server's novel memory-optimized engine, and it gives a comprehensive description of Hekaton. The paper has several strengths. For concurrency control, Hekaton uses MVCC and provides several isolation levels without locking; this design greatly improves concurrency and scalability. Besides, from an engineering perspective, Hekaton reuses a number of services already available in SQL Server, which reduces the work of building Hekaton and makes it easier to maintain. Also, since Hekaton is integrated into SQL Server, it is user-friendly: a user can easily migrate disk-resident data into main memory and enjoy the features of the memory-optimized engine, and this flexibility is always pleasant for users.
The downsides of this paper are minor. First, the authors reuse many components that may not be optimized for the main-memory scenario, such as the query optimizer, storage, and logging; for example, logging to disk to ensure durability may become a bottleneck of the system. Second, although MVCC brings many advantages, it requires garbage collection, which introduces overhead. Finally, the experiments in this paper are limited: the authors do not compare against other in-memory systems optimized for OLTP workloads.
This paper introduces Hekaton, a database engine that is not a pure main-memory database but is optimized to take as much advantage as possible of in-memory data. In particular, it allows tables to be explicitly declared as in-memory tables that keep all of their data in memory. For these “Hekaton tables” it uses optimized indexes consisting of hash indexes and modified B-trees, called Bw-trees, that avoid locking. Hekaton is also optimized for common queries (rather than ad hoc queries) by including a compiler that compiles stored procedures into native machine code that is highly optimized for in-memory operations. It supports multiple isolation levels, with the stronger levels enforced by a snapshot-isolation-based strategy that catches violations through commit-time validation rather than by taking locks. Transactions are logged to stable storage to provide a durability and recovery guarantee.|
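The hash-index lookup path mentioned above can be sketched as a bucket array with per-bucket record chains. This is a single-threaded toy for illustration only; the class names are assumptions, and the real index links records with lock-free chains rather than plain pointers:

```python
# Hedged sketch of a Hekaton-style hash index over memory-resident rows:
# the index holds direct references to records (no disk pages), and a
# lookup hashes the key to a bucket and walks that bucket's chain.
class Record:
    def __init__(self, key, value, nxt=None):
        self.key, self.value, self.next = key, value, nxt

class HashIndex:
    def __init__(self, nbuckets=8):
        self.buckets = [None] * nbuckets

    def insert(self, key, value):
        i = hash(key) % len(self.buckets)
        # Prepend the new record to the bucket's chain.
        self.buckets[i] = Record(key, value, self.buckets[i])

    def lookup(self, key):
        rec = self.buckets[hash(key) % len(self.buckets)]
        while rec is not None:
            if rec.key == key:
                return rec.value
            rec = rec.next
        return None

idx = HashIndex()
idx.insert("alice", {"balance": 100})
idx.insert("bob", {"balance": 50})
assert idx.lookup("alice") == {"balance": 100}
assert idx.lookup("carol") is None
```

A hash index like this supports only point lookups, which is why Hekaton pairs it with the Bw-tree, an ordered index, for range scans.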
An obvious advantage of Hekaton is that it is optimized for memory-resident data, so it can handle transactions on those tables much faster than other database engines. The compiler seems particularly distinctive, as it optimizes frequently repeated queries all the way down to native code. Hekaton also supports multiple isolation levels, which is thorough, and it maintains durability by logging to stable storage rather than relying on volatile main memory.
One disadvantage of Hekaton is that it is not optimized for ad hoc queries, so it works well for OLTP workloads, where queries are fairly standard, but is less ideal otherwise. Another weakness is that compiled stored procedures offer a “limited set of options” and do not cover the full surface area of T-SQL. One thing I was unsure about is that Hekaton tries to avoid lock-based strategies, whereas the previous paper we read stated that main-memory databases favor lock-based strategies because the increased speed decreases lock contention; the authors do not seem to address this discrepancy.