Review for Paper: 13-Main Memory Database Systems: An Overview

Review 1

This paper provides an overview of main memory database systems (MMDB). MMDBs store data in main physical memory and provide high-speed access, and this paper surveys optimizations designed for MMDBs.

MMDB differs from disk DB in the following aspects:
1. The access time of main memory is much shorter than that of disk
2. Main memory is normally volatile, while disk is not
3. Disk has a high fixed cost per access regardless of how much data is read, so it is block-oriented; main memory is not
4. Data layout is crucial to performance on disk, which is much less true for an MMDB
5. Main memory is directly accessible by the processor, making its data more vulnerable to software errors

One thing to keep in mind is that one should always have a backup copy of the database on disk, to survive potential media failure. Some factors force frequent backups:
1. Memory is vulnerable to OS errors
2. Media failure means loss of the entire database, and recovery is slow
3. Active devices such as battery backup lead to a higher probability of data loss than disks

The paper introduces the impacts of memory-resident data. Aspects include concurrency control (large lock granules can be used), commit processing (a stable log is still needed), access methods (index structures can store pointers to the indexed data), data representation (tuples can be represented as sets of pointers to values), query processing (compact data structures can be built to speed up queries), recovery (checkpointing to keep the backup current, and recovery from failures), performance (the performance of checkpointing and backup is critical), application interface (giving users the memory position of an object allows better performance, but raises authorization issues and means the system can't log all changes), and data clustering (since tuples store pointers to values, migrating data to disk is tricky).

Some DBMS for memory resident data (OBE, MM-DBMS, IMS/VS, MARS, HALO, TPK) are introduced and compared. They differ in the aspects mentioned in the above section.

In general, the paper is well-structured and clearly illustrates the design considerations for an MMDB system. In the section describing different MMDB systems, I think it would be better if the performance impact of the different designs were discussed (ideally with benchmarks).



Review 2

Recently, memory prices have been decreasing, which allows a database system to keep its data entirely in main memory. The advantage of a main memory database (MMDB) is obvious: memory provides faster access and efficient random access, which a traditional disk resident database (DRDB) cannot offer. The paper aims to introduce main memory database techniques, including architecture, concurrency control, access methods, and recovery mechanisms, to give insight into MMDBs.

Some of the strengths of this paper are:
1. Main memory databases provide faster and random access; the paper analyzes in detail concurrency control and lock implementations that decrease lock contention and improve time and space efficiency.
2. The new query processing methods for MMDBs take advantage of random access and avoid sorting the whole relation.
3. Recovery and protection techniques are well developed for the case where the application can directly access memory.

Some of the drawbacks of this paper are:
1. This paper has no experiments, so it does not show how much better an MMDB performs than a DRDB, or in which situations.
2. Main memory is still smaller than most secondary storage such as disks, so large data sets won’t fit in main memory.
3. Main memory suffers from power instability. Even with a recovery mechanism, recovering from a failure still needs a lot of resources because disk operations are costly. If failures happen regularly, that would be a big cost for the database.



Review 3

Main memory database systems (MMDB) have become a reality due to cheaper semiconductor memory and increased chip densities. As conventional databases are optimized for disk storage mechanisms, different optimizations for structuring and organizing data, as well as for making the database reliable, must be considered for MMDBs. Therefore, this paper surveys the major memory residence optimizations and briefly discusses some of the memory resident systems that had been designed or implemented.

This paper began with a review of the differences between main memory and disks:
1. Access time is much shorter for main memory than for disk storage.
2. Main memory is volatile while disk storage is not.
3. Disks are block-oriented while main memory is not.
4. Sequential access is not as important in main memory as that in disks.
5. Data in main memory is more vulnerable than data in disks.

It is reasonable to consider MMDBs because, first, in some cases the database is of limited size or is growing at a slower rate than memory capacities are growing; second, in some real-time applications, the data must be memory resident to meet the real-time constraints; third, in large applications, hot data that is accessed frequently is usually low in volume and has stringent timing requirements, and thus can be stored in main memory.

The main difference between an MMDB and a DRDB with a large cache is that the DRDB does not take full advantage of the memory. Also, we cannot simply assume that main memory becomes nonvolatile and reliable by introducing special-purpose hardware. Memory backups have to be taken relatively frequently because, first, memory is more vulnerable due to direct accessibility by the processor; second, a memory board failure leads to loss of the whole database and a more time-consuming recovery; third, battery-backed memory leads to a higher probability of data loss than disks.

This paper also introduced the impact of memory residency on some functional components of DBMSs:
1. For concurrency control, lock contention may not be as important, and large lock granules are more appropriate. Besides, the actual implementation of the locking mechanism can also be optimized for memory residence of the objects to be locked.
2. For commit processing, to deal with the longer response time and lower throughput caused by the stable log, the first solution is to use a small amount of stable main memory to hold a portion of the log. Other solutions are precommitting transactions and group commit.
3. For access methods, the data values on which the index is built need not be stored in the index itself, and index structures can store pointers to the indexed data, rather than the data itself.
4. For data representation, relational tuples can be represented as a set of pointers to data values.
5. For query processing, query processors for memory resident data must focus on processing costs, whereas most conventional systems attempt to minimize disk access.
6. For recovery, the first task is to keep the backup up to date, and the second is to recover from a failure. One principle is that checkpointing should interfere as little as possible with transaction processing. Possible solutions are transaction-consistent checkpoints and fuzzy dumping. Disk striping or disk arrays can also be used to speed up restoring the database.
7. For performance, the performance of a MMDB depends primarily on processing time, not on the disks.
8. For application programming interface and protection, access to objects in a MMDB can be more efficient by giving applications the actual memory position of the object, or by eliminating the private buffer and giving transactions direct access to the object.
9. For data clustering and migration, we need to note that migration and dynamic clustering are components of a MMDB that have no counterpart in conventional database systems.
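
Item 3 above can be made concrete with a small sketch. The Python below is illustrative only and is not the paper's T-tree: it is a sorted index that holds references to tuples rather than copies of key values, and reads the key through the reference at comparison time. The names `PointerIndex`, `insert`, and `lookup` are hypothetical.

```python
from typing import Callable, List, Optional


class PointerIndex:
    """Sorted index that stores references to tuples, not copies of key values."""

    def __init__(self, key_of: Callable[[dict], object]):
        self._key_of = key_of          # follows the reference to read the key
        self._refs: List[dict] = []    # references only; no key is duplicated here

    def _position(self, key) -> int:
        # Binary search; each comparison dereferences a tuple to get its key.
        lo, hi = 0, len(self._refs)
        while lo < hi:
            mid = (lo + hi) // 2
            if self._key_of(self._refs[mid]) < key:
                lo = mid + 1
            else:
                hi = mid
        return lo

    def insert(self, tup: dict) -> None:
        self._refs.insert(self._position(self._key_of(tup)), tup)

    def lookup(self, key) -> Optional[dict]:
        pos = self._position(key)
        if pos < len(self._refs) and self._key_of(self._refs[pos]) == key:
            return self._refs[pos]
        return None


employees = [
    {"id": 7, "name": "Ada"},
    {"id": 3, "name": "Edgar"},
    {"id": 5, "name": "Grace"},
]
index = PointerIndex(key_of=lambda t: t["id"])
for e in employees:
    index.insert(e)

hit = index.lookup(5)   # the very same object as employees[2], not a copy
```

Because the index entry is just a reference, the lookup returns the stored tuple itself, which is cheap only when following a pointer is cheap, as it is in main memory.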

This paper also gave representative MMDB systems, including OBE, MM-DBMS, IMS/VS Fast Path, MARS, HALO, TPK, and System M.

The main contribution of the paper is that it gives a clear introduction to the main optimization directions and achievements of MMDBs over DRDBs, and analyzes the advantages and disadvantages of the most popular MMDB systems. The paper gives enough examples and details to make it easy to understand.

The main drawback of this paper is that it didn’t summarize the problems of existing designs of MMDBs and the possible improvements of these systems.


Review 4

Problem & Motivation
The authors were invited to give an introduction to MMDBs and to summarize the techniques that MMDBs typically use.


Main Achievement
The paper introduces many concepts and techniques that are widely used in MMDBs.
For example:
1. Both DRDBs and MMDBs can have copies of data in memory and on disk. The key difference is that in an MMDB, the primary copy lives permanently in memory.
2. The key differences between memory and disk: access time, volatility, block orientation, data layout, and vulnerability.
3. Preconditions of an MMDB: the data fits in memory, the system is optimized for memory-resident data, and memory is not assumed to be reliable.
4. The special techniques that can be used in an MMDB: T-trees, additional bits in memory for locks, group commit, stable memory, and so on.

Drawback
The paper is clearly organized, but it would be better if it included more details about these techniques (for example, what the alternative to the traditional sort-merge join is for query processing).


Review 5

As technology continues to progress, it is often the case that previously valid assumptions or limitations that went into the design of a system no longer hold, resulting in changes or the introduction of new designs with different capabilities. This has been the case with high performance database systems as well. The paper “Main Memory Database Systems: An Overview” provides an overview of memory resident database systems (MMDBs), a new (at the time) type of database system that stores data in main physical memory, as opposed to the conventional ones that use hard drives for storage. Memory is orders of magnitude faster than disks when it comes to data access, but for a long time, constructing a database using only memory for storage was prohibitively expensive. At the time this paper was written, falling prices due to improved processes and other trends made it finally feasible to entertain the possibility of large scale MMDBs.

Due to the fundamental differences between memory and hard drives, MMDBs have different design considerations that this paper attempts to discuss and address. For example, since memory access is so much faster than disk, transactions complete much more quickly, reducing lock contention. Therefore, large lock granules rather than small ones are more appropriate for MMDBs. Also, transactions may complete so quickly that logging becomes a bottleneck (since every transaction must write to the log), a problem that conventional systems do not have to deal with in most cases. This can be alleviated by writing to the log in batches, rather than as soon as each transaction completes. Additionally, since sequential access is not significantly faster than random access in MMDBs, as it is in disk-based systems, the data structures (B-trees), data representation, and query processing that were designed to optimize disk access times can all be changed to adapt to MMDBs. One disadvantage of MMDBs is the increased risk of data loss, due to the fact that main memory is volatile and loses its data if it is turned off or loses power. This requires more frequent backups and checkpointing schemes that increase the system’s robustness to failure while not unduly slowing down transaction processing. The paper concludes by discussing some real-world MMDB systems in use and/or development, such as IMS/VS Fast Path, TPK, and System M.

The strength of this paper lies in its thorough discussion of the design challenges facing MMDBs, and it was written at a time when industrial scale MMDBs were first starting to become cost-effective. It was probably the case that many designers then were still unsure about whether it was even practical to design databases where data fit entirely in main memory, as well as about the various implementation challenges stemming from the physical differences between memory and hard drives. This paper lays everything out in a straightforward fashion, and even includes a section on actual MMDBs and the features/characteristics of each. As an informative tool, this paper was quite effective.

This paper does not have any weaknesses, since it is not presenting any new findings of its own. Any objections have to do with the MMDBs themselves, such as the relatively simplistic treatment of concurrency control, and the lack of discussion of other possible schemes such as multi-versioning and whether or not they would work well with a main-memory dominated database system.


Review 6

The main contribution of this paper is its description of the main memory database system (MMDB) by comparison with the traditional disk resident database system (DRDB). The paper presents the shortcomings of the more traditional architecture, shows how main memory improves upon these areas, and also raises new challenges. This is significant because there are use cases, like telecommunications, where the database must reside in main memory to meet real-time constraints, in which case we can speed up transactions because main memory access is much faster than magnetic disk access.

MMDBs improve on the old system in concurrency control, since the system is not as negatively impacted by locks thanks to the faster transactions of main memory compared with disk. The paper discusses how to reduce transaction logging overhead, as well as how main memory gives better access times since it does not need the supporting structures that disk storage requires for data access. We saw how we could take advantage of pointers in main memory as well. The sort-merge join example shows that sequential access is not significantly faster than random access in a memory resident system, unlike on disk. Recovery is performed by keeping backups of the data on disk and then bringing them up to date using the log. We also saw that the performance of an MMDB depends on processing time, whereas that of a DRDB depends on the number of I/O operations.

I liked how each section of information was concise in this paper. The information was presented, often with an example, and then the next logical idea was presented. This was most evident in section II, where each component of the DBMS was considered and evaluated for an MMDB versus the traditional disk resident system. This section is where I developed the best idea of where main memory storage showed improvements over disk and where it raised new challenges and did not necessarily improve in an area, namely the Data Clustering section, which spoke about how an MMDB need not cluster data, so there is a new question of the optimal way to migrate it to disk. I also liked the table they provided of existing and proposed MMDB implementations. This helped summarize the body of text that followed and relate the implementations to one another.

What I did not like about this paper was its very limited number of graphics. I find visuals helpful in a paper, although I found the material in this paper to be high-level enough that they were not absolutely necessary. In general it was a good read, but more visuals might have saved some time and imagination in processing the text.



Review 7

This paper gives an overview of MMDBs, which store data in main memory to achieve high-speed access. An MMDB differs from a conventional DBMS (DRDB) in that it uses different structural optimizations and storage mechanisms, although the two seem to share a feature: a DRDB also loads data into main memory for access, which looks the same as an MMDB. However, the key difference between a DRDB and an MMDB is that the latter keeps the data permanently in main memory.

The paper first explains why MMDBs have emerged: main memory keeps getting cheaper, and the need for real-time processing is increasing. Then it answers some commonly asked questions: 1) An MMDB assumes the entire database fits in main memory. 2) Although a DRDB with a large cache looks the same as an MMDB, the DRDB does not take full advantage of the memory. 3) Main memory is not guaranteed to be both nonvolatile and reliable. Given these answers, the design of an MMDB has to balance the overhead of backups against that of disk writes.

Then the paper introduces the differences between a traditional DBMS and an MMDB in various aspects: 1) Concurrency control: lock contention matters little in main memory, so large lock granules (even the entire database) are suggested for memory resident data. In the implementation, the common case can avoid a hash table lookup. 2) Commit processing: a transaction is usually made safe by writing its log record to stable storage before it commits, but this hurts MMDB performance. Some techniques address this: a) using a small amount of stable main memory to hold part of the log, and b) group commit. These methods reduce the impact of logging. 3) Access methods: because the properties of memory are different, the B-tree is replaced by other forms of hashing and trees. Fast random access also means data values no longer need to be stored in the index. The use of tuple pointers keeps the data structures fast and space efficient. 4) Data representation: pointers are used, which simplifies things. 5) Query processing: the very fast sequential access of disks is no longer that appealing in main memory, so the relational operations and query optimizations change. 6) Recovery: backups are stored on disk or other stable storage. In a memory resident DBMS, only checkpointing and failure recovery need to access the disk, so disk I/O can use large block sizes, which are written efficiently. During recovery, the data can be loaded on demand. 7) Performance: backups do not interfere with normal operation in a traditional DBMS, but they do in an MMDB. 8) Data clustering: in a DRDB, data objects are clustered, while in an MMDB they are not, because the components of an object may be dispersed in memory.

The paper also introduces several MMDBs and their specific properties. The main contribution of this paper is that its comprehensive overview of the MMDB system and its comparison between MMDB and DRDB gave me a clear understanding of the topic. The paper also introduces many applications of MMDB systems, which makes the article convincing.

One suggestion: it would be better to devote more space to the specific applications and describe them in more detail.




Review 8

This paper gives an overview of the Main Memory Database System, which is different from the Disk Resident Database System. The purpose of the paper is to survey the major memory residence optimizations and to discuss memory resident systems.

The paper first points out that it is reasonable to assume the entire database can fit in memory. Then the paper compares main memory with disk and points out the differences:
1. The access time of main memory is much shorter than that of disk.
2. Main memory is normally volatile.
3. The layout of data is different.
The differences listed above lead to many design differences in concurrency control, commit processing, access methods, recovery, and so on. As for concurrency control, since main memory access is much faster than disk access, we would expect transactions to complete more quickly. Also, lock contention is not that important, and we tend to lock at a larger granularity. Then, for commit processing, we need a backup copy on disk. The backup copy is used to protect against media failure and power outage. The MMDB also needs a stable log, which can undermine performance. Also, the index for main memory should be different: we can use pointers in the leaf nodes, since following pointers in main memory is very fast.

The main contribution of this paper is that it gives a thorough overview of the main properties of main memory databases and discusses the existing designs and implementations of such databases. Also, this paper reminds us that main memory resident database systems will become more common in the future.
The weak point of this paper is that it only introduces the existing models but fails to compare them. I think the authors should state a preference among the models, and a comparison would also be good.



Review 9

This paper describes optimizations for main memory database systems. First, the authors describe how memory differs from disk (especially in this context), when main memory databases would be useful, and how they differ from systems that seem similar on the surface, like a disk resident database (DRDB) with a very large cache. Next, they describe various aspects of a traditional DBMS that must be modified for a main memory system. Finally, they discuss various existing MMDB implementations, from purely theoretical designs to databases that are available commercially. The authors emphasize that as memory becomes cheaper, it becomes more and more possible to store frequently accessed (“hot”) data in main memory. As an aside, I found myself wondering how main memory databases are now thought of when solid state drives are commonly used for database systems.

The authors do a good job presenting what is essentially a paradigm shift. Main memory databases inherently have different problems to solve than DRDBs. While they have separate sections describing nine of these problems, I thought that a few were particularly well done. First, section A describes how locking can afford to be less fine-grained on memory-resident databases, because faster access times mean that data is under less contention. Additionally, locks can be stored along with the data, allowing a minimal number of machine instructions when there is low contention. The other general idea that I thought was interesting is that query processors, and really every aspect of an MMDB, must focus on processing speed instead of on disk access speed. This is in stark contrast to many of our class discussions that have had some focus on the speed of disk I/O, which informs the architecture of the entire system.
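
The idea of storing locks along with the data can be sketched roughly as follows. This is a single-threaded illustration under my own assumptions: a real system would use an atomic test-and-set instruction, and the names `Record`, `acquire`, `release`, and `wait_table` are hypothetical. Two bits in the object record "locked" and "someone is waiting"; the separate lock table is consulted only under contention.

```python
from collections import defaultdict, deque

LOCKED = 0b01    # bit 1: the object is locked
WAITERS = 0b10   # bit 2: at least one transaction is waiting


class Record:
    def __init__(self, value):
        self.value = value
        self.lock_bits = 0   # lock status lives in the object itself


# Conventional lock table, consulted only when there is contention.
wait_table = defaultdict(deque)


def acquire(rec: Record, txn: str) -> bool:
    """Return True if the lock was granted on the fast path."""
    if not rec.lock_bits & LOCKED:
        rec.lock_bits |= LOCKED          # fast path: test and set one bit
        return True
    rec.lock_bits |= WAITERS             # slow path: remember the waiter
    wait_table[id(rec)].append(txn)
    return False


def release(rec: Record):
    """Release the lock; return the transaction that inherits it, if any."""
    if not rec.lock_bits & WAITERS:
        rec.lock_bits = 0                # fast path: no table lookup needed
        return None
    queue = wait_table[id(rec)]
    next_txn = queue.popleft()           # lock passes straight to the waiter
    if not queue:
        rec.lock_bits &= ~WAITERS
        del wait_table[id(rec)]
    return next_txn


r = Record(42)
granted_t1 = acquire(r, "T1")   # uncontended: only the bit is touched
granted_t2 = acquire(r, "T2")   # contended: T2 is queued in the table
heir = release(r)               # T1 releases; T2 inherits the lock
```

In the uncontended case neither `acquire` nor `release` touches `wait_table`, which is the point the paper makes about avoiding the hash table lookup.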

One thing that I was unsure about in the paper was group commit, discussed on pages 511 and 512. The problem that this technique aims to solve is that writing to a log in an MMDB becomes an intensive operation, because other operations do not require the slow operation of direct disk access. The suggestion is that log records are not sent to disk when commits happen; instead, log records for many commits are stored in memory, and when enough have accumulated, they are written to disk. This seems like a good method to reduce the number of disk writes, but it also seems like something that could lead to data loss if the database went down. Either this is a flawed process (perhaps unlikely), or the authors did not mention some necessary context, for example that the unflushed logs were stored in nonvolatile memory.
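
One possible resolution, sketched below, is that a transaction is not reported as committed until its batch reaches the log disk, which is how group commit is usually described. That is an assumption here, not something this review confirms about the surveyed systems, and `GroupCommitLog` and its fields are hypothetical names. Under that assumption, a crash can only lose transactions that were never acknowledged.

```python
class GroupCommitLog:
    """Buffers commit records and flushes them to stable storage in batches."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self.buffer = []         # volatile: lost on a crash
        self.stable = []         # stands in for the log disk
        self.acknowledged = []   # transactions told "you are committed"
        self.flushes = 0

    def commit(self, txn_id: str, record: str) -> None:
        # The record is queued, but the transaction is NOT acknowledged yet.
        self.buffer.append((txn_id, record))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        self.stable.extend(self.buffer)       # one "disk write" for the batch
        self.flushes += 1
        self.acknowledged.extend(t for t, _ in self.buffer)
        self.buffer.clear()

    def crash(self) -> None:
        self.buffer.clear()                   # unflushed records vanish


log = GroupCommitLog(batch_size=3)
for i in range(1, 5):
    log.commit(f"T{i}", f"update {i}")
log.crash()   # T4 was still buffered and never acknowledged
```

Four commits cost one disk write here, and every acknowledged transaction (T1 to T3) has its record on stable storage. The price is higher latency for each transaction, since it waits for the batch to fill.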


Review 10

This paper gives an overview of main memory database systems. The paper can be divided into two parts: in the first part, the authors discuss several optimizations that can be made in main memory database systems but not in traditional disk resident systems. In the second part, several memory resident systems are described. The description mainly focuses on the design choices these systems made to address the problems raised in the first part. As the price of a byte of main memory keeps dropping relative to the cost of disk accesses per second, the authors believe that memory resident database systems will become increasingly common in the future. Thus the topics in this paper will become more important and commonplace.

In terms of the optimization opportunities, the paper provides 9 examples. In my opinion, most of these optimizations can be divided into two groups based on their cause. The optimizations in the first group result from the fact that random access in memory is fast. This fact allows us to use new access methods, data representations, and query processing techniques. The other group is based on the high speed of main memory. Thus we can have new lock mechanisms and new commit and recovery techniques, and performance evaluation should focus on processing time instead of the number of I/O operations. Of course, memory resident systems also face some new problems. For example, if data are not clustered in memory (since random access in memory is fast), how and where should an object be stored upon migration to disk? Also, if applications built on top of memory resident systems are given the actual memory position of an object (so that the buffer pool can be eliminated and unnecessary data copies avoided), how can we prevent them from reading or modifying unauthorized objects?


The paper does a good job in the introduction, where the differences between memory resident and disk resident systems are listed and some commonly asked questions are addressed. The introduction gives the reader sufficient background to understand the discussion of optimizations. However, in the second part of the paper, the discussion of each system simply lists its design choices for these optimization opportunities, with no analysis or explanation. If a reader is not familiar with a system, then the discussion probably makes little sense to him/her.




Review 11

In the paper "Main Memory Database Systems: An Overview", Hector Garcia-Molina and Kenneth Salem review the benefits and drawbacks of using a main memory database system (MMDB) as opposed to traditional database systems (DRDB). They believe that this concept is feasible and interesting due to the availability and lower costs of newer semiconductors that have greater chip densities. When comparing main memory to disk storage, there are some notable differences that lead to many implications:

1) Main memory access time <<<< disk access time
2) Main memory is volatile while disk is not. (It is possible to make main memory not volatile)
3) Disks are block oriented while main memory is not
4) Sequential access is fastest on disks while for main memory it does not matter
5) Main memory is more vulnerable to software issues

Furthermore, Garcia-Molina and Salem assume that they are working with databases that meet the criteria to fit into main memory. Even where a database does not fit, they claim that it can be handled by an MMDB by targeting a partition of the database that holds "hot" data (data that is accessed frequently). They also assert that even though a DRDB caches part of its data in main memory, it does not take full advantage of that memory. For example, it still uses index structures designed for disks even when the data is cached in memory, and it goes through the buffer manager every time a tuple is accessed. Once these issues are optimized away, a DRDB will gradually morph into an MMDB.

In this paper, Garcia-Molina and Salem go over the components of MMDB and existing implementations. The components are the following:

1) Concurrency Control: Using very large lock granules is preferred. If the entire DB is locked, we have serial execution - a desirable trait. Furthermore, lock status can be tracked with a few extra bits in the data objects themselves.
2) Commit Processing: Due to volatility, a log on stable storage is always needed. There are two approaches: either store the log in stable main memory or pre-commit transactions.
3) Access Methods: B-trees are not a good fit for main memory; there are implementations with hash tables and other trees. More importantly, the index structures can hold pointers to data rather than the data itself, which greatly reduces overhead.
4) Data Representations: Relational tuples can be represented as a set of pointers to data values, which is space efficient.
5) Query Processing: There is no need for sorted relations. Rather, there is more emphasis on minimizing processing cost than on minimizing disk access. Because processing costs vary between systems, an optimization technique that works well on one system may work poorly on another.
6) Recovery: Data needs to be brought back from disk. Disk striping or disk arrays could be used to speed this up.
7) Performance: Processing time in an MMDB is harder to measure than the I/O counts used for DRDB systems. Furthermore, MMDB systems take backups much more frequently due to volatility - a bottleneck of the design.
8) Application Programming Interface and Protection: There are some security vulnerabilities, but elimination of the private buffer allows fewer instructions per transaction.
9) Data Clustering and Migration: Clustering is unnecessary while everything is in memory; it becomes a question only when objects migrate to disk.
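
To illustrate why item 4 is space efficient, here is a minimal sketch, assuming a simple shared value pool (the `ValuePool` class is hypothetical and not taken from the paper). Each distinct value is stored once, and tuples hold references to it.

```python
class ValuePool:
    """Stores each distinct value once and hands out references to it."""

    def __init__(self):
        self._pool = {}

    def ref(self, value):
        # Return the single stored copy, adding it on first sight.
        return self._pool.setdefault(value, value)

    def __len__(self):
        return len(self._pool)


pool = ValuePool()

# Build the strings at run time so that equal values start out as
# separate objects, as they would when read from separate input rows.
raw_rows = [
    ("".join(["Al", "ice"]), "".join(["Ann ", "Arbor"])),
    ("".join(["B", "ob"]), "".join(["Ann ", "Arbor"])),
]

# Each tuple is a set of references into the pool.
rows = [(pool.ref(name), pool.ref(city)) for name, city in raw_rows]

shared = rows[0][1] is rows[1][1]   # both tuples point at one city object
```

The two tuples share a single copy of the city value, so a large value that appears many times is stored only once.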

Even though the paper was well organized and very easy to understand, there were still some drawbacks. I felt that support for MMDBs could have been justified through some experiments that detailed the greater performance. If anything, they could have taken the existing MMDBs that they discussed and compared them to their DRDB counterparts. Another thing that I noticed was that their conclusion felt out of place. They introduced a concept called the 5-min rule and extended it so that the 5-min rule becomes the 10-min rule once memory becomes cheaper. How did they get to 10? Is this an arbitrary number? When will this rule stabilize (reach a point where the interval doesn't increase)?
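
One way to approach these questions is the break-even argument usually given for the rule: a page accessed once every T seconds uses 1/T of a disk access per second, so keeping it in memory pays off when the memory for the page costs less than that share of disk capacity. The sketch below works through the arithmetic; the prices are illustrative assumptions, not figures taken from the paper, and `break_even_seconds` is a hypothetical helper.

```python
def break_even_seconds(cost_per_disk_access_per_sec: float,
                       cost_per_page_of_memory: float) -> float:
    """Interval between accesses at which caching a page costs the same
    as re-reading it from disk on every access."""
    return cost_per_disk_access_per_sec / cost_per_page_of_memory


# Illustrative prices only (assumed for this example).
today = break_even_seconds(2000.0, 5.0)           # 400 seconds
cheaper_memory = break_even_seconds(2000.0, 2.5)  # memory price halved
```

With these numbers the interval is 400 seconds, on the order of five minutes, and halving the memory price doubles it. Read this way, "10" is not a derived constant but an example of the interval growing as memory gets cheaper relative to disk accesses, and it keeps growing for as long as that price ratio keeps falling.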


Review 12

This paper describes the advantage of a DBMS stored primarily in main memory. Ordinarily, due to its large size, a database will have to be stored on disk, with portions brought into memory as needed. However, in certain circumstances, the entire database can be stored in memory at once. The primary advantage of this is the increased speed, but there are several differences.

Memory access time is much faster than disk access time, so database operations will generally run faster. Also, main memory doesn’t have very much of a difference between sequential and random access, so a DBMS in memory doesn’t have to worry about storage locations. Pointers can also be used effectively in a main memory DBMS, since random access is much quicker. On the other hand, optimizations built for serial reads are less effective in a main-memory storage. In general, optimization will be based more on processing time, instead of disk IO, which generally only happens due to backups.

Main memory is volatile, however. Even in the best-case scenario, main memory will fail under certain circumstances. As such, any memory-based DBMS needs to have a backup on disk. In order to effectively restore the database, frequent backups should be generated. Backing up the database to disk takes much longer than most of the database operations, however. A log can be kept of database operations, and only the log pushed to disk whenever transactions commit. In this case, whenever the system crashes, the database can be reconstructed from the backup and the log. Pushing the log to disk still adds extra time to transactions, though, so the log should also be pushed as infrequently as is safe.
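
The reconstruction step can be sketched in a few lines. This assumes a redo-only log of (key, new value) pairs that contains only committed updates made after the backup was taken; the function name `recover` and the data are hypothetical.

```python
import copy


def recover(checkpoint: dict, log: list) -> dict:
    """Rebuild the in-memory state from a disk backup plus the log tail."""
    db = copy.deepcopy(checkpoint)        # reload the last backup
    for key, value in log:                # replay committed updates in order
        db[key] = value
    return db


checkpoint = {"a": 1, "b": 2}                 # backup taken earlier
log_since_checkpoint = [("b", 20), ("c", 3)]  # committed after the backup
restored = recover(checkpoint, log_since_checkpoint)
```

The more often backups are taken, the shorter the log tail that has to be replayed, which is the trade-off the paragraph above describes.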

Serializability can have different priorities in a main memory DBMS as well, due to the increased transaction speed. Since transactions are faster, they’ll hold onto locks for much less time, and so a standard locking scheme will have much better performance due to less contention for the locks. In fact, if the transactions are fast enough, the DBMS can force transactions to run serially, which eliminates the need for any concurrency control.



Review 13

The paper gives an overview of main memory database systems. It introduces the differences between memory and disk. From the answers the paper gives to three frequently asked questions about MMDB, one can quickly gain a basic understanding of the topic. First, in most cases the entire database fits in main memory, and in the exceptional cases the data can be partitioned into sets, some of which fit in an MMDB. Second, a DRDB with a very large cache is similar to an MMDB, but an MMDB accesses memory-resident data more efficiently than a system still designed around disk access. Third, main memory cannot be assumed to be nonvolatile and reliable, so backup strategies are still needed.
The paper discusses the impact of memory resident data comprehensively, covering aspects from concurrency control to data clustering and migration. The most likely situation in concurrency control (with low contention) is for a transaction to lock a free object, update it, and release its lock before any other transaction waits for it. In this case, both the lock and the release can be done with a minimal number of machine instructions, avoiding the hash table lookup entirely. To protect against media failures, it is necessary to have a backup copy and to keep a log of transaction activity. One observation common to all main memory access methods is that the data values on which the index is built need not be stored in the index itself, as is done in B-Trees. Main memory databases can also take advantage of efficient pointer following for data representation. Query processors for memory resident data must focus on processing costs, whereas most conventional systems attempt to minimize disk access. Backups of memory resident databases must be maintained on disk or other stable storage to insure against loss of the volatile data.
In addition, it discusses systems for memory resident data comprehensively, from OBE to System M.
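
The low-contention locking path described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's code: the Record class, the two flags, and the function names are hypothetical, and the sketch is single-threaded, whereas a real system would set and clear the flags with atomic instructions.

```python
class Record:
    """Memory-resident object that carries its own lock state (hypothetical layout)."""
    __slots__ = ("value", "locked", "has_waiters")

    def __init__(self, value):
        self.value = value
        self.locked = False       # first flag: the object is locked
        self.has_waiters = False  # second flag: some transaction is waiting


def try_lock(rec):
    # Fast path: the object is free, so setting one flag acquires the lock
    # without touching any lock table.
    if not rec.locked:
        rec.locked = True
        return True
    # Slow path: a real system would now enqueue the requester in a
    # conventional lock table; here we only note that a waiter exists.
    rec.has_waiters = True
    return False


def unlock(rec):
    # Returns True when the caller would have to visit the lock table to
    # wake waiters. Clearing has_waiters here is a simplification that
    # assumes all waiters are handed off at once.
    must_wake = rec.has_waiters
    rec.locked = False
    rec.has_waiters = False
    return must_wake


rec = Record(10)
got_lock = try_lock(rec)      # free object: fast path
rec.value += 1
needed_wakeup = unlock(rec)   # nobody waited: no lock table visit
```

In the common case both calls touch only the object itself, which is what makes the lock and release so cheap.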




Review 14

This paper is an overview of the features of main memory database systems. The main difference between an MMDB and a conventional database system is that data resides permanently in main physical memory instead of on disk. This is feasible because semiconductor memory is becoming cheaper and chip density is increasing. The main differences between main memory and disk are that memory has a shorter access time, is volatile, offers faster random access, and is directly accessible by the processor. However, more frequent backups are needed. The paper discusses the impact of memory residency on some of the functional components of an MMDB. For concurrency control, since memory access is faster, locks are held for a shorter time, which favors large lock granules. Also, in a conventional system the stored objects do not hold lock status; with objects in memory, we can afford a small number of bits in each object to represent its lock status. For commit processing, backup copies and a log of transactions are needed to protect against media failure. These need to be kept on stable storage, such as redundant disks, which can be a bottleneck. The paper presents two solutions: pre-committing to reduce blocking delays and group commit to relieve the log bottleneck. For access methods, trees that are deeper and less bushy than a B-tree, as well as inverted indexes, are suitable in a main memory database. In query processing, an MMDB needs to focus on optimizing processing cost instead of disk access cost. For recovery, the topics of backup copies and transaction logs are revisited. Since checkpointing and failure recovery need to access disk, they should interfere as little as possible with transaction processing. To avoid spending too much time restoring bulk data from disk, restoring on demand can be used. Since backup and checkpointing tend to be the bottlenecks of an MMDB, the algorithms for the two are critical when it comes to performance analysis.
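
As a rough illustration of group commit, the following Python sketch batches commit records so that one log write covers several transactions. It is a simplification under stated assumptions: GroupCommitLog, group_size, and the list standing in for the disk are all hypothetical, and a real system would also flush on a timer.

```python
class GroupCommitLog:
    """Toy log buffer that flushes several commit records per write (hypothetical)."""

    def __init__(self, group_size):
        self.group_size = group_size
        self.buffer = []      # commit records waiting in memory
        self.disk = []        # each entry stands for one log write to disk
        self.durable = set()  # transactions whose commit record reached disk

    def commit(self, txn_id, record):
        # The record only joins the in-memory buffer; the transaction is
        # not reported as committed until its group is flushed.
        self.buffer.append((txn_id, record))
        if len(self.buffer) >= self.group_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # One disk write covers the whole group.
        self.disk.append(list(self.buffer))
        self.durable.update(txn_id for txn_id, _ in self.buffer)
        self.buffer.clear()


log = GroupCommitLog(group_size=3)
for t in range(7):
    log.commit(t, f"update by txn {t}")
log.flush()  # flush the final partial group
print(len(log.disk), "log writes for", len(log.durable), "transactions")
```

The tradeoff is that an individual transaction may wait slightly longer for its group to fill, in exchange for far fewer log writes overall.
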
Several database management systems for memory resident data have been proposed and implemented: MM-DBMS, MARS, HALO, OBE, TPK, System M, and Fast Path. This paper evaluates these implementations on how they address the issues in MMDB.

Overall, I like this paper because it is brief and addresses the key impacts on the functional components of a database management system. It could be improved by adding a summary section that recaps the advantages and disadvantages of MMDB and compares the performance of MMDB and DRDB. Some experimental data comparing MMDB and DRDB would be very helpful.


Review 15

This paper is a good overview of the field of main memory databases, which are very different from conventional database systems.

In the introduction, the paper gives the background that memory is becoming cheaper and cheaper, which makes MMDB possible. The paper also answers some common questions about MMDB, covering both its assumptions and its concepts.

The main contribution of the paper is an overall survey of the functional components of a DBMS, including concurrency control, commit processing, access methods, data representation, query processing, recovery, and performance. For each of these, the paper gives the key difference between a conventional DBMS and an MMDB, giving the reader an intuitive understanding of what makes MMDB distinctive.

Another contribution of the paper is its analysis of several MMDB systems in terms of these functional components. The table on page 6 is really helpful for readers to get an idea of the different systems.

Overall, the paper looks good to me. I don't see any downsides to this paper.


Review 16

In this paper, the authors provide an overview of main memory database systems, survey the major memory residence optimizations, and discuss some of the memory resident systems that have been designed or implemented. It is a survey-style paper that mainly compares traditional DRDB and MMDB across several aspects of database system design. The topic is valuable because an MMDB can differ considerably from a DRDB in performance, backup and recovery mechanisms, concurrency control, and so on. For example, an MMDB can provide much better response times and transaction throughput, which makes it preferable for real-time applications. Weighing the tradeoffs between DRDB and MMDB and selecting the right one for the use case is always important. The paper does a good job of providing an overview of MMDB: it first addresses three common questions about MMDB, then discusses the impact of memory residency on several aspects, and finally introduces some MMDBs that have been proposed or implemented and shows how they address these issues.

The first concern for an MMDB is whether the data can fit into memory. In some scenarios, it is fine to put the whole database into main memory. However, it is natural to have some large tables that will never fit in memory. A good solution to this problem is to partition the data into hot data and cold data and store the hot data in main memory. It is also worth noting that an MMDB is not equivalent to a DRDB with a very large cache. Access to data in a DRDB always goes through the buffer manager; whether or not the data is in RAM, the disk address must be mapped to a memory address, which adds overhead. Next, even if the RAM is nonvolatile and reliable, it is always good practice to back up frequently. For concurrency control in an MMDB, the paper suggests that very large lock granules are most appropriate for memory resident data. Since RAM is volatile, it is very important to keep a log of transactions so that the system can recover after a failure; however, logging can become a bottleneck. To solve this problem, one can use stable RAM, pre-committing, or group commit. For access methods in an MMDB, an index can hold pointers to the data, which saves space and accelerates access. These pointers can also be used to represent data values: each distinct value is stored only once, which achieves good space efficiency. For recovery in an MMDB, checkpointing and failure recovery access disk resident data, so very large blocks should be used to achieve better I/O performance. An application using an MMDB can access objects more efficiently, because the actual object address can be obtained easily and the use of private buffers is eliminated.
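
The idea of storing each value once and letting tuples point to it can be sketched in a few lines of Python. This is only an illustration: ValuePool and intern are hypothetical names, the rows are made-up sample data, and Python object references stand in for memory pointers.

```python
class ValuePool:
    """Stores each distinct value once and hands out references to it (hypothetical)."""

    def __init__(self):
        self._values = {}

    def intern(self, value):
        # setdefault returns the already-stored object when an equal value exists.
        return self._values.setdefault(value, value)


pool = ValuePool()

# Build equal strings at run time, so they are typically separate objects.
city_a = "".join(["Ann", " ", "Arbor"])
city_b = "".join(["Ann", " ", "Arbor"])

# Each tuple holds a reference into the pool rather than its own copy.
row1 = ("alice", pool.intern(city_a))
row2 = ("bob", pool.intern(city_b))

print(row1[1] is row2[1])
```

Both rows end up referring to the same stored string, so a value that appears in many tuples costs one copy plus one pointer per tuple.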

Generally speaking, this is a great survey paper summarizing the MMDB issues of the 1990s. I think its main contribution is a very good comparison between MMDB and DRDB in different aspects. For example, it explains why logging and recovery can be treated differently and why heavy use of pointers can bring better performance. From this paper, I clearly understand how an MMDB works and what the potential bottlenecks of such systems are. Another good point is that it provides real-world examples of MMDB systems, from which readers can understand how the design of an MMDB can vary to achieve different goals.

In my view, the paper has some drawbacks. First of all, the concurrency section only talks about locking; even if transactions in an MMDB are faster, lock-based concurrency control is not a good choice. The authors could consider other isolation levels, such as snapshot isolation or serializable, for high-concurrency cases and explore whether they improve concurrency. Besides, since this paper was written more than 25 years ago, it looks a little old-fashioned. DBMSs have evolved quickly in the past two decades, and new techniques have been invented to solve the problems mentioned in this paper. For example, SSDs have greatly improved disk I/O, so with newer techniques some of the questions addressed by this paper may no longer be open questions.



Review 17

This paper is a survey that describes the main differences between main memory database systems and disk storage database systems, as well as several advantages and disadvantages of each type of system. The main difference is obviously the placement of data: MMDBs store data in memory, while DRDBs store data mainly on disk (with the possibility of caching some data in memory). This difference influences several parts of a DBMS implementation, such as indexing (B-trees aren't as effective in MMDBs), transaction processing (MMDBs can process locks so fast that 2PL isn't that bad), and recovery (MMDBs have to worry about more frequent failure).
The chief advantage of MMDBs is the large speed benefit that in-memory operations have over disk storage; in particular, not having to maximize sequential disk access while minimizing random access is a big benefit. Also, two-phase locking is more feasible for MMDBs because lock contention is reduced by the increased speed.
There are several limitations and weaknesses for MMDBs, however. One obvious limitation is that MMDBs only work when the datasets are small enough to be stored in memory. Another weakness is that memory is much more volatile than disk storage, which makes frequent writing to logs more important. Maintaining a log in memory is dangerous for the same reason, but keeping the log on disk becomes a bottleneck, since every other operation is much faster. Also, disk storage systems can replicate many of the benefits of MMDBs simply by caching data in memory; this won't be as efficient, but it avoids the reliability issues that MMDBs have to worry about.
One final weakness, this time of the paper itself, is that it goes into very little detail on various topics, especially considering that it is only 8 pages long. In particular, I felt that the sections on why B-trees are not optimal for MMDBs and on keeping a log on disk versus keeping it in a portion of main memory could have been expanded much more while still staying in scope for this paper.