System R has had a lasting influence on the design of commercial DBMSs, with its disk-based indexes, multithreaded execution of concurrent transactions, and lock-based concurrency control. Unfortunately, this architecture may not be optimal for high-throughput OLTP on modern systems, where all data fits in main memory. Today's OLTP databases can typically fit in memory, and their queries run in a short time, so multithreading is unnecessary. In addition, most transactions in a typical workload do not conflict, so optimistic concurrency control may be faster than lock-based concurrency control, particularly in a grid computing environment where locking becomes a bottleneck.
The authors of “The End of an Architectural Era” present H-Store, a new OLTP database system designed to be orders-of-magnitude faster than general-purpose DBMSs for transaction processing. H-Store runs in a shared nothing grid cluster. It automatically partitions a database horizontally so common queries run “single-sited,” meaning all data needed to answer the query is located on one server. This reduces contention and increases throughput in workloads where most queries touch only a small fraction of all tuples. A special feature of H-Store is that it requires all transaction types to be pre-defined as stored procedures, so that the system can optimize the storage of tuples for efficient querying.
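The single-sited routing idea described above can be sketched roughly as follows. This is a minimal illustration, not H-Store's actual implementation; the names (`node_for`, `single_sited_query`) and the modulo partitioning scheme are assumptions made for the example.

```python
# Sketch of horizontal partitioning with single-sited queries: rows are
# assigned to nodes by a partition key (here a warehouse id), so a query
# constrained to one key value can be answered entirely on one node.

NUM_NODES = 4

def node_for(warehouse_id: int) -> int:
    """Map a partition-key value to the node that owns it."""
    return warehouse_id % NUM_NODES

class Node:
    def __init__(self):
        self.rows = []  # locally stored tuples

nodes = [Node() for _ in range(NUM_NODES)]

def insert(warehouse_id: int, row: dict) -> None:
    nodes[node_for(warehouse_id)].rows.append({"w_id": warehouse_id, **row})

def single_sited_query(warehouse_id: int):
    """All data for one warehouse lives on one node: no cross-node traffic."""
    owner = nodes[node_for(warehouse_id)]
    return [r for r in owner.rows if r["w_id"] == warehouse_id]

insert(7, {"item": "widget"})
insert(3, {"item": "gadget"})
```

A query for warehouse 7 contacts only node `7 % 4 == 3`, which is why such workloads see reduced contention and higher throughput.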
The main contribution of the paper is H-Store, an implementation of a novel OLTP architecture, along with performance tests that show an 82x speedup on the TPC-C benchmark relative to a popular commercial DBMS. The authors claim that logging overhead is the main reason a commercial DBMS is slower than H-Store, since up to two thirds of its CPU time is spent logging; concurrency control is the next most costly subsystem in a traditional DBMS. H-Store reduces concurrency-control overhead with an optimistic scheme, which executes transactions without locks and aborts potentially conflicting transactions after checking the timestamps of the operations they performed. H-Store keeps no persistent redo log, relying instead on replicas for reliability, which eliminates most of the logging overhead.
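The optimistic scheme the review summarizes can be sketched as follows. This is a generic validation-based OCC example, assuming a per-key version counter rather than the paper's exact timestamp protocol; the class and method names are illustrative.

```python
# Minimal optimistic concurrency control sketch: transactions run without
# locks, record what they read and at which version, then validate at commit
# that nothing they read has since been overwritten; on conflict they abort.

class Store:
    def __init__(self):
        self.data = {}     # key -> value
        self.version = {}  # key -> commit timestamp of last writer
        self.clock = 0

class Txn:
    def __init__(self, store):
        self.store, self.reads, self.writes = store, {}, {}

    def read(self, key):
        # remember the version we observed, for validation at commit
        self.reads[key] = self.store.version.get(key, 0)
        return self.writes.get(key, self.store.data.get(key))

    def write(self, key, value):
        self.writes[key] = value  # buffered until commit

    def commit(self) -> bool:
        s = self.store
        # validate: every key we read must be unchanged since we read it
        if any(s.version.get(k, 0) != ts for k, ts in self.reads.items()):
            return False  # conflict detected -> abort
        s.clock += 1
        for k, v in self.writes.items():
            s.data[k], s.version[k] = v, s.clock
        return True
```

For example, if transaction T1 reads a key, then T2 overwrites it and commits, T1's own commit will fail validation and abort, which is the "abort potentially conflicting transactions" behavior described above.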
H-Store has greater throughput on TPC-C than a traditional DBMS, but it comes with many limitations. One of the greatest is that H-Store does not allow ad hoc queries. Every transaction or query must be saved as a stored procedure when an H-Store node is set up. H-Store is optimized for deployed OLTP use, but it is not general-purpose.
The paper argues that modern Relational Database Management Systems have outlasted their years and it is time for a complete overhaul. The original RDBMSs were built to cater to the needs of the business data processing market, with architectural features including disk-oriented storage, multithreading, lock-based concurrency control, and log-based recovery. The authors specify the design of a new system in which the database fits in main memory, transactions rarely wait (making single-threaded execution optimal), replication provides fault tolerance, and optimistic concurrency control is used, among other specifications. For this purpose, they propose a new OLTP prototype engine named H-Store.
H-Store is a shared-nothing, main-memory, row-store relational database that requires transaction classes and table definitions to be specified in advance. It runs on a grid of computers, with rows of tables placed contiguously in main memory and indexed by conventional B-trees, and a conventional query optimizer is proposed. An automatic physical database designer will be incorporated to specify horizontal partitioning, replication locations, and indexed fields; replicas will be transactionally updated. Since OLTP transactions are very short-lived, single-threaded execution is suggested. Experimental results show H-Store to be 82 times faster than a traditional DBMS.
The paper is successful in meeting its premise with a viable solution in the form of H-Store, which shows a significant performance advantage over a popular relational database. The architecture of H-Store and its various components are explained appropriately, with a discussion of the problems as well.
However, it can be argued that replacing a general-purpose query language with a custom one can create problems: migrating old databases to the new system could take a long time, since it abandons the "one size fits all" convention. The paper also does not give solutions to some problems: in the proposed system, a cluster-wide power failure can cause the loss of committed transactions, and in the case of a network partition, some queries will not execute.
What is the problem addressed?
The authors have designed a new DBMS engine for OLTP applications. Enough of this engine, H-Store, is running to enable them to conduct a performance bakeoff between it and a popular commercial RDBMS. Their experimental data shows H-Store to be a factor of 82 faster on TPC-C.
Previous papers presented reasons and experimental evidence showing that the major RDBMS vendors can be outperformed by 1-2 orders of magnitude by specialized engines in the data warehouse, stream processing, text, and scientific database markets, leaving the current relational DBMS code lines only the business data processing (OLTP) market and hybrid markets where more than one kind of capability is required. This paper shows that current RDBMSs can be beaten by nearly two orders of magnitude in the OLTP market as well. The experimental evidence comes from comparing a new OLTP prototype, H-Store, to a popular RDBMS on the standard transactional benchmark, TPC-C.
1-2 main technical contributions? Describe.
The paper presents five major issues, which a new engine such as H-Store can leverage to achieve dramatically better performance than current RDBMSs.
1. Large main memory is feasible now, and makes disk-oriented relational architecture for OLTP applications obsolete.
2. OLTP transactions are very lightweight. In such a world it makes sense to run all SQL commands in a transaction to completion with a single-threaded execution model, rather than paying the overheads of isolation between concurrently executing statements.
3. It seems plausible that the next decade will bring domination by shared-nothing computer systems (clusters), often called grid computing or blade computing. Hence, any DBMS should be optimized for this configuration.
4. In the future, we see high availability and built-in disaster recovery as essential features in the OLTP (and other) markets.
5. A much better answer to tuning is to completely rethink the tuning process and produce a new system with no visible knobs.
1-2 weaknesses or open questions? Describe and discuss
The rise of multi-core machines suggests that there may be interesting optimizations related to sharing of work between logical sites physically co-located on the same machine.
This paper discusses whether the "one size fits all" era of commercial RDBMSs has ended. Specifically, the authors create a new OLTP prototype called H-Store and evaluate its performance on the standard transactional benchmark TPC-C. The results show that H-Store can outperform a traditional RDBMS by a factor of 82 (almost two orders of magnitude). This result shakes the old belief and indicates that there might be a need to completely rewrite database systems for today's requirements. The paper gives an overview of this observation, moves to an explanation of the design considerations that achieve the significant outperformance, and then explains the design of H-Store as well as the performance evaluation of both H-Store and a popular RDBMS. At the end, it also provides several recommendations for future work.
The problem here is that today's technology has changed a lot (databases fit in memory, computation is faster), and the old "one size fits all" design idea might need to change too. Recent research shows that the nearly 30-year-old legacy code should be retired and designers should completely restart the design for today's needs. For example, old database systems were mainly developed for business data, but we now have other markets, including data warehousing, stream processing, text, and scientific databases. Another example is that popular RDBMSs often inherit from an old system (System R) that was architected 30 years ago, while hardware characteristics have changed dramatically since then.
The major contribution of the paper is that it provides a detailed discussion of the new OLTP design H-Store, including the design concerns and how it increases performance. The results also show that the new design performs 82 times better than a popular RDBMS. However, it might have been better if the paper had said more about how to evaluate this performance difference and provided some graphs to help illustrate the evaluation methods and process.
One interesting observation: I noticed that there are always two approaches to designing systems. One is to optimize a previous framework or code line, adding features to meet today's needs. The other is to discard the old design and rebuild with a new architecture. I think both work, but the choice depends on current technology and requirements. It is always hard to find "one" solution for everything because things change too fast.
This paper proposes an idea that many may not want to hear: that we should throw away all the current database code and rewrite it all from scratch. The idea is simply that, for any specialized application, a system can be built that offers 1-2 orders of magnitude more performance than a general-purpose RDBMS. The overhead of transactions may not be worth it for many applications, and the authors are proposing a system that can beat current commercial RDBMSs in OLTP. Current systems are designed for disks, are multi-threaded using locks for concurrency control, and use logging for recovery - all business data processing needs, not necessarily the needs of all database users.
They propose a system that executes transactions in a single thread, instead of the multi-threaded approach of current systems. They claim that the locking and scheduling overhead isn't worth it for most OLTP workloads: the transactions are short - just run them one at a time and don't worry about it. They also want their system to scale easily with the number of machines - it should not require bulk-loading the whole database again. Another point the authors mention is that legacy systems were designed assuming a company had only one database server, which needed to be able to bring itself back online. Now, however, companies have several - failed machines can be rebuilt from the other machines. The authors then discuss their H-Store system and how it addresses some of these problems. It is designed to run on a grid, run SQL queries one at a time to completion, and tune itself.
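The run-to-completion model described above can be sketched in a few lines. This is a toy illustration under the assumption that each transaction is a short function over an in-memory database; the names are not from the paper.

```python
# Sketch of single-threaded, run-to-completion execution: transactions are
# drained from a queue and executed serially, so no locks or latches are
# needed -- serializability is trivial because nothing ever interleaves.
from collections import deque

def run_to_completion(db: dict, queue: deque) -> None:
    while queue:
        txn = queue.popleft()  # each txn is just a function over the db
        txn(db)                # runs start-to-finish, never yields

db = {"balance": 100}
q = deque([
    lambda db: db.__setitem__("balance", db["balance"] - 30),  # debit 30
    lambda db: db.__setitem__("balance", db["balance"] + 10),  # credit 10
])
run_to_completion(db, q)
```

Because each transaction finishes before the next begins, the "locking and scheduling overhead" the review mentions simply never arises; the trade-off is that one long transaction would stall the whole queue, which is why the paper assumes short OLTP transactions.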
The authors pose some interesting ideas towards the end of the paper. It's interesting that Stonebraker is an author on this paper - he wrote "What Goes Around Comes Around", in which he argued in favor of the relational model, showing how all the other models that have come along have fallen. Now, however, he is suggesting that the relational model may not be the answer. And this would certainly seem to be the case: web-scale companies such as Google and Facebook have had to create their own custom data storage engines because traditional RDBMSs have not been able to meet their scaling needs. The authors also suggest that SQL is not the best tool for the job - an opinion probably not shared by all companies selling DBMSs.
This paper does a good job of showing the problems with current DBMS. They were built for a different era, for a different workload - we could do a lot better with a complete rewrite for each workload. They did a good job explaining how their system was able to fix some of these problems.
I would have liked to have seen some more discussion at the end. While I agree that the relational model may not be suited to all problems, I think a large percentage of data is naturally structured as tables - users, inventory, financial data, etc. I think throwing out the relational model is a bit preemptive.
The paper puts forward the problem that the relational DBMS has seen no significant changes for 25 years and is greatly behind the times. Hardware has improved, with much higher speeds and larger memories and storage, while the behavior of the DBMS has remained almost the same. There are now markets beyond traditional business data processing in which the RDBMS is not optimal, and the user interface of the RDBMS should also be improved. The problem is important because the traditional "one size fits all" view can no longer meet the needs of the market.
The solution proposed by the authors is to completely rewrite the current RDBMS for OLTP applications. The approach is to design a new DBMS engine called H-Store, which is 82 times faster than a commercial DBMS on the TPC-C benchmark. This is because the design of H-Store matches the current properties of OLTP. First, an OLTP database can fit in main memory. Second, without disk operations and user stalls, a single thread is almost enough for execution in OLTP, which removes the overheads of multi-threaded execution; it is the application's job, not the DBMS's, to divide long transactions into small ones, and long queries are directed to a data warehouse system instead. Third, future computer systems will be shared-nothing grid architectures. Fourth, the new architecture should support multiple replicas and fast recovery. Lastly, H-Store should redesign the tuning process to avoid knobs, which is solved by creating an automatic database designer.
The strength of the paper is that it points out a direction for the development of DBMSs: building specialized DBMSs for different areas instead of one general system for everything. It is an eye-opening article that inspires DBMS designers to pay attention to the market and think outside the box.
The weakness of the paper is that it provides too few examples and graphs for its ideas, which makes them hard to understand. For example, when discussing the constrained tree application, an important part of its transaction theory, it provides only a literal explanation. Though that is enough to convey the concept, the paper would be more accessible if the authors provided more specific content.
This paper analyzes the limitations and problems of the traditional Relational Database Management System (RDBMS) in processing modern OLTP workloads. The paper points out that the technology and workloads that RDBMSs were designed around have undergone drastic changes over the years, making some of the design decisions made in the creation of the RDBMS incompatible with the modern day, and that DBMSs optimized for specific workloads will give much better performance and efficiency. To illustrate the point, the paper uses a prototype DBMS, H-Store, as an example of how inadequate RDBMSs are in today's setting.
The paper points out that RDBMS were originally designed for business data processing, and were built on the following features:
- Disk based storage structure
- Multithreading to hide memory latency
- Locking-based concurrency control
- Log-based recovery.
These features are no longer applicable to modern workloads, even for OLTP, as main memory has become large enough to hold the entire OLTP database. This means the database can be main-memory based, removing the need to hide memory latency through multithreading. Furthermore, it is possible to minimize locking and concurrency control depending on the transactions and workload of the database. Combined with a grid system and high availability, logging can be reduced to a bare minimum. The paper implements this system in the H-Store architecture, specifically designed for OLTP workloads, and shows two orders of magnitude better performance than a traditional RDBMS.
The paper does an excellent job presenting the initial assumptions made by the RDBMS design, and the problem of the "one size fits all" approach of DBMS design back then. The paper makes a strong argument for workload-specific DBMSs that are optimized for better performance. The paper, however, is quite weak in its evaluation of the experimental result, as very little quantitative data is provided, with minimal analysis.
This paper introduces a new prototype designed for the OLTP market, H-Store, and uses it as evidence that the RDBMS is out of date. H-Store is an in-memory, single-threaded database without locking and without most of the logging of a traditional RDBMS. When working in a cluster, it implements a modified version of optimistic concurrency control to ensure serializability and minimize communication overhead.
On the positive side, this paper shows that a new kind of design strategy for DBMSs can perform better than a general-purpose RDBMS. As the paper argues, the "one size fits all" era has ended: a DBMS dedicated to a certain use may outperform a general relational DBMS, since different markets now impose different workloads on databases. H-Store is a living example of this. Its design takes the features of the current market workload into consideration. For example, since a single OLTP transaction is always lightweight, it uses a single thread to avoid the overhead of multithreading and concurrency control within a single site. It also removes many components, such as logging and the transaction manager, to speed up. Tested on the TPC-C benchmark, H-Store outperforms traditional relational databases by almost two orders of magnitude.
However, some points in the paper may need further justification:
1. H-Store assumes that the OLTP workload consists mostly of read-only and short transactions. But this may not hold: sellers may sometimes change prices or even begin to sell different products, so update transactions may still make up a significant fraction of the workload.
2. H-Store distributes data horizontally, and the workload is balanced across sites. But as the workload changes, users may want to query different data, and the data may need to be redistributed. Frequent redistribution may become a problem when H-Store tries to scale up.
Computer hardware has advanced greatly since 1970, but many databases still have features that were developed based on the assumptions of that era. Some of these assumptions include: databases cannot fit into main memory; serial transaction processing would destroy response time because of waiting for disk I/O, user stalls, or limited CPU power. These assumptions are no longer true in the OLTP context. In addition, databases with 1970s-era ancestry were not designed for grid computing and use logging for disaster recovery, which decreases performance.
The authors of this paper make the case that in order to get away from these assumptions, traditional database systems need to be completely rewritten. These databases are already outclassed by other databases designed for specific applications (Data warehouses, stream processing), and the authors posit that they aren’t optimal for OLTP applications either.
To show this, the authors design a DB called H-Store, which is built on a grid of nodes, each of which runs single-threaded. H-Store analyzes a workload and tries to find a hierarchical tree structure for the data to minimize the number of branches that need to be traversed by each query. The authors note that the data commonly used for OLTP fits this archetype - e.g. one customer has many orders, which each have many products, etc. The data can then be partitioned according to the nodes of this tree and distributed among the computer grid - thus many queries will only need to contact one node, or can be split into a set of queries that contact one node each.
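The tree-partitioning idea can be illustrated with a tiny sketch. This is an assumption-laden toy (the `owner` function and table shapes are invented for the example): because every table descends from a root table, partitioning everything by the root's key co-locates a customer with all of its descendant rows.

```python
# Sketch of constrained-tree partitioning: orders inherit their partition
# from their root customer, so a customer and all of their orders land on
# the same node and queries rooted at one customer touch one node.
NUM_NODES = 3

def owner(customer_id: int) -> int:
    """Partition by the root (customer) key only."""
    return customer_id % NUM_NODES

orders = {}  # order_id -> record carrying its root customer_id

def add_order(customer_id: int, order_id: int, item: str) -> None:
    # the child row's placement is determined by its root's key
    orders[order_id] = {"node": owner(customer_id),
                        "customer_id": customer_id, "item": item}

add_order(5, 101, "book")
add_order(5, 102, "pen")
```

Both orders for customer 5 land on node `owner(5)`, so a "fetch this customer's orders" query never crosses nodes, which is exactly the single-node property the review describes.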
The authors find that the performance of their system is 82 times better than a popular traditional DBMS. The paper ends with two radical suggestions: seeing as databases built specifically for certain applications are shown to outperform the traditional database, the relational model may also not fit these applications well. The same logic applies to SQL - it may be that we need specific languages for specific applications.
The authors present strong claims for why traditional databases are outdated, and back it up with a verifiable example of a better OLTP system. They are also not afraid to follow the implications of this trend in databases to suggest that even the standards of relational DBs and SQL may need to be changed.
Was the data sharding of the H-Store performance test done by hand or by an automatic physical database designer? If they were done by hand this puts a (slight) damper on the performance results.
Following past papers that questioned the "one size fits all" paradigm of commercial relational DBMSs, this paper goes even further and argues that current RDBMSs are not even good at the business data processing (OLTP) market, which is their own specialty. The authors attempt to persuade readers with the benchmark result from their new OLTP prototype, H-Store, demonstrating that it is nearly two orders of magnitude faster than a commercial RDBMS.
It is not difficult to see that the arguments of the paper make sense; they have been discussed a number of times in other papers. Much cheaper and larger main memory and the rise of shared-nothing distributed (grid) computing have accelerated the obsolescence of traditional RDBMSs. The traditional RDBMSs were designed more than 30 years ago and were optimized for the hardware characteristics of the time, which led to disk-oriented storage, multithreading for concurrency and latency hiding, log-based recovery, etc. Surprisingly, these design principles have changed little since. The paper claims that the time has come to completely redesign DBMSs for OLTP, since current DBMSs are not even good at what they were designed for.
The benchmark comparison between H-Store and a commercial RDBMS shows that it is certainly possible to build a new OLTP database optimized for current hardware that easily outperforms traditional RDBMSs. The benchmark result itself is striking, with a performance improvement of nearly two orders of magnitude, but I personally think their approach to achieving such improvement was obvious and not that surprising. The authors basically implemented an in-memory database without logging or concurrency control. The speed-up shown in the paper is so obvious that I would be more surprised if it were not faster than the traditional DBMS in the benchmark. In my opinion, the commercial DBMS might have achieved similar performance if it were entirely loaded into memory without logging and concurrency control, and this makes me not so intrigued by the design of H-Store.
The main takeaway of this paper is that the architecture of commercial RDBMSs is obsolete and a complete rework from scratch is required to optimize them for current hardware. The paper proves this point with the authors' own new OLTP database prototype, H-Store. The performance improvement is very significant, if achieved with somewhat obvious optimizations. We are witnessing many such changes in today's databases, with many different types of specialized databases flourishing to fit particular hardware configurations.
This paper presents the idea that current database design ideas are outdated and need to be rethought. In the past, business data processing was the only need for a database, and so many design decisions were made based on that need. Furthermore, twenty-five years ago computer technology was not as advanced as it is today, so some decisions were influenced by resource constraints such as main memory. The paper states that those technological constraints are no longer a concern and the needs for a database have expanded beyond business data processing. The old design was a jack of all trades but a master of none, and is thus insufficient for supporting today's needs such as OLTP.
The size of main memory has grown quickly compared to the size of an OLTP database. As such, it is possible to fit the entire database within main memory, which means transactions run much more quickly due to the lack of disk I/O. The paper also makes several more observations about unnecessary fat that can be trimmed from databases to better support an OLTP setting:
1) Ad-hoc queries do not exist in an OLTP setting. There is a lot of overhead in a database that comes from supporting Ad-hoc queries.
2) Switching to single-threaded execution can vastly simplify some of the structures within the database and result in better performance, due to not needing to monitor resource sharing.
3) Dynamic locking is unnecessary in a pure main-memory system. Most transactions are very short-lived, so an optimistic approach to concurrency control is better.
The first observation is extremely important in my opinion, because it allows the authors to assume a workload is known entirely in advance. With this assumption, H-Store, the system used for comparison in the paper, can make certain optimizations not possible in traditional systems. First, it can create stored procedures for all possible transaction classes, which improves performance by reducing round-trip communication costs. Second, query optimization can be performed before runtime, since all possible queries are known in advance.
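The pre-defined transaction classes can be sketched as a simple procedure registry. This is an illustrative assumption, not H-Store's actual interface: the point is only that every transaction is registered before the system goes live, so at runtime a client sends a procedure name plus parameters rather than ad hoc SQL.

```python
# Sketch of pre-defined transaction classes as stored procedures: all
# transactions are registered at setup time; anything not registered
# (i.e., an ad hoc query) is rejected at runtime.
PROCEDURES = {}

def stored_procedure(fn):
    """Register a transaction class before the system goes live."""
    PROCEDURES[fn.__name__] = fn
    return fn

@stored_procedure
def new_order(db, customer_id, item):
    db.setdefault(customer_id, []).append(item)

def invoke(db, name, *args):
    if name not in PROCEDURES:
        raise ValueError("ad hoc queries are not allowed")
    return PROCEDURES[name](db, *args)

db = {}
invoke(db, "new_order", 42, "widget")
```

Because the full set of procedures is known up front, plans for each can in principle be compiled and optimized before runtime, which is the second optimization the review mentions.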
The paper introduces H-Store as an example of a more modern system compared to a "popular commercial database". It describes some of the new design decisions that shaped H-Store, such as two-phase and sterile transactions. A comparison is done as well, using a modified TPC-C benchmark, and the results show that H-Store performs almost two orders of magnitude better than a traditional database.
This paper is a call for a redesign of database systems. With new technology, many of the old decisions are obsolete. The paper even remarks that SQL is flawed and should be replaced with a different language. Database research should be looking to improve different features with regard to OLTP design. Certain structures in databases may perform differently when stored only in main memory and thus should be optimized for this new setting. Furthermore, it seems that in an OLTP setting certain features, such as runtime query optimization, are less important to improve, since all information is assumed to be known before runtime.
One of the weaknesses of the paper was in the results. Although the one result published accurately portrayed the inefficiency of current design, it was only using a throughput comparison. I would have liked to see a latency comparison as well to get an idea of not only how many more transactions are processed, but also how much faster the response time is. Having multiple transactions processed per second could be meaningless if a user has to wait ten times longer. This left the results a bit incomplete for me.
This paper states that a generalized relational DBMS can be beaten by "specialized engines" in all markets, including the OLTP market. That is, while trying to provide multiple capabilities, a traditional RDBMS is really nothing superior to a set of specialized engines. This is because current RDBMSs are mostly based on System R, which was designed for the market of the 1970s; the hardware characteristics and user interfaces have obviously become very different. Thus the authors think the solution is a total redesign.
In order to prove that popular commercial RDBMS can be beaten even in the business data processing (OLTP) market, they built a specialized DBMS engine (H-Store) for OLTP, and compare its performance to a popular commercial RDBMS.
The major points and contribution of this paper include:
1) They show that the bottlenecks of a commercial DBMS are mostly logging and concurrency control overhead, which can be eliminated by a specialized design targeted at OLTP (discussed in Section 2).
2) Their experimental results show that H-Store can achieve almost two orders of magnitude better performance than the commercial RDBMS on TPC-C transactions.
If there are any drawbacks in this paper, I would say that:
1) The tuning of the experimental setup for the commercial DBMS could be described more clearly (the paper only mentions "several days of tuning by a professional DBA").
2) Although each specialized engine can significantly outperform traditional commercial DBMSs in its specialized market, using a collection of specialized engines still introduces more burden when multiple capabilities are required. How do we measure whether the performance improvement is worthwhile when additional integration of several engines is needed?
This paper suggests that specialized engines perform better than "one size fits all" relational DBMSs in the data warehouse, stream processing, text, and scientific database markets. The paper compares H-Store, a new OLTP database, to a popular "one size fits all" RDBMS on the TPC-C benchmark. RDBMSs are designed for the business data processing market, but are easily beaten in every other market by specialized engines.
There are five major trends/issues in OLTP design that exemplify the superiority of a specialized engine like H-Store over current generic RDBMSs. The first is that an OLTP database can now fit in main memory, because of the increase in main memory size of common machines. Thus, similar to what was discussed in the "OLTP Through the Looking Glass, and What We Found There" paper, the disk-oriented relational architecture for OLTP applications is antiquated, and generic databases can be stripped of multi-threading, transaction, and logging machinery to achieve better performance. The second is that OLTP transactions are now lightweight, so multithreading is no longer as necessary; single-threading can be used and concurrent B-trees removed, resulting in more reliable and higher-performance systems. H-Store handles concurrency control by running single-sited and one-shot transactions with no controls, running other transactions with the basic strategy, moving to the intermediate strategy in the case of too many aborts, and escalating to the advanced strategy if there are still too many aborts. The third issue is that current designs are optimized for shared-disk architectures, but the landscape is now dominated by shared-nothing networks; H-Store reflects this by running on a grid of computers. The fourth trend is that high availability simplifies recovery because it takes away the need for a REDO log, removing large amounts of complex code. H-Store keeps at least two copies of each table, which are transactionally updated, and there is no redo log; the undo log is written only if required and is discarded when the transaction commits. Finally, legacy RDBMSs have too much code requiring human action; the new system must be self-healing, self-maintaining, and self-tuning. H-Store handles this by building an automatic physical database designer that specifies horizontal partitioning, replication locations, and indexed fields.
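The undo-only recovery scheme mentioned above can be sketched as follows. This is a minimal illustration under the assumption of an in-memory key-value store; durability in the real system comes from the transactionally updated replicas, which this sketch does not model.

```python
# Sketch of recovery without a REDO log: before-images go to an in-memory
# undo list; on abort they are replayed in reverse, and on commit the list
# is simply discarded (replicas, not logs, provide durability).
class UndoTxn:
    def __init__(self, db: dict):
        self.db, self.undo = db, []

    def write(self, key, value):
        self.undo.append((key, self.db.get(key)))  # save before-image
        self.db[key] = value

    def commit(self):
        self.undo.clear()  # nothing persisted; nothing to replay later

    def abort(self):
        # restore before-images in reverse order
        for key, old in reversed(self.undo):
            if old is None:
                self.db.pop(key, None)  # key did not exist before
            else:
                self.db[key] = old
        self.undo.clear()

db = {"x": 1}
t = UndoTxn(db)
t.write("x", 2)
t.write("y", 3)
t.abort()  # db is restored to its pre-transaction state
```

The contrast with a traditional engine is that the undo list lives only in memory and dies with the transaction, whereas a classical write-ahead log must be flushed to disk, which is the overhead the paper targets.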
The paper thus predicts the end of "one size fits all" systems, the inappropriateness of current relational implementations in any segment of the market, and the need to redesign data models and query languages for specialized engines. The paper suggests there is much future work to do. For one, single-sited, two-phase, and one-shot applications need to be identified automatically. The rise of multi-core machines suggests possible optimizations for sharing work between logical sites located on the same machine. The performance of the various transaction management strategies needs to be studied. The overhead of logging, transaction processing, and locking in OLTP systems can be measured to determine which aspects of traditional DBMS design contribute most to overhead. The in-memory data structures of the H-Store implementation limit its performance, so more study on how to optimize those structures is needed. And to allow systems similar to H-Store to coexist with data warehouses, integration with data warehousing tools is needed.
Overall, the paper was concise and comprehensive in defending its argument that specialized databases are more effective than generic ones. A limitation of this paper is that it does not provide a detailed quantitative analysis of H-Store's performance beyond the single TPC-C comparison with a generic commercial database. I would have also liked to see a discussion of how widely these H-Store improvements are used today.
This paper is titled "The End of an Architectural Era" because the authors believe it is time to change the way we use and develop relational database technologies, and this is what they attempt to motivate in the paper. The authors argue that new specialized databases will be used for specialized domains, and what will be left for RDBMS systems is OLTP. Even if this is the case, the authors suggest, the situation still won't be ideal. They show that better systems can be created by reexamining the hardware assumptions that were in place at the conception of RDBMS systems, and demonstrate that H-Store outperforms a popular commercial RDBMS, tuned by a professional DBA, by a factor of 82.|
This is a strong paper because it examines the issues with the assumptions made in RDBMS systems many years ago. These older systems were disk-oriented, multithreaded, and locking- and log-based. The authors discuss the importance of main memory, multi-threading, grid computing, upgrades, and availability concerns, and the desire for a "no knobs" database system. These are well motivated and clearly discussed. The authors also chose a benchmark for OLTP transactions. This is great and is something I had suggested previous papers should have done, though I wasn't aware of the types of benchmarks available previously.
It is not clear to me that using just the one TPC-C benchmark is the best way of presenting an empirical RDBMS result. It is a good result, but it is still just one data point. I looked up the TPC benchmarks online, and there are several other benchmarks they could have run this on. It's always good to have a few more data points, especially since it seems it would not have been too difficult to run these benchmarks when everything else was already in place. I consider this a drawback. Additionally, I found a drawback in their discussion of preferred "little languages." I thought this section was insightful, but I don't think it will be easy to manage if we have little languages for one set of applications and a separate set of little query languages for different types of database systems. The worst-case scenario gives you a number of interfaces to database systems equal to the product of these two numbers. I could see a specialized language being used for a purpose narrow enough that it would only use one type of little query language. The coordination of the construction of languages and interfaces within the domains that require this is not a question the authors discuss.
Part 1: Overview|
This paper proposes that hardware technology change may put an end to the relational database architecture. As inherited from System R from the 1970s, many databases today still include features like disk-oriented storage structures, multi-threading, locking-based concurrency control, and log-based crash recovery. However, in areas like text, data warehouses, stream processing, or scientific and intelligence databases, the data may fit into memory and can therefore utilize the fast random access of RAM to enhance performance.
They claim that OLTP databases should fit in RAM and be memory-oriented. OLTP transactions are also lightweight, so the system can get rid of multi-threading and concurrency control; a single-threaded model can achieve higher throughput. In real-world database systems, long-running commands are rarely found in online transaction processing, as they are decomposed into several small commands, reducing transaction time. Fork-lift upgrades should no longer be needed, as the shared-nothing grid model is becoming the trend. High availability is becoming more and more important, and companies are willing to pay to avoid disaster-recovery downtime. Since the shared-nothing model is leading the trend, we should build databases to support it from the bottom layer up instead of on top of old systems. Finally, human resources are becoming more and more expensive, so we badly need self-tuning databases.
Part 2: Contributions
This paper points the way toward modern in-memory database design for OLTP so as to fully utilize new hardware technology. The authors also summarize the possible improvements of in-memory databases by pointing out the performance costs of the relational databases currently in use.
Part 3: Possible Drawbacks
They claim that there are no long-running commands in OLTP systems, which limits the applicability of the new in-memory database design they propose. History will probably repeat itself, and data sizes may grow out of bounds again, as we are in a century of information explosion. In-memory databases may be suitable for some applications, but we still cannot discard big-data-oriented, concurrent databases.
The paper claims that the RDBMS doesn't excel at anything, even in the area of business data processing (OLTP) for which it was primarily designed. The authors propose a new OLTP engine called H-Store to demonstrate their claim. H-Store uses a grid of computers where each computer holds rows of tables in main memory, thereby avoiding the overhead associated with disk-based OLTP. In particular, H-Store achieves significant performance gains by minimizing the overhead of the logging and concurrency control systems. |
In order to avoid such overhead, OLTP transaction workloads and schemas need to have certain characteristics. This is in line with the authors advocating for specialized database engines as opposed to general-purpose RDBMSs. For example, if an application is single-sited, i.e., every SQL command in each transaction can be executed on a single computing node/site, then the application can be executed without any controls, effectively avoiding the overheads associated with logging and concurrency control. Note that because execution within a single computing site is single-threaded, there will not be any overhead associated with locking or other concurrency control mechanisms as long as the application is running on a single site. In addition, applications categorized as sterile and one-shot also don't need any concurrency control mechanism. For other kinds of applications that do need concurrency control, H-Store uses a scheduling technique that keeps track of conflict frequency among concurrently running transactions and avoids running such transactions together in the future, effectively lowering conflicts.
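The conflict-aware scheduling idea described above can be sketched roughly as follows: track how often pairs of transaction classes abort each other, and avoid co-scheduling pairs that conflict too often. The data structure and the threshold are illustrative assumptions, not H-Store's actual code.

```python
# Hedged sketch of conflict-frequency tracking for scheduling decisions.
from collections import defaultdict

class ConflictScheduler:
    def __init__(self, threshold=3):
        # (class_a, class_b) -> number of observed aborts between the pair
        self.conflicts = defaultdict(int)
        self.threshold = threshold

    def record_conflict(self, a, b):
        self.conflicts[tuple(sorted((a, b)))] += 1

    def can_run_together(self, a, b):
        # Refuse to co-schedule classes that have conflicted too often.
        return self.conflicts[tuple(sorted((a, b)))] < self.threshold

sched = ConflictScheduler()
for _ in range(3):
    sched.record_conflict("new_order", "payment")
assert not sched.can_run_together("payment", "new_order")
assert sched.can_run_together("new_order", "delivery")
```

The transaction-class names here are borrowed from TPC-C for illustration only.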
The main strength of the paper is pinpointing the weaknesses of the RDBMS in its area of comfort, i.e., OLTP-based applications. I found the authors' approach of directly addressing this specific area, rather than general areas, more insightful and useful.
The main drawback of the paper is that the authors propose to forgo the benefits of being general-purpose. Although specialized engines are good at improving performance, this approach increases development, maintenance, and deployment costs, because different kinds of database systems need to be designed, and customers must deploy different kinds of DBMSs to satisfy their needs. Furthermore, trained personnel are needed for each specialized database engine, which may become a huge burden on both the developer and customer sides. Consequently, the authors could have tried to explain why there cannot exist a middle ground that balances both performance and generality. In addition, I found the authors' motivation for moving toward a single-threaded database not as strong as they claim. Even with a memory-based database, multithreading is used to hide memory read latency, which is an issue in current computing systems and will only grow in the future. While the read latency cost is not as high as disk, it is still significant enough to create stalls and degrade performance in a single-threaded database system.
Paper focus and motivation|
The purpose of the paper is to present reasons and experimental evidence showing the end of the relational DBMS: it will be outperformed by 1-2 orders of magnitude by specialized engines. "One size fits all" is the reason why the RDBMS rose, and why it will fall.
Problems for RDBMS
1. Hardware today is very different from 25 years ago when the RDBMS was first introduced.
2. Market requirements have become more specific and different from each other.
3. No users work at terminals now; there are no direct SQL interfaces for end users.
Roadmap proposed in the paper
The current relational databases are all from System R, so they contain outdated features. The paper sets up a road map for the next era.
1. disk-oriented storage
Main memory is now much larger.
2. multithreading to hide latency
OLTP transactions are very lightweight on modern hardware. There is no need to pay the overheads of isolation; going back to a single thread earns us great benefits, such as removing concurrent B-trees.
3. lock-based concurrency control
Again, not needed in single-threaded configurations.
4. log-based recovery
Disk-based log recovery and dynamic locking are unnecessary.
Weakness and limitations
Well, it's Stonebraker's paper; maybe he doesn't need much experimental data to support himself. A man like him has the privilege to say, "You guys go do the tests..."
Beyond the paper
Michael Stonebraker, one of the authors, is one of the RDBMS pioneers. Now he aims to start another great era by ending this one himself.
This paper discusses OLTP databases, their past implementations, and directions for future improvement. |
It consists of two major parts.
The first part describes how old RDBMS designs are challenged by a set of different new tasks and application areas such as text processing and stream processing. Then new OLTP design considerations are presented. New OLTP databases take advantage of five major issues and achieve dramatically better performance than traditional RDBMSs. The five issues are:
1) Memory size
Many OLTP databases can be loaded into memory completely due to the dramatic increase in memory sizes.
2) Single-threading
There is no need for multi-threading because the whole DB is now in memory. This reduces the complexity of OLTP systems.
3) Grid Computing
OLTP can adopt grid computing easily.
4) High Availability
Traditional RDBMSs are more centralized and find it difficult to support high availability.
5) No knobs
Driven by the same hardware trends as memory size: traditional RDBMSs used many knobs for better performance, but this made them hard to manage.
These issues lead to some conclusions, which lead to part 2 of the paper: a new OLTP database system.
Their new DBMS is called H-Store. In the second part of the paper, they describe the system architecture and how they take advantage of the characteristics of their transactions to improve performance.
Then the results of a performance comparison between an old RDBMS and their OLTP system are provided: H-Store is 82 times faster than the old system.
This paper has two main contributions:
1) It identified many shortcomings of old RDBMSs under today’s situation.
2) It gives a good solution for this problem, which is the database they described.
As for weaknesses, I think it would have been better if they had given more detailed experimental results.
This paper explores the idea that the major relational databases are modified versions of databases created 30 years ago, when memory was significantly costlier, and that they do not have the capability to handle the hybrid markets present today. The paper concludes with the prototype H-Store, which the authors present as an idea for handling the data needs of today's world.|
According to the authors, some of the significant properties that need to be handled for today's data requirements are:
1.High availability – Databases should concentrate on being able to disperse their data across multiple machines, not just bolt multi-machine support on top of a shared-memory architecture. That way, performance can be improved by using multiple machines, and failures will only cause degraded operation.
2.Transaction processing – The data needs to be identified as partitionable vertically or horizontally, so that horizontally partitioned databases can be handled with single-sited transactions and vertically partitioned databases can be handled with one-shot transactions, where processing need only be done on the given columns.
3.Logging overhead – Logging takes a tremendous amount of CPU overhead; it can be reduced by implementing two-phase transactions, where all read-only work is executed first and updates are executed only when there is no possibility of an integrity violation.
4.Using single-threaded processes instead of multi-threaded processes, since the latter were designed with the latency of disk reads and writes in mind.
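The horizontal-partitioning idea in the list above can be sketched simply: if every tuple a transaction touches maps to the same site, the transaction is single-sited and can run locally with no distributed coordination. The partitioning function (modulo on a warehouse-style key) and the site count are illustrative assumptions.

```python
# Hedged sketch of single-sited routing under hash partitioning.
NUM_SITES = 4

def site_for(warehouse_id):
    # Simple hash partitioning on the partition key (illustrative).
    return warehouse_id % NUM_SITES

def is_single_sited(warehouse_ids):
    """True if all keys a transaction touches live on one site."""
    return len({site_for(w) for w in warehouse_ids}) == 1

assert is_single_sited([1, 5, 9])   # all map to site 1: run locally, no controls
assert not is_single_sited([1, 2])  # spans two sites: needs coordination
```

In H-Store this classification is what lets a transaction skip locking and distributed commit entirely.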
I don't believe in doing away with multi-threading completely, since using multiple processes can be more resource-heavy. I think there has to be a way to use the multiple cores in a system more effectively before we can completely do away with multi-threading. The paper also advocates executing each query redundantly on parallel systems in order to maintain redundancy, but I believe that is too much overhead and relies too heavily on syncing the timestamps of multiple machines.
The authors present H-Store as an example of implementing multiple relevant properties, but I don't think the results were very clear. The example was too specific to represent the kind of varied transactions real-time databases have to handle.
Overall, this paper is very informative with regards to the kind of processing current world database systems should be able to deal with.
This paper mainly argues that the major RDBMS vendors can be outperformed by specialized engines, such as H-Store in the OLTP market, by 1-2 orders of magnitude for multiple reasons, the most important being developments in hardware.|
Popular relational DBMSs all originate from System R, which dates from the 1970s, and they all include the following architectural features:
1.Disk-oriented storage and indexing structures
2.Multithreading to hide latency
3.Locking-based concurrency control
4.Log-based recovery
RDBMSs were originally designed to process business data, and later developed into a "one size fits all" system design. However, the current need for DBMSs is no longer limited to business data processing, so the relational model may not be the necessary answer. For example, in processing text data, stream processing, and data warehousing, the RDBMS can't provide the expected throughput and availability. Even in the traditional market of OLTP, the RDBMS can be beaten by new systems like H-Store by two orders of magnitude on the TPC-C benchmark.
New engines can leverage the following issues to optimize the processing of OLTP transactions:
1.Increasing main memory: a memory-resident DBMS can be made to better suit the small size but high frequency of OLTP transactions.
2.Single-threaded execution model: OLTP transactions are very lightweight, because long transactions are split into smaller ones and ad-hoc queries are handled by a separate data warehouse system. The absence of disk operations and user stalls makes fast processing of single transactions possible, so multi-threading and lock-based concurrency control are no longer needed.
3.Shared-nothing grid computing: provides better extensibility and hence avoids fork-lift upgrades.
4.High availability: the peer-to-peer shared-nothing structure can provide better failover performance than hot-standby, so the system doesn't need to keep a large persistent redo log, only a transient undo log in main memory.
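Point 4 above can be sketched concretely: a site keeps an undo log in main memory only while a transaction runs; on commit the log is simply discarded (replicas provide durability, so no redo log is ever written), and on abort it is replayed backwards. The storage model here is an illustrative assumption, not H-Store's actual code.

```python
# Hedged sketch of recovery with a transient, main-memory-only undo log.
class Site:
    def __init__(self):
        self.table = {}
        self.undo = []   # transient: never written to disk

    def update(self, key, value):
        # Record the old value so the change can be undone on abort.
        self.undo.append((key, self.table.get(key)))
        self.table[key] = value

    def commit(self):
        self.undo.clear()   # no persistent redo log is written

    def abort(self):
        # Replay the undo log backwards to restore the pre-transaction state.
        for key, old in reversed(self.undo):
            if old is None:
                self.table.pop(key, None)
            else:
                self.table[key] = old
        self.undo.clear()

s = Site()
s.update("x", 1); s.commit()
s.update("x", 2); s.abort()
assert s.table["x"] == 1   # abort restored the committed value
```

Durability after a site failure comes from re-copying state from a surviving replica, not from log replay.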
In the end, the author makes some predictions about the future development of DBMS architectures, against the "one size fits all" idea:
1.The relational model doesn't capture all market needs.
2.SQL, as a complex query language, will be outperformed by little languages in terms of performance and user popularity.
To sum up, the strength of this paper is that it provides both strong theoretical arguments and initial test results from H-Store to support its case against the "one size fits all" solution in RDBMSs. The author also puts forward many thoughtful ideas about future development in DBMS architectures, like (1) automatic identification of single-sited, two-phase, and one-shot applications, (2) optimization on multi-core machines, and (3) integration with data-warehousing tools.
But there are still some weak parts of this paper. First, in the performance test for H-Store, there isn't enough testing data under different combinations of queries; it simply says H-Store is 82 times faster. Second, in the H-Store design, the author doesn't mention anything about the "no knobs" consideration (Section 2.5). Since the author himself identifies that in today's DBMS market machines are getting cheaper and DBA costs are rising, the performance/cost test should also include the manpower consumed.
In this paper, the authors claim that the current "one size fits all" RDBMS is good at nothing and can be beaten by specialized engines. They think it is time to rewrite the database architecture instead of adding code to it. The paper presents a new database, H-Store, for OLTP transactions, which has much better performance.|
First, the paper lays out the considerations for designing a new database that can achieve better performance. An in-memory design is possible as the price of memory goes down, and an in-memory database provides better performance. In an in-memory database, transactions finish very quickly, so the database can run in a single thread to avoid the overhead of managing concurrency; in that case, long-running transactions are not considered. Grid computing is preferable, and "fork-lift" upgrades should be avoided. High availability can be provided using hot-standby and peer-to-peer replication. Last, as the cost of a DBA exceeds the cost of the machines, self-tuning should be developed.
Then the paper introduces the transaction characteristics, transaction classes, and workload partitioning. It also describes the system architecture and how the database engine handles query execution and recovery.
The paper then uses experimental results to show that H-Store has much better performance than a traditional database, and points out that the key bottlenecks are the logging and concurrency control systems. In the end it concludes that "one size fits all" designs like the relational database and SQL will not fit future needs, and that we should rethink the database model for each particular need.
The paper first introduces the problems of current relational databases and argues that the database model should be rewritten. The authors then use the database they developed to show that their DBMS is good at OLTP and has much better performance than traditional databases, and they show how they implemented each part of the database functionality to perform better given the features of OLTP transactions. The paper also uses some statistical reports from the market to support its claims.
The paper talks mostly about how H-Store is good at OLTP. It should also discuss in detail where it is not good, such as long-running transactions. The paper should also say more about the differences between an RDBMS and H-Store, such as buffer management and index management, which I think should differ from a traditional DBMS.
This paper proposes a new database management system called H-Store that can run across multiple servers to parallelize database computation. The main improvement H-Store offers over a traditional RDBMS is performance. Relational databases were developed as a "one size fits all" solution, which the authors argue cannot outperform a DBMS specialized for a specific type of workload. In other words, by trying to solve all database problems, relational databases are now legacy solutions for all of the possible application areas, including text processing, data warehouses, stream processing, and scientific processing.|
The paper starts off by introducing the design considerations for H-Store as follows. Modern hardware has improved to the point that the entire database fits in memory. CPUs have gotten faster, so most transactions can be executed in microseconds. Databases are slowly transitioning to shared-nothing architectures. Finally, for availability, many databases keep multiple copies, so logging is no longer needed.
With these considerations, H-Store is implemented across a grid of shared-nothing sites. Each site keeps its rows in main-memory B-trees and executes transactions in a single thread. There are three types of transactions in H-Store: single-sited, one-shot, and general. Only general transactions involve multiple sites, executed as follows: the transaction's subplans are sent to multiple sites, where they are executed if no conflicting transaction is running at that site, or aborted back to the coordinator if one is.
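The general-transaction flow just described can be sketched as follows. Conflict detection is reduced here to a set of "busy" sites, which is an illustrative stand-in for the paper's timestamp-based checks; the data shapes are assumptions.

```python
# Hedged sketch of coordinator-driven execution of a general transaction.
def run_general_txn(subplans, busy_sites):
    """subplans: {site_id: callable work}; busy_sites: sites with conflicts."""
    # If any involved site has a conflicting transaction, abort the whole thing.
    if any(site in busy_sites for site in subplans):
        return "abort"   # coordinator rolls everything back via undo logs
    for site, work in subplans.items():
        work()           # each site executes its subplan serially
    return "commit"

done = []
plans = {0: lambda: done.append("s0"), 2: lambda: done.append("s2")}
assert run_general_txn(plans, busy_sites={1}) == "commit"   # site 1 not involved
assert run_general_txn(plans, busy_sites={2}) == "abort"    # conflict at site 2
```

The real system interleaves the conflict check with execution rather than doing it up front, but the commit/abort outcome per site is the same idea.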
While H-Store is better than traditional RDBMS in theory and results show that we can gain two orders of magnitude performance increase, I want to point out a few weaknesses with the approach and paper:
1. When the DBMS is experiencing peak volume, there might be a transaction that needs multiple subplans to execute before it can commit. How often will it be aborted repeatedly because it conflicts with many different smaller transactions, and is there a mechanism to prevent such transactions from starving?
2. The experiments compared H-Store with a lock-based RDBMS. How does the performance compare with an RDBMS using optimistic concurrency control?
This paper discusses the need to reevaluate the principles of relational database design given advances in technology over the last several decades. Many of the assumptions that went into the design of traditional databases no longer hold. Specialized databases have advanced to the point where they generally outperform one-size-fits-all databases in all areas except OLTP. The authors of this paper develop an OLTP database called H-Store that outperforms a major OLTP database by a factor of 82, demonstrating that, even in OLTP, the former specialty of major vendor DBMSs, specialized databases have become superior.|
The authors of this paper designed H-store to ensure high data availability, prevent the need for forklift upgrades by providing "hot active" copies of sites, and remove much of the overhead associated with multi-threading, transaction management, logging and disk I/O. They accomplish this by adapting their query workload in a manner that provides consistency without using any of these systems. Specifically, they partition data in ways that make single-sited and one-shot transactions possible, while also attempting to guarantee that all transactions are two-phase and sterile.
My concern with this paper is that the workload is not necessarily representative of a typical SQL workload. The authors use a scheme that can be molded to fit all of their criteria for providing the ACID properties, but many databases are much more complex, much larger, and may not be able to guarantee all of the properties that the authors of this paper desire. It would have been nice to see results of experiments run with a more complex database to get a better idea of how H-Store would function in real applications.
This paper discusses the reasons, and presents experiments to verify, that current RDBMSs can be beaten by a new DBMS engine for OLTP applications. RDBMSs are very popular nowadays; however, hardware characteristics are now much different than when RDBMSs were conceived. For example, processors are thousands of times faster and memories are thousands of times larger. Therefore, the authors identify these changing circumstances and the different markets in which databases are used, and implement the H-Store database to compare its performance with RDBMSs.|
First, the paper describes many differences between the current era and the time when RDBMSs were conceived, including hardware characteristics and database markets. When relational DBMSs were proposed, there was only a single database market: business data processing. However, we have different markets now, such as text, data warehouses, stream processing, and scientific databases. In these different markets, relational databases have no advantage. Thus, the paper argues that we should take a different approach to database design.
Second, the paper describes H-Store, a different DBMS engine, and how the authors used it to implement an efficient OLTP database. Architecturally, H-Store runs on a grid of computers and is single-threaded. In addition, since H-Store maintains two or more copies of each table, it must keep replicas consistent, which is accomplished by directing each SQL update to all replicas. To verify the concept, the authors implemented a TPC-C database on both H-Store and an RDBMS. The results indicated that in the same configuration, H-Store ran 70,416 transactions per second, while the RDBMS ran 850 transactions per second. Based on this discussion, the authors suggest a different era of DBMS design.
The strength of this paper is that it motivates its approach well. Before proposing its approach to DBMS design, it provides ample reasons for the modifications, comparing many aspects of the present with the old times when RDBMSs were conceived. Thus, the paper can convince readers that its approach suits modern database markets.
The weakness of this paper is the insufficient experimental results. After implementing the TPC-C database on H-Store, it compares performance with a relational database on only one test case. To fully illustrate the paper's ideas, the authors could provide more comparisons with different commercial relational databases across several database markets.
To sum up, the paper suggests a new era of DBMS design and provides the reasons behind this idea, which today's DBMS designers should consider.
This paper is about the end of the "one size fits all" era of RDBMSs. It argues (rather successfully) that the market has shifted from what it was when SQL took over the scene and RDBMS code was initially written. It has already been shown that performance can be improved in almost all database sectors by writing more specialized code, but the one sector that remained, and was considered high-performance without needing specialization, was OLTP. This paper introduces H-Store as a new OLTP database, and it outperforms the current standard by a factor of 82! |
Similar to other papers, the main argument this paper uses for promoting the change is the increasing memory available to databases. Because of this major increase in memory since the systems were initially designed, there is much less need to wait for disk access. As a result, you can run single-threaded, removing much of the overhead of threading and increasing performance. This results in a system that is easier to understand, more reliable, and even quicker (wins all around).
Another optimization that can be made now that couldn't in the past is "high availability." With hardware becoming cheaper, it makes more sense to keep duplicates of data ready in the event of a failure and not worry about redo logging. Logging is another significant overhead that can be removed to increase transaction performance.
By removing logging and locking (the two main performance bottlenecks), H-Store was able to complete the TPC-C benchmark at a rate 82 times faster than a commercial system. It even ran faster than the best-known TPC-C implementation.
I think this was a strong paper that was a good read and easy to understand. It was able to convince me that removing threading and logging might actually be a good idea (not an easy thing to convince me of) and could lead to much greater performance. I agree with this paper that change is coming soon, away from the standard commercial one-size-fits-all systems toward more specialized, higher-performance subsets.
If I have to pick a weakness of the paper (and I don't think there really was one), I would say the visuals were lacking. It would have been nice to see a graph of some sort in the results to clearly show how much better H-Store is. But I don't blame them for not including one, as it was really just a one-on-one comparison without many data points, and they did a good job explaining without graphics.
This paper, as with many other Stonebraker papers, evaluates the state of DBMS systems in some fashion, and attempts to identify drawbacks and areas for improvement. However, although relational database management systems have made significant progress over the years, the numerous extensions and optimizations that have been adopted are a sign that it is probably not optimal in every situation. Specifically, for different markets (warehouses, text, scientific/streaming data), there are significant advantages to using technologies alternative to row-based storage. Essentially, the authors imply that “one size fits all” relational database systems are being sold as 30-year-old legacy technology that is good for nothing. For example, TPC-C performance with H-Store exhibits improvement by a factor of 82 over the traditional RDBMS scheme.|
By focusing on market-specific optimizations, there is a drift away from a system that attempts to solve all problems at once and ends up being mediocre at everything. Multi-core computing gives rise to potential areas for sharing work between operations, integration with data warehouse tools would be popular as the increase in usage of both applications comes to pass, and optimizations on OLTP systems (essentially what was covered in the OLTP: Through the Looking Glass paper), are needed for performance speedups in every market.
There was a good discussion of many of the key improvements necessary in OLTP systems and RDBMS design; perhaps an additional topic that would make the discussion more interesting would be places where optimization is not necessary (i.e., improvement has already been done; move on to other market areas). It would also be interesting if there were some way to take advantage of the "hot standby" mentioned in the paper, so that, in addition to real-time failover, there might be circumstances in which multiple primary sites can be used to increase throughput or performance while still remaining "hot standbys" rather than simply becoming additional primary sites.
In this paper, the traditional models for RDBMSs are carefully analyzed and dismissed in favor of an architecture suited to today's workloads and hardware. The authors explain how traditional System R features such as disk-oriented storage, multithreading to hide latency, log-based recovery, and locking-based concurrency control should all be relics of the past and should no longer be included in RDBMS design.|
This paper also introduces a new DBMS, H-Store, which is constructed with a "new-age" architecture. H-Store runs on a cluster, and all objects are partitioned over the nodes of the cluster. One particularly interesting design choice for H-Store was to make each node single-threaded. The threads do not communicate with one another, so each node is essentially standalone. This means each node can perform incoming SQL queries serially without any interruption. Another interesting perspective provided by the authors is the idea of embedding a language like Ruby on Rails directly into the database rather than using SQL from a web application to communicate with the database. This could be a revolutionary idea if it can be implemented successfully, because it removes the need for a middle language like SQL. It allows application programmers to focus on writing applications rather than on efficient ways to interact with their app data.
The drawback of the approach taken in H-Store is that it is not particularly favorable for applications that have high scalability requirements. The techniques described in this paper rigorously maximize the throughput of a single CPU at the cost of availability. The notion that high availability is no longer a requirement only serves some use cases and certainly is not an absolute rule. Another issue with H-Store is the requirement that all transactions run as stored procedures. This model removes the ability to run ad-hoc or long-running queries, which can be very useful in certain applications.
This paper aims to show that there is no single "one size fits all" database system; instead, different workload types may require different database architectures. With the DBMS market diversifying, if the rise of specialized database engines continues, all that will be left is the OLTP market and hybrid markets where more than one kind of capability is required. However, the current RDBMS has not changed much in 25 years. This paper argues that it is time for a complete "rewrite." The paper supports this through experiments comparing H-Store, an OLTP-specific DBMS designed by the authors, to a commercial RDBMS on the standard transactional benchmark TPC-C.
The paper starts by describing the major issues in RDBMSs: main memory, multithreading and resource control, grid computing and forklift upgrades, high availability, and "no knobs." It then moves to the assumptions about the environment in which the two systems would run, identifying the potential bottlenecks hierarchically; these also depend on the transaction and schema characteristics. After that, the paper describes H-Store: its system architecture, query execution, database designer, and, lastly, transaction management, replication, and recovery. H-Store is an engine that runs single-threaded, executes SQL commands without interruption, employs an automatic physical database designer (thus lifting a significant burden from DBAs' shoulders), keeps no persistent redo log, minimizes concurrency control (none is needed for sterile, single-sited, or one-shot transactions), and avoids dynamic locking (for non-sterile, non-single-sited, non-one-shot transactions). Finally, the paper presents the performance comparison, in which H-Store beats the commercial RDBMS.
A major contribution of this paper is that it points out the weaknesses and incompatibilities of the typical RDBMS in the current database market, even for the business data processing market. It shows that the trend has moved toward specialization, and that it is quite futile to keep forcing on the market a DB engine that can "do it all" while delivering mediocre performance. It is a very interesting paper in the sense that it challenges the model that has dominated the RDBMS world until today.
However, the authors admit that they implement only a partial TPC-C specification. I would like to know how it would run with wait times implemented. I also wonder whether the suggested partial vertical partitioning of the read-only parts of tables is wise: if there is an update to that part of a table, how does it affect overall performance? Also, H-Store requires the workload to be specified in advance. Since it is an OLTP system, it may be safe to assume that the transaction types are static. However, if a new transaction type appears, how would the automatic physical designer in H-Store deal with it?
This is another sort of Stonebraker survey paper that seeks to show that the "one size fits all" relational DBMS structure present in more or less all commercial database instantiations is not sufficient for an evolving workload and application space that differs from the business data processing tasks for which relational DBMSs were originally developed. The authors show how RDBMSs do not handle applications such as data warehouses, stream processing, and scientific databases as well as they could. Thus they see a motivation for users with these kinds of systems to look for a large-scale change.
The technical contributions of this paper are a thorough discussion of the components of DBMSs as they were developed in the 80s and still stand today (today, in this case, being 2007), how these components are outdated on today's computer systems (e.g., almost all OLTP databases can now live in memory, so optimizing for disk operations is unnecessary), and the ways in which the authors believe DBMSs should evolve to support these workloads. Another technical contribution is the presentation of H-Store, its system architecture, and how it improves over existing DBMSs, including a brief and unsatisfying empirical results section.
I think the strength of this paper is that it points out important problems and outdated concepts inherent in the relational DBMSs still in use today. It also presents logical progressions and conclusions derived from consideration of the evidence. The authors also present some of their suggestions in an implementation, but that brings us to weaknesses.
I think there are several weaknesses in this paper. First, some of the justification for the authors' statements amounts to "we believe this to be so, and we've worked in the database field for a long time" or "by our personal observation, we believe this to be the case." I am not convinced by such wishy-washy logic; though I don't discount personal experience as a useful tool, I wish there were more concrete reasoning in certain sections. I was also quite confused reading about the specifications of the system: it was not clear in the paper which parts were hypotheticals and future work and which were actually incorporated into the current implementation of H-Store.
Paper Title: The End of an Architectural Era
Reviewer: Ye Liu
Paper Summary: This paper demonstrates that the then-current RDBMSs can be outperformed by nearly two orders of magnitude in the OLTP market, and in general concludes that the "one size fits all" fashion of the commercial relational DBMS paradigm is coming to an end.
The paper starts by claiming that there are significant advantages of specialized architectures over the old designs. It then provides detailed information about the aspects in which the specialized architectures surpass their ancestors. The paper would be boring and less valuable if it simply stated "the recent designs of database architectures are a lot better than the old ones," which is actually how the paper starts. However, the value of this paper, in my opinion, is that it provides a detailed analysis of the ways in which the defects and drawbacks of the old designs can be overcome by taking advantage of modern developments in hardware to improve performance. What's even more interesting is that, in section 6, it actually offers some predictions about the future of database system development.
An interesting observation from reading this paper is that the language used in it is not as formal as one would expect from an academic paper, a trend that actually starts with the title. After looking a bit more into the authors' affiliations, this makes sense: the majority of the authors are from MIT's CSAIL, while the rest are from corporate research facilities. As a matter of fact, I find such a style preferable, as it makes the paper easier to read (in a sense, it makes the paper reader-friendly).
This paper points out that existing heavyweight database systems need to be replaced by new DBMSs that each focus on a specific area. It points out that old DBMSs were specifically designed for business data processing and are not aware of the new areas in the DB field, including data warehousing, stream processing, text DBs, scientific DBs, and so on. Moreover, through experiments on a new OLTP system, H-Store, it shows that focusing optimization on one field can bring nearly two orders of magnitude of improvement.
It first introduces the history of the existing commercial DBMSs, showing that they all share the same ancestor, System R, and that the architecture of these systems hasn't changed in 25 years.
It then analyzes the OLTP design considerations: 1) OLTP databases are in-memory DBs, since OLTP databases are small and memory is cheap; 2) a single thread is enough, since OLTP transactions are lightweight and long transactions can be split into small ones; 3) a shared-nothing architecture should be used, since incremental scalability is desirable; 4) high availability is achieved by a peer-to-peer system, a hot standby, or a combination of the two; 5) automatic tuning is desirable, though knobs provide more control to users.
It then describes the system design of H-Store, which takes advantage of these OLTP design considerations: an in-memory, grid-computing system with cost-based optimization attuned to the characteristics of OLTP transactions and an automatic physical database designer. Moreover, H-Store, as a shared-nothing system, uses timestamps as its concurrency-control scheme.
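The timestamp-based scheme mentioned here can be illustrated with basic timestamp ordering: each transaction receives a monotonically increasing timestamp, each data item remembers the youngest transaction that read or wrote it, and an older transaction that would conflict with a younger one aborts rather than wait on a lock. The following is a minimal sketch of that general technique, with illustrative names; it is not the paper's exact algorithm:

```python
import itertools

class TimestampStore:
    """Basic timestamp-ordering concurrency control: conflicts are
    detected after the fact via timestamps, and the older transaction
    aborts instead of blocking."""

    def __init__(self):
        self.data = {}       # key -> committed value
        self.read_ts = {}    # key -> largest timestamp that has read it
        self.write_ts = {}   # key -> largest timestamp that has written it
        self._clock = itertools.count(1)

    def begin(self):
        """Assign each transaction a unique, increasing timestamp."""
        return next(self._clock)

    def read(self, ts, key):
        # A transaction must not read a value written "in its future".
        if self.write_ts.get(key, 0) > ts:
            raise RuntimeError(f"abort txn {ts}: read-too-late on {key!r}")
        self.read_ts[key] = max(self.read_ts.get(key, 0), ts)
        return self.data.get(key)

    def write(self, ts, key, value):
        # Abort if a younger transaction already read or wrote this key;
        # its view of the data would otherwise be invalidated.
        if self.read_ts.get(key, 0) > ts or self.write_ts.get(key, 0) > ts:
            raise RuntimeError(f"abort txn {ts}: write conflict on {key!r}")
        self.data[key] = value
        self.write_ts[key] = ts


# Usage: the older transaction aborts when it conflicts with a newer one.
store = TimestampStore()
t1, t2 = store.begin(), store.begin()
store.write(t2, "x", 10)          # ok: t2 (younger) writes first
try:
    store.write(t1, "x", 5)       # t1 is older -> optimistic abort
except RuntimeError as e:
    print(e)
```

The appeal for OLTP workloads with few conflicts is that the common (conflict-free) path takes no locks at all; the cost is that the occasional loser must be re-run.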
This paper also provides a complete analysis of its performance against a commercial DBMS on a widely used third-party benchmark, TPC-C. Moreover, it also discusses future work along with an analysis of the development of the next generation of DBMSs.
1. It builds the H-Store system, which takes advantage of the single-purpose DBMS design the paper itself proposes, and achieves an 82x improvement. This further supports its thesis that a redesign is needed to tune performance for specific tasks.
2. It implements part of the TPC-C benchmark on both H-Store and a commercial RDBMS, which makes the experimental results more convincing.
3. It lists its future work as well as the future trends of DBMS development, which I find very inspiring.
1. The H-Store system exploits the semantics of a fixed workload to optimize its performance, which may lead to unfair experimental results, since the commercial OLTP system may not take advantage of that. Moreover, this requirement makes H-Store more constrained, and thus potentially harder to deploy and use.