Review for Paper: 2-Anatomy of a Database System

Review 1

Commercial DBMSs rely on many architectural tricks and techniques that are not mentioned in most textbooks or review articles on databases. Yet these techniques are vital to the efficient and correct functioning of commercial DBMS software. In “Anatomy of a Database System,” Michael Stonebraker discusses several core architectural components that underpin relational databases, including the process manager, the query processor, and the transaction manager.

Stonebraker highlights elements of the architecture of real-world DBMS implementations that are critical for their success. In the process manager, modern DBMSs may use a server process architecture with a dedicated I/O process for each disk, so that a heavyweight process is not needed for each user, yet all disks can be kept running asynchronously to maximize throughput. A DBMS may even implement its own threads, which would be more lightweight than OS threads, and its own scheduler for switching between users and queries. In the query processor, a DBMS restructures users’ queries to remove redundant clauses or joins, replaces view references with table references, and optimizes by ordering WHERE clauses based on selectivity estimation. Finally, the transaction manager of a DBMS provides for concurrency control and crash recovery, via locks and logging, respectively. Transaction management can provide varying degrees of isolation, depending on whether locks are required for reads, and on when locks are released.

The paper contributes a review of patterns from commercial DBMS architectures. These patterns are essential to DBMSs for business and other serious users, yet are not commonly addressed in textbooks. The author provides considerable detail on how vendors such as Microsoft, IBM, and Oracle approach problems in DBMS construction. The author discusses the design tradeoffs behind alternative approaches for low-end and high-end database customers, such as whether to use raw disk access or a large heap file for writing to disk, and whether to use a process-per-user or server-process model.

Unfortunately, the article glosses over some of its central points, failing to explain in depth why the author prefers a certain implementation alternative. Instead the author digresses into coding details such as SARGs and conditional_lock, while he does not fully explain more basic ideas such as write-ahead logging. The paper could be improved by reducing attention to coding details, using the space to more clearly explain core concepts.



Review 2

This paper outlines the architectural skeleton of modern database systems with frequent background information about its development over the years. Typical components of a database system include four pieces as follows –

1. Process Manager – A process manager takes charge of processing various queries thrown to it with the use of two major components: Admission Control and Connection Manager. The assumptions made for describing various models include High-performance OS Threads and Uniprocessor Hardware. The writers point out that such assumptions maybe true today, but they weren’t applicable since Day 1.

Paper outlines three models for a DBMS based on the level of architecture: Process per Connection, Server Process and Server Process with Input Output Processes incorporated in it. The writers then explain how the data is transmitted among threads and impart basic knowledge on the concepts of DBMS Threads, OS Processors, Parallelism, Process Models and Memory Coordination. It also spends time on different storage models available to the DBAs.

2. Query Processor – The query processor helps in the Parsing of the query and making use of relevant practices which make the system understand the requirements of the user and provides output based on his/her authorization level. It stresses how the query is rewritten by the system to make it simple and how it is optimized using the various practices such as Plan Space which focuses on left-deep query plans, Selectivity estimation technique based on use of joins and Search algorithms which were used by Microsoft and Tandem, which is based on top-down search scheme.

The writers then explain how the queries are executed after it is processed. They follow it up with Access methods applicable for managing storage access attending to the concept of reliability.

3. Transactional Storage Manager – The transactional Storage manager deals with Concurrency control and recovery. It makes use of two major concepts of Locks and Logs. A relevant term ACID is introduced which intertwines these two concepts and help achieve Atomicity, Consistency, Isolation and Durability. The writers then point out that these two don’t necessarily help in eliminating the durability problems but it’s a step further in this respect.

The writers help in providing an insight on locking and logging processes in Indexes. It also explains how latching works in B+ trees. They close the topic by telling what the interdependencies of Transactional Storage are.

4. Shared Utilities – This includes additional topics on Memory Manager, Disk Space Manager, Replication Services and Admin utilities.

The paper provides information rarely found in the traditional text-books about the structure of DBMS. The monotonicity of the paper makes the paper a dull read at times but it is full of concepts required for understanding the architecture of a DBMS on a deeper level.



Review 3

This paper tried to combine the database system field and computer architecture. Coincident with the title it investigate the backbone of database: architecture. Because the author notices that in database field some concepts have been reinvented several time(the same as the first week's), and he give two explanations:

1. small community
2. academic treatment of database is ignored architectural issues

, the author thought it's time to think about architectural aspect of database system.

The author presents four main components for DBMS: process manager, query processor, transactional storage manager, and shared utilities.

The first chapter discuss the use of process or threads. Start with assumption of uniprocessor and high performance and weaken those to present parallelism and mapping of threads and process. For parallelism, it presents four classes: shared memory, shared nothing(aka distributed), shared disk architectures, and non-uniform memory access. They all have their trade-off. Shared memory is similar to uniprocessor model but lack of scalability, in shared nothing architectures, data partitioning is a hard problem, and partial failure is another thing to be managed. Shared disk has become an attractive approach due to the advent of network attached storage devices, but the distributed lock manager facility can be bottlenecks for workloads.

For storage models, it presents spatial control and temporal(buffering), the first is important because as speed gap between sequential access and random access which is often neglect in theoretical discussion, the second deals when data get physically written to the disk and most DBMS implements a large shared buffer pool in its own memory space instead of using the manage system by OS.

The following chapter is about query processor which contains parser query rewrite optimizer and executor. The parser convert SQL statement into internal format and checks permission for these operation with catalog manager which is a metadata that store information about data in system. The query rewrite module optimizing the query without changing its semantics, and the optimizer will create an efficient query plan by minimizing the number of operation and data used during the queries.

A drawback which is also a strength of this paper is that it's opposite of the current trends. As the knowledge grows, it's not easy to excel both fields. Even through there might be some improvement by consider these things together, instead to take architecture as a black box, is the improvement really worth the cost? I think it's really good introductory survey to database system that covers every parts of database from architectural aspect, but not good research direction.




Review 4

This is just a dummy text


Review 5

This paper provides an overall discussion about database system architecture, and many of them involve advanced topics. There are four main parts in a typical database system, process manager that controls tasks, query processor that handles queries, transactional storage manager that knits together different components (storage, buffer, concurrency control and recovery), and shared utilities (memory management, disk management, replication services and administration utilities). And the discussion goes along with these four main parts for a modern relational database management systems (RDBMSs).

The problem here is the author is concerned that the lessons of database system architecture are not known broadly. Given that the applied database system community is small and the architectural issues have been often ignored, there are a lot of important lessons for building database system not been communicated widely. For example, students are often given lectures about algorithm and theory but not system architecture. Therefore, the author presented this paper to deliver those important lessons and discussion on database system architecture.

The main contribution of this paper is not quite about original research, but an overall, full-scale, and systematic discussion about database system architecture. It gives the reader a better understanding about database systems besides algorithms or theories. For example, the term “ACID transactions” is widely mentioned which stands for Atomicity, Consistency, Isolation and Durability. I also learned this concept during undergraduate database management system courses. However, I didn’t have a good understanding how they are related to the architecture. We might know that Isolation is implemented by locking and Durability is by logging, but we might not know how many kinds of locks there are, what the difference is between lock and latch, etc. This paper gives detailed explanations about those kind of questions which are very helpful.

In general, this paper is good for providing an overall discussion about database system architecture. However, some lessons or ideas related to operating systems are a little bit hard to understand. I have to go back to review some of the concepts learned in undergraduate operating system course. It might be me not very familiar with operating systems, but it also innovated me that research needs a good understanding on many areas. With a better understanding at a higher level, the innovation might come out.


Review 6

The purpose of this paper was to examine and explain the inner workings of a database management system. The authors claim that their intent was to focus on the components that are often ignored or under-represented in traditional database textbooks or courses. Their focus is on those components needed for the core relational operations, with less emphasis on the many extensions available in current commercial systems. The paper does provide an interesting prospective as to how hardware has evolved alongside the database systems, as well as operating systems. It was interesting to see how one would go about design a multi-user server, without today's modern thread libraries. The original developers of many database systems had to create their own thread libraries (replicating code that may have already existed in the OS kernel) in user space, and then attempt to map these to OS facilities. While originally DBMS threads were mapped to OS processes, after OSs evolved enough to support lightweight threads, DBMS shifted to single processes with lots of threads. It's really interesting to see how many traditionally-OS issues come up in database design - things like buffering disk access, managing where bytes go within a file, paging, and replacement algorithms for virtual memory pages to prevent issues with thrashing, where pages move in and out of virtual memory too quickly to be used efficiently. The authors then took a look at the query optimizer. This is one area where I learned a great deal from the paper. While the optimizer might seem simple on the surface, once you dive in you see how many pieces are involved. The parser, rewriter, optimizer and executer are all required to convert SQL statements into the results that the user seeks. The authors then move onto the monolithic piece of the DBMS, the transactional storage manager. This piece, again, overlaps a great deal with facilities provided by the Operating System. They address the ARIES log protocol, ACID, lock managers, and latches. While this is something that the OS would provide in a perfect world, the OS cannot know how to roll back DB updates, or how to update indexes, so this must all be done within the DB.

The paper does a great job of showing, in one read, what the individual parts of a DBMS are, and how they work together. It does a great job of showing how the current state of DBMS technology has been shaped by hardware and operating system history. I learned a lot from the historical perspective. After taking 482/Operating Systems, I was under the impression that a single process, with enough threads, could provide more than enough concurrency for any application. It did a great job of showing how these systems have to run on real hardware, and presented the complications that arise because of that, instead of only focusing on the theory and algorithms. This paper did a great job of educating me, and hopefully many others, of the innovations developed within the database community.

The paper briefly touched on scaling systems up for more than one machine, and mentioned the different models - shared memory, shared nothing, shared disk, and NUMA. However, the discussion of these topics was very brief and did not address many of the issues that still need to be taken care of to make these distributed databases more robust systems. This may just be my own lack of knowledge on the subject, but I feel like there was a lot more that could have been said about these systems and the architectural trade offs involved.



Review 7

The paper mentioned that even though the database is commonly used, the architecture of database is not well known because of the limitation of designers’ community and the ignorance in the academic area. Thus the authors proposed to show the main architecture knowledge of database in the paper.

Firstly it talks about the overall architecture of database. A DBMS uses threads and processes of operating systems a lot. The most commonly used way of process model is server process, that is, a multithread process manages all the activities of the DBMS. And the corresponding I/O method should be asynchronous I/O. The most common way is disk I/O.

The second part is about storage issue. The file system of operating system does not support database well in many time. DBMS usually manages the storage of its data instead of letting OS to maintain those data. It is because DBMS has better understanding of its workload to make the access to storage more effective. To implement it, DBMS can either access the disk directly or create a giant file in file system of the operating system to accommodate data. And for the buffering, the DBMS also implement a cache in its own memory instead of adopting OS’s buffering pool.

Thirdly the paper shows details of query processor. For a query, it needs to be parsed, rewritten and optimized before executed to have a better performance. Redundant expressions will be removed or rewritten and the suitable algorithms will be applied. Also, the processor can evaluate the query to decide the use of advanced algorithms like parallelism and auto-tuning.

And lastly it discusses the architecture for concurrency control and recovery. This is used to guarantee the monolithic of DBMS. When multiple threads are working, the DBMS used a lock manager to implement serializability. And a log manager is adopted to keep records for the rollback and recovery needs. They are used to promise the ACID of the transactional storage.

The strength of the paper is that it talks about the architecture of database very systematically. It covers all the architecture parts and discusses both theoretical principle and practical methods. Thus it is a good complement of textbooks in the area of database.


Review 8

This paper goes over the high level details of Database Management Systems (DBMS), the different hardware organizations of the system, various components of the software organization, and the pros and cons of the different design choices. The main contribution of the paper is not providing new idea or insight, but giving the reader a great overview of the various aspects of DBMS and providing concise architectural view of the whole system.

For hardware organization, the paper goes over processor and memory system. The processor models of DBMS are: process per connection, server process, and server process + I/O process. For memory organization, there are shared memory, shared nothing, and shared disk. The important tradeoffs in each of these designs are performance, implementation complexity and scalability of each solution. Which solution a DBMS takes depends highly on the characteristic and size of the target workload.

In the software side of things, the storage model of DBMS maintain spatial control with direct interfaces with the disk, as well as temporal control through large buffers. The query engine of DBMS consists of receiving a query statement, validating it, optimizing, and executing the query. The concurrency of DBMS are managed through a lock table and a transaction table that are managed by the lock manager, The log manager maintains logging protocols to support recovery and reliability for the system. While log manager can be simple, the complexity of the logging increases dramatically when trying to optimizer for better performance. All design decisions and any additional features added to DBMS have direct implications on the performance, scalability and complexity of the system.

The various design choices and tradeoffs present in the paper gives no doubt that DBMS is a complex system that requires large amount of expertise and experience to build. One design that worked in one company doesn’t necessarily work well on the other, as shown by the example of query optimization approaches where IBM prefers reliability over performance, while Microsoft forgoes reliability for higher performance. Also, it seems like there are many potentials for improvement, especially in designing dedicated hardware DBMS.


Review 9

This paper introduces almost the whole database system by going through several issues in main aspect of DBMS:

process and hardware structure: it illustrates several process models under different assumption of the underlying hardware.
storage models: there are two options of the storage model. One is DBMS directly access the disks and the alternative is use OS file systems. The first one seems to have a better performance at high end but with difficulty in portability to disk access interface. The latter approach, it provides relatively good space locality without DBMS accessing the file directly.
query processor: it covers the brief topics on parsing, query rewrite, optimizer, executer and access method.
transactions: ACID are the main principles for the transaction. To achieve this goal, DBMS has lock manager and log manager. Further considering the performance(concurrency), the strategy becomes more complicated.

This paper not only brings big picture but also brings some insight of the DBMS, which gives me some new thoughts. For example, it stresses the importance of the buffer management here: ‘a well tuned database installation is typically not I/O-bound’, which is quite different to the ‘common sense’ of a beginner.


Review 10

This paper gives an overview of the main components of a database system, and discusses the problems faced by each component and possible solutions to the problems.

The paper first discusses the process manager, which uses the processes/threads provided by the OS and the processor configuration provided by the hardware to process user queries.
- DBMS have moved away from forking a process for each user request, because processes are too heavy-weight. Instead, modern DBMS use a central multithreaded process for requests and optional processes for I/O devices.
- The DBMS may choose to directly use OS threads, or higher level threads within the application which may be mapped to OS processes or threads. Using the higher level threads allows better portability, but it means the DBMS must manage its own threads.
- Processors in a computer can either share RAM and disk, share only disk, or share nothing. Sharing nothing requires the DBMS to shard its data across the disks owned by each CPU. The other two systems have their CPUs query data from the same location.

The paper then discusses the storage model, which is the interface that the DBMS has with storage on the OS. There is often a conflict between the policies of the OS and the desires of the DBMS- the DBMS knows better where it wants its data to be stored physically, and when to read/write data from disk to cache and back. To solve this problem, the OS may provide the DBMS functions that allow it to have lower-level access to where and when data is written. In recent times, the DBMS will often create one large file in the OS filesystem, and have control over where data is placed in that file.

The query processor is discussed next, which validates queries, removes redundant or unneeded logical operations, creates an optimized query plan, and executes the query. Optimization involves figuring out an order in which to fetch/transform data that is most efficient, based on estimating the time-to-completion of different options.

Finally, the mechanisms for enabling transactional behavior in a DBMS are discussed. Locks and latches are used so that a query isn’t allowed to see another query’s transitional states, and logging is used so that changes can survive a crash, and so that the DBMS can return to either a pre-query or post-query state (instead of a transitional state). Providing a correct system using these tools is fairly easy, but providing a fast system is hard. Locks/latches and logging often limit the ability of the DBMS to effectively utilize concurrency or caching. In order to optimize, locking/logging rules are allowed to be “broken”, as long as the DBMS is able to recover from any problems that may arise. This results in the transaction code being extremely complicated.



Review 11

The paper explains each of core components of modern database systems just as its title suggests – “Anatomy of a Database System”. It provides several in-depth discussions on various architectural aspects of database systems that are not usually addressed in textbooks. The authors focus on relational database management systems (RDBMSs).

Typically, there are four main components in a database system: process manager, query processor, transactional storage manager and shared utilities. The provided document has been truncated and only the discussion of the first three components is available, yet its discussion on these components is enough for readers to be engaged and understand the internal mechanism of modern database systems.

For each of the main components, the paper summarizes criteria that the component has to meet for a good database system, then describes different models, scenarios or techniques in order to discuss the context and objectives of those different approaches. With such descriptions, the paper enables readers to easily understand why a certain design decision had been made for a particular component at the time. For example, many DBMSs had separate I/O processes in order to prevent I/O operations from blocking each other when asynchronous I/O was not supported by operating systems.

Its rich survey of many techniques in different components of modern database systems is excellent for readers to know the fundamentals of modern database systems that are not often discussed in textbooks. From the paper, readers can appreciate the fact that modern database systems are inherently complex and how they are evolved into the current status. It is worthwhile to note that most of design decisions are made through the battle of database performance versus data integrity, consistency, durability, etc. and the advancement of related technologies (e.g., OS threads and I/O processing) can lead to the significant change in database’s approach to a particular problem (e.g., from shared memory to shared nothing or shared disk).

Even though the paper already contains lots of useful information, I think that its discussion could be improved by including applications and references of techniques mentioned in the paper to other types of database systems apart from relational database systems if possible.


Review 12

This paper provides architectural aspects of database management systems, which are usually beyond the scope of discussion in common database textbooks. The authors discuss different system designs of DBMS involving data storage management, operating systems, and network connections, as well as some examples of how the companies employ those designs in their database systems in practice.

Three choices of process models for DBMSs (Process per Connection, Server Process, Server Process + I/O Processes) and three parallel hardware configurations (Shared Memory, Shared Nothing, Shared Disk) are given in this paper. Choosing among these options is often a trade-off between complexity of implementation and performance, however, it can also depend on the evolution of systems. For example, applying a shared-nothing model could be highly difficult if the DBMS was originally based on a uniprocessor configuration.

The same interesting thing also happens to storage models: Most modern DBMSs now support the storage model that utilizes filesystems to allocate a single large file in it and manipulate the data stored in the large file using some interfaces provided. However, the raw disk support is still employed since it is developed very long ago before filesystems are available.

The authors also go through the important components (parser, rewrite engine, optimizer, executor and access methods) of query processors and give some SQL commands as examples to illustrate what those components do. The components of transactional storage manager are discussed as well.

Another interesting part of this paper is about the logical difference between DBMS and OS, such as the page replacement schemes and the "double buffering" problem. Although the authors do not provide much opinion on the system architectures in this paper, they do mention that copying data in memory can be a serious bottleneck in modern DBMS (rather than I/O bounded). So it is important to coordinate between DBMS features and existing OS features to avoid such overhead and make the system more efficient.



This paper provides architectural aspects of database management systems, which are usually beyond the scope of discussion in common database textbooks. The authors discuss different system designs of DBMS involving data storage management, operating systems, and network connections, as well as some examples of how the companies employ those designs in their database systems in practice.

Three choices of process models for DBMSs (Process per Connection, Server Process, Server Process + I/O Processes) and three parallel hardware configurations (Shared Memory, Shared Nothing, Shared Disk) are given in this paper. Choosing among these options is often a trade-off between complexity of implementation and performance, however, it can also depend on the evolution of systems. For example, applying a shared-nothing model could be highly difficult if the DBMS was originally based on a uniprocessor configuration.

The same interesting thing also happens to storage models: Most modern DBMSs now support the storage model that utilizes filesystems to allocate a single large file in it and manipulate the data stored in the large file using some interfaces provided. However, the raw disk support is still employed since it is developed very long ago before filesystems are available.

The authors also go through the important components (parser, rewrite engine, optimizer, executor and access methods) of query processors and give some SQL commands as examples to illustrate what those components do. The components of transactional storage manager are discussed as well.

Another interesting part of this paper is about the logical difference between DBMS and OS, such as the page replacement schemes and the "double buffering" problem. Although the authors do not provide much opinion on the system architectures in this paper, they do mention that copying data in memory can be a serious bottleneck in modern DBMS (rather than I/O bounded). So it is important to coordinate between DBMS features and existing OS features to avoid such overhead and make the system more efficient.



This paper provides architectural aspects of database management systems, which are usually beyond the scope of discussion in common database textbooks. The authors discuss different system designs of DBMS involving data storage management, operating systems, and network connections, as well as some examples of how the companies employ those designs in their database systems in practice.

Three choices of process models for DBMSs (Process per Connection, Server Process, Server Process + I/O Processes) and three parallel hardware configurations (Shared Memory, Shared Nothing, Shared Disk) are given in this paper. Choosing among these options is often a trade-off between complexity of implementation and performance, however, it can also depend on the evolution of systems. For example, applying a shared-nothing model could be highly difficult if the DBMS was originally based on a uniprocessor configuration.

The same interesting thing also happens to storage models: Most modern DBMSs now support the storage model that utilizes filesystems to allocate a single large file in it and manipulate the data stored in the large file using some interfaces provided. However, the raw disk support is still employed since it is developed very long ago before filesystems are available.

The authors also go through the important components (parser, rewrite engine, optimizer, executor and access methods) of query processors and give some SQL commands as examples to illustrate what those components do. The components of transactional storage manager are discussed as well.

Another interesting part of this paper is about the logical difference between DBMS and OS, such as the page replacement schemes and the "double buffering" problem. Although the authors do not provide much opinion on the system architectures in this paper, they do mention that copying data in memory can be a serious bottleneck in modern DBMS (rather than I/O bounded). So it is important to coordinate between DBMS features and existing OS features to avoid such overhead and make the system more efficient.



Review 13

The purpose of this paper is to discuss the main architectural aspects of modern database systems. DBMS is a very important issue in data management, applications, OS, and networked services. Architectural innovations are regularly reinvented because lesson in database system architecture are not widely known. This is because the community for applied DB systems is fairly small, as market forces support only a few competitors, and architectural issues are often ignored in academia, as the focus is generally on algorithms and theoretical issues. Currently, relational database management systems are the most developed model in production. A database has 4 main parts: process manager, which schedules tasks in the system, statement-at-a-time query, shared transactional storage subsystem, and shared utilities.

The paper discusses 3 natural process model options for a DBMS with 2 assumptions: high-performance OS threads (thread package that allows for many threads in each process) and uniprocessor hardware (design for a single CPU). The first natural process model is Process per Connection, in which the user runs client tool and a connectivity protocol connects to main dispatches process at db server machine. The advantages of this protocol are that it is easily implemented, as each DBMS unit maps to an OS process and the OS scheduler manages timesharing, that the programmer can rely on OS protection to isolate bugs like memory overruns, and that debuggers and memory checkers fit well with this model. Some complications with this model are from sharing data structures (lock table, buffer pool) explicitly allocated in OS supported “shared memory” that is accessible across processes. Performance is also not attractive in this model. It does not scale well to the number of concurrent connections, and the processes are heavyweight with memory overhead and high context switch times. This is not good for high-concurrency transaction programming. The second process model is Server Process, in which a single multiple thread host all main activity of the DBMS and the dispatcher listens for SQL commands. Shared data structures (lock table and buffer pool) live in the process’ head, and are accessible to all threads. The challenges to this model are that the OS does not protect threads from each other’s memory overruns and stray pointers, debugging is tricky with race conditions, and the software is difficult to port across OS because of different threading interfaces and multi-threaded performances. The third model is Server Process and I/O Processes, which assumes asynchronous I/O, allowing the DBMS to issue a read or write request and work on other things while the disk device works to satisfy the request. Advantages of this model are that it allows DBMS to schedule an I/O request to multiple disk devices, parallelism, and split-phase programming model (the separation of requests from responses). The fallbacks of this model are that it is a recent development.

The paper also discusses memory coordination in platforms with multiple CPUs. In shared memory machines, all processors can access the shared data structures (RAM and disk). Server process architecture parallelizes to shared-memory machines easily because the OS supports the transparent assignment of dispatchable units across all processors. The challenge of shared memory architecture is to modify execution layers for the ability to parallelize a single query across multiple CPUs. In shared-nothing architecture, clusters of single-processor machines communicate over a high-speed network interconnect. It provides no hardware sharing abstractions, leaving coordination between machines in the hands of the DBMS. Partial failure, where failure of a single node doesn’t affect the others definitely but affects the overall behavior of the DBMS, is a possibility. Shared disk parallel machine allows all processor to access the same disks but are unable to access each other’s RAM. An advantage is usability (no consideration for partitioning tables across machines) and that the failure of single DBMS processing nodes doesn’t affect the other nodes’ ability to access the full database. Disadvantages are the complexity due to local memory for locks and buffer pool pages, and coordinating data sharing across the machines. Thus, there is a need for distributed lock manager facility and cache-coherency protocol for managing the distributed buffer pools.

The main drawbacks of this paper are that it is very long and covers a lot of information. Sections 2-4 make sense because they cover the main components of DB architecture (process manager, shared transactional storage subsystem, and shared utilities). Section 5 on Transactions seemed a bit of a transition because it is less about the architecture of the process manager and shared utilities, and more about the mechanisms of transactions.


Review 14

This paper is another decade old paper by Hellerstein and Stonebraker. It attempts to describe the main architectural aspects of modern database systems that the authors think have not all been written down, or communicated effectively outside a small group of researchers that works on database systems.

This is an important paper because it discusses relational DBMS systems which serve the data needs of many industries and as the authors point out - is not information that is all widely known and contained in a single document. They divide this information into four sections including process management, query processing, transactional storage management, and shared utilities. It is important for people using databases or doing research in the area to know how most systems work and what types of features have been tried in other systems. This helps prevent reinventing the wheel, which the authors list as part of their intent in writing.

The paper is very well organized and covers a lot of material. Section 2 and 3 covers some aspects of buffer and thread management that are discussed in the 1981 Stonebraker paper. It is interesting that he noted these issues 24 years earlier and better solutions were still not found by the publication of this paper.

There is not much to critique in this paper. It is a relatively objective view of database practices. There are a few times when the authors informally refer to an observation they had which is questionable. For instance, the authors say that MySQL uses entirely heuristics for query optimization with too many tables the last time they checked. We don't know when this is or that it is still true. In general though, the paper is informative and well written. It stops for some reason in the middle of section 6, but I assume that is an issue with the PDF generation.


Review 15

This paper analyzes the overall design of a database system. It divides a DBMS into four main parts: Process Manager, Query Processor, Transactional Storage Manager, and Shared Utilities.
Part 1: Process manager
Possible process models are three folds, process per connection, server process, server process together with I/O processes.
Assertion 1: It should be kept in mind that asynchronous IO is no longer a new feature as contents wrapped up in this paper are kind of old.
Inter-process communication utilizes buffers and sometimes socket programming. When it lacks OS process support, DBMS would maintain its own threads for query processing, dispatching, as well as I/O optimizing. Most of the time DBMS can rely on OS provided “dispatchable units”. DBMS threads can share memory, disk or just nothing.
Assertion 2: share nothing sounds a solid design as shared memory can lead to consistency issues and may be hard to debug.
Admission control is important when there is high pressure on memory or other shared resources.
Takeaway 1: DBMS server process triggers many threads which sometimes serve the OS role to cut instructions into dispatchable units.
Part 2: Storage models
Although raw disk control can optimize DBMS performance, sometimes (e.g. in NAS or SAN systems) we need to rely on file system to optimize.
Assertion 3: Buffering must be done in DBMS as OS level buffering depends on contiguity of physical bytes offsets. This makes a well-known problem in DBMS research.
Part 3: Query processor
Queries would be parsed, authorized, and optimized through some query processor. System R optimizer keeps left-deep query plans. DBMS today also values parallelism, extensibility as well as auto-tuning.
Assertion 4: Companies like IBM focusing on the high-end customers would want more customized query optimizers while MS focusing on the low-end would spend more time on self-tuning. Research directions may be driven by the customer needs.
Executers carry out the real data flow. Iterators can couple dataflow and control flow. However problem may come out when one need to modify the data right after fetching it.
ACID (atomicity, consistency, isolation, durability) transactions are maintained by locks and logs. Index are used by DBMS however invisible to user programs. Non-transactional locking and next-key locking are also developed according to the ACID fashion.
Finally, independencies of transaction storage are also mentioned that many concurrency problems and recovery problems are intertwined and need to be considered together, and under most situations they are system-specific.



Review 16

The paper "Anatomy of a Database System" by Joseph Hellerstein and Michael Stonebraker teaches readers about important architectural aspects of modern database systems. The authors believe that current database education fails to cover architectural issues that arise when designing databases and much of what has been learned concerning the topic is not well communicated. As a result, the authors have written this paper to provide database users and students with context for the database algorithms and techniques they have encountered.

The paper first outlines the main components of a relational database management system and then describes them in the following order: the process manager, the transactional storage manager, the query processor, and shared database utilities. Within each section, the authors note architectural issues that arise when implementing certain sub-components. For example, when discussing persistent storage interfaces, one option is to use standard file system interfaces provided by the OS. However, these interfaces may delay or postpone writes, which can cause correctness and performance issues.

The paper does a great job in presenting the numerous complex issues that arise when attempting to implement a database and different solutions to those issues. Initially, operating systems did not have high performance threading packages so databases had to provide their own package, but not such options are available. However, not all problems can be solved using operating system provisions. Temporal control of data by the operating system can actually result in performance overhead and as a result, databases attempt to circumvent the provisions provided by the operating system. These two contrasting examples give readers awareness of the complexities in database architecture that they may not have considered previously.

The paper also provides very in-depth discussions of database components readers may be aware of, but may not understand such as the query manager. The authors not only discuss the numerous optimizations performed on a query and the logical complexities involved, but also provide easy to understand examples such as the Halloween problem. Overall, I feel that the paper is more of a dissection rather than a summary of the internals of a database system, but is a useful read for those in the database field.


Review 17

The paper discussed about a general overview of architecture of a relational database management system (RDBMS). A typical RDBMS includes components like process model, storage model, query processor, and transactions control and recovery. The process manager manages and schedules the various tasks in the system. There are different design choices in designing the process manager model. In a simplified model, the paper introduces how process manager is implemented in a uniprocessor model. In this regard two main options are discussed. The first option is the “process per connection” which forks new process rather than threads per user database request. This option doesn’t scale well as processes are heavyweight entities with sizable memory overheads and high context-switch times. The other option is “server process” which contains a single multithreaded process with a dispatcher which keep a set of threads and listen to incoming request and assign incoming SQL commands to idle threads. The paper described this options as the most natural architecture for efficiency during that time.
While a uniprocessor model is good platform to explain the process model, most of the processing platforms have multiple CPUs. The paper discussed various models in this category including shared memory, shared nothing, shared disk, and NUMA model. A shared-memory parallel machine is one in which all processors can access the same RAM and disk with about the same performance. Where as, a shared-nothing parallel machine is made up of a cluster of single-processor machines that communicate over a high-speed network with no mechanism of directly accessing the memory or disk of another processor. The shared disk model allows sharing disk but not RAM and the NUMA architecture is a shared memory model where some processors take longer time to access RAM than the other. The paper mentioned that marketplace at that time supports a mix of Shared-Nothing, Shared-Memory and Shared-Disk architectures. In addition, it mentions that the Shared-Nothing model has a good price-performance for running complex queries on very large databases, whereas the other two typically perform better for processing multiple small transactions.
The other main component discussed in the paper is the storage model. This model has two options. One of them is for the DBMS to directly interact with the Disk and the other option is to use the services provided by the operating system. In order to have a good performance DBMS should place database blocks on the disk so that queries can access data sequentially which is much faster than random access. In addition, DBMS should control when to write/read database block to the disk by using buffering. While the paper revealed that modern os filesystem offers reasonable spatial and temporal control to the DBMS through allocating a single large file in the filesystem on each disk, and let the DBMS manage placement of data within that file via interfaces like the mmap suite, it also assert that raw disk access remain prevalent as it provides explicit performance control to the DBMS.
Furthermore, the paper discuss about query processing component. Essentially the task of the query processor is to read a declarative SQL statement, validates it, optimizes it into a procedural dataflow implementation plan, executes that dataflow and pulls the resulting tuples. Also the transaction control and recovery is another important component. The transaction control is implemented by guaranteeing the satisfaction of ACID properties. The paper mentioned that modern DBMSs implement Isolation via locking and Durability via logging; Atomicity is guaranteed by a combination of locking (to prevent visibility of transient database states) and logging (to ensure correctness of data that is visible).
Generally the paper has a good general overview of the main components in RDBMS including various options and their significance in performance and consistency issues.



Review 18

Anatomy of a Database System

In this paper, architectural fundamentals of modern database systems are brought up and discussed with a few advanced topics. At the beginning, two assumptions are made:
1. memory overhead of each thread is small and context switch is free.
2. uniprocessor hardware.
These two assumptions are relaxed in later discussion.

First, process model and hardware architectures shall be selected from multiple options. They have profound influence on the system to build on top of it. We have a lot of options here, some of them does not scale very well, while some of them do but are tricky to debug. General rules of resource deployment are discussed here in process model, memory model selection and so on.

The next key aspect is about storage model. It determines the extend to which DBMS can control the storage in both space and time. This lead to the topic of buffer management.

As we see from the discussion, a lot of system supports are replaced by some customized design. But as OS support for DBMS improves, some commercial practices start to shift.

Then the author stand on a query's point of view and explain how a query is processed. It involves parser, re-writer, optimizer, compiler and executor. Same here, we have a lot of algorithm as design options in each part.

This paper provides a great overview of the database system starting from architecture aspects and to key query processing modules. Overall, the idea mentioned in this paper are still important guidelines of the field.


Review 19

This paper aimed to inform people about the architectural design of modern database systems. In modern DBMS, there are mainly four components: process manager, query processor, transactional storage manager and shared utilities. For each component, this paper discussed what the component is, as well as several design choices can be made for that component. (The chapter about shared utilities is cut from the article.)
1) Process manager
After giving three basic models (process per user, server process, server process + user process) for simple uniprocessor machine, different configurations regarding DBMS threads and parallel models are discussed.
2) Query processor
A comprehensive description about each stage of query processor, parsing, rewrite, optimize and execute, is written. One can understand the general procedure of a query processor very well after reading this.
3) Transactional storage component
This part discusses how ACID is achieved using locking and logging.

The first contribution of this paper is that 1) it described how modern database systems are organized, and 2) what is the job for each component and how these jobs are done. This gives a clear picture of what DBMS is and what decision should be made to design a DBMS.
Another contribution is that it shows modern designs of DBMS and gives reasons for why they are designed in this way instead of some alternative. For example, in the process management chapter, it gives a detailed list of how a modern BDMS maps DBMS threads into OS process. Examples like this made this paper closer to modern design, not just theory in textbook.

This paper provides an anatomy of DBMS. Each components and relationship between components are described clearly. I didn't find any weakness while reading.



Review 20


This paper introduces DBMS in an architectural perspective. It focus on system design, trying to give a overall context for related literatures. The author divide a typical DBMS into four main pieces, a process manager, a query processor, a transactional storage manager and some share utilities, and explains each one in a separate section.

Basically the paper introduces modern DBMS in the following four sections (there seems more):

Process Model and Hardware Architectures:

Modern DBMS uses both "Process-per-user" and "Server-Process" models, while the latter is more efficient but also more complex. These models are based on assumptions of high-performance OS threads and uni-process hardware.

Early DBMS circumvented need for OS thread support by emulate it using DBMS threads. Current commercial systems evolves with OS and has different ways to maps DBMS threads to OS dispatch-able units.

Modern DBMS has three architectures, “Share RAM”, “Share Nothing” and “Share Disk”, to utilize multi-processor platforms. Among these “Share Disk” is increasingly attractive since DBAs don’t have to partition tables onto different nodes.

DBMS also has an admission control to limit the workload to avoid thrashing.

2) Storage Models:

DBMS has two basic option for persistent storage: directly manipulate device, or build on OS file system. DBMS has its own spatial advantage when directly store since it understands its workload, but new technologies like RAID diluted this advantage. And for temporal considerations, in terms of correctness and performance, direct manipulation is better. But development of commercial filesystems support database storage considerably well.

3) Query Processor:

The query processor first parse the query and check for authority. Then rewrite it with semantic optimizations etc.

Then a query optimizer is there to produce an efficient query plan, mainly described in Selinger’s paper as a preliminary research. SQL also supports query preparation to hide the overhead of parsing, rewriting and optimization. But DBMS has to recompile while the system evolves.

The next part is the executor which employs the iterator model. It is a concatenation of iterators with one output be the input of the other, thus couples data flow with control flow.

4) Transactions: Concurrency Control and Recovery

Database systems tries to provide “ACID” transactions with locking and logging. And DBMS has a more light-weighted lock - “Latch”, whose deadlock should be avoid. And the author briefly introduced four levels of isolation.

The logging system uses a write-ahead-logging protocol, and in order to maximize efficiency, operates in a “Direct, Steal/ No force”.



Review 21

The aim of this paper is to understand the architecture of a typical database system with emphasis on specific fundamentals that support relational features. It also specifies how the database system needs to work cohesively with the operating system. The authors begin with a simplistic approach and add features as they go in order to explain more complex theories.

The paper goes on to explore parallelism in OS to work with database systems. The markets support a mix of Shared memory, Shared-Mothing and Shared -Disk architectures. The observations are that Shared-Nothing is cost-effective for running complex queries on very large databases whereas the remaining two perform better processing for multiple small transactions. The paper specifies that the Shared-Nothing architecture is a niche market however with the current advent of big data and companies wanting to get into the field, this can become a bigger market than it is.

DBMS storage managers require sequential access to disk blocks for queries to access data. As an alternative to raw disk access, if a very large file is allocated on an empty disk, the offsets in that file (also applicable to virtualized storage systems) corresponds to physical proximity of storage regions.

The DBMS must control when data gets physically written to the disk since the commits to the physical storage system must be preceded by the log writes. One of the issues that the paper highlights is the redundancy in buffering mechanisms between the DBMS and the OS. This can result in a huge memory waste as specified in one of the other papers “Operating System Support for Database Management”.

The next important part that the paper talks about is the database catalog. The idea that the catalog is stored in the same format as the data, the system is more compact and a lot easier to use. The authors insist that the code and language reuse is an important lesson and must not be overlooked in early stage implementations.

Query rewrites happen on account of multiple reasons including view rewrite and constant arithmetic evaluation. In the paper, only System R and INGRES query planners have been mentioned and it might be a slightly skewed view in comparison to other query planners that may exist. There also seems to be discrepancy about the fact that interpretable query plans might be a mistake since as the authors mention that they are used on a different level of abstraction in order to enable cross-platform portability.

The authors seem to be slightly biased against MySQL engine optimizer while talking about it in relation to randomized query optimization heuristics. I believe that a little more in-depth description would have been apt with regards to the engine.

Iterators are the part of the database system that couple dataflow with control flow. One of the main advantages about the iterator that is highlighted is that support for parallel query execution can be provided with no changes to the iterator model or query execution architecture by just encapsulating parallelism and network communication within special exchange iterators.

The Halloween problem is specified as a significant problem as well however, as long as a clustered index is being used or a non-clustered index just based on the salary is avoided, we should ideally not encounter the Halloween problem at all.

Overall, this paper provides an idea about the architecture required of database systems and a deep dive into specific parts that are significant in terms of performance and memory.



Review 22

This paper mainly talked about the building block components of relational database in four parts: the overall architecture of DBMS processes, storage issue, query processor, Transactional storage manager and shared utilities and the interaction between those parts.
Earlier DBMS system started with using only uniprocessor hardware, and there are three main process models: process per connection, server process and server process + I/O process. The last one provides dispatcher thread, pool of worker thread and asynchronous I/O. As the Disk I/O buffer is used and the client communication buffer enqueuing the fetch results for fetch results, the last model improves performance of DB and reduces race condition.
Historically those advanced features are not available on the os thread packages, many commercial DBMSs use their own lightweight DBMS threads. And often different company choose their own dispatchable units, it can vary from a single process to multiple DBMS threads to one OS thread to one DBMS thread.
And three parallelism scheme are shared memory, shared nothing and shared disk. The last one is the most widely used one since it take advantage of a distributed lock manager facility. Admission Control is also used in DBMS to avoid thrashing by control the number of connection as well as the number of queries being executed. In terms of the storage model, the raw disk manipulation would be the better option to achieve high performance than utilize natural OS support.
Query processor can parsing and authorising query statements. And it uses catalog to keep the metadata of all data, which can be also used to rewrite the query to improve performance. Then the processor would use the optimizer to generate an efficient query plan before executing each query. The most common use case is to use heuristic search schemes to optimize the search algorithm. Modern DBMS support intra-query parallelism to speed up single query. They are also supporting UDT and UDFs, and some of them even provide auto-tuning options. Prepared queries are often used in modern DBMS, but only low end product re-optimize prepared plans, because the high end ones always put predictable performance at the first place. Often the executor uses an iterator model to access tuples, and in this way it can easily run in parallel. As for data modification statements, the plans are always more complex and inefficient.
Transactional storage manager are consists of lock manager, log manager, buffer pool and access methods for organizing data in disk. The lock manager helps keeping atomicity and isolation. Most BDMS enforce strict 2-PL to keep serializability. And the log manager also helps keeping atomicity and durability by adopting the write ahead logging protocol. Indexes are physical structures for accessing data in database. latching in B+ trees and predicate locks like next-day locks help keeping the concurrency among the range indexes.
Some thoughts after reading this paper are, since the relational database are hiding the optimization options inside the query optimizer itself and only a few DBMS are supporting auto-tuning in this part, it would be helpful if some tools like human assisted optimization can be provided to end users to further improve the performance.



This paper mainly talked about the building block components of relational database in four parts: the overall architecture of DBMS processes, storage issue, query processor, Transactional storage manager and shared utilities and the interaction between those parts.
Earlier DBMS system started with using only uniprocessor hardware, and there are three main process models: process per connection, server process and server process + I/O process. The last one provides dispatcher thread, pool of worker thread and asynchronous I/O. As the Disk I/O buffer is used and the client communication buffer enqueuing the fetch results for fetch results, the last model improves performance of DB and reduces race condition.
Historically those advanced features are not available on the os thread packages, many commercial DBMSs use their own lightweight DBMS threads. And often different company choose their own dispatchable units, it can vary from a single process to multiple DBMS threads to one OS thread to one DBMS thread.
And three parallelism scheme are shared memory, shared nothing and shared disk. The last one is the most widely used one since it take advantage of a distributed lock manager facility. Admission Control is also used in DBMS to avoid thrashing by control the number of connection as well as the number of queries being executed. In terms of the storage model, the raw disk manipulation would be the better option to achieve high performance than utilize natural OS support.
Query processor can parsing and authorising query statements. And it uses catalog to keep the metadata of all data, which can be also used to rewrite the query to improve performance. Then the processor would use the optimizer to generate an efficient query plan before executing each query. The most common use case is to use heuristic search schemes to optimize the search algorithm. Modern DBMS support intra-query parallelism to speed up single query. They are also supporting UDT and UDFs, and some of them even provide auto-tuning options. Prepared queries are often used in modern DBMS, but only low end product re-optimize prepared plans, because the high end ones always put predictable performance at the first place. Often the executor uses an iterator model to access tuples, and in this way it can easily run in parallel. As for data modification statements, the plans are always more complex and inefficient.
Transactional storage manager are consists of lock manager, log manager, buffer pool and access methods for organizing data in disk. The lock manager helps keeping atomicity and isolation. Most BDMS enforce strict 2-PL to keep serializability. And the log manager also helps keeping atomicity and durability by adopting the write ahead logging protocol. Indexes are physical structures for accessing data in database. latching in B+ trees and predicate locks like next-day locks help keeping the concurrency among the range indexes.
Some thoughts after reading this paper are, since the relational database are hiding the optimization options inside the query optimizer itself and only a few DBMS are supporting auto-tuning in this part, it would be helpful if some tools like human assisted optimization can be provided to end users to further improve the performance.



Review 23

This paper introduces database systems architecture which has four main pieces: process manager, query processor, transactional storage manager and shared utilities(not included).In each part, it introduce many options to build such module and analysis the pros and cons.

First, it talks about process models and hardware architectures. It talks about many options for DBMS process models. It begin as a simplify condition: lightweight(high performance) thread and uniprocessor, and mention three different process model: process per connection: one process per connection, bad scalability as heavyweight process; server process : a single multithread process host all activity, but meet race condition problems; server process+I/O process: have one process for each device to make all devices doing I/O in parallel. Then it talks about when no high performance OS thread packages. And then talks about multi-CPU. It introduced three hardware architectures: shared memory(share RAM and disk), shared nothing(every processor has its own RAM and disk) ,shared Disk(share disk but separate RAM) and NUMA. In the end, it talks about admission control to dealing with “thrash” in OS when many active pages will result many loading to memory. There are two ways: one to limit number of connection, another is to estimate each query’s resource requirement and to limit tasks.

Then, it talks about storage models which is the choice of storage interface to use. There are two options: one is directly interact with device driver and another is via OS filesystems. There are two factors needed to consider: spatial control(control accessing the storage), temporal control(control when to write to disk). It also talks about how DBMS manage the buffer which I think is little similar to OS virtual memory control.

Then move to Query Processor. Given a SQL statement, it first parsed to internal format, check the table reference, authorization and constraint checking. Then query rewrite module simplify the query for later optimization. Then optimizer will produce an efficient query plan. Then executor will in charge of executing the query plan. It also talks about how the DBMS manage access to different data structures.

At last, talks about components in transactional storage manager and intertwine among fist three. Frist is lock manager which manage all the locks for data pages. Then the log manager which responsible for maintaining log records on disk to provide atomicity(undo) and durability(recovery) and some strategy to improve the performance of logging like stealing and not forcing. The author exclusively talks about locking and logging on index data. In the end, it talks about why these three component intertwined with each other.

The paper goes through every small component in the DBMS and analysis ways to implement these component. Each part start from a basic or easy implementation method to complex and common used method. If there are more than one way to implement such component, the author will explain companies' choice from the performance, prices and customer's aspect. For example, IBM seems have more high-end customer and cares more about performance and sometimes rely on good DBA. And Microsoft likes more user-friendly character.


Review 24

Hellerstein and Stonebraker describe the parts that make up a DBMS in this paper. They split up the paper into three sections: the process manager, the query processor, and the transactional storage manager.

The process manager takes care of multiple connections to the database simultaneously. When multiple users are connected, the DBMS can choose to solve this problem in one of three ways: one process per connection, a single server process that gathers all of the connections, and a multithreaded server that issues multiple I/O requests at once. They also go over the physical structure of the system. DBMS can have one of three types of layouts: shared memory, where the disk and the ram are shared among all of the processors; shared nothing, where nothing is shared; and shared disk, where the disk is shared but the memory is not.

The query processor takes a series of steps to convert a query into a set of instructions for the DBMS to execute. First, it authorizes the query to make sure all of the syntax is correct. Then, it rewrites the query to remove redundancies and make the query faster. An optimizer then lays out the steps needed to efficiently complete the query. Finally, the executor performs the query and returns the result.

Finally, the transactional storage manager makes sure that each transaction is atomic, consistent, isolated and durable. In other words, transactions should be “all or nothing”, should not be able to read what other transactions are doing and should leave lasting changes in the database when committed. It does so by using locks on multiple levels.

Even though this paper does not present a new concept, one of the positives of this paper is that it covers the modern techniques used in DBMS as well as how these techniques have evolved over the years. It also goes over all possible configurations and corner cases for some of the concepts. However, it does not go in depth about how modern query optimizers work.



Review 25

This paper provides readers who have a working knowledge of database and operating system principles with a summary of the components that make up most proprietary DMBS applications, the challenges faced in designing each of these components, and the merits and drawbacks of several different implementations of each component. The authors state this paper can provide better context and understanding of many design principles that are commonly known among database researchers and professionals, but that many textbooks and classes fail to cover.

One of the main strengths of the paper was its discussion of how different implementations of DBMS components provided advantages in certain use cases but disadvantages in others. There is no "one size fits all" solution. I found this discussion with regards to parallelism and memory coordination particularly interesting. For example, a shared nothing architecture may be the best fit for a business that needs a great deal of scalability at low cost. However, shared nothing schemes require an experienced DBA and are prone to partial failure. A business that needs consistency rather than high availability from their DBMS might opt for a shared disk system, which does not suffer from partial failure, since each CPU has access to the entire database. Shared disk systems do not scale particularly well (likely due to the bandwidth limitations inherent in bus-based cache coherence protocols), but this may not pose a problem for a small business.

I also found the discussion of next-key locking to be very interesting. Although it was only a small portion of the paper, it emphasized that physical data objects could act as surrogates for logical concepts in some cases. It was a reminder that an implementation can often be extended to handle more complex tasks with minimal additional logic if researchers are willing to consider problems carefully from several different angles.

One thing I would have liked to see from the paper was more coverage of how current ideas evolved from past failures. I found the evolution of ideas in What Goes Around Comes Around very helpful and would have liked to see a few examples of early attempts at transaction protocols or query engines rather than just reading a description of the way things are now.


Review 26

This paper talks about the main architectural aspects of modern database systems, including process models, storage models, query processors and transactions. It points out that the lessons of database systems architecture are not widely known for two reasons. First, the applied database systems community is small. Second, the textbook presentation of database systems usually focuses on algorithmic and theoretical issues instead of architectural issues. Therefore, this paper put emphasis on the architecture of database systems.

A typical database system has four parts: process manager, query processor, transactional storage manager, and shared utilities. In the following, I will talk about the key points of these four parts mentioned in the paper.

Based on the uniprocessor assumption, there are three process model options: process per connection, server process and server process + I/O processes. Process per connection is the simplest one, but does not scale well. The server process model with asynchronous I/O allows the DBMS to issue a read or write request, and work on other things during the I/O request. When the uniprocessor assumption is relaxed, parallel hardware is important in server situations. The paper mentioned three kinds of machines: shared memory, shared nothing and shared disk. In shared-memory machines, all processors can access the same RAM and disk; a shared-nothing machine is a cluster of single-processor machines; a shared-disk machine is one in which all processors can access the same disks, but unable to access each other’s RAM. One key of advantage of shared-disk system is usability.

The next section talks about storage. The DBMS have to control where on the disk should lie, and when data get physically written to the disk. For spatial control, sequential access is much faster than random access. For the temporal control, correctness and performance are the two main issues.

The query optimizer is a key point of performance for database systems. It will parse the query and produce an efficient query plan. In most systems, queries are broken into SELECT-FROM-WHERE blocks. Last topic in the paper is transactions. Four components of transactional mangers are lock manager, log manager, buffer pool and access methods.

To sum up, this paper points out the importance of the architecture of database systems, which is rarely covered in the textbook. The typical architecture includes process manager, query processor, transactional storage manager, and shared utilities.



Review 27

This paper is a history of database management systems (DBMS) to make it more well known for the purpose of educating future students. Similar to the first paper we read (“What goes around comes around”) this paper was written because history of DMBSs is not widely known and in order to foster better future development it should be more widely known.

One thing that I really liked about this paper was how they compared certain parts of a DBMS to what a standard OS does. I just took an OS class so these comparisons hit home with me and made it easier to understand what the need was for the DBMS and how it was implemented. This was especially relevant in the “Process Models and Hardware Architectures” section.

I also think the paper did a good job of using diagrams throughout to help demonstrate what it was explaining. This is something that is very helpful to understand what is going on and I don’t believe it is utilized enough in some papers. I learn a lot more looking at a properly made and described diagram than I would reading just plain text. They strayed away from diagrams the further into the paper and thus could have had better use of diagrams but I still believe the use of diagrams was good.

The last main strength of this paper I will mention is that it did a really good job of condensing years of upgrades in various subjects into concise sections. A paper with this much information could have easily gone on for a lot longer but they did a good job of summarize and highlighting the main points and implementations.

One downside I noticed from this paper is it doesn’t touch on any current problems or areas that are currently being developed on. It did a good job talking about history and what the need was and how it was solved but it failed to mention the current needs that are being solved. Certainly there are parts of DBMSs that are still being worked on (likely in every section) and I would have liked each section to end with describing some current problems being worked on for that area.



Review 28

This paper, like the other broadly scoped Stonebraker paper, provides a broad overview of the architectural necessities in facilitating a relational database. These include components of a DBMS that handle operating system-specific methods of concurrency and threading, or even DBMS-level abstractions of handling multiple activities, storage constructs, query optimization, and transaction management. It is important to understand why and how certain implementations for RDBMS behaviors should be in place, and this paper attempts to provide a thorough review of such explanations.

Most of the writing branches into process models, storage models, query parsing, transaction management, and the sub-systems that comprise each of these machinations. Given that many modern processing machines can handle multi-core computing, and many industry-standard servers have relatively low costs for relatively high powered hardware, DBMS softwares are pointed out to be scalable and portable across many different platforms. This may be a side effect of a main point in the other Stonebraker paper, where industry-driven standards (i.e. the demand of modular, scalable systems) drive requirements for technical development. The authors of the paper make a good effort to describe how a DBMS should handle many different potential process and storage architectures. On a related note, it was interesting to me that some DBMS’s were intentionally built for specific operating systems to exploit certain features for efficiency or optimization (rather than inconvenience).

While some parts of the paper were dense and difficult to fully comprehend, there was a thorough explanation of the mechanisms behind how a DBMS can handle the implementation for relational data. One thing I could have liked to see in this paper was a more thorough description of load-balancing procedures across distributed systems, and how DBMS-level infrastructure is constructed to handle such events. Many companies’ data ingestion pipelines take this into critical consideration, as it can save tens of thousands of dollars or more on computing costs (i.e. if they are buying time on AWS servers). This was actually the case at my last internship, where another intern and I were working on a way to create a more cost-effective load-balancing scheme of their db/computing servers.




Review 29

dfjnjdzcnvx


Review 30

This paper basically captures the main architectural aspects of modern database systems of a relational database management system (RDBMS), without the extension features. Initially, the writers were concerned by the fact that there were not many discussions regarding the architecture of DMBS, as in “what makes a DMBS tick” and the development/innovations in database systems. The paper points out two reasons: (1) the size of database systems community is fairly small and (2) architectural issues is not the main focus in database academic treatment. Knowing that, this paper focuses on the system design of a DBMS and the issues that are not typically discussed in textbooks.

In this paper, a typical database system could be divided into 4 pieces: a process manager, query processor, transactional storage manager, and shared utilities. For each component of DBMS component, the paper explains the type of implementation that are usually found/practiced, the pros and cons for each type, and the standard practice that is implemented by commercial RDBMSs.

I think this paper is quite thorough in explaining the DMBS components, especially the pros and cons of an Also, since DBMS is as system, it concerns both logical and physical component so I like that this paper takes both software and hardware perspective. We get to see and understand that performance optimization is not only about the data model or the query structure, but also depends on the architecture of the database system itself. Understanding the correlation between components is a good basis for database tuning.

Another strong point of this paper is that it dedicates one section to discuss the standard practice. Sometimes after reading so many implementation alternatives, it makes us question which one is the best. Is it only one of so many or could it be a hybrid solution? Explaining standard practice helps us see what are being done now and why (if there are limitations in doing the best implementation).

However, I feel that it would be better if we can see how all this component work and interact with each other as one DBMS system (since I only have 42 pages with me, I do not know if the writers discuss this in latter chapters). We also know that in order for a DBMS to function, it needs OS as the foundation. How does this architecture interacts with OS architecture? These things are not explained in the paper.



Review 31

The authors’ goal in this paper is to educate the reader on the main architectural aspects of a DBMS since, as they state, these structural innovations and lessons are not frequently taught in classes or covered in textbooks and are known only by the niche community responsible for the developments and innovations. The authors believe that giving a structural context to the typical algorithmic and theoretical textbook discussions will allow the reader a deeper understanding of database management systems as a whole.
I think that the structure of this paper was very thoughtful. The authors begin by making some strong assumptions to get at the basic models for processes and then they relax those assumptions to show how those models are modified and how workarounds are implemented in the “real world” when the assumptions don’t always hold. As the authors work through these different models they also offer examples of real-world applications for the different systems, for example a Shared-nothing parallel machine is most useful in decision-support systems for data warehouses. The authors work to describe at a high level the different components of system infrastructure and how the components interact rather than delving into very low-level details of a specific component. This allows the reader to glean a more complete sense of the structure of a database without getting bogged down in the details.
One weak point of the paper (as with the paper last week we’re still looking at a survey paper, rather than a research-based one), was the fact that the sections of the paper weren’t as interconnected as they could have been. There was some redundancy in explanation, which I recognize could be good for some readers, but it appeared to be the result of disorganization rather than a reminder to the reader of a previously-explored concept.



Review 32

The purpose of this paper is to describe the main architectural aspects of modern DBMS, with some detail discussion of some advanced implementation topics that are ignored by textbooks. It divides DBMS system architecture into 4 major components: 1.Process Manager, 2. Query manager, 3. Transactional Storage Manager, 4 some Shared Utilities. Each of these is decomposed to a series of sub-modules and is discussed in details. However, the paper does not strictly follows the structure of the components. Instead it decompose the Transaction Storage manager to section 3: storage model and section 5: Transaction to first give more detailed introduction to the storage model (without concurrency control) and then introduce the concurrency control with storage model in a retrospective fashion to connect these two concepts in-depth yet more comprehensible.

A detailed summarization of this paper:
1. Process Manager
The paper first introduces the Process Manager with a simplified OS abstraction to help readers understand the different among the Process per connection model, server process model and sever process + I/O process model. With each model, it discuss its advantages as well as its drawbacks. Then it continue the introduction by extend the simplified version to increasing complicated and realistic models: it first discuss the thread mapping between DBMS and OS in details with the usage of different kind of threads in DBMS; it secondly discuss the changes need to be made for the uni-processer model for difference hardwares, including shared memory, shared nothing and shared disk architectures. At the end of this section, the paper introduces the admission control and then comes to a summary of standard practice in DBMS.

2. Storage models:
The paper started this section by comparing the two options of persistent storage interface (DBMS control it by itself or use support by OS) to use in both space and time. It comes to a conclusion that with some support of the OS (hooks), DBMS can use OS file access in a more efficient way. After that, the paper introduces with implementation details about the buffer management and then ends this section with a standard practice of storage models in DBMS.

3. A Query processor
The paper decompose the query processor into 5 parts: 1) Parsing and authorization 2) query rewrite (a part of the query optimization in my opinion), 3) query optimization, 4) Executor, 5) access method. For each of these parts, it describes typical implementation methods in details, which gives its reader a comprehensive idea of how it works.

4. Transaction
The paper introduce this section by first declare the ACID concepts to help its reader to understand the importance of transaction control. Then it introduces the Lock manager, which used to mange the transactions (external); and then the latches, which used for concurrency control for DBMS data structures (Internal). Then it introduces the Log manager, which used for database recovery. After that it introduce the locking and logging in indexes, with some detailed standard practice for implementations (next-key locking). Then it connects this section with the previous introduced storage models to give its reader an overall picture.


Strengths:
1. This paper gives a comprehensive overview with practical implementation details of the DBMS systems, which helps its readers to have an overall idea of DBMS architecture as well as enough information for the major components.
2. For the Process manager part, the paper start with a simplified version of abstraction to introduce the basic models and then extends the abstraction to more sophisticated ones to meet the reality. This way helps its reader to better understand the materials.
3. Not only does this paper gives detailed information of the introduction, but also it suggests some extensive reading materials for more details for some complicated topics, which make it very concise and helpful.
4. For some of the best practice (e.g. surrogate), it not only introduces them as a DBMS technique, but also defines them as general tricks for system design with a clear recipe of when and how to use them.

Weakness:
From my point of view, I think this is a great introduction for DBMS system design. I didn’t find weakness worth to write.



Review 33

This paper introduces the architecture of modern database systems by presenting the design and innovative techniques in each component. A modern database system has four main pieces: a process manager that encapsulates and schedules the various tasks in the system; a statement-at-a- time query processing engine; a shared transactional storage subsystem that knits together storage, buffer management, concurrency control and recovery; and a set of shared utilities including memory management, disk space management, replication, and various batch utilities used for administration.

The process model are fundamental to DBMS design. There are a few options including process-per-connection model, server process model and enhanced server process + I/O processes model. To achieve data communication across threads, disk I/O buffers and client communication buffers are used. The important OS features like high-performance thread libraries were only available recently. So in the past the DBMS implements their own light-weight thread facility. On multi-CPU systems, there are several models of organizing computation resources and achieving parallelism, including shared memory, shared nothing, shared disk and Non-Uniform Memory Access (NUMA). There is also an admission control component that limits the number of concurrent load to protect the system from thrash.

Another basic consideration when designing DBMS is the choice of the persistent storage interface to use. The spatial control is achieved by either accessing disk directly or storing very large files in os file system. Both of them ensures the locality of data, which is suitable for sequential access. The temporal control is implemented with buffering. The buffer manager organize the buffer pool as an array of frames. Disk blocks are copied and manipulated in native format in memory.
The SQL queries are executed by the query processor. A query is first parsed into internal format and examined for authorization. Once passed the check, it is rewritten into an optimized and semantically equivalent query. Then the DBMS further analyzes this query by consulting the schema and statistical information stored in catalog and comes out with an efficient query plan. At the end the query is executed through an iterator.

To achieve ACID property, the DBMS has a transactional storage manager. It contains lock manager for concurrency control. The lock manager implements serializability via two-phase locking and also provides light-weight, monitor like latches for exclusive access to database internal data structures. The transactional recovery is implemented with logs.