Review for Paper: 7-The Google File System

Review 1

Some corporate software environments have unusual requirements, which make it reasonable to develop infrastructure tools in-house. At Google, production code and research code require access to many terabytes of data with high reliability. This code often appends data to files instead of overwriting old data. Many files on disk are scanned sequentially rather than read randomly. Google chose to develop the Google File System (GFS) as a store that supports this work pattern.

GFS has many distinctive features: it splits data into fixed-size chunks, stores multiple replicas of each chunk, routes data lookups through a single master node, and offers a “record append” operation that adds data atomically at least once, at an offset chosen by GFS rather than by the client. The unusual design of GFS provides high throughput for large sequential reads and writes. Another key design goal is reliability in a distributed store with hundreds of unreliable disks and servers. High throughput is obtained by caching chunk locations in client nodes after the initial requests to the master node, and by pipelining writes linearly across a chain of chunk replicas. GFS ensures reliability by keeping three replicas of each chunk by default and by prioritizing re-replication of a chunk when one of its replicas is lost. HeartBeat messages between the master and chunkservers let the master determine when a chunk is corrupted or lost.
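
The read path summarized above can be made concrete with a minimal sketch. This is my own illustration, not the authors' code: the master stub, `lookup`, and `read_chunk` are hypothetical names standing in for the metadata and data RPCs the paper describes.

```python
# Minimal sketch (not the authors' code) of how a GFS client might resolve a read,
# assuming hypothetical master/chunkserver RPC stubs. The chunk size and the idea
# of caching chunk locations after the first master lookup come from the paper.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks

class Client:
    def __init__(self, master):
        self.master = master          # hypothetical RPC stub to the master
        self.location_cache = {}      # (filename, chunk_index) -> (handle, replicas)

    def read(self, filename, offset, length):
        chunk_index = offset // CHUNK_SIZE
        key = (filename, chunk_index)
        if key not in self.location_cache:
            # One metadata round trip to the master; data never flows through it.
            handle, replicas = self.master.lookup(filename, chunk_index)
            self.location_cache[key] = (handle, replicas)
        handle, replicas = self.location_cache[key]
        # Read the byte range directly from a replica (e.g. the closest one).
        # A full client would also split reads that cross chunk boundaries.
        return replicas[0].read_chunk(handle, offset % CHUNK_SIZE, length)
```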

The GFS design offers many insights and ideas to the broader database and storage research community. Because other large businesses may share some of Google's needs, such as the need for affordable, highly-reliable storage of multiple-terabyte data sets, the GFS design may be useful to other groups besides Google. GFS is remarkable for the simplicity of its design, which uses replication instead of error correcting codes or parity for reliability, and stores all information about the file system and locations of data from each file in a single master server instead of in a distributed manner.

GFS has some limitations, such as the lack of support for POSIX features, high storage overhead, and specialization for record appends instead of small random writes. Small-scale users might be better served by a traditional POSIX file system. For data that is usually appended to rather than overwritten, and where commodity disks are cheap enough to justify 3x replication, GFS is a useful tool for working at large scale.


Query optimizers are an essential part of production database systems, because they make declarative queries and updates run efficiently. Since the foundational work on query optimization behind System-R, much research has explored ways to estimate the cost of alternative query plans accurately, and to enumerate the space of possible plans quickly. More recently, researchers have studied how to optimize queries for distributed databases, or for databases allowing user-defined functions.

The review article “An Overview of Query Optimization in Relational Systems” by Surajit Chaudhuri describes the query optimizer in System-R before introducing the main components of an optimizer and how they can be improved. System-R is worth studying because its query optimizer influenced many later systems. For example, System-R improved query plans by accounting for “interesting orders” produced by query plan steps, such as whether intermediate outputs would be in sorted order. However, System-R took simplifying shortcuts in its optimization, such as considering only linear join arrangements, which allowed dynamic programming to be used to determine the optimal order for a series of joins. These limitations in System-R left the way open for later research into improvements.
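
To make the dynamic-programming point concrete, here is a toy illustration (not from the article): because only linear, left-deep join orders are considered, the best order can be found by dynamic programming over subsets of relations. The cost model below, the `selectivity` callback, and all names are my own simplifications.

```python
from itertools import combinations

def best_left_deep_order(sizes, selectivity):
    """Toy System-R-style DP over left-deep join orders.
    sizes: {relation: cardinality}; selectivity(a, b): estimated join selectivity."""
    rels = list(sizes)
    # subset of relations -> (estimated cardinality, accumulated cost, join order)
    best = {frozenset([r]): (sizes[r], 0.0, [r]) for r in rels}
    for k in range(2, len(rels) + 1):
        for subset in map(frozenset, combinations(rels, k)):
            for r in subset:                       # r is the relation joined last
                rest = subset - {r}
                card, cost, order = best[rest]
                sel = min(selectivity(x, r) for x in rest)
                new_card = card * sizes[r] * sel
                new_cost = cost + new_card         # charge the intermediate result size
                if subset not in best or new_cost < best[subset][1]:
                    best[subset] = (new_card, new_cost, order + [r])
    return best[frozenset(rels)][2]
```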

The article surveys many ways queries can be made more efficient. For example, unlike in System-R, multiple joins can be done in a “bushy” rather than a linear order, to save time at the expense of greater memory usage for intermediate results. In some cases, uncorrelated subqueries or views within queries can be unfolded to produce a single-block query, which is faster to evaluate than the original version. After many ways to rewrite a query are enumerated, a query optimizer can use statistics on the columns being queried to estimate the number of tuples returned by each step of each query plan, and so estimate the cost of the plan. The plan with the lowest estimated cost, of course, is recommended.

Unfortunately, the article ignores some important aspects of query optimization, such as automatic index creation, and stored query plans. In production environments, frequently-used queries may employ stored query plans, to obtain an efficient plan without repeatedly paying for optimization. In such an environment, a query optimizer can use copious offline planning time to obtain a cheaper plan. The article discusses only the online case, where the optimizer must restrict its search to a small space to save planning time.


Review 2

The paper describes the Google File System (GFS), a scalable distributed file system developed to serve data-intensive applications. Google's data processing needs required a file system addressing the usual goals such as performance, scalability, reliability, and availability, but with different design philosophies. The assumptions behind this file system are:

• Inexpensive commodity hardware.
• Component failure is the norm rather than the exception.
• Files stored by the system are usually large (multi-GB files are the common case).
• Workloads consist of large streaming reads and small random reads.
• Large sequential writes are made which append data to files.
• Multiple clients must be able to read and write in parallel.
• High sustained bandwidth is more important than low latency.

Keeping the above in mind, an architecture was devised that consists of a single master and multiple chunkservers (servers that store file data in units called chunks), accessed by multiple clients. The master is responsible for metadata storage, management of the file namespace, chunk creation, re-replication and rebalancing, as well as periodic communication with chunkservers.
The paper sheds light on how reads and writes proceed between a client and a GFS cluster. Writes are usually done by appending data to existing files instead of overwriting them. The system replicates chunks, which increases write overhead but makes the system fault tolerant. It also maintains data integrity by using checksums.
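
The integrity mechanism can be sketched briefly. This is my own illustration in the spirit of the paper (which keeps a 32-bit checksum per 64 KB block of each chunk); the helper names and use of CRC32 are assumptions.

```python
import zlib

BLOCK = 64 * 1024  # checksum granularity: 64 KB blocks within a chunk

def checksum_blocks(data: bytes):
    """Compute one 32-bit checksum per 64 KB block of chunk data."""
    return [zlib.crc32(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]

def verify(data: bytes, stored_checksums) -> bool:
    # A mismatch means a block is corrupt; the chunkserver would report the error
    # and the reader would fetch that block from another replica instead.
    return all(zlib.crc32(data[i:i + BLOCK]) == c
               for i, c in zip(range(0, len(data), BLOCK), stored_checksums))
```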

The authors have done performance tests, first using micro-benchmarks to measure read, write, record-append, storage, and metadata performance under controlled conditions, and then using real-world clusters to compare the results: one cluster used for research and development by over a hundred engineers, and another dedicated to production data processing.

The paper highlights a file system that deviates from the usual norms and can support large-scale processing workloads on commodity hardware. It is fault tolerant and optimized for huge files, and consequently meets Google's storage needs. However, it is only viable in a specific environment and has limited security. It also cannot be adopted by a large variety of applications, since it offers only a relaxed consistency model.



Review 3

This paper describes the Google File System (GFS), a scalable distributed file system designed to meet Google's internal data-processing demands. GFS provides fault tolerance while running on commodity hardware and delivers high aggregate performance to a large number of clients.
During the development of GFS, several key observations made the design depart from previous file system design assumptions. First, component failure is the norm, not the exception: commodity hardware is inexpensive and the system's scale is large, which virtually guarantees failures in software, hardware, or operations. Therefore constant monitoring, error detection, and automatic recovery must be built in.
Second, files are huge compared to previous standards, which forces I/O operations and block sizes to be redesigned. Third, there are clear read and write patterns: most files are mutated by appending rather than overwriting, and once written, files are mostly read, often only sequentially. Given those access patterns, appending is the focus of this implementation. Finally, GFS is designed for specific applications and relaxes the consistency model, which increases design flexibility.

A GFS cluster consists of a single master and multiple chunk (block) servers. The master stores the metadata used to locate the chunks covering a given range of a file. Unlike other file systems, GFS does not keep a per-directory data structure, but logically represents its namespace as a lookup table mapping full pathnames to metadata. With prefix compression, this table can be represented efficiently in memory.
To avoid the single master becoming a bottleneck, GFS minimizes communication between clients and the master by reading and writing file data through the chunkservers. Moreover, the lease mechanism is designed to reduce the master's overhead. The chunk size was chosen to be 64MB (which was large in 2003). A large chunk size reduces interaction between clients and the master, reduces network overhead, and allows all the metadata to be stored in the master's memory. On the other hand, some chunkservers may become hot spots if many clients access the same small file; this is mitigated by storing such chunks with more replicas.
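
A back-of-the-envelope calculation (my own numbers, not from the paper) shows why the large chunk size keeps the master out of the hot path: a sequential scan needs only one location lookup per chunk.

```python
# Master lookups needed for a 10 GiB sequential scan, for two chunk sizes.
GIB = 1024 ** 3
file_size = 10 * GIB
for chunk_size_mb in (1, 64):
    chunk_size = chunk_size_mb * 1024 * 1024
    lookups = -(-file_size // chunk_size)      # ceiling division
    print(f"{chunk_size_mb:>3} MB chunks -> {lookups} master lookups")
# 1 MB chunks -> 10240 lookups; 64 MB chunks -> 160 lookups.
```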

The performance section tests micro-benchmarks against the kinds of use cases the authors expect and describes typical workloads. On the other hand, the measurement section does not show to what extent inconsistencies actually occur. Still, it is an important paper that gives us a glimpse of how a “real world application” is targeted, and it promotes research in this direction.



Review 4

This paper describes a new distributed file system called the Google File System (GFS). It was designed to support large-scale data processing workloads on inexpensive commodity hardware, especially for research and development as well as production data processing within Google. GFS provides high aggregate performance to a large number of clients with strong fault tolerance. The largest cluster provides hundreds of TBs of storage across thousands of disks on over a thousand machines, and hundreds of clients access the cluster concurrently.
In this paper, the authors discuss the file system design, system interactions, master operations, fault tolerance and diagnosis, and measurements from both micro-benchmarks and real-world usage.

The problem here is how to design a good distributed file system to meet the rapidly growing demands of Google’s data processing needs, in terms of performance, scalability, reliability, and availability. To achieve this goal, many innovations need to be developed on top of traditional distributed file systems. There are several points that differ from traditional approaches, summarized below:

component failures are the norm, instead of exception
files are huge by traditional standards
most files are mutated by appending new data, instead of overwriting existing data
increase flexibility through co-designing the applications and the file system API

The major contribution of this paper is that it provides a good description of GFS, as well as a detailed discussion of the design ideas and concerns behind it. There are many good innovative ideas in the GFS design. For example, to achieve high aggregate throughput for multiple concurrent readers and writers, the designers separate file system control from data transfer so that data transfer happens only between chunkservers and clients. Another example is how to turn real workload observations into good design decisions. The designers realized that the workloads primarily consist of large streaming reads, small random reads, and large sequential writes, so they introduced an atomic record-append operation. This improves performance considerably because multiple clients can append concurrently to a file without synchronizing with each other.

Another design idea I found very interesting is how they make use of a large number of cheap disks that may have a high failure rate rather than expensive disks. Disk failure is acceptable as long as the data can be recovered without affecting system performance and the replacement is cheap.



Review 5

This paper describes the Google File System, a fault-tolerant, distributed file system specially designed for Google's workload. GFS therefore rests on assumptions specific to Google's workload, such as:
-Writes are almost always appends to a file; overwriting existing data is rare.
-Almost all files are very big (multi-GB files).
-The goal is throughput, not latency.

Main contributions:
-Files are divided into 64MB chunks, and chunks are stored across a cluster. A cluster has a master node and multiple chunkservers.
-All metadata operations are handled by the master server, and all metadata is stored in memory.
-Every chunk has its own unique 64-bit id and a version number. Chunks are stored as files in the underlying OS file system (see the sketch after this list).
-Data and control flow are separated to maximize throughput.
-Lazy garbage collection, that is, deleted files are not reclaimed immediately.
-Peer-to-peer transfer, that is, the master is not involved when data is being transferred between two nodes.
-Many operations are batched and executed together instead of being executed synchronously, which helps increase disk bandwidth.
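
The per-chunk bookkeeping mentioned above can be sketched as a small record; the structure and field names here are my own illustration of the 64-bit handle, version number, and local-file storage described in the paper.

```python
from dataclasses import dataclass
import os

@dataclass
class ChunkReplica:
    handle: int        # globally unique 64-bit chunk id assigned by the master
    version: int       # bumped when a new lease is granted; used to spot stale replicas
    directory: str     # where the underlying OS file lives on the chunkserver

    def path(self) -> str:
        # Each chunk is kept as an ordinary file in the local file system.
        return os.path.join(self.directory, f"{self.handle:016x}.chunk")

    def is_stale(self, master_version: int) -> bool:
        return self.version < master_version
```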


Flaws:
-Although I think they did a really nice job for the goal they had, their assumptions do not hold in general environments, which makes GFS hard to use as a general-purpose FS, in contrast to HDFS.
-The master can be a bottleneck; they use many nice techniques, such as peer-to-peer transfer, to keep the master from becoming one.
-Lazy reclamation of deleted files' space may delay that space becoming available for new files.


Review 6

This paper is motivated by the fact that Google had to deal with data at a scale that traditional file systems could not sustain. Another issue was that Google's workload involves very large files, is mostly append-only, and should be optimized for large reads and writes. Google also decided to use large numbers of commodity machines, so failure becomes the norm, as opposed to the exception. They are less concerned about consistency, and so can relax some constraints. Google is also more concerned with high bandwidth, and less concerned with low latency. These very different requirements led Google to create a new architecture: a distributed system with one master and many "chunkservers". Files are broken up into fixed-size chunks and then distributed across multiple machines. Replication is built in; by default, each chunk is replicated three times. Applications requesting to read or write a file issue a request to the master, and the master responds with the chunk handle and chunk locations. The application can then use these to connect to the correct chunkservers and get the data it is looking for. The master keeps track of where all the chunks are by using "heartbeat" communication.

The strength of this paper is that they were able to design a very specialized system, while keeping the system design as simple as possible. They kept the system design simple by using a single master, but made sure it had to do as little work as possible. They didn't complicate the system by adding in features that they didn't need - for example, they didn't need to add in POSIX compliance, because their applications would be custom made specifically to work with their file system.

It seems that the master is a serious bottleneck for this system. Regardless of how fast that machine can process requests, each metadata request does have to go through it, so it is a limiting factor. The fact that the master also has to continuously send out heartbeats and deal with the responses means the master has to do more work as more chunkservers are added. While the authors claim that the master is not a bottleneck, it would seem to become one as the number of machines is scaled up.


Review 7

This paper is written by three Google engineers presenting the motivation behind and architecture of the Google File System. The system was proposed because traditional file systems could not support Google's large distributed environment well. Since the cluster has a large quantity of components of limited quality, internal failures such as software exceptions and external failures such as power loss, machine damage, and even catastrophes happen all the time; component failures are the norm rather than the exception. The files to be stored are too huge by traditional standards to handle quickly, and the dominant file operation is appending rather than overwriting. Also, a self-designed file system is more flexible for Google. Thus GFS is designed as a scalable distributed file system for large data-intensive applications.

The Google File System is required to provide high fault tolerance, handle large storage, read, and write operations, support concurrent appends, and sustain high bandwidth. To accomplish these goals, the architecture is very different from traditional ones. It has one master, multiple clients, and multiple chunkservers. A file is divided into chunks, a storage unit similar to a block but much larger. The master maintains the metadata and controls system-wide activities, while chunkservers handle the chunks. GFS uses leases to ensure a consistent mutation order, pipelined data flow for fast data transfer, replication and garbage collection to improve fault tolerance, and chunk version numbers to detect stale replicas. The system also continually tries to balance storage across chunkservers to improve performance.

One drawback of the Google File System is that it was originally designed for back-end systems; as GFS came into wider use, more infrastructure had to be built for its users. Another problem is that GFS is built on the Linux operating system, which at the time had weaknesses related to disk drivers and reader-writer locks.


Review 8

This paper goes over the details of the Google File System: the architecture and organization of the system, its operations and management, key design decisions, and how it has performed so far. The paper does a good job of explaining why certain choices were made, what the possible bottlenecks of the system are, and how those problems were resolved. The evaluation provides good insight into how well the system performs, but it lacks any quantitative comparison with other existing file systems.

The paper provides interesting insights about the conditions of the data-server workloads. It finds that 1) component failures are the norm and high fault tolerance is important; 2) files are huge, on the order of multiple GB; 3) random writes are rare, and reads are mostly sequential, which was something I did not quite expect.

The Google File System has a single master and multiple chunkservers that hold several replicas of each chunk. The client asks the master for temporary access permission and the list of chunkservers that hold the data the client wants. The existence of a single master makes the system simple to implement and organize. Because each chunk has multiple replicas and clients have only temporary, cached access to the chunkservers, read/write operations can happen in parallel with high performance.

I was concerned that the single master would be a large performance bottleneck and a huge liability for faults and failures, but it seems that the volume of requests the master has to handle is rarely a bottleneck, and master replication mitigates most of the fault-handling problems. However, I feel that scalability might become a problem as the data gets larger and larger.




Review 9

This paper proposes a new kind of file system used at Google. It illustrates the workload assumptions the authors made and the system design dedicated to them. The system is evaluated on working clusters within Google and shows seemingly good overall performance.

The Google File System (GFS) has a master server and many chunkservers. Files are divided into big chunks and stored on chunkservers with replication. The master server stores only metadata, and all data transfer happens directly between chunkservers and clients, which minimizes the workload of the master. Data flows between chunkservers along a chain chosen so that it passes through all replicas of a chunk with minimal network distance.

The consistency model is relaxed for efficiency, that is, to minimize communication between the master and clients. There is no distributed lock mechanism; instead, the master grants a “lease” to a primary replica, and the primary replica assigns the order of mutations for that chunk.
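
The role of the primary can be illustrated with a minimal sketch; the class and method names below are mine, and the secondary stubs are hypothetical, but the idea follows the paragraph above: the lease holder picks one serial order that every replica applies.

```python
class PrimaryReplica:
    """Sketch of a lease-holding primary that serializes mutations to one chunk."""
    def __init__(self, secondaries):
        self.secondaries = secondaries   # hypothetical stubs for the other replicas
        self.next_serial = 0

    def apply_mutation(self, chunk_handle, mutation):
        serial = self.next_serial        # the primary alone picks the order
        self.next_serial += 1
        self.write_local(chunk_handle, serial, mutation)
        # Secondaries apply mutations in exactly the serial order chosen here,
        # so all replicas of the chunk converge to the same contents.
        for secondary in self.secondaries:
            secondary.write(chunk_handle, serial, mutation)
        return serial

    def write_local(self, chunk_handle, serial, mutation):
        pass  # placeholder for the actual disk write
```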

The failure model is simple. GFS does not distinguish between normal termination and a system failure. Upon restart, each chunkserver checks its own state by checksumming and through the routine handshake with the master. If its data is stale or corrupted, it can request a copy from another replica. In addition, the master is replicated for availability: the operation log and checkpoints are replicated on multiple machines. On failure, shadow masters serve during recovery, but they handle read-only requests only.

The design approach of GFS is really a bright point. The assumptions are made based on observation of its application workload, and then a dedicated file system is designed around them. In reality there are few versatile solutions; learning how to balance trade-offs and maximize performance is an important lesson from this paper.

This paper introduced a brand-new file system at the time and brings a new way of thinking about how to design a good system. However, the paper verifies performance only on Google's working clusters. It would be more convincing if more generic workloads were tested.


Review 10

Problem: This paper describes how traditional assumptions for designing a file system did not apply in the environment that Google was working in. Google created the Google File System (GFS), which they optimized to be able to serve their client use cases, and to be able to take advantage of relaxed design constraints.

Contributions: This paper describes a distributed file system designed to serve high volumes of data at high throughput to many clients. Data is stored in fixed-size chunks, which are distributed across many servers called chunkservers. The metadata of which chunks make up which files, and which chunk is stored on which server, is stored on a central master server. Files in GFS are typically very large (100MB or more), which means that there is very little metadata compared to data. This allows all metadata on the master server to be stored in memory.

The master server does not serve file data; instead it points clients to chunkservers that serve the data. Each chunk is replicated on several chunkservers, which allows higher throughput when serving the same chunk to multiple clients. This replication also allows GFS to tolerate hardware failures, which it assumes are normal. When a chunkserver is lost, the master directs the remaining chunkservers to re-replicate the lost data from the copies still held on other servers.

GFS optimizes file appends rather than random writes, and applications that use GFS are structured accordingly. Optimizing appends allows higher write performance, as it lets GFS service multiple append operations concurrently instead of having different clients hold mutual exclusion over shared parts of a file. GFS does this by guaranteeing that it will append the client's data to the file while giving no guarantee of the offset; this allows it to interleave concurrent requests.
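
The append contract described above can be sketched from the client's side. The `try_append` method is a hypothetical stand-in for the real API; the point is that the server picks the offset and retries can duplicate a record, so the guarantee is "at least once".

```python
def record_append(gfs_file, data, max_retries=3):
    """Sketch of at-least-once record append: the server chooses the offset."""
    for _ in range(max_retries):
        ok, offset = gfs_file.try_append(data)   # hypothetical call; returns chosen offset
        if ok:
            return offset                        # caller may remember where its record landed
    raise IOError("append failed after retries")

# Because retries may duplicate records, readers are expected to tolerate
# duplicates, e.g. by tagging records with unique ids and filtering them on read.
```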

Weaknesses: Overall I thought this paper was very strong, especially in its explanation of the assumptions that led to design decisions. However, I would have liked to hear more about why Google’s use case would result in bad performance on other distributed file systems, which was probably what prompted them to create their own.



Review 11

The paper explains the fundamentals of the Google File System (GFS), which is the file system used internally at Google (its next-generation successor, called ‘Colossus’, is built upon GFS). GFS has inspired other distributed file systems, most notably the Hadoop Distributed File System (HDFS). From the paper, readers can get a glimpse of the essence of what makes a state-of-the-art distributed file system. The current infrastructure of GFS after it became ‘Colossus’ is not known, since it is Google's proprietary project, but I believe the data structures and algorithms described in the paper are still relevant today.

The paper points out four key observations from their application workload and technological environment:
1) Component failures are the norm rather than the exception.
2) Files are huge by traditional standards.
3) Most files are mutated by appending new data rather than overwriting existing data.
4) Co-designing the applications and the file system API benefits the overall system.
These principles actually highlight the characteristics of ‘big data’ and its applications.

A GFS cluster consists of a single master and multiple chunkservers, which can be accessed by multiple clients. It is not difficult to see that HDFS has been influenced by this architecture: an HDFS cluster has a very similar structure, consisting of a single namenode and multiple datanodes. The single master stores metadata about every file that a GFS cluster stores and is responsible for managing the chunkservers. A file is divided into multiple chunks, and these chunks are then replicated on multiple chunkservers. Note that this is somewhat reminiscent of the block-interleaved distributed-parity disk array in the RAID paper; necessary principles from one area can be transferred to another.

In conclusion, GFS has inspired many other distributed systems, and the fundamentals of its design have remained the same. Readers should relate this topic to distributed database systems, which have been a popular topic for some time and are likely to implement their own file system architecture or rely on existing file systems. Knowing the design principles of distributed file systems is very important in building efficient distributed database systems, as key observation #4 in the paper points out.



Review 12

This paper introduces the Google File System (GFS) as an alternative distributed file system specifically tuned to support large-scale data processing workloads. The file system was motivated by four basic observations:

1) Modern file systems are composed of lots of cheap components and are used heavily. Due to the number of components in use as well as their low quality, some are expected to fail and possibly never recover.

2) Files stored on systems today are much larger than in the past when file system design first started.

3) Most files are changed by appending new data to the file rather than overwriting existing data. Furthermore, once files are written, they usually are only read.

4) Google also designs the applications which use the file system, so they can increase the flexibility of their overall system.

The paper reevaluates file system design based on these four observations and either changes the architecture of the system or better tunes certain parameters such as block size.

The GFS architecture consists of a single master, some backup (shadow) masters, and multiple chunkservers. The master is involved in namespace management, chunk placement, chunk creation, chunk replication, chunk rebalancing, and garbage collection. However, client data does not flow through it. To perform an operation, a client queries the master for the chunkserver that currently holds the chunk, as well as the locations of the replicas. Once the client has this information, it performs all I/O operations directly against the chunkservers. This design prevents the master from becoming a bottleneck.

One of the important design decisions was to increase the block size of a chunk to 64 MB. In doing so, Google reduced the number of interactions between the client and the master. Furthermore, it reduced the network overhead since clients are more likely to perform multiple operations on large chunks. It also reduced the amount of metadata the master needed to store.
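
A quick arithmetic sketch (my own calculation) shows why the reduced metadata fits in memory: the paper reports roughly 64 bytes of metadata per 64 MB chunk, so even a petabyte of file data needs only about a gigabyte of chunk metadata.

```python
chunk_size = 64 * 1024 * 1024            # 64 MiB per chunk
bytes_per_chunk_metadata = 64            # order of magnitude reported in the paper
data = 1 * 1024 ** 5                     # 1 PiB of stored file data
chunks = data // chunk_size              # 16,777,216 chunks
metadata = chunks * bytes_per_chunk_metadata
print(f"{chunks} chunks -> {metadata / 1024 ** 2:.0f} MiB of chunk metadata")  # ~1024 MiB
```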

I feel that an important takeaway from the paper is the fact that traditional system design is not able to effectively support the needs of the modern world. An example is the fact that hardware failure has become the norm in distributed file systems, but traditional system design expected otherwise. To get around this, Google created a new system specific to the workloads their applications were designed to run. Other companies should ask themselves whether the systems they are using are outdated and need to be redone similar to Google. Although the GFS was designed to work specifically with Google workloads, it introduces new file system design ideas such as larger block sizes and error detection that may better support modern systems.


Review 13

This paper gives an overview of the design of Google File System. The GFS is designed to support large distributed applications. The system is designed based on observations of their workloads and tailored to meet the users' needs.

They make several observations and assumptions about their workload, such as that files are usually "append-once-read-many". In terms of writes, applications mutate files mostly by appending rather than overwriting; as a result, small files and small random reads/writes are supported but are not the main focus of optimization.

The GFS architecture is composed of a single GFS master and many GFS chunkservers. Each chunk of data and its replicas are stored on chunkservers. The master communicates with chunkservers through handshakes and heartbeat messages. To minimize the master's involvement in operations so that it does not become a bottleneck, the system is designed such that:
1. Clients get metadata from the master and ask for chunk data from chunkservers.
2. Clients do not cache file data; however, they do cache metadata from the master for future mutations. The master can also send information for the chunks immediately following those the client requested.
3. The size of chunks is set to be very large to reduce traffic between clients and the master.

As for the chunkservers, there are also several ways to improve efficiency, such as using a higher replication factor for hot chunks, and decoupling the data flow from the control flow to exploit the network topology. A copy-on-write scheme is used for making snapshots of files/directories, deferring the actual copy of a chunk until a client wants to write to it (reference count > 1).
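
The copy-on-write idea can be shown with a simplified sketch; the structure and names are mine, but the behavior follows the description above: a snapshot only bumps a reference count, and the chunk is actually copied only when a later write finds the count above one.

```python
class ChunkStore:
    """Sketch of snapshot via copy-on-write reference counts."""
    def __init__(self):
        self.refcount = {}               # chunk handle -> reference count

    def snapshot(self, handles):
        for h in handles:                # snapshots are cheap: no data is copied here
            self.refcount[h] = self.refcount.get(h, 1) + 1

    def prepare_write(self, handle, copy_chunk):
        # copy_chunk is a caller-supplied function that clones the chunk's bytes
        # and returns a fresh handle; the deferred copy happens only now.
        if self.refcount.get(handle, 1) > 1:
            new_handle = copy_chunk(handle)
            self.refcount[handle] -= 1
            self.refcount[new_handle] = 1
            return new_handle
        return handle
```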

Compared to other file systems, one difference of the Google File System is that it does not have "inode-like" data structures. The results are as follows (see the sketch after this list):
1. When modifying a file under a directory, it only requires read locks on the directory path and a write lock on the mutated file. Thus concurrent mutations in the same directory are allowed.
2. If users attempt to create two files with the same name under a directory, the requests are serialized because the write lock on the file name must be acquired first.
3. Locks are acquired in a consistent global order over pathnames to prevent deadlock.
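
The locking scheme in the list above can be sketched as follows. The helper names are my own; the code only computes which pathnames are locked in which mode and in what order, which is the essence of the scheme.

```python
def ancestors(path):
    parts = path.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]

def lock_plan(path, mutate=True):
    """Return (pathname, mode) pairs in the order they should be acquired:
    read locks on every ancestor, a write lock on the leaf, ordered by depth
    and then lexicographically so concurrent operations cannot deadlock."""
    plan = [(p, "read") for p in ancestors(path)]
    plan.append((path, "write" if mutate else "read"))
    return sorted(plan, key=lambda item: (item[0].count("/"), item[0]))

print(lock_plan("/home/user/a"))
# [('/home', 'read'), ('/home/user', 'read'), ('/home/user/a', 'write')]
# Creating /home/user/a and /home/user/b concurrently only read-locks the shared
# directories, so both can proceed; two creations of the same name serialize on
# the write lock for that exact pathname.
```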

The main contribution of this paper:
1. It provides detailed discussion about the architecture of Google File System and how they design it to provide efficient reads and writes specifically for their workloads.
2. They also provide some parameters currently exploited in the Google File System, and show us the statistics of some of their real world clusters.

There are a few things that are not clear to me in this paper:
1. In their measurements of write requests (in MB/s), write performance does not drop (it stays around 50% of the network limit) as the number of clients grows. They mention that their pipelined data-propagation scheme does not interact well with the network stack, so the write rate is only about 50% of the limit, but why does the write rate not get worse, given that collisions become more likely with a large number of clients?
2. In their measurements of record appends, they show the performance drop when N clients append simultaneously to a single file, but claim this is not a significant issue because in practice N clients append to M shared files. However, they do not provide experimental results for that scenario to show that chunkserver network congestion is really not a significant problem.



Review 14

The purpose of this paper is to detail the design of the Google File System, which is “a scalable, distributed file system for large distributed data-intensive applications.” GFS manages to provide fault tolerance on top of inexpensive hardware. The motivation for GFS is the need for a file system that takes into account Google's application workloads and technological environment.

The main assumptions of GFS are:
1. Component failures are the norm rather than the exception (so constant monitoring, error detection, fault tolerance, and automatic recovery are vital characteristics of the system)
2. Files are expected to be very large (leading to radical changes in I/O operation, block sizes, and other design parameters)
3. Primary workloads are large streaming reads (hundreds of KBs to 1MB or more per operation) and small random reads (a few KBs). Applications built with performance in mind batch and sort their small reads to advance steadily through the file.
4. Files will be modified by appending new data rather than overwriting existing data
5. Flexibility of the system will be motivated by designing with both the applications and the file system API in mind (GFS’s consistency model is looser in order to simplify the file system without imposing many requirements on applications)
6. Efficiently implement well-defined semantics for multiple clients that concurrently append to the same file.
7. Processing large amounts of data at a high rate (high sustained bandwidth) valued over low latency

GFS’s interface organizes files hierarchically in directories, accessed through path names, with create, delete, open, close, read, and write functionality. Unlike traditional file systems, GFS adds snapshot (a low-cost copy of a file or directory tree) and record append (which guarantees atomicity while allowing multiple clients to append data to the same file concurrently) operations.

Architecturally, GFS has a single master, which simplifies the design by letting the master use global knowledge to make replication decisions and clever chunk placements, and multiple chunkservers, accessed by multiple clients. The master minimizes its involvement in reads and writes by pointing clients at chunkservers; the chunkservers store chunks (fixed-size portions of a file) on local disks and read or write chunk data specified by a chunk handle and byte range. Every chunk is replicated on multiple servers for reliability. The master maintains all file system metadata (the namespace and the mapping from files to chunks, which are persisted in an operation log to protect against inconsistencies should the master crash, plus the current locations of chunks, which the master learns by asking each chunkserver about its chunks) and controls system-wide activities like chunk lease management, garbage collection, and chunk migration. Clients communicate with the master only for metadata operations and send all data-bearing communication to the chunkservers. The master can return information for multiple chunks per request so that client-master interactions are minimized. GFS is also set apart in that neither the clients nor the chunkservers cache file data (chunks are stored as local files, already in Linux's buffer cache), which removes cache-coherence issues and simplifies the clients and the system overall. Many GFS clusters are deployed for different purposes.

The chunk size is large, 64MB, and lazy space allocation is used to avoid internal fragmentation. The advantages are reduced client-master interaction, reduced network overhead since a TCP connection can be kept open while a client performs many operations on a given chunk, and a reduced amount of metadata on the master, allowing the metadata to be kept in memory. The disadvantage is hot spots: small files occupy only one chunk, so the chunkservers holding them may become overloaded with requests. Google addressed this by storing such files at a higher replication factor and by making the batch-queue system stagger application start times. A longer-term resolution may entail allowing clients to read data from other clients.

I found this paper very interesting and enjoyable to read because it teaches about what sets the powerful and efficient Google file system apart from traditional systems.



Review 15

This paper is about the Google File System, a scalable, distributed file system for large distributed data-intensive applications. It is fault tolerant and runs on large amounts of commodity hardware. The paper examines a system that had been developed and deployed at Google prior to its publication in 2003, and explains which assumptions of previous systems were re-examined in the decisions behind the redesign of their file system.

I think one of the most important parts of this paper is its treatment of the assumptions behind previous file systems. This is a very important part of research: the assumptions a research project makes define its limitations, and when those assumptions are no longer true, contributions that further that research can often be made. This paper gives a detailed description of its system configuration, the assumptions made for their applications, and an evaluation of the system. Two of the main contributions are expecting hardware to fail and providing fault-tolerant, efficient mechanisms to deal with this, and having a centralized single master to control other operations.

The assumptions the authors examine are that (1) a system built from commodity components will fail often and needs to be constantly monitored, (2) the system should be optimized for large files rather than small files, as large files are more common, (3) workloads consist of large streaming reads and small random reads, (4) workloads are optimized for appending to files rather than random writes, (5) efficient synchronization for concurrent appends is important for supporting many users, and (6) high sustained bandwidth is more important than low latency.

Although I do like the evaluation in the paper, I feel the results are presented in a way that is not easy to compare to other systems. I realize that they need many servers to set up the experiment, but it may be hard for someone to get 35 machines to perform a similar comparative experiment. Also, it would have been nice to see an experiment comparable with traditional mechanisms to get a better sense of the magnitude of improvement, specifically for something like the result in Figure 3.


Review 16

Part 1: Overview

This paper proposes the Google File System (GFS) in 2003. At that time this was a great breakthrough in distributed systems. GFS is designed to cater to Google's needs at a time when its applications demanded a high rate of bulk operations but did not care about the latency of any single read or write. Real-world experience showed that individual nodes in a distributed system fail often, files are often large, and operations consist of large streaming reads, small random reads, and large sequential writes. Availability to a large number of clients at the same time, as well as the bandwidth provided to applications, were also crucial to the GFS design.

A master-slave architecture is used in GFS. The master node takes care of chunk metadata and acts as a redirector for client transactions, using in-memory data structures for fast responses. Consistency is also handled by the master node; however, GFS only guarantees relaxed consistency and needs some cooperation from user applications. Chunkservers, acting as the slaves, handle the actual data flows. The master node handles locking, replica placement, creation, re-replication, rebalancing, garbage collection, and stale-replica detection. System interactions are designed to minimize the workload of the master node. Atomic record append and copy-on-write snapshot operations are provided as part of the interface.

Part 2: Contributions

Realistic assumptions drawn from real industry experience: GFS assumes that individual components of a distributed system fail often, so detection and recovery must be done on a routine basis. This is really practical and valuable.

The lease mechanism shares the master node's heavy workload with the chunkservers while preserving a consistent mutation order. Atomic record append is provided as a basic operation for GFS users.

Measurements from real-world clusters are highly valuable. For example, master load could vary from 200 to 500 operations per second, and append operations dominate overwrites.

Part 3: Possible ways to improve

High availability is one of the main focuses of GFS. The master and chunkservers use logs for fast recovery, but the details are hidden beneath the surface. We all know that logs can be used for recovery from crashes; however, how is GFS different, and how fast can GFS recover from a crash?

GFS was born to meet Google's needs; its assumptions, including a high rate of bulk operations, may not be valid for smaller systems that care about latency more than availability. This may be a limitation of the paper.



Review 17

The paper discusses different aspects of the design and implementation of the Google File System (GFS). GFS consists of a single master and a large number of chunkservers and is accessed by multiple clients. Files are divided into fixed-size chunks and stored on the chunkservers. Each chunk is replicated across several chunkservers, three by default. The master maintains the file metadata, including the mapping from file name and chunk index (determined from the fixed chunk size) to the current locations of the chunkservers holding that chunk. A GFS client communicates with the master to identify the chunkservers that hold a particular chunk of the file and then communicates directly with those chunkservers to read or write data. The master is involved only at the beginning of the communication; once the client has contacted the chunkservers, the master is no longer involved, which keeps it from becoming a bottleneck. In addition, the chunk size is much larger than in a typical file system (64MB); a larger chunk size minimizes a client's interactions with the master.

Another important design aspect is how file consistency is maintained. GFS implements consistent mutation operations (e.g., writes and record appends) by using leases. The master grants a chunk lease to one of the replicas, which becomes the primary. The primary chunkserver is then responsible for choosing a serial order for mutations, which the other replicas follow. During the lease period, no other replica is granted primary status for that chunk. The mutation is acknowledged to the client once all replicas have applied it. This provides the ordering required for mutations. In addition, since relatively frequent component failures are expected on low-cost commodity servers, good fault tolerance is required. For instance, chunk replication gives GFS high availability, and checksumming by the chunkservers allows them to detect data corruption.

The strength of the paper is that the proposed solutions are based on Google's realistic workload demands. The paper discusses key characteristics of applications and infrastructure at Google, which include frequent component failures, huge file sizes, and mutation by appending new data rather than overwriting. This gives the decisions reached on the various design parameters a clear rationale.

Various design decisions discussed in the paper may have their own drawbacks. For example, having a single master might create a bottleneck in the system and also a single point of failure. While it is not a bottleneck at the cluster sizes considered during the design, as cluster size grows the master may become a bottleneck and limit the scalability of GFS. Furthermore, the fact that GFS is implemented in user space might lead to security vulnerabilities; most operating systems, including Linux, are highly optimized for protecting applications from security attacks. In addition, the claim that a simple disk system is cheaper than RAID is not apparent, especially when considering the reliability provided by RAID disk arrays. It would have been useful if the authors had provided some analysis of the cost-reliability value of RAID compared with a simple disk system. A simple commodity disk system has no guarantee of reliability, and as a result fails more frequently, requiring additional commodity servers in the GFS system.


Review 18

In this paper, the key features of the Google File System are discussed with fairly detailed explanations and satisfying results. The design is driven by Google's application workloads and technological environment. The system is a scalable distributed file system in which failures of cheap hardware are expected and properly handled.

The system is composed of one master and many chunkservers. The master is only responsible for overall data-placement management and request redirection; communication with the master is important but lightweight. Special design decisions are made for chunk size, metadata content and location, and other trade-offs, such as not supporting aliases in exchange for a less complex locking mechanism. As an overall result, one single master is able to manage a great amount of data (even with multiple replicas, at least 3 per chunk) with its metadata held in memory.

As noted, this paper was written in 2003, but it still serves as a great guideline for large-scale distributed systems. I like this paper very much. Multiple times, we see that simplifying the system by removing some features and making assumptions about the applications is wise and sometimes makes the system much more efficient. Observing which customers you are going to serve and building something better for them, rather than for everyone, is the key inspiration of this paper beyond the many technical ones.


Review 19

This paper describes the architecture of the Google File System (GFS).
GFS is designed with a few different assumptions in mind:
1. Component failures are the norm.
2. Files are huge.
3. Large sequential reads and append writes make up the majority of read/write operations.
4. Concurrent appends must be supported efficiently.
This paper describes a file system that takes advantage of all these assumptions. The result is a file system cluster containing multiple machines. The cluster divides files into large chunks, typically 64MB, and uses a master-slave (master-chunkserver) model to manage them.
Each chunk is identified by a 64-bit handle and stored on chunkservers' disks. For reliability, each chunk is replicated and stored on several different chunkservers.
The master machine contains all the file system metadata: namespace, access-control information, file-chunks mapping and chunk locations.
To read a file, a client sends a request to the master, and the master sends back the locations of all the chunks of that file. The client then requests each chunk of the file from the nearest chunkserver that has it.
To mutate a file, a client also sends a request to the master. The master grants a lease to a chosen replica (the primary). The client then distributes the new data to all chunkservers holding the chunk, and the primary determines the order in which all clients' writes to that chunk are applied.
GFS also uses many techniques to improve reliability and performance. It ensures that a certain number of copies of each chunk exist at any given time; if replicas are lost, it re-replicates the chunk.

Contribution:
This paper shows an innovative way of building a file cluster. Based on their commercial requirements, the authors created a file system on cheap hardware that is highly available and supports high concurrency. I think this is the biggest contribution.
It also showed some interesting techniques, such as leases and mutation ordering, which greatly reduce the need for locks and make file appends very fast.

The best part of this paper, I think, is that the master does not store chunk locations persistently; it polls the chunkservers for this information (at startup and through regular heartbeats) instead. This greatly reduces the complexity of the file system.

Weakness:
This file system has a few weaknesses. First, it performs best only when applications match its assumptions. Second, the system keeps redundant copies of each chunk (three replicas by default), which means it has lower usable capacity compared to RAID.



Review 20

This paper discusses the distributed storage system used by Google and its corresponding applications. The system appears to be custom-designed for the specific kind of usage that Google sees with its applications, especially large reads and writes.

One of the guiding ideas in the design is that component failure is considered the norm rather than the exception, which is why the system is designed for high availability. The system is a master-slave configuration in which applications typically interact with the chunkservers after receiving their locations from the master server. The master typically stores only the metadata for the chunkservers. This organization is a good example of a distributed system.

The Google File System also introduces the idea of record append, whereby multiple applications can append to a given file at the same time. The authors define the status of mutated file regions in terms of being defined and being consistent. I think the situation where concurrent successful mutations leave a region consistent but undefined could have been explained better.

GFS prioritizes chunk re-replication based on how many replicas a given chunk has at the time, and prioritizes chunks of live files over those of deleted files. I see prioritizing on the basis of how files are being used at a given time as a great advantage.
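
One way to picture that prioritization is with a small scoring sketch; the scoring weights and field names are my own illustration, not the paper's actual policy, but they reflect the two criteria just described.

```python
def replication_priority(chunk):
    missing = chunk["goal"] - chunk["live_replicas"]   # how far below its replication goal
    live_bonus = 0 if chunk["file_deleted"] else 100   # live files outrank deleted ones
    return missing * 10 + live_bonus

pending = [
    {"handle": 1, "goal": 3, "live_replicas": 1, "file_deleted": False},
    {"handle": 2, "goal": 3, "live_replicas": 2, "file_deleted": False},
    {"handle": 3, "goal": 3, "live_replicas": 1, "file_deleted": True},
]
for chunk in sorted(pending, key=replication_priority, reverse=True):
    print(chunk["handle"])   # clone chunk 1 first, then 2, then 3
```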

Garbage collection is done lazily and in batches, which amortizes its cost. However, as the authors point out, an application that repeatedly creates and deletes temporary files might be hindered from reusing the storage right away. I see the need, or at least the possibility, of an on-demand garbage-collection request for specific scenarios like these.

In terms of the throughput graphs provided by the authors, they report results for a cluster with only 16 chunkservers and 16 clients. Considering that their actual deployments involve a much larger scale, performance graphs for a production-scale GFS scenario would have made this paper more exciting, since you would be able to understand how all the system's specific features come together at that amount of data.



Review 21

This paper presents GFS, a distributed file system for data-intensive applications that can satisfy Google's storage needs. The main focus of the GFS design is to manage large-scale data files in an effective and reliable way.

GFS is built on the assumptions that it has to use inexpensive commodity components, which of course are not very reliable, and that the files it handles are usually huge, with multi-GB files commonly seen, on which most data manipulation is appending rather than overwriting.

Google's solution in GFS is a single-master, multiple-chunkserver design that divides files into 64 MB chunks striped across multiple chunkservers. Each chunk is replicated somewhat like a RAID level 1 system, but the replicas operate in a primary-secondary arrangement rather than as simple mirrors. The master maintains only the file metadata: the file and chunk namespaces, the mapping from files to chunks, and the location of each chunk's replicas. The actual chunk replicas are maintained by the chunkservers.

Because of the GFS architecture, data flow and control flow are naturally separated. Each client application only needs to ask the master for the location of each chunk's replicas and then accesses the data directly from the chunkservers. In this way the master's workload is minimized: the master only serves the location information for each chunk, and data caching is left to the Linux OS on the chunkservers. Since the metadata on the master is small enough to be stored entirely in memory, the master's performance is further improved.

In terms of data integrity, all file data is protected by its replicas, and the master uses heartbeats to monitor the status of each chunkserver. On the chunkserver side, checksums are used to detect corrupted data and chunk version numbers to detect stale replicas. A chunk replica is reconstructed once any of these checks fail. As for the metadata on the master, it is protected by the operation log.

GFS is a distributed file system that provides high availability, low recovery time, and high aggregate throughput to multiple users. It successfully supports the research, development, and data processing needs within Google.

One obvious flaw in this design, also admitted by the authors, is the hot-spot problem when accesses to a small file surge suddenly. Although Google manages such problems with techniques like storing highly demanded executables at a higher replication factor, this still looks like a partial solution.



Review 22

In this paper the authors introduce the Google distributed file system for large data sets and large-scale data processing. The authors first point out the assumptions behind their file system and then introduce its architecture.
The system consists of one master, which stores the file system metadata and handles applications' requests, and multiple chunkservers. The chunkservers store the data chunks and interact directly with applications. The file system uses replicas to tolerate server failures.
The paper explains many concerns with file systems and provides details of how they deal with each problem. For example, they use leases and an ordered mutation flow to minimize the involvement of the master server and avoid bottlenecks; they use replicas to tolerate component failures and several strategies to detect failures or data corruption; and they provide an atomic append operation for multiple applications writing to the same file.
Advantage:
I like this paper very much because it explains how they design and implement the Google File System in detail. In each section the authors first point out the problem or difficulty in implementing that part and then provide a way to deal with it. It is also good material for people who want to learn about distributed file systems.
Problem of this paper:
(1) The authors mention many trade-offs in the Google File System, such as the chunk size, latency versus throughput, space reuse after deleting files, and the number of replicas. They say only that certain aspects of performance matter more for their needs, but there are potential problems the Google File System may meet. It would be better for the authors to explain more about how they made these decisions, e.g., some experimental analysis of why they use a 64MB chunk size.
(2) One thing I notice is that the master relies on memory to store the metadata (namespace, etc.); in the future there may be a need for much larger clusters of the distributed file system, and then the problem is not just the price of memory.


Review 23

In the paper, the authors present a storage system designed to run on commodity, unreliable hardware. The abstraction presented resembles that of a traditional file system. Files are organized in directories, are flat (i.e., not structured), and may grow in length. GFS was designed to store large files, often large enough to require being split across a number of disks. Most writes are expected to append to existing files rather than overwrite anything. For this reason, GFS has been designed to support many writers that wish to append to the same file. It should be noted that during an append, the exact offset where the append will be made is decided by GFS, not the client.

GFS splits data into 64 MB chunks that are replicated at least 3 times across a number of chunkservers. All chunkservers are managed by a single GFS master. Health checking is done using heartbeats. Mutations take time to propagate to all of the relevant chunkservers. If a new client searches for a file, the master will point it at an up-to-date replica and not an out-of-date one. However, because the locations of chunks are cached, existing clients may see stale data. The staleness is bounded by the expiration time of the cache.
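A minimal sketch of that client-side cache (the names and the 60-second lifetime are my assumptions, not values from the paper) makes it clear why staleness is bounded by the entry lifetime:

import time

class ChunkLocationCache:
    """Client-side cache of chunk locations; expired entries force a fresh master lookup."""
    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self.entries = {}            # (file_name, chunk_index) -> (locations, expiry)

    def get(self, key):
        hit = self.entries.get(key)
        if hit and hit[1] > time.time():
            return hit[0]            # may point at a stale replica, but only until expiry
        return None                  # miss or expired: go back to the master

    def put(self, key, locations):
        self.entries[key] = (locations, time.time() + self.ttl)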

The authors present a storage system that has (relatively) simple semantics. The system scales well with the number of clients and is fault-tolerant. For simplicity, the chunkservers use the operating system's file abstraction to store chunks rather than accessing the disk directly. The system handles many appending writers gracefully: each append operation is atomic and does not block other appending writers from executing.

The authors did not justify the selection of the 64 MB chunk size. This is important because reading 64 MB from disk has latency that may make it difficult to support many clients from a single machine (i.e. it allows for high throughput but hurts latency). Finally, the use of a single master is likely neither scalable nor fault-tolerant.


Review 24

This paper presents the Google File System (GFS), a new file system that Google uses to store its data. Google wanted a solution that could accommodate distributed servers and scale at the rate that Google's data processing is scaling. To do so, the Google File System was created with an interface called the GFS client, which talks to the GFS master and multiple GFS chunkservers to get the files needed by the user.

The authors did a good job of explaining the motivation behind the project and why creating a new file system is a practical use of their time. Furthermore, the architecture was explained clearly and examples were given of how operations on files are processed. However, the reasoning behind some features was not described, as I will discuss in the weaknesses section. Another strong point of the paper was the way the performance data was presented: measurements from two GFS clusters in production were given, which shows how the system is actually used day to day rather than relying only on synthetic benchmarks.

Overall, the paper does an excellent job presenting the concepts and the experimental findings, but the following are some weaknesses:

1. Were any experiments run to show that 64 MB is the optimal chunk size? Even though the authors provided logic as to why small chunk sizes are less efficient for their file system, they never mentioned why they didn't go with an even larger chunk size.

2. The regular HeartBeat messages and the garbage collection that occurs in the background may affect read performance. There is no mention of whether and how these tasks are scheduled at lower priority during peak times when the clusters are almost overwhelmed. Also, do they impact performance even when there are few users reading?

3. The authors mentioned that they store 3 replicas by default. What are the space, performance, and reliability implications of storing fewer or more replicas? I would have liked to have seen an experiment comparing the pros and cons of each replication factor.



Review 25

This paper discusses the design and development of the Google File System, which differs significantly from traditional file system structures. Google's design uses a scheme that focuses on supporting high throughput for many concurrent accesses to data spread across a distributed server system. The authors note that the system was designed to support a distributed system in which:

1) Files are very large in relation to typical file sizes

2) Files are typically modified by appending to the end of the file rather than modifying data located in random locations throughout the file

3) Files are available to many clients trying to access their data concurrently

4) Component failures are treated as the norm rather than the exception

The Google File System makes use of one master server that provides clients with the locations of data chunks on the system's "chunkservers", stores a non-persistent copy of the metadata about the state of each chunkserver, and manages garbage collection of deleted, corrupted, or stale chunks. Shadow copies of the master server are kept to provide greater availability for read requests on files. The multiple chunkservers in the system store data in 64 MB chunks. Each chunk is replicated in several other places and spread across multiple server racks in order to maintain high data reliability.
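To illustrate the rack-spreading idea, here is a toy placement heuristic of my own; the paper's real policy also weighs disk utilization and recent creation rates, so this is only a sketch of the rack-diversity constraint:

import random
from collections import defaultdict

def place_replicas(chunkservers: dict, replication: int = 3) -> list:
    """chunkservers maps server_id -> rack_id; returns `replication` target servers,
    preferring to put no two replicas of a chunk in the same rack."""
    by_rack = defaultdict(list)
    for server, rack in chunkservers.items():
        by_rack[rack].append(server)
    racks = random.sample(list(by_rack), k=min(replication, len(by_rack)))
    picks = [random.choice(by_rack[r]) for r in racks]
    while len(picks) < replication:      # fewer racks than replicas: reuse racks
        picks.append(random.choice(list(chunkservers)))
    return picks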

This file system has a great number of technological merits. It reconsidered many assumptions made by previous file systems in order to accommodate a different set of needs. In particular, the recognition that most of their data is written once then treated as essentially read only allowed for some optimizations that would not be possible in file systems where this is not the norm. Business data and experimental data, for example, are gathered once then remain unchanged except for occasionally having new data added to the existing set. The data is used primarily for analysis, which only requires reading the data and doing in-memory operations on it.

My main complaint about this paper is that it does not offer a comparison of the performance of GFS and traditional file systems when used with similar applications. The reasoning that the authors offer for their design decisions seems sound to me, but they offer no experimental data to back up this reasoning. Even if their system offers increased performance for the applications that they specify, they don't give any comparative analysis to show this.



Review 26

This paper introduces many design aspects of the Google File System (GFS), a scalable distributed file system for large distributed data-intensive applications. The design was driven by Google's application workloads and technological environment. Though some design decisions mentioned in this paper are specific to Google's setting, many aspects can be applied to data-processing tasks of similar magnitude.

First, the paper talks about the design flow of the file system. It mentions some assumptions, and some of these differ from traditional design views, such as those concerning component failures, file sizes, and the way files are written. The architecture of the file system consists of a single master, multiple chunkservers, and clients. Having a single master simplifies the design and lets replication decisions be made using global knowledge. The master's operations include namespace management, replica placement, and garbage collection, keeping the architecture simple. The clients never read or write file data through the master, minimizing its involvement in reads and writes. The combination of one master and multiple chunkservers is therefore an important design decision in GFS.

After discussing the architecture, the paper describes how the clients, master, and chunkservers interact to implement various operations. The mechanisms for these operations must guarantee data consistency and integrity, including the mutation order, the data flow, and atomic record appends. All of these mechanisms follow the rule that clients send only metadata requests to the master and transfer data directly to and from the chunkservers. In addition, GFS keeps the overall system highly available through replication and fast recovery.

In conclusion, GFS demonstrates the qualities essential for supporting large-scale data processing workloads. The paper also explains why its design assumptions and choices differ from those of traditional file systems. Since the data processing scale and environment described in this paper represent a continuing trend, we can learn a lot from it and consider some of its design aspects when researching or designing file systems.



Review 27

This paper introduces and discusses the initial implementation of the Google File System (GFS). It describes the motivation for why it was created (fault-tolerant parallel access to very large files that are typically read, and written to mostly by appending data) and how it accomplishes those needs.

The file system was a major success and it introduced a few new ideas to a file system:
1.) Making failure the norm, not the exception, and building to prepare for that. This was an early file system that accepted things were going to fail and prepared for easy replacement, which is necessary when you have as many machines as Google now does.
2.) Doing garbage collection lazily rather than immediately. When a file is deleted, GFS defers that work to a later time and reclaims the space when convenient. All it needs to do immediately is update the master's metadata; the chunks are cleaned up later (a rough sketch of this deferral follows the list).
3.) The concept of "chunkservers" and files being saved in separate chunks that can be located anywhere. Rather than storing files so that they are all together, GFS has a "master" that stores metadata about which chunkservers hold the different parts of each file. A file can be made up of chunks on many different chunkservers, providing better storage efficiency and the ability to parallelize access.
4.) GFS recognized the importance of placing data replicas on different machine racks, and at times in different parts of the country. This is because if an entire rack goes down (or worse, an entire data site) you still need to be able to access the data; placing it in different locations allows for that, as well as faster access if there is a chunkserver closer to your location.
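Here is the lazy-deletion sketch promised above (a toy version of my own; the three-day grace period matches the paper's configurable default, but the data structures are invented):

import time

GRACE_PERIOD = 3 * 24 * 3600   # the paper's default grace period is three days

def delete_file(namespace: dict, name: str) -> None:
    """A delete only renames the file to a hidden, timestamped name in the master's namespace."""
    namespace[f".deleted.{name}.{int(time.time())}"] = namespace.pop(name)

def garbage_collect(namespace: dict, now: float) -> None:
    """Background scan: remove hidden files older than the grace period; their chunks
    become orphaned and are erased lazily during regular HeartBeat exchanges."""
    for hidden in [n for n in namespace if n.startswith(".deleted.")]:
        deleted_at = int(hidden.rsplit(".", 1)[-1])
        if now - deleted_at > GRACE_PERIOD:
            namespace.pop(hidden)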

A few downsides to this paper:
1.) I would have liked a little more discussion about the decision to have only one master machine. I understand that there can be backups, but it feels a little strange to build a whole new filesystem around failure being the norm rather than the exception, and then say you can have only one master. I suspect the master going down might not be that infrequent, and not really discussing why a single master is acceptable felt insufficient to me.
2.) Another flaw (which, in their defense, they might not have seen the importance of at the time) is hot spots. They talk about how a small file being accessed very frequently (or a certain part of a large file) can slow down performance. I think this issue should have been addressed further and is probably a large problem now, with "trending" content getting lots of hits. When exciting news breaks that millions of people all want to read at once, it can be a tweet or some other very small amount of data that becomes a hot spot and slows down performance for an entire chunkserver (and in theory the master as well, since it will be hit with more location requests per chunk loaded).
3.) Lastly, I always like to comment on the graphics used in a paper, and I felt this paper was lacking them. The ones that were used were very helpful, but apart from the graphs at the end they were few and far between. There could have been more graphics in the explanation of how GFS works, from a client fetching data all the way to completion. Figure 1 tried to capture the entire process in one small graphic; it was a good graphic, but certain parts should have been drilled down on and given their own figures. Also, Figure 1 was the only graphic I found truly useful, and it's never good to find only one graphic useful in a 15-page paper.

All in all, though, I really liked the paper. It did a great job of making a very complicated system make sense, and it is obviously one of the most influential papers in the history of the field. GFS is a good thing to understand, and it is worth thinking about other applications for it or how it could be advanced.



Review 28

The purpose of this paper is to introduce a distributed file system now widely deployed at Google. The aim of this file system is to handle huge amounts of data (on the order of terabytes and petabytes) across many different machines using "commodity hardware," while still accounting for large data files and inconsistent node availability.

I was impressed with the assumptions on which this file system was built, namely: (1) the assumption of frequent node failures, (2) the rarity of random writes (data is mostly appended and often never rewritten), and (3) the choice that high data throughput is worth the cost of higher latency. Data is split throughout a network of these commodity-hardware nodes, with "chunks" of data replicated across them. A clever technique is the increased replication of "chunks" that are frequently accessed, making the system somewhat adaptive to the needs of the application using it. This redundancy provides a simple fallback mechanism against otherwise irreversible failures, while requiring manageable network overhead and metadata storage.
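A minimal sketch of that adaptive idea (the threshold, counter, and function are my own inventions, purely to illustrate the policy, not the paper's actual mechanism):

def desired_replication(read_rate_per_sec: float,
                        base_replicas: int = 3,
                        reads_per_extra_replica: int = 500) -> int:
    """More observed readers for a chunk -> more replicas to spread the load."""
    extra = int(read_rate_per_sec // reads_per_extra_replica)
    return base_replicas + extra

# e.g. a chunk seeing ~2,000 reads/s would be kept at 3 + 4 = 7 replicas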

A drawback that seems a likely source of trouble is the chance of a small file becoming a "hotspot" that is frequently accessed, creating a bottleneck in system performance. In addition, the paper fails to mention or discuss the inefficiency of random seeks (besides noting that small reads contribute a "random seek" workload).


Review 29

This paper details the ins and outs of the Google File System, a distributed file system designed for handling large volumes of data on commodity servers. Traditional filesystems cannot easily handle datasets of this size, nor can they coordinate the distribution of data across machines, so there is a need for a filesystem that can orchestrate data distribution in an efficient and reliable manner.

GFS is directed by a single master server, which commands numerous chunkservers. The master uses "HeartBeat" messages to contact the chunkservers regularly. The master is responsible for keeping track of the file namespace and the file-to-chunk mappings, along with the locations of each chunk's replicas. GFS also maintains a checkpointing system to allow recovery from failures. Record appends are introduced to provide atomic appends, which are guaranteed to be written at least once.
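A toy sketch of the HeartBeat-driven monitoring on the master side (the intervals, names, and the ping/reissue helpers are assumptions; real HeartBeats also carry chunk state and instructions):

import time

HEARTBEAT_INTERVAL = 10.0   # seconds between probes (assumed value)
DEAD_AFTER = 60.0           # declare a chunkserver dead after this long without contact

def monitor(chunkservers, last_seen: dict, reissue_replicas) -> None:
    while True:
        now = time.time()
        for server in chunkservers:
            if server.ping():                          # assumed RPC stand-in for a HeartBeat
                last_seen[server.id] = now
            elif now - last_seen.get(server.id, now) > DEAD_AFTER:
                reissue_replicas(server.id)            # re-copy its chunks from other replicas
        time.sleep(HEARTBEAT_INTERVAL)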

One glaring issue with GFS lies with clients that cache chunkserver locations. Since clients can cache chunk locations, it is possible that a chunk endures some sort of failure and a client reads from it without realizing it is stale. Another issue with GFS is the design choice of a single master, which introduces a single point of failure. Although it is possible to shadow the master, the shadow can fall behind the original, thus introducing downtime.



Review 30

The paper discusses the design, architecture, and implementation of the Google File System (GFS): a scalable distributed file system for large distributed data-intensive applications. The file system was designed based on observations of Google's workload and technological environment. Interestingly, it is built on these assumptions: (1) component failures are the norm rather than the exception, so monitoring, error detection, fault tolerance, and automatic recovery must be integral; (2) files are huge by traditional standards; (3) most files are mutated by appending new data rather than overwriting existing data, so appending becomes the focus of performance optimization.

A GFS cluster consists of a single master and multiple chunkservers. The master maintains all file system metadata, while the chunkservers store chunks (fixed-size pieces of files) on their local disks. Clients interact with the master only to obtain metadata, looking up a file name and chunk index to learn which chunkservers hold that chunk, so reads and writes flow only between clients and chunkservers. For reliability, each chunk is replicated on multiple chunkservers. The master also has "shadow masters", which mirror the master's state but lack its decision-making authority in placing chunks. Constant monitoring is done through regular "handshakes" between the master and chunkservers to identify failed chunkservers.
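A minimal sketch of that read path, with hypothetical master and chunkserver stubs standing in for the real RPCs (and assuming the read fits inside one chunk):

CHUNK_SIZE = 64 * 2**20   # 64 MB

def read(master, file_name: str, offset: int, length: int) -> bytes:
    """Ask the master only for metadata, then read the bytes from a chunkserver."""
    chunk_index = offset // CHUNK_SIZE                         # which chunk holds the data
    handle, locations = master.lookup(file_name, chunk_index)  # assumed metadata RPC
    replica = locations[0]                                     # a real client picks the closest replica
    return replica.read_chunk(handle, offset % CHUNK_SIZE, length)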

One of the main contributions of this paper is its detailed explanation of a file system interface and its extensions designed to support distributed applications with large-scale data processing workloads on commodity hardware. Basically, GFS tries to keep things simple. It does not cache file data (metadata, which is much smaller, is still cached), it lifts as much burden as possible from the master to avoid an I/O bottleneck there, and it does not employ a complex data flow model (e.g., a tree) but instead relies on a linear data transfer chain from one chunkserver to its nearest neighbor, and so on.
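A sketch of that linear data push (my own illustration; the receive/forward helpers are assumed): each chunkserver forwards data to the next-closest replica while it is still receiving, instead of the client fanning out to every replica and splitting its outbound bandwidth.

def push_data(data: bytes, replica_chain: list, piece_size: int = 64 * 1024) -> None:
    """replica_chain is ordered by network distance: nearest replica first."""
    for start in range(0, len(data), piece_size):
        piece = data[start:start + piece_size]
        # The first server begins forwarding to the next hop as soon as the piece
        # arrives, so every link in the chain stays busy (pipelining).
        replica_chain[0].receive(piece, forward_to=replica_chain[1:])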

However, one impression that remains after reading this is that the Google File System is designed according to Google's needs. There are certain specifications (and tolerances) to adhere to, which I believe are specific to Google. Although the authors believe it may be implementable in a different setting, the paper does not give hints, at the very least, as to how the system would behave elsewhere (e.g., on a different operating system such as Windows).



Review 31

The main purpose of this paper is to introduce a new type of File System that utilizes a large network of commodity machines with the assumption that failures are inevitable. The paper presents all of the assumptions made and some basic specifications of the GFS system. It steps through the different components of the system and compares their design decisions with those in a typical file system, discussing the counter-intuitive decisions carefully so that all of their choices are justified by the types of data they plan to handle in the GFS. They make a clear point that this file system is not meant to replace existing structures, but rather is a new implementation that is suited for their specific needs.

There are many technical contributions in this paper. The authors step through a very detailed process, enumerating and explaining design decisions while building a clear picture of how their system works. They then further justify those decisions by running a set of empirical evaluations and benchmarks of their system, including two clusters that were, at the time, in active use by Google engineers. Unlike many file systems, they optimize for appended writes (there are very few random writes) and large sequential reads. They optimize for their most frequent operations, often at the expense of the less frequent ones, but their empirical results show that the system as a whole does not lose efficiency because of this (and show that their estimated breakdown of operation frequencies was indeed reasonable).

I think one big strength of this paper is in its layout. The paper is carefully thought through and the components are presented in a logical order and broken down into small sections and subsections. This fine granularity in subsection breakdown allows for an organized and methodical run through of the system that is easy to follow and understand.

In my opinion, this paper has few weaknesses. However, I do think that some of the terms they are using could be defined more clearly. For example, though I know what a HeartBeat message is, it would be helpful to have a general definition presented with its first use, especially since it is always written in italics when mentioned in the paper. My other small complaint is about the readability of the tables in which they present their results (Tables 4 - 6 especially). I think that it is very difficult to read tables without horizontal lines or differences in coloring so that rows are easily followed across.




Review 32

This paper gives a summary of the design, implementation, and evolution of the Google File System. At the same time it provides some in-depth analysis of the unique characteristics, advantages, and limitations of the system's design and architecture, as well as some exploration of the design space.

While many of the points brought up in this paper are interesting and inspiring, I find the section on High Availability especially compelling, because availability is a key requirement for a huge, data-oriented enterprise like Google. By treating normal and abnormal terminations equally, the design simplifies recovery, which in turn increases availability when large numbers of users start and terminate jobs at arbitrary times.

In Section 4.4 the paper mentions that garbage collection is not done immediately after storage space is logically released. This is a reasonable choice given the amount of data traffic and the performance and reliability requirements. If immediate reclamation of storage were required, the added implementation complexity would very likely increase the chance of bugs. Meanwhile, given the scale of the system, letting some storage sit idle between two garbage collection passes is not much of a sacrifice.

In Section 5.1.2 the paper states that implementing more complicated redundancy schemes is expected to make the design more robust in general cases where there can be frequent small random writes. However, a more detailed analysis of the costs and sacrifices of doing so, such as processing time (complexity) and infrastructure cost (hardware requirements), would make for a fairer comparison and a better explanation of why this is not already done.



Review 33

Summary:
This paper presents the design of the Google File System (GFS), a fault-tolerant, highly available, and easily scalable distributed file system. It introduces GFS by first identifying the problems of designing a scalable file system for large-scale data-intensive applications:
1) For a large distributed file system built on inexpensive hardware, failures are the norm rather than the exception.
2) Files stored in the system are huge and are accessed mostly by streaming reads rather than occasional random reads.
3) Most files are changed by appending new data rather than overwriting existing data.

After making these assumptions, the paper presents a three-layer distributed architecture that contains:
1) The master layer: it contains a master server, built on a highly stable machine, and several shadow masters. The master server is responsible for metadata manipulation and coordination between chunkservers and clients, whereas the shadow masters serve as backups of the master to provide high availability in case of master failure.
2) The chunkserver layer: it contains several chunkservers, each managing several chunks of data. For each chunk, the replica holding a lease granted by the GFS master acts as the primary; it maintains the mutation order for simultaneous writes and delivers that order to the other chunkservers holding replicas of the chunk.
3) The client layer: it contains the applications that use GFS services. Each client application is responsible for communicating with the GFS master to get metadata, sending data to the nearest chunkserver when writing, or filtering the data by checksum when reading from a chunkserver. It also needs to handle errors if a read or write fails or if a new chunk is added during a single write.

The GFS design also has several novel features and details that help it tackle the problems of a large distributed file system:
1) It separates the data flow from the control flow to use the network efficiently.
2) It stores files in fixed-size chunks, 64 MB by default, to limit metadata requests from clients to the master and to reduce network load.
3) It has a garbage collection service maintained by the master and chunkserver layers. Its simplicity helps improve reliability, and its use of piggybacking on regular messages and background activity helps improve throughput. A deletion timeout also improves safety.
4) It has a replica placement policy that provides fault tolerance as well as high availability by arranging chunks based on network topology and historical hotspots.
5) It has several other features, including snapshots, namespace locking, etc. These features all improve the system's throughput as well as its fault tolerance.

The paper also has a batch of experimental results collected from both production and research environments. The results show that GFS works as designed: it is a highly available, high-throughput, fault-tolerant, large-scale distributed file system.

Strengths:
1. This paper gives a detailed introduction to the Google File System, a groundbreaking distributed file system that tackles many design problems of large-scale, data-intensive storage. The architecture of GFS and many of its design details inspired its successors, including HDFS.
2. This paper gives first-hand, thoroughly evaluated experimental results that show the advantages of GFS.

Weaknesses:
1. The paper also points out several remaining problems in GFS, including the hotspot problem, which it suggests could be mitigated by letting clients read from other clients. Though it is great to see the paper acknowledge these problems, further research and discussion would be appreciated.
2. Though GFS considers efficiency in many aspects, I believe co-locating computation with data could boost performance greatly. For example, if an application runs on the same machine that stores the data it requires, the network latency problem is cut out. Moreover, since GFS storage needs minimal computational power from its hardware, this would also make more efficient use of the machines GFS is deployed on.
3. Using fixed-size chunks improves the throughput of GFS. However, the record-append mechanism may leave padding or duplicated records, wasting storage. I believe it would be worth optimizing this with background cleaning and synchronization services.



Review 34

In this paper the authors discuss the design and implementation of the Google File System (GFS). This distributed file system is deployed on commodity hardware, yet provides high performance and reliability and scales to over a thousand machines.
The whole design of GFS is driven by Google's observations of their applications' workload. First, component failures are common, since they use inexpensive commodity hardware. Second, files are huge. Third, the most common workloads are file appends and sequential reads. Fourth, co-designing the applications and the file system benefits the overall system.
One of the key design choices made by Google is to use a single master. GFS contains a single master and multiple chunkservers. Files are divided into chunks and stored on chunkservers. The master node stores only a small amount of metadata. Clients communicate with the master to query metadata; data is exchanged only between clients and chunkservers, never through the master. This architecture eases the work of coordinating the whole system, and because the master keeps only metadata and exchanges limited messages with clients, in practice it is not a bottleneck at all.
Reliability is achieved by data replication and checksums. For every chunk, there are usually 3 copies stored on different chunkservers. The master periodically examines the number of replicas and makes up for replicas lost on failed chunkservers. Every block of a file is stored with a checksum on the chunkservers, so that a chunkserver can quickly detect corrupted data.
Another interesting technique is the lease. GFS needs to maintain a consistent mutation order across replicas. Instead of having the chunkservers that store the same chunk coordinate among themselves, GFS relies on its single master granting a chunk lease to one of them. That primary chunkserver picks a serial order for all mutations to the chunk during its lease period. This simple technique minimizes management overhead at the master. The lease is also revocable, either by the master explicitly sending a message or by letting it expire.
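A toy sketch of the primary's role under a lease (the 60-second initial lease timeout matches the paper; the class, forwarding calls, and the apply() placeholder are my own assumptions):

import time

LEASE_DURATION = 60.0   # the paper's initial lease timeout is 60 seconds

class PrimaryReplica:
    """While its lease is valid, the primary assigns consecutive serial numbers to
    mutations and forwards that order to the secondary replicas."""
    def __init__(self, secondaries):
        self.lease_expiry = time.time() + LEASE_DURATION
        self.next_serial = 0
        self.secondaries = secondaries

    def mutate(self, mutation) -> int:
        if time.time() >= self.lease_expiry:
            raise RuntimeError("lease expired; the client must ask the master again")
        serial = self.next_serial
        self.next_serial += 1
        self.apply(serial, mutation)              # apply locally in serial order
        for secondary in self.secondaries:
            secondary.apply(serial, mutation)     # same order on every replica
        return serial

    def apply(self, serial, mutation):
        pass   # placeholder for writing to the local chunk file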
Throughout this paper we can see how important it is to keep a system simple and not to over-engineer it with complex, unnecessary techniques. GFS is designed as a simple distributed file system with only one centralized master node. It does not use more complex distributed coordination algorithms, which are hard to implement and prone to error. It is also important to note that the intended workload is the key factor shaping the design decisions. Compared to Amazon's Dynamo, the architecture of GFS is drastically different, though both support their respective services well.