This paper introduces the design choices behind the Google File System (GFS). The GFS design follows these assumptions: 1. System components often fail, so the system needs to monitor, detect, tolerate, and recover from failures. 2. Files are large. 3. The workload consists of large streaming reads and small random reads. 4. The workload includes many writes that append data to files. 5. The system should handle concurrent writes (updates) well. 6. Throughput is valued more than the latency of any individual request. To meet these goals, the GFS architecture has a single master (which holds file metadata and is backed by master replicas) and multiple chunkservers (which hold replicated data and respond to client requests, for reliability). Data is split into chunks, and a large chunk size reduces client-master interaction, network overhead, and the size of metadata, while increasing the chance of hot spots in the system. Checkpoints are taken on the master node, and checkpoints, operation logs, and master replicas are used to recover the master's state. GFS has a relaxed consistency model and maintains several guarantees: file namespace mutations are atomic, and after a sequence of successful mutations the file region is defined and contains the data written by the last mutation. The concept of a lease is used to help the system decide a global mutation order for a set of mutations on the same data. During this process, the data flow and control flow are decoupled to fully utilize each machine's network bandwidth. GFS uses read-write locks to manage the namespace. It has smart replica placement policies that maximize data reliability and availability and maximize network bandwidth utilization. GFS also performs re-replication, rebalancing, and garbage collection (lazily removing unused files) to fulfill these goals. The authors use two real-world sample workloads to show that their design assumptions are correct and that the design choices meet their needs. This paper clearly illustrates the motivations behind GFS and the design decisions made to meet different goals. The thing I like most is that it gives a clear explanation of the GFS system architecture and uses real-world workload examples to show the results obtained on the system they built.
|
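To make the chunk-size point from the review above concrete, here is a minimal sketch (my own illustration, not code from the paper) of how a client-side library might map byte offsets to chunk indices under the paper's 64 MB chunk size; the names chunk_index and chunks_needed are invented.

```python
CHUNK_SIZE = 64 * 2**20  # the 64 MB chunk size described in the paper

def chunk_index(byte_offset: int) -> int:
    """Translate a byte offset within a file into the index of the chunk holding it."""
    return byte_offset // CHUNK_SIZE

def chunks_needed(file_size: int) -> int:
    """How many chunks (hence master metadata entries) a file of this size occupies."""
    return max(1, -(-file_size // CHUNK_SIZE))  # ceiling division

if __name__ == "__main__":
    # One cached (chunk handle, locations) entry covers a full 64 MB of file data,
    # so sequential readers rarely need to go back to the master.
    print(chunk_index(200 * 2**20))  # offset 200 MB falls in chunk 3
    print(chunks_needed(2**40))      # a 1 TB file needs only 16384 chunk entries
```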
The Google File System team introduces their system in this paper. GFS was designed to meet Google's enormous data processing needs, and it is now used broadly for different purposes. It is necessary because traditional distributed file systems suffer from drawbacks such as an inability to cope with frequent component failures. This paper provides a thorough overview of the GFS design and performance. First, it introduces the architecture and design model of the system. Then it shows how the system handles interactions among clients, the master, and the chunkservers. The next section analyzes fault tolerance and performance, along with some bottlenecks in the GFS architecture and implementation. Finally, the team shares their experience developing GFS, the problems they faced, and how they dealt with them. Some of the strengths of this paper are: 1. GFS has high availability. Data is still available even if some of the nodes in the file system fail; in other words, component failures are treated as the norm rather than the exception. 2. By running multiple nodes in parallel, GFS delivers high aggregate throughput to many concurrent readers and writers. 3. GFS storage is reliable. When data is corrupted, the corruption can be detected and the data recovered. 4. Early GFS had a workload bottleneck at the master. Current GFS has addressed the problem by changing the master's data structures to allow efficient binary searches. Some of the drawbacks of this paper are: 1. GFS is not optimized for small files (those under 100 MB). 2. GFS cannot handle random writes or modifications to existing files efficiently because of the append-oriented mechanism it uses. 3. GFS optimizes for a high aggregate data processing rate rather than for the latency of a single read or write. |
The paper presents the design overview of the Google File System, explains its mechanisms for supporting large distributed data-intensive applications, and reports measurements from both micro-benchmarks and real-world use. As data processing demands keep growing rapidly, distributed file systems require better performance, scalability, reliability, and availability. In addition, observations of application workloads and the technological environment have shifted: component failures are more common, files are larger, appending is more common than overwriting, and co-designing the applications and the file system API increases flexibility. The paper summarizes the redesigned model as follows:
1. Design Overview. The whole design is based on these assumptions: 1) Components often fail. 2) Stored files are large. 3) Streaming reads and sequential writes outnumber random reads and random writes. 4) Concurrent appending and high bandwidth are important. The architecture of GFS is a single master with multiple chunkservers, accessed by multiple clients. The master maintains all the metadata and system-wide activities, while chunkservers store file chunks with replicas on other chunkservers. Clients cache metadata, but neither clients nor chunkservers cache file data. Metadata contains the file and chunk namespaces, the mapping from files to chunks, and the locations of each chunk's replicas; the first two are also kept in an operation log. The master can recover its file system state by replaying the operation log. GFS has a relaxed consistency model, which guarantees atomicity, correctness, definedness, fast recovery, and no data corruption, and this model can be accommodated by GFS applications.
2. System Interaction. The system is designed to minimize the master's involvement in all operations. 1) Leases are used to maintain a consistent mutation order across replicas while minimizing the overhead at the master. 2) The flow of data is decoupled from the flow of control to use the network efficiently: while control flows from the client to the primary and then to all secondaries, data is pushed linearly along a carefully picked chain of chunkservers in a pipelined fashion. 3) GFS provides an atomic append operation called record append. 4) GFS uses standard copy-on-write techniques to implement snapshots.
3. Master Operation. 1) GFS allows multiple operations to be active and uses locks over regions of the namespace to ensure proper serialization. 2) GFS manages chunk replicas throughout the system by spreading chunks across machines and racks. 3) The master makes placement decisions, creates new chunks and hence replicas, and coordinates various system-wide activities to keep chunks fully replicated and to balance load across all the chunkservers. 4) GFS uses lazy garbage collection to reclaim storage, for simplicity and reliability; it merges storage reclamation into the regular background activities of the master and acts as a safety net against accidental, irreversible deletion. 5) GFS maintains version numbers to detect stale replicas.
4. High Availability. GFS is highly available through fast recovery and replication. It achieves data integrity with checksums and uses extensive, detailed diagnostic logging to help with problem isolation, debugging, and performance analysis.
5. Measurement. The paper presents micro-benchmarks to illustrate the bottlenecks inherent in the GFS architecture and implementation, as well as some numbers from real clusters in use at Google.
The paper provides a distributed file system with high availability, high throughput, and reliable storage, which was pioneering in industry at that time. Instead of giving only a high-level idea of how to design such a file system, the paper gives a detailed description of the design and the reasons behind it, which makes clear to the reader both what the design is and why it should be a good choice. However, the design also has the following problems: 1. It wastes storage for small files. 2. The single-master structure restricts the scalability of the system. 3. It is not suitable for a large number of random read/write operations. |
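As a concrete illustration of the lease mechanism this review mentions, here is a hedged sketch (my own simplification, not the paper's code; the ChunkLease and Primary classes and the server name are invented, while the 60-second timeout is the paper's default) of a master granting a lease and the primary assigning serial numbers that fix one mutation order for all replicas.

```python
import time

LEASE_SECONDS = 60  # the paper mentions an initial 60-second lease timeout

class ChunkLease:
    """Toy master-side record of which replica holds the primary lease for one chunk."""
    def __init__(self):
        self.primary = None
        self.expires_at = 0.0

    def grant(self, replica: str, now: float) -> str:
        """Grant the lease to `replica` if it is free or expired; otherwise keep the current primary."""
        if self.primary is None or now >= self.expires_at:
            self.primary = replica
        if replica == self.primary:
            self.expires_at = now + LEASE_SECONDS  # extensions piggyback on HeartBeat messages
        return self.primary

class Primary:
    """The lease holder assigns serial numbers, which fix one mutation order for all replicas."""
    def __init__(self):
        self.next_serial = 0

    def order(self, mutation: str) -> tuple[int, str]:
        serial = self.next_serial
        self.next_serial += 1
        return serial, mutation  # secondaries apply mutations in this serial order

if __name__ == "__main__":
    lease = ChunkLease()
    primary_name = lease.grant("chunkserver-7", time.time())  # invented server name
    primary = Primary()
    print(primary_name, [primary.order(m) for m in ("write A", "write B")])
```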
Problem & Motivations: The engineers wanted to design a scalable distributed file system for large distributed data-intensive applications, running on inexpensive commodity hardware and suited to Google's workloads. The system differs from traditional designs in four ways: 1. It treats fault tolerance as a core requirement. 2. It handles files of very large size. 3. It targets different file operation patterns (appends and large streaming reads). 4. It gains flexibility by co-designing the API with applications. The authors propose the Google File System, which meets these requirements. Contributions: It proposes the Google File System. GFS contains many clever design details, such as the chunk-based file structure. However, the most important contribution is that it views faults as the common case rather than the exception; by introducing replicas, it successfully builds a reliable system on top of inexpensive commodity hardware. Drawback: The paper contains many details yet lacks a sense of the whole. An example that guides us, step by step and in detail, from an application request to how GFS locates the data and sends it back would be excellent. |
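The reviewer above asks for a step-by-step example of a request's path. Under my own simplifying assumptions (toy in-memory Master and Client classes, invented names such as "handle-42" and "cs1"), a read might flow roughly like this: the client converts the byte offset to a chunk index, asks the master once for the chunk handle and replica locations, caches them, and then fetches the bytes directly from a chunkserver. This is a sketch of the idea, not the real protocol.

```python
CHUNK_SIZE = 64 * 2**20  # 64 MB chunks, as in the paper

class Master:
    """Toy master: maps (filename, chunk index) -> (chunk handle, replica locations)."""
    def __init__(self, table):
        self.table = table

    def lookup(self, filename, chunk_idx):
        return self.table[(filename, chunk_idx)]

class Client:
    """Toy client illustrating the read path: ask the master once, then talk to chunkservers."""
    def __init__(self, master, chunkservers):
        self.master = master
        self.chunkservers = chunkservers
        self.cache = {}  # cached chunk locations, so repeated reads skip the master

    def read(self, filename, offset, length):
        idx = offset // CHUNK_SIZE
        if (filename, idx) not in self.cache:
            self.cache[(filename, idx)] = self.master.lookup(filename, idx)
        handle, locations = self.cache[(filename, idx)]
        replica = self.chunkservers[locations[0]]      # pick a replica (e.g. the closest one)
        start = offset % CHUNK_SIZE
        return replica[handle][start:start + length]   # the data itself comes from the chunkserver

if __name__ == "__main__":
    chunkservers = {"cs1": {"handle-42": b"hello world"}}
    master = Master({("/logs/day1", 0): ("handle-42", ["cs1"])})
    print(Client(master, chunkservers).read("/logs/day1", 6, 5))  # b'world'
```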
This paper details the Google file system developed in house to be a “scalable distributed file system for large distributed data-intensive applications.” The motivation for developing this system was the distinctive workloads faced by Google, such as a large number of sequential reads of very large files for data analysis, as well as other factors unique to the internal Google ecosystem. With that in mind, the Google file system aims to provide performance, scalability, reliability, and availability, much like the typical file systems being used. This paper starts by detailing the main assumptions behind the design of this system. First, components are assumed to have relatively high failure rates (since they are composed of large numbers of inexpensive commodity items). Second, the file sizes that the system is expected to work with are orders of magnitude larger than traditional file sizes. Third, appending to the end of the file rather than random writes is the norm. Finally, designing the applications and file system API together increases flexibility. The paper continues by describing the key implementation details of the Google file system. Each system consists of a single master and multiple chunkservers, which are potentially accessed by multiple clients. Clients are directed to a particular chunkserver by the master, which is also responsible for maintaining the system metadata. The chunk size was chosen to be 64 MB, much larger than that on typical systems, which brings the advantage of less client interaction with the master, among other advantages. The paper continues the discussion in a similar level of detail about how chunk locations are stored, replication (since it is a distributed system by design), fault tolerance and recovery methods, and ways to ensure data integrity. It then follows with experimental results based on benchmarks run on the Google file system, which are compared with performance data from real clusters in use at the time. The main strengths of this paper are that it introduces a system that is well specialized to the unique use case for Google. By identifying the typical workloads, the system can be well tailored to their needs. Also, the way that they essentially assumed that a system could fail at any time, by treating normal and abnormal terminations as the same, and taking steps to verify everything, helps in ensuring high reliability and data availability. In general, the paper was well written and easy to follow. The primary weakness probably ties hand in hand with its greatest strength, in that it is greatly specialized to a particular type of use case (e.g. mostly file appends rather than random writes). This undoubtedly brings greater gains than a more generalized system, but there is always a risk that usage patterns may change in the future (though that probably will not be the case in this age of big data). Beyond that, the presence of just one master that is responsible for managing many chunkservers presents a single point of failure, as well as a potential bottleneck that might make it more difficult to scale in the future. |
This paper’s purpose is to outline the Google file system (GFS), which boasts scalability and reliability in serving many clients and offering distributed services. It mentions the challenge of handling very large, ever-changing files. This is important because of how widespread Google’s services are in the average person’s daily life, and the system is quite robust. The paper then describes some of the transactions that the file system interface supports. We get to see the architecture of the system as a single master and many chunkservers, which hold all of the data in the system in fragments. We look into chunk size and how it affects performance, since larger chunk sizes reduce the need for interaction with the master, as more of a file's data can be found on one chunkserver. GFS deals with consistency issues by having multiple replicas of each chunk. That way data is not lost unless all replicas of a chunk fail before the master has performed its sort of heartbeat “handshake” with the chunkservers. We also looked into mutations, which are operations that modify data, and how they are propagated to replica chunkservers whenever data changes. We also looked into how GFS deletes files by renaming them and garbage collecting after the file has sat with the new name for 3 days. Data integrity was also covered as we have previously seen, using checksumming to detect corrupted data. I liked how thorough this paper was; this was one of the few times that assumptions were explicitly outlined before going into the main contribution of the paper. We got to see what they assumed about the resources, performance, and challenges that the file system will face, like recovering data from one of the cheap distributed computers, or dealing with a lot of rapidly changing files. The paper was very organized as well, I felt; everything flowed well and was very digestible. I did not like how dense figure 1 was. I thought figure 2 was great, very digestible. However, figure 1 was a bit hard for me to understand because of how dense the information was in it, even though they covered it in the text of that section. Perhaps if this visual were broken up into multiple subgraphs and explained in a bit more detail incrementally, I would have received this information better. |
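A minimal sketch of the lazy deletion scheme described above (rename to a hidden name, reclaim after a grace period). The hidden-name format and the Namespace class are my own inventions; the three-day default interval comes from the paper.

```python
import time

GRACE_PERIOD = 3 * 24 * 3600  # the three-day default deletion interval mentioned in the paper

class Namespace:
    """Toy namespace: deletion just renames the file; a later scan reclaims old hidden files."""
    def __init__(self):
        self.files = {}  # visible or hidden name -> data

    def delete(self, name, now=None):
        if now is None:
            now = time.time()
        hidden = f".deleted.{int(now)}.{name.strip('/')}"  # hidden name keeps the file recoverable
        self.files[hidden] = self.files.pop(name)

    def garbage_collect(self, now=None):
        if now is None:
            now = time.time()
        for name in list(self.files):
            if name.startswith(".deleted."):
                deleted_at = int(name.split(".")[2])
                if now - deleted_at > GRACE_PERIOD:
                    del self.files[name]  # storage is reclaimed lazily, in the background

if __name__ == "__main__":
    ns = Namespace()
    ns.files["/tmp/report"] = b"..."
    ns.delete("/tmp/report", now=0)
    ns.garbage_collect(now=GRACE_PERIOD + 1)
    print(ns.files)  # {} -- reclaimed only after the grace period
```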
GFS is a file system created by Google to fit its own needs. It leverages distributed-systems ideas, and the whole file system runs on many servers so that it costs less. Some of the difficulties and assumptions are: 1) single-component failures are very common considering the large number of servers; 2) files are huge by traditional standards; 3) there are two kinds of reads, large streaming reads and small random reads; 4) it favors appending data instead of overwriting; 5) it must handle simultaneous access. The basic structure of GFS is a master-chunkserver architecture accessed by clients. It uses a relaxed consistency model that is simple to implement yet still provides the needed distributed properties. The master maintains file metadata and communicates with the chunkservers mainly through HeartBeat messages. The master has three major types of metadata: 1) the file and chunk namespaces, 2) the mapping from files to chunks, and 3) the locations of each chunk's replicas. The master is designed to minimize its involvement in reads and writes to avoid becoming a bottleneck. Files are divided into fixed-size chunks for storage, and the relatively large chunk size makes space allocation easy. GFS uses a lease mechanism, also designed to minimize the master's involvement in operations. GFS pushes data so as to fully use each machine's network bandwidth, preventing network bottlenecks: data is pushed linearly along a chain of chunkservers rather than distributed in some other topology, and latency is further minimized by pipelining the data transfer over TCP connections. GFS provides record append, an atomic operation that lets many clients on different machines append to the same file concurrently. GFS also uses a smart 'copy-on-write' technique to implement snapshots. GFS's master server performs many tasks: 1) executes all namespace operations, 2) makes placement decisions, 3) creates new chunks and replicas, 4) balances load and reclaims unused storage, 5) performs garbage collection (not deleting immediately, which is simpler and more reliable but can constrain things when storage is tight), and 6) detects stale replicas. GFS achieves high availability through fast recovery and (chunk and master) replication. The paper also mentions a shadow master that provides read-only access to the file system in case the primary master is down. Each chunkserver uses checksumming to detect corruption of stored data, and this does not impact performance much. The diagnostic logs are quite useful and have only a small impact on performance. The contribution of this paper is that it uses commodity hardware to handle large-scale data processing workloads, which is really amazing, and many of the design ideas bring easy implementation yet good performance. It meets real-world needs for scalability, stability, concurrency, and integration. I think one flaw of GFS is its master, which may become a bottleneck for the system. My idea is to build a small, fully connected group of masters (3-4) to spread the master's work and ensure its performance. |
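The paper gives a simple model for the pipelined, linear data push mentioned above: the ideal time to move B bytes to R replicas is B/T + R*L, with T the per-link throughput and L the per-hop latency. Below is a small sketch of that formula; the 100 Mbps link and roughly 1 ms hop latency defaults are the paper's example figures, while the function name is mine.

```python
def pipelined_transfer_time(num_bytes: float, replicas: int,
                            link_throughput_bytes_s: float = 100e6 / 8,  # 100 Mbps link, in bytes/s
                            hop_latency_s: float = 1e-3) -> float:       # the paper says L is well under 1 ms
    """Ideal elapsed time B/T + R*L for pushing B bytes along a chain of R replicas."""
    return num_bytes / link_throughput_bytes_s + replicas * hop_latency_s

if __name__ == "__main__":
    # Roughly 0.08 s (about 80 ms) to distribute 1 MB to 3 replicas, matching the paper's estimate.
    print(round(pipelined_transfer_time(1e6, 3), 3))
```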
The paper presents the Google File System (GFS), a novel distributed file system. The system is built on top of commodity hardware, and takes the approach of assuming that components *will fail* and that the system should be resilient. GFS uses a single-master approach, and stores each “chunk” of data on three chunkservers. A novel method, “record append,” is used to write data in an atomic fashion. Through a clever set of techniques involving caching data on client machines and circumventing larger network hops, the system is able to be quite efficient despite having many machines and only one master. Overall, this paper presents a system that gives up storage space and latency in order to achieve durability and scalability. The true strength of this paper in my opinion was that it covered almost all of the bases when it came to potential flaws. I constantly found myself marking that I thought something was a potential problem, only to read half a page later that the authors had a clean solution for that problem. One of the first things that I was concerned about was that the master seemed like a single point of failure; then on page 3, I read that the operation log was used to prevent problems in the case of master crashes, and on page 9, I read that the master’s state is indeed replicated. Another similar concern I had was that it might be possible or likely to have problems with multiple replicas, especially with the clear statement that machines are expected to fail. However, the decision to spread replicas of each chunk over multiple racks helps with at least some issues that could take down the data in multiple places. The way that the paper was written gave me confidence that the authors had built a reliable system. The empirical data provided was also useful in this regard. While most of my concerns were not ignored in the paper, I did still have some reservations about the system when I finished reading: 1. The system is highly inefficient when it comes to the number of machines used because of the (configurable) 3x replication factor. The authors briefly discussed using parity or erasure codes, and I’d like to know if this was ever implemented. If not, this might not be a great system for companies looking to save on infrastructure costs, although it does allow for some automation that may save money in the long run. Machines are cheap. 2. The system is clearly built for specific workloads at google, but might not fit workloads at other companies as well. For instance, there is a clear emphasis put on overall bandwidth as opposed to latency seen by one client - other applications might not have the same priorities. 3. Simply put, this system sounds like a pain to set up. Google probably has some great automation tools, but I have managed HDFS servers on AWS and it wasn’t fun. My concerns aren’t a suggestion not to use GFS - they are merely a list of reasons why it shouldn’t be considered as the only option. |
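To illustrate the rack-spreading idea this review highlights, here is a hedged sketch (not GFS's actual placement policy; the cluster map, server names, and function name are invented) that accepts a set of replicas only if they span at least two racks, so a single rack failure cannot take out every copy.

```python
import itertools
import random

def place_replicas(chunkservers: dict[str, str], n: int = 3) -> list[str]:
    """Toy rack-aware placement: choose n chunkservers spanning more than one rack.
    `chunkservers` maps server name -> rack name (illustrative)."""
    servers = random.sample(list(chunkservers), len(chunkservers))  # shuffle candidates
    for combo in itertools.combinations(servers, n):
        racks = {chunkservers[s] for s in combo}
        if len(racks) >= 2:  # replicas must not all share one rack
            return list(combo)
    raise RuntimeError("not enough racks/servers for the requested replication")

if __name__ == "__main__":
    cluster = {"cs1": "rackA", "cs2": "rackA", "cs3": "rackB", "cs4": "rackC"}
    print(place_replicas(cluster))
```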
This paper proposed the Google File System, which is a scalable distributed file system for large distributed data-intensive applications. The motivation for this system is that Google has observed a marked departure from original file system design assumptions. More specifically, Google found that component failures are common, files are huge, and most files are mutated by appending new data. A GFS cluster is designed as follows: it consists of a single master and multiple chunkservers. Files are divided into fixed-size chunks and identified by a globally unique 64-bit chunk handle. Each chunk is stored and replicated multiple times (three by default) on multiple chunkservers. When the client needs to read or write data, it first communicates with the master to get metadata such as the chunk handle and chunk locations. The master is responsible for storing metadata such as the file and chunk namespaces, the mapping from files to chunks, and the locations of each chunk’s replicas. Some of this information is also kept persistent so that the state can be recovered when the master restarts. There’s also a shadow master which can serve read operations when the primary master is down. Besides the high-level architecture, the paper also explains lots of details, for example the consistency model, leases, mutation order, the locking scheme, garbage collection, etc. Together, they form a complete description of the Google file system. The main goal of GFS is to build a file system that meets the needs of Google’s workload. The paper presented measurements from their research & development cluster as well as a production data processing cluster, and it seems like GFS works extremely well on these workloads. I think besides the specific design of GFS, another lesson we should learn from this paper is that commodity hardware is capable of supporting large-scale data processing workloads under the right design decisions and workload assumptions. |
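A toy sketch of the three kinds of master metadata this review lists (namespaces, file-to-chunk mapping, chunk locations). The MasterMetadata class and its field names are my own illustration; the 64-bit chunk handles and the fact that locations are rebuilt from chunkserver reports rather than persisted come from the paper.

```python
import secrets
from dataclasses import dataclass, field

@dataclass
class MasterMetadata:
    """Toy model of the three metadata kinds a GFS master keeps (illustrative names).
    Namespaces and file->chunk mappings are persisted via the operation log;
    chunk locations are rebuilt by asking chunkservers at startup and via heartbeats."""
    namespace: set = field(default_factory=set)          # file and chunk namespaces
    file_chunks: dict = field(default_factory=dict)      # filename -> [chunk handles]
    chunk_locations: dict = field(default_factory=dict)  # chunk handle -> [chunkservers]

    def create_file(self, name: str) -> None:
        self.namespace.add(name)
        self.file_chunks[name] = []

    def add_chunk(self, name: str) -> int:
        handle = secrets.randbits(64)                     # stand-in for a globally unique 64-bit handle
        self.file_chunks[name].append(handle)
        self.chunk_locations[handle] = []                 # filled in by chunkserver reports
        return handle

if __name__ == "__main__":
    md = MasterMetadata()
    md.create_file("/data/web-log")
    print(hex(md.add_chunk("/data/web-log")))
```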
In this paper, Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung discuss the implementation approach and considerations taken when creating the Google File System (GFS). Specifically, this paper discusses the design overview, system interactions, master operations, fault tolerance and diagnosis, and measurements, both on small benchmarks and at industry scale. When developing GFS, the design was driven by Google's workloads and technological environment, both current and anticipated. Thus, some traditional choices were abandoned in order to suit their needs. As a result, they developed a scalable distributed file system that runs on inexpensive commodity hardware and delivers high aggregate performance to a large number of clients. In one instance, their largest cluster involves hundreds of terabytes of data, across thousands of disks, concurrently accessed by hundreds of clients. Scaling out rather than scaling up enables Google to meet their research and development needs at a cheaper price. The decision to make this public knowledge is important, as it allows any smaller company to gain insight and more room for growth. The authors made many core assumptions that allowed them to carefully choose the architecture. These assumptions include using commodity hardware that fails frequently, storage of multi-GB files, workloads with large streaming reads and small random reads, workloads with large sequential writes that append to files, and a prioritization of high sustained bandwidth rather than low latency. The architecture chosen consists of a single master and multiple chunkservers that are accessed by clients. The master mediates interactions with these chunks and oversees metadata storage. The interaction between clients and the master is minimized in order to reduce the master's overhead: the master only responds with the chunk handle and chunk locations. Thus, applications use the appropriate chunkservers to extract their data. In addition, the master creates and manages chunk replicas to balance loads across the chunkservers. This enables fast recovery of data, data integrity, and easy diagnosis of problems (since machines are not to be trusted). Even though there are many great technical contributions, there are drawbacks as well. The first and most obvious drawback is a security flaw in their file system. GFS is located in user space, which, in practice, is not a good thing to do. Using the kernel space is much more appropriate, because the Linux operating system has a better chance of staving off unwanted viruses and protecting against attacks. Furthermore, having a single master might create a bottleneck in the system if there are constant random reads or writes to data. I would have appreciated graphs detailing the impact such constant access might have on performance. |
This paper outlines the implementation of the Google File System. GFS is built with large workloads in mind; it uses several servers, and is optimized for multiple clients and reading and writing from large files. Files are broken up into fixed-size chunks, which are replicated and stored on a large number of chunkservers. Clients contact a single master server, which redirects them to the appropriate chunkserver. As little work as possible is done on the master server, so that it doesn’t get overloaded by client requests. The chunkservers communicate with each other to synchronize changes, although each chunk replica need not be in exactly the same state after a change. Because replicas don’t need to be the same, they must keep track of their data integrity individually. Replicas of chunks are created whenever needed, or whenever the number of replicas falls below the needed threshold. Copies of files can be made from a user perspective with little work; by using copy-on-write, the actual copying can be saved until changes are made to either copy. GFS is optimized for large data loads, and is built with all of its components being replaceable without having a significant negative effect on the system. This allows it to scale up to much larger sizes much more easily than other storage systems. Being built on fault tolerance also helps the system to stay consistent. The general usage is mostly similar to other file systems as well, which should help portability. The focus on exclusively large datasets does limit its usage, however. The bulk of optimization is focused on serial as opposed to random access, which makes the system less useful for applications that don’t read much data serially. |
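To make the copy-on-write point above concrete, here is a hedged sketch (my own simplification; the CowChunks class and its fields are invented) in which a snapshot only bumps reference counts, and a chunk is actually copied the first time someone writes to a shared chunk.

```python
class CowChunks:
    """Toy copy-on-write: a snapshot just bumps reference counts; the actual copy
    happens only when someone later writes to a shared chunk."""
    def __init__(self):
        self.data = {}       # handle -> bytes
        self.refcount = {}   # handle -> how many files reference this chunk
        self.files = {}      # filename -> [handles]
        self._next = 0

    def _new_chunk(self, payload=b""):
        self._next += 1
        self.data[self._next] = payload
        self.refcount[self._next] = 1
        return self._next

    def create(self, name, payload):
        self.files[name] = [self._new_chunk(payload)]

    def snapshot(self, src, dst):
        self.files[dst] = list(self.files[src])           # share chunks, no data copied yet
        for h in self.files[src]:
            self.refcount[h] += 1

    def write(self, name, idx, payload):
        h = self.files[name][idx]
        if self.refcount[h] > 1:                          # shared chunk: copy it before writing
            self.refcount[h] -= 1
            h = self._new_chunk(self.data[self.files[name][idx]])
            self.files[name][idx] = h
        self.data[h] = payload

if __name__ == "__main__":
    fs = CowChunks()
    fs.create("/a", b"v1")
    fs.snapshot("/a", "/a.snap")
    fs.write("/a", 0, b"v2")
    print(fs.data[fs.files["/a.snap"][0]], fs.data[fs.files["/a"][0]])  # b'v1' b'v2'
```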
The Google File System was developed to meet industry demands in scenarios with more machines involved and more clients participating. It is important because it successfully solves the problem by providing fault tolerance on multiple inexpensive machines and optimizing aggregate performance for clients. GFS differs from traditional file systems in its assumptions. It treats fault tolerance and recovery as routine. It emphasizes append operations. The chunk size is reconsidered so as to efficiently manage big files while still supporting small files. The workloads consist of large streaming reads, small random reads, and large sequential writes that append data to files. The system must efficiently support multiple concurrent writes to the same files, and high sustained bandwidth is more important than low latency. The architecture of a GFS cluster consists of one master and multiple chunkservers accessed by multiple clients. Files are divided into chunks that are delivered to and stored on different chunkservers and accessed by a unique chunk handle. For reliability, each chunk is replicated on multiple chunkservers. The master manages metadata, which is mainly information about chunks rather than file data. The master and clients only exchange metadata, and reads and writes are processed by the chunkservers. Clients only need to cache metadata. The single master, the big chunk size, the metadata design, and concurrency control are the crucial characteristics of GFS in my opinion. A single master simplifies operation and global scheduling. A big chunk size decreases the number of chunks, which reduces interaction between master and clients, relieves metadata storage pressure, and reduces the cost of client connections. This lets the master focus on managing chunks instead of the heavy work of reading and writing data. The concurrency control mainly addresses scenarios where multiple clients concurrently write to the same file. GFS implements record append to support this multiple-producer, single-consumer pattern and to save synchronization cost compared with traditional file systems. One more thing to mention is the backup strategy of GFS. Data is stored on chunkservers and metadata is stored on the master, and fault tolerance and recovery must always be considered. Data is replicated on different chunkservers on different racks, and the master's metadata (its operation log and checkpoints) is replicated on other machines. There is no doubt that GFS achieved great success, but maybe there are some drawbacks. GFS assumes more big files than small files and therefore uses a bigger chunk size, so too many small files (if any) would greatly decrease the efficiency of GFS. Another thing to mention is that clients cache metadata; assuming chunkservers crash frequently, the metadata cached by clients will also frequently become invalid. |
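A minimal sketch of record append as used for the multiple-producer, single-consumer pattern noted above (my own simplification; the ChunkPrimary class and method behavior are invented): the primary chooses the offset, pads the current chunk if the record would cross the 64 MB boundary, and appends the record so that it lands at least once at that offset.

```python
CHUNK_SIZE = 64 * 2**20  # 64 MB chunks, as in the paper

class ChunkPrimary:
    """Toy record append at the primary: the primary chooses the offset, so many
    producers can append concurrently without doing their own locking."""
    def __init__(self):
        self.chunks = [bytearray()]

    def record_append(self, record: bytes) -> int:
        chunk = self.chunks[-1]
        if len(chunk) + len(record) > CHUNK_SIZE:
            chunk.extend(b"\0" * (CHUNK_SIZE - len(chunk)))  # pad the rest of the current chunk
            self.chunks.append(bytearray())                   # and retry on a fresh chunk
            chunk = self.chunks[-1]
        offset = (len(self.chunks) - 1) * CHUNK_SIZE + len(chunk)
        chunk.extend(record)        # the same offset would be forwarded to secondary replicas
        return offset               # the record is appended at least once at this offset

if __name__ == "__main__":
    primary = ChunkPrimary()
    print(primary.record_append(b"producer-1 line"))
    print(primary.record_append(b"producer-2 line"))
```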
This paper introduces how the Google File System delivers high aggregate performance to a large number of clients using inexpensive hardware. GFS is essentially a distributed file system that shares goals such as performance, scalability, reliability, and availability with other distributed systems. What makes GFS different is that it is application-specific. The authors provide a design overview before going into technical details, which is very helpful for readers to understand the whole picture of the distributed system. In the overview section, the authors give the assumptions that are specific to the application, a high-level explanation of the interface and architecture, how GFS keeps its metadata, and how it deals with consistency. The following sections go into the details of each aspect mentioned in the overview. Essentially, GFS has a single master that maintains all file system metadata. Files are divided into fixed-size 64 MB chunks, and the chunks have multiple replicas spread across racks for recovery. Clients interact with the master to get the chunk location information and then send requests to the closest replica for chunk data. The master controls all chunk placement and monitors chunkserver status with HeartBeat messages. There is also an operation log that contains a historical record of metadata changes. The paper explains why this structure is adopted for the application, why a certain chunk size is chosen, why replicas are distributed in a certain way, how these key components interact to keep records consistent, what special features GFS has, how to detect stale replicas, and many more aspects. The last section analyzes read, write, and append time, recovery time, and the workloads of real-world clusters. The paper is very fluent in its structure and provides reasoning for almost all design decisions. It also points out potential problems with the system, such as hot spots, and provides possible solutions. One thing that might improve the paper is a summary of Google's unique setting. The paper mentions here and there what types of services clients use the most; a consolidated summary of the specific applications would be very helpful. |
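To illustrate the operation-log recovery mentioned above, here is a hedged sketch (the toy LoggedMaster class and the JSON record format are my own choices): metadata mutations are logged before being applied, and a restarted master rebuilds its state from a checkpoint plus a replay of the log entries after it.

```python
import json

class LoggedMaster:
    """Toy master state rebuilt from a checkpoint plus an operation log,
    so a restart (or a master replica) can recover metadata by replaying the log."""
    def __init__(self):
        self.files = {}   # filename -> [chunk handles]
        self.log = []     # in GFS the log is flushed to disk and to remote replicas

    def apply(self, op):
        if op["type"] == "create":
            self.files[op["name"]] = []
        elif op["type"] == "add_chunk":
            self.files[op["name"]].append(op["handle"])

    def mutate(self, op):
        self.log.append(json.dumps(op))  # log the record *before* applying it
        self.apply(op)

    @classmethod
    def recover(cls, checkpoint: dict, log: list):
        master = cls()
        master.files = dict(checkpoint)   # start from the latest checkpoint
        for entry in log:                 # then replay the operations recorded after it
            master.apply(json.loads(entry))
        master.log = list(log)
        return master

if __name__ == "__main__":
    m = LoggedMaster()
    m.mutate({"type": "create", "name": "/x"})
    m.mutate({"type": "add_chunk", "name": "/x", "handle": 1})
    print(LoggedMaster.recover({}, m.log).files)  # {'/x': [1]}
```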
“The Google File System” by Sanjay Ghemawat et al. describes a new file system approach created at Google that aims to support many clients with large reads and writes on an architecture built from (many) commodity hardware machines. Since there is a high likelihood that some of the (many) commodity machines will fail in a given time range, GFS must utilize a few approaches to ensure data is not lost and that the system does not go offline: data replication across multiple chunkservers, replicated master metadata, checksumming for confirming data integrity, and fast recovery. To support high aggregate throughput, GFS uses a master-chunkserver-client architecture where clients communicate read/write requests (but not data) to the master in order to get routed to an appropriate chunkserver. The master is not a bottleneck, as the master is not performing the read/write operations; it is only telling clients which chunkserver they should read/write data to/from. The authors position the paper as questioning traditional file system standards and considering whether assumptions in prior research apply to their use case at Google: commodity hardware clusters running large-scale data processing workloads. In addition to their novel architecture and boldness in questioning the standard, I really appreciate that the paper outlines the assumptions (of scenarios to support and not support) that they made when designing their system. This makes it clearer for the reader to understand where and how the approach could generalize, and how the work compares to related work. It is clear that the authors were actively aware of the assumptions during their research process, and actively considering what their research contributions were. It is also nice that the authors evaluate their approach on real-world systems at Google, systems that would be expected to handle large and data-intensive workloads. The file system architecture as a result appears realistic, and the approach promising. The tradeoff of making many assumptions is that the approach likely would not work for a wide range of hardware and data workloads, but there is not always a one-size-fits-all solution. As another critique, in the related work I think it could have been helpful to have a table or other easy-to-read summary of the different kinds of filesystem architectures and different kinds of workloads, and which architectures are and are not effective for each kind of workload. |
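A small sketch of the checksumming this review mentions: the paper describes 64 KB blocks, each guarded by a 32-bit checksum that the chunkserver verifies before returning data. CRC32 stands in here for whatever checksum GFS actually uses, and the function names are mine.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # the paper describes 64 KB blocks, each with a 32-bit checksum

def checksum_blocks(chunk: bytes) -> list[int]:
    """Compute one 32-bit checksum per 64 KB block of a chunk."""
    return [zlib.crc32(chunk[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk), BLOCK_SIZE)]

def verify_read(chunk: bytes, checksums: list[int]) -> bool:
    """Before returning data to a reader, the chunkserver re-verifies the covered blocks."""
    return checksum_blocks(chunk) == checksums

if __name__ == "__main__":
    data = bytes(200_000)                  # a few blocks of zeros
    sums = checksum_blocks(data)
    corrupted = data[:100] + b"\x01" + data[101:]
    print(verify_read(data, sums), verify_read(corrupted, sums))  # True False
```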
This paper describes the design and implementation details of the Google File System. It considers several of the same goals as other distributed systems, such as performance, scalability, reliability, and availability, along with key observations of Google's application workloads and technological environment. In Section 2, the paper gives its assumptions for GFS. These assumptions are important because only by considering them can the system design be correct. For example, "the system is built from inexpensive commodity components that often fail"; that is why they need to make several replicas and consider fast recovery. The paper then gives the architecture of the system, telling us why they chose a single-master, multi-chunkserver structure and their reasons for choosing 64 MB as the chunk size, which is very different from the Linux file system. Also, the paper gives a detailed description of how the client, master, and chunkservers interact with each other to perform different data operations while keeping the system consistent and available, all from a systems and engineering perspective. The paper also describes how GFS achieves high availability, fault tolerance, and diagnosis. Finally, the paper shows its experimental results on a test cluster and on real-world clusters. Overall, the paper presents the design of GFS in detail, and the authors fully considered the real-world situations and limitations of building such a large distributed file system. Besides, illustrating the data flow in a flow chart is very instructive. However, after reading the paper, I am still confused by one problem: how will files be stored if I upload lots of files much smaller than the chunk size, for example 1,000 photos of around 1 MB each? If the system just leaves the rest of each chunk empty, it will be a large waste. |
This paper is one of the three most famous papers published by Google; the other two are MapReduce and Bigtable. The idea of GFS is a milestone in the area of distributed storage systems and was a big success. The famous open-source system Hadoop Distributed File System (HDFS) is designed based on many ideas from GFS. It was a great pleasure for me to spend time reading this wonderful paper. With the coming of the Internet era, the volume of data grows at a crazy speed. How to effectively and efficiently manage this data became a question for every internet company, including Google. They needed to build a storage system that provides high reliability, availability, scalability, and high performance for the rapidly growing demands of Google's data processing needs. How to build such a system and apply their business logic to it is a significant problem for every internet company, because for an internet company, data is the most important thing. It needs to keep a high availability rate for its services, guarantee the reliability of storage so that it won't lose users' data, and make sure that it can handle accesses from many users at the same time. Based on these demands, the Google File System was introduced. GFS uses a master-slave pattern consisting of a single master and multiple chunkservers. GFS achieves high performance as well as scalability, reliability, and availability. Next, I will summarize the crux of GFS as I understand it. In GFS, all the servers are commodity machines, and it is very flexible to add or remove chunkservers. Since commodity devices are subject to failure, GFS introduces several mechanisms for reliability, including monitoring, failure detection, fault tolerance, and recovery. GFS supports files of different sizes, and files are divided into fixed-size 64 MB chunks. GFS focuses on workloads of large streaming reads, small random reads, and large sequential writes (appends). For GFS, high bandwidth is required while latency is not a big concern. GFS supports normal file operations including create, delete, open, close, read, and write; in addition, new features like snapshot and record append are also introduced. One of the key ideas for maintaining the reliability of GFS is using chunk replicas on multiple chunkservers. In GFS, the master server coordinates the operation of the system, including metadata management, chunk lease management, garbage collection, chunk migration, etc. The master uses periodic HeartBeat messages to control the chunkservers and collect their state. However, the master is not involved in reads and writes, and the design minimizes its interaction in all operations. I think this is a good design for keeping the master from becoming a bottleneck. Besides, the design decouples the data flow from the control flow, which makes it easier to schedule the expensive data flow efficiently. The whole system design is driven by observations of workloads and the technological environment at Google; this is also something I learned from this paper. When we try to design or create something new, we should start from real-world practice, identify the requirements clearly, and then apply our knowledge to solve the problem. This is a pioneering paper in the area of distributed file systems and it has made a great contribution to their development. The idea of the master-slave mode and the usage of commodity hardware have had a great impact on modern distributed file systems.
In their design, they do not introduce too many complicated mechanisms; they try to make the design as simple as possible, and I think this is something we still need to follow nowadays. Also, some parts of the design are quite innovative, like the HeartBeat protocol, the snapshot utility, chunk management, etc. Although this paper was presented in 2003, it already includes many important ideas for big data. GFS is a successful product that is still in use (perhaps as Colossus) as an important piece of infrastructure for other products at Google. Overall, it's a great paper and I do not find any major drawbacks. Since this paper was written 15 years ago, though, I think that nowadays it would be impossible to use a single master for everything; a single master would definitely become the bottleneck of the system. By the way, GFS is not open source like HDFS; it would be better if Google were willing to open-source it. |
This paper introduces a distributed file system developed and used by Google for “large distributed data-intensive applications.” Its main architecture consists of a master node and many non-master “chunkservers,” which host the “chunks” that contain the stored data. Clients access these chunks by asking the master node for the mapping that tells them which chunkserver to request information from; this can be cached by the client to mitigate bottlenecking at the master node. In order to reduce complexity, a weaker version of consistency (compared to serializability) is guaranteed by GFS, which optimizes append operations over re-write operations. In order to guarantee durability, a log file (called the “operation log”) is maintained for when things go wrong. Several specific feature optimizations are also listed, like lazy garbage collection and stale replica detection, which allow for even better performance. The chief advantages/benefits of GFS are as follows: 1) It runs on a distributed network of cheap commodity servers. This is the biggest strength in my opinion, as it allows for high performance at a cheap cost. 2) It is robust under failures, as its distributed protocol and operation log ensure that the consistency guarantees are always valid. 3) It is optimized for the environment assumed by the paper, namely data-intensive applications that use append operations far more than re-write operations. On the flip side, although GFS is optimized for the environment assumed, the environment IS based on a number of assumptions, which can be treated as weaknesses. For example, there are big assumptions made about the workloads, particularly that they will mostly be either large streaming reads, small random reads, or large appending writes. For workloads that do not conform to this assumption, GFS will not perform as well. Also, although the authors addressed this with a short-term solution, small files that fit in a single chunk can become a hot spot if many clients are trying to access the same chunk at once. Replication was offered as a short-term solution, but the paper did not mention an algorithm to determine what level of replication is necessary for a given chunk at any given time. |
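To make the stale replica detection mentioned above concrete, here is a hedged sketch (the VersionTracker class and its fields are invented) of the version-number idea: the master bumps a chunk's version whenever it grants a new lease, so any replica that missed mutations reports an older version and becomes a candidate for garbage collection.

```python
class VersionTracker:
    """Toy stale-replica detection via chunk version numbers kept by the master."""
    def __init__(self):
        self.master_version = {}    # chunk handle -> latest version
        self.replica_version = {}   # (chunk handle, chunkserver) -> version it reports

    def grant_lease(self, handle):
        self.master_version[handle] = self.master_version.get(handle, 0) + 1
        return self.master_version[handle]  # the new version is pushed to up-to-date replicas

    def report(self, handle, server, version):
        self.replica_version[(handle, server)] = version

    def stale_replicas(self):
        return [(h, s) for (h, s), v in self.replica_version.items()
                if v < self.master_version.get(h, 0)]  # candidates for garbage collection

if __name__ == "__main__":
    vt = VersionTracker()
    v = vt.grant_lease("chunk-9")
    vt.report("chunk-9", "cs1", v)
    vt.report("chunk-9", "cs2", v - 1)   # cs2 was down during the mutation
    print(vt.stale_replicas())           # [('chunk-9', 'cs2')]
```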