This paper introduces Dynamo, Amazon's highly available key-value store. The primary advantage of Dynamo is that it lets users tune the (N,R,W) parameters to match their consistency requirements. The paper first gives background on the system's assumptions, requirements, and design considerations. Understanding Amazon's assumptions is important: for example, because Dynamo is assumed to be used only by Amazon's internal services, the environment is treated as non-hostile, so the system can avoid dealing with authentication and authorization. Then, the paper surveys related systems and explains how Dynamo is similar to or different from them. The aspects discussed include that Dynamo targets applications with an "always writeable" need, that it is built for infrastructure within a single administrative domain, that it doesn't require hierarchical namespace support, and that it is built for latency-sensitive applications. A detailed introduction to the system architecture follows, covering the get/put interface, the partitioning algorithm that keeps load balanced, replication for availability, hinted handoff to ensure reads and writes don't fail due to temporary node or network failures, Merkle trees to detect inconsistencies between replicas, and how storage nodes are added and removed. This is a thorough discussion of the system architecture of Dynamo. The next part is the part I like best in this paper, where the authors give their design lessons. First, the different reconciliation logics are interesting to me since they actually take different use cases into consideration; the discussion on choosing (N,R,W) was also covered in one of our lectures, so reading it again is easier; another part I like is the detailed introduction to the different load balancing strategies, where the paper includes strategy descriptions and figures that illustrate the ideas clearly. One particular thing about this paper is that it introduces related work very early (most other papers we have read put that section at the end). 
I think this paper's approach is good since this section provides a picture of what others are doing and what Dynamo is trying to achieve when reading the remaining contents. |
Amazon serves tens of millions of customers at peak time using tens of thousands of servers around the world. In order to avoid bad consequences caused by outage, reliability is one of the most important requirements on Amazon’s platform. Meanwhile, scalability is also important to support continuously growing data. This paper presents Dynamo, which is a highly available key-value storage system that achieves high availability and scalability. Some of the strengths and contributions of this paper are: 1. Dynamo is highly scalable. The service automatically allocates more storage when you store more data through write APIs. 2. Dynamo is Flexible. It doesn’t rely on fixed schema. Each data items may have different number of attributes. 3. Dynamo has built-in fault tolerance, which automatically and synchronously replicating data across multiple available zones to avoid individual machine failure. Some of the drawbacks of this paper are: 1. Dynamo can only be deployed on AWS platform. 2. Dynamo doesn’t guarantee the ACID property. It provides weaker consistency and no isolation guarantees. 3. Dynamo is unable to achieve complex queries like queries with multiple predicates. 4. Dynamo indexing is limited. For instance, changing or adding key on-the-fly cannot be achieved without creating a new table. Secondary indexing is also not supported in Dynamo. |
Services provided at Amazon have an extreme need for persistent availability, as the slightest outage has significant consequences and impacts customer trust. In other words, failure handling needs to be treated as the normal case, without impacting availability or performance. Therefore, this paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon’s core services use to provide an “always-on” experience. Dynamo is a decentralized system with minimal need for manual administration, in which data is partitioned and replicated using consistent hashing, and consistency is facilitated by object versioning. The consistency among replicas during updates is maintained by a quorum-like technique and a decentralized replica synchronization protocol. The main contributions of this paper are as follows: 1. This paper evaluated how different techniques can be combined to provide a single highly available system. 2. This paper demonstrated that an eventually consistent storage system can be used in production with demanding applications. 3. This paper provided insight into the tuning of these techniques to meet the requirements of production systems with very strict performance demands. The advantages of the proposed system, Dynamo, are as follows: 1. Dynamo provides incremental scalability by using consistent hashing for partitioning. 2. Dynamo decouples version size from update rates by using vector clocks with reconciliation during reads, in order to achieve high availability for writes. 3. Dynamo provides high availability and durability guarantees when some of the replicas are not available, by using a sloppy quorum and hinted handoff. 4. Dynamo synchronizes divergent replicas in the background by anti-entropy using Merkle trees. 5. 
Dynamo preserves symmetry and avoids having a centralized registry for storing membership and node liveness information by gossip-based membership protocol and failure detection. However, a big limitation of DynamoDB is the lack of multiple indices, which is a common problem in non-relational databases. Scans are expensive in DynamoDB, which is great for lookups by key, not so good for queries, and abysmal for queries with multiple predicates. Besides, although DynamoDB supports transactions, it is not in the traditional SQL sense. Each write operation is atomic to an item. A write operation either successfully updates all of the item's attributes or none of its attributes. There are no multi-operation transactions. |
Problem & motivations: Amazon is expanding its business rapidly, and most of its services require high availability. Therefore, Amazon needs a data store (more precisely, a NoSQL database) that scales out automatically, is highly available (in other words, highly fault-tolerant), and has low latency (simply because you do not want the customer to wait for a long time). Main Contribution: Amazon's engineers propose Dynamo. Dynamo does not introduce new techniques or structures; instead, it combines many existing techniques to improve performance and fault tolerance. At a high level, the database consists of several groups of nodes, and each group holds a part of the data. To distribute the data evenly, it adopts a technique called consistent hashing. Basically, each group is responsible for data within a range of hash values. Note that there are two key points: one is that the data is divided by hash value rather than by the original key; the other is that each group is responsible for a contiguous range. This offers an advantage in scaling: if a new server/group is added, it only needs to communicate with the adjacent groups. To decrease latency, it adopts vector clocks and quorums. The idea is that once the coordinator receives a write request, it only needs acknowledgements from a write quorum of W nodes. A read operation needs replies from a read quorum of R nodes. The only constraint is that every read quorum must overlap every write quorum, which holds when R + W > N, where N is the number of replicas. In this way, we can make sure at least one updated node is within the read quorum, and we can then use the vector clocks to determine the latest version. Drawbacks: If other papers are lectures with new content, then this paper is a review slide. In other words, it does not propose a new technique, but rather aggregates existing techniques. |
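To make the consistent hashing point in this review concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Dynamo's implementation: the `HashRing` class, the node names, and the key names are all hypothetical, and each node gets a single position on the ring (no virtual nodes). MD5 is used only to spread names over a 128-bit ring.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a string to a position on a 128-bit ring via MD5."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical ring: each node owns the range ending at its position."""

    def __init__(self, nodes=()):
        self._positions = []  # sorted ring positions
        self._owner = {}      # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        pos = ring_position(node)
        bisect.insort(self._positions, pos)
        self._owner[pos] = node

    def lookup(self, key: str) -> str:
        """Return the first node clockwise from the key's hash."""
        pos = ring_position(key)
        idx = bisect.bisect_right(self._positions, pos) % len(self._positions)
        return self._owner[self._positions[idx]]


keys = [f"key-{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.lookup(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.lookup(k) for k in keys}

# Only keys in the range taken over by the new node change owner.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to node-d")
```

Every key that changes owner moves to the new node, and all other assignments stay put, which is the scaling advantage described above.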
The purpose of this paper was to give an overview of the design and implementation of Amazon’s Dynamo, a highly available and scalable data storage system that helps enable Amazon’s “always-on” experience. Dynamo remains available in the face of failure scenarios like data centers going down, and allows reads and writes even during these failures because data is replicated across multiple data centers. The research contribution of the paper is showing that an eventually consistent system can be used in production under very strict demands, and how the system must be tuned to meet those requirements. Dynamo enables things like Amazon’s e-commerce platform, which is a system of smaller services, some of which are stateless and some of which are stateful, that work together to deliver all of the features present on this platform. Dynamo is built on a few assumptions and requirements. It has a query model where each data item is identified by a key and no operation spans more than one data item. It does not provide full ACID guarantees: it targets applications that can operate with weaker consistency in exchange for higher availability, and it provides no isolation guarantees. Dynamo also permits only single-key updates. The authors also wish to guarantee an efficient system running on commodity hardware, which involves tradeoffs among performance, cost efficiency, availability, and durability. The paper also introduces the idea of Service Level Agreements (SLAs), which are contracts that bound the response latency of a service for a given client request distribution. Next we look into design considerations such as the tradeoff between high availability of data, which usually means having many replicas, and consistency. The paper also considers when to handle update conflicts on this data, whether that be during reads or writes, and whether the data store or the application handles this conflict resolution. 
It also briefly mentions other key principles such as being able to scale in the number of nodes without affecting the overall system and having symmetry of nodes, so as to simplify system maintenance. As far as the system architecture of this production storage system, the paper focuses on a subset of the challenges that were addressed, since there were a lot of details to consider and Amazon does not wish to disclose sensitive information. Namely the paper looks into partitioning, replication, versioning, membership, failure handling, and scaling. The techniques used to address these challenges were as follows. Consistent hashing was used in partitioning so that the system was scalable. Vector clocks are used to determine causal relationships between versions of an object so that the size of a version does not affect update speed. They also used Sloppy Quorum and hinted handoff, which is a method of reading and writing on the first N healthy nodes that ensures that read and write operations are not failed due to a node’s unavailability. Dynamo also uses Merkle trees to make sure replicas are up to date in the background and perform synchronization. Finally the system uses gossip based membership protocol and failure detection to see which nodes have been added or have become unavailable, which preserves symmetry of nodes and eliminates the need for a centralized method of storing this information. I appreciated how much detail this paper went into and found the visuals very helpful along the way to summarize long sections or to simply illustrate concepts. The images did a good job of that in this paper. I would critique how the author seemed to be slow to introduce the paper. The first few sections repeated themselves quite a bit, I felt, although this is a minor critique. |
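The role of vector clocks described in this review can be illustrated with a short Python sketch. It assumes a clock is a plain dict mapping a node name to a counter; the function names and the node names Sx, Sy, Sz are hypothetical and chosen for illustration only.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock `a` has seen every event recorded in clock `b`."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def compare(a: dict, b: dict) -> str:
    """Classify the causal relationship between two versions."""
    a_ge_b = descends(a, b)
    b_ge_a = descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "a supersedes b"
    if b_ge_a:
        return "b supersedes a"
    return "concurrent"  # neither descends from the other: needs reconciliation


d1 = {"Sx": 1}
d2 = {"Sx": 2}            # Sx handled a second write
d3 = {"Sx": 2, "Sy": 1}   # Sy updated d2
d4 = {"Sx": 2, "Sz": 1}   # Sz updated d2 independently
d5 = {"Sx": 3, "Sy": 1, "Sz": 1}  # reconciled version written through Sx

print(compare(d2, d1))  # a supersedes b
print(compare(d3, d4))  # concurrent
print(compare(d5, d3))  # a supersedes b
```

Only the "concurrent" case needs reconciliation by the application; when one version supersedes another, the older one can simply be dropped.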
DynamoDB This paper proposes DynamoDB, a highly available key-value storage system used at Amazon. It is designed for large multi-server systems like Amazon's, in which failures occur frequently. In this design, DynamoDB sacrifices some consistency under failure scenarios, so that individual failures do not affect overall performance and availability. DynamoDB uses several well-known techniques to achieve scalability and availability: it provides consistency through object versioning; the consistency among replicas during updates is maintained by a quorum-like technique and a decentralized replica synchronization protocol; and it is a completely decentralized system with minimal need for manual administration. There are some assumptions behind the design of DynamoDB. 1: the query model is a simple key-value model, which is sufficient because most of the services at Amazon only need this kind of access. 2: it relaxes ACID, providing weaker consistency in exchange for higher availability. 3: since the system runs on commodity infrastructure and the platform has strict latency requirements, DynamoDB needs to trade off latency against throughput. 4: since it is used internally, all operations are assumed to be non-hostile. DynamoDB adopts optimistic data replication, which means that when one of the replicas receives a request, it first executes that request and replies to the client, and then lets the update propagate to the other replicas in the background. The difficulty with this approach is that it can produce conflicting updates. DynamoDB is designed as an eventually consistent store, in which every update eventually reaches all replicas. Compared with other decentralized storage systems, DynamoDB has different requirements: 1: DynamoDB mainly targets applications that need an "always writeable" data store, which is a key point for Amazon's applications. 2: all the replicas are assumed to be trusted, without any Byzantine behavior. 
3: the applications that use DynamoDB do not require support for hierarchical namespaces or complex relational schemas. 4: there is a latency requirement expressed as a percentile: 99.9% of operations should finish within a fixed amount of time (the required latency). The architecture of DynamoDB is complicated: the system needs to handle many concerns such as load balancing, failure detection and recovery, synchronization, overload handling, state transfer, etc. DynamoDB puts a lot of effort into handling failures by using hinted handoff, ring membership, etc. The main contributions of this paper: 1: a storage system that successfully combines techniques from recent years. 2: it shares solutions and experience from running an advanced storage system. The advantages of DynamoDB are 1: a simple design that leverages mature techniques from P2P systems. 2: separation of the distributed logic from the single-node storage engine. 3: it successfully supports Amazon's services in production. The disadvantages of DynamoDB are 1: the complexity of extending the system is relatively high. 2: concurrent writes can leave replicas disagreeing on the order of updates. 3: the data is not stored in key order, so it cannot be used in MapReduce. |
In this paper, the authors introduce Dynamo, a key-value store. The paper describes a solution for building a highly available key-value store whose main goal is to provide high availability for write requests. The size of the stored objects is generally less than 1MB. The implementation is similar to Chord + MVRs (multi-valued registers), but with quite a few performance optimizations. Finally, Dynamo provides observable causal consistency, which is the highest among its kind. The main requirements of Dynamo are as follows: an "always writable" data store where no updates are rejected, all nodes are assumed to be trusted, and applications do not require support for hierarchical namespaces. In order to satisfy the high availability of write requests, Dynamo does not resolve conflicts when writing. Instead, it keeps multiple versions at the same time, tracked through vector versions. If the vector versions show an ordering between the versions at read time, Dynamo reconciles them itself and writes the result back to the replicas. Otherwise, the conflict can only be resolved by a simple LWW (last write wins) policy or by the client. Unlike a normal key-value store, Dynamo's get/put requests carry an additional context, which is opaque to the user. From the paper, this context includes at least two things: the vector version and the response time of the node in the request. Dynamo's replication strategy mixes proactive and passive strategies. Its proactive strategy is a "sloppy quorum" that requires pre-configured N/R/W values satisfying R + W > N. First, the position of the key is located using Chord's method, and then N healthy physical nodes are counted from this position. For a get or put, the coordinator sends the request to all N nodes, but only needs R or W of them to respond successfully before returning a result to the client. The passive strategy is that each node builds a Merkle tree over each interval it is responsible for and propagates it through the gossip protocol. 
According to the proactive replication strategy, there are N nodes responsible for the same interval. These nodes can quickly compare and synchronize their differences through the Merkle trees. When a node A fails, a value destined for A is written to another node B according to the proactive replication policy above. B stores this value persistently in a special list, along with a hinted handoff tag recording that the value should be transferred to node A. Once B's background task detects that A has recovered, it passes the value to A. The main contribution of this work is the design of a highly available key-value store with tunable consistency. It also provides insight on how to meet the high performance requirements of applications. What's more, though most of the techniques are not novel, Dynamo shows a way to combine existing techniques to make a difference. One limitation that I noticed is that the gossip-based protocol used in Dynamo may limit the performance and scalability of the system, since gossip is a node-to-node operation and the nodes need to convey a lot of information. |
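The Merkle tree comparison mentioned in this review can be sketched in Python as follows. This is a simplified illustration, not Dynamo's code: it assumes both replicas hold the same non-empty, sorted set of keys for the interval (so the two trees have the same shape), and the helper names and the use of SHA-256 are choices made for the example.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(items):
    """Return the tree as a list of levels, leaves first and root last.

    `items` is a non-empty list of (key, value) pairs sorted by key.
    """
    level = [h(f"{k}={v}".encode("utf-8")) for k, v in items]
    levels = [level]
    while len(level) > 1:
        # Hash children pairwise; an unpaired last node is hashed on its own.
        level = [h(b"".join(level[i:i + 2])) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def diff_leaves(a, b):
    """Return indices of differing leaves, descending only into subtrees
    whose hashes disagree. Assumes both trees have the same shape."""
    top = len(a) - 1
    suspects = [0] if a[top][0] != b[top][0] else []
    for depth in range(top - 1, -1, -1):
        next_suspects = []
        for parent in suspects:
            for child in (2 * parent, 2 * parent + 1):
                if child < len(a[depth]) and a[depth][child] != b[depth][child]:
                    next_suspects.append(child)
        suspects = next_suspects
    return suspects


replica_a = [(f"k{i}", f"v{i}") for i in range(8)]
replica_b = list(replica_a)
replica_b[5] = ("k5", "stale")  # one divergent value

tree_a, tree_b = build_tree(replica_a), build_tree(replica_b)
print(diff_leaves(tree_a, tree_b))  # [5]
```

If the roots match, the replicas are in sync and nothing else is exchanged; otherwise only the mismatching branches are followed down to the keys that need to be transferred.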
DynamoDB is a key-value store created by Amazon, which uses it for production workloads. The basic idea behind a key-value store is that it is used for simple workloads in which only primary key access to single objects is needed; for those workloads, the complexity and overhead of a relational database is unnecessary. These workloads differ from those supported by BigTable because they require only point lookups. Dynamo is distributed, and uses consistent hashing to partition and replicate data. In order to increase availability, Dynamo is an eventually consistent system, meaning that updates should eventually reach all replicas, but without a specific guarantee of when. Dynamo supports multiple versions of objects, which could split over time and need to be reconciled due to eventual consistency. Users specify a certain version of an object to update. A quorum-like approach is used to maintain consistency among replicas. All-in-all, this system is good for certain applications that can handle the trade-offs that need to be made when strong consistency is not available. While I would hesitate to call it a strength or a weakness, I thought that the decision to focus on P99.9 latency and various SLAs was an interesting departure from what we have read so far this semester. It was more similar to the concerns that I have heard about in industrial settings, which makes sense as this paper is coming from Amazon, not an academic research lab. I am curious if decisions like this in industry papers have changed the way that academic papers have presented results in the past few years to better match what users in industry care about. Although the authors considered it a strength, I wondered if all of the tuning parameters (N, R and W), which control the balance of durability, consistency, and availability are a good idea. We have constantly talked about how tuning parameters are not ideal. 
I think that if Dynamo proposed certain sets of parameters for certain requirements, this could work, but a free-for-all is not great. While it isn’t mentioned as a tuning parameter, I noted that many storage engines are also supported, leading to another decision for the user to make. Additionally, I have some concern about the theoretical idea of a key-value store. The way that people want to access their data often changes over time. I would imagine that with such a limited API, it is likely that engineers will want some other way of accessing the data in the future. Perhaps there are other ways to access the data in offline jobs, etc., I just wonder if this really works well for many workloads in the long run. |
This paper introduces Dynamo, a highly available key-value storage system from Amazon. The design of Dynamo is based on the following observations of Amazon’s workload. First, many services on Amazon’s platform only need primary-key access to a data store. The complex querying and management functionality offered by a traditional RDBMS is not required. Second, since Dynamo will be used by many different services, it should be configurable so that it consistently achieve a service’s latency and throughput requirement. Third, applications using Dynamo need an “always writeable” data store, which means write operation will not be rejected even when there’s system failure or other concurrent writes. To meet these goals, Dynamo combined lots of different techniques and make tradeoffs between consistency and availability. To ensure high availability, Dynamo gives up strict consistency and turns to eventual consistency. However, now it needs to do conflict resolution in case of inconsistent versions of data. Dynamo makes two design choices to solve this problem. First, the complex conflict resolution logic is pushed to read operation so that “always writeable” can be achieved. Second, the resolution logic is provided by the application using Dynamo since it’s aware of the data schema and can provide a better end-user experience. The whole system is structured in a symmetric fashion, where each node in the system have the same set of responsibilities as its peers and there’s no centralized control (contrast to the master-worker structure). The way how Dynamo partitions data is also interesting. It uses a modified version of consistent hashing (introducing virtual node) to make sure adding or removing a node only affect a small number of nodes and workload is balanced even when there’s heterogeneity in the cluster. Availability and durability are achieved by replicating data on N-1 successor nodes defined by the consistent-hashing algorithm. 
To ensure the system can tolerate inconsistencies, the value corresponding to a key can have multiple versions. Each version has a context, which is a vector of (machine ID, logical time) pairs. Since this vector clock only defines a partial order and the versions can branch into a tree-like structure, applications need to provide resolution logic for when multiple versions of a value are returned from a read request. Dynamo also uses other techniques such as hinted handoff and replica synchronization to improve performance when handling node failures. In my opinion, though the interface of Dynamo (get() and put()) is quite simple, the requirement that the application provide conflict resolution logic prevents the system from being applied in a more general setting. |
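The virtual-node partitioning and N-replica placement described in this review can be sketched in Python. This is a hedged illustration: `build_ring`, `preference_list`, the token counts, the `node#i` labelling of virtual nodes, and the example key are all assumptions made for the example, not details of Dynamo itself.

```python
import bisect
import hashlib


def ring_position(label: str) -> int:
    """Map a label to a position on a 128-bit ring via MD5."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


def build_ring(tokens_per_node: dict) -> list:
    """Return sorted (position, physical_node) pairs, one per virtual node.

    A more capable machine can be given more virtual nodes (tokens).
    """
    ring = []
    for node, count in tokens_per_node.items():
        for i in range(count):
            ring.append((ring_position(f"{node}#{i}"), node))
    ring.sort()
    return ring


def preference_list(ring: list, key: str, n: int) -> list:
    """Walk clockwise from the key's position and collect the first n
    distinct physical nodes, skipping virtual nodes of machines already chosen."""
    start = bisect.bisect_right(ring, (ring_position(key), ""))
    chosen = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in chosen:
            chosen.append(node)
            if len(chosen) == n:
                break
    return chosen


# Hypothetical cluster: node-d has twice the capacity of the others.
ring = build_ring({"node-a": 8, "node-b": 8, "node-c": 8, "node-d": 16})
prefs = preference_list(ring, "cart:42", n=3)
print(prefs)  # coordinator first, then the next two distinct machines
```

Skipping repeated physical nodes matters because, with virtual nodes, the next few positions on the ring may belong to the same machine, which would otherwise defeat the purpose of replication.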
In the paper "Dynamo: Amazon’s Highly Available Key-value Store", Giuseppe DeCandia and co. discuss Dynamo, a highly available key-storage system that Amazon uses to provide an "always-on" experience. "Always-on" alludes to reliability at a massive scale - 99.99% availability. Amazon, a company that places customer trust above everything else, values the availability of their services at the cost of consistency. Even if one service has the slightest outage, the financial consequences are enormous. However, Dynamo always assumes that there is a small, but significant number of servers failing at any given time. To prevent this, much of the data should be present across multiple disks. Since most of Amazon's services only need primary-key storage, using a standard relational table would not work well. Thus, Dynamo provides a simple primary key-only interface and uses object versioning and application-assisted conflict resolution to create a novel interface for developers. Dynamo serves as proof that an eventually consistent storage system can be used in industry with demanding applications. Since Dynamo is centered around Amazon's requirements, there are some assumptions and requirements that are noted: 1) Query Model: Simple read/writes to data items identified by a key. No operations span multiple data items. Store objects that are relatively small. 2) ACID Properties: Has weaker consistency to increase availability. No isolation guarantees and single key updates. 3) Efficiency: Has stringent latency requirements. 4) Other Assumptions: Security is not a current requirement. Should scale past hundreds of hosts. Furthermore, there are some design considerations that Amazon sees: 1) Data Replication: Current techniques make data unavailable until they know it is correct. Allow changes to propagate in the background. Eventually all storage will be consistent. Data is replicated at N hosts. 2) Dealing With Conflicts / Who deals with it?: Never reject writes. 
The data store or application can decide if they want to deal with the conflict. 3) Symmetry: Each node should have the same job. No master nodes. 4) Heterogeneity: Should be able to distribute work evenly based on node hardware capabilities. When describing the highlights of Dynamo, it is split into several parts: 1) Partitioning Algorithm: Uses consistent hashing to distribute workload among nodes. Consistent hashing has an output that can be visualized as a ring. Each node is responsible for a certain space within the ring. The advantage of consistent hashing is that departure or arrival of a node only affects its immediate neighbors; other nodes remain unaffected. Virtual nodes are used to address heterogeneity. 2) High availability for writes: One important thing to consider is that all operations applied to current or past versions should be preserved. We merge all these versions using a collapse operation. Thus, different locations may have different versions of data. Vector clocks are used to handle version control. 3) Handling Temporary failures: Uses "sloppy quorum"; all reads and writes are performed on first N healthy nodes. Storage nodes are spread across many data centers. 4) Failure recovery: Employs anti-entropy using Merkle trees. This synchronizes divergent replicas in the background. 5) Membership/failure detection: Can consider failure if nodes cease communication. When a particular node discovers that its neighbor is not functioning, it attempts to take over its work and periodically checks if it comes back alive. Even with all the benefits that Dynamo has given Amazon, there are still many drawbacks. The first one that struck me was the lack of an experimental section that tested Amazon's approach against other existing methods (such as RDBMS). Since Amazon is targeting the availability of their services, this would be a good metric to test against other existing methods. 
Another drawback that I noticed is when using DynamoDB, one is unable to do joins or complex queries. Unless the user has a moderate amount of control over the table, it probably isn't a good idea to use Dynamo. Lastly, as a NoSQL solution, it seems that Dynamo puts both size and index limits for each table. As a result, one needs to consider these limitations and come up with a relevant model if they want to use Dynamo. |
This paper describes Dynamo, Amazon’s distributed key-value store. The paper mostly deals with how Dynamo handles the issues of being highly distributed and serving large numbers of client requests. Dynamo needs to scale well and to handle requests with low latency. Clients and services agree on a Service Level Agreement, which determines how fast response times have to be. A typical SLA for services using Dynamo requires that 99.9% of requests are served within 300 milliseconds. Since clients are often performing important writes, Dynamo is designed to never reject writes, which can make reads more complicated. Dynamo is a simple key-value store, so it only needs to expose put() and get() operations. Since so many servers are active, data replicas are needed in order to keep data available. Consistent hashing is used to place data replicas so that load is distributed fairly across servers. In order to improve performance, replicas are not always kept immediately consistent. Different requests can make different changes to replicas of the same data on different servers. As such, every data object carries a vector clock, which describes its version number on every server that has modified it. When a request looks at data from several servers, it can use the vector clocks to determine if there are any conflicts between replicas, and resolve those conflicts on the client side, while sending the resolved data back to the servers. Vector clocks can grow as more servers modify the same data, so Dynamo removes the oldest entries once a clock grows past a threshold. In order to keep some level of consistency, Dynamo has a variant of quorum-based replication. Read and write quorums aren’t based on the total number of servers, however, since servers often fail. Instead, they’re based on the first N healthy nodes encountered for a key. Often, N is only 3, regardless of the total number of servers. 
If a server fails when a client wants to update that server’s replica, then the replica is written to another server as a hinted replica. This replica has a “hint” that points to the original server, so that it can be copied over whenever the original server recovers. The major contribution of this paper is the use of eventual consistency to improve performance. Since a client will always resolve inconsistencies, this doesn’t hurt the client experience very much, and it can drastically speed up reads and writes, since very few servers have to be accessed. For an online customer system like Amazon, response time is one of the most important qualities. In addition, the system is very accommodating of server failures, and can balance server loads effectively. On the downside, losing guaranteed consistency, while useful for this application, may not be acceptable for other applications. Inconsistent updates can also make larger-scale recovery much more difficult, since recovery has to take inconsistent versions into account. |
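The hinted replica mechanism described in this review can be sketched in Python. This is a toy model under stated assumptions: the `Node` class, `sloppy_write`, and `deliver_hints` are hypothetical names, nodes are in-memory objects with an `up` flag, and the node list passed in is already ordered by preference for the key.

```python
class Node:
    def __init__(self, name: str):
        self.name = name
        self.up = True
        self.store = {}   # key -> value for replicas this node owns
        self.hinted = []  # (intended_owner, key, value) held for other nodes


def sloppy_write(preference_order, key, value, n):
    """Write to the first n nodes; for each one that is down, store a
    hinted replica on the next healthy node further along the list."""
    intended = preference_order[:n]
    down = [node for node in intended if not node.up]
    standins = [node for node in preference_order[n:] if node.up]
    written = 0
    for node in intended:
        if node.up:
            node.store[key] = value
            written += 1
    for owner, standin in zip(down, standins):
        standin.hinted.append((owner, key, value))  # the hint names the owner
        written += 1
    return written


def deliver_hints(node):
    """Background task: hand hinted replicas back once the owner is reachable."""
    remaining = []
    for owner, key, value in node.hinted:
        if owner.up:
            owner.store[key] = value
        else:
            remaining.append((owner, key, value))
    node.hinted = remaining


a, b, c, d = (Node(name) for name in "ABCD")
a.up = False                                    # A is temporarily unavailable
acks = sloppy_write([a, b, c, d], "cart:42", "book", n=3)
print(acks, [owner.name for owner, _, _ in d.hinted])  # 3 ['A']

a.up = True                                     # A recovers
deliver_hints(d)
print(a.store, d.hinted)  # {'cart:42': 'book'} []
```

The write still reaches three nodes while A is down, so it is not rejected, and A catches up once it comes back.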
Amazon built Dynamo to fill their need for a system with the following characteristics: partitioning, high availability for writes, temporary failure handling, permanent failure recovery, membership management, and failure detection. The system uses consistent hashing among virtual nodes (to support uniform load distribution) to partition the key space. The data is replicated among the N nodes in a preference list. Objects are versioned and conflicts are resolved by the user. Causality between different versions is captured using vector clocks. A sloppy quorum is used to write to/read from the first W/R healthy nodes, where W+R > N. Hinted handoff is used to preserve availability and durability in the presence of transient failures. Permanent failures are handled by anti-entropy using Merkle trees. A gossip-based scheme is used to propagate the addition/removal of nodes in the system. An in-memory buffer is added to trade durability for performance. Three strategies are discussed to ensure uniform load distribution in the presence of a few popular keys. Contributions: (1) Integration of a slew of techniques such as consistent hashing, replication, Merkle trees, anti-entropy algorithms, sloppy quorum, and object versioning in a production environment (2) A partition-aware client library to route requests to the coordinator directly (3) Has shown techniques that can scale to Amazon’s environment Weak points: (1) Complex distributed coordination/control. First they have a hash function to assign load to all the nodes; then they come up with the concept of virtual nodes to balance the load distribution; then they worry about nodes joining and leaving, accidentally or not; and then they realize there are still different assignment strategies which result in different balancing performance. And all of this is just for balancing. |
This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon’s core services use. To achieve an always-on experience, Dynamo sacrifices consistency under certain failure scenarios. It also makes use of object versioning and application-assisted conflict resolution to provide a novel interface for developers. Since Amazon runs an e-commerce platform that serves millions of customers using tens of thousands of servers located in datacenters around the world, there are strict operational requirements on Dynamo in terms of performance, reliability and efficiency. To meet its reliability and scalability needs, Amazon has developed several storage technologies. Dynamo is a synthesis of known techniques: data is partitioned and replicated using consistent hashing, and consistency is facilitated by object versioning. The consistency among replicas during updates is maintained by a quorum-like technique and a decentralized replica synchronization protocol. To provide a good experience for all customers, SLAs are expressed and measured at the 99.9th percentile of the distribution. The storage system plays an important role in establishing a service’s SLA. Dynamo increases availability by using optimistic replication techniques, where changes are allowed to propagate to replicas in the background. Many traditional data stores execute conflict resolution during writes and keep the read complexity simple. However, Amazon provides a service that does not reject customers’ write requests. Therefore, Dynamo pushes the complexity of conflict resolution to the reads in order to ensure that writes are never rejected. As for who performs conflict resolution, developers can either handle it in the application or push it down to the data store, which can only apply a simple policy such as last write wins. 
To sum up, Dynamo targets applications that require only key/value access, with a primary focus on high availability, where updates are not rejected even in the wake of network partitions or server failures. The paper then explains the system architecture of Dynamo. Dynamo uses consistent hashing for partitioning to achieve incremental scalability, vector clocks with reconciliation during reads to realize high availability for writes, sloppy quorum and hinted handoff for handling temporary failures, anti-entropy using Merkle trees for recovering from permanent failures, and a gossip-based protocol for membership and failure detection. For partitioning, if a node becomes unavailable, the load handled by this node is evenly dispersed across the remaining available nodes. When a node becomes available again, or a new node is added to the system, the newly available node accepts a roughly equivalent amount of load from each of the other available nodes. The number of virtual nodes that a node is responsible for can be decided based on its capacity, accounting for heterogeneity in the physical infrastructure. To achieve high availability and durability, Dynamo replicates its data on multiple hosts. Each data item is replicated at N hosts: in addition to locally storing each key within its range, the coordinator replicates these keys at the N-1 clockwise successor nodes in the ring. Furthermore, Dynamo provides eventual consistency, which allows updates to be propagated to all replicas asynchronously. Dynamo uses vector clocks to capture the differences between versions and reconcile them. To maintain consistency among its replicas, Dynamo uses a consistency protocol similar to those used in quorum systems. This protocol has two parameters, R and W. R is the minimum number of nodes that must participate in a successful read operation. W is the minimum number of nodes that must participate in a successful write operation. Setting R and W such that R + W > N yields a quorum-like system. 
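The reason R + W > N matters can be checked by brute force: with a strict quorum over a fixed set of N replicas, every possible read set then shares at least one node with every possible write set, so a read always reaches some replica that saw the latest acknowledged write. The sketch below is only a sanity check of that counting argument; the function name `quorums_always_overlap` is hypothetical, and the guarantee it checks does not model the sloppy quorum case, where the participating nodes can change.

```python
from itertools import combinations


def quorums_always_overlap(n, r, w):
    """True if every r-node read set intersects every w-node write set among n replicas."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )
```

For example, with N = 3 the setting R = 2, W = 2 passes the check, while R = 1, W = 2 does not, because a read can land on the one replica the write skipped.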
The paper then introduces how Dynamo handles failures and gives implementation details. I think this is a good paper because it lays out the design principles of Dynamo first, which explains why Dynamo adopts the features described in the paper. Also, the background section is very thorough, which helps in understanding the paper. |
“Dynamo: Amazon’s Highly Available Key-value Store” by DeCandia et al. describes Dynamo, an eventually consistent distributed data store built at Amazon and used for many of its internal services. Dynamo provides high availability and low latency, which of course comes at the cost of data consistency (i.e., eventual consistency rather than strong consistency). The reason for this is that Amazon wanted an “always writeable” data store, so that client writes are never rejected. Amazon found that many of its internal services mostly query for data just by primary key; as a result, it developed Dynamo as a key-value store, to avoid the overhead and complexities of a relational DBMS. To partition its data, Dynamo uses consistent hashing, where data is essentially hashed around a ring, and a node is responsible for hash values between it and its predecessor node; this enables incremental scalability because adding or removing a single node only affects its immediate neighbors. Data is replicated across a number N of nodes in the system (often the following N nodes clockwise, or other nodes on the preference list if any of the first N have failed). To support high availability for data writes, Dynamo requires that only W of N (rather than all N) nodes acknowledge a write operation for it to be considered committed. Dynamo uses vector clocks to keep track of data object updates and reconcile differences during read operations. Dynamo also leverages sloppy quorum and hinted handoff to ensure that reads and writes can proceed (on nodes further down the preference list) even when not all of the first N nodes on a preference list are healthy. The paper also discusses how Dynamo recovers from permanent failures using anti-entropy and Merkle trees. The final technique discussed is the gossip-based membership protocol and failure detection. 
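The ring partitioning and the preference list described in this review can be illustrated with a small sketch. It is a toy model under stated assumptions, not Dynamo's code: the `Ring` class, the `ring_position` helper, the `"node#i"` labels for virtual nodes, and the default of 8 virtual nodes per physical node are all hypothetical choices made for the example.

```python
import bisect
import hashlib


def ring_position(label):
    """Map a string to a position on the ring using MD5 (a 128-bit integer)."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hashing ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=8):
        # Each physical node is placed on the ring at several positions.
        self.tokens = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.positions = [pos for pos, _ in self.tokens]

    def preference_list(self, key, n):
        """Walk clockwise from the key and collect the first n distinct physical nodes."""
        start = bisect.bisect_right(self.positions, ring_position(key))
        chosen = []
        for step in range(len(self.tokens)):
            node = self.tokens[(start + step) % len(self.tokens)][1]
            if node not in chosen:
                chosen.append(node)
            if len(chosen) == n:
                break
        return chosen
```

Removing a node from this ring only changes the coordinator of keys that the removed node used to own; every other key still maps to the same first node, which is the incremental scalability property the review points to.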
The paper presents results of how Dynamo fares in real-world workloads at Amazon, in particular with the following reconciliation and quorum characteristics: business logic specific reconciliation, timestamp based reconciliation, and high performance read engine. The authors also discuss 3 different key partitioning and placement strategies they have explored. It seems very promising to see that Dynamo was tested in real-world workloads at Amazon and that the authors considered multiple different reconciliation, quorum, and partitioning strategies. I found the paper to be long and somewhat dense (need to read closely to understand the mechanics of how Dynamo works). Although all the details are helpful for researchers or practitioners trying to replicate the results or build on them, I think the paper could have been more concise. |
This paper presents Dynamo, a highly available key-value database behind one of the world's largest e-commerce operations. The paper gives the design and implementation details of the Dynamo system. It first gives the system assumptions and requirements. The authors design the system as a key-value store instead of a traditional relational database system, and they give their reason for using a weaker consistency property: to improve availability, which is usually low when ACID transaction properties must be met. Then the paper presents its partitioning algorithm, which is the key design that lets Dynamo scale incrementally. Also, like other large-scale database systems, Dynamo adopts replication to improve the availability and durability of the system. To provide eventual consistency, Dynamo uses data versioning, treating the result of each modification as a new and immutable version of the data. Finally, the paper gives the implementation details and uses experiments to show its performance. There are several strong points of the paper. First, the paper is well written. Many concepts are explained in detail and are easy for readers to follow. The figures in the paper helped me understand the system, the algorithms and the other techniques introduced. Second, there is the idea of sacrificing consistency in exchange for high availability. Although sacrificing one property for another is a traditional idea, I still think it is a strong point to implement such a system in industry and at such a large scale while still providing eventual consistency. The weak point of the system is also clear from my point of view. I didn't see much novelty in this system. Key-value storage is not novel, and neither are replication and data versioning. Although I still appreciate Amazon's effort to implement such a large-scale system, the paper does not seem very novel to me. |
In this paper, the authors from Amazon propose a highly available key-value store called Dynamo. For one of the world’s largest online shopping platforms, being available 100% of the time is definitely the most important thing for the service. The problem is to build a system that supports a large number of scalable services and stays highly available in the presence of component failures. This problem is significant because a DBMS at scale is always subject to failures, so it is very important to develop new systems that achieve high reliability and availability, especially for online transaction workloads. This may require some sacrifice in consistency, but it is also very interesting to explore this field and learn how to make the trade-off between the two. In fact, Dynamo sacrifices some consistency in exchange for higher availability. The idea of Dynamo is really cool: it uses an eventually consistent model with high performance. It is able to make sure 99.9% of requests can be served within 300ms. Next, I will summarize the crux of Dynamo from my point of view. Dynamo is implemented as a distributed storage system that uses consistent hashing to partition key-value pairs, and every record is replicated to the N-1 successor nodes on the hash ring. In Dynamo, no operation spans multiple data items and there is no need for a relational schema. Since Dynamo mainly focuses on high availability, there is no need to talk about ACID properties for this particular database. Also, as the authors say in the paper, authentication is unnecessary, so security issues can be ignored. These assumptions make the design of Dynamo much easier. No single consistent primary version of the data is guaranteed; Dynamo only aims for the data to be eventually consistent. Eventual consistency is a weak level of consistency that does not provide any safety guarantee on the data record. 
The principle is to allow different versions to exist simultaneously in different partitions: a write operation writes to W virtual nodes and a read operation reads from R virtual nodes. Since R+W is larger than N, this forms a quorum-like condition, and vector clocks are used to determine the final version by causality. Generally speaking, the crux of Dynamo includes a partitioning algorithm, a replication mechanism, data versioning, hinted handoff, etc. The experiments show good response times, few divergent versions, high availability and low latency, so Dynamo is definitely a practical system. The main technical contribution of this paper is the introduction of Dynamo, which achieves great availability by trading off other factors; from the experiments we find that Dynamo performs promisingly. There are several advantages of Dynamo. First of all, its architecture is highly decentralized and user-friendly, and it provides automatic partitioning and redistribution without supervision. Second, the design is optimized for Amazon’s demands and motivated by its real applications, which makes it a good example of research in industry. Third, unlike a traditional DBMS, Dynamo resolves conflicts during reads instead of writes, which greatly improves the user experience for Amazon’s products; this idea is very common for online transaction platforms and it does work well. Besides, the design of Dynamo is extensible, flexible and configurable, which makes it easy to deploy in different scenarios from an engineering perspective. Generally speaking, this is a nice paper with great insight. I find some minor drawbacks in this paper. First, for deciding between causally unrelated versions, people always need to specify the reconciliation method that keeps the multiple versions consistent, so extra human work is required. It would be preferable if this job could be done automatically. 
Second, as the authors mention in the paper, Dynamo can only access one record at a time, and no range queries are supported. Dynamo uses an optimistic replication approach, so extra work needs to be done to resolve potential conflicts. Last, Dynamo is not suitable for latency-sensitive applications that also require strong consistency; important operations like making payments should be carefully considered. |
This paper introduces Dynamo, which is a distributed key-value store developed by Amazon. Dynamo is motivated by two main observations: 1) Many of Amazon’s services are simply primary-key-only services. Typical relational DBMS’s are overkill for these, as they offer many features that are unnecessary for these services and are additionally much more difficult to implement as distributed systems where component failure is extremely likely (which is the case in Amazon’s environment). 2) Amazon requires a much higher focus on availability than traditional relational DBMS’s, which tend to put a higher premium on consistency instead. Low latency for at least 99.9% of reads/writes is desired. Dynamo meets Amazon’s goals by using an entirely decentralized architecture where each node knows enough information that no multiple-hop routing is required. It partitions data by using consistent hashing. Eventual consistency is the model used in order to maximize availability, and multi-versioning (using vector clocks) is used to help deal with the inconsistencies that may arise in the short term because of this. Failure handling is more complicated than normal and uses what’s called a “sloppy quorum” instead of regular quorums. What this means is that instead of strictly following the consistent hashing ring to get N nodes, we use the preference list to get N healthy nodes for a quorum. By doing this, Dynamo prevents network partitions and similar failures from impacting availability in a major way. Dynamo’s main strength is similar to GFS’s main strength in that it is a system designed to perform well under a specific environment with specific performance metrics (in Dynamo’s case, high availability and low latency, with writes prioritized, in a decentralized environment with many nodes where failures are very common), and it meets these design requirements. The main disadvantages of Dynamo are the environment/workload assumptions made. 
For example, the fact that conflict resolution is pushed to reads rather than writes means that read-heavy workloads are less suited to Dynamo. Also, Dynamo is a key-value store that rejects many non-primary-key-only features that typical RDBMS’s provide, and the eventual consistency model may not fit the requirements of some applications. Finally, Dynamo is built assuming a secure environment and thus is not ideal for applications that need to be robust to Byzantine failures. |