Review for Paper: 38-Relational Cloud: a Database Service for the cloud

Review 1

This paper introduces a new transactional database-as-a-service system called Relational Cloud. It describes the challenges and requirements of a large-scale, multi-node DBaaS, and presents the design and implementation of Relational Cloud.

The challenges for the system include:
(1) efficient multi-tenancy to minimize hardware footprint for given workload
(2) elastic scale-out to handle growing workloads
(3) database privacy

The technical features of the system include:
(1) workload-aware approach to multi-tenancy that identifies workloads to be co-located on a database server
(2) use of graph-based data partitioning algorithm to achieve near-linear scale-out
(3) adjustable security scheme that enables SQL queries on encrypted data

The technique that's most interesting in the paper is the workload-aware partitioning strategy. Previously, many papers have introduced distributed partitioning schemes, and many of them partition data by hashing. This paper introduces a very different idea, with the strength of being independent of schema layout and foreign key information. The experimental results show that throughput increases perfectly linearly as the number of servers is increased (from one to eight servers).

However, I was hoping to see scalability tests on more nodes. The paper mentions that one challenge with the partitioning technique is that it "only scales to a few tens of millions of nodes". I think that is already a large enough number of nodes to work with (in a lecture, the professor mentioned that using ~6000 nodes is already very rare even at a huge company such as Google), so I'm wondering what scalability level a system designer should aim for. Also, it would be better if the scalability experiment went beyond 8 nodes (I think ~100 nodes would be better).


Review 2

Traditional RDBMSs leave operational burdens such as provisioning, configuration, scaling, and privacy to the end users of the database. “Database-as-a-service” (DBaaS) systems are designed to take over those operations and reduce users’ costs. Previously, Amazon RDS and Microsoft SQL Azure provided DBaaS offerings. However, those systems are weak with respect to some important requirements and challenges. The paper proposes Relational Cloud to address those challenges.

Some of the strengths and contributions of this paper are:
1. DBaaS can save hardware and energy costs by sharing the provider’s machines among multiple users.
2. Relational Cloud provides workload-aware approach to multi-tenancy that identifies the workloads that can be co-located on a database server, which achieves higher consolidation and better performance than previous systems.
3. Relational Cloud uses graph-based data partitioning algorithm to achieve near linear elastic scalability
4. Relational Cloud allows SQL query to run on encrypted data with operations including aggregates and joins.

Some of the drawbacks of this paper are:
1. The paper doesn’t review previous work and systems. It’s not clear how the technical approach of Relational Cloud differs from that of previous systems like Amazon RDS and Microsoft SQL Azure.
2. CryptDB introduces additional latency on both clients and servers.



Review 3

Moving a DBMS to the cloud helps reduce the costs of hardware, licensing, and administration. In other words, by centralizing and automating many database management tasks, a DBaaS can substantially reduce operational costs and still perform well. However, efficient multi-tenancy, elastic scalability, and database privacy remain the main challenges in designing a DBMS for the cloud. Therefore, this paper designs and implements Relational Cloud.

In order to solve the three aforementioned challenges, Relational Cloud has three key features. First, to achieve efficient multi-tenancy, it is necessary to consolidate databases onto the smallest number of servers while balancing load and without affecting performance. Relational Cloud solves this problem by running a single database server per machine that hosts multiple logical databases, as opposed to the DB-in-VM approach. By combining a non-linear optimization formulation with a cost model, Relational Cloud periodically determines which database should be placed on which machine.

Second, to achieve elastic scalability, it is necessary to partition the database into N chunks in a way that maximizes the workload performance, and the solution proposed by Relational Cloud is the graph-based data partitioning algorithm. Besides, this paper also proposed Kairos, a monitoring and consolidation engine which can solve the resource allocation problem.

Finally, to guarantee database privacy, Relational Cloud used CryptDB which can achieve adjustable security by encrypting each value of each row into an onion with different privacy levels, and having the server perform queries on the encrypted data.

The main contribution of this paper is that it presents a scalable relational DBaaS for cloud computing environments that successfully overcomes three major challenges: efficient multi-tenancy, elastic scalability, and database privacy. It can serve as guidance for further cloud DBaaS development.

The main advantages of Relational Cloud are as follows:
1. Unmodified DB backends
2. Workload-aware consolidation
3. Workload-aware sharding
4. High availability via replication of front-end servers
5. SQL over encrypted data

The main disadvantage of this paper is that it does not provide sufficient availability guarantees for the cloud service, and availability is crucial in real-life applications. There have been cases where services built on top of Amazon AWS failed completely while AWS was down, which caused tremendous losses for the enterprises.



Review 4

Problem & Motivation
Relational database management systems are an integral and indispensable component in most computing environments, and with the advent of hosted cloud computing and storage, database-as-a-service is attractive for two reasons. The first is economies of scale: costs are lower when a user pays for a share of a service. The second is that it costs less when the user wants to scale, and resources can be fully utilized.

Contributions:
The authors of the paper first propose principles for designing a database-as-a-service (DBaaS).
 Efficient Multi-tenancy: Many users choose database-as-a-service for economic reasons. Therefore, the developers of a DBaaS should reduce the cost of the system and use resources efficiently.
 Elastic Scalability: If the workload exceeds the capacity of a single machine, the system should be able to scale a single database to multiple nodes easily and still handle queries correctly. It should also minimize cross-node distributed transactions.
 Privacy: Even the administrator shouldn’t have the right to see the user’s data, and the system should still be able to process queries while guaranteeing privacy.

Then the authors present the Relational Cloud system developed at MIT. The DBMSs in the cloud communicate with the outside through JDBC. Efficient multi-tenancy is achieved through Kairos, which monitors resources, predicts load, and applies consolidation optimization techniques. For partitioning, Relational Cloud adopts graph partitioning, estimating a weight for each edge and applying a minimum cut algorithm. To achieve privacy, the system adopts two onion layers.

Drawbacks:
It is interesting to notice that the throughput of the system is worse than the baseline. The authors argue this is the tradeoff for better privacy. Therefore, the system is more suitable for rented cloud services than for private, internal clouds within an organization.



Review 5

In recent years, technological trends have led companies and organizations to shift increasing amounts of computing and other related systems out of their own direct supervision, and into cloud providers such as Amazon and Microsoft. Databases have taken longer, but are now following this trend, as introduced by the paper. It discusses a new “database as a service” (DBaaS) called Relational Cloud, a concept which essentially means that most of the operational burdens of handling and maintaining databases are shifted away from the users themselves to the service provider. In contrast to other providers like Amazon RDS and Microsoft SQL Azure, this paper argues that there are several key challenges that must be addressed in order to make Relational Cloud a practical and attractive option. Its responses to them are:

1. Workload aware approach to multi-tenancy that identifies compatible workloads that can be co-located on a database server for greater consolidation and performance. This is in order to efficiently implement multi-tenancy.
2. Use of graph-based data partitioning algorithm to achieve near-linear elastic scale-out capabilities, even for complex workloads.
3. Adjustable security scheme that enables SQL queries to run over encrypted data, since privacy is obviously a major concern.

Relational Cloud is designed to use an unmodified DBMS engine for its back-end query processing and storage nodes, each of which runs a single database server. One or more of these database servers can be loaded up by the tenant of the system (some company, etc.). Applications communicate using a standard connectivity layer such as JDBC, with steps taken to make sure data is kept private. Throughout, the front-end monitors the access patterns of the workloads in order to periodically determine the best way to partition each database, place these partitions on back-end machines to balance load and minimize machines, migrate partitions without downtime, replicate data for availability, and secure data and process queries so they can run on untrusted back ends over encrypted data. A CryptDB-enabled driver allows the client to encrypt and decrypt user data and rewrites queries for privacy. According to the paper, many of these components have been developed, and are currently being integrated into a single coherent system, with plans to demonstrate the system. In addition to this, Relational Cloud uses a workload-aware partitioning strategy where the front-end analyzes sets of tuples that are accessed together within individual transactions, with the trace being represented as a graph. The optimal solution thus corresponds to finding a partitioning of these tuples that minimizes the number of distributed transactions. For monitoring and consolidation, they created an engine which they term Kairos. Kairos is made up of the following:

1. Resource monitor - captures runtime statistics
2. Combined load predictor - a model of CPU, RAM, and disk that allows Kairos to predict combined resource requirements when multiple workloads are combined onto a single server
3. Consolidation engine – nonlinear optimization techniques that are used to place database partitions on back-end nodes to balance nodes and minimize the number of machines used

Finally, as already mentioned, Relational Cloud uses a subsystem called CryptDB that is designed to guarantee the privacy of stored data by encrypting all tuples, while being able to run SQL queries over encrypted data efficiently. To do so as well as allow adjustable security parameters, the authors encrypted each value of each row independently into an onion, which comprises multiple layers of increasingly stronger encryption.

The main strength of this paper is that it introduces an (almost working) system that is able to directly address some of the shortcomings of the other solutions being provided for the Database as a Service problem. Additionally, the authors go through each design question in detail and address it comprehensively, which helps readers to get a better idea of the issues that this field currently faces. As their performance results show, the system is able to meet many of the requirements that they initially set, such as the nearly linear scale-out efficiency.

One weakness of this paper was a lack of results directly comparing their system to Microsoft SQL Azure or Amazon RDS, which they specifically mentioned, though this is likely due to the authors’ inability to directly access these systems due to cost, etc. Such a performance comparison would have been valuable in seeing just how well Relational Cloud measured up, as well as potentially the tradeoffs these other companies made in terms of privacy, latency, and throughput, to name a few parameters.


Review 6

The purpose of this paper is to introduce MIT’s Relational Cloud, which is a database-as-a-service (DBaaS). The author defines a DBaaS as a service that can lower a database user’s overall cost by moving a lot of computation away from the user and onto the server. It describes Relational Cloud as having three main features. The first is an approach to distributing workloads where tasks are co-located in a way that optimizes performance better than the state of the art. The second is using graph based data partitioning to have linear scalability. The third is a security method that allows SQL queries on encrypted data. DBaaS is significant because of an increasing demand for outsourcing Database management systems as a service, due to the cost-efficiency of scaling up and also the option to pay-per-use. This paper describes the challenges that come with designing a multi-node, large scale DBaaS like Relational Cloud.

The paper details three challenges that come with designing Relational Cloud. The first is efficient multi-tenancy: given some databases and workloads, finding the best way to distribute them across multiple machines so that the number of machines is minimized while performance stays good even when some workloads are co-located on the same machine. The second challenge is elastic scalability, which leans on a workload-aware partitioner that minimizes the number of multi-node transactions and statements. The third and final challenge that motivated the design of Relational Cloud is privacy, which the authors address by encrypting the data in the DBaaS and running queries on the encrypted database, at an acceptable cost to performance.

The design of Relational Cloud is then described as having database servers that can run multiple tenants’ databases and workloads. A tenant is defined as a user of the service, and their workload is all the queries and transactions run on their database(s). The front-end of the DBaaS consults a router that tells it on which nodes each transaction it receives should be evaluated. The access patterns are also monitored by the front-end so it can optimize partitioning, perform load balancing and replication, and secure the data and process the transactions so they can run on encrypted data.
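
To make the routing step more concrete, here is a minimal Python sketch. It assumes the router holds a lookup table from tuple identifiers to back-end nodes; the names route, is_distributed, placement, and all_nodes are hypothetical and do not come from the paper, whose router works from parsed SQL statements and partitioning metadata.

```python
def route(tuple_ids, placement, all_nodes):
    """Return the set of back-end nodes that must take part in a transaction.

    placement is a hypothetical table mapping a tuple id to the node storing it.
    A tuple with no known placement forces a broadcast to every node.
    """
    nodes = set()
    for tuple_id in tuple_ids:
        if tuple_id not in placement:
            return set(all_nodes)
        nodes.add(placement[tuple_id])
    return nodes


def is_distributed(tuple_ids, placement, all_nodes):
    """A transaction is distributed when it touches more than one node."""
    return len(route(tuple_ids, placement, all_nodes)) > 1


# Illustrative data only: three tuples spread over two nodes.
placement = {"u1": "node-a", "u2": "node-a", "u3": "node-b"}
all_nodes = ["node-a", "node-b"]
local_nodes = route(["u1", "u2"], placement, all_nodes)
cross_nodes = route(["u1", "u3"], placement, all_nodes)
```

In this toy setup the first transaction stays on one node while the second spans two, which is exactly the kind of multi-node transaction a workload-aware partitioner tries to make rare.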

I liked how in the results section, this paper defended the loss in throughput due to encryption by proposing this can be made up by the linear scaling factor. I suppose a small thing I did not like about the paper was the current status section where the authors exposed the proposed solution as incomplete as far as implementation status, which might put off readers so early in the paper, although it was honest.



Review 7

Relational Cloud: A Database-as-a-Service for the Cloud

This paper introduces Relational Cloud, which is a DBaaS for the cloud. It is a kind of shared database management system that hosts different users’ databases together in the same DBMS and provides encryption to each user for their privacy. Relational Cloud can therefore use a good load-balancing algorithm to maintain good performance for all users and administrators. This is why many firms turn to cloud databases such as Relational Cloud: they do not want to manage the database themselves, and would rather have more experienced people and services manage it with better performance and security.

To sum up, this paper introduces Relational Cloud, a scalable relational database-as-a-service for cloud computing environments. Relational Cloud overcomes three major challenges: first, efficient multi-tenancy; second, elastic scalability; third, database privacy. For privacy, the client provides only the minimum decryption capability required for any given query. The paper first outlines Relational Cloud and its system design. It then moves on to design details, including database partitioning, placement and migration, and privacy. It also provides performance evaluations on several benchmarks.

The main technical contributions are as follows. First, it is a workload-aware design for multi-tenancy, which reduces configuration effort for both service providers and users. This design takes advantage of expert management, efficient load balancing, and high performance, which for customers means low cost and high availability. Second is the design for elastic scalability: it uses a graph-based data partitioning algorithm to achieve near-linear elastic scale-out even for complex operations. This enables the whole system to adapt to changing workloads. Third, the design provides security and privacy through CryptDB, which allows the database to work on encrypted data with acceptable performance.

The advantages of this paper are as follows. First, workload awareness: by monitoring query patterns and data accesses, the system obtains the information it needs for optimization and security, which reduces the configuration effort for users and operators. Second, the design is highly efficient in multi-tenancy, elastic scalability, and database privacy. Third, overall, DBaaS is a good idea that combines performance improvement with user-friendliness.

The disadvantages of this paper are as follows. First, it does not pay much attention to availability, for example what happens if one of the servers crashes. Second, the system is still being implemented, so the paper is largely theoretical. We would hope to see the real product and its performance.


Review 8

The Relational Cloud Project is an effort by a group of researchers at MIT to investigate technologies and challenges related to Database-as-a-Service within cloud-computing.

The paper deals with the strategies used to transition databases and storage systems to the unique challenges of the cloud environment. The cloud database is mainly facing three challenges: 1. Elastic Scalability 2. Efficient Multi-Tenancy 3. Privacy issues

Relational Cloud uses an existing RDBMS (it currently supports MySQL and Postgres) on each back-end node. Each node runs only one DBMS instance, and the instance can host many databases, each of which can contain many tables. The load on a database is called its workload. Tenants (i.e., users) can use one or more databases, but a database is not shared by two or more tenants at the same time.
The client uses a standard connector (such as JDBC) to connect to the front-end of Relational Cloud, and the front-end then consults the router. The router analyzes the SQL and decides which nodes participate in the execution and the corresponding execution plan. The front-end then coordinates the nodes for transaction processing, handles failed nodes, and controls the dispatch rate according to each tenant's performance priority. The front-end is also a monitor that tracks the workload's data processing speed and the overall load of the machines. With this monitoring data, Relational Cloud can make decisions about: 1. Partitioning a database across other nodes when the performance of a single machine is not enough. 2. Optimal placement of partitions on back-end nodes, migration of data without downtime, and replication for high availability. 3. Querying encrypted data.

Elastic Scalability
When the load of a tenant grows and a single machine becomes the bottleneck, the database is partitioned across more nodes, and the partitioning also achieves load balancing. The question is which rule to use for partitioning. Relational Cloud uses a graph-based partitioning algorithm, called the workload-aware partitioner, to spread complex query workloads across different nodes. A major advantage of the workload-aware partitioner is that its effectiveness does not depend on the schema layout or on foreign key relationships, so it is very suitable for many-to-many relationships such as those in social networks.

Efficient Multi-Tenancy
A cloud database provider wants each machine to host as many tenants as possible. At the same time, if one tenant runs complex SQL, it should not affect other tenants. The traditional method is to use virtual machines: one machine is divided into many virtual machines, each containing a running database instance. However, this method is not very efficient in practice.

A newly created database is placed on an arbitrary node, but all of its runtime statistics are stored on another dedicated machine. By analyzing these statistics, the system predicts whether the database's future load will affect other databases on the same machine, and whether it will need to be partitioned later. The monitoring and consolidation engine in Relational Cloud is called Kairos. It consists of three parts: 1. Resource Monitor, 2. Combined Load Predictor, 3. Consolidation Engine.

Privacy
Traditional encryption can't prevent the server side from viewing data. With whole-database encryption, for example, running SQL always decrypts the database first, and at that point a DBA can see the data through SHOW PROCESSLIST. Relational Cloud creatively uses CryptDB to achieve low-overhead encryption and decryption. In tests, performance is only 22.5% lower than without encryption.
There are many kinds of encryption schemes, each with its own characteristics. For example, RND provides maximum privacy, but data encrypted with it must be decrypted before it can be compared.

The main contribution is Relational Cloud’s privacy component, CryptDB. It allows the database to work on encrypted data with acceptable performance.


Review 9

Relational Cloud is a database as a service system. The promise of an “x as a service” system in the software engineering world is to push tasks that are not directly related to their expertise to a specialized system. Hiring specialists is expensive, and this helps the company focus on the bottom-line by offloading a mission-critical system.

While some commercial DBaaS systems were already in operation at the time of the publication of this paper, the authors aim to address the challenges of multi-tenancy, scalability, and privacy using a workload-aware approach. The authors allow multi-tenancy by developing a system called Kairos, which monitors the resources and performs load balancing. For scalability, a graph-based partitioning algorithm is used in an attempt to minimize transactions running across partitions. For privacy, there are various layers that can be used to allow for query processing on encrypted data. This is a short paper that provides only a glimpse at each sub-system, with longer papers to follow.

I thought that the idea of using a decision tree algorithm to determine partitions was really interesting, and it was something that I hadn’t seen before. It seems like this decision tree would be able to find relationships in the data that might be hard to express in some kind of rule-based format. Additionally, it may even find relationships that the users are unaware of. A decision tree seems like a natural way to do this as it is easy to interpret.

While the ideas surrounding privacy were interesting - basically to have different levels based on access patterns - I wasn’t sure what the feasibility was in terms of adoption. Would users be able to specify a minimum privacy level for specific columns? If so, would there be workloads that the relational cloud just would not be able to handle, at least not efficiently?



Review 10

This paper introduces a new “database-as-a-service” called Relational Cloud. DBaaS is attractive due to economies of scale and the fact that costs incurred in a well-designed DBaaS will be proportional to actual usage. Thus it can substantially reduce operational costs and perform well. However, existing DBaaS failed to address three important challenges: efficient multi-tenancy, elastic scalability, and database privacy. Relational Cloud solves these problems in the following way.

To provide efficient multi-tenancy, Relational Cloud uses a single database server on each machine, which hosts multiple logical databases. It periodically determines which databases should be placed on which machines using a new optimization formulation. It also implements live migration of databases between machines. The monitoring and consolidation engine developed for this purpose is called Kairos. It takes in an existing collection of workloads and a set of target physical machines, then outputs the placement of databases.
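
As a rough illustration of the placement problem only, the sketch below uses a first-fit decreasing heuristic in Python. This is a deliberately simpler stand-in, not the optimization formulation used by Kairos. It also assumes that resource demands simply add up when workloads share a machine, whereas Kairos uses a combined load predictor for that. The function name place_workloads and the example numbers are hypothetical.

```python
def place_workloads(workloads, capacity):
    """Assign workloads to machines with a first-fit decreasing heuristic.

    workloads: {name: (cpu, ram)} resource demands (additive by assumption).
    capacity:  (cpu, ram) available on every machine (identical machines assumed).
    Returns a list of machines, each a dict with used cpu, used ram, and db names.
    """
    cap_cpu, cap_ram = capacity

    def dominant_share(item):
        # Sort key: the larger of the CPU and RAM fractions a workload needs.
        cpu, ram = item[1]
        return max(cpu / cap_cpu, ram / cap_ram)

    machines = []
    for name, (cpu, ram) in sorted(workloads.items(), key=dominant_share, reverse=True):
        if cpu > cap_cpu or ram > cap_ram:
            raise ValueError(f"workload {name} does not fit on any machine")
        for machine in machines:
            if machine["cpu"] + cpu <= cap_cpu and machine["ram"] + ram <= cap_ram:
                machine["cpu"] += cpu
                machine["ram"] += ram
                machine["dbs"].append(name)
                break
        else:
            # No existing machine has room, so open a new one.
            machines.append({"cpu": cpu, "ram": ram, "dbs": [name]})
    return machines


# Illustrative numbers only: CPU in percent, RAM in GB.
workloads = {"a": (60, 16), "b": (30, 40), "c": (50, 8), "d": (20, 20), "e": (10, 4)}
placement = place_workloads(workloads, capacity=(100, 64))
```

With these example numbers the five databases fit on two machines. A real consolidation engine would also have to account for disk I/O and for the non-additive way workloads interact, which is where the paper's load model comes in.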

To provide elastic scalability, the responsibility for query processing and the corresponding data is partitioned among multiple nodes to achieve higher throughput. Relational Cloud developed a workload-aware partitioner, which uses graph partitioning to automatically analyze complex query workloads and map data items to nodes to minimize the number of multi-node transactions/statements. More specifically, in the graph algorithm, each node represents a tuple, and an edge is drawn between two tuples if they are touched in the same transaction. Then a weight is calculated for each edge to reflect how often such pair-wise accesses occur in a workload. Then the whole graph is partitioned by minimizing the total weight of the cut edges.
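
The graph construction described above can be sketched in a few lines of Python. This is a toy under stated assumptions: it splits the tuples into exactly two equal-sized partitions and finds the minimum cut by exhaustive search, which only works for tiny graphs. The paper's partitioner relies on a scalable graph partitioning algorithm instead, and all function names and the example trace here are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_coaccess_graph(transactions):
    """Edge weight = number of transactions in which a pair of tuples co-occurs."""
    weights = Counter()
    for txn in transactions:
        for a, b in combinations(sorted(set(txn)), 2):
            weights[(a, b)] += 1
    return weights


def cut_weight(weights, part_a):
    """Total weight of edges with exactly one endpoint inside part_a."""
    return sum(w for (a, b), w in weights.items() if (a in part_a) != (b in part_a))


def best_balanced_bisection(transactions):
    """Try every balanced 2-way split and keep the one with the smallest cut."""
    weights = build_coaccess_graph(transactions)
    tuples = sorted({t for txn in transactions for t in txn})
    best = None
    for combo in combinations(tuples, len(tuples) // 2):
        part_a = set(combo)
        cost = cut_weight(weights, part_a)
        if best is None or cost < best[0]:
            best = (cost, part_a, set(tuples) - part_a)
    return best


# Illustrative trace: each inner list is the set of tuples one transaction touched.
trace = [
    ["u1", "u2"], ["u1", "u2"], ["u2", "u1"],
    ["u3", "u4"], ["u3", "u4"],
    ["u2", "u3"],
]
cost, part_a, part_b = best_balanced_bisection(trace)
```

For this trace the best split keeps u1 with u2 and u3 with u4, so only the single transaction touching u2 and u3 becomes a distributed transaction.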

Finally, to provide strong privacy, Relational Cloud includes CryptDB, a set of techniques for providing privacy. The key idea here is called adjustable security. Each value of each row is independently encrypted into an onion, using layers of increasingly stronger encryption. Then the security level dynamically adapts based on the queries (which need to provide appropriate keys for different layers) that applications make to the server.

This paper covers a lot of different techniques like workload-aware database partitioning, placement, migration, and privacy. However, the paper doesn’t go into detail on any of the topics, especially the Kairos part. The evaluation also seems very simple. For example, in Figure 4, only four data points are given.







Review 11

In the paper "Relational Cloud: A Database-as-a-Service for the Cloud", Carlo Curino and MIT faculty discuss Relational Cloud, a new transactional "database-as-a-service" (DBaaS). The purpose of DBaaS is to shift responsibility for provisioning, configuration, scaling, performance tuning, backup, privacy, and access control from the database user to the service operator. In essence, this model strives to reduce the overall costs and burden on users. Since relational databases are not likely to be forgotten any time soon and cloud computing has gained great popularity, combining the two takes the best of both worlds. There have been several other approaches to DBaaS, such as Microsoft Azure and Amazon RDS, but none of these services addresses three important challenges: efficient multi-tenancy, elastic scalability, and database privacy. Unless these three challenges are overcome, it will be hard to market the usefulness of DBaaS. Relational Cloud tackles all three points as follows:
1) A "workload-aware" approach that can identify locations for workloads in order to achieve higher consolidation and performance.
2) Graph based partitioning algorithms that can achieve linear scale-out for complex transactional workloads.
3) A user adjustable security scheme that enables SQL queries to run over encrypted data.
As queries are run on Relational Cloud, it gets a sense of the type of workload that it is dealing with and optimizes for future tasks. A self-tuning database management system that can be customized by users who do not necessarily care about how a DBMS functions is an answer to the prayers of consumers in non-tech industries. Thus, it is clear that this is both an interesting and important problem to consider.

The paper is divided into several sections (the main contributions are discussed per section):
1) System Design: Relational Cloud uses unmodified DBMS engines as both the back-end query processing and storage nodes. Each of these back-end nodes acts as a single database server. The set of back-end machines can change dynamically in response to the load on the system. Data from different tenants is never mixed in the same database, but different tenants' databases can run within the same database server. When the front-end receives SQL statements from clients, it communicates with the router, which analyzes each SQL statement and uses metadata to determine the execution nodes and plan. The front-end also provides a distributed execution plan, handles fail-overs, and provides a degree of isolation by controlling the rate at which queries are dispatched. Relational Cloud uses information from front-end access patterns in order to learn how to partition data, place partitions in the back-end to optimize for load balance, availability, and replication, and secure data so queries can be run over encrypted data.
2) Database Partitioning: There are two main purposes for partitions: to scale a single database to multiple nodes and to enable more granular distribution on back-end machines as opposed to placing an entire database. Partitioning is done in such a way that the number of multi-node transactions is minimized (because they incur overhead and cost $$$). As was previously stated, Relational Cloud uses a workload-aware mechanism to partition data, such that it is able to discover interesting correlations within the data.
3) Placement and Migration: Some issues when dealing with a system like Relational Cloud include: monitoring the resource requirements of each workload, predicting the load multiple workloads will generate when run together on a server, assigning workloads to physical servers, and migrating them between physical nodes. Thus, in order to combat this, a resource monitor is used to automatically collect statistics about RAM usage through disk activity from several DBMS. A combined load predictor also accurately predicts the usage of both CPU and RAM for a given back-end machine. Finally, a consolidation engine tries to minimize the number of machines required to support a given workload mix and balance load across the back-end machines. It does all this while also not trying to exceed the machine limits.
4) Privacy: The idea is to be flexible with the security mechanism and offer many cryptographic techniques that users have at their disposal. Relational Cloud encrypts each value of each row independently into an onion, in which each successive layer uses increasingly stronger encryption. The security level dynamically adapts based on the queries that are made to the server.

Much like other papers, I felt that this paper had some drawbacks. The first drawback that I noticed is the fact that they mentioned related works near the end of the paper rather than at the beginning. I felt that connections between Microsoft Azure and Amazon RDS could be made with other current methods that have yet to be commercialized. Furthermore, I also felt that these other systems could be used to fuel the need for Relational Cloud. Another drawback that I noticed is that privacy is not really characterized in the experimental results. They simply label privacy as a cost in terms of efficiency, but never actually measure how secure Relational Cloud is. Are other solutions not as good as Relational Cloud because of the latency that their security introduces, or because they just aren't secure at all? The final drawback that I noticed is that it was not clear what their baseline was in the experiments section. It seemed like they were testing their systems against one another without a third party, which lessens the validity of some of their claims.


Review 12

This paper describes the implementation of Relational Cloud. Relational Cloud is a Database as a Service that offers features that other DBaaSs lack. The idea of a Database as a Service is to offer a large set of machines to any user who needs them, so that they can customize their machine usage to their individual needs. The paper identifies three qualities that Relational Cloud needs to fulfill in order to offer effective service.

The first quality is efficient multi-tenancy. In other words, given a large set of machines, and multiple customers and workloads, the system needs to find an efficient way of servicing all of the users. An easy way to do this is to create a virtual machine for each user. However, this ends up being very inefficient, since each virtual machine simulates its own operating system and everything that comes with it, while the user just needs a database. Relational Cloud therefore stores multiple users’ data in multiple databases in the same database server, which uses machine resources efficiently.

The second quality is elastic scalability. Many users will have workloads that fit on just one machine. However, some workloads are too large and need to be automatically partitioned across multiple machines. Relational Cloud uses graph partitioning to determine how to partition the workload across the available machines, while a separate engine called Kairos handles placing workloads on machines.

The final quality is database privacy. Each user wants their data to be secure from all other users. Relational Cloud uses a subsystem called CryptDB in order to encrypt each user’s data. Each element of each row is encrypted under multiple layers of encryption, called onions. When performing database operations, the system only decrypts as many layers as are necessary in order to perform the operation, thus keeping maximum encryption on the data.

This paper is able to succinctly and effectively describe the useful qualities of databases as a service, and how Relational Cloud implements all of these qualities. It’s very easy to read. The use of multiple layers of encryption is clever in how it keeps data encrypted while performing database operations.

The downside of this paper is that due to its short length, it can’t go into as much detail regarding the implementation of each of Relational Cloud’s systems. There’s only a brief mention of how Kairos and CryptDB function.



Review 13

This paper introduces Relational Cloud, a new transactional database-as-a-service that moves much of the operational burden of provisioning, configuration, scaling, performance tuning, backup, privacy, and access control from the database users to the service operator. As a result, the system offers lower overall costs to users. The key technical features of Relational Cloud are: a workload-aware approach to multi-tenancy that identifies the workloads that can be co-located on a database server, achieving higher consolidation and better performance than existing approaches; a graph-based data partitioning algorithm that achieves near-linear elastic scale-out even for complex transactional workloads; and an adjustable security scheme that enables SQL queries to run over encrypted data. Database-as-a-service is attractive because the hardware and energy costs incurred by users are likely to be much lower when they pay for a share of a service rather than running everything themselves. Also, the costs incurred in a well-designed DBaaS will be proportional to actual usage. The three challenges that drive the design of Relational Cloud are efficient multi-tenancy, elastic scalability, and database privacy.
For the system design, Relational Cloud uses existing DBMS engines as the back-end query processing and storage nodes, and each node runs a single database server. The data of two tenants are never mixed into a common database, but databases belonging to different tenants run within the same database server. Relational Cloud uses the access patterns induced by the workloads to periodically determine the best way to partition each database into one or more pieces, producing multiple partitions when the load on a database exceeds the capacity of a single machine. The team has implemented the distributed transaction coordinator along with the routing, partitioning, replication, and CryptDB components.
Relational Cloud uses database partitioning because it lets a single database scale to multiple nodes when the load exceeds the capacity of a single machine. Database partitioning also enables more granular placement and load balancing on the back-end machines compared with placing entire databases. Relational Cloud uses a workload-aware partitioning strategy in which the front-end periodically analyzes query execution traces to identify sets of tuples that are accessed together within individual transactions. The system extracts a set of candidate attributes from the predicates used in the trace. The strength of this approach is its independence from schema layout and foreign key information, which allows it to discover intrinsic correlations hidden in the data.
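The trace-analysis step can be pictured with a small sketch. Assuming each transaction in the trace has already been reduced to the list of tuple identifiers it touched (an assumption; the function name and the trace format here are hypothetical), counting how often two tuples appear in the same transaction gives a weighted co-access graph:

```python
from collections import Counter
from itertools import combinations


def build_coaccess_graph(transactions):
    """Count, for every pair of tuples, how many transactions touched both.

    `transactions` is an iterable of lists of tuple identifiers (hypothetical
    trace format). The result maps a sorted pair (a, b) to its edge weight.
    """
    edges = Counter()
    for tuple_ids in transactions:
        # Deduplicate and sort so each pair has exactly one canonical key.
        for a, b in combinations(sorted(set(tuple_ids)), 2):
            edges[(a, b)] += 1
    return edges


trace = [["a", "b"], ["a", "b", "c"], ["c", "d"]]
graph = build_coaccess_graph(trace)
```

Heavily weighted edges mark tuples that should stay on the same node, and nothing in the construction depends on the schema or on foreign keys.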
To deal with the resource allocation challenge, a monitoring and consolidation engine was developed. It has three key components: a resource monitor, a combined load predictor, and a consolidation engine. For database privacy, the approach is called adjustable security.
The strength of this paper is that it clearly lays out the goals and challenges in the design of Relational Cloud before diving into technical details. However, it would be better to have more figures to help illustrate how the system works.


Review 14

“Relational Cloud: A Database-as-a-Service for the Cloud” by Curino et al. presents a new DBaaS, Relational Cloud, that addresses some of the challenges prior commercial DBaaSs had not supported: efficient multi-tenancy, elastic scalability, and privacy. The underlying approach is workload awareness, where the DBaaS considers past and recent workloads and queries when deciding where to store data (and, relatedly, where to run particular queries), when and where to migrate data, and what level of data encryption to use. At a high level, Relational Cloud works by having clients first encrypt their data and then send it to Relational Cloud to be stored. For queries, an adjusted version of the client’s query is run on the backend (on the CryptDB instances) in order to retrieve the appropriate encrypted result. The encrypted data is then sent back to the client, where the client decrypts it. The level of encryption automatically adjusts based on the kinds of queries the client is performing. Efficient multi-tenancy and elastic scalability are supported via database partitioning, placement, and migration. Data is partitioned in an effort to reduce the number of multi-node transactions: prior workloads are processed and represented as a graph of vertices and edges in order to capture the relationships between tuples, and the graph is then partitioned to find a set of balanced partitions with a min-weight cut. Incoming workloads can also be partitioned and migrated across different backend servers based on the system’s current utilization of resources.
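To make the "balanced partitions with min-weight cut" objective concrete, here is a minimal sketch that splits a tiny co-access graph into two equal halves by brute force and reports the cut weight, which stands in for the number of multi-node transactions. The function names and the example graph are hypothetical, and trying every split is exponential in the number of nodes, so this only shows the objective; a real system would rely on a heuristic graph partitioner.

```python
from itertools import combinations


def cut_weight(edges, part_a):
    """Total weight of edges with exactly one endpoint in `part_a`."""
    return sum(w for (u, v), w in edges.items() if (u in part_a) != (v in part_a))


def best_balanced_bisection(nodes, edges):
    """Try every equal-size split and keep the one with the lightest cut.

    Brute force, for illustration only. Returns (weight, part_a, part_b).
    """
    nodes = sorted(nodes)
    half = len(nodes) // 2
    best = None
    for subset in combinations(nodes, half):
        part_a = set(subset)
        weight = cut_weight(edges, part_a)
        if best is None or weight < best[0]:
            best = (weight, part_a, set(nodes) - part_a)
    return best


# Tuples a and b are often accessed together, as are c and d.
edges = {("a", "b"): 5, ("c", "d"): 4, ("b", "c"): 1}
weight, part_a, part_b = best_balanced_bisection("abcd", edges)
```

On this example the lightest balanced split keeps a with b and c with d, cutting only the weight-1 edge between b and c.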

The work in this paper is very practical, as DBaaSs are becoming more popular and commercial DBaaS will need to scale and address real-world challenges.



Review 15

This paper proposes Relational Cloud, a transactional "database-as-a-service". The paper focuses on three important challenges: efficient multi-tenancy, elastic scalability, and database privacy. These challenges were not tackled by previous DBaaSs, including Amazon RDS and Microsoft SQL Azure. The key contributions of the paper include a workload-aware approach to multi-tenancy that identifies workloads that can be co-located on a database server to achieve higher consolidation and better performance, a graph-based data partitioning algorithm to achieve near-linear scale-out, and an adjustable security scheme that enables SQL queries to run over encrypted data, including ordering operations, aggregates, and joins.

Relational Cloud uses existing DBMS engines as the back-end query processing and storage nodes. Communication between front-end users and Relational Cloud goes through a special driver that ensures their data is kept private. The front-end monitors the access patterns induced by the workloads and the load on the database servers. This information is used to determine the best way to partition each database into one or more pieces, and to place the database partitions on the back-end machines so as to both minimize the number of machines and balance load.

I think the most interesting part of the paper is about privacy. Privacy is a really important issue for a business or government when considering migrating its data onto the cloud. I think the driving force for many companies to maintain servers on their own, paying a lot to employ engineers, is that they don't want to leak their data. The onion architecture proposed in the paper is the most important part in my opinion. The result of this encryption strategy is also promising, since it supports ordering and aggregation.

The shortcoming of the paper, to me, is that it doesn't talk about how to deal with different back-end DBMS servers. Since different DBMSs may offer different APIs and services, I think this is an issue that needs to be taken into consideration.


Review 16

In this paper, the authors introduce a new transactional “database-as-a-service” (DBaaS) called Relational Cloud. The high-level idea of DBaaS is to move the operational burden of provisioning, configuration, scaling, performance tuning, backup, privacy, and access control from the user to the service operator, which reduces the customer’s cost. This problem is important because nowadays cloud computing is a very popular service for companies, especially small or start-up companies. Those companies have limited resources to maintain their underlying infrastructure, so they can use the services provided by a cloud platform instead. As the authors mention in the paper, DBMS outsourcing providers like Amazon RDS and Microsoft SQL Azure are still subject to some challenges, which include efficient multi-tenancy, elastic scalability, and database privacy. These challenges are very important because they must be overcome before outsourcing DBMS software and management becomes attractive to many users and cost-effective for service providers. Next, I will summarize the key points of this paper as I understand them.

First of all, the authors discuss the key technical features of Relational Cloud: (1) a workload-aware approach to multi-tenancy that identifies the workloads that can be co-located on a database server, achieving higher consolidation and better performance than existing approaches; (2) the use of a graph-based data partitioning algorithm to achieve near-linear elastic scale-out even for complex transactional workloads; and (3) an adjustable security scheme that enables SQL queries to run over encrypted data, including ordering operations, aggregates, and joins. For the workload-based system design, the workload-aware approach is applied to multi-tenancy, which involves identifying workloads that can easily be co-located on a server, resulting in high consolidation and better performance. In the long run, this reduces configuration effort for both service providers and users. As for efficient multi-tenancy, Relational Cloud needs to determine the resource requirements of individual workloads, how they will combine when co-located on one machine, and how to benefit from temporal variations of individual workloads to optimize hardware utilization. As for elastic scalability, the graph-based data partitioning algorithm helps achieve near-linear elastic scale-out, even when carrying out complex operations. This enables the system to support workloads and databases of different sizes, since the number of multi-node transactions is minimized. Besides, the paper also discusses the privacy and data migration problems in DBaaS.

Generally speaking, it’s a nice paper with great insights, and it makes several contributions. The main contribution of this paper is the design of Relational Cloud, which is an implementation of a data fission approach from the MIT DBMS group. In this paper, they point out the three most important technical features of Relational Cloud. There are several advantages of this paper. First of all, this is the first work that points out the three important challenges in the DBaaS field, and the insight of this paper is very good! Next, from a system design point of view, DBaaS delivers database functionality similar to a relational DBMS. DBaaS provides a flexible, scalable, on-demand platform that's oriented toward self-service and easy management. Also, since Relational Cloud is a distributed cloud service, the paper addresses the problem of how machines can be used to minimize copying of information and communication so that scaling can be more efficient. Besides, as for the paper itself, I think it is well organized. It gives a clear description of DBaaS and uses several examples to illustrate the design of such a service, which makes it very easy to follow and understand.

However, I think there are some drawbacks to this paper. I think the first issue is cost control for the user. For a cloud service, pricing is a very important factor for both service providers and users. However, the paper doesn’t mention anything about the pricing policy, which can be improved later. Second, the authors assume that their load predictor is accurate and don't have any fallback mechanism for bad predictions. If the predictor goes wrong, the whole system's performance will be greatly degraded, so there should be some way to detect this problem and fix it. Last but not least, they do not talk about fault tolerance for the cloud service. As we all know, availability is very important for a cloud service. If some nodes crash for some reason, there should be some way to recover from the failure and keep operating, so I think they should consider how to tolerate faults in a cloud DBMS service.



Review 17

This paper introduces Relational Cloud, which is a cloud database-as-a-service (DBaaS). It starts by introducing 3 main issues that existing DBaaS offerings had not addressed and that the paper argues are vital for a DBaaS to be attractive to a customer. These 3 issues are:

1) Efficient multi-tenancy: This basically means that a DBaaS must be able to sufficiently understand the resources necessary to serve a set of workloads while minimizing the # of machines used. Relational Cloud does this by hosting multiple databases on a single-server-per-machine basis rather than using individual VMs per machine, and optimizes the placement by using cost-based optimizations. Migration of databases from server to server is also supported to increase flexibility.

2) Elastic scalability: A DBaaS must be able to scale out if a workload requires more than just a single machine in terms of performance. Relational Cloud has a novel workload partitioning strategy that they call “workload aware” — the front-end analyzes queries and adjusts the placement of tuples which makes this approach more flexible to workload type.

3) Privacy: Many customers might be anxious about DBaaS administrators having access to private organization data — Relational Cloud deals with this by providing mechanisms for operating on encrypted data so that RC administrators can’t peek at this data—RC calls this CryptDB and boasts flexibility so that different types of data have different encryption levels.
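The placement problem in item 1) can be pictured as bin packing: put workloads on as few machines as possible without exceeding any machine's CPU or RAM. The sketch below uses a greedy first-fit-decreasing heuristic as a stand-in. It is not the paper's cost-based optimizer, and the function name, the workload figures, and the capacities are all made up for illustration.

```python
def consolidate(workloads, cpu_cap, ram_cap):
    """Greedy first-fit-decreasing placement (illustrative stand-in only).

    `workloads` maps a name to its predicted (cpu, ram) demand. Returns one
    list of workload names per machine used.
    """
    def size(item):
        _, (cpu, ram) = item
        # Rank by the most constrained resource, as a fraction of capacity.
        return max(cpu / cpu_cap, ram / ram_cap)

    machines = []
    for name, (cpu, ram) in sorted(workloads.items(), key=size, reverse=True):
        if cpu > cpu_cap or ram > ram_cap:
            raise ValueError(f"{name} does not fit on any single machine")
        for m in machines:
            if m["cpu"] + cpu <= cpu_cap and m["ram"] + ram <= ram_cap:
                break  # first machine with room for this workload
        else:
            m = {"cpu": 0, "ram": 0, "names": []}
            machines.append(m)
        m["cpu"] += cpu
        m["ram"] += ram
        m["names"].append(name)
    return [m["names"] for m in machines]


# Hypothetical demands: (CPU percent, RAM in GB) per workload.
demands = {"w1": (60, 4), "w2": (50, 2), "w3": (30, 3), "w4": (10, 1)}
placement = consolidate(demands, cpu_cap=100, ram_cap=8)
```

Here four workloads fit on two machines. A workload too big for one machine raises an error in this sketch, which is the case that partitioning in item 2) is meant to handle.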

The main contribution is obviously RC, which solves the 3 problems introduced in the paper—the partitioning strategy seems to be what they are most proud of. However, I also think that the fact that RC can be used as a public cloud DBaaS or a private cloud offering is also an advantage (I’m not too familiar with what the technical implications of differentiating these are, so it’s possible that this isn’t very impressive). And probably the best strength of RC is that it is very flexible in multiple ways — it is robust under multiple workload types and has adjustable security, which seems to be very important for a DBaaS. One weakness I think is that the paper claims that “only a 22.5% reduction in throughput on TPC-C” was acceptable for providing extra privacy protection—I could be wrong, but 22.5% seems to be a pretty big performance hit.