Many organizations turn to database-as-a-service because they need a DBMS but do not want to administer it themselves, lacking expertise in performance tuning, load balancing, and the like. DBaaS providers fill the need, but they may not be as cost-effective as possible: it appears that many DBaaS hosts use more servers per customer than are necessary for the desired quality of service. Existing solutions, such as running each user database on its own DBMS within a virtual machine, create large overhead. Providers might be able to place 10 times as many user databases per machine if there were a better way to provide good performance and privacy for user data.
Relational Cloud is a shared database system that puts different users’ databases together in the same DBMS, using encryption to give each user privacy from other users and even from Relational Cloud administrators. Relational Cloud also uses clever load balancing techniques to decide how to partition each database across nodes so that good performance is likely while as few servers as possible are required.
To decide how to allocate server nodes to a database, Relational Cloud obtains a usage profile of each database, as a directed graph of which tuples tend to be accessed together. The database is partitioned across nodes to minimize the weight of cuts between data on different nodes. Relational Cloud also uses a novel method to estimate the RAM needed for each database’s working set, by reducing the RAM allowed for each database until disk reads increase dramatically, indicating that the allotted space is smaller than the working set. After collecting these data, Relational Cloud uses a nonlinear optimization solver to decide how to split each database across nodes and assign it RAM on each one.
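The working-set probe described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: `estimate_working_set`, `read_rate_at`, `simulated_probe`, the step size, and the `spike_factor` threshold are all hypothetical names and values chosen for the example.

```python
def estimate_working_set(read_rate_at, start_mb, step_mb, spike_factor=3.0):
    """Shrink the RAM allowance until disk reads jump sharply.

    read_rate_at: hypothetical probe returning disk reads/sec for a RAM size (MB).
    Returns the smallest RAM size observed before the jump.
    """
    ram = start_mb
    baseline = read_rate_at(ram)
    while ram - step_mb > 0:
        candidate = ram - step_mb
        rate = read_rate_at(candidate)
        # A large jump suggests the allowance fell below the working set.
        if rate > spike_factor * max(baseline, 1.0):
            return ram
        ram = candidate
    return ram


def simulated_probe(ram_mb, working_set_mb=600):
    # Toy model (assumption): few reads while the working set fits in RAM,
    # many more once it no longer does.
    if ram_mb >= working_set_mb:
        return 10.0
    return 10.0 + 5.0 * (working_set_mb - ram_mb)


estimate = estimate_working_set(simulated_probe, start_mb=1000, step_mb=100)
print(estimate)
```

The estimate is only as fine-grained as `step_mb`, so a real probe would presumably trade measurement time against precision.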
To provide privacy for users, Relational Cloud keeps data encrypted in the database, with encryption and decryption happening in a middleware application on the client’s computer. Layers of encryption called an “onion” are applied in sequence, so that if data are only accessed sequentially, data will be kept at a high (random) encryption level in-DB, while if data need to be sorted or compared for equality, partial decryption to something like OPE will be made in the database.
The main drawback of Relational Cloud is high overhead: There is about a 40% performance penalty from the combined effects of encryption and the transaction coordinator. The authors claim this is compensated for by the ability to host more users per server.
The paper discusses Relational Cloud, a new transactional database-as-a-service that promises to move much of the operational burden of provisioning, configuration, scaling, performance tuning, and other operations to the service operator, lowering costs for users. Traditional databases are marred by high cost and a lack of elasticity, and they are harder to maintain. A DBaaS offers much lower hardware and energy costs. It also entails efficient multi-tenancy, elastic scalability, and database privacy.
Multi-tenancy uses a single database server on each machine, which hosts multiple logical databases. Relational Cloud periodically determines which databases should be placed on which machines using a novel non-linear optimization formulation. A good DBaaS supports databases and workloads of different sizes. The challenge arises when a database workload exceeds the capacity of a single machine. All the data stored in the DBaaS is encrypted, which addresses the privacy concerns. The database is partitioned to scale a single database to multiple nodes. Partitioning also enables more granular placement and load balancing on the back-end machines compared to placing entire databases. Resource allocation is a major challenge when designing Relational Cloud. Each database is placed in the care of a resource monitor, a combined load predictor, and a consolidation engine.
The paper provides a comprehensive study of the Relational Cloud data model. The proposal is supplemented with performance benchmarks comparing CryptDB against a baseline.
There aren’t many comparisons between CryptDB and current RDBMSs. The usage of this data model hasn’t been discussed much.
What is the problem addressed?
The paper deals with the strategies used to transition databases and storage systems to the unique challenges of the cloud environment.
With the advent of hosted cloud computing and storage, the opportunity to offer a DBMS as an outsourced service is gaining momentum. Moreover, such an approach is attractive for two reasons. First, due to economies of scale, the hardware and energy costs incurred by users are likely to be much lower when they are paying for a share of a service rather than running everything themselves. Second, the cost is proportional to actual usage. By centralizing and automating many database management tasks, a DBaaS can substantially reduce operational costs and perform well.
1-2 main technical contributions? Describe.
Here are some core requirements for any database or storage system in the cloud: scalability, elasticity, and autonomy. Scalability means scale-out, the ability to use multiple nodes to gain increased storage capacity and performance, and it is achieved through partitioning. They use a workload-aware partitioner, which uses graph partitioning to automatically analyze complex query workloads and map data items to nodes so as to minimize the number of multi-node transactions.
Elasticity is one of the core selling points of the cloud: pay-as-you-go pricing for the cloud consumer, with nodes added and removed in response to the load on the service. This paper achieves this by developing a monitoring and consolidation engine, Kairos, which consists of three parts: a Resource Monitor, a Combined Load Predictor, and a Consolidation Engine.
Privacy is a novel goal. The paper presents a design that will allow DBAs (who operate Relational Cloud) to perform tuning tasks without having any visibility into the actual stored data. With the notion of adjustable privacy, they encrypt each value of each row independently into an onion: each value in the table is dressed in layers of increasingly stronger encryption, and each operation requires a different level of decryption.
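The onion idea can be modeled with a small sketch that tracks only which layer a column is currently exposed at. No real cryptography is performed here. The single three-layer ordering (RND outermost, then DET, then OPE), the operation-to-layer mapping, and the `Column` class are simplifying assumptions made for illustration, not the system's actual design.

```python
# Layers from outermost (strongest) to innermost (weakest); assumed ordering.
ONION = ["RND", "DET", "OPE"]

# Weakest protection each operation needs the server to see (illustrative).
REQUIRED_LAYER = {"select": "RND", "equality": "DET", "order": "OPE"}


class Column:
    """Tracks how far a column's onion has been peeled (hypothetical model)."""

    def __init__(self):
        self.level = 0  # index into ONION; 0 means fully wrapped

    def prepare_for(self, operation):
        needed = ONION.index(REQUIRED_LAYER[operation])
        if needed > self.level:
            # Peel outer layers down to what the operation requires.
            # This sketch never re-wraps a column once it is peeled.
            self.level = needed
        return ONION[self.level]


salary = Column()
print(salary.prepare_for("select"))    # stays at RND
print(salary.prepare_for("order"))     # peeled down to OPE
print(salary.prepare_for("equality"))  # already peeled past DET
```

The point of the model is that a column only ever drops to the weakest layer that some query has actually required, which is what makes the privacy level adjustable per workload.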
1-2 weaknesses or open questions? Describe and discuss
I like the idea of privacy. How to provide adjustable privacy seems like an important question that hasn’t been explored much by conventional cryptography.
This paper presents a new transactional “database-as-a-service” called Relational Cloud, which is designed to move much of the operational burden from the database users to the service operator and offer lower overall costs to users. In short, Relational Cloud is a scalable relational database-as-a-service for cloud computing environments. Relational Cloud overcomes three significant challenges: efficient multi-tenancy, elastic scalability, and database privacy. For privacy, clients can provide only the minimum decryption capabilities required by any given query. This paper first gives an overview of Relational Cloud and its system design. Then it moves to the details of the design, including database partitioning, placement and migration, and privacy. It also provides a performance evaluation on several benchmarks. Finally, it covers related and future work.
The problem here is that with the advent of hosted cloud computing and storage, the opportunity to offer a DBMS as an outsourced service is gaining popularity. DBaaS is attractive for two reasons. First, the hardware and energy costs will be much lower. Second, the cost will be proportional to actual usage. Thus DBaaS can reduce operational costs and perform well. However, there are several challenges, including efficient multi-tenancy to minimize the hardware footprint, elastic scale-out to handle growing workloads, and database privacy. This paper presents a good solution called Relational Cloud.
The major contribution of the paper is that it presents a scalable relational DBaaS for cloud computing environments and successfully overcomes the three major challenges mentioned above. Through performance evaluation on several traditional benchmarks, it shows that Relational Cloud can reduce costs substantially and perform well. Here we will summarize the key elements of Relational Cloud:
1. efficient multi-tenancy: a novel resource estimation and non-linear optimization-based consolidation technique
2. scalability: graph-based partitioning method to spread large databases across many machines
3. privacy: adjustable privacy and use different levels of encryption to enable SQL queries to be processed over encrypted data
One interesting observation: this paper is very innovative, and it presents some new ideas about how to overcome the three main challenges for a DBaaS. One possible weakness is that this paper does not provide enough information on how to measure database privacy. It presents an “onion” layered encryption, but it does not provide proofs or tests of how well it works.
Relational Cloud is introduced as a transactional 'Database as a Service' model (for OLTP workloads) that shifts much of the operational burden (scaling, configuration, performance tuning, access control, privacy, and backup) from database users to service providers, ultimately reducing costs for users. Initially, a few providers introduced DaaS offerings into the market but failed to address three pertinent issues: elastic scalability, efficient multi-tenancy, and database privacy. These issues, and the setbacks of DaaS with respect to cloud computing, form the basis for this paper.
1.Design based on Workload
The workload-aware approach is applied to multi-tenancy, which involves identifying workloads that can easily be co-located on a server, resulting in high consolidation and better performance. In the long run this reduces configuration effort for both service providers and users. Relational Cloud periodically determines which databases should be placed on which machines using a novel non-linear optimization formulation, combined with a cost model that estimates the combined resource utilization of multiple databases running on a machine. The strength of this approach is its independence from schema layout and foreign key information, which allows it to discover intrinsic correlations hidden in the data. As a consequence, this approach is effective in partitioning databases containing multiple many-to-many relationships (typical in social-network scenarios) and in handling skewed workloads.
Relational Cloud requires the developer to predetermine the resource requirements of individual workloads, how they will co-locate on one machine, and how to benefit from temporal variations of individual workloads to optimize hardware utilization. This helps minimize the set of machines required while still meeting application-level performance goals.
The adoption of a graph-based data partitioning algorithm helps achieve near-linear elastic scale-out, even when carrying out complex operations. This enables the system to support workloads and databases of different sizes, since the number of multi-node transactions is minimized.
The issue with using a graph-based partitioning algorithm is speed. It is slow to turn a cloud database's worth of tuples into a graph, run the partitioning algorithm, and then move the data around. Unless the algorithm is weighted properly, it could also result in bad "full shuffle" data movement patterns, and it leaves no room for manual tuning.
This refers to CryptDB, a paper that is also on the reading list. The key factor that helps is the ciphertext expansion. It works by supporting a limited subset of SQL operations on encrypted data, where the data is stored in the most secure format that can still support the requested operation. In this way, the database only ever sees encrypted data, though there are some assumptions about keys and the threat model that I find slightly unconvincing.
5.Placement and Migration
Resource migration and placement is a major challenge when designing multi-tenant services such as Relational Cloud. Nevertheless, Relational Cloud overcomes this hurdle, as it allows arbitrary placement of new databases or workloads on some designated nodes for applications. Relational Cloud introduces Kairos to take care of autonomous elasticity. It monitors load and the current working set of the database, and adds or removes nodes in response. Kairos can also migrate data partitions to take care of load imbalances. It predicts the I/O performance to figure out the capacity of the system, which I'd like to hear more about.
Finally, through their experiments using the TPC-C benchmark, the authors help us understand the performance of Relational Cloud by measuring multi-tenancy, scalability, the cost of privacy, and the impact of the latency added by CryptDB and the cloud front end.
I was curious to know what the tuple graph would look like for OLAP workloads and whether OLAP workloads can even be easily partitioned across physical nodes. With respect to Kairos, the key component is the Combined Load Predictor, which models consolidated CPU, RAM, and disk for multiple workloads. It is not clear if it can predict sudden spikes or other non-historical workload characteristics. Again, the effects of consolidation on individual query latency would have been an interesting aspect to know about, along with the effect of Kairos mispredictions on individual query latency.
This paper addresses the future of database systems: the database as a service for the cloud. With many of the recent offerings from Amazon, Google, Microsoft, VMWare, and a host of startup companies, we can see that this is an exciting, emerging field. The reasons behind this transformation are simple: economies of scale. By buying more machines at a discounted rate, and managing those machines more efficiently, service providers can already save money. But, because customers will rarely all see peak use at the same time, the service providers can actually have less hardware than they would need if they were to simply sum the peak demand of all their customers. Scale can also lead to lower cost, even for tiny customers: a large number of small databases can be hosted on a single machine, allowing those users to get the performance they need at a fraction of the cost of setting up their own machine.
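A small worked example makes the point about non-overlapping peaks concrete. The tenant names and load numbers below are invented purely for illustration.

```python
# Hypothetical load (in arbitrary units) of three tenants over four time slots.
tenant_load = {
    "a": [10, 80, 20, 10],
    "b": [70, 10, 15, 20],
    "c": [15, 20, 75, 10],
}

# Capacity needed if each tenant is provisioned for its own peak.
sum_of_peaks = sum(max(series) for series in tenant_load.values())

# Capacity needed if the tenants share hardware: the peak of the combined load.
combined = [sum(values) for values in zip(*tenant_load.values())]
peak_of_sum = max(combined)

print(sum_of_peaks, peak_of_sum)
```

In this toy data the shared peak (110) is less than half the sum of the individual peaks (225), because the three tenants never peak in the same slot.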
The system uses unmodified DBMS engines as its back end and partitions customers' databases and data across the back-end nodes. The system is under construction at the time of publication, however, so we don't have too much detail about how all the pieces fit together. We do know that there is a resource monitor that must keep track of how many system resources are being consumed on each node and make sure that performance still meets service level agreements. If needed, the system must be able to either repartition data (spread the data across several nodes) or migrate databases (move an entire database from one node to another). This is in addition to allowing a single database to scale across multiple nodes.
One major contribution of this paper is CryptDB. This system allows the database to work on encrypted data, at an acceptable (~25%) performance hit. When you compare this to the extreme cost of data leaks from hacks or mismanagement, this cost is well worth it. It uses multiple levels of encryption (so-called onion layers) so that different operators can be applied to the data while it is still encrypted.
Problem and solution:
The paper discusses DBaaS, which takes over administration and fault-tolerance details to make life easier for users. It is attractive because, economically, sharing the service among all users yields lower hardware and energy costs than having each user run everything themselves. In addition, the cost in a DBaaS is proportional to usage. So a DBaaS reduces operational costs while offering good performance.
The problem proposed is that early DBaaS does not solve three important challenges: efficient multi-tenancy, elastic scalability and database privacy. The solution mentioned is Relational Cloud, which is a new DBaaS. Its core is workload-awareness. It solves the three challenges:
It uses a workload-aware approach to multi-tenancy to minimize the number of machines used. The approach is to use a single database server on each machine to host multiple logical databases. Relational Cloud allocates the databases dynamically according to the optimization formulation and the cost estimation model. To support this periodic redistribution, Relational Cloud has a live migration mechanism for databases.
It uses a graph-based data partitioning algorithm to achieve near-linear elastic scale-out. Data items are mapped to nodes so as to minimize the number of multi-node transactions.
It supports an adjustable security scheme for encrypted data to address the privacy concerns. Relational Cloud's CryptDB component applies multiple encryption levels to different data and queries, addressing the problem with good performance.
The main contribution is Relational Cloud’s privacy component, CryptDB. It addresses the privacy concerns of large-scale use at an acceptable cost in performance and allows administrators to perform tuning tasks without any visibility into the stored data. The approach is adjustable security, which encrypts the data with different cryptographic techniques (RND, DET, OPE, HOM) in layers like an onion. The onion key is used to decrypt the data.
Though the paper is good, I still have a question. The key component, the workload-aware partitioner, is based on the data nodes in the graph. To limit the size of the graph, it samples tuples and transactions. The sampling strategy is not described, and a sample that does not reflect the whole dataset could produce a bad partition.
This paper introduces Relational Cloud, a “database-as-a-service” (DBaaS). It is a cloud service that moves much of the operational burden of provisioning, configuration, scaling, performance tuning, backup, privacy, and access control from the database users to the service operator, offering lower overall costs to users.
There are three important challenges: efficient multi-tenancy, elastic scalability, and database privacy.
The goal of efficient multi-tenancy is to minimize the number of machines required while meeting application-level query performance goals. The method in the paper uses a single database server on each machine, which hosts multiple logical databases. The authors developed a novel resource estimation and non-linear optimization-based consolidation technique.
For scalability, it must support scale-out, where responsibility for query processing is partitioned among multiple nodes to achieve higher throughput. They use a graph-based partitioning method to spread large databases across many machines.
For privacy, the authors developed the notion of adjustable privacy and showed how using different levels of encryption layered as an “onion” can enable SQL queries to be processed over encrypted data.
A DBaaS makes hardware and energy costs much lower, and the cost is proportional to actual use. It can give people better-performing hardware for research and education purposes and supports much of the same functionality as a DBMS.
A cloud platform gives users less control over some resources and thus less flexibility. It would also be better to give users a visual interface that makes the cloud resources easier to use.
Services are becoming the big new business model, allowing companies to manage the tasks of purchasing and maintaining servers and infrastructure for clients. Databases are also being offered as a service (DBaaS = DataBase as a Service). However, this paper focuses on solving three problems that the authors say are unaddressed in current offerings (efficient multi-tenancy, elastic scalability, and database privacy) and integrates their solutions into what they call Relational Cloud.
Multi-tenancy means that multiple users’ data may be stored on the same physical machine. In the past this problem has been solved using VMs: different users’ data would be on separate database instances in different VMs. However, using VMs is inefficient since each instance needs to take resources to run its own operating system. Relational Cloud instead runs only a single database server on each machine, which can store multiple logical databases belonging to different users. Relational Cloud has a system called Kairos which tries to estimate the resources needed to run a certain set of database instances on a single machine. The system can then use Kairos to distribute databases to balance load.
Elastic scalability means that Relational Cloud is aware of the resource utilization of its nodes and will spin up new servers to accommodate increased use. To figure out how to partition a user’s data, the system keeps track of tuples that are accessed together using a graph, and then tries to solve a min-cut problem to partition the data. Upon receiving queries, the central node creates a query plan based on where the relevant data is located, as in a distributed database.
Users using other people’s servers want their data to be private, even from those managing the server. Relational Cloud uses CryptDB as its solution to running queries on encrypted data. CryptDB uses the observation that some operations (like equality checks and comparisons) can be done if the data is encrypted in a certain way.
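The co-access graph and the min-cut objective can be sketched on a toy workload. This is a brute-force illustration over four tuples, using an undirected graph and a balanced two-way split as simplifying assumptions; a real system would need a scalable graph partitioner. The tuple ids and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_access_graph(transactions):
    """Edge weight = number of transactions touching both tuples."""
    edges = Counter()
    for tuples in transactions:
        for a, b in combinations(sorted(set(tuples)), 2):
            edges[(a, b)] += 1
    return edges


def cut_weight(edges, assignment):
    """Total weight of edges whose endpoints live on different nodes."""
    return sum(w for (a, b), w in edges.items() if assignment[a] != assignment[b])


def best_balanced_split(edges, items):
    """Try every balanced two-node split; only feasible for tiny inputs."""
    items = sorted(items)
    best = None
    for left in combinations(items, len(items) // 2):
        assignment = {i: (0 if i in left else 1) for i in items}
        weight = cut_weight(edges, assignment)
        if best is None or weight < best[0]:
            best = (weight, assignment)
    return best


# Each inner list is the set of tuples one transaction accessed (made-up data).
transactions = [["u1", "u2"], ["u1", "u2"], ["u3", "u4"], ["u3", "u4"], ["u2", "u3"]]
edges = co_access_graph(transactions)
weight, assignment = best_balanced_split(edges, {"u1", "u2", "u3", "u4"})
print(weight, assignment)
```

Here the cut weight counts exactly the transactions that would span both nodes, so the best split keeps `u1` with `u2` and `u3` with `u4`, leaving a single multi-node transaction.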
The paper does a good job of pointing out specific downfalls of current DBaaS systems, and proposing solutions.
Does the Relational Cloud have the ability to train its partitioning algorithm on a past workload? Otherwise, the performance will not be good until it can figure out a good partition using queries that take a long time.
It would be interesting to compare the cost of decreased performance with the cost of managing one’s own servers: how much more money does it cost to make up the gap in performance?
The paper introduces a new transactional database-as-a-service (DBaaS), called Relational Cloud. The development of cloud computing opened the possibility of outsourcing even the DBMS to the cloud. It has become much more viable these days for businesses to operate their DBMS in the cloud. Relational Cloud is a new DBaaS that attempts to address problems that existing solutions such as Amazon RDS and Microsoft Azure have.
The authors point out that the existing solutions do not address three problems: efficient multi-tenancy, elastic scalability and database privacy. Their answers to these problems in the paper are to use a single database server on each machine instead of VMs, the use of workload-aware partitioner and the use of CryptDB for an additional encryption layer, respectively. These problem statements are reasonable and well-explained, but I am dubious about some of their approaches.
The paper states that Relational Cloud uses an unmodified DBMS as its back-end database engine, meaning the back-end database must be a parallel database. At this point, I am not sure why the paper talks about “DB-in-VM”. The performance difference shown in the paper may be solely due to the database being hosted in a VM as opposed to running without one. Nonetheless, how should a cloud service provider manage its DBaaS without the use of VMs? I think it is questionable from the practical point of view, yet the paper does not seem to address it. I also question the 22.5% performance reduction from using CryptDB. It may be reasonable as the paper states, but the number seems a bit too big to be acceptable in many cases.
In conclusion, Relational Cloud is a new DBaaS that tries to resolve many challenges that arise when bringing a DBMS into cloud computing. It is mostly on point in its discussion of these challenges, and its solutions are reasonable. A few points in the approach seem questionable, not addressing the practical point of view well enough.
This paper introduces Relational Cloud. Relational Cloud provides a setup for offering a database as a service within the cloud. It is composed of several smaller components. The two main ones are Kairos and CryptDB. Kairos is a monitoring and consolidation engine that can accurately estimate the hardware requirements of workloads to provide better load balancing. In a cloud-provided database environment, it is important to be able to predict the disk, RAM, and CPU requirements of a workload, since not all workloads are the same. Kairos is able to monitor resource usage, predict loads, and move database partitions. Relational Cloud tracks query execution traces as a graph in order to provide a workload-aware partitioning strategy.
Due to the presence of multiple users and multiple database partitions, privacy is a concern. CryptDB provides some privacy guarantees with an acceptable overhead cost. The main idea of CryptDB is to encode a value within several layers of encryption with the outermost layer being the most secure. As tuples are retrieved, they are decoded by the JDBC application and the database. The assumption is that eventually the database will converge to the security level that the application requires.
An interesting point made in the paper is that the overhead of CryptDB is acceptable. Using the TPC-C benchmark, the paper observed a drop of about 22% in throughput, but deemed it acceptable due to the privacy guarantees. Furthermore, the paper remarked that the throughput overhead is due to the fact that TPC-C is a high-contention workload and thus is not representative of a real-life workload, where contention will be lower. If this is the case, I would have liked to see them use a more representative benchmark. It seems that their work shows the maximum latency bound, but knowing the minimum latency bound is also useful. If it is not significantly lower than the maximum, their claims may not be valid.
This paper introduces "Relational Cloud", a new scalable database-as-a-service (DBaaS). DBaaS aims at moving much of the operational burden from the database users to the service operator. This paper states that previous DBaaS offerings in the industry have achieved such functionality but still leave some problems unaddressed. These challenges are discussed and overcome in this paper.
The three major challenges for DBaaS are as below.
1. efficient multi-tenancy:
This is essentially a resource allocation problem that needs to consider the locality of machines and also the temporal variation of workloads. The design of Relational Cloud is to put one database server on each machine and allocate the databases across the machines. They formulate this as a non-linear optimization problem with a cost model.
2. elastic scalability
Problems arise when a database workload exceeds the capacity of a single machine. There should be a good way to partition databases for scale-out. Relational Cloud uses graph partitioning to automatically analyze complex query workloads and map data items to nodes to minimize the number of multi-node transactions/statements.
3. database privacy
The database administrators should not be able to see a user’s data, since this would raise privacy concerns. In this paper, they propose CryptDB, which employs adjustable security that applies different encryption levels depending on the types of queries that users run.
The major contributions of this paper include:
1. It lists three major challenges that were not solved in previous DBaaS offerings.
2. It proposes new approaches and implements them in Relational Cloud to address the three aforementioned challenges.
3. It describes the system design of Relational Cloud and performs some experiments to demonstrate the effectiveness of Relational Cloud.
Motivation for Relational Cloud:
The purpose of a transactional database as a service is to move the operational burdens of provisioning, configuration, scaling, performance tuning, backup, privacy, and access control from the database users to the service operator, resulting in lower overall costs to users. Previous DBaaS offerings did not provide efficient multi-tenancy, elastic scalability, and database privacy, which are necessary for the database to be cost-effective for service providers and appealing to users. The key technical features of Relational Cloud are higher consolidation and better performance from a workload-aware approach to multi-tenancy that identifies the workloads that can be co-located on a database server, near-linear elastic scale-out for complex transactional workloads from a graph-based data partitioning algorithm, and an adjustable security scheme that lets SQL queries, including ordering operations, aggregates, and joins, run over encrypted data.
Details on Relational Cloud:
Relational Cloud emphasizes workload awareness, which allows the system to obtain useful information for optimizations and security functions by monitoring query patterns and data accesses, resulting in reduced configuration needs for users and operators. Efficient multi-tenancy determines the best way to serve a set of databases and workloads from a set of machines, minimizing the number of machines while meeting the application's query performance goals. This is achieved by making the system understand the resource requirements of individual workloads and how they combine when sharing machines, and by using the temporal variations of each workload advantageously to maximize hardware utilization and avoid overcommitment. Relational Cloud’s solution uses a single database server on each machine to host multiple logical databases. Relational Cloud determines the placement of databases on machines periodically with a non-linear optimization formulation and a cost model for the combined resource utilization of many databases on a machine. Relational Cloud can also perform live migration of databases between machines. For elastic scalability, a good DBaaS supports databases and workloads of different sizes. When the workload exceeds the capacity of a single machine, the DBaaS must support scale-out so that the responsibility for query processing is partitioned among multiple nodes to achieve higher throughput. Relational Cloud does this with a workload-aware partitioner that uses graph partitioning to automatically analyze query workloads and map data items to nodes in a way that minimizes the number of multi-node transactions and statements, which incur significant overhead and are a main limiting factor for linear scalability. In relation to privacy, a cloud-deployed database on its own seems to be insufficient.
Relational Cloud uses CryptDB, which applies adjustable security, employing different levels of encryption for different types of data based on the types of queries run by the user. CryptDB largely eliminates privacy concerns by executing queries over encrypted data.
In the experimental results, real world databases could be consolidated by between 6 and 17 times with Relational Cloud because its methods exploited statistical independence and uncorrelated workload load spikes, and no server experienced more than 90% peak load. Relational Cloud, with a single DBMS instance running multiple databases, obtains 6 times more throughput for a uniform load and 12 times more throughput for a skewed load than running each database in its own DBMS instance. This is mainly because one DBMS is better at coordinating access to resources than the OS or the VM hypervisor, especially in the fact that multiple databases in one DBMS can share a single log that is adjusted more easily in a shared buffer pool. In terms of scalability, Relational Cloud is confirmed to find the optimal partition, providing a 7.7x speedup. There is a throughput reduction of about 40% with CryptDB, but the linear scalability from partitioning compensates for overhead from additional servers. Thus, there is an overall reduction of hardware use, from 3.5 to 10 times.
Strengths of the paper:
I liked that the paper used real world datasets of Wikipedia, Wikia.com, and Second Life to conduct experiments to compare Relational Cloud with DBMS in VM. It was also enjoyable to see the paper discuss the applications of CryptDB, which we read about in another paper, in Relational Cloud. I also felt that the paper does a clear and concise job of describing the three main features that make Relational Cloud novel: efficient multi-tenancy, elastic scalability, and privacy.
Limitations of the paper:
I would’ve liked to see more discussion of how database partitioning is done, perhaps with some pseudocode. Also, the 40% throughput reduction from CryptDB seems very high, even though the partitioning compensates for it. I would have liked to see a discussion of other approaches to data security that may not incur as large a throughput reduction as CryptDB. I feel that the paper could also have included more discussion of what applications are anticipated to use Relational Cloud.
This paper's strengths are in its way of targeting the three problems in previous work and systematically providing steps in the direction of improving performance for these tasks. The use of cryptographic protocols, while it does hurt performance, provides users with increased privacy, and the options in CryptDB let users choose the tradeoff between performance and security. Similarly, non-linear optimization for resource estimation and graph-based partitioning for distributing large databases across many machines allow database technologies to use resources in the cloud more effectively.
There are drawbacks in the methods they suggest to solve the privacy issue. Homomorphic encryption is not practical for real-world applications, so CryptDB has to let users make a tradeoff between various forms of encryption and the efficiency of their programs. CryptDB does not perform well for several types of aggregate functions. For some types of applications this could cause a more severe performance hit than this paper suggests in Figure 5. They state that, although they have several real-world data sets, the figure is generated only for the TPC-C benchmark. Using their other data sets would have been a more insightful addition to their experiments section.
Part 1: Overview
This paper presents the idea of building a transactional “database as a service” system called Relational Cloud. Providing the database as a utility service saves users the time spent on provisioning, configuration, scaling, performance tuning, backup, privacy, and access control. Relational databases are an indispensable component in most computing environments. Without the economies of scale of a shared service, the hardware and energy costs that users incur keep them from focusing on their original development goals. Efficient multi-tenancy is the first challenge in building a database service. Even for an experienced DBA it is hard to answer the question: given a set of databases and workloads, what is the best way to serve the queries? The second challenge is elastic scalability: a good DBaaS must support databases and workloads of different sizes. The last challenge is privacy: when all users’ data are stored together, each individual user’s privacy must be protected, and otherwise this limits the degree of trust users are willing to place in the system. The placement and migration component addresses problems including monitoring the resource requirements of each workload and predicting the load that multiple workloads will generate on physical servers. Relational Cloud guarantees the privacy of stored data by encrypting all tuples. The key challenge is executing SQL queries over the resulting encrypted data, and doing so efficiently. The key idea in their approach is a notion called adjustable security.
Part 2: Contributions
Relational Cloud uses unmodified database engines as the back-end query processing and storage nodes. The set of back-end machines can change dynamically in response to load. Each tenant of the system, defined as a billable entity, can load one or more databases.
Database partitioning is implemented in the DBaaS system to scale a single database across multiple nodes, which is useful when the load exceeds the capacity of a single machine. Database partitioning also enables more granular placement and load balancing on the back-end machines.
Part 3: Drawbacks
They have only developed a distributed transaction coordinator along with the routing, partitioning, replication, and CryptDB components. They are still in the process of integrating all of the components into a single coherent system.
Relational Cloud is introduced in this paper as a new transactional "database-as-a-service". It offers lower overall costs to users by moving much of the operational burden of provisioning, configuration, scaling, performance tuning, backup, privacy, and access control from the database users to the service operator. There are three major challenges in this area: efficient multi-tenancy, elastic scalability, and database privacy. To meet those targets, three key features are discussed in this paper.
The system is composed of the following parts. On the trusted platform, we have the client nodes, which run the user applications and the JDBC client. On the untrusted platform, we have the front-end nodes and the admin nodes, which contain the partitioning engine and the placement and migration engine, and which talk to CryptDB over encrypted communication. The database uses partitioning (1) to scale a single database to multiple nodes and (2) to enable more granular placement and load balancing on the back-end machines compared to placing entire databases.
Resource allocation is a major challenge when designing a scalable database like this. To solve this problem, a new database and its workload are initially placed arbitrarily on some set of nodes. The placement functionality is then realized by three components: (1) a resource monitor, (2) a combined load predictor, and (3) a consolidation engine.
The design strongly addresses the three major issues for cloud database systems: efficient multi-tenancy, elastic scalability, and database privacy.
The completeness of the design is questionable for the following reasons:
1. They didn't test the database against different TPC-C query mixtures.
2. The system's efficiency cannot be established from a single system resource setup.
Overall, the design and the discussion are great, which makes the experiment section feel a little too short.
Relational Cloud is built on the idea of “database-as-a-service” (DBaaS). It means that everyone can use a cloud database, saving and retrieving their data from the cloud as needed. There are several benefits: database users don’t have to worry about the load on their database, and they pay according to how much they use.
For those who provide this service, there are two main concerns, according to this paper. The first is the load on their servers. To provide this service and make a profit, they should use as few machines as possible while still providing good service. Relational Cloud uses a workload-aware strategy. It has a front-end component that analyzes query execution traces to identify tuples that are accessed together, and this information is used for data partitioning. The authors also developed an engine called Kairos for resource monitoring and load balancing. The other problem is privacy. This is important because business data is valuable. Customers of Relational Cloud want to keep their data secure not only from other customers but also from Relational Cloud administrators. Relational Cloud uses a method called adjustable security. Adjustable security is possible because there are different encryption technologies with different properties. For example, DET (deterministic encryption) lets the server check whether two values are equal to each other, so with DET the server can support equality predicates in queries. The authors built CryptDB based on this idea.
DBaaS is an interesting idea, given that a lot of other things are already provided as services. This paper identifies two main problems of DBaaS, load balancing and privacy, and then solves them. This is the main contribution.
The most obvious weakness is that the database is less configurable. We have seen so many databases in this course, many of which are built to suit a specific workload. When you put your database in the cloud, you cannot adjust it to meet your needs.
This paper introduces Relational Cloud, a transactional database-as-a-service. The system aims to make this service more cost-effective, focusing on multi-tenancy, a graph-based data partitioning algorithm, and an adjustable security scheme.
One of their features seems to be the fact that they have multiple database systems that can each host multiple logical databases. Their system uses a non-linear optimization module along with a cost formulation that determines the placement of databases on individual systems. In addition, they have developed a workload-aware partitioner that uses graph partitioning to analyze complex query workloads and map data items to corresponding nodes in order to save data access time. The front end also has a component that analyzes the queries to determine which tuples are accessed together. Their service uses this information to build the graphs on which the workload is partitioned. The authors have also introduced CryptDB, which implements adjustable security through layers of increasingly strong encryption. One of the key features of this database system is that the security adjusts until it converges to a given level of privacy based on the queries executed on the server.
One of the advantages is that the system periodically checks for the placement of the databases so in case the workload changes, the placement of the individual databases will be changed accordingly. The system also allows for partitions to be migrated without any down time.
One of the things that I do not completely agree with is the claim that existing vendors such as Amazon RDS and Microsoft SQL Azure do not provide database privacy; in fact, it is one of the key features that they advertise to the customer. So I am presuming the idea of this paper was to talk about encrypting the data in the DBMS itself as opposed to securing access. They specify the performance reduction as 22.5%, but in my opinion that seems like a huge hit to performance at peak traffic times, when it may be most evident.
Relational Cloud: a Database Service for the cloud paper review
In this paper the author introduces the topic of large-scale, multi-node DBaaS. Previously, each user had to purchase and maintain their own independent machines to run a transactional database. But that would result in many of the individual database nodes sitting idle most of the time. By comparison, a cloud-based database-as-a-service design could move most of that burden to the service provider. Yet there are three main challenges in establishing such a service: efficient multi-tenancy, elastic scalability, and database privacy.
The main contribution of this paper is the design of the Relational Cloud, which is an implementation of a data fission approach from the MIT databases group. In its design, scalability is achieved through data partitioning. In short, Relational Cloud uses a graph partitioning strategy to identify min cuts on a graph representing query execution traces, basically trying to group together data that is used together. The major consideration here is speed. It is expected to be slow to turn a cloud database worth of tuples into a graph, run the partitioning algorithm, and then move the data around. Ideally, the system would be able to do this in reaction to load spikes (on the order of minutes), but that's unlikely. Unless the algorithm is weighted properly too, it could result in bad "full shuffle" data movement patterns.
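To make the min-cut idea above concrete, here is a minimal Python sketch. It is not the paper's implementation: the function names (`build_coaccess_graph`, `cut_weight`, `best_balanced_bisection`) and the toy traces are hypothetical, and the exhaustive search over balanced two-way splits is only a stand-in for a real graph partitioner, which would use heuristics to handle large graphs.

```python
from collections import Counter
from itertools import combinations


def build_coaccess_graph(traces):
    """Edge weight = number of transactions that touch both tuples."""
    weights = Counter()
    for accessed in traces:
        for a, b in combinations(sorted(set(accessed)), 2):
            weights[(a, b)] += 1
    return weights


def cut_weight(weights, assignment):
    """Total weight of edges whose endpoints are assigned to different nodes."""
    return sum(w for (a, b), w in weights.items()
               if assignment[a] != assignment[b])


def best_balanced_bisection(weights, tuples):
    """Exhaustive search over balanced 2-way splits (toy inputs only)."""
    tuples = sorted(tuples)
    half = len(tuples) // 2
    best = None
    for left in combinations(tuples, half):
        left_set = set(left)
        assignment = {t: (0 if t in left_set else 1) for t in tuples}
        cost = cut_weight(weights, assignment)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


# Hypothetical trace: each set is the tuples touched by one transaction.
traces = [
    {"u1", "u2"}, {"u1", "u2"},
    {"u3", "u4"}, {"u3", "u4"},
    {"u2", "u3"},
]
weights = build_coaccess_graph(traces)
cost, assignment = best_balanced_bisection(weights, {"u1", "u2", "u3", "u4"})
print(cost, assignment)
```

In this toy trace only the single transaction touching both `u2` and `u3` crosses the chosen split, so four of the five transactions stay on one node. The reviewer's concern about speed applies directly: the graph grows with the number of tuples and traces, which is why any practical version has to sample and use a heuristic partitioner rather than enumerate splits.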
Relational Cloud introduces Kairos to take care of autonomous elasticity. Kairos is in charge of monitoring load and the current working set of the database, and adding or removing nodes in response. It can also migrate data partitions to take care of load imbalances. Lastly, Kairos can apply pretty deep modeling of I/O performance to figure out the capacity of the system, which I'd like to hear more about.
Database privacy for each individual user is achieved by using adjustable security. This refers to CryptDB, which essentially is a way of doing a limited subset of SQL operations on encrypted data, where the data is stored in the most secure format that can still support the requested operation. By enforcing this rule, the database only ever sees encrypted data, though there are some assumptions about keys and the threat model that still seem slightly unconvincing.
However, there are still some weaknesses in this paper:
1. Lack of a billing mechanism: it is well known that each service provider will try to bill users as linearly in their usage as possible, but the author never mentions how to determine the billing for each user, nor proposes any metrics for it.
2. Possible invasion of privacy: for integer values, if the user has to pass the user-defined encryption function to the database, it is possible for the database administrator to enumerate all possible numbers for a certain row by brute force and recover the content of that row.
In this paper, the author introduces Relational Cloud, a scalable relational database-as-a-service for cloud computing environments. DBaaS is a cloud-based approach to the storage and management of structured data. The paper introduces three challenges for DBaaS: efficient multi-tenancy, elastic scalability, and database privacy.
Relational Cloud periodically determines which databases should be placed on which machines using a novel non-linear optimization formulation. It uses a cost model that estimates the combined resource utilization of multiple databases running on a machine.
A DBaaS must support databases and workloads of different sizes and support scale-out, where responsibility for query processing is partitioned among multiple nodes to achieve higher throughput. The paper presents a graph-based partitioning method to spread large databases across many machines.
The paper presents the notion of adjustable security and shows how layering different levels of encryption can enable SQL queries to be processed over encrypted data. The paper also discusses CryptDB, which achieves this goal.
DBaaS delivers database functionality similar to what is found in relational database management systems, while providing a flexible, scalable, on-demand platform that's oriented toward self-service and easy management.
Because the node runs remotely, the user has fewer options for controlling the work, such as a lack of control over the network, latency, and application failures. Also, the user cannot change some of the configuration of the physical node, which costs a lot of flexibility.
The authors introduce Relational Cloud: a relational database built for cloud environments. The authors work to address the following issues: efficient multi-tenancy, scalability, and data privacy. The authors solve the first issue by using one database instance per machine to host a number of logical databases, rather than using multiple database instances isolated in virtual machines, which avoids resource fragmentation and overhead. To address scalability, the authors use a shared-nothing architecture which partitions data across machines based on user workloads. To actually assign data partitions to machines, the authors use load predictions and solve an optimization problem. Relational Cloud provides some degree of data privacy by using multiple forms of encryption and minimizing the amount of data decryption required to answer user queries.
One of the key insights made by the authors was that data could be selectively decrypted to answer user queries. Randomized encryption provides the strongest level of encryption, and typical applications are likely to have most of their data in this state, as only the columns that WHERE clauses operate on need weaker forms of encryption (deterministic, homomorphic, or order-preserving).
Though this system addresses a number of issues faced by cloud tenants, this paper had some shortcomings:
* The authors did not discuss the amount of time that went into solving their optimization problems. This may be a scalability bottleneck and the benefit of having such accuracy is also not discussed (vs some heuristic)
* The authors seem to assume that their load predictor is accurate and don't have mechanisms for backup behavior for bad predictions
* When the authors discuss latency, they discuss average latency. However, the authors make no mention of variance or tail-latency which are often more important for OLTP cloud applications.
This paper introduces a “database as a service” system called Relational Cloud. In the past, many small companies needed to do many of the operational tasks of maintaining and scaling databases by themselves. Database as a service promises to take care of all of these needs for users so that they can just focus on the schema and efficiency of the application. However, database as a service (DBaaS) systems face multiple challenges when implementing the service. They need to have efficient multi-tenancy, elastic scalability and database privacy. Relational Cloud implements all of these features and is still cost effective for the service providers.
For efficient multi-tenancy, we need to figure out how to service the different workloads using the machines that are available. This process must achieve the best possible performance and use the fewest machines if possible. Relational Cloud has a mechanism that performs migrations of databases onto another machine to make this process as efficient as possible. Another consideration is the scalability of the service. The DBaaS should be able to run workloads of all different sizes with a linear scale out. Finally, since users want their data private on the servers, privacy is a big concern when designing DBaaS. Relational Cloud implements an encryption layer called CryptDB that helps protect users’ data on the cloud.
The following are the positives of this paper:
1. The paper addresses the main problem of having a database as a service: scalability. It also suggests how to attack this problem and still provide security for the users.
2. The authors also explain how machines can be used to minimize copying of information and communication so that scaling can be more efficient.
Even though DBaaS is a good idea and it will help administrators as they will not need to worry about the operational tasks for databases, I still see some weaknesses with the paper:
1. The authors do not go in depth about how the new databases are spread out across the servers after they are monitored. What is the algorithm that is used to know what is optimal?
2. It is mentioned that encryption decreases the throughput by about 20%. Is there any way to still provide a database service without needing to encrypt?
This paper discusses Relational Cloud, a system that offers relational database functionality as a cloud-based service. Relational Cloud offers users the functionality of a full RDBMS without the hardware and administrative costs associated with obtaining, configuring, and maintaining a privately owned setup. While this service is offered by other systems such as Microsoft Azure, Relational Cloud specifically seeks to address the issues of efficient multi-tenancy, elastic scalability, and database privacy.
To achieve effective multi-tenancy, Relational Cloud runs one instance of a DBMS on each machine and hosts several databases to maximize usage. This achieves much higher performance than hosting each database on a separate VM, as the latter approach requires more space for multiple instances of the DBMS and is less efficient, since the OS is not as good at coordinating the use of shared resources between instances. Hosting all databases on one instance of the DBMS allows the system to coordinate buffer pages effectively and takes advantage of batched I/O between databases to improve performance. In order to achieve high scalability, Relational Cloud uses sophisticated analysis software to create a graph of which tuples in which databases tend to be accessed together. It partitions data across nodes so that the majority of queries only need to contact one node in the system.
In order to achieve privacy, Relational Cloud relies on CryptDB. Information on the cloud is encrypted several times with successively more secure types of encryption. Randomized encryption is the outer layer of encryption, providing security against attacks such as adaptive chosen plaintext attacks. Because the ciphertext of a random encryption ideally provides no information about the plaintext, data that has been encrypted in this fashion cannot be used for comparisons or orderings and is not useful for answering queries. CryptDB allows the client to send a query along with the key used for decrypting the outer layer of encryption to the database on the cloud. This key allows the database to decrypt the first layer and access the data in a form that provides less security but that also allows for selection and ordering. In this fashion, Relational Cloud provides a level of security that users can trust.
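The layer-peeling step described above can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration and none of it is CryptDB's actual construction: the inner "deterministic" layer is stood in for by an HMAC tag (which cannot be decrypted, unlike a real cipher), the outer randomized layer is a hash-based XOR keystream that is not secure, and the names `det_layer`, `add_rnd_layer`, and `peel_rnd_layer` are hypothetical.

```python
import hashlib
import hmac
import os


def _keystream(key, nonce, length):
    """Toy keystream from SHA-256 in counter mode (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def det_layer(key, value):
    """Stand-in for deterministic encryption: equal inputs give equal outputs."""
    return hmac.new(key, value.encode(), hashlib.sha256).digest()


def add_rnd_layer(key, inner):
    """Outer randomized layer: a fresh nonce makes equal inputs look different."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(inner))
    return nonce + bytes(a ^ b for a, b in zip(inner, stream))


def peel_rnd_layer(key, outer):
    """Remove the outer layer, exposing the deterministic layer underneath."""
    nonce, body = outer[:16], outer[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


det_key = b"det-key (never leaves the client)"
rnd_key = b"rnd-key (released for equality queries)"
names = ["alice", "bob", "alice"]

# What the server stores at rest: both layers applied by the client.
stored = [add_rnd_layer(rnd_key, det_layer(det_key, n)) for n in names]

# For an equality query the client releases rnd_key and an encrypted constant;
# the server peels the outer layer and compares ciphertexts, never plaintext.
peeled = [peel_rnd_layer(rnd_key, c) for c in stored]
probe = det_layer(det_key, "alice")
matches = [i for i, c in enumerate(peeled) if hmac.compare_digest(c, probe)]
print(matches)
```

At rest, the two rows holding the same name have different stored values, so the server cannot link them. Only after the outer key is released do the equal values become comparable, which is the security-for-functionality trade described in the paragraph above.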
My chief concern with this paper is that they seem to believe that the 40% decrease in throughput they expect when running CryptDB with their service will not impact clients significantly. Many of the other systems we have read about went to great lengths to achieve performance increases in the 40-50% range, so blowing this off as something that users won’t really mind seems naïve to me. Additionally, I wish they would have discussed what kind of guarantees they provide their customers. It seemed that they could take advantage of some of the innovations we have read about in other papers to improve performance, until I realized that most of these innovations were aimed at specific types of workloads and business needs. Because Relational Cloud provides a service to many different types of clients, they can’t make broad assumptions about the type of workloads being processed, which limits their flexibility to innovate and improve in the future.
This paper is an introduction to Relational Cloud, which is a database as a service (DBaaS) that offloads a lot of work from database administrators to the service provider. The three main goals of Relational Cloud are: 1) managing multiple query workloads at the same time efficiently, 2) offering scalability effectively for the client, and 3) ensuring absolute privacy of data while not hurting performance too much.
Managing multiple query workloads at the same time efficiently is done using their own resource monitor, combined load predictor, and consolidation engine. The goal of the resource monitor is to automatically collect statistics on databases to help estimate things like the RAM required for a workload the next time something similar is seen. This works hand in hand with the combined load predictor, which predicts how much CPU, RAM, and disk each workload will require on one physical server, to help plan where to send which tasks, which tasks to run together, and things of that nature. Lastly, this segues right into the consolidation engine, which decides where to place partitions to most effectively balance load.
Scale-out is addressed in largely the same way, as the same resource management and partitioning setup is used to determine what to put on which machines when more are added for scale-out. Scale-out is a nice feature to have for a DBaaS because you can imagine how nice it is for a DBA to not have to worry about securing and setting up more hardware for a growing database.
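As a rough illustration of how a combined load prediction can feed a placement decision, here is a small Python sketch. It deliberately swaps in a greedy first-fit heuristic over a single resource, not the non-linear optimization the paper describes, and the names (`combined_peak`, `first_fit_placement`), the sample CPU series, and the capacity of 100 are all made up for the example.

```python
def combined_peak(series_list):
    """Peak of the element-wise sum of equally long load time series."""
    return max(sum(samples) for samples in zip(*series_list))


def first_fit_placement(workloads, capacity):
    """Greedy first-fit-decreasing placement (a heuristic stand-in).

    A workload joins the first machine whose combined peak load stays
    within capacity; otherwise a new machine is opened.
    """
    machines = []  # each machine is a list of workload names
    order = sorted(workloads, key=lambda name: max(workloads[name]), reverse=True)
    for name in order:
        for machine in machines:
            loads = [workloads[m] for m in machine] + [workloads[name]]
            if combined_peak(loads) <= capacity:
                machine.append(name)
                break
        else:
            machines.append([name])
    return machines


# Hypothetical CPU usage (percent of one server) over four time windows.
workloads = {
    "db_a": [80, 10, 10, 10],
    "db_b": [10, 80, 10, 10],
    "db_c": [10, 10, 80, 10],
    "db_d": [70, 10, 10, 10],
}
placement = first_fit_placement(workloads, capacity=100)
print(placement)
```

Each of the first three databases peaks at 80% on its own, so sizing by individual peaks would give every database its own server. Because their spikes fall in different windows, the combined series never exceeds 100% and they fit on one machine, while `db_d`, which spikes at the same time as `db_a`, is pushed to a second one.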
Lastly, they addressed privacy by using multiple layers of encryption in an onion style tactic where the layers are of varying levels of security. The information stored in the DB is always encrypted and a decryption key is passed in with the queries so that the DB can run the query over the encrypted data, and then the encrypted data is sent back. They also allow queries to decrypt specified layers of the onion so that it can be evaluated entirely at a certain level of encryption. For example you could query on the totally decrypted data or query based on a less secure encrypted version of the data or on the completely encrypted data depending on what performance and security tradeoffs you want.
One downside was that it wasn’t fully implemented at the time of the paper. They had all the parts individually implemented but they hadn’t integrated them all together. Also, nothing there struck me as a major innovation, given that Amazon and Microsoft were already offering many of these services, but nonetheless it was a good paper.
Overall, this is a solid paper because of the major relevance of DBaaS. This is a trend that is growing in industry and has made Amazon tons of money, so it’s important to understand more thoroughly how some of these systems work. This paper did a good job of introducing the goals and implementation strategies that many DBaaS share, and because of that was worth the read.
Broadly, this paper attempts to address the need for databases as a service (DBaaS) in the industry. It is an attractive motivation, since users pay only for the use that they need instead of having to manage their own equipment, and a shared cloud system will (when architected efficiently) not incur as much of an energy cost as many massive individual systems. While they specifically mention tools like Microsoft Azure (which is actually the first thing I thought of), they claim that for a service to be strong and successful, it must address (1) multi-tenancy, (2) elastic scalability, and (3) database privacy without noticeable cost to the user or provider.
The first problem described is multi-tenancy; given workloads and databases, how can a system determine the distribution of each to specific machines? It needs some heuristic regarding workload and resource requirements before sharing machines between workloads. Relational Cloud attempts to perform some machine learning optimization methods to determine co-located operations on machines. Virtual machines don’t seem to work as they incur additional, unnecessary overhead. Elasticity is intuitive; database systems need to be able to scale-out and scale-in appropriately. They also developed CryptDB (a separate paper in and of itself) in order to address the issue of DB privacy while maintaining acceptable performance. They go a bit deeper in the “system design section,” describing how a front-end router determines how to horizontally partition data, how to partition the data among nodes more coarsely, and how to replicate data to increase availability, all without downtime.
I thought their DB partitioning scheme was particularly interesting; formulating the set of touched tuples into a graph representation between nodes and attempting to find a minimum l-cut for optimal partitioning was somewhat useful, but I think the cool part was sampling the partitions for candidate queries and using a decision tree to determine a good “explanation” for the features determined by the partitioning scheme. They are actually able to see and “understand” the types of queries grouped into each partition. They also claim that the naive graph solution doesn’t scale to N nodes, but by removing blanket statements and sampling statements (which screw things up by touching all the nodes with their dirty dirty hands) they are able to remove undesired overhead. The placement/migration system, Kairos, is described to use CPU & RAM use statistical information to feed a non-linear optimization engine that minimizes workload resource usage. The paper on Kairos is under development, so they do not release important details like (1) features used to model CPU/Disk IO or (2) the actual nonlinear optimization model, which is a very general term. Their explanation of CryptDB mimics that of their paper, but is a bit confusing regarding the technical details of encryption. The gist of it is that different layers of security (“onion”) are used for different parts of the data. Again, the technical details are a bit confusing, but the example they gave made it a bit clearer.
As this is a holistic system paper, I was a bit surprised to see a thorough evaluation section at all, so I think it is good that they included a set of results on their workload balance and the effect of privacy on latency. Everything made sense (privacy affecting latency of queries, latency’s inverse correlation with throughput, outward scalability, etc), but I am not sure the number of “consolidated servers” by itself is a good metric. Sure, nodes touched can let you put more data on one machine, but that still incurs another problem if many queries are consistently touching that one machine, so it would be more descriptive if they included latency information along with that table. However, I believe the novelty of this work lies in the many subsystems: CryptDB, the graph partitioning scheme, and Kairos (which I would very much like to read more about). They seem to have a good grasp on many aspects that go into the DBaaS design. One last thing, regarding their point on multi-tenancy: while the point on VM inefficiency is well-founded, I wonder what relevance tools like Docker, which were made to circumvent VM bulk overhead, have to this problem, and whether they can competitively decrease any cost on their end.
This paper talks about Relational Cloud, a transactional “database-as-a-service” (DBaaS) that addresses three important challenges: efficient multi-tenancy, elastic scalability, and database privacy. The motivation behind the paper is that, while DBaaS is an attractive concept in terms of economies of scale and its “pay-per-use” policy, not many DBaaS offerings can address the three challenges mentioned above.
The paper starts with Relational Cloud’s system design. It uses existing unmodified DBMS engines as the back-end query processing and storage nodes, serving separate tenants (billable entities) that run on the same machine. The front-end monitors the access patterns induced by the workload and the load on the DB servers. This information is used to determine the best way to: (1) partition each DB into one or more pieces, which is done when the load on a database exceeds the capacity of a single machine (using a workload-aware partitioning strategy); (2) place the partitions on the back-end machines and migrate them as needed without causing downtime, as well as replicate data (using Kairos as the monitoring and consolidation engine); (3) secure the data and process the queries so that they can run on untrusted back-ends over encrypted data (by implementing onion-layer encryption, in which a value is decrypted only as far as a query’s access requirements demand). The paper explains in detail how database partitioning, placement and migration, and the privacy system answer the three challenges. The paper continues with experiments, which show that while there is still a performance overhead, the overhead percentage is reduced. Lastly, the paper mentions several related works, namely on scalable database services (where existing offerings are limited), multi-tenancy, scalability (unlike Relational Cloud, past work focuses on OLAP workloads and declustering), and untrusted storage and computation (which, relative to Relational Cloud, offer much weaker security guarantees).
The main contribution of this paper is that it shows that there is a DBaaS, Relational Cloud, that can overcome the three challenges. The overall theme in Relational Cloud’s design and operation is “workload awareness”: by monitoring the actual query patterns and data accesses, Relational Cloud is able to apply optimizations and security functions, which eventually enable multi-tenancy, elastic scalability, and database privacy. This paper proves that DBaaS can be as reliable (and secure) as a non-cloud database service.
While CryptDB’s design could be considered a breakthrough in providing cloud database security, it does not protect against attacks from the application side. Several papers have discussed this, and one of them in particular has tried to prove that CryptDB can still be “broken” through “legal” means (no backdoor) by making use of the application code. When data is sent to the client for decryption and processing, the data is still in protected form, but the activity is still traceable in the log. While this is outside the scope of the DBaaS itself, it should still be mentioned to users as a precaution.
The purpose of this paper is to introduce Relational Cloud, a database-as-a-service (DBaaS) system being developed at MIT. This system improves on existing DBaaS offerings such as Microsoft SQL Azure and Amazon RDS by offering efficient multi-tenancy, elastic scalability, and database privacy.
The technical contributions of this paper are numerous. To address the multi-tenancy issue, the authors propose a novel non-linear optimization problem along with a resource-utilization-based cost model to determine which databases and their respective workloads should be placed on which cluster machines. This is presented in contrast with the “typical” solution which is to house databases in separate virtual machines on each cluster machine. The authors also present their “workload-aware partitioner” to solve the problem of elastic scalability. This allows for larger workloads to be effectively partitioned between different nodes to ensure high throughput and performance. This uses a graph partitioning model to determine the configuration that minimizes the inter-node communications for a given workload. Within Relational Cloud, the authors present a new method for encrypting data to increase privacy that they call CryptDB. Though this encryption has a performance cost, CryptDB allows users to privately host data without it being visible to DBAs. Additionally, it allows various levels of encryption to be assigned to different kinds of user data. Another important innovation is that the query is executed on the encrypted data and then the result is shipped back to the client and decrypted on the client’s machine.
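To make the graph partitioning idea concrete, here is a minimal sketch of choosing a split that minimizes the weight of cut edges. It assumes a tiny co-access graph whose edge weights count how often two tuples appear in the same transaction; the names `cut_weight` and `best_bisection` and the sample data are hypothetical, and the exhaustive search stands in for whatever graph partitioner the real system uses, so it is only practical for toy inputs.

```python
from itertools import combinations


def cut_weight(edges, assignment):
    """Sum the weights of edges whose endpoints sit on different nodes."""
    return sum(w for (u, v), w in edges.items() if assignment[u] != assignment[v])


def best_bisection(tuples, edges):
    """Try every balanced two-way split and keep the one with the lowest cut.

    Exhaustive search is an illustrative stand-in for a real graph
    partitioner; its cost grows combinatorially with the number of tuples.
    """
    tuples = sorted(tuples)
    half = len(tuples) // 2
    best = None
    for left in combinations(tuples, half):
        left_set = set(left)
        assignment = {t: 0 if t in left_set else 1 for t in tuples}
        weight = cut_weight(edges, assignment)
        if best is None or weight < best[0]:
            best = (weight, assignment)
    return best


# Hypothetical workload: t1/t2 and t3/t4 are usually accessed together.
edges = {("t1", "t2"): 5, ("t3", "t4"): 4, ("t2", "t3"): 1}
weight, assignment = best_bisection(["t1", "t2", "t3", "t4"], edges)
print(weight, assignment)
```

In this example the split that keeps t1 with t2 and t3 with t4 cuts only the weight-1 edge, which corresponds to the fewest transactions having to touch both machines.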
I think one of the strengths of this paper is that it identifies specific weaknesses in existing DBaaS architectures and then carefully states how their new system addresses and solves these concerns that users may have regarding the existing services.
As far as weaknesses go, it’s a shame that the system they’re presenting is still under development. Though they have developed the individual pieces, system integration can be a huge hurdle to overcome, and a completed integration would have lent more validity to their project. I also think they could present a stronger argument with their empirical results if they had data on the performance of the entire system, rather than only on the performance of specific components. Although they do present interesting and promising results, they could have made an even stronger case for their system if it had already been successfully integrated into one functional entity.
Paper Review: Relational Cloud: A Database-as-a-Service for the Cloud
This paper presents some issues for a then-novel transactional “database-as-a-service” called Relational Cloud. The issues discussed in this paper are efficient multi-tenancy, elastic scalability, and database privacy. The paper argues that these are the must-solve challenges for database software to become an outsourced service. Efficient multi-tenancy aims to minimize the number of machines required to meet application-level query performance goals. Instead of using VMs, which can be inefficient in resource utilization, this paper proposes to use a single database server on each machine, which hosts multiple logical databases. Elastic scalability requires the DBMS behind a DBaaS to support workloads of different sizes.
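As a rough illustration of the consolidation goal (fewest machines that still fit every workload), the sketch below packs databases onto machines with a first-fit-decreasing heuristic. This is a deliberately simpler, swapped-in technique and not the paper's own optimization; the function name `consolidate`, the single resource dimension (working-set size in MB), and the sample numbers are all assumptions made for the example.

```python
def consolidate(demands, capacity):
    """Assign databases to machines using first-fit decreasing.

    demands:  dict mapping database name -> resource demand (e.g. MB of RAM)
    capacity: resource capacity of each (identical) machine
    Returns a list of machines, each a list of database names.
    """
    machines = []  # each entry: {"free": remaining capacity, "dbs": [names]}
    # Place the largest databases first so small ones can fill the gaps.
    for name, demand in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        if demand > capacity:
            raise ValueError(f"{name} does not fit on a single machine")
        for machine in machines:
            if machine["free"] >= demand:
                machine["free"] -= demand
                machine["dbs"].append(name)
                break
        else:
            # No existing machine has room, so open a new one.
            machines.append({"free": capacity - demand, "dbs": [name]})
    return [machine["dbs"] for machine in machines]


# Hypothetical working-set sizes in MB, on machines with 1000 MB available.
placement = consolidate({"a": 600, "b": 500, "c": 400, "d": 300}, capacity=1000)
print(placement)  # [['a', 'c'], ['b', 'd']]
```

Four databases that would need four virtual machines under a one-VM-per-database scheme fit on two servers here. A real placement would also have to account for CPU, disk I/O, and how those demands combine when databases share a server.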
As a consolidated solution, the paper proposes an approach called “workload awareness”, which takes advantage of observations from the past to better utilize resources.
This paper attempts to remove some key barriers to the DBMS becoming a service product. The idea of resource management is intuitive, and the implementation design is sound. It is interesting that the paper proposes Kairos, whose mechanism sounds like a smart solution. It would be even better to see some performance comparisons between Kairos and other implementations.
Another strength of this paper is its clarity. The paper presents its ideas quite clearly, and the figures also help make it reader-friendly.
It may be a little picky to call it a weakness that there is not yet a real product to prove that the proposed model performs well; after all, this is still a largely theoretical paper. However, giving some real-world examples of where the proposed model could be used for improvements would help to demonstrate the value of the paper.
This paper summarizes a new cloud computing service: Relational Cloud, or in other words, a database-as-a-service (DBaaS) that can serve as database infrastructure in the cloud to provide services for its front-end users. The paper summarizes the important challenges of DBaaS: (1) efficient multi-tenancy, (2) elastic scalability, and (3) database privacy. It also describes the solutions to these problems in detail when introducing the system design: (1) a workload-aware approach, (2) a graph-based data partitioning algorithm, and (3) an adjustable security scheme. It concludes that the most important design principle for DBaaS is workload awareness. Moreover, the paper presents an evaluation of all three aspects of the DBaaS.
1. This paper identifies the three most important problems that need to be solved in a DBaaS cloud. Moreover, it introduces solutions to these problems and presents experiments based on them.
2. This paper refers to many state-of-the-art DBMS systems related to its problem; for example, it mentions Amazon RDS and Microsoft SQL Azure in its abstract, and CryptDB when discussing the privacy problem.
1. This paper shows that the most important theme in solving the DBaaS challenges is workload awareness. However, in order to achieve workload awareness, the system needs to monitor query patterns and data accesses. The paper does not show the overhead this adds to database design, or to computation and rebalancing, when achieving it.
2. This paper does not mention any work on fault tolerance, which is also a very important problem for cloud computing infrastructure.
This paper introduces Relational Cloud, a transactional database-as-a-service (DBaaS). DBaaS providers enable users to rent a DBMS as a service. DBaaS is attractive for two reasons. First, by building the database as a large-scale service, the hardware and energy costs incurred by users are much lower than when running their own DBMS. Second, the costs incurred in a well-designed DBaaS are proportional to actual usage, making it more cost-effective. There are at least three important challenges to be solved in DBaaS: efficient multi-tenancy, elastic scalability, and database privacy. Relational Cloud applies a resource estimation and non-linear optimization-based consolidation technique to increase efficiency in multi-tenancy. For scalability, a graph-based partitioning method is used to spread large databases across many machines. Relational Cloud also develops CryptDB, which uses the notion of adjustable privacy. With different levels of encryption layered as an “onion”, CryptDB enables SQL queries to be processed over encrypted data. With this approach, the client only needs to provide the minimum decryption capabilities required by any given query. In summary, Relational Cloud achieves optimization through workload awareness. The system monitors query patterns and data accesses and obtains information useful for various optimization and security functions, thus reducing the configuration effort for users and operators.
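The onion idea can be sketched with two toy layers: an inner deterministic layer that preserves equality, wrapped in an outer randomized layer that hides it. The code below is an illustration only and is NOT secure. The XOR keystream construction, the function names, and the keys are all assumptions made for the example, standing in for the real ciphers a system like CryptDB would use.

```python
import hashlib
import os


def _keystream(key, nonce, length):
    """Derive `length` pseudo-random bytes from key and nonce (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def _xor(data, stream):
    return bytes(a ^ b for a, b in zip(data, stream))


def det_encrypt(key, plaintext):
    """Inner layer: same plaintext always gives the same ciphertext."""
    return _xor(plaintext, _keystream(key, b"", len(plaintext)))


det_decrypt = det_encrypt  # XOR with the same keystream undoes itself


def rnd_encrypt(key, data):
    """Outer layer: a fresh random nonce makes every ciphertext different."""
    nonce = os.urandom(16)
    return nonce + _xor(data, _keystream(key, nonce, len(data)))


def peel_rnd(key, blob):
    """Remove the outer layer, exposing the deterministic ciphertext."""
    nonce, body = blob[:16], blob[16:]
    return _xor(body, _keystream(key, nonce, len(body)))


def onion_encrypt(det_key, rnd_key, plaintext):
    return rnd_encrypt(rnd_key, det_encrypt(det_key, plaintext))


det_key, rnd_key = b"hypothetical-det-key", b"hypothetical-rnd-key"
c1 = onion_encrypt(det_key, rnd_key, b"alice")
c2 = onion_encrypt(det_key, rnd_key, b"alice")
print(c1 != c2)                                        # outer layer hides equality
print(peel_rnd(rnd_key, c1) == peel_rnd(rnd_key, c2))  # inner layer reveals it
```

If a query needs an equality comparison, the client would hand over only the outer-layer key; the server could then compare the deterministic ciphertexts without ever seeing the plaintext, which stays protected by the inner key held on the client side.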
The main strength of this paper is that it applies various techniques to achieve workload awareness. These techniques include resource estimation, graph-based workload partitioning, and multiple levels of encryption.
A weakness of this paper concerns resource allocation. In Relational Cloud, any new database and workload is initially set up in a staging area for the allocation algorithm. After that, however, the allocation is fixed and no further changes are applied. This is not optimal, as the workload can change significantly while the application runs. It would be better if the allocation algorithm could monitor workload changes and periodically re-allocate resources when there is a strong need.