The SQL standard defines several aggregation operators, such as SUM and AVG, which compute functions of sets of values, and the GROUP BY operator, which specifies how values should be partitioned before aggregation. Report generators use a hierarchy of partition operations that drill down to the values of specific rows in a database, and roll up, showing aggregate information about sets of rows with similar values. Formerly, SQL did not have operators that make it easy to request all such drill-down and roll-up actions at once.
In “Data Cube,” the authors propose a suite of new operators for the SQL standard that make it easy to produce aggregates over a hierarchy of sets of rows, automating the roll-up process. The CUBE operator returns an aggregate result that includes the ordinary GROUP BY result over a given list of fields, as well as results for rows grouped by every subset of that list, down to the grand total. The CUBE operator can be emulated in conventional SQL, but this requires writing a multiple-union query with many GROUP BY clauses. A similar operator called ROLLUP returns a smaller result set than CUBE: it groups by successively shorter prefixes of the column list instead of by all subsets of the list.
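To make the difference between the two operators concrete, here is a small Python sketch (an illustration, not code from the paper) that lists the sets of grouping columns each operator aggregates over. The helper names and the columns `model`, `year`, `color` are hypothetical:

```python
from itertools import combinations

def cube_grouping_sets(columns):
    """All subsets of `columns`, from the full list down to () (the grand total)."""
    return [tuple(c) for k in range(len(columns), -1, -1)
            for c in combinations(columns, k)]

def rollup_grouping_sets(columns):
    """Prefixes of `columns`: (c1..cN), (c1..cN-1), ..., (c1), ()."""
    return [tuple(columns[:k]) for k in range(len(columns), -1, -1)]

cols = ["model", "year", "color"]
print(len(cube_grouping_sets(cols)))  # 8
print(rollup_grouping_sets(cols))
# [('model', 'year', 'color'), ('model', 'year'), ('model',), ()]
```

With three columns CUBE aggregates over 8 grouping sets and ROLLUP over only 4. Every ROLLUP set is also a CUBE set.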
The authors contribute a new set of operators for roll-up queries in SQL, along with suggested algorithms for their implementation and an analysis of the operators' algebraic and performance characteristics. The CUBE operator had already been implemented in Microsoft SQL Server when the paper was written, which suggests the method is practical for commercial DBMSs. The authors find that the CUBE operator can be evaluated efficiently for aggregation functions that they call “distributive” and “algebraic.” These aggregates can be computed using only a bounded amount of local memory, and they include operations like SUM and AVG. Unfortunately, “holistic” functions like median and mode require memory that grows in proportion to the size of the data, so they may not be computable efficiently with the CUBE operator. For them, the authors know of no better approach than performing each of the exponentially many GROUP BY computations (2^N for N cube columns) separately.
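The distinction shows up in a small Python sketch. It is loosely modeled on the Init/Iter/Final style of aggregate interface the paper describes. The function names and the merge step are hypothetical stand-ins of my own. AVG is algebraic because a bounded (sum, count) pair per partition is enough to combine partitions. The median of the partition medians, by contrast, is generally not the overall median:

```python
from statistics import median

# AVG is algebraic: each partition keeps a bounded (sum, count) state.
def avg_init():
    return (0.0, 0)

def avg_iter(state, value):
    total, count = state
    return (total + value, count + 1)

def avg_merge(a, b):
    """Combine two partition states into a super-aggregate state."""
    return (a[0] + b[0], a[1] + b[1])

def avg_final(state):
    total, count = state
    return total / count

part1, part2 = [1, 2, 3], [10, 20]
state1 = avg_init()
for v in part1:
    state1 = avg_iter(state1, v)
state2 = avg_init()
for v in part2:
    state2 = avg_iter(state2, v)

print(avg_final(avg_merge(state1, state2)))  # 7.2, same as averaging all five values

# MEDIAN is holistic: the median of partition medians is not the overall median.
print(median([median(part1), median(part2)]))  # 8.5
print(median(part1 + part2))                   # 3
```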
The authors' preferred implementation of the CUBE operator requires adding an ALL value to SQL, which represents the set of all values present in a column. This unfortunately increases the complexity of the language, whose semantics must change to accommodate ALL.
The paper presents a detailed look at the relational aggregation operators used in database queries. The motivation for a new operator called Data Cube (or simply Cube) was that SQL's built-in aggregate functions and GROUP BY operator produce only zero-dimensional or one-dimensional aggregates. Applications needed N-dimensional generalizations of these operators.
The GROUP BY aggregate operator partitions the relation into disjoint tuple sets and then aggregates over each set. This makes it awkward to construct histograms, roll-up totals and sub-totals for drill-downs, and cross-tabulations. To roll data up to higher levels of aggregation, GROUP BY queries must be combined with many UNION statements. A six-dimensional cross-tab requires 64 (2^6) GROUP BY operators combined in a 64-way UNION. The Cube operator generates the same N-dimensional data with a single simple statement, which requires generating the power set of the aggregation columns. Sometimes applications need only a roll-up or drill-down report. Using the cube for this would produce many extra aggregates the report does not need. To deal with this, the paper offers a ROLLUP operator. Its answer set grows linearly with the number of columns, whereas the CUBE's grows exponentially.
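The 64-way UNION can be generated mechanically. The following Python sketch (an illustration, not code from the paper) builds the pre-CUBE SQL for hypothetical dimensions `d1`..`d6` of a hypothetical `sales` table, with one GROUP BY branch per subset of the dimensions:

```python
from itertools import combinations

def cube_as_union(table, dims, measure):
    """Emulate CUBE without the operator: one GROUP BY per subset of dims, UNIONed.

    Dimensions absent from a subset are reported as NULL, standing in for ALL.
    """
    branches = []
    for k in range(len(dims), -1, -1):
        for subset in combinations(dims, k):
            select = ", ".join(d if d in subset else f"NULL AS {d}" for d in dims)
            group_by = f" GROUP BY {', '.join(subset)}" if subset else ""
            branches.append(f"SELECT {select}, SUM({measure}) AS total FROM {table}{group_by}")
    return "\nUNION ALL\n".join(branches)

six_dims = ["d1", "d2", "d3", "d4", "d5", "d6"]
query = cube_as_union("sales", six_dims, "amount")
print(query.count("SELECT"))     # 64 GROUP BY branches
print(query.count("UNION ALL"))  # 63 unions joining them
```

UNION ALL is used here so the database does not spend effort on duplicate elimination. If the data itself contains NULLs, a NULL in the output becomes ambiguous. That ambiguity is the problem the paper's ALL value (or a GROUPING() flag) addresses.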
The paper succeeds in providing detailed information about the various aggregate operators. It also describes how to create user-defined aggregates that integrate into SQL. It offers syntax suggestions and includes a diagram showing how a cube is computed using a minimal number of calls to aggregation functions, which makes the concept clear.
The authors sometimes express conflicting opinions. They promote the use of the cube, yet they express surprise that companies are using it a year after its release, while wondering how those companies handle the limitations of the operator. They created the cube operator to get past the limitations of the GROUP BY aggregate operator, but in the end they require a new super-aggregate mechanism to allow efficient computation of cubes.
What is the problem addressed?
The paper examines how a relational engine can efficiently extract information from a SQL database to meet the requirements of visualization and data analysis tools.
The major problems with GROUP BY in standard SQL can be summarized as follows. It cannot directly construct histograms. It is poor at generating hierarchical reports, in which data are aggregated from a coarse level to a finer level. It also provides the cross-tab presentation only awkwardly: an N-dimensional cross-tab requires a 2^N-way union of 2^N different GROUP BY operators, so the computation will not be fast. This paper studies a new data model for database systems. The model offers a better operator for analyzing large and complex datasets and drawing conclusions from them, especially in data warehousing applications.
1-2 main technical contributions? Describe.
One of the most important concepts here is that CUBE is a generalization of GROUP BY, that is, an N-dimensional aggregate operator. The other important concept is the use of ALL, which affects standard SQL (as of that time) in many ways. It is a new keyword that adds options to column definitions. Like NULL, it cannot be counted by some traditional aggregate functions, since ALL is itself produced by aggregation. It also changes the meaning of some relational operators, such as IN.
Finally, the paper discusses the maintenance of data cubes under three operations: insert, update, and delete. Each of these can leave the cube inconsistent, because its stored aggregates no longer reflect the underlying records. To deal with this, whenever one of the three operations is executed, the affected records in the data cube should be updated. This is relatively cheap for distributive and algebraic functions, whose stored values can be adjusted incrementally. DELETE, however, is costly for holistic functions, since it can require a complete recomputation.
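A minimal Python sketch of incremental maintenance follows. It assumes the cube is kept as an in-memory dictionary keyed by cell, and the helpers `cube_cells`, `insert`, and `delete` are hypothetical. Each inserted row touches 2^N cells. SUM can be adjusted in place on both insert and delete. MAX, which as I understand the paper is distributive for inserts but behaves holistically for deletes, may need recomputation after a delete:

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # stands in for the paper's ALL value

def cube_cells(dim_values):
    """Yield every cube cell a base row contributes to: 2**N cells for N dimensions."""
    n = len(dim_values)
    for k in range(n + 1):
        for kept in combinations(range(n), k):
            yield tuple(dim_values[i] if i in kept else ALL for i in range(n))

sums = defaultdict(int)  # SUM: distributive, adjustable on both insert and delete
maxes = {}               # MAX: cheap to maintain on insert only

def insert(dim_values, amount):
    for cell in cube_cells(dim_values):
        sums[cell] += amount
        maxes[cell] = max(maxes.get(cell, amount), amount)

def delete(dim_values, amount):
    """Adjust SUMs in place; return the cells whose MAX may now be wrong."""
    stale = []
    for cell in cube_cells(dim_values):
        sums[cell] -= amount
        if maxes.get(cell) == amount:
            stale.append(cell)  # needs recomputation from the base data
    return stale

insert(("Chevy", 1994, "black"), 50)
insert(("Chevy", 1994, "white"), 40)
insert(("Ford", 1995, "black"), 70)
print(sums[(ALL, ALL, ALL)])   # 160
stale = delete(("Chevy", 1994, "black"), 50)
print(sums[(ALL, ALL, ALL)], len(stale))  # 110 6
```

The grand total and the (ALL, ALL, "black") cell keep a valid MAX of 70, because the deleted row was not their maximum. The other six cells the row touched must be rescanned.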
1-2 weaknesses or open questions? Describe and discuss.
The CUBE operator seems very handy for certain data analysis tasks. However, such processing may not play to the strengths of a relational database and should probably be left to special-purpose data analysis tools. With this sort of data processing, the strengths of an RDBMS (transactions, indexes, sophisticated query optimizers) either don’t help at all or just get in the way.
This paper presents a new relational aggregation operator called Data Cube, which generalizes the histogram, cross-tabulation, GROUP BY, roll-up, drill-down and sub-total constructs. It was developed for Microsoft SQL Server. The data cube operator treats each of the N aggregation attributes as a dimension of N-space, and super-aggregates are computed by aggregating the N-cube into lower-dimensional spaces. The paper first provides the relevant background, including the GROUP BY operator and SQL extensions. It then discusses the problems with GROUP BY and introduces the CUBE and ROLLUP operators, showing how they overcome those shortcomings. Finally, it presents how to address and compute the CUBE.
The problem here is that aggregation functions are widely used in SQL, as benchmarks like TPC-D show, and the GROUP BY operator has several shortcomings:
1. The standard GROUP BY operator does not allow direct construction of histograms.
2. Roll-up totals and sub-totals for drill-down reports are awkward to express.
3. Cross-tabulations require unions of many GROUP BY queries.
4. The resulting representation of the aggregation is complex for the optimizer to analyze.
Thus, a better generalization is needed, which motivated the development of the Data Cube.
The major contribution of the paper is that Data Cube is a very good generalization of aggregates, group by, histograms, roll-ups and drill-downs, and cross-tabs. It is easy to compute for a wide class of distributive and algebraic functions. Also, it overcomes several standard SQL GROUP BY shortcomings. Here we summarize the key features of Data Cube:
1. It is an aggregation operation, and it can be externalized by overloading the GROUP BY operator.
2. The CUBE of a ROLLUP or GROUP BY is a CUBE, and the ROLLUP of a GROUP BY is a ROLLUP.
3. It uses the ALL value to denote the set over which each aggregation is computed.
One observation: overall, the paper presents the new Data Cube design well and provides a useful generalization of several popular concepts. However, it only argues that the Data Cube is good and provides no experimental evidence or mathematical proof that it delivers better performance. Nevertheless, I still like the idea of the Data Cube. It is easy to understand and fits well in SQL, and users can also define new aggregate functions.
The paper focuses on aggregation operations in standard SQL database management systems. It shows that the conventional approach is inefficient because of the redundant operations the system must execute. It categorizes such operations into three classes: (1) histograms, (2) roll-up totals and drill-down sub-totals, and (3) cross-tabulations. The paper then proposes the CUBE and ROLLUP operators, which compute all of these aggregates in a single query. The resulting table of aggregate values can be kept so that users can query it at much lower cost. In addition to the methodology, the authors present the syntax for such queries as an extension to SQL, which is relatively simple and similar to existing SQL. At the end of the paper, they briefly discuss how to maintain the cube incrementally. This avoids recomputing the entire cube every time new data are inserted or old data are updated in the database tables.
- This is a very interesting topic, since these operations are common and widely used in production database systems, yet no one had thought of maintaining a separate data structure to accelerate them.
- It is also quite nice that the authors built this feature into a production system, Microsoft SQL Server. The paper also discusses these features from a user’s perspective, based on what the authors observed in SQL Server, which makes the paper compelling.
- I think the major drawback of this paper is still the overhead of computing or maintaining such complex data structures. The incremental approach for updating the data only works under certain conditions, and in other scenarios it is quite expensive to keep the indexed data up-to-date. And at the same time, as the dimension of the data grows, the working set size of the cube table increases exponentially, which is not very feasible in practice.
- From a hardware perspective, the way the cube is constructed does not seem well suited to efficient execution. Some queries must process multiple rows that do not necessarily have locality, so each operation could be very expensive. I imagine there could be ways to organize the data layout to maximize the probability that the queried data sit close together.
This paper focuses on an extension to SQL that generalizes the grouping and aggregation abilities of database systems. It is driven by business needs. Data analysts must look for unusual patterns in data, which they do by viewing statistical information across categories and looking for differences. The authors want to add features such as "DRILL DOWN" and "ROLL UP" to the database so that analysts can explore the data in a faster and more natural way.
The authors begin by showing some of the shortcomings of SQL's GROUP BY clause. It does not work well for queries where you essentially want a pivot table: for example, all the cars sold in 1994, broken down by model and then further broken down by color. To get results of this kind, you have to write a query for each section of the data using GROUP BY and then union all the results together. Not only is this tedious, but it is also very hard for the query optimizer to speed up. The authors then introduce their CUBE and ROLLUP operators. The CUBE operator generates the power set of the aggregation columns, while ROLLUP produces only the super-aggregates along a single hierarchy. They propose a change to the GROUP BY syntax, as well as an extension that adds the new operators.
The authors did a great job explaining the complex ideas in this paper. Unlike other papers that felt rushed or whose length seemed artificially constrained, this paper took the time it needed to fully explain its ideas. I also really enjoyed the discussion of the "ALL" value, which seems like it could be a bit of a rabbit hole. A system could treat it as NULL, as a set value, as a function, or as a property of the table column. There are a lot of options here.
One area where this paper is lacking is hard data. The authors mention that Microsoft's SQL Server supports the CUBE and ROLLUP operators, but we have no idea how these techniques work in practice. Are people able to use the extensions to SQL, or are they too complex? Are queries using CUBE and ROLLUP faster than the equivalent UNIONs of separate queries? Many of the ideas presented here could be very computationally expensive, and I would have liked to see some measurements.
The paper puts forward the problem that data analysis applications need multi-dimensional aggregates, while the existing aggregate functions and GROUP BY operator can only produce zero- or one-dimensional results. These make it hard to handle histograms, roll-up totals, drill-down sub-totals, cross-tabulations and so on, which are important techniques for data analysis and visualization.
The solution proposed by the authors is to generalize the standard operations and create a relational operator for N-dimensional aggregation for complex data analysis. To accomplish this, the operator treats each of the N aggregation attributes as a dimension of N-space.
The approach is to define new operators for these needs. CUBE is a relational operator that overloads GROUP BY. It first aggregates the data over the requested attributes using GROUP BY, then UNIONs the super-aggregate values into the cube. For roll-up and drill-down reports, the paper offers ROLLUP. It is computed by sorting the table on the aggregating attributes and then computing the aggregate functions. GROUP BY, CUBE and ROLLUP obey algebraic rules, which allow them to be composed. The syntax also allows decorations, which add flexibility to the aggregate results. During computation, aggregates are computed at the lowest possible system level. If the number of aggregate values is small, they are kept in arrays or hash tables in memory. Large values such as long strings are mapped to integers through a hash symbol table to keep the aggregate structures small. Since CUBE and ROLLUP are relational operators, they fit easily into SQL, and they can work with user-defined aggregates as well.
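The in-memory techniques described here can be sketched in a few lines of Python. String values are mapped to integers through a symbol table, and a 2-D cube is then aggregated in a plain array with an extra slot per dimension for ALL. This is an illustrative assumption about how such a layout might look, not the paper's implementation, and the data are made up:

```python
def encode(values):
    """Hash symbol table: map each distinct string to a small integer code."""
    symbols = {}
    codes = [symbols.setdefault(v, len(symbols)) for v in values]
    return codes, list(symbols)

models = ["Chevy", "Ford", "Chevy", "Ford", "Chevy"]
colors = ["black", "white", "white", "black", "black"]
units = [50, 10, 40, 85, 5]

model_codes, model_names = encode(models)
color_codes, color_names = encode(colors)

# One extra row and column per dimension hold the ALL super-aggregates.
ALL_M, ALL_C = len(model_names), len(color_names)
grid = [[0] * (ALL_C + 1) for _ in range(ALL_M + 1)]
for m, c, u in zip(model_codes, color_codes, units):
    for row in (m, ALL_M):
        for col in (c, ALL_C):
            grid[row][col] += u

print(model_names, color_names)  # ['Chevy', 'Ford'] ['black', 'white']
print(grid)  # [[55, 40, 95], [85, 10, 95], [140, 50, 190]]
```

The finished array is already a cross-tab with row totals, column totals, and the grand total in the corner.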
The strength of the paper is that it provides many figures and pseudocode to explain the idea, which made it accessible to me. Also, the super-aggregate mechanism is useful for reducing the dimensionality of multi-dimensional or multi-attribute problems. The drawback is that the performance of CUBE and ROLLUP depends on computing super-aggregates. Yet the paper does not provide a convincing analysis of their efficiency, beyond noting that for holistic functions the 2^N-algorithm is the most efficient method the authors know of. Since this is the core of the new operators, I think more explanation and analysis are needed.
This paper introduces the data cube operator for relational databases, an aggregation operator that generalizes group-by, histograms, roll-up, drill-down, and cross-tabulation. The paper presents the motivation behind the cube operator, the problems in existing SQL aggregation constructs, high-level details of the cube operator, and the cost and implementation of the data cube.
The need for the data cube operator arose from the limitations and complexity of the standard SQL aggregation construct, GROUP BY. It struggles with popular data analytics operations such as histograms, roll-up totals and sub-totals for drill-downs, and cross-tabulations. The GROUP BY in standard SQL cannot construct histograms directly. It is even worse for roll-ups and drill-downs, since even a simple report requires a complex query that is difficult to optimize properly. The data cube solves this problem by generalizing aggregation to N dimensions and optimizing data movement.
This paper presents a good overview of the data cube operator and why it was needed, but it feels a bit lacking in the exact details of the data cube implementation. A performance or runtime comparison between GROUP BY-based aggregation and the data cube would have been nice to see.
This paper defines a new operator for relational databases, the data cube (CUBE). It extends ordinary aggregation such as GROUP BY and supports higher-dimensional aggregation more efficiently. The paper also shows how the cube operator fits into SQL and discusses several ways it can be implemented in an RDBMS.
The data cube gives great flexibility in data analysis. Conceptually, the cube treats each attribute as a dimension of the data, and the operation computes aggregations over one or more of those dimensions. It is a good tool for visualizing the data. The cube supports operations such as roll-up, drill-down and pivot tables. Analysts can summarize the data along any dimensions (roll-up), analyze any specific part of the cube (drill-down), and analyze data trends (pivot tables). They can also perform more complex operations that combine these with traditional analysis approaches.
While the cube allows data to be aggregated along any hierarchy, its flexibility may still be limited in several ways:
1. What if the relational table is not pre-modeled, such as a table of raw data without a schema? One concern, mentioned in the paper, is that if there are known functional dependencies, unnecessary aggregates should be avoided. Another is that if the relationships among some or all attributes (i.e., the possible hierarchies of the data) are unknown, the cube may not help. Many of its aggregations can be meaningless, and it would be nontrivial to discover those relationships this way.
2. The cube is conceptually more complex. It extends a 2-D table to multi-dimensional aggregates, which may become too abstract for non-specialists once there are more than three or four dimensions. In practice, it may take longer to work out an analysis strategy on a cube than on a few straightforward 2-D or 3-D graphs. In addition, this might reduce communication efficiency between an engineer and a non-engineer client.
SQL’s GROUP BY operator aggregates records based on their values for certain attributes. However, a DB user often wants not just one aggregation but several returned in a single query. For example, a car dealership might want to compare sales for “Black cars in 1994”, “White cars in 1994”, and “All cars in 1994”, alongside the same figures for 1995. Because these categories overlap, the only way to express this using GROUP BY is to UNION several GROUP BY statements together. This paper proposes two new SQL operations, ROLLUP and CUBE, to make this easier.
ROLLUP takes a set of columns (c1, c2, etc.) and performs hierarchical GROUP BY operations. So a ROLLUP c1, c2, c3 operation would return the union of GROUP BY c1, c2, c3; GROUP BY c1, c2; GROUP BY c1; and the grand total over all rows. CUBE performs GROUP BY operations on every combination of attributes. So in addition to the results returned by ROLLUP, CUBE would also return GROUP BY c1, c3; GROUP BY c2, c3; GROUP BY c2; and GROUP BY c3.
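SQLite has no ROLLUP or CUBE, which makes it a convenient place to see the UNION-based workaround the paper complains about. Below is a minimal sketch using Python's built-in `sqlite3` module with a made-up `sales` table. It emulates ROLLUP(model, year, color) and reports ALL as NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (model TEXT, year INTEGER, color TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("Chevy", 1994, "black", 50),
    ("Chevy", 1994, "white", 40),
    ("Chevy", 1995, "black", 85),
    ("Ford", 1994, "white", 10),
])

# ROLLUP(model, year, color) emulated with UNION ALL; NULL stands in for ALL.
rollup_sql = """
SELECT model, year, color, SUM(units) FROM sales GROUP BY model, year, color
UNION ALL
SELECT model, year, NULL, SUM(units) FROM sales GROUP BY model, year
UNION ALL
SELECT model, NULL, NULL, SUM(units) FROM sales GROUP BY model
UNION ALL
SELECT NULL, NULL, NULL, SUM(units) FROM sales
"""
rows = conn.execute(rollup_sql).fetchall()
for row in rows:
    print(row)
```

A full CUBE would add four more branches, for (model, color), (year, color), (year), and (color).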
The paper does not go into specifics about computing ROLLUPs and CUBEs, saying only that techniques for implementing GROUP BY can also be used to implement these operations.
This paper clearly demonstrates a deficiency of SQL, and proposes a syntactical solution. It also motivates why this deficiency would be a pain point for certain common use cases. It supplies many examples of these use cases.
The flow of the paper seems scattered, and for some sections I did not understand why the discussion was relevant.
The paper defines and proposes a data cube operator, which is the N-dimensional generalization of the GROUP BY operator in SQL. The data cube supports operations that are especially frequent in OLAP, such as cross-tabulation, roll-up, drill-down and computing sub-totals.
The authors motivate the need for the data cube operator by looking at the requirements of visualization and data analysis tools. These tools represent the dataset as an N-dimensional space and perform “dimensionality reduction” to summarize data by focusing on certain dimensions of interest. The paper discusses why GROUP BY fails to meet these requirements, then defines the relevant operators (CUBE and ROLLUP) for the SQL standard. When this paper was written, the SQL standard did not support such features, even though some vendor-specific SQL extensions partially supported them.
The paper demonstrates that the GROUP BY operator is not suitable for constructing histograms or for roll-up/drill-down operations. The main idea is to introduce an ‘ALL’ value to represent the set of values over which an attribute has been aggregated. This makes it possible to represent a data cube as a relation without generating too many columns. The paper explains this well with tabular examples of different representations.
The paper goes on to explain the syntax of the CUBE and ROLLUP operators and their implementation. It also looks at three categories of aggregate functions and their implications for computing data cubes. The paper motivates the need for the data cube operators and the introduction of the ‘ALL’ value well, but I found the explanations that follow hard to understand. As the paper moves into the technical details of the data cube operators, it could have used better examples than SQL queries, which place a burden on readers to understand the implications of each query. Even though the query examples are not difficult to understand, they mostly serve as mere ‘examples’ without contributing much to explaining the details of the data cube operators. For me, the paper failed to engage me enough to follow and comprehend the sections after Section 2.
Although I personally had difficulty following this particular paper, it certainly was a seminal work; the concept of data cubes is very important in OLAP workloads today. As a final remark, I think a real-world example of the implementation and application of the data cube would have helped my understanding of the topic.
This paper discusses the CUBE operator, an extension of the common SQL operator GROUP BY. In data visualization, users typically take data represented as an N-dimensional space and slice it into a smaller 2D or 3D subspace. The problem the paper addresses is that the GROUP BY operator in a standard relational database does not easily support the needs of data visualization. More importantly, creating histograms, roll-up tables, and cross-tabulations is hard and complicated in standard SQL. It is possible to produce the results, or some crude form of them, but the required SQL expressions can be hard to construct and hard to optimize.
The solution the paper suggests is the CUBE and ROLLUP keywords, which work with the GROUP BY operator to support data visualization. CUBE works by overloading the GROUP BY operator. The operator first performs a normal GROUP BY aggregation over the specified columns, then unions in each of the super-aggregates. Furthermore, an additional value, ALL, is added to the domain of each grouping column; ALL stands for the set of all values in that column. The paper also proposes the ROLLUP keyword for queries that only need roll-up or drill-down reports. ROLLUP makes the operator produce only the super-aggregates along the roll-up hierarchy rather than all of them.
I find it curious that the paper chose to improve the functionality of relational databases to support data visualization. If users were extremely concerned about easily performing data visualization, it would seem better to create a specialized database for their needs rather than tacking new functionality onto the jack-of-all-trades database. I think the paper chose this approach partly because the optimizations for the GROUP BY operator extend to CUBE and ROLLUP, since those just overload the operator. However, the performance of these new keywords was not clear to me. Throughout the paper I noticed that certain computations scale exponentially. For example, inserting a new tuple requires 2^N calls of Iter(), and computing the cube looks at 2^N - 1 super-aggregates. In practice, exponential computations severely degrade system performance, but the paper gave no indication of this. In fact, other than noting that Microsoft SQL Server already supported the new operator, the paper offered no comparison of CUBE versus a normal GROUP BY in a traditional system.
This paper talks about Data Cube, an operator that generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs. This is important because many of these features are being included in standard SQL and make extracting data from SQL databases more efficient. The paper approaches this task by discussing how CUBE and ROLLUP make up for some shortcomings of GROUP BY.
Issues with GROUP BY arise because it does not support histograms, roll-up totals and sub-totals for drill-downs, or cross-tabulations. GROUP BY does not allow the direct construction of histograms. These are useful if, for example, we want the daily maximum reported temperature for each nation given longitude and latitude; without histogram support, SQL has to use awkward nested queries. Rolling up and drilling down data are important because reports often aggregate data at each level to produce sub-totals. However, this representation is not relational because of its empty cells. Adding columns so that there are no empty cells is not a practical solution, because it adds a huge number of domains to the resulting table. The data cube provides an elegant solution by introducing the ALL value, a dummy value that fills in the super-aggregate items. Cross-tabulation is a symmetric table of aggregation results. The cube operator thus generalizes histograms, aggregates, GROUP BY, roll-ups and drill-downs, and cross-tabs. Creating the cube starts with obtaining the power set of the aggregation columns. The paper then overloads the GROUP BY operator, because CUBE is an aggregate operation and a relational operator with GROUP BY and ROLLUP as degenerate forms. Implementing a full cube is often more functionality than needed, so the paper also provides ROLLUP, which produces only the super-aggregates along one hierarchy, in addition to CUBE.
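The nested-query workaround for a histogram over computed categories might look like this in SQLite via Python's `sqlite3` module. The `weather` table, its rows, and the `nation_of()` function are all hypothetical; `nation_of()` is a crude stand-in registered with `create_function`. `date()` is SQLite's built-in date function:

```python
import sqlite3

def nation_of(lat, lon):
    """Hypothetical stand-in for a geographic lookup: a crude longitude split."""
    return "West" if lon < 0 else "East"

conn = sqlite3.connect(":memory:")
conn.create_function("nation_of", 2, nation_of)
conn.execute("CREATE TABLE weather (time TEXT, latitude REAL, longitude REAL, temp REAL)")
conn.executemany("INSERT INTO weather VALUES (?, ?, ?, ?)", [
    ("1994-07-01 09:00", 40.0, -100.0, 25.0),
    ("1994-07-01 15:00", 40.0, -100.0, 31.0),
    ("1994-07-01 12:00", 48.0, 2.0, 22.0),
    ("1994-07-02 12:00", 48.0, 2.0, 27.0),
])

# Nested-query form: compute the categories in a derived table, then GROUP BY them.
histogram = conn.execute("""
    SELECT day, nation, MAX(temp)
    FROM (SELECT date(time) AS day,
                 nation_of(latitude, longitude) AS nation,
                 temp
          FROM weather) AS foo
    GROUP BY day, nation
    ORDER BY day, nation
""").fetchall()
print(histogram)
```

SQLite itself also accepts expressions directly in GROUP BY. The derived table is therefore only required in dialects that restrict grouping to plain columns, as the paper describes for standard SQL of the time.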
Algebraically, the CUBE of a ROLLUP is a CUBE and the ROLLUP of a GROUP BY is a ROLLUP. The aggregation operators should therefore be ordered with the most powerful, the cube, at the core, followed by the roll-up of the cube, and then the group-by of the roll-ups. ALL adds complexity because it adds more rules. It is therefore good to avoid ALL values by using NULL instead, not implementing the ALL() function, and implementing GROUPING() to distinguish NULL from ALL. Techniques for computing aggregates include:
- minimizing data movement and processing cost by computing aggregates at the lowest possible system level;
- using arrays or hash tables to organize aggregation in memory;
- keeping a hash symbol table to map aggregation string values to integers;
- sorting or hashing to organize data by value, then aggregating with a sequential scan of the sorted data;
- using parallelism to aggregate partitions and then coalescing the aggregates.

Aggregate functions are classified as distributive (e.g., COUNT(), MIN(), MAX(), SUM()), algebraic (e.g., MaxN(), MinN(), center_of_mass()), or holistic (e.g., Median(), MostFrequent(), Rank()).
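The NULL-plus-GROUPING() convention can be emulated in engines that lack GROUPING(). Below is a minimal sketch in SQLite via Python's `sqlite3`. A literal flag column, `grouping_model` (a hypothetical name), plays the role of GROUPING(model), and one base row deliberately has a NULL model:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (model TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("Chevy", 50), ("Ford", 10), (None, 7)])  # one row has no model

# grouping_model plays the role of GROUPING(model): 1 means "this NULL is ALL".
rows = conn.execute("""
    SELECT model, 0 AS grouping_model, SUM(units) FROM sales GROUP BY model
    UNION ALL
    SELECT NULL, 1, SUM(units) FROM sales
    ORDER BY 2, 1
""").fetchall()
for row in rows:
    print(row)
```

Two rows come back with a NULL model. One is the group of rows whose model is genuinely unknown (flag 0); the other is the grand total (flag 1). Without the flag they would be indistinguishable.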
One drawback of this paper is that it provides no quantitative analysis of the performance improvements contributed by CUBE and ROLLUP. I would have liked to see numbers showing how much time CUBE and ROLLUP save compared with GROUP BY. I would also have liked a discussion of the industries and research that use the CUBE operator now, and how the operator has affected industry and research.
This paper discusses an aggregation operation for relational databases. The authors generalize existing aggregate functions into an operator they call the “cube”, which can be used for data visualization applications. The paper describes the cube’s operations, how they can be used in SQL, their efficiency, and how users can define their own aggregate functions. It also discusses the issues that current GROUP BY statements have with histograms, roll-up totals and sub-totals, and cross-tabulations.
This paper motivates the operator with three use cases: histograms, roll-up and drill-down totals, and cross-tabulations. For these problems a cube operator makes sense, and the paper demonstrates that creating it would make these types of queries easier to write. The paper includes a thorough explanation of why the authors chose their syntax and how they suggest revising it. The authors were confident that the operator was useful and proposed it as a change to the SQL language, and it was eventually added.
This paper has several drawbacks. The paper discusses how you would implement certain types of histograms on page 34 and shows a nested query as an alternative. The authors then imply that this is not acceptable but do not explain why. There is also no analysis of the differences in execution time or memory usage between a nested query and the cube operation. The cube operator is not tested or compared empirically with alternative ways of implementing the functionality it seeks to replace. Nor is any data cited or provided to show that these queries are common enough to be worth changing the language for. How many applications really use this type of reporting? Should we add functions for a small subset of applications to such a general-purpose language? Other papers we’ve read in this class talked about the evolution of tiny languages and highly specialized query languages. Perhaps this type of function belongs in a specialized query language, and not in SQL.
Part 1: Overview
This paper defines a new operator, called the data cube, which acts as a relational aggregation operator and fits into SQL. The data cube operator combines and generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs and expresses them as relations. The cube operator can carry out N-dimensional aggregation. The paper introduces four basic data analysis steps: formulating, extracting, visualizing, and analyzing. When performing summarization and dimensionality reduction, applications need all of these operations, including histograms and cross-tabulation. The traditional GROUP BY operator fails to meet this need, which motivates the paper’s proposal of the CUBE operator.
Standard SQL language does not allow directly constructing histograms. People need to use nested queries to get the aggregated report. In addition, the rolling-up and drilling-down operations are not well supported by SQL as well. The cube operator works straight forward. The GROUP BY operator provides the core of CUBE operation, and then the core becomes surrounded by N-1 level lower dimensional aggregates.
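To make the "core plus surrounding aggregates" idea concrete, here is a minimal Python sketch, assuming SUM as the aggregate, a small hypothetical car-sales table, and the string "ALL" as a stand-in for the paper's ALL value. It computes the core with an ordinary GROUP BY-style pass and then derives every lower-dimensional aggregate from the core cells instead of rescanning the base rows, which is only valid because SUM is distributive.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # hypothetical stand-in for the paper's ALL value

# Hypothetical base rows: (model, year, color, units)
SALES = [
    ("Chevy", 1990, "red", 5),
    ("Chevy", 1990, "blue", 87),
    ("Ford", 1990, "red", 64),
    ("Ford", 1991, "blue", 55),
]

def core_group_by(rows):
    """The N-dimensional core: GROUP BY model, year, color with SUM(units)."""
    core = defaultdict(int)
    for model, year, color, units in rows:
        core[(model, year, color)] += units
    return dict(core)

def surround(core, n_dims=3):
    """Add every lower-dimensional aggregate, computed from the core cells."""
    cube = dict(core)
    for k in range(n_dims):  # k = number of dimensions kept
        for kept in combinations(range(n_dims), k):
            for key, total in core.items():
                super_key = tuple(v if i in kept else ALL
                                  for i, v in enumerate(key))
                cube[super_key] = cube.get(super_key, 0) + total
    return cube

cube = surround(core_group_by(SALES))
print(cube[(ALL, ALL, ALL)])     # grand total: 211
print(cube[("Ford", ALL, ALL)])  # Ford subtotal: 119
```

A holistic aggregate such as MEDIAN could not be rolled up from the core cells this way; it would need the base rows again.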
Part 2: Contributions
The CUBE operator overloads the standard GROUP BY operator in SQL and generalizes it to support histograms, cross-tabulation, roll-up, drill-down, and subtotal operations, which are commonly used by applications. The CUBE operator is purely relational and fits into the SQL standard. The authors propose syntax for CUBE, ROLLUP, and related constructs.
The cube operator generalizes aggregates, GROUP BY, histograms, roll-ups, drill-downs, and cross-tabs. The cube is easy to compute for a wide class of functions: the distributive and algebraic ones.
The CUBE operator has been exposed in industry for a year. Implementing a language extension is extremely difficult: the authors have to deal with an enormous number of corner cases. They provide a proposal for the CUBE syntax.
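The distinction between distributive and algebraic aggregates can be illustrated with AVG. The sketch below, using a hypothetical AvgState class, keeps a fixed-size (sum, count) state per cell: two partial states merge exactly, whereas averaging two partial averages gives the wrong answer.

```python
from dataclasses import dataclass

@dataclass
class AvgState:
    """Bounded-size partial state for AVG (an algebraic aggregate).

    SUM and COUNT are distributive; AVG is computed from the fixed-size
    pair (total, count) rather than from the raw values.
    """
    total: float = 0.0
    count: int = 0

    def add(self, value):
        self.total += value
        self.count += 1

    def merge(self, other):
        """Combine two partial states, e.g. from two sub-cubes or partitions."""
        return AvgState(self.total + other.total, self.count + other.count)

    def result(self):
        return self.total / self.count if self.count else None

def avg_of(values):
    state = AvgState()
    for v in values:
        state.add(v)
    return state

left, right = avg_of([2, 4]), avg_of([6, 8, 10])
combined = left.merge(right)
print(combined.result())  # 6.0
# Averaging the two partial averages (3.0 and 8.0) would wrongly give 5.5.
```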
Part 3: Possible Drawbacks
The authors ultimately do not support an ALL value; instead, they use NULL. However, users can simulate ALL values themselves. CUBE still needs to be extended to support rank, N_tile, cumulative, and percent-of-total operations in a more computationally efficient way.
This paper discusses a relational operator called "data cube" that allows aggregation to be specified across multiple attributes. The problem addressed is that SQL aggregate functions and the GROUP BY operator produce only zero- or one-dimensional aggregates, whereas applications such as visualization and data analysis tools need the n-dimensional generalization of these operators. The cube operator takes n aggregation attributes and aggregates over them, producing an n-dimensional cube.
The paper explains that traditional relational systems fit multi-dimensional data by modeling n-dimensional data as a relation with n attribute domains. It also addresses the limitations of the GROUP BY operator: it cannot directly construct histograms, roll-up reports, drill-down reports, or cross-tabulations. For example, GROUP BY does not allow direct construction of a histogram, i.e. aggregation over computed categories. Nor does it handle roll-up reports with subtotals. Reports commonly aggregate data at different levels, a coarse level followed by a finer level or the other way around. Going up the levels is called rolling up, and going down is called drilling down. The data cube operator handles such reports. Creating a data cube requires generating the set of all subsets of the aggregation columns. The output can be customized to suit the reporting application; for example, a roll-up or drill-down report might not need the full cube. In SQL, the operation is requested with the keyword CUBE alongside the GROUP BY operator.
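Using NULL in place of ALL raises an ambiguity when the data itself contains NULLs. A minimal Python sketch, assuming None plays the role of NULL and a 0/1 flag plays the role of SQL's GROUPING() function (the key layout and the cube_color helper are illustrative), shows how the two cases stay distinguishable.

```python
from collections import defaultdict

# Hypothetical rows: (color, units); one color is genuinely unknown (NULL).
ROWS = [("red", 5), ("blue", 7), (None, 3)]

def cube_color(rows):
    """One-dimensional CUBE keyed by (color, grouping_flag).

    grouping_flag == 0: an ordinary group; a None color is a real NULL.
    grouping_flag == 1: the super-aggregate row, where None stands for ALL.
    """
    out = defaultdict(int)
    for color, units in rows:
        out[(color, 0)] += units  # detail group for this color
        out[(None, 1)] += units   # every row also feeds the ALL row
    return dict(out)

result = cube_color(ROWS)
print(result[(None, 0)])  # 3: rows whose color is NULL
print(result[(None, 1)])  # 15: total over ALL colors
```

Without the flag, both rows would be keyed by None and could not be told apart.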
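A histogram over computed categories groups rows by values derived from columns rather than by the columns themselves. The Python sketch below uses hypothetical weather readings and a hypothetical temp_band helper; it groups by (day, 10-degree band) in one pass, the kind of grouping that GROUP BY could not express directly without a nested query.

```python
from collections import Counter
from datetime import date, datetime

# Hypothetical weather readings: (timestamp, temperature in degrees Celsius)
READINGS = [
    (datetime(1995, 6, 1, 9), 18.5),
    (datetime(1995, 6, 1, 15), 19.0),
    (datetime(1995, 6, 2, 9), 12.0),
    (datetime(1995, 6, 2, 15), 27.5),
]

def temp_band(celsius, width=10):
    """Computed category: lower bound of the temperature band."""
    return int(celsius // width) * width

# Group by two computed categories: the calendar day and the temperature band.
histogram = Counter((ts.date(), temp_band(t)) for ts, t in READINGS)
for (day, band), count in sorted(histogram.items()):
    print(day, f"{band}-{band + 10}C", count)
```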
The main strength of the paper is its detailed explanation of the motivation for the cube operator. It explains the shortcomings of the aggregate and GROUP BY operators in computing histograms and subtotals for roll-up and drill-down reports.
The main drawback of this operator is that it was proposed a long time ago. It would be more insightful to evaluate its effectiveness in present-day DBMSs and on queries over very large data sets.
This paper defines an n-dimensional generalization of operators like GROUP BY, called the data cube or simply the cube. It explains this operator, how to fit it into SQL, and how users can define new aggregate functions for it. Efficient techniques for computing the cube are also discussed. Many of these features are now being added to the SQL standard, which is one of the key contributions of this work.
The paper describes four steps of data analysis for finding unusual patterns in data:
1. formulating a query
2. extracting the aggregated data
3. visualizing the results and
4. analyzing the results and formulating a new query
Analyzing data in multiple dimensions is hard, and the paper describes how a relational database can support efficient extraction of information from a SQL database. Existing SQL does not support histogram generation, which makes it hard to analyze data patterns across multiple dimensions.
The cube operator generalizes and unifies several common and popular concepts, including aggregates, GROUP BY, histograms, roll-ups and drill-downs, and cross-tabs.
This paper provides very detailed and solid examples, especially when explaining the problems of the existing GROUP BY operation and how the cube works. The code examples are short but serve their purpose perfectly.
This paper doesn't discuss the implementation difficulties of this new interface enough. Making the interface available has turned out to be very difficult: it has been exposed to industry for a year, yet implementers are stuck on an enormous number of corner cases. Providing syntax without considering the implementation difficulties is not enough.
This paper presents two operators, ROLLUP and CUBE, that extend SQL's GROUP BY and aggregate functions for multidimensional databases. It also explores the idea of a relational database supporting efficient extraction of information for visualization and data analysis.
These operators are specifically applicable to multidimensional/OLAP databases. They were created to overcome the inefficiency of GROUP BY for these queries. In the example given in the paper, a typical GROUP BY formulation would require multiple scans of the data and multiple sorts or hashes, and therefore a long wait.
The cube operator computes aggregates over all possible combinations of dimensions, returning results across multiple dimensions so that the data can be visualized in detail. These operators exploit the commutativity and associativity of distributive and algebraic aggregate functions: they compute the basic aggregate values first and then use those values to calculate the super-aggregates, rather than recomputing 2^N values whenever a row is updated, added, or deleted.
The authors also explain why the ALL value is needed, even though it introduces exceptional cases, and they propose replacements for it. However, avoiding the ALL keyword might exclude a few meaningful queries.
ROLLUP is a more restricted operator that computes aggregates along one path of increasingly coarse groupings, rather than computing all possible aggregates over all dimensions. The authors explain how ROLLUP and CUBE let users shape the data to match the structure of the visualization they need.
One key advantage is that users need little effort to understand these operators, since they are in line with the SQL standard.
Some of the issues that I foresee are as follows:
These aggregate values are typically calculated in memory. Although the authors describe how only the super-aggregates are calculated in memory, from aggregate values computed on disk, generating the results may still take a long time. Despite the simplicity of the operators, using ROLLUP and CUBE may involve a learning curve, since the data is still represented relationally and cannot be visualized just by viewing the tables.
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
This paper mainly discusses new methods for multi-dimensional queries in OLAP. It begins by describing the complications of fitting relational databases to multi-dimensional data analysis, and then focuses on the problem of supporting multi-dimensional operations in a relational database.
A traditional RDBMS can use GROUP BY and aggregate functions to express multi-dimensional queries, but such queries are usually very long because they must enumerate all possibilities over large N-attribute domains. For example, if the query computes cross-tabulation histograms over N attributes, the query plan consists of 2^N aggregations, one for each possible combination. To optimize this type of OLAP query, the paper proposes two SQL extensions:
1. CUBE: generates aggregates over every combination of the grouping columns, at the same level of the concept hierarchy, for example:
CUBE(A,B,C): (A,B,C) (A,B) (A,C) (B,C) (A) (B) (C) ()
2. ROLLUP (drill-down): generates aggregates on a data cube by climbing up a concept hierarchy for a dimension, for example:
ROLLUP(A,B,C): (A,B,C) (A,B) (A) ()
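The difference between the two extensions is easiest to see by enumerating the grouping sets each one produces. This Python sketch (the helper names are illustrative) lists all 2^N subsets for CUBE and the N + 1 prefixes for ROLLUP; the empty tuple () is the grand-total grouping.

```python
from itertools import combinations

def cube_sets(cols):
    """All 2^N subsets of the grouping columns, largest first, ending with ()."""
    return [s for k in range(len(cols), -1, -1) for s in combinations(cols, k)]

def rollup_sets(cols):
    """The N + 1 prefixes of the column list, longest first, ending with ()."""
    return [tuple(cols[:k]) for k in range(len(cols), -1, -1)]

print(cube_sets(("A", "B", "C")))
# [('A', 'B', 'C'), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('A',), ('B',), ('C',), ()]
print(rollup_sets(("A", "B", "C")))
# [('A', 'B', 'C'), ('A', 'B'), ('A',), ()]
```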
The paper also introduces some methods for maintaining a multi-dimensional data cube, but efficient update algorithms are given only for distributive and algebraic aggregates.
To sum up, this paper explains well the problems that traditional RDBMSs face with histograms, roll-up and drill-down, and cross-tabulation, and the new concept of a multidimensional data cube appears to give some of these problems better solutions. However, the paper also has some weaknesses:
1. The methods for maintaining the data cube for holistic aggregates are far from satisfactory; the paper essentially avoids this part. Since aggregates like rank() and median() can be essential to current OLAP functionality, a solution that does not address all important user needs is more likely to fail at speeding up multi-dimensional queries.
2. Only conceptual ideas for constructing multi-dimensional cubes are presented; not enough detail is provided for readers to analyze the computational complexity of the new approaches.
3. New ways of supporting multi-dimensional operations are proposed, but no performance tests are conducted to convince readers of the benefit of adopting them, or to help programmers judge when to switch to a data cube.
Data analysis applications categorize data values and trends and extract statistical information in order to look for unusual patterns in data. To produce more readable visualizations, we need to reduce the dimensionality of the data set, and aggregation is a good way to summarize data for that purpose.
GROUP BY has several limitations. First, there is no direct way to group by a computed value; people must use a nested query. Second, roll-ups and cross-tabs require many UNIONs, which leads to very long code and many duplicated calculations.
The paper then introduces two new operators: CUBE and ROLLUP. The CUBE operator is the N-dimensional generalization of simple aggregate functions. CUBE computes the aggregates for every subset of the grouping columns in the select list, whereas ROLLUP computes only part of the cube: aggregates over a sequence of groupings whose dimensionality increases one column at a time. ROLLUP is useful because not all of the information in the cube is always necessary.
Finally, the paper discusses how the cube is computed and maintained. A naive approach is the 2^N algorithm, which allocates one handle for each cell of the cube result table; each tuple of the base table then updates the corresponding handles. This kind of brute force is not always necessary: for distributive and algebraic aggregate functions, the results of sub-aggregates can be used to calculate the super-aggregates. Maintaining the cube is harder for holistic aggregates, because they cannot be updated incrementally.
This paper starts from the motivation for aggregation and the limitations of GROUP BY, and argues for a CUBE operator inside SQL. It then discusses CUBE in SQL and some concerns about it, such as avoiding ALL. The most interesting part is that CUBE can be computed more efficiently when the aggregate function is distributive or algebraic, which is a key improvement over building a CUBE from GROUP BY queries. The paper also points out future work: studying algorithms for holistic aggregates.
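The naive 2^N algorithm can be sketched in a few lines of Python, assuming SUM as the aggregate, a dictionary of running totals as the "handles", and the string "ALL" as a stand-in for the ALL value (cube_2n is an illustrative name). Each base tuple updates the 2^N cells obtained by replacing any subset of its dimension values with ALL.

```python
from itertools import product

ALL = "ALL"  # hypothetical stand-in for the ALL value

def cube_2n(rows, n_dims):
    """Single pass over the base rows; each row updates 2^N handles.

    A handle here is just a running SUM stored in a dict keyed by cube cell.
    """
    handles = {}
    for *dims, measure in rows:
        # Each mask picks, per dimension, either the row's value or ALL.
        for mask in product((False, True), repeat=n_dims):
            key = tuple(ALL if rolled else v for v, rolled in zip(dims, mask))
            handles[key] = handles.get(key, 0) + measure
    return handles

# Hypothetical base rows: (model, color, units)
rows = [("Chevy", "red", 5), ("Chevy", "blue", 87), ("Ford", "red", 64)]
cube = cube_2n(rows, n_dims=2)
print(cube[(ALL, "red")])  # 69
print(len(cube))           # 8: 3 core + 2 per-model + 2 per-color + 1 total
```

The cost is 2^N handle updates per input tuple, which is why the paper looks for cheaper strategies for distributive and algebraic aggregates.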
(1) The weakness for cross-tabs and roll-ups is the same: too many UNIONs. There is no need to treat them as two separate points.
(2) The paper should briefly explain how the query optimizer deals with the CUBE operator.
(3) It should survey current work on optimizing CUBE computation and maintenance.
(4) The background of the data cube is data warehousing, which deserves a brief introduction.
(5) It does not clearly explain how CUBE deals with the histogram problem raised in Part 2.
OLAP workloads are one of the primary use-cases for relational databases. Most of these workloads involve summarizing data. This paper looks at a problem faced in contemporary databases by database users: it is difficult to form histograms, roll-up and sub-totals, and cross tabulations using SQL. The authors introduce two new operators: CUBE and ROLLUP. CUBE generalizes the GROUP BY SQL operator, and ROLLUP produces aggregates from these that can be used for reports. The paper then details ways that CUBE and ROLLUP can be used together.
This paper is nice in that it provides a new operator that adds new functionality by generalizing existing operators rather than augmenting them. One shortcoming I noticed is that the authors claim that "ROLLUP and CUBE will serve the needs of most applications" but do not justify this. I would have liked to see a more detailed analysis of what types of (common) applications exist, whether their needs are met by this, and whether their needs are already adequately met by the existing set of SQL operators. The bar for adding new operators to a language like SQL should be high, since once added, they cannot be removed.
This paper argues that when we do data mining in SQL, the GROUP BY clause is not enough, and we need the proposed CUBE operator to generalize common data mining concepts. GROUP BY clauses either cannot perform the specified operations or perform them slowly, so the positive contributions of this paper are extending the SQL language to perform data mining operations and explaining how to maintain those values as data is inserted.
In standard SQL, the GROUP BY operator does not allow us to construct histograms or express roll-ups of data. For example, if we had a table of all the phones that Samsung sold, with their times, locations, and colors, we could not create groupings for each combination of country and color and compute their totals with a single SQL query.
Therefore, the paper proposes the CUBE and ROLLUP operators. The CUBE operator first aggregates over all of the items in the GROUP BY clause and then creates the super-aggregates of the global cube. In the end there are 2^N - 1 super-aggregate groupings. This gives the user the full cube of aggregates, but what if the user wants a roll-up or drill-down report? This is where the ROLLUP operator comes into play. The GROUP BY, ROLLUP, and CUBE operators can be used together by first specifying the GROUP BY list, then the ROLLUP list, then the CUBE list. For example, we can group by the phone's manufacturer, roll up by year, and cube by model and disk space.
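The combined syntax can be read as a product of grouping sets. A minimal sketch, assuming the three lists combine as a cross product (as SQL:1999's concatenated grouping sets do) and using the column names from the phone example above as illustrations (combined_sets is a hypothetical helper):

```python
from itertools import combinations, product

def combined_sets(group_by, rollup, cube):
    """Grouping sets for GROUP BY <group_by> ROLLUP <rollup> CUBE <cube>,
    assuming they combine as a cross product: the plain GROUP BY columns
    appear in every set, joined with each ROLLUP prefix and each CUBE subset."""
    rollup_prefixes = [tuple(rollup[:k]) for k in range(len(rollup), -1, -1)]
    cube_subsets = [s for k in range(len(cube), -1, -1)
                    for s in combinations(cube, k)]
    return [tuple(group_by) + r + c
            for r, c in product(rollup_prefixes, cube_subsets)]

sets = combined_sets(("manufacturer",), ("year",), ("model", "disk"))
print(len(sets))  # 1 x 2 x 4 = 8 grouping sets
for s in sets:
    print(s)
```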
Overall, the paper does a great job in extending SQL to perform data mining operations, but there are some weaknesses with the paper:
1. The paper covers the syntax and how to implement CUBE and ROLLUP, but there are no experimental performance results. I would have liked to see how much faster CUBE is than scanning the table once per grouping (2^N scans) when brute-forcing all of the aggregations.
This paper proposes the introduction of the CUBE and ROLLUP aggregates into SQL. These aggregate operators are intended to provide a simpler way to express multi-dimensional GROUP BY queries that might otherwise require very complicated SQL queries containing a large number of unions. The paper also proposes the introduction of the ALL keyword, which is necessary for the addition of CUBE and ROLLUP and which allows for a more concise and complete listing of aggregates and super-aggregates in result tables.
The CUBE and ROLLUP aggregates are designed to let users analyze multi-dimensional data patterns. For example, a car dealership may want to aggregate sales data by model, color, year, or any combination of these attributes. Combining the CUBE aggregate with the ALL keyword enumerates all aggregates and super-aggregates in a complete table with ∏(C_i + 1) rows, where C_i is the number of distinct values of the i-th attribute. While this table grows exponentially, it is more concise than adding extra aggregate columns, as in Table 3b of the paper, which grows with the power set of the aggregated attributes. It also avoids the empty cells of Table 3a, which is no longer relational.
Aggregate functions can be classified as distributive, algebraic, or holistic. Computing the CUBE of distributive and algebraic aggregates can be sped up by aggregating tuples in one dimension and then aggregating those aggregates at the next level. This matters when computing CUBEs with SELECT. Keeping CUBEs updated is a harder issue, since some functions, such as MAX, are distributive for SELECT and INSERT but holistic for DELETE. Because the properties of each aggregate vary by operation, optimizing the CUBE for each aggregate could be very complex.
One of the biggest drawbacks of this paper is that implementing the proposed changes to SQL syntax would require an enormous effort. The authors acknowledge this but feel that the work is worth it. I would like to see more analysis of this claim. For instance, the authors state that all values would need to be treated as singleton sets to stay consistent with the set semantics of the new aggregate operators. These features would require a rewrite of SQL that touches the lowest levels of how values are handled. While the simple syntax that comes with the ROLLUP, CUBE, and ALL operators/keywords would be nice, it doesn't seem to offer enough improvement to justify such a massive overhaul of SQL.
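The ∏(C_i + 1) row count can be checked with a small Python sketch, assuming dense data (every combination of values occurs) and hypothetical domains of 2 models, 3 years, and 3 colors; with sparse data the formula is an upper bound on the number of cube rows.

```python
from itertools import product
from math import prod

# Hypothetical attribute domains: 2 models, 3 years, 3 colors.
DOMAINS = {
    "model": ["Chevy", "Ford"],
    "year": [1990, 1991, 1992],
    "color": ["red", "white", "blue"],
}

def predicted_size(domains):
    """Product of (C_i + 1): each dimension is one of its C_i values or ALL."""
    return prod(len(values) + 1 for values in domains.values())

def enumerated_size(domains):
    """Build the cube cells from dense data (every combination present once)."""
    cells = set()
    for row in product(*domains.values()):
        for mask in product((False, True), repeat=len(row)):
            cells.add(tuple("ALL" if rolled else v for v, rolled in zip(row, mask)))
    return len(cells)

print(predicted_size(DOMAINS), enumerated_size(DOMAINS))  # 48 48
```

Only 18 of those 48 rows are core cells; the other 30 are super-aggregates.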
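The MAX example can be made concrete with a small Python sketch of a single cube cell (MaxCell is a hypothetical name). Inserting a value only needs a comparison with the current maximum, but deleting the current maximum cannot be repaired from the old maximum alone; this sketch falls back to rescanning the remaining base values, which is why the operation behaves holistically.

```python
class MaxCell:
    """One cube cell maintaining MAX over a bag of values.

    INSERT: compare the new value with the current max (distributive).
    DELETE of the current max: the old max says nothing about the next
    largest value, so this sketch rescans the stored base values (holistic).
    """

    def __init__(self):
        self.values = []   # base values, kept only so DELETE can recompute
        self.current = None
        self.rescans = 0

    def insert(self, v):
        self.values.append(v)
        if self.current is None or v > self.current:
            self.current = v

    def delete(self, v):
        self.values.remove(v)
        if v == self.current:
            self.rescans += 1
            self.current = max(self.values, default=None)

cell = MaxCell()
for v in (3, 9, 5):
    cell.insert(v)
cell.delete(5)  # not the max: no rescan needed
cell.delete(9)  # was the max: must rescan the remaining values
print(cell.current, cell.rescans)  # 3 1
```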
This paper introduces the data cube, which generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs. Previously, SQL aggregate functions and the GROUP BY operator produced zero-dimensional or one-dimensional aggregates, but modern statistical applications need the N-dimensional generalization. This paper therefore proposes the data cube.
The paper described the motivation for data cube. It mentioned the problems with GROUP BY operator in SQL. These problems include histograms, roll-up totals and sub-totals, and cross tabulations. The GROUP BY operator does not allow direct construction of histograms, which is aggregation over computed categories. In addition, it does not support going up the data levels and going down the data levels, which is called rolling-up and drilling-down data. Also, the symmetric aggregation result called cross-tabulation is not supported. Therefore, data cube is introduced to deal with these three problems.
The strength of this paper is its many examples illustrating the ideas. For example, it uses many example tables to introduce the functions of the data cube and to show the problems with the GROUP BY operator. This gives readers a concrete picture of the functions of the data cube and how they work.
The weakness of this paper is that it does not provide any experimental results for data cubes, so readers cannot really judge their advantages. In addition to describing the functions of data cubes, the paper would be more convincing with concrete experimental results showing the advantages of using them.
To sum up, this paper introduced data cube, which generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs, which cannot be supported by GROUP BY operator.
This paper introduces the data cube, a relational aggregation operator in SQL. The authors start by addressing certain aggregation capabilities that SQL currently lacks. The first problem is creating histograms with SQL GROUP BY, which requires an ugly nested query. The second is what they call "drilling down" or "rolling up" data, which basically means computing subtotals, or picking out a more specific segment of data and viewing the segments side by side. The last is what they call "cross-tabulation", which is basically an n-by-m grid with totals in the final row and column. These are standard features in Excel and other spreadsheets, but not in SQL. The cube was built to address these issues.
They introduce the CUBE and ROLLUP operators to help solve these problems. There is a terrific figure on page 41 showing how ROLLUP and CUBE fit into a SQL statement, which I strongly recommend checking out (it is difficult to explain in text rather than an image). Basically, GROUP BY is the most general, ROLLUP is more specific, and CUBE is the most specific.
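A cross-tab with totals in the final row and column is a two-dimensional cube laid out as a grid. The Python sketch below uses hypothetical sales figures and a hypothetical cross_tab helper; "Total" plays the role of the ALL value, and each row contributes to its own cell, its row total, its column total, and the grand total.

```python
from collections import defaultdict

# Hypothetical sales rows: (model, color, units)
SALES = [("Chevy", "white", 40), ("Chevy", "blue", 50),
         ("Ford", "white", 10), ("Ford", "blue", 85)]

def cross_tab(rows):
    """Model x color grid plus a total column, a total row, and the grand
    total in the corner; "Total" plays the role of the ALL value."""
    grid = defaultdict(int)
    for model, color, units in rows:
        for r in (model, "Total"):
            for c in (color, "Total"):
                grid[(r, c)] += units
    return dict(grid)

grid = cross_tab(SALES)
columns = ["white", "blue", "Total"]
for model in ["Chevy", "Ford", "Total"]:
    print(f"{model:6}", *(f"{grid[(model, c)]:>5}" for c in columns))
```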
The paper also briefly touches on how to index a cube which they have made addressable by row and column similar to how you would index a multidimensional array. This provides ease of use for analysts using the cubes.
One of the greatest strengths of this paper, in my opinion, was the diagramming and the examples they gave. They did a terrific job of displaying rather complex concepts in a very simple and intuitive way. This helped explain the concepts to readers immensely and gave me a much deeper understanding of the issues and solutions. They also interspersed examples very well, providing concrete numbers that make the problems and solutions clearer.
If I had any complaint about the paper, it is that I would have liked slightly more implementation detail. They could have gone a little more in depth on how the cube is implemented and maintained, but only slightly, since I think the paper's greatest strength is how understandable it is, and that usually fades the lower down you go. The only other question I have is whether it was adopted. In my database classes I have used GROUP BY, but CUBE and ROLLUP have never been mentioned, and I wonder whether that is because they were never adopted, and if so, why?
I think this was a terrific paper, actually one of my favorites that we read. It did a great job of explaining a need and demonstrating how much more convenient the solution was. It was clear and easy to follow and did a terrific job of giving examples throughout. Overall I think this was a great paper, and I enjoyed reading it.
In the age of OLAP querying, data analysis relies, in some non-trivial portion, on aggregation operations. In many traditional scenarios, this amounts to SQL's GROUP BY. However, to paraphrase the authors of this paper: "GROUP BY stinks. We need something better." The limitation of many relational databases is that they generally compress information into 0-, 1-, or 2-dimensional space, whereas data visualization and analytics tools can make more use of information viewed as higher-dimensional clusters (via histograms, cross-tabs, "drill-down," or "roll-up"). Being able to aggregate data across different combinations of columns
An interesting problem posed by the data cube is its storage: how can we actually store the data needed by this multivariate representation? The paper proposes a method for choosing optimal *parts* of the cube to materialize, thus reducing the response time of data access. The ingenuity comes from materializing "dependent" data, i.e. forming a graph of query dependencies in which one query's results can be used to answer another query. There is a tradeoff between the time lost computing values in the hypercube as they are needed and the space consumed by materializing the data. Essentially, the CUBE operator generates a union of all the GROUP BYs on the listed attributes by using "ALL."
Some design decisions confuse me; specifically, why overloading NULL would be an efficient design decision, or whether it is highly application dependent. It seems like it would require special-case exception handling that is even more difficult to deal with. In addition, how would one determine a cost function for query execution? This seems like a problem for the query optimizer, so is it an extension of that part of the query engine, or is a separate mechanism built in entirely?
In general, SQL has five built-in aggregate functions that return a single value (MIN, MAX, AVG, SUM, COUNT). These single values can be turned into tables of aggregates using the built-in GROUP BY. However, GROUP BY has many limitations: it does not lend itself well to histograms, and various issues arise when trying to compute roll-ups, subtotals, or drill-downs. The data cube alleviates many of these issues while adding extra functionality.
The data cube is a novel operator that provides N-dimensional aggregates by treating each of the N aggregation attributes as a dimension of N-space. It is implemented as the CUBE operator, which is essentially the union of all possible GROUP BY operations that can be expressed on a given set of attributes; its grouping sets thus correspond to the power set of the columns in the cube operation. Another contribution of this paper is the SQL extensions added to support decorations. Decorations are attributes that are functionally dependent on the attributes being aggregated.
One flaw of this paper lies in the generation of the power set. For a large set of attributes this can be an extremely expensive operation (though the cost is not explicitly analyzed in the paper). It is also quite possible that not all dimensions of aggregation are needed, so it may be advantageous to compute them on demand.
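The dependency idea can be sketched in Python: a GROUP BY over some columns can be answered from any materialized group-by over a superset of those columns, so a query should read from the smallest such view. The view list, row counts, and the cheapest_source helper below are hypothetical, and the rule assumes a distributive aggregate such as SUM.

```python
# Hypothetical materialized group-bys and their row counts.
MATERIALIZED = {
    ("part", "supplier", "customer"): 6_000_000,  # the core is always kept
    ("part", "customer"): 5_000_000,
    ("supplier",): 10_000,
}

def cheapest_source(query_cols, materialized):
    """Pick the smallest materialized view whose grouping columns are a
    superset of the query's columns; a coarser grouping can then be
    re-aggregated from it (valid for distributive aggregates like SUM)."""
    wanted = set(query_cols)
    candidates = [(size, cols) for cols, size in materialized.items()
                  if wanted <= set(cols)]
    return min(candidates)[1] if candidates else None

print(cheapest_source(("customer",), MATERIALIZED))  # ('part', 'customer')
print(cheapest_source(("supplier",), MATERIALIZED))  # ('supplier',)
```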
This paper explains how to support data visualization in SQL/relational databases using the CUBE operator, which is developed from the SQL GROUP BY concept (along with ROLLUP). For analytical purposes, data is often represented as an N-dimensional cube, and the paper explains how SQL helps users achieve this view.
In general, the paper starts with basic SQL features for data visualization. It shows how SQL aggregate functions are used in data extraction and summarization, and briefly covers vendor-specific SQL extensions. The paper continues with the problems of the basic GROUP BY syntax. First, users need to build nested queries to form a group within a group. Second, even with the ALL value, users still have to list every placement of ALL (with many dimensions, one for each combination of dimensions) and write the result as a UNION query. The paper then introduces the CUBE and ROLLUP operators, which facilitate the N-dimensional data cube. It elaborates on the algebra of these operators and then considers how they could replace the ALL value (especially for recognizing aggregated columns). It continues with "decorations", fields that are used as labels but not as dimensions. The paper also briefly discusses star and snowflake schemas/queries to show how granularities are formed in the dimensions. The next section discusses how to incorporate programming into the data cube. The paper also elaborates on computing CUBE and ROLLUP from SQL aggregate functions. Finally, it discusses maintaining cubes and roll-ups.
One major contribution of this paper is showing how a relational engine can support efficient extraction of information from a SQL database. Previously, such queries were written with GROUP BY and other SQL aggregate functions; the CUBE operator improves on GROUP BY, and the added ROLLUP syntax enables roll-up/drill-down visualization. It unifies several common and popular concepts such as aggregates, GROUP BY, histograms, roll-up/drill-down, and cross-tabs.
However, this paper still does not address the need to compare records across time periods (time series). While the CUBE operator facilitates visualization, it is still not enough to support comparison queries. In addition, the examples used have at most 3 dimensions; what if I want to view the data in 4 dimensions? I also think the section on maintaining cubes and roll-ups could be elaborated, since in practice the data being visualized is changing, not static.
The purpose of this paper is to present a new N-dimensional aggregation operator. The data cube, or just cube, treats each attribute as a separate dimension, and the data it returns forms an N-dimensional cube. These N-dimensional data points can then be aggregated again to move them into lower dimensions. These cubes have been deployed in SQL. The structure allows users to define new aggregate functions over the cubes, and the paper discusses efficient computation of the cube itself.
The motivation behind this work is to allow a database system to support capabilities that might normally be handled by a data visualization engine, such as efficient extraction of information along interesting dimensions. The technical contributions of this paper come in its walk-through of the new system, as well as detailing how it is integrated with existing SQL features as it is included. The paper does provide several examples to assist in explanation of some of these tasks. It presents a novel operator to much more efficiently perform common data aggregations on the database side that might normally have to be ported to something like Microsoft Excel. The authors note that the queries they are optimizing for with the introduction of CUBE are very awkward to write in SQL, and as such they are simplifying these useful data visualization tasks for users. It is difficult for me to discuss the technical contributions of this paper in more detail because it was so hard to read and follow as detailed below.
One strength of this paper is that it continually goes back to the existing SQL feature GROUP BY and points out why the CUBE operator is more effective and more efficient than the existing SQL infrastructure.
I think a big weakness of this paper is that it’s scatterbrained. I had a very hard time following any logic through the paper. I feel like many, many different concepts related to data cubes and their computation were presented, but very very minimally discussed, and so I feel like I come away with little understanding of any of the complexities or nuances of the system. I think this is a very poorly written and poorly organized paper, and due to this fact I have a hard time critiquing any of the actual technical contributions since they are presented so minimally and opaquely.
To begin with, this is another paper with a lot of code and pseudo code in it. Thumb down.
This paper proposes an operator called cube that produces N-dimensional aggregates. This is in need because many data analysis applications need to look across many data dimensions for anomalies or unusual patterns. On top of that, many advances data analysis algorithms have been developed and are well suited for high-dimensional data analysis problems. This paper discusses the workflow of a relational engine on supporting efficient extraction of information from a SQL database. It discusses relevant features in Standard SQL and some extensions and later provides instructions on how Cube implement advanced operations.
This paper provides detailed instructions and explanations of how CUBE is implemented. The potential advantages of this operator over zero-dimensional and one-dimensional aggregation operators are also easy to see. Despite the large amount of pseudocode, the overall presentation is clear and reader-friendly. However, although there is no experimental section (which is acceptable for a theory paper), it would be more helpful to give concrete arguments about how much more efficient this operator is than other baselines, e.g., other implementations of an N-dimensional operator, or simply the zero-dimensional or one-dimensional operators.
This paper proposes the data cube, realized as CUBE and ROLLUP, two N-dimensional aggregation operators that efficiently generalize the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs.
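The difference between the two operators can be sketched the same way. As the paper describes it, ROLLUP groups by columns added in sequence, so it only produces the N+1 prefix groupings and skips cells such as (ALL, 1994, ALL) that CUBE would include. This is a hypothetical Python illustration with made-up rows, not code from the paper; `rollup` and `ALL` are illustrative names.

```python
from collections import defaultdict

ALL = "ALL"  # hypothetical placeholder for a rolled-up column (the paper uses an ALL value)

def rollup(rows, dims, measure):
    """SUM(measure) over the N+1 prefixes of dims (ROLLUP), not all 2**N subsets (CUBE)."""
    result = {}
    for k in range(len(dims), -1, -1):
        keep = set(dims[:k])  # columns still grouped at this level
        totals = defaultdict(int)
        for row in rows:
            key = tuple(row[d] if d in keep else ALL for d in dims)
            totals[key] += row[measure]
        result.update(totals)
    return result

# Made-up sample rows for illustration only.
sales = [
    {"model": "Chevy", "year": 1994, "color": "black", "units": 50},
    {"model": "Chevy", "year": 1994, "color": "white", "units": 40},
    {"model": "Chevy", "year": 1995, "color": "black", "units": 85},
    {"model": "Ford", "year": 1994, "color": "black", "units": 50},
]

rolled = rollup(sales, ["model", "year", "color"], "units")
print(rolled[("Chevy", 1994, ALL)])  # 90
print(rolled[("Chevy", ALL, ALL)])   # 175
print(rolled[(ALL, ALL, ALL)])       # 225
print((ALL, 1994, ALL) in rolled)    # False: a CUBE would include this cell
```

With three columns this yields 4 groupings instead of CUBE's 8, which is why ROLLUP returns the smaller result set.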
The paper discusses the limitations of the GROUP BY operator in several respects, especially for generalizing histograms. It then presents the proposed CUBE and ROLLUP operators and the methodology for overcoming these issues with GROUP BY. After that, it describes the implementation of CUBE and ROLLUP in detail, especially the data structure used by CUBE: the DBMS maintains a built-in table of aggregate values so that users can query this information at little cost. Moreover, the paper discusses how to use CUBE and ROLLUP in SQL syntax and the problem of maintaining the data structure.
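The paper's distinction between distributive and algebraic aggregates is what keeps such a table cheap: coarser cells can be built from a smaller core table instead of rescanning the base rows. A rough sketch of that idea, assuming AVG is carried as a (sum, count) handle: one pass over the rows builds the core, and every super-aggregate is merged from core handles. This is a simplified illustration, not the paper's algorithm; the function names and rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # hypothetical placeholder for a rolled-up column

def core_handles(rows, dims, measure):
    """Single pass over the base rows: a (sum, count) handle for each core cell."""
    core = defaultdict(lambda: [0, 0])
    for row in rows:
        handle = core[tuple(row[d] for d in dims)]
        handle[0] += row[measure]
        handle[1] += 1
    return dict(core)

def cube_avg_from_core(core, n_dims):
    """Merge core handles into every super-aggregate cell, then finalize AVG = sum / count."""
    merged = defaultdict(lambda: [0, 0])
    for key, (s, c) in core.items():
        for k in range(n_dims + 1):
            for keep in combinations(range(n_dims), k):
                cell = tuple(key[i] if i in keep else ALL for i in range(n_dims))
                merged[cell][0] += s
                merged[cell][1] += c
    return {cell: s / c for cell, (s, c) in merged.items()}

sales = [  # made-up rows; two of them share the core cell (Chevy, 1994)
    {"model": "Chevy", "year": 1994, "units": 50},
    {"model": "Chevy", "year": 1994, "units": 70},
    {"model": "Chevy", "year": 1995, "units": 90},
    {"model": "Ford", "year": 1994, "units": 30},
]

core = core_handles(sales, ["model", "year"], "units")
avg = cube_avg_from_core(core, 2)
print(len(core))            # 3 core cells from 4 base rows
print(avg[("Chevy", ALL)])  # 70.0, not the 75.0 obtained by averaging the core averages 60 and 90
print(avg[(ALL, ALL)])      # 60.0
```

Keeping the handle rather than the finished average is the design point: merging plain averages would give the wrong answer, while merging (sum, count) pairs is exact.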
1. This paper proposes novel operators that handle aggregate computation expressively, and it introduces efficient data structures and algorithms underlying the operators. Since aggregate operations are so common in both basic and advanced data analysis, I believe this improvement has great potential for efficiency and convenience.
2. The authors added this feature to Microsoft SQL Server, a widely deployed SQL engine, which also demonstrates the usefulness of these two novel operators.
1. Although this paper gives a full introduction to the algorithms and benefits of the CUBE and ROLLUP operators, it would be great if it provided some quantitative experiments comparing the new aggregate operators with the old GROUP BY operator, which I believe would show the benefit more convincingly.
2. One concern I have is the overhead of maintaining the data structure that keeps track of the aggregates. It would be great if the paper gave some profiling experiments on the overhead of maintaining CUBE's underlying data structure.
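The maintenance concern can at least be sized with a back-of-the-envelope sketch. Assuming a fully materialized SUM cube, each inserted row must update one cell per grouping, i.e. 2**N cells for N columns. SUM and COUNT absorb inserts (and deletes) by simple addition and subtraction, whereas MAX handles inserts easily but deleting the current maximum can force a rescan. The code below is hypothetical and only counts cell updates; it is not a profile of any real system.

```python
from itertools import combinations

ALL = "ALL"  # hypothetical placeholder for a rolled-up column

def insert_row(cube, row, dims, measure):
    """Fold one new row into a materialized SUM cube; return how many cells were updated."""
    touched = 0
    for k in range(len(dims) + 1):
        for keep in combinations(dims, k):
            cell = tuple(row[d] if d in keep else ALL for d in dims)
            cube[cell] = cube.get(cell, 0) + row[measure]
            touched += 1
    return touched

dims = ["model", "year", "color"]
cube = {}
first = insert_row(cube, {"model": "Chevy", "year": 1994, "color": "black", "units": 50}, dims, "units")
second = insert_row(cube, {"model": "Ford", "year": 1994, "color": "black", "units": 40}, dims, "units")
print(first, second)               # 8 8, i.e. 2**3 cell updates per insert
print(cube[(ALL, 1994, "black")])  # 90
print(len(cube))                   # 12: the four cells with model = ALL are shared
```

The per-row cost grows exponentially with the number of cube columns, which is exactly the kind of overhead a profiling experiment would need to quantify.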