Review for Paper: 11-Optimization of Sequence Queries in Database Systems

Review 1

Sequence queries are important in domains such as finance and medicine, but they may not be handled efficiently by conventional databases. A financial trader may wish to look for a time series pattern, such as the “double bottom” pattern, which has two local minima around a local maximum; but in standard SQL, it is difficult to formulate a query for instances of this shape, and such queries are slow to execute. It would be useful for those who work with time series databases to have a specialized DBMS that makes their common tasks efficient.

In “Optimization of Sequence Queries in Database Systems,” the authors present SQL-TS (time series), an extension of SQL that uses an efficient algorithm to mine time series for occurrences of certain sequence patterns. SQL-TS provides a CLUSTER BY primitive that specifies which data to group together as one time series, a SEQUENCE BY primitive that specifies which variable to use for ordering the time series, and a star operator that specifies one or more consecutive occurrences of a value meeting certain conditions. The star operator allows the programmer to create a pattern involving at least one, but possibly many, descending values in sequence, for example. The authors extend the Knuth-Morris-Pratt string matching algorithm, allowing it to search time series for sequences matching a compound boolean predicate.
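The meaning of the star operator can be illustrated with a minimal Python sketch. This is not the paper's algorithm, only a naive matcher under stated assumptions: the names `match_at`, `falling` and `rising` are hypothetical, each pattern element is a predicate over a pair of consecutive values, a starred element greedily consumes one or more consecutive values without backtracking, and the price list is made up for illustration.

```python
def falling(prev, cur):
    """Hypothetical predicate: the value went down."""
    return cur < prev


def rising(prev, cur):
    """Hypothetical predicate: the value went up."""
    return cur > prev


def match_at(seq, start, pattern):
    """Naively match `pattern` against `seq` beginning at index `start`.

    `pattern` is a list of (predicate, starred) pairs. Each predicate
    compares a value with its predecessor, so `start` must be >= 1.
    Returns the index just past the match, or None if nothing matches.
    """
    i = start
    for pred, starred in pattern:
        # Every element, starred or not, must match at least one value.
        if i < 1 or i >= len(seq) or not pred(seq[i - 1], seq[i]):
            return None
        i += 1
        # A starred element greedily absorbs further consecutive matches.
        while starred and i < len(seq) and pred(seq[i - 1], seq[i]):
            i += 1
    return i


# Made-up prices; a "double bottom" read as falling*, rising*, falling*, rising*.
double_bottom = [(falling, True), (rising, True), (falling, True), (rising, True)]
prices = [10, 8, 6, 7, 9, 8, 5, 6, 8]
print(match_at(prices, 1, double_bottom))
print(match_at(prices, 3, double_bottom))
```

With these prices the first call returns 9 (the whole series forms one double bottom) and the second returns None. Greedy matching is only safe here because consecutive predicates (falling, rising) are mutually exclusive; a general matcher would need backtracking or the kind of optimization the paper proposes.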

The work of this paper is clever in the way it extends KMP to cover logically joined sets of equality and inequality predicates, while KMP originally could match strings using only a simple equality operator. The authors do this by using three-valued logic, as well as prior work by Guo, Sun, and Weiss on discovering logical relationships among sets of boolean predicates. The resulting algorithm appears to be in some sense optimal for the search problem within a time series.

Much of the work in this paper, however, was already done by the prior authors who produced the KMP string matching algorithm, and the GSW algorithm that is used to fill in the \phi and \theta matrices. Moreover, many other aspects of the database should probably be optimized for time series usage. For example, indexes over intraday stock price data should likely include references to the start of each trading day, which might not happen automatically in a general-purpose DBMS.


Review 2

The paper addresses the need for an expressive query language for finding complex and recurring patterns in database sequences. The existing tools for this purpose, such as ADT, SEQ and SRQL, are not adequate for this scenario, so a new approach is needed to solve the problem efficiently. For this purpose, the authors introduce SQL-TS, a new query language which is a minimal extension of SQL and has a new query optimization technique based on string-search algorithms.

The paper moves on to highlight the search optimization technique by showing how the KMP string-search algorithm is implemented. It also discusses an extension of the algorithm, termed Optimized Pattern Search (OPS), which is directly applicable to the optimization of SQL-TS queries. A comparison of a naïve algorithm against the optimized algorithm exhibits the difference between the two. An important advantage of the OPS algorithm is that it can be easily generalized to handle recurrent input patterns which, in SQL-TS, are expressed using the star pattern. This star pattern can also be optimized with the help of transformations which are explained using graphs in the paper.

The authors are successful in exhibiting the usefulness of SQL-TS for querying complex sequential patterns. The optimization techniques seem effective at speeding up the process. At the end, the further work they describe adds support for disjunctive queries, partially ordered domains and aggregates to the mix. They show promise in improving the OPS algorithm and finding a better alternative to the search algorithm, even though its search capabilities are already better than those of the other algorithms.

The paper does not back up the claimed speedups with sufficient tests, and hence does not support the claim convincingly with relevant application simulations. The technique needs to be evaluated on more test cases to make its case.



Review 3

The need to search for complex and recurring patterns in database sequences is shared by many applications. Examples include the analysis of stock market prices, meteorological events, and the identification of patterns of purchases by customers over time. This paper investigates the design and optimization of a query language capable of expressing and supporting efficiently the search for complex sequential patterns in database systems.

Thus, the authors first introduce SQL-TS, an extension of SQL to express these patterns. They explore optimization techniques inspired by string-search algorithms (the KMP algorithm), since finding sequential patterns in databases is somewhat similar to finding phrases in text. The KMP algorithm creates a prefix function from the pattern to define transition functions that expedite the search.
However, rather than searching for strings of letters (usually from a finite alphabet), SQL-TS has to search for sequences of structured tuples qualified by arbitrary expressions of propositional predicates involving arithmetic, repeating patterns, and aggregates (window-based, temporal, and user-defined aggregates).

The OPS algorithm exploits the interdependencies between the elements of a pattern to minimize repeated passes over the same data. Much as in the KMP algorithm, the algorithm can capture the logical relationships between the elements of the pattern, and then infer which shifts in the pattern can possibly succeed; also, for a given shift, it can decide which conditions need not be checked (since their validity can be inferred from the two kinds of information described above). The OPS algorithm begins by capturing all the logical relations among pairs of the pattern elements using a positive precondition logic matrix and a negative precondition logic matrix. With these matrices it derives another triangular matrix to describe the logical relationships between whole patterns. To support the star operation, where the length of the whole pattern is indeterminate (or at least not bounded), the paper proposes an implication graph to compute the shift and next functions.

Experimental results on typical sequence queries, such as double bottom queries, confirm that substantial speedups are achieved by the new optimization techniques.


Review 4

This paper presents a new design for optimization of sequence queries in database systems. It first presents a new query language called Simple Query Language for Time Series (SQL-TS), which can express the recurring (often complex) patterns in database sequences. Second, it presents how to optimize search queries for SQL-TS, for which the authors implemented a new algorithm called the Optimized Pattern Search (OPS) algorithm on top of the KMP algorithm. Then, it describes how to deal with the “star” operator, a key feature of SQL-TS to express recurring patterns. Finally, the authors provide performance evaluations and recommendations for future work.

The general problem here is that in many applications there is a strong need to search for complex and recurring patterns in database sequences, and a basic method such as SQL is not very efficient. For example, we need to find some special patterns in sequential data for stock market price analysis, such as a stock that went up by a certain amount for 2 days and then went down by another certain amount. Traditional methods like SQL have to handle this kind of request with multiple joins, which are complex, unintuitive and hard to optimize. Thus a more powerful tool is needed.

The major contribution of this paper is, first, that it develops a very good query language called SQL-TS. SQL-TS was developed on top of SQL, so all existing well-known query optimization techniques remain available. There are several key features in SQL-TS:

1. CLUSTER BY clause (specifies that data is processed separately in separate streams)
2. SEQUENCE BY clause (specifies that data is traversed in ascending sequence, such as by date)
3. Star Operator * (denotes a sequence of one or more tuples that satisfy all conditions described in the WHERE clause)

Second, it provides a detailed description of the OPS algorithm, which is built on top of the KMP algorithm. It can handle more general conditions than KMP (equalities only). The authors provide a detailed implementation of their algorithm and a proof of its correctness, as well as an optimality and complexity analysis.

One interesting observation: I like the idea of making use of existing well-known techniques such as the KMP algorithm and SQL. The authors adapted these techniques to build a better tool for efficiently dealing with sequence queries in database systems.


Review 5

The paper tries to address the problem of efficiently searching for and identifying complex and recurring patterns in database sequences. This problem is important as many applications such as stock analysis require such pattern searches, but the previously proposed searching methods lacked flexibility and easy integration.

The main approach proposed by the paper is to introduce the Simple Query Language for Time Series (SQL-TS). SQL-TS is a superset of SQL which, in addition to regular SQL functionality, supports additional SQL-like clauses specifically targeting pattern searching in database sequences. It also supports a recurring-pattern operator (the * operator). The paper extends some existing algorithms to support these SQL extensions.

Strengths:
- It illustrated its algorithms with simple but explanatory examples, accompanied by graph illustrations. The examples and visualizations explained the intuition of the algorithms well.
- The paper also concisely categorized different searching cases/branches, making it easier to reference different cases.
The technical contribution of this paper is that it proposed the SQL-based language SQL-TS which can more efficiently identify recurring patterns in database sequences such as stock prices, without asking users to learn a totally new language. The mapping of search patterns to matrix-like graph models also makes it easier for further algorithm optimization.

Drawbacks:
- While the paper proposed the new pattern searching language and claimed it to be more efficient, it did not point to the exact improvements it contributed (e.g. search accuracy, run-time improvement, easier integration, etc). It might be more helpful to give a few more descriptions of the exact previous drawbacks it addressed.
- The experimental results are mixed in with the description of the algorithm. They could be put into a separate section.


Review 6

This paper covers the problem of dealing with time-series data within a relational database management system. The authors propose adding new features to the SQL language to allow complex queries that look for sequential patterns, which are not possible in SQL and would be difficult and complex with SRQL. Their new language, SQL-TS, a superset of SQL, adds the CLUSTER BY and SEQUENCE BY clauses, and repurposes the AS keyword within a FROM clause. These keywords allow queries that would involve a large number of joins in traditional SQL, and would be very difficult to optimize. For example, how would you write a query to find stocks that went up by 15% or more one day, but then went down by 20% or more the next day? This would not be a simple query.
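To make the intended semantics of that example concrete, here is a minimal Python sketch of what such a query computes. It is only an illustration under stated assumptions, not SQL-TS itself and not the paper's implementation: the function name `up_then_down`, the `(name, date, price)` row layout, and the sample quotes are all hypothetical.

```python
from collections import defaultdict


def up_then_down(quotes, up=0.15, down=0.20):
    """Find days on which a stock rose by at least `up` and then fell
    by at least `down` on the following day.

    `quotes` is an iterable of (name, date, price) rows (assumed layout).
    Returns a list of (name, date_of_the_rise).
    """
    by_name = defaultdict(list)
    for name, date, price in quotes:
        by_name[name].append((date, price))  # plays the role of CLUSTER BY name

    hits = []
    for name, rows in by_name.items():
        rows.sort()  # plays the role of SEQUENCE BY date
        # Slide a window over three consecutive days of the same stock.
        for (_, p0), (d1, p1), (_, p2) in zip(rows, rows[1:], rows[2:]):
            if p1 >= p0 * (1 + up) and p2 <= p1 * (1 - down):
                hits.append((name, d1))
    return hits


# Made-up data: ACME rises 20% on day 2 and falls 25% on day 3.
quotes = [
    ("ACME", 1, 100.0), ("ACME", 2, 120.0), ("ACME", 3, 90.0),
    ("XYZ", 1, 50.0), ("XYZ", 2, 51.0), ("XYZ", 3, 52.0),
]
print(up_then_down(quotes))
```

The single pass per stock is what a sequence-aware engine can do directly, whereas plain SQL would have to reconstruct the three-day window by joining the table with itself.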

However, we can clearly see from their example queries that these queries must be very computationally expensive. Matching complicated patterns, such as the rising and falling of stock prices, would lead to an exponentially exploding search space. However, this is really only a problem if naive query plans are used - if joins between the table and itself are used to find patterns. The authors propose generalizing the text searching algorithm by Knuth, Morris and Pratt (KMP) and using it for time series queries, to recognize patterns. The authors spend quite a bit of time discussing their algorithm, and going through the mathematics of it. They show how next() and shift() can be used to deal with the star case of time series queries. However, they don't spend very much time at all talking about their implementation or experimental results. I would have liked to have seen considerably more in these areas - without knowing more about the implementation, I don't really see how their results are relevant. They seem to have only tested this on one data set, stock market data, and only looking for one thing ("relaxed double bottoms"). This hardly proves to me that their system is, as they claim, 800 times faster than the naive approach.


Review 7

The paper is motivated by the fact that the processing of sequential data is a big part of database queries in analyzing financial issues concerning pricing and purchases, while the current SQL language does not provide an elegant syntax or a high-performance algorithm for sequence queries. So the paper puts forward a new algorithm for sequence data analysis called the Optimized Pattern Search algorithm.

The authors first identify the desired SQL extension by showing the SQL interface. It is obviously more concise and specific for sequence queries. Then the paper shows the KMP algorithm briefly. This part introduces the reader to the idea of searching a text sequence for a pattern. Finally the paper points out the weakness of KMP and proposes OPS to handle situations with inequalities or variable sequence lengths.

In the OPS algorithm, every condition in the query is treated as a unit in the search pattern, so that all conditions have a uniform format and can be used in complex relational expressions. In this way the algorithm handles the inequality issue. It also handles the flexible sequence length issue by modifying the next and shift functions.

The strength of the paper is that it explains the KMP algorithm in detail and provides a series of consistent examples for both KMP and OPS algorithms. It makes the reader better understand the process and the improvement of OPS.

Though the paper brings out a creative algorithm, it still has several limitations:
1. The paper does not discuss other algorithms related to the sequential problem except the KMP algorithm. It would be better to provide a comparison among several different algorithms and give a detailed reason for choosing the KMP algorithm as the starting point of the OPS algorithm.
2. The number of experiments designed to evaluate the performance of the OPS algorithm is too limited. There could be more tests comparing it to other algorithms, and measuring its performance under various conditions.

Overall, the paper provides a creative algorithm to optimize the performance of sequential queries. The algorithm is based on the KMP algorithm and improves on it greatly, so that it can handle flexible sequence lengths and inequalities in the queries. The optimization is very useful in the financial and stock area.


Review 8

This paper introduces an efficient search algorithm and organization for sequence queries. The paper introduces SQL-TS (Simple Query Language for Time Series), an extension of SQL that specifies sequential patterns to help optimize sequential pattern searches. Optimizing sequential queries and pattern searching is important, as they are widely used in data analysis, such as the analysis of stock market prices.

SQL-TS uses a “cluster by” clause to specify that data is bundled/separated by the given variable, a “sequence by” clause to specify that data must be traversed in ascending order of the given variable, and an “AS” clause that assigns aliases to specify a sequence of tuple variables. SQL-TS also comes with a star operator that specifies a sequence of one or more tuples.
The search optimization is conducted by enhancing the KMP algorithm to suit SQL-TS by allowing inequalities, supporting recurring patterns, and generalizing to database objects. In the proposed Optimized Pattern Search (OPS) algorithm, the search involves a shift(j) operation that determines how far the pattern should be advanced in the input, and next(j), which determines at which element in the pattern the checking should be resumed. In typical KMP, only next(j) is used because it is only applied to the equality operator, but shift(j) can be used in OPS, as it supports inequality searches. To support the star operator, a counter keeps track of the cumulative number of input tuples that matched the pattern up to the element.

The enhanced algorithm optimizes sequential queries by extending the KMP algorithm to apply to inequality expressions and to support recurring patterns and conditions. The algorithm appears to perform well according to the paper, but the experimental results and analysis are quite weak, as there is very little quantitative data on the performance. Furthermore, the paper does not state any potential drawbacks of the algorithm, or what the memory or implementation overhead of SQL-TS is.



Review 9

This paper introduces an extended SQL language called SQL-TS that can express a pattern search within the query language, and also develops an algorithm similar to KMP to optimize the query.

The SQL-TS language is very expressive for pattern search. It adds CLUSTER, SEQUENCE and AS clauses, and also a star operator to express patterns of arbitrary length. SQL-TS syntax is simple and broadens the range of patterns that can be searched.

The corresponding optimization algorithm is called the Optimized Pattern Search (OPS) algorithm. It is a modified version of the Knuth, Morris and Pratt (KMP) algorithm. The key idea of the OPS algorithm is similar: it tries to avoid backtracking by using the information from the previous matches. It introduces the concepts of shift and next. Shift skips the input that will definitely cause a mismatch, and next tells where the next comparison resumes in the given pattern. Shift and next are calculated solely from the given pattern. In addition, the OPS algorithm addresses the problem of patterns with the star operator. It develops an Implication Graph for certain pattern elements to find the shift and next.

The overhead of the algorithm is the calculation of next and shift for the given pattern. After pre-calculation of \theta and \phi, the time complexity of the OPS algorithm to find shift and next is O(m^3), and a certain implementation of the pre-calculation mentioned in the paper is O(|S|*n^2 + |T|) + O(|S|+n^3).

SQL-TS has great expressive power and broadens the range of complex pattern searches. The corresponding algorithm shows better performance than brute-force search. However, there may be some drawbacks:
1. This paper does not discuss the time complexity of the OPS algorithm when searching for a pattern in the input. It seems that the worst-case time complexity is similar to that of the brute-force algorithm.
2. The paper analyzes neither the performance on a real database when searching a large data set, nor the real-time performance as a function of query complexity.


Review 10

Problem:
Many applications need to search through sequences of data to find intervals that fit a certain pattern. For example, an application might want to find the intervals where a stock’s price grew, then fell, then grew again. This functionality is not efficiently expressed in SQL, so this paper proposes an extension to the SQL language, and an implementation for the extension. The extension allows multiple sequential items to be considered at a time (organized into a tuple), and for predicates to place conditions on single or multiple data elements in the tuple. A regex-like star operator * can also be applied to a data element, in which case it will match multiple sequential data elements that meet its predicates.

The paper draws ideas from the Knuth, Morris, and Pratt algorithm for finding a pattern of characters in a sequence. The idea is that for a string like s = “abcabc” that we are trying to match in a sequence, if s[0] to s[4] all match at a certain position in the sequence, then we know that s[0] to s[1] will match the sequence if we shift the string to the right by three characters (because s[0-2] = s[3-5]). The paper expands on this idea by trying to match a series of predicates to a sequence of data items, for example “p1p2p3p4p5”. Then if p3 implies p1 and p4 implies p2, then we know that if p[1 – 4] is true for a sequence then p[1-2] will be true if we shift this predicate sequence to the right by 2. This saves time when trying to find a sequence of data elements that matches the predicates.
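Both examples in the paragraph above can be checked with a small Python sketch. It is a simplified illustration, not the paper's OPS algorithm: the function names `overlap_matches` and `overlap_implied` are hypothetical, and the implications between predicates are supplied by hand as pairs rather than derived from the paper's logic matrices.

```python
def overlap_matches(pattern, matched, shift):
    """Equality case (KMP-style reasoning, 0-based indices).

    After the first `matched` symbols of `pattern` matched the input,
    is the overlap still consistent if the pattern slides right by `shift`?
    """
    return pattern[shift:matched] == pattern[:matched - shift]


def overlap_implied(implies, matched, shift):
    """Predicate case (1-based indices, as in p1..p5).

    `implies` is a set of (a, b) pairs meaning "p_a implies p_b".
    After p_1..p_matched held on the input, sliding by `shift` aligns
    p_k with the input element that satisfied p_(k+shift), so the overlap
    is known to hold if p_(k+shift) implies p_k for every overlapping k.
    """
    return all((k + shift, k) in implies
               for k in range(1, matched - shift + 1))


# s = "abcabc": s[0..4] matched, so a shift of 3 keeps s[0..1] matched.
print(overlap_matches("abcabc", 5, 3))   # True
print(overlap_matches("abcabc", 5, 1))   # False

# p3 implies p1 and p4 implies p2: after p1..p4 held, a shift of 2 is safe.
implications = {(3, 1), (4, 2)}
print(overlap_implied(implications, 4, 2))  # True
print(overlap_implied(implications, 4, 1))  # False
```

The only difference between the two functions is that string equality is replaced by logical implication, which is the generalization the paragraph describes.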

Strengths:
This paper provides a very rigorous description of why this algorithm saves time. It also helps the reader by first describing the KMP algorithm. It also provides a novel approach to the problem of sequence queries.

Weaknesses:
The authors expect a lot of the reader’s mathematical intuition and ability to dig through rigorous mathematical notation. This resulted in a lot of frustration while reading the paper.



Review 11

The paper introduces a query language and techniques specifically designed for expressing and processing sequential patterns in data. The authors propose Simple Query Language for Time Series (SQL-TS), which is an extension of SQL for specifying complex sequential patterns. In order to solve the sequential pattern search problem, the paper also proposes an algorithm, which is called the Optimized Pattern Search (OPS) algorithm, that is again an extension of a text pattern-matching algorithm, KMP algorithm.

SQL-TS is basically identical to SQL, but has three additional clauses for the FROM clause. The CLUSTER/SEQUENCE BY clauses specify how tuples are grouped and traversed. The AS clause specifies a sequence of tuple variables. Within the AS clause, a star operator is used to express recurring patterns, where a star denotes a sequence of one or more tuples that satisfy all conditions in the WHERE clause. These extensions to SQL sound reasonable and well presented.

In contrast to SQL-TS, the OPS algorithm is very complex and not easy to grasp and understand from the paper. It is an extension of the KMP algorithm, which is a text pattern-matching algorithm, finding all occurrences of an input sequence in a matching sequence. To perform such operation, it keeps an array of ‘next’. It stores values of which element in the input sequence the algorithm has to resume comparing to the matching sequence when a mismatch is found. The OPS algorithm adds a lot of complexity to the original algorithm. For example, it has an additional ‘shift’ array, which is how many steps the input sequence needs to be shifted to the right when a mismatch is found.
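For reference, the role of the ‘next’ array in plain KMP can be sketched in a few lines of Python. This is the standard textbook formulation with 0-based indices, not code from the paper; the names `build_next` and `kmp_search` are chosen here for illustration, and the sketch covers only equality matching, not the OPS extensions.

```python
def build_next(pattern):
    """nxt[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it. After a mismatch following i+1 matched
    characters, comparison resumes at pattern[nxt[i]]."""
    nxt = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = nxt[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        nxt[i] = k
    return nxt


def kmp_search(text, pattern):
    """Return the start index of every occurrence of `pattern` in `text`.
    The text index never moves backwards; only the pattern position does."""
    if not pattern:
        return []
    nxt = build_next(pattern)
    hits = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = nxt[k - 1]          # resume at an earlier pattern element
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)  # full match ending at position i
            k = nxt[k - 1]          # keep going to find overlapping matches
    return hits


print(build_next("abcabc"))               # [0, 0, 0, 1, 2, 3]
print(kmp_search("abcabcabc", "abcabc"))  # [0, 3]
```

In this equality-only setting the amount by which the pattern slides is fully determined by the resume position, which is why a separate ‘shift’ array is not needed.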

In conclusion, the paper presents a novel approach to efficiently solve the problem of finding sequential patterns in data. However, the description of the algorithm was not easy to follow, especially where the paper describes how it calculates and maintains extra matrices for the algorithm. Also, the evaluation of the algorithm has much room for improvement. The paper simply shows a couple of examples from the experiment and states that it executed faster than a naïve approach. The evaluation could have been more rigorous, which in turn would have justified the algorithm much better than the current evaluation.



Review 12

The current SQL standard is ill suited to searching databases for complex and recurring patterns in sequential data such as stock prices. This paper introduces an extension to the SQL language called SQL-TS. SQL-TS adds some new keywords to SQL that allow a query to express sequential pattern searches in a simple manner. Since the SQL language is a subset of SQL-TS, the optimizations in SQL also apply to SQL-TS. However, the authors propose an additional optimization for SQL-TS, the Optimized Pattern Search (OPS) algorithm, to support the complex sequence queries expressed by the language. The OPS algorithm is a more generalized form of a different pattern searching algorithm called the KMP algorithm. The OPS algorithm supports predicates that involve a system of equalities and inequalities, supports repeating pattern expressions, and can work on general database objects.

The OPS algorithm begins by creating two matrices based on the predicates in the query. These two matrices define a new matrix S which describes the logical relationship between entire patterns. With these three matrices, two arrays called shift and next are computed and used in the main OPS algorithm. The algorithm uses the calculated values to search a sequence of values and determine sets of values that match the pattern specified by the query predicate. The algorithm is also designed to work with the * operator specified in the SQL-TS language. According to the experimental results, the SQL-TS language with the OPS algorithm optimization performs many times better than naive searches on sequential data.

This paper introduced many technical details that would have confused an educated reader, causing them to skim some of the important information. However, the authors included many easy-to-understand examples. When discussing the new language, they included typical queries a sequential database might expect and showed how their new language easily expressed each query compared to SQL. Furthermore, when discussing the main contribution of the paper, the OPS algorithm, the authors included step-by-step figures to demonstrate the algorithm. I think these numerous examples and explanatory figures really helped in demonstrating the power of SQL-TS and the OPS algorithm.


Review 13

This paper introduces a new query language, SQL-TS, and some algorithms for dealing with sequential data. This is a very important research area to explore because many real-life applications involve the problem of finding matching sequential patterns in a large stream of data such as consumer purchases, stock values, weather records, etc. It would be very helpful if some features could be added to a conventional query language to integrate such algorithms with the simple usage of declarative scripts.

An extension of SQL called Simple Query Language for Time Series (SQL-TS) is introduced. It is a superset of SQL which preserves most of the features of SQL but also has some new, powerful features that are suitable for searching and manipulating sequential data. The common usage of SQL-TS is to first group (CLUSTER BY) the tuples by an attribute and then traverse (SEQUENCE BY) the tuples by a targeted attribute. This feature avoids the expensive joins in SQL and also provides a way to go through the stream with a window size using the additional star (*) feature.

The authors introduce the KMP algorithm for searching for a pattern string within an input text string. The KMP algorithm is then further generalized and extended to the Optimized Pattern Search (OPS) algorithm. The OPS algorithm is capable of searching for matching tuple sequences of arbitrary length within a stream of tuples, and also allows more flexible forms of predicates; for example, inequalities or range-based predicates are possible. The main idea is to generalize the matching criterion into a logic statement and to utilize previous information to make the search more efficient.

The strength of this work is that the authors create features on top of SQL. This preserves both most of the syntax of the language and the query optimization techniques. The example queries in this paper are very clear and demonstrate the new features for sequential data very well.

As for the weaknesses, if there are any, I would say that the experiments could be more thorough by comparing the efficiency of using SQL-TS and SQL. Also, the explanation of the algorithms in this paper is sometimes confusing to me. For example, the paper mentions that next(j) should reset to "0" in some circumstances, while the index seems to start with "1" in the other parts of the paper.


Review 14

This paper addresses the optimization of complex and recurring patterns in SQL queries. The motivation for this functionality stems from applications that require analyzing sequential data, such as analysis of stock market prices, meteorological events, and customer purchasing patterns. For example, finding stocks that went up by 15% or more one day, and then dropped by 20% or more the next day would take 3 joins and would be hard to optimize. For this example, the paper's proposed SQL-TS language can accomplish the task in one simple query. The SQL-TS language minimizes repeated passes over data by utilizing interdependencies between the elements of a sequential pattern.

SQL-TS specifies complex sequential patterns by adding to SQL. The additions to the FROM clause include CLUSTER BY, which groups data from each source to be processed separately, SEQUENCE BY, which sorts the data in each group, AS, which specifies a sequence of tuple variables, and the * (star operation), which captures repeating patterns. SQL-TS also generalizes the KMP optimal text search algorithm to the Optimized Pattern Search algorithm (OPS), which can handle general conditions in time series applications. While KMP can only handle the equality operation, OPS can handle inequalities. While KMP only needs to keep track of the next index, OPS needs to keep track of the next and shift indices. This is because OPS shifts the search pattern in order to be able to handle a richer set of functionality--handling repeated patterns and inequalities.

Overall, I found the paper very intriguing, and enjoyed learning about a tool that provides users with an easy to write, quick, and optimized way of querying for complex and recurring patterns. I liked the SQL-TS query examples that showed how SQL-TS was much more effective than SQL. The limitations of the paper were that it was unclear in section 4.2 how the theta and phi matrices were calculated from Example 4, which was about IBM stock prices. What do p1, p2, p3, and p4 in section 4.2 represent in terms of the X, Y, Z, T, U sequential objects in Example 4? It was also unclear what the => symbol means in the theta and phi definitions in Section 4.2.


Review 15

Reza et al. discuss optimization of sequence queries in this paper. They are very theory oriented which is what led them to their novel contribution in the first place but it takes focus away from some important issues I will mention in my review.

The paper discusses previous work and SQL extensions to handle sequence queries and their shortcomings. The authors state that pattern searching and advanced techniques for optimizing queries using their algorithm are more powerful than these approaches. They then go on to discuss the text searching algorithm by KMP and how it can be adapted to the examples in the introduction. They give an exhaustively thorough explanation of their optimized pattern search (OPS) algorithm and its complexity. They then go on to discuss experimental results and future work.

This all sounds great. They are clearly adapting and building upon previous work. They have a couple great examples in the beginning. We get a complexity analysis of the algorithms used, which we don't always see in papers, and we see experimental results that use the Dow Jones Industrial Average for the past 25 years as a time-series data set. This paper is easy to understand in the beginning and at the end of the paper and all of these things are strengths.

However, it is not so clear in the middle. I struggled a bit with the proof provided. This however has to do with the theoretical rigor of the paper rather than experimental flaws. More importantly, one should note that they measure performance for ONE time-series using ONE pattern. It is not clear how useful this actually is for other types of query patterns or other data sets. Expanding in one of these two directions would have made section 7 much stronger. Preferably, different patterns should have been used. Furthermore, the authors state that it improved performance by "more than two orders of magnitude". This doesn't suffice as proof of the practical superiority of their method. They could have provided speedups for different patterns, or different points in time.


Review 16

Part 1: Overview

This paper discusses methods for optimizing queries in the SQL-TS language, as a way to explore the optimization of sequence queries in database systems in general. Sequential queries often share some inner dependencies. To exploit the similarities between sequential queries, the authors designed SQL-TS (Simple Query Language for Time Series) as an extension of regular SQL. They add a Cluster By clause, a Sequence By clause, and a new As clause to record the stream information of queries. Using these clauses, the user can specify sequential characteristics during query processing. The Knuth, Morris, and Pratt (KMP) text searching algorithm is introduced as a basic optimization method and is extended in this work to deal with streaming queries.

In the KMP algorithm for text searching, backtracking is avoided by jumping over start positions that cannot lead to a match. The authors propose a new algorithm, the Optimized Pattern Search (OPS) algorithm. A new shift computation is developed to shift the pattern along the streaming data. Computing the required matrices adds to the computational complexity.
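
As a rough illustration of how KMP avoids backtracking in the input, here is a minimal Python sketch of the textbook algorithm. It is not taken from the paper: the names build_next and kmp_search are made up for this example, and the table uses the common failure-function indexing, which may differ from the next(j) convention used by the authors.

```python
def build_next(pattern):
    """Failure table (hypothetical helper name).

    nxt[j] is the length of the longest proper prefix of pattern[:j + 1]
    that is also a suffix of it.
    """
    nxt = [0] * len(pattern)
    k = 0  # length of the prefix matched so far
    for j in range(1, len(pattern)):
        while k > 0 and pattern[j] != pattern[k]:
            k = nxt[k - 1]  # fall back to a shorter prefix
        if pattern[j] == pattern[k]:
            k += 1
        nxt[j] = k
    return nxt


def kmp_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []
    nxt = build_next(pattern)
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        # On a mismatch, reuse what is already known about the matched
        # prefix. The input index i never moves backwards.
        while k > 0 and ch != pattern[k]:
            k = nxt[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = nxt[k - 1]  # keep going to find overlapping matches
    return matches
```

The point to notice is that the loop over the text only ever moves forward; all of the "jumping" happens in the pattern index k.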

The main approach of this paper is first to design a suitable language for exploring the optimization of sequential queries, namely SQL-TS. The authors then generalize text search algorithms to the more complicated case of sequence queries.

Part 2: Contributions

Optimizing sequential queries is a hot topic in the database industry. This paper tries to capture the characteristics of sequential queries by building an extension of SQL for time series. The three clauses (Cluster By, Sequence By, and As) capture the inner connections between sequential queries by letting the user specify some of the similarities between them.

Part 3: Possible drawbacks

The optimization algorithm adds considerable complexity to the traditional KMP text searching algorithm. As shown in the paper, KMP takes only O(m+n) linear time, while the proposed OPS algorithm takes O(m^6) time before using the inverse graph. After trading off some space, the time complexity becomes O(n^2) for the implementation concern or O(n^3) for the satisfiability concern. A high-complexity algorithm can add considerable latency to the response time of an online database system.

For any new algorithm, it is always questionable whether the data set used in the experiment section is suitable. It is hard to fully characterize the similarities between sequential queries gathered from real-world databases. We can see in the plot (Figure 7) that the variation between days is huge.


Review 17

The paper addresses how to extend the SQL query language to support efficient querying in time series applications. In particular, the authors claim that there is no sufficiently expressive query language for finding complex patterns in database sequences for time series analysis. This problem is important because many applications, such as the analysis of stock market prices, meteorological events, and the identification of patterns of purchases by customers over time, involve sequential patterns and would benefit from a solution. The paper addresses these problems by introducing the Simple Query Language for Time Series (SQL-TS), which is an extension of SQL, and a query optimization algorithm built on the Knuth, Morris and Pratt (KMP) search algorithm.

The SQL-TS language provides a simple construct for specifying complex sequential patterns by modifying only the FROM clause, instead of requiring SQL statements with multiple joins. Because SQL-TS is a superset of SQL, all SQL query optimizations remain applicable. In addition, the paper provides new optimization techniques based on the KMP search algorithm that are specifically suited to searching for sequential patterns. While a direct application of the KMP algorithm could be used to optimize simple queries, such as those with equality predicates in their WHERE clauses, the paper proposes the Optimized Pattern Search (OPS) algorithm, which can handle general predicates and repeating patterns expressed by the star construct. This algorithm computes a shift(j) array in addition to the next(j) array. Shift(j) determines how far the pattern should be advanced in the input, and next(j) determines from which element in the pattern the checking of conditions should be resumed after the shift. The OPS algorithm constructs a positive precondition logic matrix, θ, and a negative precondition logic matrix, φ. These matrices capture all the logical relations among pairs of the pattern elements, and the shift(j) and next(j) arrays are determined from them. The OPS algorithm uses these arrays to optimize the pattern search by avoiding as many unnecessary comparisons as possible.

The strength of the paper is that it provides an optimized sequential pattern search algorithm, which is not otherwise possible with traditional SQL query optimization techniques. In addition, the technique generalizes easily to complex pattern searches, including those that involve the star in SQL-TS. An example of such a pattern looks like: “if *x is an element in the search pattern, where x= x.price < x.previous.price”. For this pattern, any sequence of records with a decreasing price will match. As this is complex to optimize in SQL, I like the fact that the OPS algorithm generalizes to this kind of pattern search as well. In addition, the paper provides varied and insightful example queries, which make the explanation of the technique easier to follow.
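
To make the star example concrete, the following Python sketch finds the runs that a single star element with the condition x.price < x.previous.price would cover. It is only a naive illustration under my own assumptions: the function name falling_runs is hypothetical, prices are given as a plain list, and the star is assumed to match the longest possible run. It does not use the OPS optimization at all.

```python
def falling_runs(prices):
    """Return (start, end) index pairs of maximal falling runs.

    A run covers the elements whose price is lower than the price of the
    element just before them; `start` is the first such element.
    """
    runs = []
    start = None  # index where the current run began, if any
    for i in range(1, len(prices)):
        if prices[i] < prices[i - 1]:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - 1))  # run ended at the previous element
            start = None
    if start is not None:
        runs.append((start, len(prices) - 1))  # run reaches the end of input
    return runs
```

For the input [10, 9, 8, 8, 7, 9, 6], the sketch reports three runs: indices 1 to 2, index 4 alone, and index 6 alone.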

One drawback of the OPS algorithm is that it needs to compute two arrays instead of the one required by KMP, which leads to additional performance and storage overhead. It would have been better if the authors had evaluated alternative, less complex techniques. The paper seems biased towards the KMP search algorithm. Instead of concentrating only on variants of this algorithm, they could have investigated other kinds of pattern searching algorithms and reported relative performance and space efficiency. Furthermore, the paper provides results for only a couple of queries; it would have been more insightful to include more analysis for different kinds of queries. For example, for simple queries, the performance gained by the OPS algorithm might not justify using it over the naive search algorithm. While it is not a major limitation of the paper, I would be happy to see a more detailed comparison of query optimization in SQL-TS and traditional SQL.



Review 18

This paper focuses on how to optimize sequence queries in database systems. An algorithm and an extended SQL design are proposed. The extended database query language is able to search for and manipulate sequential patterns. More functionality is added through the AS clause and the star syntax.

The KMP algorithm is extended to handle cases where the pattern expressions are more flexible than just constants. Two arrays instead of one, next[j] and shift[j], are created in the new algorithm to avoid unnecessary reevaluation of data over a certain pattern. Shift determines how far the pattern should be advanced in the input. Next determines from which element in the pattern the checking of conditions should be resumed after the shift. In order to construct the two arrays, information about the implications between pattern elements is required. As we can see in the later parts of the paper, this algorithm significantly reduces the length of the search path in the data.

The paper is well written, and the example queries all come with a short explanation in words instead of code only, which is nice. Also, it makes perfect sense to compare the test results in terms of the number of times that an element of input is tested, instead of measuring the run time of the program.
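
The idea of counting element tests rather than measuring run time can be tried out on plain strings. The sketch below is my own illustration, not the paper's experiment: naive_tests and kmp_tests are hypothetical names, both assume a non-empty pattern, and the comparisons spent building the KMP table are deliberately left out of the count.

```python
def naive_tests(text, pattern):
    """Count comparisons made by a search that restarts at every position."""
    tests = 0
    n, m = len(text), len(pattern)
    for start in range(n - m + 1):
        for j in range(m):
            tests += 1
            if text[start + j] != pattern[j]:
                break
    return tests


def kmp_tests(text, pattern):
    """Count comparisons made by a KMP-style search (table setup excluded)."""
    m = len(pattern)
    fail = [0] * m  # textbook failure table
    k = 0
    for j in range(1, m):
        while k > 0 and pattern[j] != pattern[k]:
            k = fail[k - 1]
        if pattern[j] == pattern[k]:
            k += 1
        fail[j] = k

    tests = 0
    k = 0
    for ch in text:
        while True:
            tests += 1  # one test of an input element against the pattern
            if ch == pattern[k]:
                k += 1
                break
            if k == 0:
                break
            k = fail[k - 1]
        if k == m:
            k = fail[k - 1]
    return tests
```

On a text of 1000 copies of "a" followed by "b", with a pattern of ten "a" characters followed by "b", the naive count grows with the product of the two lengths, while the KMP count stays below twice the text length.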

One downside is that the paper reports an 800:1 optimization ratio only at the end; it would be better if the paper provided more examples. Overall, the paper solves a very practical problem in database design. Sequential queries are very useful in data analysis and definitely have many applications.


Review 19

By identifying the increasing need for processing and analyzing sequential data, this paper provides an optimization of sequence queries by proposing a new SQL-like language called Simple Query Language for Time Series (SQL-TS) and an algorithm called Optimized Pattern Search (OPS), which extends KMP.

This paper first gives a brief introduction to SQL-TS. It is basically identical to SQL except for the CLUSTER BY and SEQUENCE BY additions to the FROM clause. CLUSTER BY separates data into different groups, and SEQUENCE BY specifies the order in which the data is traversed. Then query optimization for sequential data queries is discussed. While simple queries can be optimized using the KMP algorithm, there are three situations where plain KMP is not very useful: 1) general predicates, 2) repeating pattern expressions, and 3) general objects. To address this issue, OPS is proposed. OPS uses next(j) and shift(j) as subroutines to generalize KMP. Finally, the paper further generalizes the OPS algorithm to deal with the stars that appear in SQL-TS queries.

The technical contribution of this paper is that it identifies a situation where SQL is not very efficient and provides an interesting fix by extending the SQL language and implementing a novel algorithm. Some applications, such as stock price analysis, might benefit from such an improvement.

In this paper, I like the part that talks about the star case. It gives a good theoretical analysis and discusses every case that can happen.

One weakness of this paper is that its performance experiments are limited. It measures performance by counting the number of times that an element of input is tested against a pattern element, and it provides only a single figure to illustrate its results. I was expecting some results related to time and other resource usage.



Review 20

This paper introduces a language, SQL-TS, for querying the database for complex patterns, based on an extension of the well-known KMP pattern matching algorithm.

The example queries given in the paper would be really difficult to implement using typical SQL, and would involve multiple partial queries. By taking advantage of the KMP algorithm and extending it, the authors are able to query more generalized patterns and predicates as well. They also explain how their algorithm handles sequences and patterns put together. They draw a distinction between shift(j) and next(j) in order to differentiate between non-matching values and different patterns. The examples are very well explained and give the paper a good flow throughout.

However, I thought that implementing the theta and phi matrices might get a little too complicated for complex queries involving multiple patterns. The results definitely show a great reduction in search effort. I believe a few more graphs or examples showing the kinds of complex queries on which this algorithm performs better would have given more clarity about its performance. It would also have been great if the authors had touched upon the idea of patterns over multiple tables, because the performance would definitely be different in that situation.



Review 21

This paper presents a new solution for supporting sequential pattern queries efficiently, which uses SQL-TS, an extension of SQL. To deal with time series queries, three simple constructs are added to the language. The first is ‘CLUSTER BY’, which specifies that the data for each group is processed separately. The second is ‘SEQUENCE BY’, which specifies the order in which to traverse a time series. The last is the ‘AS’ clause, which specifies a sequence of tuple variables over the series in the target table. In the ‘AS’ clause, a star can be added to one or more variables in the sequence to indicate that such a tuple can appear one or more times in the series, provided that all occurrences satisfy the requirements.

In order to optimize queries over series, the paper chooses the efficient Knuth, Morris and Pratt algorithm for pattern matching in a text sequence. However, database tables store more types of data, and the search predicates are not limited to simple ones such as ‘equality’.

So the final resort is to use an extension of the KMP algorithm, the ‘optimized pattern search’ algorithm, to provide a more general search option for SQL-TS queries. The more powerful OPS supports the following three features in addition to those of the original KMP. First, more general predicates in the pattern definition, such as inequalities and interval predicates, can be handled easily. Second, OPS supports the star operation. Compared to the simple fixed-length pattern matching mechanism in KMP, OPS moves in the direction of dynamic-length patterns. Third, OPS can search for more general objects, such as images, text, and XML objects.

Although the OPS algorithm is not a linear-time pattern recognition algorithm like KMP, it shows far better performance in the test than the naive search algorithm. OPS also takes more space than KMP to store the computed logic matrices Θ, Φ, and S, but as the pattern is usually of limited length, this will not pose a problem. As for future development, better heuristics to achieve larger shift and next values are still being explored. One possible drawback is that the paper does not show a solution for an important kind of search criterion, the disjunction of predicates, which is very common in terms of user needs.



Review 22

This paper introduces a method for complex sequential pattern queries, named the optimized pattern search algorithm, which is an extension of the KMP algorithm.

First, the authors introduce the SQL-TS language, which supports complex sequential pattern searching. Then the paper discusses KMP, which is an optimized text searching algorithm. KMP avoids checking the same text element repeatedly. However, KMP cannot fully support the complex pattern searching problems of SQL-TS, as it cannot handle general predicates, repeating pattern expressions, or complex objects other than text.

Then the authors present their algorithm, which is based on the same idea as KMP. It first examines the relationships among the pattern elements and then uses these precomputed relationships for optimization. The paper also discusses how OPS deals with repeated pattern searching. Finally, the paper uses experimental results to show that the algorithm provides better performance.

This paper explains how the algorithm builds on KMP, and it is very useful because complex pattern searching is common. It first gives the reader an overview of SQL-TS and KMP, which helps the reader understand the idea, and then explains step by step how the matrices are derived.

The drawbacks are: (1) The paper should devote more space to how KMP calculates “next()”, which I think is the key part of KMP. (2) The paper contains many logical deductions and a long chain of logic from beginning to end, so readers may get confused in the middle of a deduction. It would be easier to understand if the paper showed a specific example first and then explained the logical deduction along with the example.


Review 23

The authors discuss the limitations of SQL in searching for trends in time-series data and attempt to resolve them by introducing the SQL-TS language, an extension of SQL that better supports time-series data. The main contribution of this paper is a generalized text search algorithm, based on that of Knuth, Morris, and Pratt, that is much faster than a naive search plan.

In a typical SQL query, a SELECT [row names] query will only return the rows provided. However, with SQL-TS, this is no longer the case as the AS clause may add additional columns from the table. I would argue that more thought should be put into the syntactical changes. Additionally, I would like a more comprehensive performance evaluation. The only practical performance evaluation was against a naive approach which is not difficult to beat. Superiority in terms of algorithmic complexity does not necessarily translate into lower runtime in practice due to complex algorithms often having large coefficients that are not overcome except for large workloads (with the meaning of "large" depending on the algorithm).


Review 24

This paper proposes a new extension on top of SQL for sequential queries in database systems. Usual SQL queries involve only comparisons between a set value and the values in some column in the database. However, when we want to do computations where previous and next values in a column need to be compared, traditional SQL queries take much longer and are much less efficient. SQL-TS, the proposed language, addresses the need for sequential queries in cases such as stock market analysis.

The SQL-TS language is an extension on top of SQL that allows programmers to specify a cluster and a sequence and to define relationships between the values in the sequence. For example, we can query for plane tickets from DTW to SFO where the price of the ticket is lower today than it was yesterday. The authors also propose an algorithm to find these related values in sequential data efficiently. They start from the KMP algorithm, which finds patterns in strings in O(n + m) time, whereas a brute-force search takes O(n * m) time. Then they introduce the optimized pattern search algorithm, which extends KMP to the general predicates of SQL-TS. It keeps m by m matrices to record the logical relationships between all pairs of pattern elements. With this algorithm, SQL-TS queries can be run efficiently.

The following are some weaknesses with the paper and the proposed algorithm:

1. The author does not explain how indexing would be used on these columns. If there were n rows in the table, would each query have to go through all n rows to find the matches?
2. What happens when we want to perform both regular queries and sequential queries on a column? Would the indexing needed for the regular query mess up the sequential ordering? For example, we might want to find the usual patterns in a certain stock index, but we might also want to find all the instances where the index closed below a certain amount.
3. In the experimental results section, the authors used a real-world example, which removes any sample bias. However, I would have liked to see more examples and experiments showing where this sort of query would be used, to get a better understanding of how large the speedup with SQL-TS is.



Review 25

This paper discusses the development of SQL-TS, an extension of SQL designed to allow efficient searches for complex sequential patterns in time series data. This is an important issue, as researchers often need to search through enormous data sets to find meaningful patterns in meteorological, financial, or scientific data. In standard SQL, searching for stocks that increased by 10% one day, then dropped by 20% the next day would involve three joins and would be difficult to express. The authors of this paper propose SQL-TS as a means of simplifying the syntax of such a query and optimizing searches for complex time series patterns.

To solve the syntax problem, the authors build on the work of systems such as SRQL, adding a CLUSTER BY clause to specify that data with the same value for this attribute should be processed separately. They then add a SEQUENCE BY clause to specify how data should be traversed, how many groups the pattern is broken into, and whether each group should match “exactly one” or “one or more” tuples. The * operator allows a user to specify that all consecutive elements for which a given predicate is fulfilled should be included in the group modified by the *. These semantics extend SQL to allow for much easier expression of complex search patterns.
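
The semantics described here can be mimicked in a few lines of Python. This is only a sketch of the meaning of the clauses, not of how SQL-TS is implemented: the function names and the row fields "name", "date", and "price" are assumptions made for this example, and the pattern is the "up 10%, then down 20%" search mentioned earlier, evaluated naively.

```python
from collections import defaultdict


def cluster_and_sequence(rows, cluster_key, sequence_key):
    """Group rows by cluster_key, then order each group by sequence_key."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[row[cluster_key]].append(row)
    return {
        name: sorted(group, key=lambda r: r[sequence_key])
        for name, group in clusters.items()
    }


def rise_then_drop(series, up=1.10, down=0.80):
    """Find consecutive tuples X, Y, Z in one ordered series such that
    Y.price > up * X.price and Z.price < down * Y.price."""
    hits = []
    for x, y, z in zip(series, series[1:], series[2:]):
        if y["price"] > up * x["price"] and z["price"] < down * y["price"]:
            hits.append((x["date"], y["date"], z["date"]))
    return hits


rows = [
    {"name": "IBM", "date": "2000-01-05", "price": 90.0},
    {"name": "INTC", "date": "2000-01-03", "price": 50.0},
    {"name": "IBM", "date": "2000-01-03", "price": 100.0},
    {"name": "INTC", "date": "2000-01-04", "price": 51.0},
    {"name": "IBM", "date": "2000-01-04", "price": 115.0},
    {"name": "INTC", "date": "2000-01-05", "price": 52.0},
]
series = cluster_and_sequence(rows, "name", "date")
ibm_hits = rise_then_drop(series["IBM"])
```

Because each stock is processed as its own ordered stream, comparing a tuple with its predecessor needs no self-join, which is the convenience the CLUSTER BY and SEQUENCE BY clauses provide.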

In order to make these newly supported operations efficient, the authors implemented a modified form of the KMP algorithm for matching an input string to a given pattern. The basic principle of the KMP algorithm is to take known information about how many input characters have matched the pattern and what possible shifts can be applied to the pattern such that an input matching the first x characters might match some other prefix of the pattern. The authors extend this algorithm by converting a query into a set of predicates, creating matrices defining which predicates imply/preclude other predicates, and then compiling these into a matrix that can be used to determine the shift that should be applied to the pattern when a failure occurs.
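
A much simplified Python sketch of an imply/preclude matrix is given below. It rests on assumptions that are mine rather than the paper's: every predicate is a non-empty closed interval (lo, hi) on the same single value, and the names relation and implication_matrix are hypothetical. The paper handles far more general predicates, so this only illustrates the three-valued idea.

```python
TRUE, FALSE, UNKNOWN = "1", "0", "U"


def relation(p, q):
    """Three-valued answer to: if p holds for a value, does q hold for it?

    p and q are closed intervals given as (lo, hi) tuples.
    """
    p_lo, p_hi = p
    q_lo, q_hi = q
    if q_lo <= p_lo and p_hi <= q_hi:
        return TRUE  # p lies inside q, so p implies q
    if p_hi < q_lo or q_hi < p_lo:
        return FALSE  # p and q are disjoint, so p precludes q
    return UNKNOWN  # the intervals overlap only partly


def implication_matrix(preds):
    """Entry [j][k] relates predicate j to predicate k."""
    return [[relation(p, q) for q in preds] for p in preds]


# Three example predicates on a price: 0..10, 2..5, and 20..30.
matrix = implication_matrix([(0, 10), (2, 5), (20, 30)])
```

An entry of "1" or "0" means the outcome of a later test is already known and the test can be skipped, while "U" means the input element still has to be checked.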

This paper offers some very important contributions, but I feel as if it spent too much time discussing the low-level details of the search algorithm and not enough time demonstrating the results observed when the authors used SQL-TS to search for complex patterns in stock market data. The authors devote about 60% of their paper to discussing their algorithm, then spend only a few paragraphs on experimental results. They claim to have gotten results 800-fold better than naïve search, yet they offer no data or graphs to demonstrate this. One would think that, having seen such incredible increases in performance, the authors would want to emphasize their experimental results more strongly.





Review 26

The paper proposes a very SQL-like language called “SQL-TS” for pattern searching, along with techniques for optimizing these queries. The need to search for complex and recurring patterns arises in more and more situations, such as stock market prices and meteorological events. There was some prior research on this topic, but it lacked expressive power or flexibility. Therefore, this paper describes an approach for querying complex sequential patterns and optimizing these queries.

First, the paper introduces the SQL-TS (time-series) language, which makes it more intuitive for users to write queries that search for patterns. It has two main features: one is the “CLUSTER BY” and “SEQUENCE BY” clauses, and the other is the star operator, which can express recurring patterns. For search optimization, the paper uses the KMP algorithm, which reduces comparison cost based on the properties of the pattern. By computing the array next[j], the KMP algorithm can search without backtracking the index of the input text. Thus, SQL-TS and the KMP search algorithm are the two foundations for searching with general predicates.

Second, the paper discusses the Optimized Pattern Search (OPS) algorithm, which can be applied to general predicates. The main idea is to compute a shift array and a next array. When a mismatch occurs, the shift array determines how far the pattern should be advanced in the input, and the next array determines from which element in the pattern the checking resumes. These two arrays are computed by the OPS algorithm, which applies logical properties of the query predicates. The paper also proposes an approach for computing the shift and next arrays when dealing with star operators. Thus, the OPS algorithm provides an efficient way to search for patterns in the SQL-TS language.

To sum up, the paper proposes the SQL-TS language and the OPS algorithm for optimizing pattern search problems. By starting with an introduction to the SQL-TS language, the paper provides a good foundation for the following sections on OPS, which makes the paper easier to study. However, the logical proof for the OPS algorithm could be explained with more examples, which would help readers understand the difficult logic more easily. In conclusion, I think this paper is useful because sequential data analysis arises nowadays in more and more situations.


Review 27

This paper is an introduction to an improvement in pattern searching queries. It introduces add-ons to SQL in what the authors call SQL-TS, which is a language optimized for pattern searching queries. SQL-TS uses an extension of the Knuth, Morris, and Pratt algorithm (which they call Optimized Pattern Search (OPS)) to optimize the search for patterns, so that information about the data just processed can be reused and the data does not need to be processed multiple times.

The paper did a good job describing the OPS algorithm and how the * operator performs, but it is severely lacking the metrics to convince me to change my ways and use SQL-TS. So much of this paper was spent describing the algorithm, which is important, but if you are trying to pitch SQL-TS to me, all I want to hear first is what performance improvements I will get; then I’ll care about how you did it. This paper spent only two paragraphs discussing improvements and had only one graph, and the graph didn’t even have a comparison to what the old “naïve” algorithm would have done.

I would imagine that, because of this flaw in the paper, SQL-TS did not gain market share. One of Stonebraker’s lessons was that “unless there is a big performance or functionality advantage new constructs will go nowhere” and I think the hassle of upgrading to SQL-TS would outweigh the benefits it provides (at least for most query users who don’t do much pattern searching).



Review 28

The authors describe the motivations for this paper as the “need to search for complex and recurring patterns in database sequences.” For example, identifying and grouping temporal data for analysis by some other application can improve overall performance of a system that consistently processes sequential data. By interpreting a query as a predicate string, search optimizations can be performed in order to identify desired patterns in the data.

The most interesting part of the paper is the analogy from the Knuth-Morris-Pratt (KMP) string search algorithm, which exploits the observation that mismatches of a word within a string contain information about the location of the next match of the searched term. The analogy is that matches and mismatches of desired query results will contain information about where other matches in the data will reside. Many diagrams and explanations, along with experimental results, made this paper easy to digest and understand, even from a high-level perspective.

I think it would be interesting to see why they chose, in particular, the KMP string search algorithm over other potential choices (which are only briefly mentioned at the end), and the motivation behind modifying one specific algorithm rather than using multiple methods and comparing them. It would also be interesting to know whether their extensions would work better with DBMS query languages other than SQL.


Review 29

This paper develops an effective approach for optimizing queries over complex sequential patterns. This is done by extending SQL to include functionality that is suitable for such query patterns. The SQL extension comes with a novel algorithm for handling complex queries on sequences.

The SQL-TS extension is a simple yet powerful extension that adds a “CLUSTER BY” clause, a “SEQUENCE BY” clause and some expressions for describing complex patterns. In particular, the addition of the “*” allows for powerful expressions that can describe recurring patterns. The paper also introduces a novel pattern search algorithm that is an extension of the Knuth-Morris-Pratt string search algorithm. The OPS algorithm constructs two logical matrices, phi and theta, that contain all the logical relations among pairs of pattern elements. From those matrices, another triangular matrix is derived which contains the logical relationships for the whole pattern. From this matrix, the shift and next arrays can be derived, which are used to reduce the runtime of the OPS algorithm.

The largest, most glaring deficiency in this paper is its lack of experimental results. Though the paper does state that the authors ran many complex queries and saw speedups of over 800x, not much discussion is given to testing. Showing the queries tested would have been interesting and informative. While the OPS algorithm clearly has some key insights for shortening the length of the search path compared with the KMP algorithm, it does require heavy and complex preprocessing. Thus, some of the speed of the algorithm is lost.


Review 30

This paper discusses how to express and support efficiently sophisticated sequential pattern queries in the database using SQL-TS as well as how to optimize the search query in SQL-TS. The paper also proposes method to optimize the SQL-TS queries using Optimized Pattern Search algorithm, which is an extended version of Knuth, Morris, and Pratt (KMP) algorithm. The paper did experiment using optimized query on double bottom queries, confirming that substantial speedups are achieved by the proposed optimization technique.

The SQL-TS language is interesting because, when querying data, it can treat each group of records as a separate stream (using CLUSTER BY). For example, when analyzing stock market data, the CLUSTER BY clause lets the user treat each stock’s data as a separate stream, thus enabling comparison with previous and subsequent records. For query search optimization, the paper takes the KMP algorithm for text search and extends it into the OPS algorithm. The OPS algorithm must support general predicates (equalities and inequalities involving variables), repeating pattern expressions (a non-fixed number of elements), and more general objects (images, text, XML, etc.). For text search, the OPS algorithm introduces the use of “shift” in addition to “next”, so that there is no need to check every element sequentially against the pattern. OPS must also support the “star”, a construct in SQL-TS used to express a sequence of one or more tuples that satisfy all applicable conditions in the WHERE clause.
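To make the per-stock stream idea concrete, here is a minimal Python sketch of the effect of clustering and sequencing, not the SQL-TS implementation. The row layout (name, date, price), the sample values, and the function name cluster_and_sequence are assumptions chosen for illustration.

```python
from collections import defaultdict


def cluster_and_sequence(rows, cluster_key, sequence_key):
    """Split rows into independent streams and order each stream.

    Mimics the effect of CLUSTER BY cluster_key SEQUENCE BY sequence_key:
    rows that share a cluster_key value form one stream, sorted by sequence_key.
    """
    streams = defaultdict(list)
    for row in rows:
        streams[row[cluster_key]].append(row)
    return {key: sorted(group, key=lambda r: r[sequence_key])
            for key, group in streams.items()}


# Hypothetical quote table; dates are small integers for brevity.
quotes = [
    {"name": "IBM", "date": 2, "price": 98.0},
    {"name": "INTC", "date": 1, "price": 60.0},
    {"name": "IBM", "date": 1, "price": 100.0},
    {"name": "INTC", "date": 2, "price": 63.0},
]

streams = cluster_and_sequence(quotes, "name", "date")
ibm_prices = [row["price"] for row in streams["IBM"]]
intc_prices = [row["price"] for row in streams["INTC"]]
```

Once each stock has its own ordered stream, comparing a record with the previous or next record of the same stock is a simple index lookup, which is what makes the pattern predicates easy to state.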

The main contribution of this paper is presenting SQL-TS as a tool for pattern recognition in sequential data, together with its optimization technique. Pattern identification in data collections has been one of the most researched subjects, and around the time this paper was published there had been many proposals for pattern recognition languages, but few of them were integrated with SQL. This paper is one of the early proposals to integrate pattern recognition over massive data collections through an extension of SQL. Having the SQL extension matters because it is more powerful, more flexible, and better integrated with the database query language (most large datasets are stored in SQL-based databases).

Unfortunately, the paper gives no detailed account of how SQL-TS queries perform compared to other pattern identification techniques, especially those built on algorithms other than KMP. There is also no comparison between SQL-TS and custom SQL queries (although few custom SQL queries can substitute for SQL-TS, such queries exist). Semantically, SQL-TS is simpler than custom or nested-query SQL, but how much does performance improve?



Review 31

The purpose of this paper is to present a new algorithm, based on the KMP text search algorithm, called the Optimized Pattern Search algorithm. This algorithm is useful because it improves performance in the processing of sequence queries. The authors seek to avoid unnecessary repeated passes over the same data by minimizing the amount of backtracking required in pattern matching.

A main contribution of this paper is the development and analysis of the Optimized Pattern Search (OPS) algorithm, an extension of the KMP text search algorithm that handles other data types as input, supports all kinds of comparisons (KMP only supports assertions of equality with constants), and supports repeating patterns of undetermined length. The authors present detailed walkthroughs of the computation of the logical matrices used in their algorithm and of how these computations must change once repeating patterns are allowed.

A strength of this paper comes in its detailed presentation of a new algorithm. The explanations are in depth and accompanied by pictorial representations of the many cases they consider in algorithm development. Additionally, the authors present theoretical as well as empirical analysis of the running times of their new OPS algorithm implemented in their SQL-TS system as compared with related algorithms deployed in other systems to show that it outperforms in practice as well as in theory.

One weakness of this paper lies in the examples it presents, especially in the sections on queries that allow starred predicates. The examples presented seem to me needlessly complex and not very useful in the real world. I wish the authors had contextualized these examples more rather than throwing seemingly arbitrary and very complicated “motivating” examples at the reader. It’s hard to see an example as motivational if it cannot be tied to existing knowledge, to database problems that somebody wishes to solve, or to relevant queries that they wish to answer. The authors finally present a real-world example in the Experimental results section, but this section is almost non-existent, and some numerical comparisons are thrown at the reader without any further discussion of the results, which is another weakness. The graphs do not even visibly represent the data presented in the text, and thus I am confused and disappointed by the results section.




Review 32

Paper Title: Optimization of Sequence Queries in Database Systems
Reviewer: Ye Liu

Paper Summary:
This paper proposes a new SQL-like language called SQL-TS, which focuses on pattern searching and the optimization of sequential queries. The motivation for this proposal is that the need to search complex and sequential data is growing and that the commonly used tools of the time all had shortcomings. The proposed language, SQL-TS, has advantages in expressive power, flexibility, and compatibility.

Paper Review:
The proposed language is, in fact, a superset of SQL. That means all existing SQL queries remain valid in SQL-TS. The major difference is the improved expressive power and efficiency for queries over complex data patterns and for sequential data analysis such as sequence prediction.

Another key point of this paper is that it proposes the Optimized Pattern Search (OPS) algorithm. Like the KMP algorithm, it captures the logical relationships between the elements of the pattern. However, it also handles a richer set of possibilities than KMP, which is its advantage over the KMP algorithm.

A general comment on this paper is that it uses a number of chunks of pseudocode, which at some point may become annoying. A saving grace is that it also provides many clear graphs that help demonstrate the ideas and make the paper more reader-friendly.



Review 33

Summary:
To address the need to search for complex and recurring patterns in database sequences, this paper proposes a framework to express and optimize queries for sophisticated sequential patterns. It introduces an extension of SQL called SQL-TS that adds three features to the FROM clause to express sequence patterns effectively. The three features are: 1. SEQUENCE BY, which specifies the attribute used to order the data for sequence analysis. 2. CLUSTER BY, which, similar to GROUP BY, identifies the attributes whose groups need to be processed separately. 3. AS, which is used to specify the items in the sequence of data to be analyzed. The paper also describes the algorithm it uses to improve the performance of time sequence analysis. It extends the KMP algorithm used in text searching, and the experimental results show a promising performance boost compared with a standard SQL system.

Strengths:
1. This paper introduces an expressive way of analyzing sequence pattern data and a novel optimization approach in the DBMS field based on KMP. The experimental results show convincingly that the system performs better.
2. This paper illustrates its concepts with good examples and expressive graphs, which help the reader understand the concepts clearly.

Weaknesses:
1. Although this paper shows promising results from its expressive SQL extension and the underlying optimization, it is not clear what share of real workloads consists of sequence analysis. It would be better if the paper provided some relevant data to show that a dedicated optimization for this kind of query is necessary.
2. I believe the KMP-based algorithm used in this paper has more potential, so it is worth discussing whether, instead of adding the keywords (CLUSTER BY, SEQUENCE BY, AS), the system could automatically detect such patterns and then apply this optimization to the query.



Review 34

This paper introduces SQL-TS, an extension of SQL that can query sophisticated sequential patterns in databases. To optimize search queries in SQL-TS, the authors also propose a generalized version of the KMP algorithm that handles complex queries on sequences.

The SQL-TS language is used for searching complex sequential patterns in streaming data, such as finding a stock that drops one day and then goes up. The syntax of SQL-TS is similar to standard SQL, with the addition of CLUSTER BY, which specifies how the data is organized, and SEQUENCE BY, which sorts the records by an attribute. It can also express recurring patterns using a star operator.
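The "drops and then goes up" pattern, with a star on each part, can be sketched in plain Python as follows. This is a naive scan written for illustration, not the optimized search the paper proposes; the function name find_fall_then_rise, the choice of maximal runs, and the sample prices are assumptions.

```python
def find_fall_then_rise(prices):
    """Find one-or-more falling steps followed by one-or-more rising steps.

    Returns (start, bottom, end) index triples, where prices fall strictly
    from start to bottom and rise strictly from bottom to end. Each run is
    taken as long as possible, mirroring a greedy reading of the star.
    """
    matches = []
    i, n = 0, len(prices)
    while i < n - 1:
        j = i
        while j + 1 < n and prices[j + 1] < prices[j]:
            j += 1                      # extend the falling run
        if j == i:
            i += 1                      # no fall starts here; move on
            continue
        k = j
        while k + 1 < n and prices[k + 1] > prices[k]:
            k += 1                      # extend the rising run
        if k > j:
            matches.append((i, j, k))
        i = k if k > j else j           # resume where the last run ended
    return matches


prices = [10, 9, 8, 9, 11, 10, 12]
matches = find_fall_then_rise(prices)
```

Writing even this simple pattern procedurally takes some care with indices and run boundaries, which is the burden a declarative star operator is meant to remove.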

The sequential pattern problem can be abstracted into a pattern search problem. Simple word pattern matching can be solved efficiently by the KMP algorithm. Given a string s and a pattern p (which is also a string), the idea of KMP is to analyze the pattern p in advance and learn where the next match could begin if a mismatch occurs. After compiling the pattern, when a mismatch occurs at s[i] and p[j], matching resumes from s[i] and p[next(j)], instead of starting over from s[i-j+1] and p[0]. Inspired by KMP, the authors propose Optimized Pattern Search (OPS), which studies the implications among the elements of a sequential pattern. It also studies the implications among the sequence of search predicates. Unlike in KMP, the index on s doesn’t necessarily increase or stay at the same position; it may sometimes go back, as determined by the pattern. So OPS computes not only a next[] table, but also a shift[] table for the index on s to accommodate this. The star operator is also supported by configuring these two tables.
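For reference, the classic KMP text search described above can be sketched in Python as follows. This is the standard string algorithm only, not OPS. The table here uses the common prefix-function convention (consulted as next_[j - 1] after a mismatch at p[j]), which may be indexed differently from the next(j) notation above; the names build_next and kmp_search are my own.

```python
def build_next(pattern):
    """next_[j] is the length of the longest proper prefix of pattern[:j + 1]
    that is also a suffix of it."""
    next_ = [0] * len(pattern)
    k = 0
    for j in range(1, len(pattern)):
        while k > 0 and pattern[j] != pattern[k]:
            k = next_[k - 1]            # fall back to a shorter border
        if pattern[j] == pattern[k]:
            k += 1
        next_[j] = k
    return next_


def kmp_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []
    next_ = build_next(pattern)
    hits = []
    j = 0                               # number of pattern characters matched
    for i, ch in enumerate(text):
        while j > 0 and ch != pattern[j]:
            j = next_[j - 1]            # reuse the matched prefix; i never moves back
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            hits.append(i - j + 1)
            j = next_[j - 1]            # keep going to find overlapping matches
    return hits


table = build_next("abcabd")
hits = kmp_search("abcabcabd", "abcabd")
```

Note that the text index i only ever advances in this version; the review's point is that OPS gives up this property in exchange for handling general predicates, which is why it needs a shift table as well.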

The main contribution of this paper is to propose an easier way of searching for sequential patterns in a database. Pattern search is a very complex problem. It takes much expertise in implementing algorithms and writing procedural code to achieve good performance. SQL-TS solves this problem by using a declarative language and a specific query optimizer designed for pattern searching.

One drawback of this paper is the lack of experimental results. I would like to see a performance comparison between SQL-TS and a native procedural pattern searching program. To convince people to accept this declarative approach, the authors should show that its performance is comparable.