An Algorithm for L∞ Approximation by Step Functions

Quentin F. Stout
Computer Science and Engineering, University of Michigan


Abstract: Step functions are a fundamental form of approximation, arising in variable-width histograms, databases, segmentation, approximation of sets of planar points, piecewise-constant approximation, etc. We give an algorithm for determining an optimal step-function approximation of weighted data, where the error is measured with respect to the weighted L∞ norm. The algorithm takes Θ(n + b log n · (1 + log(n/b))) time and Θ(n) space, where b is the number of steps. Thus the time is Θ(n log n) in the worst case and Θ(n) when b = O(n / (log n · log log n)). The fastest previous algorithms took Θ(n log n) time, independent of b, but required parametric search, which is highly impractical. Our algorithm is far more practical, is faster when b = o(n), and takes only Θ(n) space. It is even faster than the best previous algorithm for unweighted data.
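As a point of reference for what the algorithm computes, here is a minimal brute-force sketch in Python (my own illustration, not the paper's algorithm, with hypothetical function names): it finds the optimal weighted L∞ error of a b-step approximation by dynamic programming over segment boundaries, using the fact that the best constant for a weighted segment of the line has error max over pairs of w1·w2·|y1 − y2| / (w1 + w2). Its runtime is far worse than the bound above; it only illustrates the problem being solved.

```python
from itertools import combinations

def segment_error(pts):
    """Weighted L-infinity error of the best single value for a segment:
    for weighted points on a line, this is the maximum over pairs of
    w1*w2*|y1 - y2| / (w1 + w2) (the pairwise 1-center errors)."""
    if len(pts) < 2:
        return 0.0
    return max(w1 * w2 * abs(y1 - y2) / (w1 + w2)
               for (y1, w1), (y2, w2) in combinations(pts, 2))

def best_b_step_error(ys, ws, b):
    """Brute-force dynamic program: E[k][j] is the optimal weighted
    L-infinity error of a k-step approximation of the first j points."""
    n = len(ys)
    pts = list(zip(ys, ws))
    INF = float("inf")
    E = [[INF] * (n + 1) for _ in range(b + 1)]
    E[0][0] = 0.0
    for k in range(1, b + 1):
        for j in range(1, n + 1):
            for i in range(k, j + 1):   # last step covers points i..j
                e = max(E[k - 1][i - 1], segment_error(pts[i - 1:j]))
                if e < E[k][j]:
                    E[k][j] = e
    return E[b][n]
```

For example, `best_b_step_error([1, 2, 10], [1, 1, 1], 2)` is 0.5, achieved by steps over {1, 2} and {10}.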

With only a minor change, the algorithm determines an optimal reduced isotonic regression in the same time and space bounds. A function is isotonic iff it is nondecreasing, and a function is an optimal L∞ b-step reduced isotonic regression iff it minimizes the weighted L∞ regression error among all b-step isotonic functions. Isotonic regression arises when researchers want to avoid parametric assumptions, using weaker shape-constrained assumptions, such as monotonicity, instead. Some researchers are concerned that unrestricted isotonic regression can overfit the data and/or have too many steps, and hence they utilize reduced isotonic regression. However, because previous exact algorithms were too slow, they used approximations instead of the optimal step function.

The algorithm also solves the k-center problem for 1-dimensional data. Given a set of weighted values V = ((y_i, w_i) : i = 1 … n), where the y values are arbitrary real numbers and the w values, the weights, are nonnegative real numbers, and given a set S of real numbers, the weighted L∞ distance from V to S is

max{ min{ w_i |y_i − s_j| : s_j ∈ S } : i = 1 … n }
A subset S of the real numbers is a k-center of V iff it has k elements and minimizes the distance from V to S among all sets of k real numbers. A k-center can be found by sorting the data and then finding an optimal k-step L∞ approximation, which is also an optimal L∞ k-step reduced isotonic regression.
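The reduction can be sanity-checked on tiny inputs. The sketch below (my own illustration, with hypothetical names) computes the weighted 1-dimensional k-center error two ways: by brute force over all assignments of points to k clusters, and by restricting to contiguous blocks of the sorted order, which are exactly the steps of a k-step approximation of the sorted data. The two agree, as the reduction asserts.

```python
from itertools import combinations, product

def one_center_error(pts):
    """Weighted L-infinity error of the best single center for a cluster:
    the maximum pairwise error w1*w2*|y1 - y2| / (w1 + w2)."""
    if len(pts) < 2:
        return 0.0
    return max(w1 * w2 * abs(y1 - y2) / (w1 + w2)
               for (y1, w1), (y2, w2) in combinations(pts, 2))

def k_center_brute(pts, k):
    """Exponential brute force: try every assignment of points to k clusters."""
    best = float("inf")
    for labels in product(range(k), repeat=len(pts)):
        clusters = [[p for p, c in zip(pts, labels) if c == g]
                    for g in range(k)]
        best = min(best, max(one_center_error(c) for c in clusters))
    return best

def k_center_sorted_steps(pts, k):
    """Consider only contiguous blocks of the sorted order -- i.e., the
    clusters induced by a k-step approximation of the sorted data."""
    pts = sorted(pts)
    n, best = len(pts), float("inf")
    for cuts in combinations(range(1, n), k - 1):
        bounds = [0, *cuts, n]
        err = max(one_center_error(pts[bounds[i]:bounds[i + 1]])
                  for i in range(k))
        best = min(best, err)
    return best
```

On the (y, w) pairs [(1, 1), (2, 1), (9, 2), (10, 1)] with k = 2, both routines return 2/3, with clusters {1, 2} and {9, 10}.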

The paper introduces an interval tree of bounded error envelopes, replacing the full upward and downward error envelopes used previously; this reduces both the time and the space required. The algorithm takes an indirect approach: it first finds the minimal possible error, and then uses that error to determine the step function. Exploiting geometric properties of the entries evaluated during search in an ordered matrix reduces the time required to find the optimal error.
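The feasibility test at the heart of such an indirect approach is simple. Below is a minimal sketch (my own, with hypothetical names, assuming positive weights): given a candidate error eps, a greedy left-to-right scan computes the minimum number of steps achieving weighted L∞ error at most eps, since point (y_i, w_i) forces its step's value into the interval [y_i − eps/w_i, y_i + eps/w_i].

```python
def steps_needed(ys, ws, eps):
    """Minimum number of steps so that every weighted point (y_i, w_i) lies
    within weighted L-infinity error eps of its step's value.  Point i forces
    the step value into [y_i - eps/w_i, y_i + eps/w_i]; the greedy scan
    starts a new step whenever the running intersection becomes empty."""
    steps = 0
    lo, hi = 1.0, -1.0          # empty interval, so the first point opens a step
    for y, w in zip(ys, ws):
        r = eps / w             # assumes w > 0
        if max(lo, y - r) > min(hi, y + r):
            steps += 1          # intersection empty: start a new step here
            lo, hi = y - r, y + r
        else:
            lo, hi = max(lo, y - r), min(hi, y + r)
    return steps
```

Binary searching eps over the candidate pairwise errors with this O(n) test already gives a correct, if slower, algorithm; the paper's contribution is locating the optimal eps much faster via the search in an ordered matrix.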

Keywords: stepwise approximation, weighted L∞ error, variable-width histogram, piecewise-constant approximation, reduced isotonic regression, k-center, nonparametric

Complete paper, arXiv 1412.2379


Related work: Here is an overview of my work on shape-constrained regression (isotonic, unimodal, step), and here is information about the fastest (in O-notation) isotonic regression algorithms known to date, for a variety of orderings and metrics.


Quentin's Home Copyright © 2014-2024 Quentin F. Stout.