Weighted L∞ Isotonic Regression

Quentin F. Stout
Computer Science and Engineering, University of Michigan


Abstract: This paper gives algorithms for determining weighted L∞ isotonic regressions satisfying order constraints on a partially ordered set specified via a directed acyclic graph (dag). Except for degenerate cases, L∞ isotonic regression is not unique, and some regressions have undesirable properties. Hence we examine the properties of the regressions an algorithm produces, in addition to its running time. Note that the L∞ metric is also called the uniform metric, Chebyshev metric, and minimax metric, and isotonic regression is also known as monotonic regression. Isotonic regression is a form of nonparametric shape-constrained regression.
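
To fix notation: an isotonic regression of weighted data (y_i, w_i) on a dag is a function g that is nondecreasing along every edge and minimizes the weighted L∞ error max_i w_i |y_i − g_i|. A minimal sketch of these two ingredients, using illustrative names not taken from the paper:

    # Sketch of the objective and constraint; all names are illustrative,
    # the paper itself gives no code.

    def linf_error(values, weights, regression):
        """Weighted L-infinity error: max_i w_i * |y_i - g_i|."""
        return max(w * abs(y - g)
                   for y, w, g in zip(values, weights, regression))

    def is_isotonic(edges, regression):
        """g is isotonic if g[u] <= g[v] for every dag edge (u, v)."""
        return all(regression[u] <= regression[v] for u, v in edges)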

For a dag with n vertices and m edges we give an algorithm taking Θ(m log n) time. Unfortunately it is based on parametric search, which is completely infeasible in practice. We also give a simple, fast, randomized algorithm achieving the same expected time. While this is not a worst-case bound, the algorithm achieves this time with high probability. Further, its approach can easily replace parametric search in several other problems, such as approximation by step functions.
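
Both parametric search and the randomized algorithm can be viewed as search strategies wrapped around a feasibility test: given a candidate error ε, decide whether some isotonic function lies within weighted error ε of every data value. A sketch of the standard test, assuming vertices 0..n−1 are indexed in topological order (the function name and interface are illustrative, not from the paper):

    def feasible(values, weights, preds, eps):
        """Is there an isotonic g with max_i w_i * |y_i - g_i| <= eps?
        Vertex i may take any value in [y_i - eps/w_i, y_i + eps/w_i];
        propagate lower bounds in topological order and check the upper
        bounds.  preds[i] lists the predecessors of vertex i."""
        n = len(values)
        low = [0.0] * n
        for i in range(n):
            low[i] = values[i] - eps / weights[i]
            for p in preds[i]:
                low[i] = max(low[i], low[p])   # isotonicity forces g[i] >= g[p]
            if low[i] > values[i] + eps / weights[i]:
                return False                   # interval for i cannot be met
        return True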

The properties of the regressions these two algorithms produce are not always desirable. For example, the algorithms are not monotonic, in that there can be functions f and g where pointwise f ≥ g, but the isotonic regression produced for f is not pointwise ≥ the regression produced for g. An approach that is monotonic is to solve the prefix isotonic regression problem, where for every element x of the partial order one determines the regression error of the isotonic regression on x and all of its predecessors. This regression is called Prefix. An algorithm is given for computing Prefix on linear and tree orders in Θ(n log n) time. It is also used to find a unimodal regression on linear orders in the same time bound. This completes the work in Unimodal Regression via Prefix Isotonic Regression by solving the remaining open case, namely L∞ isotonic regression on weighted data, which is significantly more difficult than the unweighted case.
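
For a linear order the optimal weighted L∞ error of a prefix can be characterized through violating pairs: two points i < j with y_i > y_j force error at least w_i w_j (y_i − y_j) / (w_i + w_j), the error of the best common value for that pair, and the prefix error is the maximum of these. A quadratic-time sketch built directly on that characterization (the paper's Θ(n log n) algorithm is far more careful):

    def prefix_errors(values, weights):
        """Naive O(n^2) computation of the weighted L-infinity prefix
        regression errors on a linear order: err[k] is the optimal error
        of the isotonic regression restricted to points 0..k."""
        n = len(values)
        err = [0.0] * n
        best = 0.0
        for j in range(1, n):
            for i in range(j):
                if values[i] > values[j]:      # a violating pair
                    wi, wj = weights[i], weights[j]
                    pair = wi * wj * (values[i] - values[j]) / (wi + wj)
                    best = max(best, pair)
            err[j] = best
        return err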

Algorithms are also given to determine unimodal regression on an undirected tree, isotonic regressions where the regression values must come from a specified set, and isotonic regressions when the number of different weights, or the number of different values, is much smaller than the number of vertices. An algorithm is also given for isotonic regression on a rooted tree, where the value of a parent must be the largest value of its children. We call this "river isotonic regression" since it is similar to the naming convention for rivers.
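
Read literally, the river constraint is easy to state in code. A hypothetical checker, under the assumption that "the largest value of its children" means the parent's value equals the maximum of its children's values (the paper's formal definition may differ in details):

    def is_river_isotonic(children, values, root):
        """Check that at every internal node the regression value equals
        the largest of its children's values.  children[v] lists the
        children of v in the rooted tree."""
        stack = [root]
        while stack:
            v = stack.pop()
            if children[v]:
                if values[v] != max(values[c] for c in children[v]):
                    return False
                stack.extend(children[v])
        return True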

The parametric search and randomized algorithms mentioned above can produce Max, the regression which is pointwise the largest of any L∞ isotonic regression, and Min, the pointwise smallest. Other regressions considered are Basic and Strict. Basic is the most commonly studied one, though the fact that it is nontrivial to compute for weighted data has rarely been addressed. Prefix is somewhat easier. Strict is discussed more thoroughly in Strict L∞ Isotonic Regression. Strict is the "best best" L∞ isotonic regression: it is the limit, as p → ∞, of Lp isotonic regression, and it has a strong property concerning the number of vertices with large regression errors (see the paper for a precise statement). Prefix and Basic have a somewhat weaker property in that they minimize the set of vertices with maximal error. At the other end of the spectrum, at any vertex one or both of Max and Min have the worst error among all isotonic regressions.
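
Given the optimal error ε, Max and Min have simple closed forms: any isotonic regression with error ε must satisfy g(x) ≤ y_z + ε/w_z for every successor z of x, and Max attains these bounds pointwise; Min is symmetric. A sketch for a linear order, assuming ε is already known (names are illustrative):

    def max_min_regressions(values, weights, eps):
        """Pointwise-largest (Max) and pointwise-smallest (Min) weighted
        L-infinity isotonic regressions on a linear order, given the
        optimal error eps:
            Max[i] = min over j >= i of (y_j + eps/w_j)
            Min[i] = max over j <= i of (y_j - eps/w_j)
        Each is isotonic and within weighted error eps of the data."""
        n = len(values)
        hi = [v + eps / w for v, w in zip(values, weights)]
        lo = [v - eps / w for v, w in zip(values, weights)]
        mx, mn = [0.0] * n, [0.0] * n
        bound = float('inf')
        for i in range(n - 1, -1, -1):   # suffix minima of upper bounds
            bound = min(bound, hi[i])
            mx[i] = bound
        bound = float('-inf')
        for i in range(n):               # prefix maxima of lower bounds
            bound = max(bound, lo[i])
            mn[i] = bound
        return mx, mn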

Complete paper. This is a major revision of a much longer version from 2008. That version included an approach for multidimensional data, which was moved to Isotonic Regression for Multiple Independent Variables, where the approach is also applied to L1 and L2 isotonic regression. That paper was published a few years ago, and now, after a sequence of absurdly long delays, including some caused by the death of an editor, this paper will finally appear in Journal of Computer and System Sciences.

Related work: Here is information about the fastest isotonic regression algorithms known to date, for a variety of orderings and metrics.

Here are some new, faster, but more complex, algorithms for L∞ isotonic regression on linear, multidimensional, and tree orders, and algorithms for approximation by step functions with a specified number of steps. Here are papers on L1 isotonic regression (also known as isotonic median regression), isotonic regression for multiple independent variables, isotonic regression with a fixed number of level sets, and unimodal regression. The paper on L1 isotonic regression also includes new algorithms for Lp isotonic regression, 1 < p < ∞, based on using a sequence of binary-valued L1 isotonic regressions. For L2, several of these improve upon prior results, which is somewhat surprising since that is the case which has been studied the most.

