Quentin F. Stout

Computer Science and Engineering, University of Michigan

**Abstract:**
This paper gives an approach for computing isotonic regressions
at points in multidimensional space, with the ordering given by
domination (also known as component-wise ordering, dimensional ordering, majorization,
or matrix ordering).
That is, given points p=(p_{1},...,p_{d}) and
q=(q_{1},...,q_{d}) in d-dimensional space, q is said
to *dominate* p if p_{i} ≤ q_{i} for all
1 ≤i ≤ d.
Expressed in standard dag (directed acyclic graph) terminology, or, equivalently, in the terminology of partial orders,
q dominates p exactly when q *succeeds* p.
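In code, the domination test is a direct transcription of the definition; a minimal sketch (the sample tuples are hypothetical):

```python
def dominates(q, p):
    """True iff q dominates p: p[i] <= q[i] in every dimension."""
    return all(pi <= qi for pi, qi in zip(p, q))

# (2,3,5) dominates (1,3,4); incomparable pairs fail in both directions.
assert dominates((2, 3, 5), (1, 3, 4))
assert not dominates((1, 3, 4), (2, 3, 5))
assert not dominates((1, 5), (2, 3)) and not dominates((2, 3), (1, 5))
```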

Given a set of n d-dimensional points
*V* = {v_{1},...,v_{n}}, where point v_{i}
has data value y_{i} and weight w_{i}, the
*L_{p} isotonic regression of the data* is a set of values
ŷ_{1},...,ŷ_{n} satisfying ŷ_{i} ≤ ŷ_{j} whenever v_{j} dominates v_{i},
minimizing the regression error

(Σ_{i=1}^{n} w_{i} |y_{i} − ŷ_{i}|^{p})^{1/p}   for 1 ≤ p < ∞, or

max { w_{i} |y_{i} − ŷ_{i}| : 1 ≤ i ≤ n }   for p = ∞.

Note that each dimension need only be a linearly ordered set. For example, one dimension may be height, another shoe size, and a third S < M < L < XL shirt size, with the dependent variable y being the person's weight. The isotonic requirement is based on the reasonable assumption that if any two of the dimensions are held constant then an increase in the third will increase the expected weight, or at least not decrease it.

The key to the results in this paper is efficiently computing an embedding of
V into a larger dag G=(V∪R, E).
G is bipartite, with every edge having one endpoint in V and one in R,
and for all
u, v ∈ V, v dominates u iff there is an r ∈ R such that
(u,r), (r,v) ∈ E.
We call G a *rendezvous graph*, and r is the unique rendezvous vertex for u and v.
R and E are only slightly larger than V, with both having size
Θ(n log^{d} n), and they can be constructed in time linear in their size.
Some of the algorithms exploit the fact that G is not much larger than
V, while others use its structure in a more central way.
By using the rendezvous vertices, operations over the
transitive closure can be performed more efficiently than by working with the
transitive closure itself.
We also give a reduced rendezvous graph G^{'} with Θ(n log^{d-1} n) vertices and edges, where for all u, v ∈ V, v dominates u iff there is a path in G^{'} from u to v.
See the note below concerning the optimality of G^{'}.
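The flavor of the construction can be seen in one dimension, where a median split supplies a rendezvous vertex for every pair it separates, giving Θ(n log n) edges. This is only an illustrative sketch under simplifying assumptions (distinct, pre-sorted points); the paper's construction recurses over the d dimensions:

```python
def rendezvous_1d(points):
    """Rendezvous graph for points on a line (distinct, sorted).
    Returns edges (u, r) and (r, v) such that u < v iff some r has
    both (u, r) and (r, v).  One-dimensional sketch only."""
    edges = []
    def build(lo, hi):              # half-open range of sorted indices
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        r = ('r', lo, hi)           # rendezvous vertex for this split
        for i in range(lo, mid):
            edges.append((points[i], r))   # left half feeds into r
        for i in range(mid, hi):
            edges.append((r, points[i]))   # r feeds the right half
        build(lo, mid)
        build(mid, hi)
    build(0, len(points))
    return edges
```

Each comparable pair u < v is separated at exactly one split of the recursion, so it meets at exactly one rendezvous vertex; each recursion level contributes at most n edges, for Θ(n log n) total.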

Using G, algorithms are given for the L_{1} and L_{2}
metrics that are a factor of n/poly-log n faster than previous
algorithms, taking approximately Θ(n^{2}) and Θ(n^{3}) time, respectively.
A yet faster algorithm is given for L_{1} isotonic regression
with unweighted data.
L_{∞} isotonic regression is not unique, so algorithms are given for several
different variants that have been studied.
This includes the strict L_{∞} isotonic regression, which
is the limit, as p→∞, of L_{p} isotonic regression.
These too are n/poly-log n faster than previous algorithms, taking approximately Θ(n) time.
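For unweighted data there is a classic closed form for one (non-strict) L_{∞} isotonic regression: average the largest data value among predecessors with the smallest among successors. A brute-force O(n^{2}) sketch under the domination order (far slower than the paper's algorithms, and the function name is mine):

```python
def linf_isotonic(points, y):
    """One unweighted L_infinity isotonic regression under domination:
    yhat(v) = (max_{u<=v} y(u) + min_{w>=v} y(w)) / 2.
    Brute force; illustrative only."""
    def dom(p, q):                      # True iff q dominates p
        return all(a <= b for a, b in zip(p, q))
    yhat = []
    for v in points:
        lo = max(y[j] for j, u in enumerate(points) if dom(u, v))
        hi = min(y[j] for j, w in enumerate(points) if dom(v, w))
        yhat.append((lo + hi) / 2)
    return yhat
```

For example, on the line with data 3, 1, 2 the result is 2.0, 2.0, 2.5, which is isotonic with maximum error 1, the best possible for the violating pair 3, 1.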

Note that another approach to minimizing the representation of the transitive closure of a dag is via its transitive reduction, i.e., the smallest subset of edges that has the same transitive closure as the original dag.
For some sets of multidimensional points this can be smaller than the rendezvous graph, e.g., if the points in 2-space are (1,1), (2,2), ... (n,n) then the transitive reduction has n-1 edges, not Θ(n log^{2} n).
However, the worst case of the transitive reduction is Θ(n^{2}): e.g., if the points are (-5,-1), (-4,-2), (-3,-3), (-2,-4), (-1,-5), (1,4), (2,3), (3,2), (4,1) then each of the last 4 points dominates each of the first 5, and the transitive reduction has all 20 of these edges.
This can be reduced by adding the point (0,0), which dominates the first 5 points and is dominated by the second 4, and the transitive reduction has 9 edges.
This is the essential concept of the rendezvous graph.
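The edge counts in this example can be checked by brute force, since the transitive reduction of a point set under domination consists exactly of its cover pairs; a sketch (the function name is mine):

```python
def transitive_reduction_size(points):
    """Number of cover pairs (u, v): v strictly dominates u and no
    third point lies strictly between them.  Brute force O(n^3)."""
    def dom(p, q):                      # q strictly dominates p
        return p != q and all(a <= b for a, b in zip(p, q))
    return sum(1 for u in points for v in points
               if dom(u, v) and not any(dom(u, w) and dom(w, v)
                                        for w in points))

pts = [(-5, -1), (-4, -2), (-3, -3), (-2, -4), (-1, -5),
       (1, 4), (2, 3), (3, 2), (4, 1)]
assert transitive_reduction_size(pts) == 20        # complete bipartite
assert transitive_reduction_size(pts + [(0, 0)]) == 9
```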

This embedding approach was first used for
L_{∞} isotonic regression, in a paper posted on the web in 2008.
That paper was too long, so I moved the approach to this paper,
extending it to other metrics and to the strict L_{∞}
isotonic regression.

Note: An open question is whether the size of the reduced rendezvous graph mentioned above is optimal.
That is, for every d are there sets S(n) of n d-dimensional points such that any dag G containing S(n) among its vertices, with the property that for all u, v in S(n), v dominates u iff there is a path in G from u to v, must have Ω(n log^{d-1} n) edges, where the implied constants depend upon d?
The paper shows this is true for d=2, and it is trivially true for d=1, but for larger d it appears to be open.
A related result (not mentioned in the paper) is that given a random set of n d-dimensional points, where in each dimension all of the coordinates are different and they are in random order independent of the other dimensions, then the expected number of edges in the transitive reduction is Θ(n log^{d-1} n), where the implied constants depend upon d.
This follows from a result of Bentley et al. (JACM 1978, pp. 536-543).
Modifying the constructions in the paper shows that the transitive reduction can be computed in Θ(R + n log^{d-1} n) time, where R is the size of the transitive reduction.
I haven't searched the literature, but the optimal time to find the transitive reduction in this setting may be an open question.

**Keywords:** weighted isotonic regression, isotonic regression algorithm,
monotonic, multidimensional, nonparametric, order-embedding, shape-constrained, transitive closure

**Complete paper**. This appears in *Algorithmica* 71 (2015), pp. 450-470.

Here are papers on
L_{∞} isotonic regression for arbitrary dags (also known as minimax regression),
L_{∞} isotonic regression for linear, multidimensional, and tree orders,
strict L_{∞} isotonic regression,
L_{1} isotonic regression (also known as median regression),
isotonic regression with a fixed number of level sets, and
unimodal regression.

Copyright © 2009-2016 Quentin F. Stout.