Multidimensional Isotonic Regression

Isotonic Regression for Multiple Independent Variables

Quentin F. Stout
Computer Science and Engineering, University of Michigan

Extended Abstract: This paper gives an approach for computing isotonic regressions at points in multidimensional space, with the ordering given by domination (also known as component-wise ordering, dimensional ordering, general dimensions, multiple dimensions, majorization, or matrix ordering). That is, given points p=(p₁,...,p_d) and q=(q₁,...,q_d) in d-dimensional space, q is said to dominate p if p_i ≤ q_i for all 1 ≤i ≤ d. Expressing this in standard dag (directed acyclic graph) terminology, or, equivalently, terminology of partial orderings, q dominates p is the same as q succeeds p.

Given a set of n d-dimensional points V = {v₁...v_n} where point v_i has data value y_i and weight w_i, the L_p isotonic regression of the data is a set of values ý_i that minimizes

(Σ {w_i | y_i - ý_i | ^p)^1/p : 1 ≤ i ≤ n } if 1 ≤ p < ∞
max { w_i|y_i - ý_i| : 1 ≤ i ≤ n} if p = ∞ subject to the isotonic constraint that if v_i dominates v_j, then ý_i ≥ ý_j. Typically people are interested in L₂ (least squares), L₁ (median regression), or L_∞ (mini-max error, or the uniform metric).

Note that each dimension need only be a linearly ordered set. For example, one may be height, another shoe size, and a third S < M < L < XL shirt size, where y may correspond to weight. The isotonic requirement is based on the reasonable assumption that if any two of the dimensions are held constant then an increase in the third will increase the expected weight, or, at least, not decrease it.

The key to the results in this paper is efficiently computing an embedding of V into a larger dag G=(V∪R, E). G is bipartite, with every edge having one endpoint in V and one in R, and for all u, v ∈ V, v dominates u iff there is an r ∈ R such that (u,r), (r,v) ∈ E. We call G a rendezvous graph, and r is the unique rendezvous vertex for u and v. R and E are only slightly larger than V, with both having size Θ(n log^d n), and they can be constructed in time linear in their size. G is also known as a Steiner 2-transitive-closure spanner, and it is known that its size is optimal. Some of the algorithms exploit the fact that G is not much larger than V, while others use its structure in a more central way. By using the rendezvous vertices operations are performed over the transitive closure more efficiently than they can be performed over the transitive closure itself. We also give a reduced rendezvous graph G^' with Θ(n log ^d-1n ) vertices and edges, where for all u, v ∈ V, v dominates u iff there is a path in G^' from u to v. See the note below concerning the optimality of G^'. These rendezvous graphs are particularly useful and efficient if the graph is sparse and high-dimensional, situations that are increasingly arising in machine learning and data mining.

Using G, algorithms are given for the L₁ and L₂ metrics which are n/poly-log n faster than previous algorithms, taking approximately Θ(n²) and Θ(n³) time, respectively. A yet faster algorithm is given for L₁ isotonic regression with unweighted data. L_∞ is not unique, so algorithms are given for several different variants that have been studied. This includes the strict L_∞ isotonic regression, which is the limit, as p→∞, of L_p isotonic regression. These too are n/poly-log n faster than previous algorithms, taking approximately Θ(n) time.

Another approach to minimizing the representation of the transitive closure of a dag is via its transitive reduction, i.e., the smallest subset of edges that has the same transitive closure as the original dag. For some sets of multidimensional points this can be smaller than the rendezvous graph, e.g., if the points in 2-space are (1,1), (2,2), ... (n,n) then the transitive reduction has n-1 edges, not Θ(n log² n). However, the worst case of the transitive closure is Θ(n²), e.g., if the data is (-5,-1), (-4, -2), (-3, -3), (-2, -4), (-1,-5), (1,4), (2,3), (3,2), (4,1) then the last 4 points dominate the first 5, and the transitive reduction has 20 edges. This can be reduced by adding the point (0,0), which dominates the first 5 points and is dominated by the second 4, and the transitive reduction has 9 edges. This is the essential concept of the rendezvous graph.

This embedding approach had first appeared a paper on L_∞ isotonic regression, which I posted on the web in 2008. That paper was too long so I moved the approach to this paper, extending it to other metrics and to the strict L_∞ isotonic regression.

Notes:

An open question is whether the size of the reduced rendezvous graph mentioned above is optimal. That is, is it true that for any d there are sets S(n) of n d-dimensional points such that for any dag G containing S(n) among its vertices, where G has the property that for all u, v in S(n), v dominates u iff there is a path in G from u to v, then G has Ω(n log^d-1 n) edges? The paper proves that this is true for d=2, and it is trivially true for d=1, but for larger d appears to be an open question.
A result not mentioned in the paper is that given a random set of n d-dimensional points, where in each dimension all of the coordinates are different and they are in random order independent of the other dimensions, then the expected number of edges in the transitive reduction is Θ(n log^d-1 n), where the implied constants depend upon d. This follows from a result of Bentley et al. (JACM 1978, pp. 536-543). Note that this is the same (in O-notation) as the size of the worst-case reduced rendezvous graph.
Utilizing the constructions in the paper shows that the transitive reduction can be computed in Θ(n² log^d-1 n) time. I haven't searched the literature, but the optimal time to find the transitive reduction in this setting may be an open question.

Keywords: weighted isotonic regression, isotonic regression algorithm, monotonic, multidimensional, nonparametric, order-embedding, shape-constrained, transitive closure, Steiner points

Complete paper. This appears in Algorithmica 71 (2015), pp. 450-470. You can also request a copy from Springer.

Related work: Here is an overview of my work on shape constrained regression (isotonic, unimodal, step), and here is information about the fastest (in O-notation) isotonic regression algorithms known to date, for a variety of orderings and metrics.