### Isotonic Regression for Multiple Independent Variables

Quentin F. Stout
Computer Science and Engineering, University of Michigan

Abstract: This paper gives an approach to computing isotonic regressions for points in multidimensional space, with the ordering given by domination (also known as component-wise ordering, dimensional ordering, majorization, or matrix ordering). That is, given points p = (p1,...,pd) and q = (q1,...,qd) in d-dimensional space, q is said to dominate p if pi ≤ qi for all 1 ≤ i ≤ d. In standard dag (directed acyclic graph) terminology or, equivalently, in the terminology of partial orders, "q dominates p" is the same as "q succeeds p".
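In code, the domination test is just a coordinate-wise comparison. A minimal sketch in Python (the function name is mine, not from the paper):

```python
def dominates(p, q):
    """Return True iff q dominates p, i.e., p[i] <= q[i] in every coordinate."""
    return all(pi <= qi for pi, qi in zip(p, q))
```

Note that domination is reflexive (every point dominates itself) and that two points may be incomparable, e.g., neither of (1, 4) and (2, 3) dominates the other.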

Given a set of n d-dimensional points V = {v1,...,vn}, where point vi has data value yi and weight wi, the Lp isotonic regression of the data is a set of values ŷi that minimizes

( Σ { wi |yi − ŷi|^p : 1 ≤ i ≤ n } )^(1/p)   if 1 ≤ p < ∞
max { wi |yi − ŷi| : 1 ≤ i ≤ n }   if p = ∞
subject to the isotonic constraint that if vi dominates vj, then ŷi ≥ ŷj. Typically one is interested in L2 (least squares), L1 (median regression), or L∞ (minimax error, also known as the uniform metric).

Note that each dimension need only be a linearly ordered set. For example, one dimension may be height, another shoe size, and a third shirt size (S < M < L < XL), where y may correspond to weight. The isotonic requirement is based on the reasonable assumption that if any two of the dimensions are held constant then an increase in the third will increase the expected weight, or at least not decrease it.
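For intuition, in the one-dimensional special case (a single linearly ordered variable) the weighted L2 isotonic regression can be computed by the classical pool-adjacent-violators algorithm (PAVA). A sketch of that classical algorithm, which is not the multidimensional method of this paper:

```python
def pava_l2(y, w):
    """Weighted least-squares isotonic regression on a chain (d = 1) by the
    classical pool-adjacent-violators algorithm: scan left to right, merging
    adjacent blocks into their weighted mean whenever monotonicity is violated."""
    vals, wts, cnts = [], [], []           # block value, block weight, block size
    for yi, wi in zip(y, w):
        vals.append(yi); wts.append(wi); cnts.append(1)
        while len(vals) > 1 and vals[-2] > vals[-1]:   # adjacent violators: pool
            v2, w2, c2 = vals.pop(), wts.pop(), cnts.pop()
            v1, w1, c1 = vals.pop(), wts.pop(), cnts.pop()
            vals.append((w1 * v1 + w2 * v2) / (w1 + w2))
            wts.append(w1 + w2)
            cnts.append(c1 + c2)
    fit = []
    for v, c in zip(vals, cnts):           # expand blocks back to point values
        fit.extend([v] * c)
    return fit
```

For instance, `pava_l2([3, 1, 2], [1, 1, 1])` pools the violating pair 3, 1 and returns `[2, 2, 2]`. A linearly ordered categorical dimension such as shirt size can be used by mapping its values to their ranks.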

The key to the results in this paper is efficiently computing an embedding of V into a larger dag G = (V ∪ R, E). G is bipartite, with every edge having one endpoint in V and one in R, and for all u, v ∈ V, v dominates u iff there is an r ∈ R such that (u,r), (r,v) ∈ E. We call G a rendezvous graph, and r the unique rendezvous vertex for u and v. R and E are only slightly larger than V, both having size Θ(n log^d n), and they can be constructed in time linear in their size. G is also known as a Steiner 2-transitive-closure spanner, and its size is known to be optimal. Some of the algorithms exploit only the fact that G is not much larger than V, while others use its structure more centrally. By using the rendezvous vertices, operations over the transitive closure can be performed more efficiently than by working with the transitive closure itself. We also give a reduced rendezvous graph G' with Θ(n log^(d−1) n) vertices and edges, where for all u, v ∈ V, v dominates u iff there is a path in G' from u to v. See the note below concerning the optimality of G'.
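The rendezvous idea can be illustrated in the simplest case, d = 1: a balanced divide and conquer over the sorted points yields a bipartite graph with Θ(n log n) edges in which every dominated pair meets at a rendezvous vertex. A sketch, assuming distinct coordinates (the function and vertex names are mine):

```python
def rendezvous_1d(points):
    """Bipartite rendezvous graph for d = 1 on sorted, distinct points.
    Vertices ('p', x) are original points; vertices ('r', k) are rendezvous
    (Steiner) vertices.  For every pair u < v there is an r with edges
    (u, r) and (r, v), and the total number of edges is Theta(n log n)."""
    edges = set()
    num_r = 0

    def rec(lo, hi):
        nonlocal num_r
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        r = ('r', num_r)
        num_r += 1
        for i in range(lo, mid):          # left half feeds into r ...
            edges.add((('p', points[i]), r))
        for j in range(mid, hi):          # ... and r feeds the right half
            edges.add((r, ('p', points[j])))
        rec(lo, mid)
        rec(mid, hi)

    rec(0, len(points))
    return edges
```

Any pair u < v is split by the midpoint at some level of the recursion, and the rendezvous vertex created at that level joins them; the recurrence T(n) = 2T(n/2) + n gives the Θ(n log n) edge bound for this d = 1 case.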

Using G, algorithms are given for the L1 and L2 metrics which are a factor of n/poly-log n faster than previous algorithms, taking approximately Θ(n^2) and Θ(n^3) time, respectively. A yet faster algorithm is given for L1 isotonic regression with unweighted data. L∞ isotonic regression is not unique, so algorithms are given for several different variants that have been studied. This includes the strict L∞ isotonic regression, which is the limit, as p → ∞, of Lp isotonic regression. These too are a factor of n/poly-log n faster than previous algorithms, taking approximately Θ(n) time.
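For unweighted data there is a classical L∞ isotonic fit, sometimes called the basic solution, that averages the largest data value over each point's predecessors with the smallest over its successors. A brute-force O(n^2) sketch over the domination order, included only as an illustration and not as the paper's faster algorithm:

```python
def linf_basic(points, y):
    """Unweighted L-infinity isotonic fit under domination via the classical
    'basic' formula: at each point, average the largest data value over its
    predecessors and the smallest over its successors (a point is both a
    predecessor and a successor of itself).  Brute force, O(n^2) time."""
    def dom(p, q):  # True iff q dominates p
        return all(a <= b for a, b in zip(p, q))

    yhat = []
    for v in points:
        hi = max(yu for u, yu in zip(points, y) if dom(u, v))  # over predecessors
        lo = min(yu for u, yu in zip(points, y) if dom(v, u))  # over successors
        yhat.append((hi + lo) / 2)
    return yhat
```

The fit is isotonic because moving up the order can only enlarge the predecessor set and shrink the successor set, so both terms are nondecreasing.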

Another approach to minimizing the representation of the transitive closure of a dag is via its transitive reduction, i.e., the smallest subset of edges that has the same transitive closure as the original dag. For some sets of multidimensional points this can be smaller than the rendezvous graph: e.g., if the points in 2-space are (1,1), (2,2), ..., (n,n), then the transitive reduction has n−1 edges, not Θ(n log^2 n). However, the worst case of the transitive reduction is Θ(n^2): e.g., if the points are (-5,-1), (-4,-2), (-3,-3), (-2,-4), (-1,-5), (1,4), (2,3), (3,2), (4,1), then each of the last 4 points dominates each of the first 5, and the transitive reduction has 20 edges. This can be reduced by adding the point (0,0), which dominates the first 5 points and is dominated by the last 4; the transitive reduction of the augmented set has only 9 edges. This is the essential concept of the rendezvous graph.
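These edge counts are easy to check with a naive O(n^3) transitive-reduction computation over the domination dag (illustration only; the function name is mine):

```python
def transitive_reduction_size(points):
    """Number of edges in the transitive reduction of the domination dag
    on a set of distinct points: keep edge (u, v) iff v dominates u and
    no third point lies between them in the order."""
    def dom(p, q):  # True iff q dominates p
        return all(a <= b for a, b in zip(p, q))

    count = 0
    for u in points:
        for v in points:
            if u != v and dom(u, v):
                # the edge survives iff no w satisfies u < w < v
                if not any(w != u and w != v and dom(u, w) and dom(w, v)
                           for w in points):
                    count += 1
    return count
```

On the 9-point set above this returns 20, and adding (0,0) drops it to 9, matching the counts in the text.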

This embedding approach first appeared in a paper on L∞ isotonic regression, which I posted on the web in 2008. That paper was too long, so I moved the approach to this paper, extending it to other metrics and to strict L∞ isotonic regression.

Notes:

• An open question is whether the size of the reduced rendezvous graph mentioned above is optimal. That is, for every d are there sets S(n) of n d-dimensional points such that any dag G containing S(n) among its vertices, with the property that for all u, v ∈ S(n), v dominates u iff there is a path in G from u to v, must have Ω(n log^(d−1) n) edges? The paper proves that this is true for d = 2, and it is trivially true for d = 1, but for larger d it appears to be an open question.
• A result not mentioned in the paper is that given a random set of n d-dimensional points, where in each dimension all of the coordinates are different and they are in random order independent of the other dimensions, the expected number of edges in the transitive reduction is Θ(n log^(d−1) n), where the implied constants depend upon d. This follows from a result of Bentley et al. (JACM 1978, pp. 536-543). Note that this is the same as the size of the worst-case reduced rendezvous graph.
• Utilizing the constructions in the paper shows that the transitive reduction can be computed in Θ(n^2 log^(d−1) n) time. I haven't searched the literature, but the optimal time to find the transitive reduction in this setting may be an open question.

Keywords: weighted isotonic regression, isotonic regression algorithm, monotonic, multidimensional, nonparametric, order-embedding, shape-constrained, transitive closure, Steiner points

Complete paper. This appears in Algorithmica 71 (2015), pp. 450-470.

Related work: Here is information about the fastest isotonic regression algorithms known to date, for a variety of orderings and metrics.

Here are papers on L∞ isotonic regression for arbitrary dags (also known as minimax regression), L∞ isotonic regression for linear, multidimensional, and tree orderings, strict L∞ isotonic regression, L1 isotonic regression (also known as median regression), isotonic regression with a fixed number of level sets, and unimodal regression.