ALATE Dynamic Programming Overview

Overview of ALATE's Work on Dynamic Programming

This is an overview of some of the work done by my students and collaborators in the area of dynamic programming. It includes work by the faculty members Quentin F. Stout, Janis Hardwick and Marilynn Livingston, and various undergraduate and graduate students, as well as former students.

Our work on dynamic programming has emphasized finite state spaces and computational approaches that yield exact optimizations and analyses. There are two main application areas: adaptive (sequential) designs for random responses, and determining properties of graphs. There is also some effort on optimizing parallel implementations of dynamic programming. I've also listed a few open questions related to our work.

For an extensive collection of pointers to work on the subject throughout the world, visit the dynamic programming component of the World-Wide Web for Operations Research and Management Science (WORMS) pages. The material there is interesting, and includes a fun collection of graphics.

For papers, Paper.ps means a Postscript version of the paper, and Paper.pdf means a PDF version. Each Abstract is in HTML.

Adaptive Designs for Random Responses

Much of the work on adaptive designs has been motivated by problems in allocating patients to treatments in a clinical trial or in industrial applications of estimation or fault tolerance, though other areas are also being worked on. In the simplest clinical setting, each patient is assigned to one of two treatments, and the outcome (success or failure) is observed before the next patient arrives. If we are trying to maximize the number of successes, then the problem is the well-known two-armed bandit problem, which can be solved via dynamic programming if one has a Bayesian model. (The bandit terminology comes from considering slot machines, which are sometimes called bandits. In this view, the two different treatments are two different arms on the slot machine, and allocating a patient to a treatment corresponds to pulling an arm. For the bandit, the problem is to maximize the expected payout.) Bandit problems are a specific form of stochastic optimization problems, and are part of the general areas of control theory, dynamical systems, optimization theory, and discrete applied mathematics.

However, we are also interested in other criteria such as the probability of correctly identifying the better treatment at the end of the trial, or the number of times treatments are switched, which introduces new optimization constraints or forces us to evaluate an allocation procedure multiple times. We developed the technique of path induction to reduce the time required for such repeated evaluations. (Note that we had previously called path induction "forward induction", but that term proved confusing as it had been previously used.)

Here is a more more extensive explanation of adaptive designs. A survey paper which discusses the algorithms is:

J. Hardwick and Q.F. Stout, ``Flexible algorithms for creating and analyzing adaptive designs'', In New Developments and Applications in Experimental Design, N. Flournoy, W.F. Rosenberger, and W.K. Wong, eds., IMS Lecture Notes--Monograph Series vol. 34, 1998, pp. 91-105.
Keywords: adaptive design, stochastic optimization, dynamic programming, bandit problems, algorithms, high-performance computing, few-stage allocation, group sampling, switching, robustness, path induction.
Abstract Paper.ps Paper.pdf

Some additional papers in this area are:

J. Hardwick and Q.F. Stout, ``Bandit strategies for ethical sequential allocation'', Computing Science and Statistics 23 (1991), pp. 421-424.
Keywords: sequential allocation, dynamic programming, medical ethics, Gittins index, power, probability of correct selection, indifference region, adaptive clinical trial
Abstract Paper.ps Paper.pdf
J. Hardwick and Q.F. Stout, ``Optimal allocation for estimating the product of two means'', Computing Science and Statistics 24 (1992), pp. 592-596.
Keywords: sequential allocation, nonlinear estimation, dynamic programming, reliability, myopic allocation, fixed allocation.
Abstract Paper.pdf
J. Hardwick and Q.F. Stout, ``Exact computational analyses for adaptive designs'', in Adaptive Designs, N. Flournoy and W.F. Rosenberger, eds., Institute of Math. Stat. Lecture Notes Monograph Series Vol. 25, 1995, pp. 223-237.
Keywords: constrained dynamic programming, forward induction, backward induction, Bayesian design, multiple criteria, clinical trials.
Abstract
J. Hardwick and Q.F. Stout, ``Optimal allocation for estimating the mean of a bivariate polynomial'', Sequential Analysis 15 (1996), pp. 71-90.
Keywords: nonlinear estimation, sequential design, robustness, myopic, hyperopic, dynamic programming, adaptive allocation, product, Bayesian, fault tolerance.
Abstract Paper.ps (color figures) Paper.ps (black/white figures) Paper.pdf (color figures) Paper.pdf (black/white figures)
J. Hardwick, C. Page and Q.F. Stout, ``Sequentially deciding between two experiments for estimating a common success probability'', J. American Statistical Association, 93 (1998), pp. 1502-1511.
Keywords: batch testing, risk assessment, infection rate, grouped data, omniscient allocation, composite testing.
Abstract Paper.ps Paper.pdf
J. Hardwick and Q.F. Stout, ``Using path induction to evaluate sequential allocation procedures'', SIAM J. Scientific Computing, 1999, to appear.
Keywords: backward induction, adaptive allocation, staged allocation, group sampling, sequential design, path counting, Bayesian, bandit problems, dynamic programming.
Abstract Paper.ps Paper.pdf
Q.F. Stout and J. Hardwick, ``Minimizing the costs of screening trials'', Computing Science and Statistics 31 (1999), to appear.
Keywords: few-stage, stopping rule.
Abstract Paper.pdf

One important variation on this is when one has to make allocation decisions in stages, deciding allocations for several patients at one time. This is highly desirable from a practical viewpoint, but it apparently greatly increases the time and space required to fully optimize the allocation procedure - see

J. Hardwick and Q.F. Stout, ``Optimal few-stage designs'', J. Statistical Planning and Inference 104 (2002), pp. 121-145.
Keywords: sequential analysis, clinical trials, two-stage, three-stage, experimental design, group allocation, adaptive sampling, selection, estimation, bandit problems, product of means, dynamic programming, algorithms.
Abstract Paper.ps Paper.pdf

A closely related variation is where one tries to minimize switching between the alternatives, since switching can be quite costly in some settings. See

J. Hardwick and Q.F. Stout, ``Response adaptive designs that incorporate switching costs and constraings'', J. Statistical Planning and Inference 137 (2007), pp. 2654-2665
Keywords: adaptive sampling, switching costs, constraints, bandit, estimation, dynamic programming, optimal tradeoffs.
Abstract Paper.pdf

Another variation occurs when one is going to allocate an equal number of patients to each treatment (to maximize the probability of correctly selecting the better treatment), but will stop as soon as the outcome is known. Dynamic programming can be used to minimize the expected value of other criteria, such as the number of failures or the sample size - see

J. Hardwick and Q.F. Stout, ``Optimal adaptive equal allocation rules'', Computing Science and Statistics 24 (1992), pp. 597-601.
Keywords: sequential allocation, dynamic programming, curtailment, probability of correct selection, equal allocation, vector at a time.
Abstract Paper.ps Paper.pdf

Some of our recent results require the use of parallel computers, due to the very large memory and computation requirements of the problems.

Properties of Graphs

Many properties of graphs, such as domination numbers, covering numbers, number of codewords, and packings, can be easily determined by dynamic programming in time linear in the number of nodes, if the graph has a simple structure such as a tree or directed acyclic graph (dag). Similarly, it was fairly well-known that for certain families of graphs, such as k x n meshes (grids) or complete k-ary trees of height n, where k is fixed, the same properties can be computed by dynamic programming in time linear in n (though exponential in k).

We showed that for these families these properties could actually be computed in constant time (though still exponential in k) by showing that they have a closed form. We exploit the fact that the dynamic programming is essentially determining minimal weight paths of length n in a finite state space, and must ultimately reach a periodic solution (which can be detected). See

M.L. Livingston and Q.F. Stout, ``Constant time computation of minimum dominating sets'', Congresses Numerantium 105 (1994), pp. 116-128.
Keywords: codes, covering, domination, packing, matching, perfect domination, grid graph, product graph, mesh, torus, dynamic programming, finite state space.
Abstract Paper.ps Paper.pdf

A related use of dynamic programming concerns evaluating the fault tolerance of allocation systems for parallel computers. For example, suppose we have an n x n mesh, where n is even, which we view as containing four n/2 x n/2 quadrants. Vertices represent processors in the parallel computer, and suppose initially all processors are functioning correctly. If processors fail independently, what is the expected number of processors such that no quadrant is fault-free? (We may have a program that needs an n/2 x n/2 submesh, but the allocation system may be only able to allocate the quadrants, not arbitrary submeshes. Obviously the fewest faults that destroy all quadrants is 4, and the most that leave one quadrant fault-free is 3n²/4.) This problem, and more interesting situations where the allocable submeshes overlap, can be solved by dynamic programming - see

M.L. Livingston and Q.F. Stout, ``Fault tolerance of allocation schemes in massively parallel computers'', Proc. 2nd Symp. Frontiers of Massively Parallel Computation (1988), pp. 491-494.
Keywords: fault tolerance, allocation, hypercube computer, mesh, torus, buddy system.
Abstract
M.L. Livingston and Q.F. Stout, ``Parallel allocation algorithms for hypercubes and meshes'', Proc. 4th Conf. on Hypercubes, Concurrent Computers, and Applications (1989), pp. 59-66.
Keywords: subcube allocation, buddy system, cyclic buddy system, submesh allocation, torus.
Abstract
M.L. Livingston and Q.F. Stout, ``Fault tolerance of the cyclic buddy subcube location scheme in hypercubes'', Proc. 6th Distributed Memory Computing Conf. (1991), IEEE, pp. 34-41.
Keywords: fault tolerance, subcube location, subcube allocation, hypercube computer, buddy system, gray-coding, folded hypercube, projective hypercube
Abstract

Unfortunately, due to page constraints, many of the details of the dynamic programming involved were not given in these papers.

Parallel Dynamic Programming

Because dynamic programming often involves large state spaces and extensive computations, many people have implemented it on parallel computers. Our interest is in achieving very high performance in solving DP problems such as occur when creating the adaptive designs discussed above. The high performance is needed because we want to push the limits of the problem sizes that can be solved. To do so, we must obtain optimal or near-optimal load balance, while minimizing communication and retaining all of the serial optimizations. The high dimensionality encountered means that the array rapidly decreases as the dynamic programming proceeds, which forces processors to continually shift the index ranges that they are working on. For various formulations of the problem, the optimal work assignment is itself a dynamic programming problem, though one that is far simpler than the original. Fortunately, some heuristics work extremely well.

A recent result in this area is the ability to optimally design a clinical trial for 200 patients, where there are 3 different treatments available. Other researchers have only been able to handle a handful of patients when 3 treatments were considered, or they had to resort to suboptimal designs, because the time and space required grow as O(n⁶), where n is the number of patients. One of the things we will now be able to do is determine exactly how suboptimal their designs were, and we will be able to consider better tradeoffs such as statistical power versus failures. With simple changes, this program can also handle 2 treatments where there are 3 possible outcomes (instead of only 2, as we normally assume). We know of no previous work which achieves optimal solutions for this problem. With less simple changes, this program can handle some models of 2 treatments with delayed responses, i.e., you must make new allocations before all of the results of previous allocations are known. Here too we know of no previous optimal designs.

Bob Oehmke is the graduate student working on implementing this on an IBM SP2. He is also working on programs for several other aspects of adaptive designs, and more generally on the problem of producing efficient highly parallel programs for dynamic programming. Our most recent paper on this work is

J. Hardwick, R. Oehmke and Q.F. Stout, ``A program for sequential allocation of three Bernoulli populations'', Computational Statistics and Data Analysis 31 (1999), pp. 397-416.
Keywords: adaptive allocation, sequential sampling, bandit, dynamic programming, design of experiments, path induction, stochastic optimization, parallel computing, IBM SP2, MPI, machine learning, parallel algorithm, multi-arm bandit, clinical trial, high-performance computing.
Abstract Paper.ps Paper.pdf

which builds upon work first reported in

J. Hardwick, R. Oehmke and Q.F. Stout, ``A parallel program for 3-arm bandits'', Computing Science and Statistics 29 (1997), pp. 390-395.
Keywords: sequential allocation, dynamic programming, parallel computing, bandits.
Abstract Paper.pdf

Some Open Questions

Several of the above papers mention specific open questions. Here are some of the more general ones. Feel free to contact me if you want (or can give) some more information.

For all of the variations mentioned in adaptive designs, there is a common open question:

Is the given dynamic programming algorithm the fastest way to determine the optimal answer?

That is, dynamic programming finds the optimal answer, but often requires a great deal of time and space. Can the optimal answer be found more quickly? This question applies to a great many other problems as well.

Related to the work on fault tolerance of allocation systems in parallel computers, the following problem gives the limiting behavior, as the system becomes arbitrarily large, of being able to find a fault-free square submesh of 1/4 the size of the full mesh.

Suppose points are place uniformly and independently in the unit square, one at a time, until there is no point-free subsquare of edgelength 1/2, where the subsquare must have its sides parallel to the sides of the square. What is the expected number of points placed?

(It is possible that the solution will not involve dynamic programming.) It is also interesting to know the solution if the original square has both pairs of opposite sides connected, creating a torus.

Here are some conjectures and open problems in the area of finding properties of families of graphs.