Janis Hardwick
Robert Oehmke
Quentin F. Stout
University of Michigan
Abstract: We describe a program for optimizing and analyzing sequential allocation problems involving three Bernoulli populations and an arbitrary objective function. Previous researchers had considered this problem computationally intractable, and there appears to be no prior exact optimizations for such problems, even for very small sample sizes. This paper contains a description of the program, along with the techniques used to scale it to large sample sizes. The program currently handles problems of size 200 or more by using a modest parallel computer, and problems of size 100 on a workstation. As an illustration, the program is used to create an adaptive sampling procedure that is the optimal solution to a 3-arm bandit problem. The bandit procedure is then compared to two other allocation procedures along various Bayesian and frequentist metrics. Note that such models can be applied to settings such as clinical trials, and in the area of machine learning can be used for active learning.
An important aspect of the dynamic programming equations being solved is that they are near-neighbor recurrence equations, having very simple template patterns. Many other problems can be solved via similar equations, and the basic control and communication structure of these programs can be used to solve several such problems. A few of these are pointed out in the conclusion.
Keywords: multi-arm bandit problem, controlled clinical trial, response adaptive sampling design, Bernoulli binary response, sequential allocation, adaptive procedure, dynamic programming, design of experiments, active learning, parallel computing, supercomputing
Complete paper. This paper appears in Computational Statistics and Data Analysis 31 (1999), pp. 397-416.
Copyright © 2004-2016 Quentin F. Stout |