3-Arm Bandit Problem

A Program for Sequential Allocation of Three Bernoulli Populations

Janis Hardwick Robert Oehmke Quentin F. Stout
University of Michigan

Abstract: We describe a program for optimizing and analyzing sequential allocation problems involving three Bernoulli populations and an arbitrary objective function. Previous researchers had considered this problem computationally intractable, and there appears to be no prior exact optimizations for such problems, even for very small sample sizes. This paper contains a description of the program, along with the techniques used to scale it to large sample sizes. The program currently handles problems of size 200 or more by using a modest parallel computer, and problems of size 100 on a workstation. As an illustration, the program is used to create an adaptive sampling procedure that is the optimal solution to a 3-arm bandit problem. The bandit procedure is then compared to two other allocation procedures along various Bayesian and frequentist metrics. Note that such models can be applied to settings such as clinical trials, and in the area of machine learning can be used for active learning.

An important aspect of the dynamic programming equations being solved is that they are near-neighbor recurrence equations, having very simple template patterns. Many other problems can be solved via similar equations, and the basic control and communication structure of these programs can be used to solve several such problems. A few of these are pointed out in the conclusion.

Keywords: multi-arm bandit problem, controlled clinical trial, response adaptive sampling design, Bernoulli binary response, sequential allocation, adaptive procedure, dynamic programming, design of experiments, active learning, parallel computing, supercomputing

Complete paper. This paper appears in Computational Statistics and Data Analysis 31 (1999), pp. 397-416.

Related Work

Computer Science:: A paper emphasizing the computational efficiency aspects of this work.
Adaptive Allocation:: An explanation of this topic, including bandit problems, and our publications. Here is a poster and handout about this topic. It appeared in the NPACI exhibit booth at SC'98.
Dynamic Programming (sometimes known as backward induction):: An overview of our work.
Parallel Computing:: A modest explanation of parallel computing.
A tutorial on parallel computing.
An overview of our work, and relevant papers.