Bandit Strategies for Ethical Sequential Allocation

Janis Hardwick     Quentin F. Stout
University of Michigan


Abstract: We consider the problem of allocating patients in a clinical trial with two treatments, each of which has dichotomous responses. The goal is to determine the better treatment while incurring as few patient losses as possible. Several sampling procedures are compared, including equal allocation, which maximizes power, and the uniform bandit, which minimizes expected failures. It is found that a modified bandit strategy performs well on both criteria in that it achieves nearly optimal power while keeping expected trial failures nearly minimal. The modified bandit model is based on an approximation to the Gittins index. Bandits form an important class of models for adaptive sampling problems, and this approach can be used in many settings other than clinical trials, achieving a good compromise between two competing objectives.

The rules are also evaluated according to the time required to compute them. By using an approximation to the Gittins index, rather than the true value, the computational complexity of the modified bandit is significantly reduced.
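As a rough illustration of the trade-off described above, the following sketch computes, by backward induction, the minimal Bayes-expected number of failures for a two-armed Bernoulli bandit with independent uniform priors on the success probabilities, and compares it with equal allocation. This is not the modified bandit or the Gittins index approximation from the paper. The function names, the trial size `n`, and the state encoding (successes and failures observed on each arm) are assumptions made for this example only.

```python
from functools import lru_cache


def min_expected_failures(n):
    """Minimal Bayes-expected failures over n patients (illustrative sketch).

    Assumes independent uniform (Beta(1,1)) priors on both arms' success
    probabilities. State = (s1, f1, s2, f2): successes/failures seen so far.
    """

    @lru_cache(maxsize=None)
    def value(s1, f1, s2, f2):
        # All n patients have been allocated: no further failures can occur.
        if s1 + f1 + s2 + f2 == n:
            return 0.0
        # Posterior mean success probability of each arm under a uniform prior.
        p1 = (s1 + 1) / (s1 + f1 + 2)
        p2 = (s2 + 1) / (s2 + f2 + 2)
        # Expected remaining failures if the next patient gets arm 1 or arm 2.
        v1 = p1 * value(s1 + 1, f1, s2, f2) + (1 - p1) * (1 + value(s1, f1 + 1, s2, f2))
        v2 = p2 * value(s1, f1, s2 + 1, f2) + (1 - p2) * (1 + value(s1, f1, s2, f2 + 1))
        return min(v1, v2)

    return value(0, 0, 0, 0)


def equal_allocation_expected_failures(n):
    """Expected failures under any fixed allocation with uniform priors.

    Each arm's prior mean success probability is 1/2, so every patient
    fails with probability 1/2 regardless of the (non-adaptive) split.
    """
    return n / 2


if __name__ == "__main__":
    n = 20  # hypothetical trial size, chosen only to keep the example fast
    print("bandit (backward induction):", min_expected_failures(n))
    print("equal allocation:           ", equal_allocation_expected_failures(n))
```

The number of states grows on the order of n^4, which gives some sense of why computing time matters when such rules are evaluated exactly for realistic trial sizes.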

Keywords: medical ethics, controlled clinical trial, statistical computing, sequential allocation, response adaptive sampling procedure, bandit problem, design and analysis of experiments, active learning, Gittins index, power, probability of correct selection, indifference region, computational learning theory

This paper appears in Computing Science and Statistics 23 (1991), pp. 421-424.


Related Work
Adaptive Allocation:
Here is an explanation of this topic, including a description of bandit problems, and here are our relevant papers.
Dynamic Programming (also known as Backward Induction):
Here is an overview of our work.

Copyright © 2005-2021 Quentin F. Stout