
Sidebar: Genetic Algorithms

The field of Genetic Algorithms (GA) addresses the same set of problems as RL, but with a different methodology. Instead of learning a utility function, Genetic Algorithms learn the policy directly, representing it as an encoded bit string (or, in Genetic Programming, as a program in some computer language). The approach is to search the space of policies by generating a pool of policies and evaluating each one for a sequence of time steps in a simulated environment. A new pool of policies is constructed by combining the more successful existing policies, along with some random change. The algorithm iterates until a sufficiently good policy is found.
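
To make this generate-evaluate-recombine loop concrete, here is a minimal sketch in Python. The bit-string encoding (one action bit per state), the simulate() fitness stub, and all parameters (pool size, mutation rate, number of generations) are illustrative assumptions, not a description of any particular system.

    import random

    # Illustrative parameters (assumptions, not from the text).
    NUM_STATES = 32          # policy = one action bit per state
    POOL_SIZE = 50
    MUTATION_RATE = 0.01
    GENERATIONS = 200

    def random_policy():
        return [random.randint(0, 1) for _ in range(NUM_STATES)]

    def simulate(policy):
        """Run the policy in a simulated environment for a sequence of
        time steps and return its total reward.  (Placeholder stub.)"""
        return sum(policy)

    def crossover(parent_a, parent_b):
        """Combine two successful policies at a random cut point."""
        cut = random.randrange(1, NUM_STATES)
        return parent_a[:cut] + parent_b[cut:]

    def mutate(policy):
        """Flip each bit with small probability (the random change)."""
        return [1 - bit if random.random() < MUTATION_RATE else bit
                for bit in policy]

    pool = [random_policy() for _ in range(POOL_SIZE)]
    for _ in range(GENERATIONS):
        # Evaluate every policy and keep the more successful half.
        scored = sorted(pool, key=simulate, reverse=True)
        survivors = scored[:POOL_SIZE // 2]
        # Build the next pool by recombining and mutating survivors.
        pool = survivors + [
            mutate(crossover(random.choice(survivors),
                             random.choice(survivors)))
            for _ in range(POOL_SIZE - len(survivors))
        ]
    best = max(pool, key=simulate)

Note that the only information the loop uses about a policy is the single number returned by simulate(); this is the point developed below.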

The name "Genetic Algorithms" comes from an analogy with biology: the bits representing a policy are like the bases in a strand of DNA, the creation of new policies is like the process of sexual reproduction, the random change is like mutation, and the propagation of successful policies is like natural selection.

The GA approach shares with RL the twin benefits of being based on training rather than programming, and of being responsive to changes in the environment through the possibility of online re-training. However, there are reasons to believe that RL makes better use of available computing resources. Consider, for example, a policy A in the pool of policies being evaluated by GA, and assume that the long-term cumulative reward from policy A is very poor. GA would simply discard policy A in favor of better policies in its pool, without learning anything from the simulation of A. RL, on the other hand, has the potential to learn a great deal from following policy A. Suppose policy A assigns action a to some state s, and that whenever action a is executed in state s, the resulting next state always has high utility. RL will learn that action a is good in state s from executing policy A, even though the policy itself is bad. It can do this because it learns a state-based utility function. GA, by contrast, ignores the state-based structure of the problem and therefore wastes information.
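
To make the contrast concrete, here is a rough sketch of how a state-based utility update extracts information from every transition of a trajectory, even one produced by a poor policy, while a GA fitness evaluation collapses the same trajectory to a single number. The trajectory format, learning rate, and discount factor are assumptions made for illustration only.

    def td_update(utility, trajectory, alpha=0.1, gamma=0.9):
        """TD(0) update of a state-utility table from one trajectory.

        trajectory: list of (state, reward, next_state) transitions
        produced by following some policy, good or bad.  Every transition
        that reaches a high-utility next_state raises the estimate for
        the state (and hence the action chosen there) that led to it.
        """
        for state, reward, next_state in trajectory:
            u_s = utility.get(state, 0.0)
            u_next = utility.get(next_state, 0.0)
            utility[state] = u_s + alpha * (reward + gamma * u_next - u_s)
        return utility

    # By contrast, a GA fitness evaluation reduces the same trajectory
    # to one number and learns nothing about individual states.
    def ga_fitness(trajectory):
        return sum(reward for _, reward, _ in trajectory)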

In summary, the GA approach needs a policy to achieve partial success in terms of total reward before it can make use of that policy, while the RL approach only needs partial success in terms of reaching a high-utility intermediate state, which is easier to achieve.

It is not yet clear exactly when Genetic Algorithms are the preferred approach and when RL is, but in general, RL seems to do better as the problem becomes more complex. Genetic Algorithms may do well when there is a constrained, small space of policies to be searched.


