You're about to let a "spider" loose on the Internet. How do you know if it will seek out the information you want, without disrupting the net? You're in charge of writing the scheduling algorithm for a bank of elevators. How do you know when to go up and when to go down? In general, how do you make software do the right thing for its users? How do you even know what the right thing is?
Many people see agents and agent-based programming as ushering in a new era in computing, particularly in the environment of the Internet. The optimists believe that all the protocols for data transfer, encryption, security, and payment will be sorted out in a year or so, and that we can then go about writing new and exciting agent-based applications. This article explains why programming agents is not just business as usual; rather, it requires a new way of looking at problems and their solutions.
When you hire a human agent to do something for you, you rarely spell out a detailed plan of action. Instead, you define the state of the environment that you want to achieve (e.g., you tell a contractor that you want a new front porch with comfortable seating, for under $2000). In more complex and uncertain situations, you specify your preferences rather than stating outright goals, as when you tell a stock broker agent that the more money you make the better, but that you count capital gains as, say, 30% better than dividend income. Your hired agent then takes actions on your behalf, and even negotiates with other agents, all to help you satisfy your preferences. We would like our software agents to behave the same way.
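To make the idea concrete, a preference like the stock broker example can be written down as a utility function that scores outcomes. This is only an illustrative sketch; the function name and the 1.3 weight (a dollar of capital gains counting 30% more than a dollar of dividends) are assumptions drawn from the example above, not a prescribed interface.

```python
def utility(capital_gains, dividends, gain_weight=1.3):
    """Score an outcome: more money is better, and capital gains
    count gain_weight times as much as dividend income."""
    return gain_weight * capital_gains + dividends

# An agent comparing two outcomes simply prefers the higher score:
# $100 of capital gains beats $120 of dividends under this preference.
prefer_gains = utility(100, 0) > utility(0, 120)
```

The point is that the agent never receives a plan, only a scoring rule; any sequence of actions it takes can then be judged by the utility of the outcome it produces.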
That means we will need a way to describe our preferences to software agents, and a methodology for building agents that best satisfy our preferences. The pleasant surprise is that for many problems, once we know the preferences, we're almost done! Given the preferences, a list of possible actions, and enough time to practice taking actions, we can apply the formalism of Reinforcement Learning (or RL) to build an agent that acts according to the preferences in a near-optimal way. This article shows how.
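The recipe above (preferences expressed as rewards, a list of possible actions, and time to practice) can be sketched with Q-learning, one standard RL algorithm. Everything here is an illustrative assumption rather than the article's own code: a toy five-state corridor where only reaching the rightmost state is rewarded, and the usual hyperparameters (learning rate, discount, exploration rate) are picked for demonstration.

```python
import random

N_STATES = 5          # states 0..4; reaching state 4 ends an episode
ACTIONS = [-1, +1]    # the agent's possible actions: move left or right

def step(state, action):
    """Environment dynamics: return (next_state, reward).
    The preference is encoded as reward only at the far right."""
    nxt = max(0, min(N_STATES - 1, state + action))
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Practice taking actions and learn action values Q(s, a)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Epsilon-greedy: mostly exploit learned values, sometimes explore
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward = step(state, action)
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            # Standard Q-learning update toward reward + discounted future value
            q[(state, action)] += alpha * (reward + gamma * best_next
                                           - q[(state, action)])
            state = nxt
    return q

q = train()
# The learned policy: in each state, take the highest-valued action.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
```

Note what is absent: no plan was ever written down. The agent was given only a reward signal (the preferences), the available actions, and practice episodes, and a near-optimal policy (here, always moving right) emerges from the learning updates.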