Experts on human evolution still do not agree about how and why early humans evolved big brains. Did early humans evolve big brains first, and then start being successful as hunter-gatherers? Or did they first try to make a living as hunter-gatherers, which provided selective pressure in favor of bigger brains?
Before trying to answer this question, consider a simple thought experiment about evolution.
Let's imagine that random mutations created organisms with every possible combination of links from pain and sex to positive, neutral, or negative feelings. The critters for whom pain feels good (or even neutral) accumulate bodily damage, tend to die early, and fail to have children. The critters for whom sex feels bad (or even neutral) also tend to have fewer children. Those with numerous children and grandchildren tend to be those for whom pain feels bad and sex feels good.
Researchers in reinforcement learning sometimes explain learning as being indirectly rewarded by fundamental drives, such as the positive reward from food and drink and the negative reward from pain. However, the substantial effort involved in learning to walk, for example, is quite far removed from fundamental biological drives, yet it nonetheless seems to receive some sort of immediate reward (as shown in this ecstatic video). Thus, there seems to be an intrinsic reward for learning of this kind.
How would that intrinsic reward work? One possibility is to reward the ability to take some action and accurately predict the result. Unfortunately, for this reward function, hiding in the dark, doing and sensing nothing, and predicting more of the same, provides maximum reward. Another possibility is to reward exploring areas of the state space where predictive accuracy is low, in the hopes that learning will take place. This seems unduly optimistic.
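The two candidate reward functions above can be compared in a minimal sketch. The three regions and their accuracy values are hypothetical, chosen only for illustration, and the "random_noise" region is my own assumption about one way the second proposal's optimism could fail.

```python
# Hypothetical predictive accuracy (0 to 1) in three regions of the state space.
# The region names and the numbers are illustrative assumptions, not data.
accuracy = {
    "dark_room": 1.0,         # nothing happens, so prediction is trivially perfect
    "learnable_skill": 0.4,   # prediction is poor but could improve with practice
    "random_noise": 0.05,     # prediction is poor and cannot improve
}

def accuracy_reward(acc):
    """First proposal: reward accurate prediction."""
    return acc

def novelty_reward(acc):
    """Second proposal: reward regions where predictive accuracy is low."""
    return 1.0 - acc

def preferred_region(reward_fn):
    """Return the region that yields the highest reward under reward_fn."""
    return max(accuracy, key=lambda region: reward_fn(accuracy[region]))

favored_by_accuracy = preferred_region(accuracy_reward)
favored_by_novelty = preferred_region(novelty_reward)

print("Accuracy reward favors:", favored_by_accuracy)
print("Novelty reward favors:", favored_by_novelty)
```

Under these assumed numbers, neither reward function steers the agent toward the region where learning can actually take place.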
A small variation makes an important difference. Suppose the intrinsic reward signal is proportional to the rate of change in the ability to accurately predict the result of an action. Where little is known, and little can be learned, predictive accuracy and its derivative are both low, so there is little intrinsic reward. However, if the agent happens across part of the state space where action leads to learning and improvement in predictive accuracy, then the agent gets a dose of intrinsic reward, and is motivated to spend more time in that region. As long as learning continues, the agent gets more reward, and continues practicing in that area. However, once the learning curve approaches its upper limit (either perfect prediction, or the best that can be done with available information), the derivative drops toward zero, and this activity yields less and less intrinsic reward. We might say that the agent gets bored.
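As a rough sketch of this idea, the code below treats intrinsic reward as the step-to-step change in predictive accuracy. The exponential learning curve, the ceilings, the rate, and the boredom threshold are all assumptions made for illustration. They are not taken from the cited papers.

```python
import math

def accuracy_after(practice_steps, ceiling, rate):
    """Assumed saturating learning curve: accuracy rises toward `ceiling`."""
    return ceiling * (1.0 - math.exp(-rate * practice_steps))

def learning_progress_rewards(ceiling, rate, steps):
    """Intrinsic reward at each step = change in accuracy since the previous step."""
    rewards = []
    previous = accuracy_after(0, ceiling, rate)
    for t in range(1, steps + 1):
        current = accuracy_after(t, ceiling, rate)
        rewards.append(current - previous)
        previous = current
    return rewards

# A region where much can be learned, and one where very little can be learned.
learnable = learning_progress_rewards(ceiling=0.9, rate=0.2, steps=50)
unlearnable = learning_progress_rewards(ceiling=0.05, rate=0.2, steps=50)

# Hypothetical threshold below which the agent is "bored" and moves on.
BOREDOM_THRESHOLD = 0.01
bored_at = next(
    (t for t, r in enumerate(learnable, start=1) if r < BOREDOM_THRESHOLD),
    None,
)

print("First reward, learnable region:  ", round(learnable[0], 4))
print("First reward, unlearnable region:", round(unlearnable[0], 4))
print("Agent becomes bored at step:", bored_at)
```

The learnable region pays well at first and then less and less as the curve flattens, while the unlearnable region never pays much at all. Real agents would have to estimate the change in accuracy from noisy experience rather than read it off a known curve.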
On the other hand, bored or not, the agent has learned something new: the ability to perform some action and more accurately predict the result. This changes its ability to plan and to move around in the state space, as it searches for a new region where more intrinsic reward can be had. It seems to me that this is an important piece of the puzzle of how continual learning can be motivated in the individual.
(I first encountered this idea in [Oudeyer & Kaplan, 2004], but it seems to have first appeared in the AI literature in [Schmidhuber, 1991]. It is closely related to Lev Vygotsky's concept of the "zone of proximal development" [Vygotsky, 1978].)
My hypothesis is that intrinsic reward based on learning rate can explain the evolution of big brains.
Suppose there were a point mutation in some early members of genus Homo that linked improved predictive accuracy to the release of some intrinsically rewarding chemical, dopamine for example. This reward signal could be obtained from improvement in predicting the results of actions, which amounts to improvement in competence.
Members of genus Homo with this mutation would naturally engage more often in these pleasure-causing behaviors. Since the pleasure is tied to an increase in competence, these individuals would become more competent, and therefore more successful evolutionarily.
The ability of an individual to learn, and thus to reap the benefits of this source of reward, is limited in part by the size of his or her brain. Offspring with larger brains not only have a larger capacity to learn, and hence to reap the intrinsic sensual rewards of that learning, but as a side-effect they are able to obtain greater competence and hence greater evolutionary fitness.
Over evolutionary time, brains would continue to increase not only in size but, more importantly, in learning capacity, until limited by the bad evolutionary consequences of not being able to fit through the birth canal. Thus we get the evolution of Homo sapiens.
How learning rate might be linked to reward in the brain needs more research. However, I have the impression that there is a term, such as total activation, that is easy to derive from a neural network. When the network starts up, and has no particular predictive ability, this term has a small value. As the network learns, this term increases in value. But as the network converges to a state of expertise, the coefficients move towards zero and one, and the total activation decreases again. The magnitude of this total activation term thus corresponds approximately to the first derivative of predictive accuracy.
If total activation can be tied to, say, dopamine release, this would provide the required link between learning rate and reward.
This essay might be considered a Just So story. A great deal of careful research is needed to transform it into a scientific hypothesis and evaluate it properly. But what does it offer?
If this story is correct, then the key step in the evolution of Homo sapiens can be explained by a single mutation that made the act of learning a source of intrinsic reward. (Go back and watch this ecstatic video.) The evolutionary fitness advantages of big brains are simply side-effects of an increasing ability to drink from that particular source of sensual pleasure.
[Schmidhuber, 1991] J. Schmidhuber. Curious model-building control systems. In Proc. Int. Joint Conf. on Neural Networks, 1991.
[Vygotsky, 1978] L. S. Vygotsky. Mind and Society: The Development of Higher Psychological Processes. Cambridge MA: Harvard University Press, 1978.