Jonathan E. Juett. 2021.
Towards Learning the Foundations of Manipulation Actions from Unguided Exploration.
PhD thesis, Computer Science & Engineering, University of Michigan, 2021.


Human infants are not born with the ability to reach and grasp. But after months of typical development, infants are capable of reaching and grasping reliably. During this time, the infant receives minimal guidance and learns primarily by observing its autonomous experience with its developing senses. How is it possible for this learning phenomenon to occur, especially when this experience begins with seemingly random motions?

We present a computational model that allows an embodied robotic agent to learn these foundational actions in a manner consistent with infant learning. By examining the model and the resulting behaviors, we can identify knowledge sufficient to perform these actions, and how this knowledge may be represented.

Our agent uses a graph representation for peripersonal space, the space surrounding the agent and within reach of its manipulators. The agent constructs the Peripersonal Space (PPS) Graph by performing random motions. During these motions, the table is present but no other nonself foreground objects are, so the hand is unoccluded and simple to segment from the image. For each pose visited, a node stores the joint angles that produced it and an image of the arm in this configuration. Edges connect each pair of nodes that have a feasible motion between them. Later in the learning process, the agent may use learned criteria to temporarily remove a node or edge from consideration if motion to it or along it is expected to cause a collision, given the current position of a foreground object being treated as an obstacle. The PPS Graph provides a mapping between configuration space and image space, and the agent learns to use it as a powerful tool for planning manipulation actions.
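The graph structure described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the names `PPSGraph`, `Node`, `arm_pixels`, `blocked_nodes`, and `plan` are hypothetical, the stored image is reduced to a set of pixel coordinates covered by the arm, and the learned collision criteria are replaced by a simple pixel-overlap test. Only temporary node removal is shown; edge removal would follow the same pattern.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    joint_angles: tuple    # configuration that produced this pose
    arm_pixels: frozenset  # stand-in for the stored image: pixels covered by the arm


@dataclass
class PPSGraph:
    nodes: dict = field(default_factory=dict)  # node id -> Node
    edges: dict = field(default_factory=dict)  # node id -> set of neighbor ids

    def add_node(self, node_id, joint_angles, arm_pixels):
        self.nodes[node_id] = Node(tuple(joint_angles), frozenset(arm_pixels))
        self.edges.setdefault(node_id, set())

    def add_edge(self, a, b):
        # Undirected edge: a feasible motion exists between the two poses.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def blocked_nodes(self, obstacle_pixels):
        # Assumed collision criterion: the arm image overlaps the obstacle.
        obstacle = set(obstacle_pixels)
        return {n for n, node in self.nodes.items() if node.arm_pixels & obstacle}

    def plan(self, start, goal, obstacle_pixels=()):
        # Breadth-first search that skips temporarily removed nodes.
        blocked = self.blocked_nodes(obstacle_pixels)
        if start in blocked or goal in blocked:
            return None
        parent = {start: None}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if current == goal:
                path = []
                while current is not None:
                    path.append(current)
                    current = parent[current]
                return path[::-1]
            for nxt in sorted(self.edges[current]):
                if nxt not in blocked and nxt not in parent:
                    parent[nxt] = current
                    queue.append(nxt)
        return None  # no collision-free route in the current graph
```

In this sketch, a call such as `graph.plan(start, goal, obstacle_pixels)` returns a list of node ids to visit in order, or `None` when every route passes through a node whose arm pixels overlap the obstacle. The underlying graph is never modified, so the removal is temporary in the sense described above.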

Initially, the only known actions are moves to selected PPS Graph nodes. The agent begins learning by executing move actions, and will continue to learn by applying the same learning method with other actions once it has defined them. The agent selects a random node as the target, and observes the typical results of moving to it. The action is performed in the presence of at least one nonself foreground object, and the results of each action trial can be described in terms of the object's qualitative state and any changes it undergoes. Clustering the results of all trials may identify one large cluster of typical results and perhaps some small clusters corresponding to autonomously observed unusual events. If there is at least one such small cluster, the agent defines a new action with the goal of achieving the same type of unusual result. Once a new action is defined, the agent learns features that help achieve the goal more reliably. This learning phase is the focus of this work. This phase resembles early action learning in human infant development, where a relatively small set of examples provides the necessary experience to learn to make the action reliable, though perhaps awkward and jerky in execution. These results prepare for a second learning phase to be carried out in future work, which corresponds to late action learning in humans, where actions become efficient with smooth trajectories. At the conclusion of this work, the move, reach, ungrasp, and place actions are fully reliable, and the grasp and pick-and-place actions are semi-reliable.
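The step of separating typical from unusual results can be sketched as follows. This is an illustration under strong assumptions, not the clustering method used in this work: each trial is assumed to be already summarized as a single qualitative label, clustering is reduced to grouping identical labels, and the function name `find_unusual_outcomes`, the `min_count` threshold, and the example labels are all hypothetical.

```python
from collections import Counter


def find_unusual_outcomes(trial_outcomes, min_count=2):
    """Split qualitative trial outcomes into the typical one and unusual ones.

    The largest group is treated as the typical result. Any other group with
    at least `min_count` members is reported as a candidate unusual event;
    smaller groups are ignored as possible noise (an assumed heuristic).
    """
    if not trial_outcomes:
        return None, []
    counts = Counter(trial_outcomes)
    typical, _ = counts.most_common(1)[0]
    unusual = sorted(
        outcome
        for outcome, count in counts.items()
        if outcome != typical and count >= min_count
    )
    return typical, unusual


# Hypothetical outcome labels for a batch of random move trials.
outcomes = ["no_change"] * 20 + ["object_moved"] * 3 + ["object_lost"]
typical, unusual = find_unusual_outcomes(outcomes)
```

Here `typical` is the label of the large cluster, and each entry in `unusual` would become the goal of a newly defined action whose reliability the agent then tries to improve.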