Grace Tsai. 2014.
On-line, incremental visual scene understanding
for an indoor navigating robot.
Doctoral dissertation,
Department of Electrical Engineering and Computer Science,
University of Michigan.
An indoor navigating robot must perceive its local environment in order to act. The robot must construct a model that captures critical navigation information from the stream of visual data that it acquires while traveling within the environment. Visual processing must be done on-line and efficiently to keep up with the robot's needs. This thesis contributes both representations and algorithms toward solving the problem of modeling the local environment for an indoor navigating robot. Two representations, the Planar Semantic Model (PSM) and the Action Opportunity Star (AOS), are proposed to capture important navigation information about the local indoor environment. The PSM models the geometric structure of the indoor environment in terms of a ground plane and walls, and captures rich relationships among the wall segments. The AOS is an abstracted representation that reasons about the navigation opportunities at a given pose. Both representations can capture incomplete knowledge: representations of unknown regions can be built incrementally as observations become available. An on-line generate-and-test framework is presented to construct the PSM from a stream of visual data. The framework includes two key elements: an incremental process for generating structural hypotheses and an on-line hypothesis testing mechanism using a Bayesian filter. Our framework is evaluated in three phases. First, we evaluate the effectiveness of the on-line hypothesis testing mechanism with an initially generated set of hypotheses in simple empty environments. We demonstrate that our method outperforms state-of-the-art methods on geometric reasoning in both accuracy and applicability to a navigating robot. Second, we evaluate the incremental hypothesis generation process and demonstrate the expressive power of our proposed representations. In this phase, we also demonstrate an attention focusing method to efficiently discriminate among the active hypothesized models. Finally, we demonstrate a general metric for testing the hypotheses with partial explanations in cluttered environments.
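
The on-line hypothesis testing step described above can be illustrated with a generic discrete Bayesian filter over a finite set of competing hypotheses. This is a minimal sketch only, not the thesis's implementation: the function names `bayes_update` and `run_filter`, the three hypotheses, and the per-frame likelihood values are all assumptions made for illustration, and the abstract does not specify how the likelihoods are actually computed from the visual data.

```python
def bayes_update(prior, likelihoods):
    """One step of a discrete Bayesian filter.

    prior       -- current probability of each hypothesis
    likelihoods -- likelihood of the newest observation under each hypothesis
    """
    if len(prior) != len(likelihoods):
        raise ValueError("prior and likelihoods must have the same length")
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    if total == 0:
        # The observation is impossible under every hypothesis, so it
        # carries no usable information; keep the prior unchanged.
        return list(prior)
    return [u / total for u in unnormalized]


def run_filter(prior, likelihood_stream):
    """Fold a stream of per-frame likelihood vectors into a posterior."""
    posterior = list(prior)
    for likelihoods in likelihood_stream:
        posterior = bayes_update(posterior, likelihoods)
    return posterior


# Hypothetical example: three structural hypotheses, uniform prior,
# and made-up likelihoods for three consecutive frames.
prior = [1 / 3, 1 / 3, 1 / 3]
frames = [
    [0.2, 0.6, 0.2],
    [0.3, 0.5, 0.2],
    [0.1, 0.7, 0.2],
]
posterior = run_filter(prior, frames)
best = max(range(len(posterior)), key=lambda i: posterior[i])
```

In this sketch the posterior concentrates on the second hypothesis because it explains every frame best. Newly generated hypotheses could be appended with some initial probability and the vector renormalized, though how the thesis handles that step is not stated in the abstract.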