Grace Tsai, Changhai Xu, Jingen Liu and Benjamin Kuipers. 2011.
Real-time indoor scene understanding using Bayesian filtering with motion cues.
Int. Conf. on Computer Vision (ICCV), 2011.


We present a method whereby an embodied agent using visual perception can efficiently create a model of a local indoor environment from its experience of moving within it. Our method uses motion cues to compute likelihoods of indoor structure hypotheses, based on simple, generic geometric knowledge about points, lines, planes, and motion. We use single-image analysis not to identify a single accurate model, but to propose a set of plausible hypotheses about the structure of the environment from an initial frame. We then use data from subsequent frames to update a Bayesian posterior probability distribution over the set of hypotheses. The likelihood function is efficiently computable by comparing the predicted locations of point features on the environment model to their actual tracked locations in the image stream. Our method runs in real time and avoids the need for extensive prior training and for the Manhattan-world assumption, which makes it more practical and efficient for an intelligent robot to understand its surroundings than most previous scene understanding methods. Experimental results on a collection of indoor videos suggest that our method is capable of an unprecedented combination of accuracy and efficiency.
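
The posterior update over a fixed set of hypotheses can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the form of the likelihood, so the independent Gaussian pixel-error model, the `sigma` parameter, the function names `log_likelihood` and `update_posterior`, and the toy point coordinates are all assumptions made for illustration.

```python
import math

def log_likelihood(predicted, observed, sigma=2.0):
    """Log-likelihood (up to a constant) of tracked points under one hypothesis.

    Assumes independent isotropic Gaussian pixel error with standard
    deviation `sigma`; this error model is an illustrative assumption.
    The normalizing constant is dropped because it is the same for every
    hypothesis when all are scored on the same set of points.
    """
    total = 0.0
    for (px, py), (ox, oy) in zip(predicted, observed):
        sq_err = (px - ox) ** 2 + (py - oy) ** 2
        total += -sq_err / (2.0 * sigma ** 2)
    return total

def update_posterior(prior, log_likelihoods):
    """One Bayesian filtering step: posterior is proportional to prior times likelihood."""
    log_post = [
        math.log(p) + ll if p > 0 else float("-inf")
        for p, ll in zip(prior, log_likelihoods)
    ]
    m = max(log_post)
    if m == float("-inf"):
        raise ValueError("all hypotheses have zero prior probability")
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(lp - m) for lp in log_post]
    z = sum(weights)
    return [w / z for w in weights]

# Toy data (hypothetical): tracked feature locations in the current frame,
# and the locations each structure hypothesis predicts for the same features.
observed = [(100.0, 50.0), (200.0, 80.0)]
predictions = [
    [(101.0, 50.0), (199.0, 81.0)],  # hypothesis 0: small reprojection error
    [(120.0, 60.0), (180.0, 95.0)],  # hypothesis 1: large reprojection error
    [(100.0, 58.0), (205.0, 80.0)],  # hypothesis 2: moderate reprojection error
]

posterior = [1.0 / len(predictions)] * len(predictions)  # uniform prior
lls = [log_likelihood(p, observed) for p in predictions]
posterior = update_posterior(posterior, lls)
```

Calling `update_posterior` once per frame, with the previous posterior passed in as the new prior, gives the recursive filtering behavior the abstract describes: hypotheses whose predicted feature locations keep disagreeing with the tracked ones lose probability mass over time.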