Driving Scene Retrieval by Example from Large-Scale Data

Sascha Hornhauer and Baladitya Yellapragada and Arian Ranjbar and Stella X. Yu

IEEE Conference on Computer Vision and Pattern Recognition Workshop: Vision Meets Cognition, Long Beach, California, 16 June 2019

Abstract

Many machine learning approaches train networks with input from large datasets to reach high task performance. Collected datasets, such as Berkeley Deep Drive Video (BDD-V) for autonomous driving, contain a large variety of scenes and hence features. However, depending on the task, subsets that contain certain features more densely support training better than others. For example, training networks on tasks such as image segmentation, bounding box detection or tracking requires an ample number of objects in the input data. When training a network to perform optical flow estimation from first-person video, a disproportionately large share of straight driving scenes in the training data may lower generalization to turns. Even though some scenes of the BDD-V dataset are labeled with scene, weather or time-of-day information, these labels may be too coarse to filter the dataset well for a particular training task. Furthermore, even defining an exhaustive list of good label types is complicated, as it requires choosing the concepts of the natural world most relevant to a task. As an alternative, we investigate how to use examples of desired data to retrieve more similar data from a large-scale dataset. Following the paradigm of "I know it when I see it", we present a deep learning approach that uses driving examples to retrieve similar scenes from the BDD-V dataset. Our method leverages only automatically collected labels. We show that we can reliably vary the time of day or the objects in our query examples and retrieve nearest neighbors from the dataset. Using this method, already collected data can be filtered to remove bias from a dataset by removing scenes regarded as too redundant to train on.
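
To make the retrieval-by-example idea concrete, the following is a minimal sketch of nearest-neighbor lookup in a learned feature space. It is not the method from the paper: the abstract does not specify the network, the embedding dimension, or the distance measure, so the use of cosine similarity, the averaging of several query examples, and the names `normalize` and `retrieve_similar` are all assumptions made for illustration. The embeddings are assumed to have been computed beforehand by some scene encoder.

```python
import numpy as np


def normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def retrieve_similar(query_embeddings, dataset_embeddings, k=5):
    """Return indices and similarities of the k dataset scenes closest to the queries.

    Assumption (not from the paper): several query examples are combined by
    averaging their unit-length embeddings, and similarity is cosine similarity.
    """
    # Combine all query examples into a single unit-length query vector.
    q = normalize(np.atleast_2d(query_embeddings)).mean(axis=0, keepdims=True)
    q = normalize(q)

    # Cosine similarity between the query and every scene in the dataset.
    d = normalize(dataset_embeddings)
    sims = (d @ q.T).ravel()

    # Indices of the k most similar scenes, best match first.
    k = min(k, sims.shape[0])
    top = np.argsort(-sims)[:k]
    return top, sims[top]


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for encoder outputs.
    dataset = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
    query = np.array([[1.0, 0.0]])
    indices, similarities = retrieve_similar(query, dataset, k=2)
    print(indices, similarities)
```

Under the same assumptions, the similarity scores could also support the filtering use case mentioned above: scenes whose similarity to already selected scenes exceeds a chosen threshold could be treated as redundant and dropped.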

Keywords

scene retrieval, feature learning