-
Test-Time Canonicalization by Foundation Models for Robust Perception
-
Utkarsh Singhal and Ryan Feng and Stella X. Yu and Atul Prakash
-
International Conference on Machine Learning, Vancouver, Canada, 13-19 July 2025
-
Paper | Slides | Code | arXiv
-
Abstract
-
Real-world visual perception requires invariance to diverse transformations, yet current methods rely heavily on specialized architectures or training on predefined augmentations, limiting generalization. We propose FoCal, a test-time, data-driven framework that achieves robust perception by leveraging internet-scale visual priors from foundation models. By generating and optimizing candidate transformations toward visually typical, "canonical" views, FoCal enhances robustness without retraining or architectural changes. Experiments demonstrate improved robustness of CLIP and SAM across challenging transformations, including 2D/3D rotations, illumination shifts (contrast and color), and day-night variations. We also highlight potential applications in active vision. Our approach challenges the assumption that transform-specific training is necessary, instead offering a scalable path to invariance. Our code is available at: https://github.com/sutkarsh/focal
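-
The core idea in the abstract, selecting among candidate transformations the one a visual prior scores as most "typical", can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `canonicalize` and `toy_score` are hypothetical names, and the toy score (preferring image mass near the bottom, a crude "gravity" prior) stands in for the foundation-model priors (e.g., CLIP likelihoods) that FoCal actually uses.

```python
import numpy as np

def canonicalize(image, score_fn, angles=(0, 90, 180, 270)):
    """Test-time canonicalization sketch: generate candidate
    transforms (here, 90-degree rotations), score each with a
    prior, and return the highest-scoring 'canonical' view."""
    candidates = [np.rot90(image, k=a // 90) for a in angles]
    scores = [score_fn(c) for c in candidates]
    best = int(np.argmax(scores))
    return candidates[best], angles[best]

def toy_score(img):
    """Stand-in prior: rewards mass concentrated toward the
    bottom row (FoCal would query a foundation model instead)."""
    row_mass = img.sum(axis=1)
    return float((row_mass * np.arange(len(row_mass))).sum())

# An upside-down toy image: all mass in the top row.
img = np.zeros((4, 4))
img[0, :] = 1.0
canon, angle = canonicalize(img, toy_score)
# The 180-degree rotation moves the mass to the bottom,
# so it is selected as the canonical view.
```

A downstream model (CLIP, SAM) would then be applied to `canon` rather than the raw input, gaining robustness to the transformation without any retraining.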
-
Keywords
-
canonicalization, test-time optimization, robust perception