-
Test-Time Canonicalization by Foundation Models for Robust Perception
-
Utkarsh Singhal and Ryan Feng and Stella X. Yu and Atul Prakash
-
International Conference on Machine Learning, Vancouver, Canada, 13-19 July 2025
-
Paper | Slides | Code | arXiv
-
Abstract
-
Real-world visual perception requires invariance to diverse transformations, yet current methods rely heavily on specialized architectures or training on predefined augmentations, limiting generalization. We propose FoCal, a test-time, data-driven framework that achieves robust perception by leveraging internet-scale visual priors from foundation models. By generating and optimizing candidate transformations toward visually typical, "canonical" views, FoCal enhances robustness without retraining or architectural changes. Experiments demonstrate improved robustness of CLIP and SAM across challenging transformations, including 2D/3D rotations, illumination shifts (contrast and color), and day-night variations. We also highlight potential applications in active vision. Our approach challenges the assumption that transform-specific training is necessary, instead offering a scalable path to invariance. Our code is available at: https://github.com/sutkarsh/focal
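-
The core idea in the abstract, selecting among candidate transformations the one a visual prior scores as most "typical", can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `canonicalize` and `toy_score` are hypothetical names, and the toy score (preferring image mass near the bottom, a crude "gravity" prior) stands in for the foundation-model priors (e.g., CLIP likelihoods) that FoCal actually uses.

```python
import numpy as np

def canonicalize(image, score_fn, angles=(0, 90, 180, 270)):
    """Test-time canonicalization sketch: generate candidate
    transforms (here, 90-degree rotations), score each with a
    prior, and return the highest-scoring 'canonical' view."""
    candidates = [np.rot90(image, k=a // 90) for a in angles]
    scores = [score_fn(c) for c in candidates]
    best = int(np.argmax(scores))
    return candidates[best], angles[best]

def toy_score(img):
    """Stand-in prior: rewards mass concentrated toward the
    bottom row (FoCal would query a foundation model instead)."""
    row_mass = img.sum(axis=1)
    return float((row_mass * np.arange(len(row_mass))).sum())

# An upside-down toy image: all mass in the top row.
img = np.zeros((4, 4))
img[0, :] = 1.0
canon, angle = canonicalize(img, toy_score)
# The 180-degree rotation moves the mass to the bottom,
# so it is selected as the canonical view.
```

A downstream model (CLIP, SAM) would then be applied to `canon` rather than the raw input, gaining robustness to the transformation without any retraining.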
-
Keywords
-
canonicalization, test-time optimization, robust perception