
Wholly Unsupervised! Segmenting Objects by Contrast and Context

Fei Pan and Yixing Wang and Sangryul Jeon and Stella X. Yu

Neural Information Processing Systems Workshop on Space in Vision, Language, and Embodied AI, San Diego, California, 7 December 2025

Paper | Poster

Abstract

We study unsupervised whole object segmentation: identifying complete objects, including both distinctive and less salient parts, rather than only visually prominent fragments. Existing unsupervised methods often focus on salient regions (e.g., the head but not the torso), leading to incomplete object masks. Our insight is that whole objects emerge from the interplay of part-level similarity and contrastive context, both within and across images. This enables the grouping of heterogeneous regions into coherent object segments without any supervision or predefined templates.
We propose Contrastive Contextual Grouping (CCG), a three-step algorithm: 1) identify semantically similar yet visually diverse image pairs; 2) perform co-segmentation via joint graph cuts with contrastive part-context affinity; and 3) distill the results into a single-image segmentation model. CCG achieves state-of-the-art results across unsupervised saliency detection, object discovery, video object segmentation, and nuclei segmentation. Remarkably, it could even surpass SAM2, a supervised foundation model, at segmenting whole objects from box prompts.
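
To give a rough feel for step 2, the sketch below runs a single normalized cut over the patches of two images at once, so that similar parts in both images fall on the same side of the cut. It is not the CCG implementation: the abstract does not specify the contrastive part-context affinity, so plain cosine similarity between patch features stands in for it, and the function name `joint_ncut`, the temperature `tau`, and the assumption that patch features are already extracted are all illustrative choices.

```python
import numpy as np

def joint_ncut(feats_a, feats_b, tau=0.2):
    """Bipartition the patches of two images with one normalized cut.

    feats_a, feats_b: (n_a, d) and (n_b, d) arrays of patch features
    (assumed precomputed; the feature extractor is outside this sketch).
    Returns two boolean arrays giving each patch's side of the cut.
    """
    # Stack both images' patches into one joint graph.
    feats = np.concatenate([feats_a, feats_b], axis=0)
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)

    # Stand-in affinity (assumption): cosine similarity mapped into (0, 1].
    sim = feats @ feats.T
    W = np.exp((sim - 1.0) / tau)

    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; the second eigenvector,
    # rescaled by D^{-1/2}, is the relaxed normalized-cut indicator.
    _, eigvecs = np.linalg.eigh(L_sym)
    indicator = d_inv_sqrt * eigvecs[:, 1]

    # Threshold at zero; which side is "object" is arbitrary (sign ambiguity).
    labels = indicator > 0
    n_a = len(feats_a)
    return labels[:n_a], labels[n_a:]
```

Because the cut is computed on the joint graph, a patch in one image is grouped with its counterparts in the other image even when its within-image neighbors look different. Deciding which side of the cut is the object, and distilling the masks into a single-image model (step 3), are left out of this sketch.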

Keywords

unsupervised segmentation, representation learning, co-segmentation