-
Wholly Unsupervised! Segmenting Objects by Contrast and Context
-
Fei Pan and Yixing Wang and Sangryul Jeon and Stella X. Yu
-
Neural Information Processing Systems Workshop on Space in Vision, Language, and Embodied AI, San Diego, California, 7 December 2025
-
Paper
|
Poster
-
Abstract
-
We study \emph{unsupervised whole object segmentation}: identifying complete objects, including both distinctive and less salient parts, rather than only visually prominent fragments. Existing unsupervised methods often focus on salient regions (e.g., the \emph{head} but not the \emph{torso}), which leads to incomplete object masks. Our insight is that whole objects emerge from the interplay of \emph{part-level similarity} and \emph{contrastive context}, both \emph{within} and \emph{across} images. This interplay enables the grouping of heterogeneous regions into coherent object segments without any supervision or predefined templates. We propose \emph{Contrastive Contextual Grouping} (CCG), a three-step algorithm: {\bf 1)} identify semantically similar yet visually diverse image pairs; {\bf 2)} perform co-segmentation via joint graph cuts with contrastive part-context affinity; and {\bf 3)} distill the results into a single-image segmentation model. CCG achieves state-of-the-art results across \emph{unsupervised saliency detection, object discovery, video object segmentation}, and \emph{nuclei segmentation}. Remarkably, it could even \emph{surpass} SAM2, a supervised foundation model, at segmenting whole objects from box prompts.
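To make the idea of a joint graph cut in step 2 more concrete, the following is a minimal sketch of a normalized-cut style spectral bipartition over patches pooled from two images. It is not the CCG implementation: the abstract does not specify the contrastive part-context affinity, so a plain cosine-similarity affinity stands in for it here, and the function names (`joint_affinity`, `spectral_bipartition`), the temperature `tau`, and the toy 2-D "patch features" are all hypothetical choices made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def joint_affinity(feats, tau=0.2):
    """Dense affinity over patches from BOTH images (assumed form):
    W[i][j] = exp((cos(f_i, f_j) - 1) / tau), which lies in (0, 1]."""
    n = len(feats)
    return [[math.exp((cosine(feats[i], feats[j]) - 1.0) / tau)
             for j in range(n)] for i in range(n)]

def spectral_bipartition(W, iters=500):
    """Two-way normalized-cut relaxation via power iteration.

    Finds the second eigenvector of D^-1/2 W D^-1/2 and splits by sign.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    s = [math.sqrt(d) for d in deg]
    # Shifted operator M = (I + D^-1/2 W D^-1/2) / 2 has eigenvalues in [0, 1],
    # so power iteration converges to the largest remaining eigenvalue.
    M = [[0.5 * ((1.0 if i == j else 0.0) + W[i][j] / (s[i] * s[j]))
          for j in range(n)] for i in range(n)]
    # The trivial top eigenvector is proportional to sqrt(degree).
    s_norm = math.sqrt(sum(x * x for x in s))
    top = [x / s_norm for x in s]
    v = [math.sin(i + 1.0) for i in range(n)]  # deterministic start vector
    for _ in range(iters):
        # Deflate the trivial eigenvector, then apply M and renormalize.
        proj = sum(a * b for a, b in zip(v, top))
        v = [a - proj * b for a, b in zip(v, top)]
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        v_norm = math.sqrt(sum(a * a for a in v))
        v = [a / v_norm for a in v]
    # Map back to the generalized eigenvector y = D^-1/2 v and threshold at 0.
    return [v[i] / s[i] > 0 for i in range(n)]

# Toy "patch features": indices 0-3 come from image A, 4-7 from image B.
# Patches 0, 1, 4, 5 mimic object parts; 2, 3, 6, 7 mimic background.
feats = [
    (1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9),   # image A
    (1.0, 0.0), (0.8, 0.1), (0.0, 1.0), (0.1, 0.8),   # image B
]
labels = spectral_bipartition(joint_affinity(feats))
```

Because the graph spans both images, object-like patches from image A and image B fall on the same side of the cut. Which side is labeled `True` is arbitrary, since the sign of an eigenvector is not determined.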
-
Keywords
-
unsupervised segmentation, representation learning, co-segmentation