Towards Open-World Generation of Stereo Images and Unsupervised Matching

Washington University in St. Louis

Abstract

Stereo images are fundamental to numerous applications, including extended reality (XR) devices, autonomous driving, and robotics. Unfortunately, acquiring high-quality stereo images remains challenging due to the precise calibration requirements of dual-camera setups and the complexity of obtaining accurate, dense disparity maps. Existing stereo image generation methods typically focus on either visual quality for viewing or geometric accuracy for matching, but not both. We introduce GenStereo, a diffusion-based approach, to bridge this gap. The method includes two primary innovations: (1) conditioning the diffusion process on a disparity-aware coordinate embedding and a warped input image, allowing for more precise stereo alignment than previous methods, and (2) an adaptive fusion mechanism that combines the diffusion-generated image with a warped image, improving both realism and disparity consistency. Through extensive training on 11 diverse stereo datasets, GenStereo demonstrates strong generalization ability and achieves state-of-the-art performance in both stereo image generation and unsupervised stereo matching tasks. Our framework eliminates the need for complex hardware setups while enabling high-quality stereo image generation, making it valuable for both real-world applications and unsupervised learning scenarios.

ADE20K Dataset

COCO Dataset

Method Overview:

Given an arbitrary reference image, GenStereo generates the corresponding right-view image by enforcing constraints at three levels: input (disparity-aware coordinate and warped-image embeddings), feature (cross-view attention), and output (pixel-level loss with adaptive fusion). These constraints yield geometrically consistent and visually compelling stereo images. Our method demonstrates state-of-the-art performance in both stereo image generation and unsupervised stereo matching.
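To make the warping and fusion ideas concrete, here is a minimal NumPy sketch. It is not the GenStereo implementation. It assumes a rectified pair in which a left-view pixel at column x appears at column x - d in the right view, and it treats the fusion weight as a given input, whereas the method describes the fusion as adaptive. The function names `forward_warp` and `fuse` and all parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def forward_warp(left, disparity):
    """Forward-warp a left image (H, W, C) to the right view.

    Assumes rectified stereo: a left pixel at column x lands at column
    x - disparity in the right view. Returns the warped image and a boolean
    mask marking right-view pixels that received a value.
    """
    h, w = disparity.shape
    warped = np.zeros_like(left)
    mask = np.zeros((h, w), dtype=bool)
    # Largest disparity written so far at each target pixel, so that nearer
    # surfaces (larger disparity) overwrite farther ones.
    best = np.full((h, w), -np.inf)
    xs = np.arange(w)
    for y in range(h):
        targets = np.rint(xs - disparity[y]).astype(int)
        for x in range(w):
            t = targets[x]
            if 0 <= t < w and disparity[y, x] > best[y, t]:
                best[y, t] = disparity[y, x]
                warped[y, t] = left[y, x]
                mask[y, t] = True
    return warped, mask


def fuse(warped, generated, mask, weight):
    """Blend a warped image with a generated image.

    `weight` (scalar or H x W array in [0, 1]) is a placeholder for a fusion
    weight; where the warp left a hole (mask is False), the generated image
    is used unchanged.
    """
    alpha = (weight * mask)[..., None]
    return alpha * warped + (1.0 - alpha) * generated
```

In this sketch the warped image supplies pixels wherever the left view projects into the right view, and the generated image fills the disoccluded holes. A per-pixel `weight` below 1 would let the generated image also contribute in regions where the warp is valid.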

BibTeX

@inproceedings{qiao2025genstereo,
  author        = {Qiao, Feng and Xiong, Zhexiao and Xing, Eric and Jacobs, Nathan},
  title         = {Towards Open-World Generation of Stereo Images and Unsupervised Matching},
  booktitle     = {Proceedings of the {IEEE/CVF} International Conference on Computer Vision ({ICCV})},
  year          = {2025},
  eprint        = {2503.12720},
  archiveprefix = {arXiv},
  primaryclass  = {cs.CV}
}

Please cite our work if you find it useful.

ICCV 2025

DIW Dataset

DIODE Dataset

Mapillary Dataset

KITTI 2015 Dataset - Scale Factors: 5, 10, 15, 20, 25
