Method Overview:

Given an arbitrary reference image, GenStereo generates the corresponding right-view image by enforcing constraints at three levels: input (disparity-aware coordinate and warped-image embeddings), feature (cross-view attention), and output (pixel-level loss with adaptive fusion). These constraints yield geometrically consistent and visually compelling stereo images. Our methods demonstrate state-of-the-art performance in both stereo image generation and unsupervised stereo matching.