Stereo images are fundamental to numerous applications, including extended reality (XR) devices,
autonomous driving, and robotics. Unfortunately, acquiring high-quality stereo images remains challenging
due to the precise calibration requirements of dual-camera setups and the complexity of obtaining
accurate, dense disparity maps. Existing stereo image generation methods typically focus on either visual
quality for viewing or geometric accuracy for matching, but not both. We introduce GenStereo, a
diffusion-based approach that bridges this gap. The method rests on two primary innovations: (1) conditioning
the diffusion process on a disparity-aware coordinate embedding and a warped input image, enabling
more precise stereo alignment than previous methods, and (2) an adaptive fusion mechanism that
learns to combine the diffusion-generated image with the warped image, improving both realism and
disparity consistency. Trained on 11 diverse stereo datasets, GenStereo generalizes strongly and
achieves state-of-the-art performance in both stereo image generation and unsupervised stereo
matching. Our framework eliminates the need for complex hardware
setups while enabling high-quality stereo image generation, making it valuable for both real-world
applications and unsupervised learning scenarios. The code will be made publicly available upon
acceptance.
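
To make the two conditioning ideas concrete, the following is a minimal, illustrative sketch (not the authors' code): warping the left image to the right view with a disparity map, building a disparity-aware coordinate embedding, and fusing the diffusion output with the warped image via a predicted per-pixel mask. The tensor shapes, the embedding construction, and the small fusion network are assumptions made for illustration only.

```python
# Hypothetical sketch of disparity-aware warping, coordinate embedding, and
# adaptive fusion, as described in the abstract. Not the GenStereo implementation.
import torch
import torch.nn.functional as F

def warp_left_to_right(left: torch.Tensor, disparity: torch.Tensor) -> torch.Tensor:
    """Backward-warp a left image (B,C,H,W) to the right view using a
    right-view disparity map (B,1,H,W) in pixels (x_left = x_right + d)."""
    b, _, h, w = left.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=left.dtype),
        torch.arange(w, dtype=left.dtype),
        indexing="ij",
    )
    xs = xs.expand(b, h, w) + disparity.squeeze(1)  # sample left at x + d
    ys = ys.expand(b, h, w)
    # Normalize sampling coordinates to [-1, 1] for grid_sample.
    grid = torch.stack((2 * xs / (w - 1) - 1, 2 * ys / (h - 1) - 1), dim=-1)
    return F.grid_sample(left, grid, align_corners=True)

def coord_embedding(disparity: torch.Tensor) -> torch.Tensor:
    """One plausible disparity-aware coordinate embedding: normalized (x, y)
    grids concatenated with the disparity channel, shape (B,3,H,W)."""
    b, _, h, w = disparity.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
    )
    coords = torch.stack((xs, ys)).expand(b, 2, h, w)
    return torch.cat((coords, disparity), dim=1)

# Adaptive fusion: blend the diffusion output with the warped image using a
# per-pixel mask predicted from both (a stand-in for the learned mechanism).
fuse = torch.nn.Sequential(torch.nn.Conv2d(6, 1, 3, padding=1), torch.nn.Sigmoid())

left = torch.rand(1, 3, 64, 128)
disp = torch.full((1, 1, 64, 128), 4.0)        # toy constant disparity
warped = warp_left_to_right(left, disp)
emb = coord_embedding(disp)                    # would condition the diffusion model
generated = torch.rand(1, 3, 64, 128)          # stands in for the diffusion output
mask = fuse(torch.cat((generated, warped), 1))
right = mask * generated + (1 - mask) * warped
print(right.shape)  # torch.Size([1, 3, 64, 128])
```

In this sketch the warped image anchors the geometry while the mask lets the generated image fill occlusions and warping artifacts, which is the intuition behind combining the two sources.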