Integrating Geometric Understanding in Generative Diffusion Models With Text Instructions
Abstract:
Image generative models have achieved remarkable success in image synthesis and editing tasks. However, their ability to learn geometric transformations remains limited: they often lack intrinsic spatial awareness and struggle to generalize transformations effectively. This thesis introduces a novel framework that addresses these challenges by enabling generative models to explicitly learn geometric transformations, such as rotations, guided by text instructions. The framework is designed for transformation-aware image editing and predicts transformation parameters directly from learned latent representations. This approach incorporates geometric reasoning into the generative process and serves as an initial exploration toward generalizing to more complex geometric transformations. It opens the door to advanced applications of generative image models in fields such as medical imaging, robotics, and augmented reality, where understanding object geometry is critical.
Committee:
- Prof. Tom Fletcher, Committee Chair (ECE, CS/SEAS/UVA)
- Prof. Miaomiao Zhang, Advisor (ECE, CS/SEAS/UVA)
- Prof. Zezhou Cheng (CS/SEAS/UVA)