On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers

Tackle 'typicality bias' in Text-to-Image (T2I) models that produce similar outputs for the same prompt. Implement 'On-the-fly Repulsion' within Diffusion Transformers' contextual space to promote rich visual diversity and unlock more creative AI applications.

advanced · 1 day · 5 steps

The play

Identify Typicality Bias in T2I Outputs
Analyze your current Text-to-Image model's outputs for a given prompt. Observe if it consistently produces visually similar results, indicating a 'typicality bias' that limits creative range.
Understand On-the-fly Repulsion Principle
Grasp the core concept of introducing a repulsion mechanism *during* the generation process. This mechanism aims to push model outputs away from each other in the latent space, actively promoting diversity.
Pinpoint Contextual Space for Intervention
Determine the specific 'contextual space' within your Diffusion Transformer (e.g., latent embeddings, attention mechanisms, or feature maps) where a repulsive force can be effectively applied to influence output variety.
Implement a Latent Repulsion Mechanism
Develop and integrate code that applies a repulsive force to latent representations or intermediate features during the diffusion process. The aim is to keep samples generated for the same prompt from collapsing onto one typical output.
Evaluate and Quantify Output Diversity
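Generate multiple images with your modified model using the same prompt. Measure diversity with pairwise metrics such as mean LPIPS or another perceptual distance between the generated samples, and compare against the unmodified model to confirm the improvement in visual variety. FID compares a set of generated images against a reference set, so it is better suited to checking that image quality has not degraded than to measuring per-prompt diversity.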
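
The evaluation step can be sketched as a mean pairwise distance over per-image feature vectors. This is a minimal illustration, not a metric prescribed by the source: the function name mean_pairwise_distance is hypothetical, cosine distance stands in for a perceptual distance such as LPIPS, and the feature vectors are assumed to come from an image encoder of your choice.

```python
import torch


def mean_pairwise_distance(features: torch.Tensor) -> float:
    """Mean pairwise cosine distance over a batch of feature vectors.

    features: (N, D) tensor with one row per generated image. Which encoder
    produces the rows is an assumption left to the caller.
    """
    n = features.shape[0]
    if n < 2:
        raise ValueError("need at least two samples to measure diversity")
    unit = torch.nn.functional.normalize(features.float(), dim=1)
    cosine = unit @ unit.T  # (N, N) cosine similarities
    # Mask out the diagonal so self-similarity does not count.
    off_diag = ~torch.eye(n, dtype=torch.bool, device=features.device)
    return (1.0 - cosine[off_diag]).mean().item()


# Toy check with stand-in features (not real image embeddings).
collapsed = torch.ones(4, 8)  # four identical rows: distance near 0
spread = torch.eye(4, 8)      # four mutually orthogonal rows: distance near 1
print(mean_pairwise_distance(collapsed))
print(mean_pairwise_distance(spread))
```

A higher value for the modified model than for the baseline, on the same prompt and the same number of samples, would suggest that the repulsion is increasing variety.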

Starter code

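from diffusers import DiffusionPipeline
import torch

# Load a pre-trained Stable Diffusion pipeline as a baseline.
# Note: Stable Diffusion v1.5 uses a U-Net denoiser, not a Diffusion Transformer;
# it serves here only as a starting point. 'On-the-fly Repulsion' logic would be
# integrated into the model's forward pass, a custom scheduler, or a modified
# attention mechanism.
pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe.to("cuda")

# Generate several images for the same prompt; a single image cannot show diversity
prompt = "A futuristic city at sunset, highly detailed, cyberpunk style"
generator = torch.Generator("cuda").manual_seed(0)  # fixed seed for a reproducible baseline
images = pipe(prompt, num_images_per_prompt=4, generator=generator).images

# Save the images to observe initial diversity
for i, image in enumerate(images):
    image.save(f"futuristic_city_initial_{i}.png")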
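
One way to prototype a repulsion step on top of the starter code is a step-end callback that nudges the samples in a batch apart after each denoising step. The sketch below is a stand-in and not the paper's method: it applies an RBF-kernel repulsive force to the batch of latents rather than to a Diffusion Transformer's contextual space, and the names repel_latents and repulsion_callback, as well as the strength value, are hypothetical.

```python
import torch


def repel_latents(latents: torch.Tensor, strength: float = 0.05) -> torch.Tensor:
    """Push each sample in the batch away from the others.

    Uses an RBF kernel over flattened latents, so nearby samples repel more
    strongly than distant ones. `strength` is a hypothetical step size.
    """
    n = latents.shape[0]
    if n < 2:
        return latents  # nothing to repel from
    flat = latents.reshape(n, -1).float()
    diff = flat[:, None, :] - flat[None, :, :]  # (N, N, D): x_i - x_j
    sq_dist = diff.pow(2).sum(-1)               # (N, N) squared distances
    # Bandwidth: mean squared distance over distinct pairs (assumed heuristic).
    bandwidth = sq_dist.sum() / (n * (n - 1)) + 1e-8
    kernel = torch.exp(-sq_dist / bandwidth)    # (N, N) similarity weights
    # Sum of kernel-weighted directions away from every other sample.
    force = (kernel[..., None] * diff).sum(dim=1) / (n - 1)
    repelled = flat + strength * force
    return repelled.reshape(latents.shape).to(latents.dtype)


def repulsion_callback(pipe, step, timestep, callback_kwargs):
    """Step-end callback in the diffusers style; replaces the batch latents."""
    callback_kwargs["latents"] = repel_latents(callback_kwargs["latents"])
    return callback_kwargs


# Toy check on a nearly collapsed batch (random stand-in, not real latents).
torch.manual_seed(0)
toy = torch.randn(4, 4, 8, 8) * 0.1 + 1.0
out = repulsion_callback(None, 0, 0, {"latents": toy})["latents"]
before = torch.pdist(toy.reshape(4, -1)).mean()
after = torch.pdist(out.reshape(4, -1)).mean()
print(f"mean pairwise distance: {before.item():.4f} -> {after.item():.4f}")
```

With the pipeline from the starter code, the callback could be attached as pipe(prompt, num_images_per_prompt=4, callback_on_step_end=repulsion_callback, callback_on_step_end_tensor_inputs=["latents"]). Repulsion only has an effect when several samples are generated in the same batch, and the strength likely needs tuning, since too large a value may degrade image quality.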

Source

Paper: arxiv.org