Paper·arxiv.org
machine-learning · content-creation · research · embeddings · evaluation
On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers
Tackle 'typicality bias' in Text-to-Image (T2I) models that produce similar outputs for the same prompt. Implement 'On-the-fly Repulsion' within Diffusion Transformers' contextual space to promote rich visual diversity and unlock more creative AI applications.
advanced · 1 day · 5 steps
The play
- Identify Typicality Bias in T2I Outputs: Analyze your current Text-to-Image model's outputs for a given prompt. Observe whether it consistently produces visually similar results, indicating a 'typicality bias' that limits creative range.
- Understand On-the-fly Repulsion Principle: Grasp the core concept of introducing a repulsion mechanism *during* the generation process. This mechanism aims to push model outputs away from each other in the latent space, actively promoting diversity.
- Pinpoint Contextual Space for Intervention: Determine the specific 'contextual space' within your Diffusion Transformer (e.g., latent embeddings, attention mechanisms, or feature maps) where a repulsive force can be effectively applied to influence output variety.
- Implement a Latent Repulsion Mechanism: Develop and integrate code that applies a repulsive force to latent representations or intermediate features during the diffusion process. This prevents the model from converging to overly typical or expected outputs.
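- Evaluate and Quantify Output Diversity: Generate multiple images with your modified model using the same prompt and different seeds. Apply quantitative diversity metrics (e.g., mean pairwise LPIPS or another perceptual distance between generated samples) to measure the change in visual variety against the unmodified baseline, and track a quality metric such as FID to check that the added diversity does not come at the cost of fidelity.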
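The repulsion idea in the steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's method: it treats each sample in a batch as a flattened vector and moves it along the negative gradient of a pairwise RBF kernel energy, so nearby samples push each other apart. The function name `repulsion_step` and the parameters `strength` and `bandwidth` are hypothetical, and NumPy is used only to keep the arithmetic visible; the same operations carry over to torch tensors.

```python
import numpy as np

def repulsion_step(latents, strength=0.1, bandwidth=1.0):
    """Push each sample in a batch away from the others (illustrative only).

    latents: array of shape (batch, dim), one flattened representation per sample.
    Returns an array of the same shape.
    """
    # Pairwise differences: diff[i, j] = latents[i] - latents[j]
    diff = latents[:, None, :] - latents[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # RBF kernel: close to 1 for nearby pairs, close to 0 for distant pairs
    kernel = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    # Negative gradient of the energy sum_{i<j} kernel[i, j] with respect to
    # latents[i]; it points away from nearby samples (the i == j term is zero).
    force = np.sum(kernel[:, :, None] * diff, axis=1) / bandwidth ** 2
    return latents + strength * force

# Hypothetical usage: four nearly identical samples drift apart after one step.
rng = np.random.default_rng(0)
batch = rng.normal(scale=0.1, size=(4, 8))
moved = repulsion_step(batch, strength=0.5)
```

Because the pairwise forces are antisymmetric, the batch centroid stays fixed and only the spread changes. In a diffusers pipeline, one plausible hook is the `callback_on_step_end` argument available in recent releases, which exposes the batch latents at each denoising step. Applying the step there acts on image latents and is only a stand-in for whichever contextual space the paper actually intervenes in.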
Starter code
from diffusers import DiffusionPipeline
import torch
# Load a pre-trained Stable Diffusion model.
# Note: Stable Diffusion v1.5 uses a U-Net denoiser, not a Diffusion Transformer; it serves here only as a baseline.
# The 'On-the-fly Repulsion' logic would be integrated into the model's forward pass,
# a custom scheduler, or a modified attention mechanism.
pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe.to("cuda")
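# Generate several images for the same prompt so that sample-to-sample similarity is visible
prompt = "A futuristic city at sunset, highly detailed, cyberpunk style"
images = pipe(prompt, num_images_per_prompt=4).images
# Save the baseline samples to inspect diversity before adding any repulsion
for i, image in enumerate(images):
    image.save(f"futuristic_city_initial_{i}.png")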
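To put a number on the diversity of a set of samples for one prompt, a simple option is the mean pairwise distance between per-image feature vectors. The sketch below assumes you already have one feature vector per image from some extractor of your choice (for example a perceptual or CLIP-style encoder, which is not shown here). The function name `mean_pairwise_cosine_distance` is hypothetical, and cosine distance is a stand-in for LPIPS rather than an equivalent.

```python
import numpy as np

def mean_pairwise_cosine_distance(features):
    """Average cosine distance over all distinct pairs of rows.

    features: array-like of shape (n_samples, dim) with n_samples >= 2
    and no all-zero rows.
    """
    features = np.asarray(features, dtype=np.float64)
    # Normalize rows so the dot product equals cosine similarity
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = unit @ unit.T
    # Drop the diagonal (each sample compared with itself)
    n = len(unit)
    off_diag = sim[~np.eye(n, dtype=bool)]
    return float(1.0 - off_diag.mean())

# Hypothetical usage: compare features of baseline and modified samples.
baseline_features = np.array([[1.0, 0.1], [1.0, 0.2], [1.0, 0.15]])
modified_features = np.array([[1.0, 0.1], [0.2, 1.0], [-0.5, 0.5]])
baseline_score = mean_pairwise_cosine_distance(baseline_features)
modified_score = mean_pairwise_cosine_distance(modified_features)
```

A higher score for the modified model than for the baseline, on the same prompt and the same number of samples, suggests more varied outputs. It says nothing about image quality, so it is best read alongside a fidelity metric.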