Paper·arxiv.org
machine-learning · content-creation · research · embeddings · evaluation

On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers

Tackle 'typicality bias' in Text-to-Image (T2I) models that produce similar outputs for the same prompt. Implement 'On-the-fly Repulsion' within Diffusion Transformers' contextual space to promote rich visual diversity and unlock more creative AI applications.

advanced · 1 day · 5 steps
The play
  1. Identify Typicality Bias in T2I Outputs
    Analyze your current Text-to-Image model's outputs for a given prompt. Observe if it consistently produces visually similar results, indicating a 'typicality bias' that limits creative range.
  2. Understand On-the-fly Repulsion Principle
    Grasp the core concept of introducing a repulsion mechanism *during* the generation process. The mechanism pushes the samples generated for one prompt away from each other in the model's contextual space, so that they do not all collapse onto the same typical output.
  3. Pinpoint Contextual Space for Intervention
    Determine the specific 'contextual space' within your Diffusion Transformer (e.g., latent embeddings, attention mechanisms, or feature maps) where a repulsive force can be effectively applied to influence output variety.
  4. Implement a Latent Repulsion Mechanism
    Develop and integrate code that applies a repulsive force to latent representations or intermediate features during the diffusion process. This prevents the model from converging to overly typical or expected outputs.
  5. Evaluate and Quantify Output Diversity
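    Generate multiple images with your modified model using the same prompt and a fixed set of seeds, then repeat with the unmodified model. Measure variety with a pairwise metric computed between samples of the same prompt (e.g., mean pairwise LPIPS or another perceptual distance). FID compares the generated distribution against a reference image set, so use it to check that quality has not degraded rather than as a direct measure of per-prompt diversity.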
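Steps 2 to 4 can be illustrated with a minimal sketch of a batch-level repulsion step. This is not the paper's method, whose exact formulation is not described here. It is a generic kernel-based repulsion, written in NumPy for clarity. The function names `repel` and `pairwise_sq_sum`, the RBF kernel, the median bandwidth heuristic, and the `step` size are all assumptions made for illustration. The input is assumed to be the per-sample context token features of shape (batch, tokens, dim) taken from some transformer block.

```python
import numpy as np


def repel(contexts, step=0.1, bandwidth=None):
    """Push each sample's context features away from the other samples.

    contexts: array of shape (batch, tokens, dim), one entry per image
    generated for the same prompt (hypothetical layout).
    """
    b = contexts.shape[0]
    if b < 2:
        # Nothing to repel from with a single sample.
        return contexts.copy()

    flat = contexts.reshape(b, -1)
    diff = flat[:, None, :] - flat[None, :, :]  # (b, b, n): x_i - x_j
    sq = (diff ** 2).sum(axis=-1)               # squared pairwise distances

    if bandwidth is None:
        # Median heuristic over off-diagonal distances (an assumption).
        off_diag = sq[~np.eye(b, dtype=bool)]
        bandwidth = np.sqrt(np.median(off_diag)) + 1e-8

    # RBF kernel: nearby samples repel each other more strongly.
    k = np.exp(-sq / (2.0 * bandwidth ** 2))
    np.fill_diagonal(k, 0.0)

    # Negative gradient of the summed kernel energy, up to a constant factor.
    force = (k[:, :, None] * diff).sum(axis=1) / bandwidth ** 2
    return (flat + step * force).reshape(contexts.shape)


def pairwise_sq_sum(contexts):
    """Sum of squared pairwise distances, used here as a spread indicator."""
    flat = contexts.reshape(contexts.shape[0], -1)
    d = flat[:, None, :] - flat[None, :, :]
    return float((d ** 2).sum())


# Synthetic stand-in for context features of 4 samples, 8 tokens, 16 dims.
rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 8, 16))
pushed = repel(batch)
print(pairwise_sq_sum(batch), pairwise_sq_sum(pushed))
```

In a real pipeline the same update would presumably be applied to tensors inside the denoising loop (for example through a forward hook on a transformer block), and the step size would need tuning so that prompt fidelity is preserved. Because the forces cancel in pairs, this update leaves the batch mean unchanged and only spreads the samples around it.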
Starter code
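from diffusers import DiffusionPipeline
import torch

# Load a pre-trained Stable Diffusion v1.5 pipeline as a baseline.
# Note: this checkpoint uses a U-Net denoiser, not a Diffusion Transformer;
# swap in a DiT-based pipeline to experiment on transformer blocks.
# This is the base where you would integrate your 'On-the-fly Repulsion' logic
# within the model's forward pass, a custom scheduler, or a modified attention mechanism.
pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe.to("cuda")

# Generate several images for the same prompt so that typicality bias is observable;
# a single image cannot show how much the outputs vary.
prompt = "A futuristic city at sunset, highly detailed, cyberpunk style"
generator = torch.Generator(device="cuda").manual_seed(0)  # fixed seed for a repeatable baseline
images = pipe(prompt, num_images_per_prompt=4, generator=generator).images

# Save the baseline images to inspect initial diversity
for i, image in enumerate(images):
    image.save(f"futuristic_city_initial_{i}.png")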
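Once several images for the same prompt are available, step 5 needs a number to compare before and after the modification. The sketch below computes the mean pairwise cosine distance over a set of samples. It is a simple stand-in, not LPIPS and not a metric prescribed by the paper. The function name `mean_pairwise_cosine_distance` is hypothetical, and the inputs are assumed to be either raw pixel arrays or feature vectors from an encoder of your choice.

```python
import numpy as np


def mean_pairwise_cosine_distance(samples):
    """Average cosine distance over all distinct pairs of samples.

    samples: sequence of equally shaped arrays (pixels or feature vectors).
    Returns 0.0 for identical samples; larger values mean more variety.
    """
    x = np.asarray(samples, dtype=np.float64)
    if len(x) < 2:
        raise ValueError("need at least two samples to measure diversity")

    # Flatten each sample and normalize it to unit length.
    x = x.reshape(len(x), -1)
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)

    # Cosine similarity matrix; keep each unordered pair once.
    sim = x @ x.T
    upper = np.triu_indices(len(x), k=1)
    return float(np.mean(1.0 - sim[upper]))


# Synthetic check: identical samples versus mutually orthogonal samples.
identical = np.ones((3, 4))
orthogonal = np.eye(3)
low = mean_pairwise_cosine_distance(identical)
high = mean_pairwise_cosine_distance(orthogonal)
```

With PIL images, `np.asarray(image)` gives a pixel array that can be passed in directly, although distances in pixel space are a coarse proxy. Features from a perceptual encoder are likely to track visual variety more closely.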
Source
On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers — Action Pack