paper · arXiv · Trust 82 · Primary · Published 4d ago · Live · 3d ago

Intermediate Text Representation Guided Text-to-Image Generation for Enhancing One-and-Only Alignment

Text-to-image (T2I) diffusion models often fail to faithfully render explicit textual descriptions, instead defaulting to strongly learned visual priors due to a phenomenon referred to as concept association bias. We show that such bias is particularly strong for one-and-only (OAO) objects, entities that exist in a single canonical form, such as celestial bodies, landmarks, and artworks. The deeply ingrained visual identity for these concepts often resists modification through prompting alone. Addressing this challenge, we first identify through an information-theoretic analysis that the final

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Covers

news · DiffusionGemma: 4x faster text generation

Has model

model · Diffuse-XL
model · sentence-transformers/all-MiniLM-L6-v2

Topics

cs.CV