Paper · arXiv · Trust 82 · Primary · Published 4d ago · Live · 3d ago
The Human Creativity Benchmark
Modern AI evaluation frameworks treat evaluator disagreement as noise to be resolved. In creative domains, professional disagreement reflects genuine differences in taste, not measurement error. We argue that evaluating creative AI requires preserving two distinct signals: convergence, where professionals align around shared best practices, and divergence, where individual taste legitimately varies. We present the Human Creativity Benchmark (HCB), a benchmark that operationalizes this separation by collecting pairwise preferences, scalar ratings on prompt adherence, usability, and visual appea
