paper · arXiv · Trust 82 · Primary · Published 4d ago · Live · 3d ago

The Human Creativity Benchmark

Modern AI evaluation frameworks treat evaluator disagreement as noise to be resolved. In creative domains, professional disagreement reflects genuine differences in taste, not measurement error. We argue that evaluating creative AI requires preserving two distinct signals: convergence, where professionals align around shared best practices, and divergence, where individual taste legitimately varies. We present the Human Creativity Benchmark (HCB), a benchmark that operationalizes this separation by collecting pairwise preferences, scalar ratings on prompt adherence, usability, and visual appeal…
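To make the convergence/divergence distinction concrete, the following is a minimal sketch of one way pairwise preferences could be split into the two signals. It is not the benchmark's actual procedure: the agreement threshold, the function names, and the toy votes are all assumptions introduced here for illustration.

```python
from collections import Counter

def agreement_rate(votes):
    """Fraction of evaluators who picked the majority option for one pair."""
    counts = Counter(votes)
    return max(counts.values()) / len(votes)

def split_by_agreement(pair_votes, threshold=0.8):
    """Partition pairs into convergent and divergent sets.

    `threshold` is a hypothetical cutoff, not a value from the paper.
    """
    convergent, divergent = [], []
    for pair_id, votes in pair_votes.items():
        if agreement_rate(votes) >= threshold:
            convergent.append(pair_id)  # professionals largely align
        else:
            divergent.append(pair_id)   # taste legitimately varies
    return convergent, divergent

# Toy data (invented): each list holds one vote per evaluator for output A or B.
pair_votes = {
    "pair_1": ["A", "A", "A", "A", "A"],
    "pair_2": ["A", "B", "A", "B", "B"],
    "pair_3": ["B", "B", "B", "B", "A"],
}

convergent, divergent = split_by_agreement(pair_votes)
print(convergent)  # ['pair_1', 'pair_3']
print(divergent)   # ['pair_2']
```

The point of the sketch is that divergent pairs are kept as their own signal and not averaged away or discarded as noise.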
