Paper · arXiv · Published yesterday

EduArt: An educational-level benchmark for evaluating art history knowledge in large language models

Large language models now score near ceiling on general benchmarks, but these aggregate measures reveal little about how models behave within single disciplines. Existing art-focused evaluations rely on synthetic questions and rarely report item-level properties. This paper introduces EduArt, an educational-level benchmark for art-historical knowledge and visual reasoning in multimodal LLMs. EduArt comprises 871 human-authored questions from Italian secondary-school exercises and US Advanced Placement Art History exams, spanning two languages and seven formats from multiple choice to in-text w


Authors: Gianmarco Spinaci, Lukas Klic, Giovanni Colavizza
