paper · arXiv · Trust 82 · Primary · Published yesterday · Live · 19h ago

EduArt: An educational-level benchmark for evaluating art history knowledge in large language models

Large language models now score near ceiling on general benchmarks, but these aggregate measures reveal little about how models behave within single disciplines. Existing art-focused evaluations rely on synthetic questions and rarely report item-level properties. This paper introduces EduArt, an educational-level benchmark for art-historical knowledge and visual reasoning in multimodal LLMs. EduArt comprises 871 human-authored questions from Italian secondary-school exercises and US Advanced Placement Art History exams, spanning two languages and seven formats from multiple choice to in-text w

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Why these links exist

Linked via arxiv authors Gianmarco Spinaci, Lukas Klic, and Giovanni Colavizza →
EduArt: An educational-level benchmark for evaluating art history knowledge in large language models

Covers

news: Knowledge Distillation of Black-Box Large Language Models

authored (incoming)

person: Gianmarco Spinaci · person: Lukas Klic · person: Giovanni Colavizza

Topics

cs.CL