Paper · arXiv · Trust 82 · Primary · Published 7d ago · Live · 4d ago
PerceptionRubrics: Calibrating Multimodal Evaluation to Human Perception
We introduce PerceptionRubrics, a rubric-based evaluation framework that addresses the gap between saturated benchmark scores and real-world brittleness. Shifting evaluation from holistic semantic matching to rigorous atomic auditing, PerceptionRubrics pairs 1,038 information-dense images with over 12,000 instance-specific rubrics. These criteria are derived from golden captions constructed via a novel Circular Peer-Review consensus pipeline and then distilled into a dual-stream system of Must-Right (essential facts) and Easy-Wrong (fine-grained details) rubrics. Crucially, PerceptionRubrics i
