paperarXivTrust 82 · PrimaryPublished 4d agoLive · 3d ago
SHOVIR: A Benchmark for Evaluating Vision Shortcut Learning in Radiology Report Generation
Current evaluation protocols for Vision-Language Models (VLMs) in Radiology Report Generation (RRG) rely on report-level metrics that measure lexical overlap or aggregate clinical correctness. However, such metrics do not test whether individual diagnostic statements stem from the actual pathological evidence visible in the image. This allows models to achieve competitive scores by exploiting learned priors or spurious correlations, a failure mode we refer to as vision shortcut. We introduce SHOVIR, a benchmark for evaluating vision shortcut behavior in RRG. SHOVIR extends two spatially annota
Lineage graph
Paper → model → repo connections mined from source citations (Tier-1 exact match).
