paper · arXiv · Trust 82 · Primary · Published 4d ago · Live · 3d ago

SHOVIR: A Benchmark for Evaluating Vision Shortcut Learning in Radiology Report Generation

Current evaluation protocols for Vision-Language Models (VLMs) in Radiology Report Generation (RRG) rely on report-level metrics that measure lexical overlap or aggregate clinical correctness. However, such metrics do not test whether individual diagnostic statements stem from the actual pathological evidence visible in the image. This allows models to achieve competitive scores by exploiting learned priors or spurious correlations, a failure mode we refer to as vision shortcut. We introduce SHOVIR, a benchmark for evaluating vision shortcut behavior in RRG. SHOVIR extends two spatially annota

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Implements

repo vlm-starter

Has model

model VioletVision-3B

Related across the graph

model VioletVision-3B · repo vlm-starter

Topics

cs.CV