paper · arXiv · Trust 82 · Primary · Published 2d ago · Live · 21h ago

Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads

In long-context use, large language models frequently synthesize answers from the meaning of a relevant context span rather than copying the span verbatim. Identifying which attention heads perform this synthesis matters for interpreting long-context model behavior. Yet existing detectors miss these heads by construction: they reward heads whose attended token matches the generated token, a literal-copy criterion that captures where a head reads but not what it writes through its output-value (OV) circuit, the very mechanism that carries non-literal retrieval. We introduce Logit-Contribution
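To make the contrast in the abstract concrete, here is a minimal toy sketch in NumPy. It is not the paper's scoring method, which the truncated abstract does not define. All names (`literal_copy_score`, `ov_logit_contribution`, `W_V`, `W_O`, `W_U`) and the tiny one-hot model are illustrative assumptions, and layer normalization is ignored. The first function applies a literal-copy criterion (does the most-attended context token equal the generated token?); the second measures what the head writes, by pushing its attention-weighted values through an assumed OV circuit and unembedding.

```python
import numpy as np

def literal_copy_score(attn, context_ids, generated_ids):
    """Fraction of decoding steps where the head's most-attended
    context token is identical to the token actually generated."""
    top_pos = attn.argmax(axis=1)  # most-attended position per step
    return float(np.mean(context_ids[top_pos] == generated_ids))

def ov_logit_contribution(attn, context_resid, W_V, W_O, W_U, generated_ids):
    """Mean logit the head adds to the generated token through its
    OV circuit (toy version: no layer norm, single head)."""
    head_out = attn @ context_resid @ W_V @ W_O  # (steps, d_model)
    logits = head_out @ W_U                      # (steps, vocab)
    steps = np.arange(len(generated_ids))
    return float(np.mean(logits[steps, generated_ids]))

# Hypothetical toy setup: vocabulary of 8 tokens with one-hot embeddings.
vocab = 8
context_ids = np.array([2, 5, 3])
context_resid = np.eye(vocab)[context_ids]  # (context_len, d_model)

# The head attends sharply to position 1, then position 2.
attn = np.array([[0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])

# Assumed OV circuit: maps token i's direction to token (i + 1) % vocab,
# so the head writes something other than the token it reads.
W_V = np.eye(vocab)
W_O = np.roll(np.eye(vocab), 1, axis=1)
W_U = np.eye(vocab)

# Generated tokens are the "transformed" tokens, not copies of context tokens.
generated_ids = np.array([6, 4])

literal = literal_copy_score(attn, context_ids, generated_ids)
contribution = ov_logit_contribution(
    attn, context_resid, W_V, W_O, W_U, generated_ids
)
print(literal, contribution)
```

In this toy setup the head never attends to a token identical to the one generated, so the literal-copy score is zero, yet its OV output puts its full weight on the generated token. That is the kind of head the abstract says a literal-copy criterion misses.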

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Why these links exist

Linked via arXiv author Aryo Pradipta Gema → Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads
Linked via arXiv author Beatrice Alex → Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads
Linked via arXiv author Pasquale Minervini → Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads