Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads
In long-context use, large language models frequently synthesize answers from the meaning of a relevant context span rather than copying the span verbatim. Identifying which attention heads perform this synthesis matters for interpreting long-context model behavior. Yet existing detectors miss these heads by construction: they reward heads whose attended token matches the generated token, a literal-copy criterion that captures where a head reads but not what it writes through its output-value (OV) circuit, the very mechanism that carries non-literal retrieval. We introduce Logit-Contribution
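To make the contrast in the abstract concrete, here is a minimal toy sketch in plain Python. It is not the paper's scoring method, whose definition is cut off above. It only illustrates the two quantities the abstract contrasts: a literal-copy check on where a head reads, and the logit a head writes toward the generated token through its OV output. The function names, the toy vectors, and the omission of layer normalization are all assumptions made for illustration.

```python
def literal_copy_score(attn, context_tokens, generated_token):
    """Literal-copy criterion: 1.0 if the most-attended context token
    is identical to the generated token, else 0.0."""
    top = max(range(len(attn)), key=lambda i: attn[i])
    return 1.0 if context_tokens[top] == generated_token else 0.0


def ov_logit_contribution(attn, ov_outputs, unembed_vec):
    """Logit the head adds to one vocabulary token (toy version).

    attn        : attention weights over context positions
    ov_outputs  : per-position OV outputs (hypothetical x_i W_V W_O vectors)
    unembed_vec : unembedding vector of the generated token
    Layer normalization is ignored here (an assumption of this sketch).
    """
    d_model = len(unembed_vec)
    # Head output = attention-weighted sum of the per-position OV outputs.
    head_out = [
        sum(a * v[k] for a, v in zip(attn, ov_outputs)) for k in range(d_model)
    ]
    # Project the head output onto the generated token's unembedding direction.
    return sum(h * u for h, u in zip(head_out, unembed_vec))


# Toy case: the head reads "Paris" but the model generates "France".
context_tokens = ["Paris", "is", "capital"]
attn = [0.8, 0.1, 0.1]
ov_outputs = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]  # made-up 2-d vectors
unembed_france = [2.0, 0.0]                        # made-up unembedding

literal = literal_copy_score(attn, context_tokens, "France")
written = ov_logit_contribution(attn, ov_outputs, unembed_france)
print(literal, written)
```

In this toy case the literal-copy check scores the head as zero because the attended token differs from the generated one, while the OV-based quantity is positive because the head's output still pushes the logit of the generated token upward.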
Lineage graph
Paper → model → repo connections mined from source citations (Tier-1 exact match).
Why these links exist
- Linked via arXiv author Aryo Pradipta Gema → Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads
- Linked via arXiv author Beatrice Alex → Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads
- Linked via arXiv author Pasquale Minervini → Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads
