paper · arXiv · Trust 82 · Primary · Published 7d ago · Live · 4d ago

NLL-Guided Full-Attention Layer Selection for Training-Free Sliding-Window Adaptation

Hybrid attention models that mix full and sliding-window attention across layers offer a promising approach to efficient long-context inference, but the critical question of which layers should retain full attention remains unsolved. Existing methods use either fixed periodic patterns or attention-based heuristics that may not capture what matters for downstream accuracy. We propose NLL-guided layer selection, a training-free method that directly measures each layer's importance by computing the negative log-likelihood degradation on answer tokens when that layer uses sliding-window ins
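The abstract is cut off here, so the following is only a minimal sketch of the idea it describes, not the paper's implementation. It assumes that degradation is measured against an all-full-attention baseline, that one layer at a time is switched to sliding-window attention, and that the layers with the largest degradation are the ones kept at full attention. All names (`answer_nll`, `nll_degradation`, `select_full_attention_layers`, `score_fn`, `toy_score`) are hypothetical, and the toy scorer stands in for a real model evaluation.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence


def answer_nll(token_logprobs: Sequence[float], answer_mask: Sequence[bool]) -> float:
    """Mean negative log-likelihood over the tokens marked as answer tokens."""
    picked = [-lp for lp, is_answer in zip(token_logprobs, answer_mask) if is_answer]
    if not picked:
        raise ValueError("answer_mask selects no tokens")
    return sum(picked) / len(picked)


def nll_degradation(
    num_layers: int,
    score_fn: Callable[[FrozenSet[int]], float],
) -> Dict[int, float]:
    """Per-layer increase in answer NLL when only that layer uses sliding-window.

    score_fn(swa_layers) is assumed to return the answer NLL of the model when
    the layers in swa_layers use sliding-window attention and all others use
    full attention (hypothetical interface).
    """
    baseline = score_fn(frozenset())  # assumption: all layers full attention
    return {
        layer: score_fn(frozenset({layer})) - baseline
        for layer in range(num_layers)
    }


def select_full_attention_layers(degradation: Dict[int, float], budget: int) -> List[int]:
    """Keep full attention in the `budget` layers with the largest degradation."""
    ranked = sorted(degradation, key=lambda layer: (-degradation[layer], layer))
    return sorted(ranked[:budget])


# Toy stand-in for a real model: each layer adds a fixed NLL penalty when windowed.
toy_penalty = [0.01, 0.40, 0.02, 0.25]


def toy_score(swa_layers: FrozenSet[int]) -> float:
    return 1.0 + sum(toy_penalty[layer] for layer in swa_layers)


degradation = nll_degradation(num_layers=4, score_fn=toy_score)
full_layers = select_full_attention_layers(degradation, budget=2)
```

In a real setting, `score_fn` would run the model on a calibration set and call something like `answer_nll` on the per-token log-probabilities. The top-k selection rule used here is an assumption made for illustration.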

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Covers

news: Breakthrough in long-context efficiency announced

Implements

repo: attention-zoo

Covers (incoming)

news: Looking for feedback on a small test SLM I built completely from scratch [P]

Related across the graph

news: Looking for feedback on a small test SLM I built completely from scratch [P]
news: Breakthrough in long-context efficiency announced
repo: attention-zoo

Topics

cs.CL