Paper · arXiv · Published 5d ago

Representational Depth of Evaluation Awareness Shifts With Scale in Open-Weight Language Models

Do language models know when they are being tested? This question matters for AI safety: a model that recognises an evaluation context could alter its behaviour strategically, making downstream benchmarks harder to interpret. Using 11 models spanning Qwen 2.5, Gemma 2, and Llama 3.2, we find a systematic size-dependent shift in representational depth: in both Qwen 2.5 and Gemma 2, the layer at which evaluation awareness is most linearly recoverable moves from late layers in smaller models to early layers in larger ones. This suggests that scale changes not only the strength of evaluation-aware …
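"Most linearly recoverable" is usually measured with a layer-wise linear probe: fit a linear classifier on each layer's activations and see where held-out accuracy peaks. The sketch below illustrates that idea under stated assumptions. It is not the paper's code. It assumes per-layer activation vectors have already been extracted and labelled (1 = evaluation-style prompt, 0 = otherwise). The function names, the plain logistic-regression probe, the fixed 70/30 split, and the normalization of depth by layer count are all hypothetical choices. The demo uses synthetic activations, not real model data.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_probe(xs, ys, epochs=200, lr=0.1):
    """Fit a logistic-regression probe (weights w, bias b) by full-batch gradient descent."""
    w, b, n = [0.0] * len(xs[0]), 0.0, len(xs)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(dot(w, x) + b) - y  # gradient of log loss w.r.t. the logit
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def accuracy(w, b, xs, ys):
    # Fraction of examples where the probe's sign matches the label.
    hits = sum((dot(w, x) + b > 0) == (y == 1) for x, y in zip(xs, ys))
    return hits / len(xs)

def most_recoverable_layer(acts_by_layer, labels, train_frac=0.7):
    """Return (best layer index, relative depth in [0, 1], held-out accuracy per layer).

    acts_by_layer[l][i] is the activation vector of prompt i at layer l (hypothetical layout);
    the same train/test split of prompts is reused for every layer.
    """
    cut = int(len(labels) * train_frac)
    accs = []
    for acts in acts_by_layer:
        w, b = train_probe(acts[:cut], labels[:cut])
        accs.append(accuracy(w, b, acts[cut:], labels[cut:]))
    best = max(range(len(accs)), key=accs.__getitem__)  # first layer with peak accuracy
    depth = best / (len(accs) - 1) if len(accs) > 1 else 0.0
    return best, depth, accs

# Synthetic demo only: 6 "layers" of 2-d activations; the label signal is strongest at layer 3.
rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(200)]
strengths = [0.0, 0.2, 0.5, 3.0, 0.8, 0.3]
acts_by_layer = [
    [[(1 if y else -1) * s + rng.gauss(0, 1), rng.gauss(0, 1)] for y in labels]
    for s in strengths
]
best, depth, accs = most_recoverable_layer(acts_by_layer, labels)
print(best, round(depth, 2), [round(a, 2) for a in accs])
```

Dividing the best layer index by the number of layers minus one is one way to compare depth across models with different layer counts. Under that convention, the shift described above would show up as relative depth falling as model size grows. Extracting real activations, for example with per-block forward hooks, is outside the scope of this sketch.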

Covers

Identifying Interactions at Scale for LLMs
What exactly does word2vec learn?
New Server Hopes to Break Through AI’s “Memory Wall”

Topics

cs.CL