paper · arXiv · Trust 82 · Primary · Published 4d ago · Live · 3d ago

Forensic Trajectory Signatures for Agent Memory Poisoning Detection

We discover a behavioral invariant in LLM agents under persistent memory poisoning: in architectures where routing information is retrieved through observable memory-tool invocations, successful attacks require calling memory_recall_fact before email_send_email, a transition that non-exfiltrating sessions rarely exhibit. Under the evaluated architecture, this invariant follows from the attack's information-retrieval dependency rather than being merely an empirical correlation, and suppressing it breaks the attack. A simple rule exploiting this invariant alone achieves AUC = 0.9563. A Random Fo
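As an illustration of the simple rule described above, the following is a minimal sketch, not the paper's implementation. It assumes a session is available as an ordered list of tool-call names. The function name `recall_precedes_send`, the `max_gap` parameter, and the `web_search` tool name in the usage lines are hypothetical; only `memory_recall_fact` and `email_send_email` come from the abstract. Whether the paper's rule requires the two calls to be directly adjacent or merely ordered is not stated here, so the sketch supports both readings.

```python
from typing import Optional, Sequence

# Tool names taken from the abstract.
RECALL = "memory_recall_fact"
SEND = "email_send_email"


def recall_precedes_send(calls: Sequence[str], max_gap: Optional[int] = None) -> bool:
    """Flag a session in which a memory recall is followed by an email send.

    calls:   ordered tool-call names for one session (assumed input format).
    max_gap: hypothetical knob. None accepts any later send; an integer k
             requires the send within k calls of the most recent recall
             (k=1 means directly adjacent).
    """
    last_recall = None  # index of the most recent recall seen so far
    for i, name in enumerate(calls):
        if name == RECALL:
            last_recall = i
        elif name == SEND and last_recall is not None:
            # The most recent recall gives the smallest possible gap.
            if max_gap is None or i - last_recall <= max_gap:
                return True
    return False


# Usage with made-up trajectories ("web_search" is a placeholder tool name).
suspicious = ["memory_recall_fact", "email_send_email"]
benign = ["email_send_email", "memory_recall_fact"]
spaced = ["memory_recall_fact", "web_search", "email_send_email"]

flags = [recall_precedes_send(t) for t in (suspicious, benign, spaced)]
adjacent_only = recall_precedes_send(spaced, max_gap=1)
```

A binary flag like this yields a single operating point; producing a ranking score for an AUC-style evaluation would need a graded variant (for example, the count of such transitions), which is likewise an assumption rather than something the abstract specifies.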
