Reddit · r/MachineLearning · Published yesterday
Something keeps turning up in my prompt injection detection logs that I didn't expect. Curious if others doing LLM security work have seen it. [D]
Six months ago I put a rate limiter on the detection API and started logging every call that came back with a high adversarial confidence score. I expected bots, testing scripts, the usual. What I didn't expect was how many inputs look completely clean on the surface, pass every regex you'd think to write, and still get a high score from the classifier. The pattern that keeps showing up is hard to describe without sounding like I'm overselling it, so I'll be spec
