It concerns me that anyone with anything important to protect might trust what this paper calls "Injection detectors deployed to protect LLM agents" - Llama Guard and the like.
There are unlimited combinations of tokens that can be used to attack an LLM system. The idea that some kind of "detector" can catch them all just feels inherently absurd to me.
It concerns me that anyone with anything important to protect might trust what this paper calls "Injection detectors deployed to protect LLM agents" - Llama Guard and the like.
There are unlimited combinations of tokens that can be used to attack an LLM system. The idea that some kind of "detector" can catch them all just feels inherently absurd to me.
This is an "uh oh" moment, isn't it?