Some context: we build runtime security for AI agents at Fabraix. We kept finding that our internal red-teaming only covers so much: the attack surface for agents with real capabilities is too broad for any single team.
So we opened it up. A few things that might be interesting to folks here:
- These aren't toy prompts hiding a secret word. The agents have actual tool access and behave like production agents would.
- System prompts and challenge configs are versioned in the open: https://github.com/fabraix/playground
- Guardrail evaluation runs server-side to prevent client-side tampering.
- Anyone can propose a challenge: the scenario, the agent, the objective. The community votes on what goes live next.
We're looking for people both to break things and to suggest what should be tested next. The agent runtime is being open-sourced separately.
Happy to answer questions about how any of it works.