AgentThreatBench: OWASP Agentic Top 10 benchmark for indirect prompt injection and memory poisoning

## Proposal: Add AgentThreatBench to the OpenAI Evals registry

**AgentThreatBench** is an evaluation suite that operationalizes the [OWASP Top 10 for Agentic Applications (2026)](https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/) into executable benchmark tasks. It was recently merged into the official [UK AI Safety Institute's `inspect_evals` repository](https://github.com/UKGovernmentBEIS/inspect_evals/pull/1037).

### What it evaluates

Unlike existing evals that test chatbot-style safety, AgentThreatBench specifically targets **agentic threat vectors** — attacks that arrive through tool outputs, memory stores, and environment data rather than user prompts:

| Task | OWASP Category | Attack Vector |
|------|---------------|---------------|
| Memory Poisoning | ASI06 | Adversarial entries in RAG/memory store |
| Autonomy Hijack | ASI01 | Indirect prompt injection via `read_inbox` tool output |
| Data Exfiltration | ASI01 | Malicious payload in `lookup_customer` tool response |

### Dual-metric scoring

Each task scores on both:
- **Utility**: Did the agent complete the legitimate task?
- **Security**: Did the agent resist the attack?

An agent only "passes" if it scores 1.0 on *both* metrics — capturing the security/utility tradeoff that standard evals miss.

### How to run it today

```bash
pip install inspect_evals

# Memory poisoning against GPT-4o
inspect eval inspect_evals/agent_threat_bench_memory_poison --model openai/gpt-4o

# Autonomy hijack against GPT-4o
inspect eval inspect_evals/agent_threat_bench_autonomy_hijack --model openai/gpt-4o
```

### Resources
- **Benchmark docs**: https://ukgovernmentbeis.github.io/inspect_evals/evals/safeguards/agent_threat_bench/
- **Source code**: https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/agent_threat_bench
- **OWASP standard**: https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/

Would love to discuss adding this to the OpenAI Evals registry or referencing it in the evals documentation as a resource for agentic security evaluation.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AgentThreatBench: OWASP Agentic Top 10 benchmark for indirect prompt injection and memory poisoning #1668

Proposal: Add AgentThreatBench to the OpenAI Evals registry

What it evaluates

Dual-metric scoring

How to run it today

Resources

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Task	OWASP Category	Attack Vector
Memory Poisoning	ASI06	Adversarial entries in RAG/memory store
Autonomy Hijack	ASI01	Indirect prompt injection via `read_inbox` tool output
Data Exfiltration	ASI01	Malicious payload in `lookup_customer` tool response

AgentThreatBench: OWASP Agentic Top 10 benchmark for indirect prompt injection and memory poisoning #1668

Description

Proposal: Add AgentThreatBench to the OpenAI Evals registry

What it evaluates

Dual-metric scoring

How to run it today

Resources

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions