IPIBench: A Cross-Model Indirect Prompt Injection Benchmark

IPIBench is an empirical benchmark for studying indirect prompt injection (IPI) in LLM-based agent workflows. It focuses on how models behave when they are asked to process external content that may contain malicious embedded instructions.

The repository is designed for reproducible evaluation: fixed attack scenarios, explicit success criteria, and a scriptable execution pipeline.

What Is Indirect Prompt Injection?

Indirect prompt injection happens when an attacker places instructions inside content the model is asked to read, such as:

webpages
retrieved documents
tool output
copied text blocks

If the model treats those embedded instructions as valid commands, it can deviate from the user goal without obvious signs.

Benchmark Scope

This benchmark includes:

100 hand-crafted attack scenarios in benchmark.json
multiple attack goals and evasion styles per scenario
per-scenario success indicators for automatic scoring
defense-mode comparisons in the same evaluation run

The current runner script in this repository evaluates four defense modes:

none
prompt_warning
spotlighting
input_filter

Key Findings

Work-in-Progress

Repository Structure

README.md: project overview, setup, and reproducibility notes
benchmark.json: 100 attack scenarios with success criteria
run_benchmark.py: evaluation script
benchmark_scripts/: flat per-model runners plus shared helpers (_core.py, _google.py, _openrouter.py)
merge_results.py: merges per-model CSV outputs into one final file
results/csv/: per-model CSV output files
results/jsonl/: per-model JSONL output files
analysis/analysis.ipynb: visualization and analysis notebook
paper/draft.pdf: working paper draft
requirements.txt: Python dependencies

Setup

1) Create a virtual environment

Windows PowerShell:

python -m venv .venv
.\.venv\Scripts\Activate.ps1

macOS/Linux:

python3 -m venv .venv
source .venv/bin/activate

2) Install dependencies

pip install -r requirements.txt

3) Configure API keys

The benchmark runner reads API keys from environment variables. Use .env.example as a local template and keep your real .env file private.

Minimum providers currently used by the script:

Groq
Google GenAI

The requirements file also includes Mistral and OpenAI packages for future extension.

Windows PowerShell (current session):

$env:GROQ_API_KEY="your_groq_key"
$env:GOOGLE_API_KEY="your_google_key"

macOS/Linux (current session):

export GROQ_API_KEY="your_groq_key"
export GOOGLE_API_KEY="your_google_key"

If you also run scripts under benchmark_scripts/, you may need:

ANTHROPIC_API_KEY
OPENROUTER_API_KEY
GITHUB_TOKEN

Security Before Open-Sourcing

Use this checklist before pushing to GitHub:

Confirm only template config is committed: .env.example is tracked, but real .env is not.
Run a quick tracked-file secret scan:

git grep -nE '(api[_-]?key|apikey|secret|token|password)[[:space:]]*[:=][[:space:]]*["\x27][^"\x27]{8,}["\x27]|sk-[A-Za-z0-9]{20,}|ghp_[A-Za-z0-9]{20,}|github_pat_[A-Za-z0-9_]{20,}|AIza[0-9A-Za-z_-]{20,}|-----BEGIN [A-Z ]+PRIVATE KEY-----' -- .

If anything looks like a real credential, remove it and rotate that key before publishing.
Review notebook outputs and CSV artifacts for copied credentials before commit.

How To Run

Dry run (small sanity check):

python run_benchmark.py --dry-run

Full run:

python run_benchmark.py

Output Files

Model scripts write outputs directly to:

results/csv/ as per-model CSV files
results/jsonl/ as per-model JSONL files

To merge all per-model CSV files into one final table:

python merge_results.py

This produces results/results_final.csv.

Reproducibility Checklist

pin package versions before final reporting
document model names and API versions used
log run date and hardware/network context
keep the exact benchmark.json used for each reported result
keep raw per-model files in results/csv/ and results/jsonl/ for auditability

Dataset

The benchmark dataset can also be published on Hugging Face.

Placeholder:

huggingface.co/datasets/your-username/IPIBench

Citation

If you use this benchmark in research, cite this repository and add your formal citation after publication.

Example placeholder:

@misc{ipibench2026,
	title={IPIBench: A Cross-Model Indirect Prompt Injection Benchmark},
	author={Aditya L},
	year={2026},
	howpublished={GitHub repository}
}

Author

Aditya L

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

IPIBench: A Cross-Model Indirect Prompt Injection Benchmark

What Is Indirect Prompt Injection?

Benchmark Scope

Key Findings

Repository Structure

Setup

1) Create a virtual environment

2) Install dependencies

3) Configure API keys

Security Before Open-Sourcing

How To Run

Output Files

Reproducibility Checklist

Dataset

Citation

Author

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
analysis		analysis
benchmark_scripts		benchmark_scripts
paper		paper
results		results
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
benchmark.json		benchmark.json
merge_results.py		merge_results.py
requirements.txt		requirements.txt
run_benchmark.py		run_benchmark.py

Folders and files

Latest commit

History

Repository files navigation

IPIBench: A Cross-Model Indirect Prompt Injection Benchmark

What Is Indirect Prompt Injection?

Benchmark Scope

Key Findings

Repository Structure

Setup

1) Create a virtual environment

2) Install dependencies

3) Configure API keys

Security Before Open-Sourcing

How To Run

Output Files

Reproducibility Checklist

Dataset

Citation

Author

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages