IPIBench is an empirical benchmark for studying indirect prompt injection (IPI) in LLM-based agent workflows. It focuses on how models behave when they are asked to process external content that may contain malicious embedded instructions.
The repository is designed for reproducible evaluation: fixed attack scenarios, explicit success criteria, and a scriptable execution pipeline.
Indirect prompt injection happens when an attacker places instructions inside content the model is asked to read, such as:
- webpages
- retrieved documents
- tool output
- copied text blocks
If the model treats those embedded instructions as valid commands, it can deviate from the user goal without obvious signs.
This benchmark includes:
- 100 hand-crafted attack scenarios in benchmark.json
- multiple attack goals and evasion styles per scenario
- per-scenario success indicators for automatic scoring
- defense-mode comparisons in the same evaluation run
The current runner script in this repository evaluates four defense modes:
- none
- prompt_warning
- spotlighting
- input_filter
Work-in-Progress
- README.md: project overview, setup, and reproducibility notes
- benchmark.json: 100 attack scenarios with success criteria
- run_benchmark.py: evaluation script
- benchmark_scripts/: flat per-model runners plus shared helpers (
_core.py,_google.py,_openrouter.py) - merge_results.py: merges per-model CSV outputs into one final file
- results/csv/: per-model CSV output files
- results/jsonl/: per-model JSONL output files
- analysis/analysis.ipynb: visualization and analysis notebook
- paper/draft.pdf: working paper draft
- requirements.txt: Python dependencies
Windows PowerShell:
python -m venv .venv
.\.venv\Scripts\Activate.ps1macOS/Linux:
python3 -m venv .venv
source .venv/bin/activatepip install -r requirements.txtThe benchmark runner reads API keys from environment variables.
Use .env.example as a local template and keep your real .env file private.
Minimum providers currently used by the script:
- Groq
- Google GenAI
The requirements file also includes Mistral and OpenAI packages for future extension.
Windows PowerShell (current session):
$env:GROQ_API_KEY="your_groq_key"
$env:GOOGLE_API_KEY="your_google_key"macOS/Linux (current session):
export GROQ_API_KEY="your_groq_key"
export GOOGLE_API_KEY="your_google_key"If you also run scripts under benchmark_scripts/, you may need:
ANTHROPIC_API_KEYOPENROUTER_API_KEYGITHUB_TOKEN
Use this checklist before pushing to GitHub:
- Confirm only template config is committed: .env.example is tracked, but real
.envis not. - Run a quick tracked-file secret scan:
git grep -nE '(api[_-]?key|apikey|secret|token|password)[[:space:]]*[:=][[:space:]]*["\x27][^"\x27]{8,}["\x27]|sk-[A-Za-z0-9]{20,}|ghp_[A-Za-z0-9]{20,}|github_pat_[A-Za-z0-9_]{20,}|AIza[0-9A-Za-z_-]{20,}|-----BEGIN [A-Z ]+PRIVATE KEY-----' -- .- If anything looks like a real credential, remove it and rotate that key before publishing.
- Review notebook outputs and CSV artifacts for copied credentials before commit.
Dry run (small sanity check):
python run_benchmark.py --dry-runFull run:
python run_benchmark.pyModel scripts write outputs directly to:
- results/csv/ as per-model CSV files
- results/jsonl/ as per-model JSONL files
To merge all per-model CSV files into one final table:
python merge_results.pyThis produces results/results_final.csv.
- pin package versions before final reporting
- document model names and API versions used
- log run date and hardware/network context
- keep the exact benchmark.json used for each reported result
- keep raw per-model files in results/csv/ and results/jsonl/ for auditability
The benchmark dataset can also be published on Hugging Face.
Placeholder:
huggingface.co/datasets/your-username/IPIBench
If you use this benchmark in research, cite this repository and add your formal citation after publication.
Example placeholder:
@misc{ipibench2026,
title={IPIBench: A Cross-Model Indirect Prompt Injection Benchmark},
author={Aditya L},
year={2026},
howpublished={GitHub repository}
}
Aditya L