This repository accompanies the research paper SafetyPairs: Isolating Safety Critical Image Features with Counterfactual Image Generation.
What exactly makes a particular image unsafe?
Subtle changes—such as an insulting gesture or a problematic symbol—can drastically alter the safety implications of an image. Yet, most existing image safety datasets only provide coarse labels without pinpointing the specific features that drive these differences. SafetyPairs introduces a scalable framework for generating counterfactual pairs of images that differ only in features relevant to a given safety policy. By leveraging image editing models, we systematically flip an image’s safety label while keeping safety-irrelevant details unchanged.
- We introduce the SafetyPairs pipeline, a systematic approach to isolating safety-critical image features through counterfactual generation.
- Our method addresses the limitation of existing safety datasets that provide only coarse labels without identifying specific problematic elements.
- The resulting dataset serves as both an evaluation resource for vision-language models and an effective data augmentation strategy for training safety classifiers.
ml-safetypairs/
├── pyproject.toml # Python project configuration and dependencies
├── README.md
├── LICENSE # Apple Sample Code License
├── CONTRIBUTING.md # Contribution guidelines
├── ACKNOWLEDGEMENTS # Acknowledgements
└── src/
├── safetypairs/ # Main Python package
│ ├── clip/ # CLIP models for similarity measurement
│ ├── datasets/ # Dataset loaders (LlavaGuard, SafetyPairs)
│ ├── editor/ # Image editing models (Flux-Kontext)
│ ├── llm/ # Language model clients (GPT)
│ ├── pipeline/ # Core pipeline implementation
│ └── vlm/ # Vision-language model clients
├── run_example.py # Quick start demo script
└── generate_safetypairs.py # Full data generation pipeline
Data Synthesis Pipeline: A scalable framework for generating counterfactual safety pairs using image editing models, enabling targeted modifications that flip safety labels while preserving safety-irrelevant details. Check src/generate_safetypairs.py.
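As a rough illustration of what the pipeline produces, each counterfactual pair can be thought of as a record linking the original image to its minimally edited counterpart. This is a hypothetical sketch — the field names and the policy label are illustrative, not the package's actual schema:

```python
from dataclasses import dataclass

@dataclass
class SafetyPair:
    """One counterfactual pair: two images differing only in policy-relevant features."""
    original_path: str     # image labeled unsafe under the policy
    edited_path: str       # counterfactual edited to be safe
    policy: str            # safety policy the pair targets (illustrative label below)
    rationale: str         # why the original violates the policy
    edit_instruction: str  # targeted edit that flips the safety label

pair = SafetyPair(
    original_path="src/examples/output/original.png",
    edited_path="src/examples/output/edited.png",
    policy="Violence",
    rationale="The image shows a building on fire.",
    edit_instruction="Remove the smoke and fire so the building appears undamaged.",
)
```

Everything except the two image paths is metadata that makes the pair useful for evaluation: the rationale explains the unsafe label, and the edit instruction documents exactly which feature was changed.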
- Python 3.11, 3.12, or 3.13
- CUDA-capable GPU (for Flux-Kontext model)
- OpenAI API key
- HuggingFace token
git clone https://github.com/apple/ml-safetypairs.git
cd ml-safetypairs
# Install dependencies using uv
uv sync
# Activate virtual environment
source .venv/bin/activate
# Set required environment variables
export OPENAI_KEY=your_openai_api_key
export HF_TOKEN=your_huggingface_token # Get access to the gated black-forest-labs/FLUX.1-Kontext-dev model on HuggingFace
Try the example script to edit a single image:
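Before invoking the scripts, you can sanity-check that the required credentials are set. This is a convenience sketch using only the standard library, not a script shipped with the repository:

```python
import os

REQUIRED = ("OPENAI_KEY", "HF_TOKEN")

def missing_env_vars(required=REQUIRED):
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]

for name in missing_env_vars():
    print(f"Missing environment variable: {name}")
```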
First, prepare your image:
# Create an examples directory and add your image
mkdir -p src/examples
cp /path/to/your/image.jpg src/examples/example_source.jpg
Then run the example script:
python src/run_example.py
This will:
- Load your image from src/examples/example_source.jpg
- Ask you to provide a rationale for why the image is harmful
- Use the SafetyPairs pipeline to edit the image to make it safe
- Save the original and edited images to src/examples/output/
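The flow the steps above describe can be sketched roughly as follows. This is a hypothetical stand-in, not the actual script: the real pipeline drives the Flux-Kontext editing model, which this sketch replaces with a pluggable `edit_fn` so it stays self-contained:

```python
from pathlib import Path

def run_example(source: Path, out_dir: Path, rationale: str, edit_fn) -> dict:
    """Mimic the example flow: load an image, derive a targeted edit, save both versions.

    `edit_fn` stands in for the SafetyPairs pipeline; it maps
    (image_bytes, rationale) -> (edit_instruction, edited_bytes).
    """
    image = source.read_bytes()                      # load the source image
    instruction, edited = edit_fn(image, rationale)  # pipeline would generate the edit
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "original.png").write_bytes(image)    # save an unmodified copy
    (out_dir / "edited.png").write_bytes(edited)     # save the safe counterfactual
    return {
        "edit_instruction": instruction,
        "original": out_dir / "original.png",
        "edited": out_dir / "edited.png",
    }
```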
Example:
$ python src/run_example.py
Loaded image from src/examples/example_source.jpg
Initializing pipeline...
Processing image...
Enter the rationale for why this image is harmful: The image shows a building on fire.
✅ Success!
Edit instruction: Remove the smoke, fire, debris, and airplane so the towers appear undamaged.
Saved results:
Original: src/examples/output/original.png
Edited: src/examples/output/edited.png
To run the data generation pipeline on the LlavaGuard dataset:
# Set the dataset directory
export DATASET_DIRECTORY=/path/to/llavaguard
# Run the pipeline
python src/generate_safetypairs.py --save_dir /path/to/output --dataset_split test
This will:
- Download the LlavaGuard dataset (if not already present)
- Process each image to generate safe counterfactual pairs
- Save the results to the specified output directory
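The batch flow above can be sketched as follows. This is an illustrative stand-in for `src/generate_safetypairs.py`, not the script itself: `edit_image` represents the real editing pipeline, and the JSON manifest format is an assumption made for the sketch:

```python
import json
from pathlib import Path

def generate_pairs(dataset_dir: Path, save_dir: Path, edit_image) -> list:
    """Walk a dataset directory, produce a safe counterfactual per image,
    and record the resulting pairs in a JSON manifest."""
    save_dir.mkdir(parents=True, exist_ok=True)
    manifest = []
    for src in sorted(dataset_dir.glob("*.jpg")):
        edited_path = save_dir / f"{src.stem}_edited.png"
        edited_path.write_bytes(edit_image(src.read_bytes()))  # pipeline edit
        manifest.append({"original": str(src), "edited": str(edited_path)})
    (save_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest
```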
This software and accompanying data and models have been released under the following licenses:
@article{helbling2025safetypairs,
title={SafetyPairs: Isolating Safety Critical Image Features with Counterfactual Image Generation},
author={Helbling, Alec and Palaskar, Shruti and Krishna, Kundan and Chau, Polo and Gatys, Leon and Cheng, Joseph Yitan},
journal={arXiv preprint arXiv:2510.18214},
year={2025}
}