Scan files for PHI (Protected Health Information) patterns and replace them with deterministic pseudonyms. Integrates with pre-commit hooks.
```bash
pip install shred-guard
# or with uv
uv add shred-guard
```

Run the interactive setup wizard:

```bash
shredguard init
```

This walks you through:
- Selecting PHI patterns to detect (SSNs, emails, MRNs, custom patterns)
- Configuring file restrictions
- Setting up pre-commit hooks
`shredguard init` is an interactive setup wizard: it creates your configuration and optionally sets up pre-commit integration.
Scan for PHI patterns:

```bash
shredguard check .                  # Scan current directory
shredguard check data/ notes.txt    # Scan specific paths
```

Output uses ruff-style formatting:

```
patient_notes.txt:1:9: SG001 Subject ID [SUB-1234]
patient_notes.txt:2:6: SG002 SSN [123-45-6789]
```
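Each sample line reads as `path:line:col: CODE description [matched text]`. The sketch below shows how findings in that shape could be produced by a line-by-line regex scan. Three things in it are inferred from the two sample lines rather than documented: line and column numbers are 1-based, and SG codes are numbered by each pattern's position in the config. `scan_text` and `Finding` are hypothetical names, not shredguard's API, and the file contents are made up.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Finding:
    path: str
    line: int  # 1-based line number
    col: int  # 1-based column of the first matched character
    code: str
    description: str
    text: str

    def format(self) -> str:
        # Mirrors the sample output: path:line:col: CODE description [match]
        return f"{self.path}:{self.line}:{self.col}: {self.code} {self.description} [{self.text}]"


def scan_text(path: str, content: str, patterns: list[tuple[str, str]]) -> list[Finding]:
    """Scan `content` line by line with (regex, description) pairs.

    Assumption: codes are SG001, SG002, ... in the order patterns are configured.
    """
    compiled = [
        (f"SG{i:03d}", re.compile(regex), description)
        for i, (regex, description) in enumerate(patterns, start=1)
    ]
    findings: list[Finding] = []
    for lineno, line in enumerate(content.splitlines(), start=1):
        # Within a line, findings are grouped by pattern, then by position.
        for code, rx, description in compiled:
            for m in rx.finditer(line):
                findings.append(
                    Finding(path, lineno, m.start() + 1, code, description, m.group())
                )
    return findings


patterns = [(r"SUB-\d{4,6}", "Subject ID"), (r"\b\d{3}-\d{2}-\d{4}\b", "SSN")]
notes = "Patient SUB-1234\nSSN: 123-45-6789\n"  # hypothetical file contents
for finding in scan_text("patient_notes.txt", notes, patterns):
    print(finding.format())
```

With that hypothetical input, the loop prints exactly the two sample lines shown above. Columns here count characters, not bytes.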
Replace PHI with pseudonyms:

```bash
shredguard fix .                             # Replace with REDACTED-0, REDACTED-1, ...
shredguard fix --prefix ANON .               # Custom prefix: ANON-0, ANON-1, ...
shredguard fix --output-map mapping.json .   # Save original -> pseudonym mapping
```

Replacements are deterministic: the same value always gets the same pseudonym within a run.
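The sketch below shows one way to get "same value, same pseudonym" behaviour. It keeps a dictionary from original to pseudonym and assigns the next free number on first sight. It assumes numbering follows order of first appearance, which the README does not specify. `pseudonymize` is a hypothetical helper, and the JSON layout of the real `--output-map` file may differ from this dictionary.

```python
from __future__ import annotations

import re


def pseudonymize(
    text: str,
    patterns: list[str],
    prefix: str = "REDACTED",
    mapping: dict[str, str] | None = None,
) -> tuple[str, dict[str, str]]:
    """Replace every match with PREFIX-N, reusing N for repeated values."""
    # Pass the same mapping for every file in a run so a value keeps its
    # pseudonym across files.
    if mapping is None:
        mapping = {}
    combined = re.compile("|".join(f"(?:{p})" for p in patterns))

    def replace(m: re.Match) -> str:
        original = m.group()
        if original not in mapping:
            # Next unused number, in order of first appearance (assumption).
            mapping[original] = f"{prefix}-{len(mapping)}"
        return mapping[original]

    return combined.sub(replace, text), mapping


text, mapping = pseudonymize(
    "SUB-1234 seen again: SUB-1234; SSN 123-45-6789",
    [r"SUB-\d{4,6}", r"\b\d{3}-\d{2}-\d{4}\b"],
)
print(text)     # REDACTED-0 seen again: REDACTED-0; SSN REDACTED-1
print(mapping)  # {'SUB-1234': 'REDACTED-0', '123-45-6789': 'REDACTED-1'}
```

Joining the patterns into a single alternation keeps one left-to-right pass per text. The trade-off is that where matches from two patterns overlap, only the one found first is replaced.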
Scan every commit on every local branch for PHI patterns:

```bash
shredguard audit                          # Audit all local branches
shredguard audit --include-remotes        # Also scan remote-tracking branches
shredguard audit --output report.json     # Custom output file path
```

The configuration and `.gitignore` are locked to their current working-tree state, so results are reproducible. Both files must have no uncommitted changes before running.

By default, output is written to a timestamped JSON file (`shredguard-audit-<timestamp>.json`) containing per-commit match details, branch info, and a summary. The command exits with code 1 if any commit contains a match.
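Plain `git` commands can enumerate the commits such an audit covers and enforce the clean-config precondition. This is a sketch under assumptions, not how shredguard is implemented; `rev_list_args`, `commits_to_audit`, and `config_is_clean` are hypothetical helpers.

- `git rev-list --branches` lists every commit reachable from a local branch, each commit once.
- Adding `--remotes` includes remote-tracking branches, mirroring `--include-remotes`.
- `git status --porcelain -- <paths>` prints nothing when those paths have no uncommitted changes.

```python
from __future__ import annotations

import subprocess


def rev_list_args(include_remotes: bool = False) -> list[str]:
    # --branches selects refs/heads/*; --remotes adds refs/remotes/*.
    args = ["git", "rev-list", "--branches"]
    if include_remotes:
        args.append("--remotes")
    return args


def commits_to_audit(include_remotes: bool = False, repo: str = ".") -> list[str]:
    # One commit SHA per output line; rev-list already de-duplicates.
    out = subprocess.run(
        rev_list_args(include_remotes),
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return out.split()


def config_is_clean(paths: list[str], repo: str = ".") -> bool:
    # Empty porcelain output means none of `paths` is modified or untracked.
    out = subprocess.run(
        ["git", "status", "--porcelain", "--", *paths],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return out.strip() == ""
```

Inside a repository, a caller would first check `config_is_clean(["pyproject.toml", ".gitignore"])`. It would then load the patterns once and apply them to every SHA from `commits_to_audit(...)`. Loading once is what makes the results independent of each commit's own copy of the config.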
Configuration lives in `pyproject.toml` (or another TOML file passed with `--config`):

```toml
[tool.shredguard]

[[tool.shredguard.patterns]]
regex = "SUB-\\d{4,6}"
description = "Subject ID"

[[tool.shredguard.patterns]]
regex = "\\b\\d{3}-\\d{2}-\\d{4}\\b"
description = "SSN"
```

Each pattern can optionally include `files` and `exclude_files` globs to control which files are scanned.
Add to `.pre-commit-config.yaml`:

```yaml
repos:
  - repo: local
    hooks:
      - id: shredguard-check
        name: shredguard check
        entry: shredguard check
        language: system
        types: [text]
```

Or let `shredguard init` set this up for you.
```
shredguard check [OPTIONS] [FILES]...
```

| Option | Description |
|---|---|
| `--all-files` | Scan all files recursively |
| `--no-gitignore` | Don't respect `.gitignore` patterns |
| `--config PATH` | Path to config file |
| `-v, --verbose` | Show verbose output (skipped files, etc.) |
```
shredguard fix [OPTIONS] [FILES]...
```

| Option | Description |
|---|---|
| `--prefix TEXT` | Prefix for pseudonyms (default: `REDACTED`) |
| `--output-map PATH` | Write JSON mapping of originals to pseudonyms |
| `--all-files` | Scan all files recursively |
| `--no-gitignore` | Don't respect `.gitignore` patterns |
| `--config PATH` | Path to config file |
| `-v, --verbose` | Show verbose output |
```
shredguard audit [OPTIONS]
```

| Option | Description |
|---|---|
| `--include-remotes` | Also scan remote-tracking branches |
| `--output PATH` | Path for audit JSON output (default: `shredguard-audit-<timestamp>.json`) |
| `--no-gitignore` | Don't respect `.gitignore` patterns |
| `--config PATH` | Path to config file |
| `-v, --verbose` | Show verbose output (skipped binary files, etc.) |
Pattern options:

```toml
[[tool.shredguard.patterns]]
regex = "SUB-\\d{4,6}"          # Required: regex pattern
description = "Subject ID"      # Optional: shown in output
files = ["*.csv", "data/**"]    # Optional: only scan matching files
exclude_files = ["*_test.*"]    # Optional: skip matching files
```

When running `shredguard init`, you can choose from these common patterns:
| Pattern | Description |
|---|---|
| `SUB-\d{4,6}` | Subject ID |
| `\b\d{3}-\d{2}-\d{4}\b` | Social Security Number |
| `MRN\d{6,10}` | Medical Record Number |
| [email pattern] | Email addresses |
| [phone pattern] | Phone numbers (10 digits) |
| `\b\d{5}(?:-\d{4})?\b` | ZIP codes |
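These are plain regular expressions, so it is worth checking how they behave on realistic text before enabling them. The sketch below runs the four regexes listed in the table through Python's `re` module. That shredguard's own matching behaves the same way is an assumption. `BUILTIN` and `builtin_matches` are hypothetical names, and the sample text is made up.

```python
from __future__ import annotations

import re

# Regexes copied from the table above. The email and phone regexes are not
# listed there, so they are left out rather than guessed.
BUILTIN = {
    "Subject ID": r"SUB-\d{4,6}",
    "Social Security Number": r"\b\d{3}-\d{2}-\d{4}\b",
    "Medical Record Number": r"MRN\d{6,10}",
    "ZIP codes": r"\b\d{5}(?:-\d{4})?\b",
}


def builtin_matches(text: str) -> dict[str, list[str]]:
    # Non-capturing groups keep re.findall returning whole matches.
    return {name: re.findall(rx, text) for name, rx in BUILTIN.items()}


sample = "MRN0012345 lives at 02139-1234; lab batch 55555; SUB-1234567"  # made-up text
for name, found in builtin_matches(sample).items():
    print(f"{name}: {found}")
# Subject ID: ['SUB-123456']
# Social Security Number: []
# Medical Record Number: ['MRN0012345']
# ZIP codes: ['02139-1234', '55555']
```

Two behaviours stand out in the output:

- The ZIP regex flags any standalone five-digit number, such as the lab batch.
- The Subject ID regex has no trailing `\b`, so it matches the first six digits of the longer `SUB-1234567`.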
Exit codes:

| Code | Meaning |
|---|---|
| 0 | Success (no matches found for `check`) |
| 1 | Matches found or error |
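Since a clean scan exits 0, a CI step can gate on the exit code alone. The sketch below assumes `shredguard` is on `PATH`; `phi_gate` is a hypothetical helper. Exit code 1 covers both "matches found" and "error", so the gate cannot tell them apart without reading the scan's printed output.

```python
from __future__ import annotations

import subprocess
import sys


def phi_gate(paths: list[str], command: tuple[str, ...] = ("shredguard", "check")) -> bool:
    """Run the scan and return True only when it exits 0."""
    result = subprocess.run([*command, *paths])
    if result.returncode != 0:
        # Exit code 1 means "matches found OR error"; the scan's own output says which.
        print(f"PHI check failed with exit code {result.returncode}", file=sys.stderr)
        return False
    return True
```

For example, `phi_gate(["data/", "notes.txt"])` returns `False` as soon as the scan reports anything.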
Binary files are automatically detected and skipped (null byte check in first 8KB). Use --verbose to see skipped files.
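The null-byte heuristic described above fits in a few lines. This sketch assumes "8KB" means 8192 bytes. `SNIFF_BYTES`, `is_binary_chunk`, and `looks_binary` are hypothetical names, not shredguard's API.

```python
from pathlib import Path

SNIFF_BYTES = 8 * 1024  # "first 8KB", assumed to mean 8192 bytes


def is_binary_chunk(chunk: bytes) -> bool:
    # Any NUL byte in the sampled prefix marks the file as binary.
    return b"\x00" in chunk


def looks_binary(path: Path) -> bool:
    # Only the first SNIFF_BYTES are read, so a NUL byte later in the file is not seen.
    with path.open("rb") as fh:
        return is_binary_chunk(fh.read(SNIFF_BYTES))
```

One consequence of this heuristic: UTF-16-encoded text contains NUL bytes (for example, `"SSN".encode("utf-16-le")` is `b"S\x00S\x00N\x00"`). Such files would therefore be treated as binary and skipped, if the tool uses exactly this check.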