A local command-line and desktop tool that scans PDF and JPG files for U.S. Social Security Numbers and produces permanently redacted copies.
All processing runs 100% on your machine. No data is sent to any external service.
- Detects SSNs in three formats: `123-45-6789`, `123 45 6789`, `123456789`
- Replaces detected SSNs with `XXX-XX-XXXX`
- Supports PDF files (text-layer extraction) and JPG/JPEG images (OCR via Tesseract)
- Preserves original files — redacted copies go to a separate output folder
- Two interfaces: GUI desktop app and CLI
- Prints a per-file redaction report
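As a rough illustration of the detection and replacement described above, the following sketch matches the three formats in plain text and substitutes the placeholder. It is not taken from `engine.py`: the names `SSN_PATTERN`, `PLACEHOLDER`, and `redact_text` are hypothetical, and the choices to reject mixed separators (such as `123-45 6789`) and longer digit runs are assumptions of this sketch. The range checks follow the area/group/serial rules this README lists for the tool's regex.

```python
import re

# Hypothetical pattern for illustration only; the real one lives in engine.py.
# Area: not 000, 666, or 900-999. Group: not 00. Serial: not 0000.
SSN_PATTERN = re.compile(
    r"(?<!\d)"                  # not preceded by another digit
    r"(?!000|666|9\d\d)\d{3}"   # area number
    r"([- ]?)"                  # separator: hyphen, space, or nothing
    r"(?!00)\d{2}"              # group number
    r"\1"                       # the same separator again (assumption)
    r"(?!0000)\d{4}"            # serial number
    r"(?!\d)"                   # not followed by another digit
)

PLACEHOLDER = "XXX-XX-XXXX"


def redact_text(text: str) -> tuple[str, int]:
    """Return the text with SSNs replaced, plus the number of replacements."""
    return SSN_PATTERN.subn(PLACEHOLDER, text)


if __name__ == "__main__":
    redacted, count = redact_text("Applicant SSN: 123-45-6789")
    print(redacted, count)
```

The sketch works on strings only; the tool itself applies the replacement to PDF pages and image pixels rather than to extracted text.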
Double-click the executable:
dist\SSN Redactor\SSN Redactor.exe
- Click Browse and select the folder with your PDFs/JPGs
- Click Start Redaction
- Check the `redacted_pdfs` subfolder for results
python -m ssn_redactor.cli "C:\path\to\your\folder"
Example output:
Processing 3 file(s) in C:\Users\you\Documents\invoices ...
[1/3] invoice_01.pdf
[2/3] scan_02.jpg
[3/3] letter_03.pdf
========================================================
SSN Redaction Report
========================================================
invoice_01.pdf 2 SSN(s) redacted
scan_02.jpg 1 SSN(s) redacted
letter_03.pdf 0 SSNs found
========================================================
Total SSNs redacted: 3
Output: C:\Users\you\Documents\invoices\redacted_pdfs
========================================================
usage: ssn-redactor [-h] [-o OUTPUT_DIR] [-v] folder

positional arguments:
  folder            Path to a folder containing PDF and/or JPG files.

options:
  -h, --help        Show this help message and exit.
  -o, --output-dir  Name of the output subdirectory (default: "redacted_pdfs").
  -v, --version     Show version and exit.
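For readers who want to see how a help screen like the one above is typically produced, here is a minimal `argparse` sketch. It is an assumption about structure, not the contents of `cli.py`: the function name `build_parser`, the description text, and the version string `0.0.0` are placeholders.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose options mirror the usage text in this README."""
    parser = argparse.ArgumentParser(
        prog="ssn-redactor",
        description="Redact U.S. Social Security Numbers in PDF and JPG files.",
    )
    parser.add_argument(
        "folder",
        help="Path to a folder containing PDF and/or JPG files.",
    )
    parser.add_argument(
        "-o", "--output-dir",
        default="redacted_pdfs",
        help='Name of the output subdirectory (default: "redacted_pdfs").',
    )
    # The version string below is a placeholder, not the project's real version.
    parser.add_argument(
        "-v", "--version",
        action="version",
        version="%(prog)s 0.0.0",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.folder, args.output_dir)
```

Calling `build_parser().parse_args(["C:\\docs", "-o", "out"])` yields `folder="C:\\docs"` and `output_dir="out"`; omitting `-o` falls back to `redacted_pdfs`.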
One script installs Python, Tesseract, and the Python dependencies, then builds the exe:
git clone https://github.com/codepros100-dev/pdf-ssn-redactor.git
cd pdf-ssn-redactor
setup.bat
That's it. When it finishes, your exe is at dist\SSN Redactor\SSN Redactor.exe.
If you prefer to do it yourself:
Step 1: Install Python 3.10 or newer
Download from python.org or run:
winget install Python.Python.3.12
During install, check "Add Python to PATH".
Step 2: Install Tesseract OCR (only needed for JPG files)
winget install UB-Mannheim.TesseractOCR
Step 3: Clone and enter the project
git clone https://github.com/codepros100-dev/pdf-ssn-redactor.git
cd pdf-ssn-redactor
Step 4: Install Python packages
pip install -r requirements-dev.txt
Step 5: Build the exe
build.bat
The app is output to dist\SSN Redactor\SSN Redactor.exe. Copy the entire SSN Redactor folder anywhere to use it.
pdf-ssn-redactor/
    ssn_redactor/
        __init__.py           Package metadata and version
        __main__.py           Enables `python -m ssn_redactor`
        engine.py             Core detection and redaction logic
        cli.py                Command-line interface
        gui.py                Desktop GUI (CustomTkinter)
    docs/
        USAGE_GUIDE.md        Step-by-step guide for non-technical users
    .gitignore
    setup.bat                 One-click setup & build (installs everything)
    build.bat                 Build script (if dependencies are already installed)
    LICENSE                   MIT
    README.md                 This file
    requirements.txt          Runtime dependencies
    requirements-dev.txt      Build dependencies (adds PyInstaller)
- PDF files: Text is extracted with pdfplumber. SSNs are located using regex. PyMuPDF applies permanent redaction annotations over each match.
- JPG files: Tesseract OCR extracts word-level bounding boxes. SSNs are detected in the OCR text, matched back to pixel coordinates, and covered with white rectangles plus placeholder text.
- Output: Redacted copies are saved to a `redacted_pdfs` subfolder. Original files are never modified.
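The step of matching an SSN found in OCR text back to pixel coordinates can be sketched in plain Python. This is one possible approach, not the code in `engine.py`. The `Word` tuple is a hypothetical stand-in for a single OCR word, with fields named after the left/top/width/height columns of Tesseract's word-level output. The sketch assumes all words come from one OCR line and uses a simplified pattern without range checks.

```python
import re
from typing import NamedTuple


class Word(NamedTuple):
    """Hypothetical stand-in for one OCR word and its bounding box."""
    text: str
    left: int
    top: int
    width: int
    height: int


# Simplified pattern for the three formats; range checks omitted for brevity.
SSN = re.compile(r"(?<!\d)\d{3}([- ]?)\d{2}\1\d{4}(?!\d)")


def ssn_boxes(words: list[Word]) -> list[tuple[int, int, int, int]]:
    """Return one (left, top, right, bottom) box per SSN found in the words."""
    # Join the words with single spaces and record each word's character span.
    spans = []
    pos = 0
    for w in words:
        spans.append((pos, pos + len(w.text)))
        pos += len(w.text) + 1
    line = " ".join(w.text for w in words)

    boxes = []
    for m in SSN.finditer(line):
        # Every word whose span overlaps the match contributes to the box.
        # A word such as "SSN:123-45-6789" is covered in full, label included.
        hit = [w for w, (s, e) in zip(words, spans)
               if s < m.end() and e > m.start()]
        boxes.append((
            min(w.left for w in hit),
            min(w.top for w in hit),
            max(w.left + w.width for w in hit),
            max(w.top + w.height for w in hit),
        ))
    return boxes
```

Tracking character spans matters for the space-separated format, where OCR usually returns `123`, `45`, and `6789` as three separate words and the redaction rectangle has to span all of them.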
- No network calls. The tool works fully offline. No telemetry, no analytics.
- Original files are never modified — only copies are written to the output folder.
- SSN data is never logged. Error messages are sanitized to prevent SSN leakage.
- Input validation:
- Files over 500 MB are rejected.
- PDFs over 10,000 pages are rejected.
- Images over 100 megapixels are rejected (decompression bomb guard).
- Output directory names are validated against path traversal attacks.
- SSN regex validates area/group/serial ranges per SSA rules (rejects 000, 666, 900-999 area numbers, 00 group, 0000 serial).
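The input checks listed above could look roughly like the sketch below. The function names, the error strings, and the exact rules for an acceptable output directory name are assumptions; the README does not say whether "500 MB" means 500 × 1024² bytes or 500 × 10⁶, so the constant here is a guess.

```python
# Limits taken from the list above; the exact byte interpretation is assumed.
MAX_FILE_BYTES = 500 * 1024 * 1024
MAX_PDF_PAGES = 10_000
MAX_IMAGE_PIXELS = 100_000_000


def limit_violations(
    file_bytes: int,
    pdf_pages: int = 0,
    image_size: tuple[int, int] = (0, 0),
) -> list[str]:
    """Return the reasons a file should be rejected (empty list if none)."""
    problems = []
    if file_bytes > MAX_FILE_BYTES:
        problems.append("file larger than 500 MB")
    if pdf_pages > MAX_PDF_PAGES:
        problems.append("PDF has more than 10,000 pages")
    width, height = image_size
    if width * height > MAX_IMAGE_PIXELS:
        problems.append("image larger than 100 megapixels")
    return problems


def is_safe_output_name(name: str) -> bool:
    """Accept only a single plain directory name such as 'redacted_pdfs'."""
    if not name or name in (".", ".."):
        return False
    # Path separators or a drive colon could point outside the input folder.
    return not any(ch in name for ch in ("/", "\\", ":"))
```

The messages deliberately contain only limits and never file contents, in line with the rule that SSN data is never logged.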
MIT — see LICENSE.