PDF Text Extractor

A beautiful, 100% offline desktop app for extracting text from PDF files. Available for macOS and Windows.

Cut your LLM token usage by converting PDFs to .txt files before you upload them. You can also pair it with my document anonymization tool to anonymize documents before they reach an LLM (useful for EU GDPR compliance).

For users — install in 30 seconds

Grab the build for your OS from the latest Release:

macOS — `PDFTextExtractor.zip`

Download PDFTextExtractor.zip and unzip it (Finder unzips automatically on double-click).
Drag PDFTextExtractor.app into your Applications folder.
The very first time you launch it: right-click the app → Open, then click Open in the dialog. (macOS shows that warning for any app that isn't notarized by Apple. It's a one-time click — after that it opens normally from Launchpad/Spotlight.)

Requires macOS 11 (Big Sur) or newer, Apple Silicon or Intel.

Windows — `PDFTextExtractor.exe`

Download PDFTextExtractor.exe.
Double-click it.
The first time, Windows SmartScreen may show "Windows protected your PC" — click More info → Run anyway. (One-time bypass for any unsigned app.)

Requires Windows 10 (1809) or newer, x64.

That's it on either platform. No Python, no Terminal, no pip install. The app is fully self-contained and runs entirely offline — your PDFs never leave your computer.

Features

| Feature | Description |
| --- | --- |
| Drag & Drop | Drop PDFs directly onto the window |
| Open dialog | ⌘O: open one or many PDFs at once |
| Multi-file sidebar | Switch between docs instantly |
| Page markers | Text is split by page for easy navigation |
| Search | ⌘F: highlights all matches in yellow |
| Copy text | ⇧⌘C: copy all extracted text to clipboard |
| Save as .txt | ⌘S: save extracted text as a plain-text file |
| Remove file | ⌘W: remove current doc from the list |
| Show in Finder | Right-click any sidebar item |
| Dark mode | Automatically follows system appearance |
| Background loading | Large PDFs load without freezing the UI |
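
As a rough illustration of the Search feature, the sketch below finds every occurrence of a query so that each one could be highlighted. The function name `find_matches` and the case-insensitive matching are assumptions made for this example, not details taken from pdf_text_extractor.py.

```python
import re


def find_matches(text: str, query: str) -> list[tuple[int, int]]:
    """Return (start, end) character offsets of every non-overlapping match.

    Hypothetical helper: matching is case-insensitive here, which is an
    assumption about how the app's search behaves.
    """
    if not query:
        return []  # an empty query would otherwise match at every position
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return [m.span() for m in pattern.finditer(text)]


if __name__ == "__main__":
    print(find_matches("PDF text. More pdf text.", "pdf"))
```

Each returned span could then be passed to whatever the UI uses to paint a yellow background over that range of the text view.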

Keyboard shortcuts

| Shortcut | Action |
| --- | --- |
| ⌘O | Open PDF(s) |
| ⌘F | Show search bar |
| Escape | Close search bar |
| ⇧⌘C | Copy extracted text |
| ⌘S | Save as .txt |
| ⌘W | Remove current file |

For developers — building the apps

The same pdf_text_extractor.py source compiles to both platforms. You only need this section if you're rebuilding the binary yourself.

macOS .app — one command

cd build_app
./build_app.sh

Outputs dist/PDFTextExtractor.app (~120 MB). First run takes ~1–2 minutes (downloads PyQt6 + PyMuPDF + PyInstaller into build_app/.venv/); rebuilds take ~20 seconds. Requires macOS 11+ with Command Line Tools and Homebrew Python 3.10–3.12.

Windows .exe — one command

On a Windows machine:

cd windows
build_app.bat

Outputs windows\dist\PDFTextExtractor.exe (single self-contained file, ~80 MB). Requires Python 3.10–3.12 from python.org with "Add python.exe to PATH" checked. See windows/README.md for details.

Don't have a Windows machine? Push a tag like v2.1.0 and the build-windows.yml GitHub Actions workflow builds the .exe on a Windows runner and attaches it to the matching Release automatically.

Run from source instead (for quick iteration)

macOS:

chmod +x run.sh
./run.sh

Windows:

pip install -r requirements.txt
python pdf_text_extractor.py

This installs PyMuPDF + PyQt6 into your active Python env and launches the script directly — handy while editing pdf_text_extractor.py.

SwiftUI / Xcode version (alternative — fully native)

The SwiftUI_Xcode/ folder contains a SwiftUI + PDFKit version of the app. PDFKit ships with macOS, so this version has zero runtime dependencies and the resulting binary is only ~2–5 MB.

To build it:

Open Xcode → File → New → Project → macOS → App
Name it PDFTextExtractor, set Interface to SwiftUI, Language to Swift
Delete the auto-generated ContentView.swift and PDFTextExtractorApp.swift (SwiftUI_Xcode/ supplies its own versions of both)
Drag all the .swift files from SwiftUI_Xcode/ into the project
Press ⌘R

Requires Xcode 14+ and macOS 13+.

How the Python version works

PDF text extraction uses PyMuPDF (the fitz library) — one of the fastest and most accurate PDF parsing libraries available. It pulls the actual text layer embedded in the PDF, so no OCR is needed for normal digital PDFs. Scanned-only PDFs without an embedded text layer will show "(No extractable text found)".
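
A minimal sketch of that extraction step, assuming PyMuPDF is installed. The helper names, the `--- Page N ---` marker format, and the choice to skip blank pages are illustrative assumptions; the real pdf_text_extractor.py may differ on all three.

```python
from collections.abc import Iterable

# Message quoted in this README for PDFs without an embedded text layer.
NO_TEXT_MESSAGE = "(No extractable text found)"


def format_pages(page_texts: Iterable[str]) -> str:
    """Join per-page text under page markers (marker format is hypothetical)."""
    sections = [
        f"--- Page {number} ---\n{text.strip()}"
        for number, text in enumerate(page_texts, start=1)
        if text.strip()  # assumption: pages with no text are skipped
    ]
    return "\n\n".join(sections) if sections else NO_TEXT_MESSAGE


def extract_pdf_text(path: str) -> str:
    """Read the embedded text layer of every page with PyMuPDF (no OCR)."""
    import fitz  # PyMuPDF; imported lazily so format_pages works without it

    with fitz.open(path) as doc:
        return format_pages(page.get_text() for page in doc)
```

Calling `extract_pdf_text("report.pdf")` (a placeholder file name) would return one string with a marker before each page that has text, or the fallback message for a scanned-only PDF.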

The SwiftUI version uses Apple's PDFKit framework, which ships with macOS and does the same thing natively.

Distributing to clients

The release pipeline is intentionally boring. To cut a new release covering both platforms:

Update the version in build_app/PDFTextExtractor.spec, windows/version_info.txt, and CHANGELOG.md.

Build and zip the macOS app:

cd build_app && ./build_app.sh
cd ../dist && zip -ry PDFTextExtractor.zip PDFTextExtractor.app

Tag and push:

git tag -a v2.1.1 -m "Release v2.1.1"
git push origin v2.1.1

GitHub Actions automatically builds PDFTextExtractor.exe on a Windows runner and attaches it to the v2.1.1 Release.

Upload the macOS zip to the same Release (run this from the repo root, since the command uses the dist/ path):

gh release upload v2.1.1 dist/PDFTextExtractor.zip

Clients then grab whichever file matches their OS from the Releases page.
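
The version bump in the first step has to agree with the tag you push. The sketch below converts a tag into the four-part tuple that PyInstaller version files use for `filevers` and `prodvers`. It assumes windows/version_info.txt follows that standard layout, which this README does not show, and `tag_to_filevers` is a hypothetical helper rather than part of the repo.

```python
import re


def tag_to_filevers(tag: str) -> tuple[int, int, int, int]:
    """Turn a release tag such as 'v2.1.1' into (major, minor, patch, 0)."""
    match = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"expected a tag like v2.1.1, got {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    # Windows version resources have four fields; the build field is set to 0.
    return (major, minor, patch, 0)


if __name__ == "__main__":
    print(tag_to_filevers("v2.1.1"))
```

Comparing this tuple against the one in the version file before tagging is a cheap way to catch a forgotten bump.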

Want to skip the SmartScreen / right-click → Open step on clients' machines? That requires code signing: Apple notarization for macOS (~$99/yr Developer account) and an OV/EV code-signing certificate for Windows (~$200–400/yr). Both build scripts have clearly marked spots to plug those in.

Repo layout

ChatReadyPDF/
├── pdf_text_extractor.py            ← Shared cross-platform app source
├── requirements.txt                 ← PyMuPDF + PyQt6
├── run.sh                           ← Run from source on macOS/Linux
├── AppIcon.png                      ← Source icon (used by both platforms)
├── ui_preview.png                   ← Screenshot for the README
├── generate_assets.py               ← Helper for regenerating screenshots
│
├── build_app/                       ← macOS build
│   ├── PDFTextExtractor.spec        ← PyInstaller config (.app bundle)
│   ├── build_app.sh                 ← One-command builder
│   ├── AppIcon.icns                 ← Generated by build_app.sh (gitignored)
│   └── .venv/                       ← Build venv (gitignored)
│
├── windows/                         ← Windows build
│   ├── PDFTextExtractor_win.spec    ← PyInstaller config (single .exe)
│   ├── version_info.txt             ← VERSIONINFO resource for the .exe
│   ├── build_app.bat                ← One-command builder
│   ├── AppIcon.ico                  ← Multi-resolution Windows icon
│   └── README.md                    ← Windows-specific build notes
│
├── .github/workflows/
│   └── build-windows.yml            ← Auto-builds .exe on tag push
│
├── dist/                            ← macOS build output (gitignored)
│   └── PDFTextExtractor.app
│
└── SwiftUI_Xcode/                   ← Alternative native macOS version
    ├── PDFTextExtractorApp.swift
    ├── DocumentStore.swift
    ├── ContentView.swift
    ├── SidebarView.swift
    └── TextDetailView.swift

License

MIT. See LICENSE.
