Local-first transcription for YouTube URLs and audio/video files (mp3/mp4/m4a/wav) with optional PDF export.
Default (best free local model): Whisper large-v3
- YouTube URL → download audio (MP3) → transcribe to TXT (+ optional PDF)
- Local media (mp3/mp4/wav/m4a) → transcribe to TXT (+ optional PDF)
- Two backends:
- faster-whisper (recommended; fast, accurate; supports CPU int8)
- openai-whisper (fallback; slower)
- Progress bar with percentage in the CLI (best-effort, duration-based via
ffprobe) - Optional glossary prompt to improve recognition of names and jargon
You need ffmpeg (and ffprobe) installed and available in PATH.
Ubuntu/Pop!_OS:
sudo apt update
sudo apt install -y ffmpegVerify:
ffmpeg -version
ffprobe -versionRecommended: Python 3.12.
Create and activate a virtualenv:
python3.12 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip setuptools wheelInstall with YouTube + recommended backend:
pip install -e ".[youtube,faster-whisper]"Fallback backend (no faster-whisper):
pip install -e ".[youtube,whisper]"Developer tools:
pip install -e ".[dev,youtube,faster-whisper]"scribebox url "https://www.youtube.com/watch?v=VIDEO_ID" --outdir out --pdfscribebox file path/to/audio.mp3 --outdir out --pdfscribebox file audio.mp3 --language enscribebox file audio.mp3 --compute-type int8General form:
scribebox [GLOBAL OPTIONS] {url,file} ...Global options can be placed before or after the subcommand.
-
scribebox url <youtube_url>- Downloads audio with
yt-dlpand produces an MP3 inside--outdir. - Then transcribes that MP3 and writes outputs to
--outdir.
- Downloads audio with
-
scribebox file <path>- Transcribes a local media file and writes outputs to
--outdir.
- Transcribes a local media file and writes outputs to
scribebox writes:
--outdir/<input_stem>.txt--outdir/<input_stem>.pdf(only if--pdfis provided)
For url, <input_stem> is the downloaded MP3 name (usually the video id).
-
--outdir PATH- Output directory. Default:
out
- Output directory. Default:
-
--pdf- Also export a PDF in addition to TXT.
-
--language CODE- Force a language code (example:
en,pt). - If omitted, the backend may auto-detect.
- Force a language code (example:
-
--translate- Translate speech to English when supported.
-
--model NAME_OR_PATH- Model name or local path. Default:
large-v3.
- Model name or local path. Default:
-
--beam-size N- Beam search size. Default:
5. - Higher values can improve accuracy but increase runtime.
- Beam search size. Default:
-
--backend {faster-whisper,whisper}- Default:
faster-whisper.
- Default:
-
--device DEVICE- Default:
cpu. - Example:
cuda(if you have a supported setup).
- Default:
-
--compute-type TYPEfaster-whisperonly. Default:int8.- Common values:
int8,float16,float32.
-
--no-vad- Disable VAD filtering (voice activity detection).
- Useful if VAD cuts audio or if you hit dependency errors related to VAD.
-
--prompt-file PATH- Optional “initial prompt” / glossary file to bias transcription toward correct names and domain terms.
- Best practice: short list of nouns/proper nouns, no storytelling.
Example prompt.txt:
The Adjustment Bureau
David Norris
Elise Sellas
Harry Mitchell
Chairman
-
--no-progress- Disable the progress bar.
Progress notes:
- When
ffprobeis available, scribebox reads total duration and displays percentage. - If duration cannot be read, progress falls back to “seconds processed” without a reliable percent.
scribebox file transc/movie_player_david_norris.mp3 \
--model large-v3 \
--language en \
--beam-size 8 \
--no-vad \
--prompt-file prompt.txtmkdir -p transc
scribebox url "https://www.youtube.com/watch?v=VIDEO_ID" --outdir transc --pdfscribebox file audio.mp3 --no-progressscribebox file audio.mp3 --backend whisper --model large-v3 --language enRun:
uvicorn scribebox.webapp:app --reloadOpen: http://127.0.0.1:8000/
The web app provides:
- A minimal page to submit a YouTube URL
- A minimal page to upload a file
Note: the web app does not currently provide live progress. If you want progress in the browser, typical approaches include background jobs + polling, SSE, or WebSockets.
Ensure ffprobe is available:
ffprobe -versionTry disabling VAD:
scribebox file audio.mp3 --no-vadOr use the fallback backend:
pip install -e ".[youtube,whisper]"
scribebox file audio.mp3 --backend whisperMIT
---