scribebox

Local-first transcription for YouTube URLs and audio/video files (mp3/mp4/m4a/wav) with optional PDF export.

Default (best free local model): Whisper large-v3

Features

YouTube URL → download audio (MP3) → transcribe to TXT (+ optional PDF)
Local media (mp3/mp4/wav/m4a) → transcribe to TXT (+ optional PDF)
Two backends:
- faster-whisper (recommended; fast, accurate; supports CPU int8)
- openai-whisper (fallback; slower)
Progress bar with percentage in the CLI (best-effort, duration-based via ffprobe)
Optional glossary prompt to improve recognition of names and jargon

Requirements

System dependency

You need ffmpeg (and ffprobe) installed and available in PATH.

Ubuntu/Pop!_OS:

sudo apt update
sudo apt install -y ffmpeg

Verify:

ffmpeg -version
ffprobe -version

Python

Recommended: Python 3.12.

Install

Create and activate a virtualenv:

python3.12 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip setuptools wheel

Install with YouTube + recommended backend:

pip install -e ".[youtube,faster-whisper]"

Fallback backend (no faster-whisper):

pip install -e ".[youtube,whisper]"

Developer tools:

pip install -e ".[dev,youtube,faster-whisper]"

Quick start

Transcribe a YouTube URL

scribebox url "https://www.youtube.com/watch?v=VIDEO_ID" --outdir out --pdf

Transcribe a local file

scribebox file path/to/audio.mp3 --outdir out --pdf

Force language (English)

scribebox file audio.mp3 --language en

Reduce CPU RAM usage (int8)

scribebox file audio.mp3 --compute-type int8

CLI usage

General form:

scribebox [GLOBAL OPTIONS] {url,file} ...

Global options can be placed before or after the subcommand.

Subcommands

scribebox url <youtube_url>
- Downloads audio with yt-dlp and produces an MP3 inside --outdir.
- Then transcribes that MP3 and writes outputs to --outdir.
scribebox file <path>
- Transcribes a local media file and writes outputs to --outdir.

Output files

scribebox writes:

--outdir/<input_stem>.txt
--outdir/<input_stem>.pdf (only if --pdf is provided)

For url, <input_stem> is the downloaded MP3 name (usually the video id).

Parameters (all available options)

Output

--outdir PATH
- Output directory. Default: out
--pdf
- Also export a PDF in addition to TXT.

Language and translation

--language CODE
- Force a language code (example: en, pt).
- If omitted, the backend may auto-detect.
--translate
- Translate speech to English when supported.

Model and decoding

--model NAME_OR_PATH
- Model name or local path. Default: large-v3.
--beam-size N
- Beam search size. Default: 5.
- Higher values can improve accuracy but increase runtime.

Backend selection

--backend {faster-whisper,whisper}
- Default: faster-whisper.

Performance / device

--device DEVICE
- Default: cpu.
- Example: cuda (if you have a supported setup).
--compute-type TYPE
- faster-whisper only. Default: int8.
- Common values: int8, float16, float32.

VAD

--no-vad
- Disable VAD filtering (voice activity detection).
- Useful if VAD cuts audio or if you hit dependency errors related to VAD.

Prompting

--prompt-file PATH
- Optional “initial prompt” / glossary file to bias transcription toward correct names and domain terms.
- Best practice: short list of nouns/proper nouns, no storytelling.

Example prompt.txt:

The Adjustment Bureau
David Norris
Elise Sellas
Harry Mitchell
Chairman

Progress bar

--no-progress
- Disable the progress bar.

Progress notes:

When ffprobe is available, scribebox reads total duration and displays percentage.
If duration cannot be read, progress falls back to “seconds processed” without a reliable percent.

Examples

Higher-accuracy English decode (slower)

scribebox file transc/movie_player_david_norris.mp3 \
  --model large-v3 \
  --language en \
  --beam-size 8 \
  --no-vad \
  --prompt-file prompt.txt

YouTube transcription to a dedicated folder

mkdir -p transc
scribebox url "https://www.youtube.com/watch?v=VIDEO_ID" --outdir transc --pdf

Disable progress

scribebox file audio.mp3 --no-progress

Use fallback backend

scribebox file audio.mp3 --backend whisper --model large-v3 --language en

Web app

Run:

uvicorn scribebox.webapp:app --reload

Open: http://127.0.0.1:8000/

The web app provides:

A minimal page to submit a YouTube URL
A minimal page to upload a file

Note: the web app does not currently provide live progress. If you want progress in the browser, typical approaches include background jobs + polling, SSE, or WebSockets.

Troubleshooting

No percentage / progress looks wrong

Ensure ffprobe is available:

ffprobe -version

`onnxruntime` / VAD-related issues

Try disabling VAD:

scribebox file audio.mp3 --no-vad

Or use the fallback backend:

pip install -e ".[youtube,whisper]"
scribebox file audio.mp3 --backend whisper

License

MIT

---

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.vscode		.vscode
src/scribebox		src/scribebox
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

scribebox

Features

Requirements

System dependency

Python

Install

Quick start

Transcribe a YouTube URL

Transcribe a local file

Force language (English)

Reduce CPU RAM usage (int8)

CLI usage

Subcommands

Output files

Parameters (all available options)

Output

Language and translation

Model and decoding

Backend selection

Performance / device

VAD

Prompting

Progress bar

Examples

Higher-accuracy English decode (slower)

YouTube transcription to a dedicated folder

Disable progress

Use fallback backend

Web app

Troubleshooting

No percentage / progress looks wrong

onnxruntime / VAD-related issues

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`onnxruntime` / VAD-related issues

Packages