🎙 transcribe.py

A clean, terminal-based video and audio transcription tool powered by OpenAI Whisper — fully open-source, runs locally, no API keys required.

Features

🎬 Transcribe video or audio files directly from the terminal
⚡ Choose from 5 Whisper model sizes — from blazing-fast to highly accurate
📄 Export to plain text, SRT, WebVTT, or JSON (with timestamps)
🌍 Automatic language detection — no configuration needed
🔒 Runs entirely offline — your files never leave your machine
💅 Clean, interactive UI powered by Rich

Requirements

Python 3.8+
ffmpeg (system-level)

Installation

1. Clone the repo

git clone https://github.com/WiseArts/transcribe.git
cd transcribe

2. Install ffmpeg

# macOS
brew install ffmpeg

# Ubuntu / Debian
sudo apt install ffmpeg

# Windows (via Chocolatey)
choco install ffmpeg

3. Install Python dependencies

pip install openai-whisper rich

Note: The first time you run the tool, Whisper will automatically download the selected model weights and cache them locally. This is a one-time download per model.
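
To see which models are already downloaded, a small helper along these lines can list the cached weight files. This is a sketch, not part of transcribe.py: it assumes the default cache location used by openai-whisper (~/.cache/whisper, or $XDG_CACHE_HOME/whisper when that variable is set), and the function names are hypothetical.

```python
import os
from pathlib import Path


def whisper_cache_dir() -> Path:
    """Return the directory where Whisper is assumed to cache model weights."""
    # Assumption: ~/.cache/whisper, or $XDG_CACHE_HOME/whisper when that is set.
    base = os.environ.get("XDG_CACHE_HOME") or os.path.join(
        os.path.expanduser("~"), ".cache"
    )
    return Path(base) / "whisper"


def cached_models(cache_dir: Path) -> list:
    """List the names of weight files (*.pt) already present in cache_dir."""
    if not cache_dir.is_dir():
        return []
    return sorted(p.stem for p in cache_dir.glob("*.pt"))


if __name__ == "__main__":
    print("Cached models:", cached_models(whisper_cache_dir()) or "none yet")
```

If a model is missing from the list, the next run that selects it will trigger the download.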

Usage

python transcribe.py

The tool walks you through three steps:

1. File: enter the path to your video or audio file (dragging and dropping it into the terminal works on most systems)
2. Model: pick a size based on the speed/accuracy trade-off you need
3. Output format: choose how you want the transcript saved
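
The three steps above roughly correspond to the following sketch. It is not the actual source of transcribe.py: clean_path and transcribe_file are hypothetical names, and the quote-stripping assumes that a dragged-in path may arrive wrapped in quotes. The calls whisper.load_model() and model.transcribe() are the standard openai-whisper API.

```python
def clean_path(raw: str) -> str:
    """Strip whitespace and surrounding quotes that a drag-and-drop may add."""
    return raw.strip().strip("'\"")


def transcribe_file(path: str, model_name: str = "base") -> dict:
    """Run Whisper on one file and return its result dictionary."""
    # Imported lazily so the helper above works without Whisper installed.
    import whisper  # requires: pip install openai-whisper

    model = whisper.load_model(model_name)  # downloads weights on first use
    # With no language argument, Whisper detects the language itself.
    return model.transcribe(path, fp16=False)
```

The returned dictionary contains the keys "text", "segments", and "language", so the detected language can be shown with something like result["language"] once transcription finishes.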

The output file is saved alongside your source file (e.g. interview.mp4 → interview.srt).
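
One way to derive that output path is to swap the file extension with pathlib. This is a minimal sketch; the EXTENSIONS mapping and the output_path name are illustrative and may not match what the script uses internally.

```python
from pathlib import Path

# Hypothetical mapping from the chosen output format to a file extension.
EXTENSIONS = {"txt": ".txt", "srt": ".srt", "vtt": ".vtt", "json": ".json"}


def output_path(source: str, fmt: str) -> Path:
    """Return a path next to the source file with the extension for fmt."""
    # with_suffix() replaces only the last suffix, so the folder and the
    # rest of the file name are left untouched.
    return Path(source).with_suffix(EXTENSIONS[fmt])


print(output_path("interview.mp4", "srt"))  # interview.srt
```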

Model Options

#	Model	Speed	Quality	VRAM	Best for
1	tiny	██████████	███░░░░░░░	~1 GB	Quick drafts, fast machines
2	base	████████░░	█████░░░░░	~1 GB	Everyday use (default)
3	small	██████░░░░	███████░░░	~2 GB	Better accuracy, still fast
4	medium	████░░░░░░	█████████░	~5 GB	High quality, multilingual
5	large	██░░░░░░░░	██████████	~10 GB	Best possible accuracy
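
As an illustration of how the menu above could map to Whisper model names, here is a small sketch. The function names are hypothetical, and the VRAM figures are the approximate values from the table, not measurements.

```python
# Menu number -> Whisper model name, in the order shown in the table.
MODELS = {"1": "tiny", "2": "base", "3": "small", "4": "medium", "5": "large"}
DEFAULT_CHOICE = "2"  # base

# Approximate VRAM needs in GB, copied from the table (smallest first).
MODEL_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def resolve_model(choice: str) -> str:
    """Turn the user's menu input into a model name; empty input means default."""
    choice = choice.strip() or DEFAULT_CHOICE
    if choice not in MODELS:
        raise ValueError(f"expected a number from 1 to 5, got {choice!r}")
    return MODELS[choice]


def largest_model_that_fits(vram_gb: float):
    """Return the largest model whose approximate VRAM need fits, or None."""
    best = None
    for name, needed in MODEL_VRAM_GB.items():
        if needed <= vram_gb:
            best = name
    return best
```

For example, resolve_model("") falls back to base, and largest_model_that_fits(4) suggests small.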

Supported File Formats

Video: .mp4 .mov .avi .mkv .webm .flv

Audio: .mp3 .wav .m4a .aac .ogg .flac
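
A quick way to check a file against these lists before handing it to Whisper is to compare its extension case-insensitively. This is a sketch with a hypothetical media_kind helper; the script itself may validate input differently.

```python
from pathlib import Path

VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".webm", ".flv"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".aac", ".ogg", ".flac"}


def media_kind(path: str):
    """Return "video", "audio", or None for an unsupported extension."""
    ext = Path(path).suffix.lower()  # lower() so "Clip.MP4" also matches
    if ext in VIDEO_EXTS:
        return "video"
    if ext in AUDIO_EXTS:
        return "audio"
    return None
```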

Output Formats

Format	Extension	Description
Plain text	`.txt`	Clean transcript, one line per segment
SRT	`.srt`	Subtitles with timestamps (video players, Premiere, etc.)
WebVTT	`.vtt`	Web subtitles for HTML5 `<video>` tags
JSON	`.json`	Full Whisper output with segment-level confidence data
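
To show how the subtitle formats differ, the sketch below converts Whisper-style segments (dictionaries with "start", "end", and "text" keys, as found in result["segments"]) into SRT and WebVTT text. The function names are illustrative and are not taken from transcribe.py.

```python
def format_timestamp(seconds: float, decimal_marker: str) -> str:
    """Format seconds as HH:MM:SS<marker>mmm (SRT uses ',', WebVTT uses '.')."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{ms:03d}"


def to_srt(segments) -> str:
    """Build SRT text: numbered cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments) -> str:
    """Build WebVTT text: a WEBVTT header followed by unnumbered cues."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        lines += [f"{start} --> {end}", seg["text"].strip(), ""]
    return "\n".join(lines)


segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 4.0, "text": " Welcome back."},
]
print(to_srt(segments))
```

The visible differences between the two are the WEBVTT header, the cue numbers in SRT, and the decimal marker in the timestamps.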

Performance Notes

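CPU vs GPU: The script disables half-precision inference by default (fp16=False) so it works on any machine, including CPU-only ones, where Whisper does not support FP16. If you have an NVIDIA GPU with CUDA, remove the fp16=False flag in the transcribe() call to enable half precision for a significant speedup.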
Speed: As a rough guide, base on CPU transcribes at roughly 4–8× real-time speed, so a 10-minute video takes between about 1.25 and 2.5 minutes.
Accuracy: Whisper performs best on clear speech with minimal background noise. The medium and large models handle accents and technical vocabulary noticeably better.
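
Instead of editing the flag by hand, the fp16 setting could be chosen at runtime. The sketch below is one possible approach, not what the script currently does: detect_cuda and transcribe_options are hypothetical helpers, and torch.cuda.is_available() is the standard PyTorch check. The last function works through the arithmetic behind the 4–8× speed estimate.

```python
def detect_cuda() -> bool:
    """Return True if PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def transcribe_options(cuda_available: bool) -> dict:
    """Keyword arguments for model.transcribe(): use FP16 only on a CUDA GPU."""
    return {"fp16": cuda_available}


def estimated_minutes(media_minutes: float, speed_factor: float) -> float:
    """Estimate processing time from a real-time speed factor (e.g. 4 for 4x)."""
    return media_minutes / speed_factor


print(transcribe_options(detect_cuda()))
print(estimated_minutes(10, 4), estimated_minutes(10, 8))  # 2.5 1.25
```

The options can then be passed along as model.transcribe(path, **transcribe_options(detect_cuda())).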

Dependencies

Package	Purpose
openai-whisper	Speech-to-text transcription
rich	Terminal UI
ffmpeg	Audio extraction from video files

License

MIT — do whatever you like with it.
