WiseArts/transcribe.py

🎙 transcribe.py

A clean, terminal-based video and audio transcription tool powered by OpenAI Whisper — fully open-source, runs locally, no API keys required.

Python 3.8+ License: MIT


Features

  • 🎬 Transcribe video or audio files directly from the terminal
  • ⚡ Choose from 5 Whisper model sizes — from blazing-fast to highly accurate
  • 📄 Export to plain text, SRT, WebVTT, or JSON (with timestamps)
  • 🌍 Automatic language detection — no configuration needed
  • 🔒 Runs entirely offline — your files never leave your machine
  • 💅 Clean, interactive UI powered by Rich

Requirements

  • Python 3.8+
  • ffmpeg (system-level)

Installation

1. Clone the repo

git clone https://github.com/WiseArts/transcribe.git
cd transcribe

2. Install ffmpeg

# macOS
brew install ffmpeg

# Ubuntu / Debian
sudo apt install ffmpeg

# Windows (via Chocolatey)
choco install ffmpeg

3. Install Python dependencies

pip install openai-whisper rich

Note: The first time you run the tool, Whisper will automatically download the selected model weights and cache them locally. This is a one-time download per model.
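
Because ffmpeg is a system-level requirement rather than a pip package, it can help to confirm it is visible on your PATH before the first run. The following is a minimal sketch using only the standard library; the helper name `ffmpeg_available` is hypothetical and is not part of transcribe.py.

```python
import shutil


def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the named executable can be found on PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found")
    else:
        print("ffmpeg not found on PATH; install it before transcribing")
```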


Usage

python transcribe.py

The tool walks you through three steps:

  1. File — enter the path to your video or audio file (drag & drop into the terminal works on most systems)
  2. Model — pick a size based on how fast vs. accurate you need it
  3. Output format — choose how you want the transcript saved

The output file is saved alongside your source file (e.g. interview.mp4 → interview.srt).
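
One way to derive such an output path is to swap the source file's extension for the chosen format's extension. This is a sketch of that idea, not necessarily how transcribe.py does it; the function name `output_path` is an assumption made for illustration.

```python
from pathlib import Path


def output_path(source: str, extension: str) -> Path:
    """Return a path next to `source` with its suffix replaced by `extension`."""
    # Accept both "srt" and ".srt" for convenience.
    return Path(source).with_suffix("." + extension.lstrip("."))


if __name__ == "__main__":
    print(output_path("interview.mp4", "srt"))
```

Only the final suffix is replaced, so a name like `interview.final.mp4` would become `interview.final.srt`.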


Model Options

| # | Model | Speed | Quality | VRAM | Best for |
|---|--------|------------|------------|--------|-----------------------------|
| 1 | tiny | ██████████ | ███░░░░░░░ | ~1 GB | Quick drafts, fast machines |
| 2 | base | ████████░░ | █████░░░░░ | ~1 GB | Everyday use (default) |
| 3 | small | ██████░░░░ | ███████░░░ | ~2 GB | Better accuracy, still fast |
| 4 | medium | ████░░░░░░ | █████████░ | ~5 GB | High quality, multilingual |
| 5 | large | ██░░░░░░░░ | ██████████ | ~10 GB | Best possible accuracy |
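
The numbered menu above can be modeled as a lookup from the menu number to a Whisper model name, falling back to `base` as the default. This is a sketch under assumptions: the names `MODELS`, `DEFAULT_MODEL`, and `resolve_model` are hypothetical, and the real script may handle invalid input differently.

```python
# Menu number -> Whisper model name, in the order shown in the table.
MODELS = {"1": "tiny", "2": "base", "3": "small", "4": "medium", "5": "large"}
DEFAULT_MODEL = "base"


def resolve_model(choice: str) -> str:
    """Map a menu choice to a model name; fall back to the default otherwise."""
    return MODELS.get(choice.strip(), DEFAULT_MODEL)


if __name__ == "__main__":
    print(resolve_model("4"))
```

The resulting name is what you would pass to `whisper.load_model()`.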

Supported File Formats

Video: .mp4 .mov .avi .mkv .webm .flv

Audio: .mp3 .wav .m4a .aac .ogg .flac
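
A simple extension check against the two lists above could look like the following. It is an illustrative sketch only; `media_kind` and the two set names are assumptions, and the actual script may validate files in another way (ffmpeg itself can often decode formats beyond these).

```python
from pathlib import Path
from typing import Optional

VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".webm", ".flv"}
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".aac", ".ogg", ".flac"}


def media_kind(path: str) -> Optional[str]:
    """Return "video", "audio", or None based on the file extension."""
    suffix = Path(path).suffix.lower()  # case-insensitive comparison
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    return None


if __name__ == "__main__":
    print(media_kind("interview.mp4"))
```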


Output Formats

| Format | Extension | Description |
|------------|-----------|-----------------------------------------------------------|
| Plain text | .txt | Clean transcript, one line per segment |
| SRT | .srt | Subtitles with timestamps (video players, Premiere, etc.) |
| WebVTT | .vtt | Web subtitles for HTML5 <video> tags |
| JSON | .json | Full Whisper output with segment-level confidence data |
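
To illustrate how timestamped segments turn into subtitles, here is a minimal sketch that renders SRT text from a list of segments. It assumes each segment is a dict with `start`, `end` (seconds), and `text` keys, which is the shape of the `segments` list in Whisper's result; the helper names `srt_timestamp` and `to_srt` are hypothetical and not taken from transcribe.py.

```python
from typing import Dict, Iterable


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: Iterable[Dict]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [{"start": 0.0, "end": 2.5, "text": " Hello there."}]
    print(to_srt(demo))
```

WebVTT is similar in spirit: it starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.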

Performance Notes

  • CPU vs GPU: The script passes fp16=False to the transcribe() call, which forces 32-bit floats so that it runs cleanly on CPU-only machines. If you have an NVIDIA GPU with CUDA, remove the fp16=False flag to enable half precision for a significant speedup.
  • Speed: As a rough guide on CPU, base transcribes at roughly 4–8× real-time speed, so a 10-minute video takes roughly 1.25–2.5 minutes.
  • Accuracy: Whisper performs best on clear speech with minimal background noise. The medium and large models handle accents and technical vocabulary noticeably better.
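
Instead of editing the flag by hand, the half-precision setting could be chosen at runtime based on whether CUDA is available. This is a hedged sketch, not the script's current behavior: `should_use_fp16` is a hypothetical helper, and it assumes PyTorch (installed as a dependency of openai-whisper) is importable.

```python
def should_use_fp16() -> bool:
    """Return True only when PyTorch reports a usable CUDA device."""
    try:
        import torch  # installed alongside openai-whisper
    except ImportError:
        # Without PyTorch there is no GPU path, so stay on 32-bit floats.
        return False
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    print("fp16 enabled:", should_use_fp16())
```

The result could then be passed along as `model.transcribe(path, fp16=should_use_fp16())`, where `model` and `path` stand in for whatever the script already uses.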

Dependencies

| Package | Purpose |
|----------------|-----------------------------------|
| openai-whisper | Speech-to-text transcription |
| rich | Terminal UI |
| ffmpeg | Audio extraction from video files |

License

MIT — do whatever you like with it.
