Hotkey Whisper

Voice-to-text with a single hotkey
Press a key, speak, text appears. Local & private.

Linux | NVIDIA CUDA | MIT License

Hotkey Whisper runs OpenAI's Whisper model 100% locally on your GPU. No cloud. No subscription. No data leaving your machine. Just press a hotkey, speak, and watch your words appear wherever your cursor is.

How It Works

  1. Press the hotkey → recording starts, and text appears as you speak
  2. Speak naturally → words are transcribed in real time
  3. Stop talking → recording stops automatically after 5 seconds of silence
  4. Stay idle → the server stops itself after 5 minutes of inactivity, freeing GPU memory
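The two timeouts in steps 3 and 4 follow the same pattern: remember when activity was last seen and compare the elapsed time against a limit. Here is a minimal sketch of that pattern. The `TimeoutTracker` class and its method names are hypothetical and are not taken from the Hotkey Whisper source. Treating a limit of 0 as "never" follows the documented `idle_timeout` setting; whether the tool applies the same rule to the silence timeout is an assumption.

```python
import time


class TimeoutTracker:
    """Hypothetical helper: reports when no activity was seen for `limit_seconds`."""

    def __init__(self, limit_seconds, clock=time.monotonic):
        self.limit_seconds = limit_seconds
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_activity = clock()

    def touch(self):
        """Record activity (speech detected, or a hotkey request reached the server)."""
        self._last_activity = self._clock()

    def expired(self):
        """Return True once the limit has elapsed since the last activity."""
        if self.limit_seconds <= 0:
            return False  # assumption: 0 disables the timeout, as documented for idle_timeout
        return self._clock() - self._last_activity >= self.limit_seconds


# Values from the steps above: 5 seconds of silence, 5 minutes of inactivity.
silence = TimeoutTracker(5)
idle = TimeoutTracker(300)
```

A recording loop would call `silence.touch()` whenever speech is detected and stop once `silence.expired()` returns True. The server loop would do the same with `idle`.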

Requirements

  • Ubuntu 22.04+ (other distros may work)
  • NVIDIA GPU with 4GB+ VRAM
  • CUDA drivers installed
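To check the VRAM requirement before installing, you can ask `nvidia-smi` for the total memory of each GPU. The sketch below is not part of Hotkey Whisper, and the function names are made up for illustration. It assumes `nvidia-smi` is on your PATH and treats "4GB" as 4096 MiB. Some cards report slightly less than their nominal size, so take the threshold as approximate.

```python
import subprocess

MIN_VRAM_MIB = 4 * 1024  # "4GB+ VRAM" from the requirements, read as 4096 MiB (assumption)


def parse_vram_mib(smi_output):
    """Parse one integer (MiB) per non-empty line of nvidia-smi csv,noheader,nounits output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def query_vram_mib():
    """Ask nvidia-smi for the total memory of every GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if nvidia-smi reports a failure
    )
    return parse_vram_mib(result.stdout)


def meets_requirement(vram_mib_per_gpu, minimum=MIN_VRAM_MIB):
    """True if at least one GPU has the minimum amount of memory."""
    return any(v >= minimum for v in vram_mib_per_gpu)
```

On a machine with the drivers installed, `meets_requirement(query_vram_mib())` gives a quick yes or no.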

Installation

git clone https://github.com/kestavanik/hotkey-whisper
cd hotkey-whisper
./install.sh

Then set up a keyboard shortcut:

  1. Settings → Keyboard → Custom Shortcuts
  2. Add new shortcut with command: hotkey-whisper
  3. Assign your preferred key (I use Ctrl+Alt+W)

Usage

hotkey-whisper              # Toggle recording (bind to hotkey)
hotkey-whisper status       # Check server status
hotkey-whisper stop         # Shutdown server
hotkey-whisper log          # Watch server logs

The first run downloads the Whisper model (~1.5GB) and takes about 30 seconds. Later runs skip the download and start right away.

Configuration

Edit ~/.config/hotkey-whisper/config.json to customize settings:

{
  "model": "base",
  "language": "en",
  "device": "cuda",
  "compute_type": "float16",
  "idle_timeout": 300,
  "silence_timeout": 5,
  "silero_sensitivity": 0.4,
  "post_speech_silence_duration": 0.5,
  "min_length_of_recording": 0.3,
  "realtime_processing_pause": 0.1,
  "typing_delay": 10
}
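If you want to read this file from your own scripts, one way is to overlay it on the defaults shown above. This is a sketch, not the tool's actual loader. `load_config` is a hypothetical name, and falling back to defaults when the file or a key is missing is an assumption about reasonable behavior.

```python
import json
from pathlib import Path

# Default values copied from the example config above.
DEFAULTS = {
    "model": "base",
    "language": "en",
    "device": "cuda",
    "compute_type": "float16",
    "idle_timeout": 300,
    "silence_timeout": 5,
    "silero_sensitivity": 0.4,
    "post_speech_silence_duration": 0.5,
    "min_length_of_recording": 0.3,
    "realtime_processing_pause": 0.1,
    "typing_delay": 10,
}

CONFIG_PATH = Path.home() / ".config" / "hotkey-whisper" / "config.json"


def load_config(path=CONFIG_PATH):
    """Return the defaults overlaid with whatever keys the user's file defines."""
    config = dict(DEFAULTS)
    try:
        with open(path, encoding="utf-8") as f:
            config.update(json.load(f))
    except FileNotFoundError:
        pass  # assumption: a missing file simply means "use the defaults"
    return config
```

With this approach a config file only needs the keys you want to change, for example `{"model": "small"}`.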

Settings

| Setting | Default | Description |
| --- | --- | --- |
| model | "base" | Whisper model: tiny, base, small, medium, large-v3 |
| language | "en" | Language code (e.g., en, es, fr, de) |
| device | "cuda" | cuda for GPU, cpu for CPU-only |
| compute_type | "float16" | float16 for speed, float32 for accuracy |
| idle_timeout | 300 | Seconds before server auto-stops (0 = never) |
| silence_timeout | 5 | Seconds of silence before recording auto-stops |
| typing_delay | 10 | Milliseconds between keystrokes |
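A typo in the config is easy to miss, so a small check against the values listed above can help. This sketch only knows the values this README documents. The tool itself may accept others (for example additional compute types), so read a reported problem as "not listed here" rather than "definitely invalid". `check_config` is a hypothetical helper name.

```python
# Values taken from the settings listed above; the real tool may accept more.
ALLOWED = {
    "model": {"tiny", "base", "small", "medium", "large-v3"},
    "device": {"cuda", "cpu"},
    "compute_type": {"float16", "float32"},
}
NON_NEGATIVE = ("idle_timeout", "silence_timeout", "typing_delay")


def check_config(config):
    """Return a list of human-readable problems; an empty list means nothing was flagged."""
    problems = []
    for key, allowed in ALLOWED.items():
        if key in config and config[key] not in allowed:
            problems.append(f"{key}: {config[key]!r} is not one of {sorted(allowed)}")
    for key in NON_NEGATIVE:
        value = config.get(key)
        if value is None:
            continue  # key not set, so the default applies
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or value < 0:
            problems.append(f"{key}: expected a non-negative number, got {value!r}")
    return problems
```

For example, `check_config({"model": "huge", "typing_delay": -1})` reports two problems, one per key.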

Model Comparison

| Model | VRAM | Speed | Accuracy |
| --- | --- | --- | --- |
| tiny | ~1GB | Fastest | Basic |
| base | ~1GB | Fast | Good |
| small | ~2GB | Medium | Better |
| medium | ~5GB | Slower | Great |
| large-v3 | ~10GB | Slowest | Best |
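The VRAM figures above can be turned into a rough rule for picking a model. This sketch returns the most accurate model whose approximate VRAM need fits a given budget. The numbers are the README's estimates, not measurements, and the `headroom_gb` parameter is a hypothetical knob for leaving memory to other applications.

```python
# (model, approximate VRAM in GB), ordered from least to most accurate, per the comparison above.
MODEL_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large-v3", 10),
]


def largest_model_that_fits(vram_gb, headroom_gb=0.0):
    """Return the most accurate model that fits the VRAM budget, or None if none fits."""
    budget = vram_gb - headroom_gb
    best = None
    for name, needed in MODEL_VRAM_GB:
        if needed <= budget:
            best = name  # later entries are more accurate, so keep overwriting
    return best
```

With the minimum 4GB card from the requirements, `largest_model_that_fits(4)` returns `"small"`. Set the result as `model` in the config file.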

Uninstall

./uninstall.sh

Then remove your keyboard shortcut from Settings.

License

MIT
