This fork adds Wayland support, evdev global hotkeys, and related Linux/service fixes.
A simple push-to-talk voice dictation tool for Linux using faster-whisper. Hold a key to record, release to transcribe, and it automatically copies to clipboard and types into the active input.
- Python 3.10+
- Poetry
- Linux with ALSA audio
- X11, or Wayland with access to the input group for global hotkeys
- Ubuntu / Pop!_OS / Debian (apt)
- Fedora (dnf)
- Arch Linux (pacman)
- openSUSE (zypper)
git clone https://github.com/ksred/soupawhisper.git
cd soupawhisper
chmod +x install.sh
./install.sh
The installer will:
- Detect your package manager
- Install system dependencies
- Install Python dependencies via Poetry
- Set up the config file
- Optionally install as a systemd service
# Ubuntu/Debian
sudo apt install alsa-utils xclip xdotool libnotify-bin
# Fedora
sudo dnf install alsa-utils xclip xdotool libnotify
# Arch
sudo pacman -S alsa-utils xclip xdotool libnotify
# Then install Python deps
poetry install
For NVIDIA GPU acceleration, install cuDNN 9:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install libcudnn9-cuda-12
Then edit ~/.config/soupawhisper/config.ini:
device = cuda
compute_type = float16
For AMD GPU acceleration, install ROCm and a ROCm-enabled build of ctranslate2, then edit ~/.config/soupawhisper/config.ini:
device = amd
compute_type = float16
device = amd, device = rocm, and device = hip are accepted aliases in SoupaWhisper. Internally, these map to the CTranslate2 GPU backend, which requires a ROCm-capable install on AMD hardware.
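The alias handling described above can be pictured as a small lookup table. This is only a sketch: `normalize_device` and `_DEVICE_ALIASES` are hypothetical names, not taken from SoupaWhisper's source, and the choice of `"cuda"` as the target name assumes that ROCm builds of CTranslate2 are addressed through the same GPU device name as NVIDIA builds.

```python
# Hypothetical helper; not SoupaWhisper's actual code.
# Assumption: the CTranslate2 GPU backend is selected with the name "cuda",
# including on ROCm-enabled builds.
_DEVICE_ALIASES = {
    "cpu": "cpu",
    "auto": "auto",
    "cuda": "cuda",
    "nvidia": "cuda",
    "amd": "cuda",
    "rocm": "cuda",
    "hip": "cuda",
}


def normalize_device(value: str) -> str:
    """Map a config.ini device value to a backend device name."""
    key = value.strip().lower()
    try:
        return _DEVICE_ALIASES[key]
    except KeyError:
        # Reject unknown values early instead of passing them to the backend.
        raise ValueError(f"unsupported device: {value!r}") from None
```

With this table, `device = amd` and `device = cuda` end up on the same code path, which is why a missing ROCm install shows up as a backend load error rather than as a config error.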
poetry run python dictate.py
- Hold F12 to record
- Release to transcribe → copies to clipboard and types into active input
- Press Ctrl+C to quit (when running manually)
- On media-key-first keyboards, you may need Fn+F12 unless you switch the top row to function-key mode
- Hotkeys are best configured from make debug-keys output, using exact KEY_* names in .env
- To inspect what key your system is actually sending, run: poetry run python dictate.py --debug-keys
The installer can set this up automatically. If you skipped it, run:
./install.sh # Select 'y' when prompted for systemd
systemctl --user start soupawhisper # Start
systemctl --user stop soupawhisper # Stop
systemctl --user restart soupawhisper # Restart
systemctl --user status soupawhisper # Status
journalctl --user -u soupawhisper -f # View logs
Edit ~/.config/soupawhisper/config.ini:
[whisper]
# Model size: tiny.en, base.en, small.en, medium.en, large-v3
model = base.en
# Device: cpu, auto, cuda/nvidia, or amd/rocm
device = cpu
# Compute type: int8 for CPU, float16 for GPU
compute_type = int8
[hotkey]
# Optional fallback when .env is not used. Prefer KEY_* names from --debug-keys.
key = f12
[behavior]
# Type text into active input field
auto_type = true
# Show desktop notification
notifications = true
Create the config directory and file if they don't exist:
mkdir -p ~/.config/soupawhisper
# Run from the repository root (./ is /path/to/soupawhisper/)
cp ./config.example.ini ~/.config/soupawhisper/config.ini
To override hotkeys from the repo checkout instead, create .env from .env.example:
cp .env.example .env
Example:
SOUPAWHISPER_KEYS=KEY_F12,KEY_LEFTCTRL+KEY_SPACE
When .env is present, SOUPAWHISPER_KEYS overrides [hotkey] key from ~/.config/soupawhisper/config.ini.
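The override order can be sketched in a few lines of Python. The functions `parse_hotkeys` and `resolve_hotkeys` are hypothetical, not SoupaWhisper's own. The sketch assumes that commas separate alternative hotkeys, that `+` joins keys that must be held together, and that a bare name such as `f12` in config.ini corresponds to `KEY_F12`; none of these details are confirmed by the project.

```python
import configparser


def parse_hotkeys(spec: str) -> list[frozenset[str]]:
    """Split 'A,B+C' into [{A}, {B, C}] (assumed syntax, see lead-in)."""
    combos = []
    for part in spec.split(","):
        keys = set()
        for raw in part.split("+"):
            name = raw.strip().upper()
            if not name:
                continue
            # Assumption: bare names like "f12" mean "KEY_F12".
            if not name.startswith("KEY_"):
                name = "KEY_" + name
            keys.add(name)
        if keys:
            combos.append(frozenset(keys))
    return combos


def resolve_hotkeys(env: dict[str, str],
                    config: configparser.ConfigParser) -> list[frozenset[str]]:
    """Prefer SOUPAWHISPER_KEYS; otherwise fall back to [hotkey] key."""
    spec = env.get("SOUPAWHISPER_KEYS", "").strip()
    if not spec:
        spec = config.get("hotkey", "key", fallback="f12")
    return parse_hotkeys(spec)
```

Passing the environment in as a plain dict (for example `dict(os.environ)` after loading .env) keeps the precedence rule easy to check in isolation.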
No audio recording:
# Check your input device
arecord -l
# Test recording
arecord -d 3 test.wav && aplay test.wav
Permission issues with keyboard:
sudo usermod -aG input $USER
# Then log out completely and back in before restarting the service
Wayland notes:
make debug-keys
Use this to find the exact key names your keyboard is sending, then paste them into .env.
Examples:
SOUPAWHISPER_KEYS=KEY_F12
SOUPAWHISPER_KEYS=KEY_LEFTCTRL+KEY_SPACE
SOUPAWHISPER_KEYS=KEY_F12,KEY_LEFTCTRL+KEY_SPACE
On Wayland, SoupaWhisper only watches keyboard events for the configured hotkey. It does not grab or replay your keyboard input, because partial grabs can leave mismatched key press/release state behind.
On Wayland, clipboard copy should still work, but xdotool auto-typing may not work in native Wayland apps.
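To illustrate the watch-only approach from the Wayland notes, here is a minimal sketch of a passive hotkey tracker. `HotkeyWatcher` is a hypothetical class, not SoupaWhisper's implementation. It relies only on the standard evdev key event values (0 for release, 1 for press, 2 for autorepeat) and assumes the caller reads events from the keyboard device and passes in the key name and value.

```python
KEY_UP, KEY_DOWN, KEY_REPEAT = 0, 1, 2  # evdev EV_KEY event values


class HotkeyWatcher:
    """Track held keys and report when a configured combo starts or stops.

    Events are only observed, never grabbed or replayed, so other
    applications keep seeing consistent press/release pairs.
    """

    def __init__(self, combos):
        self.combos = [frozenset(c) for c in combos]
        self.pressed = set()
        self.active = False

    def feed(self, key_name, value):
        """Return 'start', 'stop', or None for one key event."""
        if value == KEY_DOWN:
            self.pressed.add(key_name)
        elif value == KEY_UP:
            self.pressed.discard(key_name)
        # KEY_REPEAT does not change which keys are held.

        held = any(combo <= self.pressed for combo in self.combos)
        if held and not self.active:
            self.active = True
            return "start"  # begin recording
        if not held and self.active:
            self.active = False
            return "stop"   # stop recording and transcribe
        return None
```

Because the tracker never suppresses events, the hotkey itself still reaches the focused application, which is the trade-off for avoiding a keyboard grab.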
cuDNN errors with GPU:
Unable to load any of {libcudnn_ops.so.9...}
Install cuDNN 9 (see GPU Support section above) or switch to CPU mode.
AMD ROCm errors with GPU:
If device = amd fails at startup, the most common cause is that ctranslate2 is still using the default non-ROCm wheel or ROCm is not installed on the system. In that case, switch back to device = cpu until the ROCm stack is installed.
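The manual switch back to CPU can also be automated in a wrapper script. The sketch below is hypothetical and is not something SoupaWhisper is known to do: `load_with_cpu_fallback` takes a caller-supplied `load_model(device, compute_type)` callable, and the set of exception types caught is an assumption about how a GPU backend might fail to load.

```python
def load_with_cpu_fallback(load_model, device, compute_type):
    """Try the requested device; on failure, retry on CPU with int8.

    load_model is a caller-supplied callable (hypothetical), for example
    a thin wrapper around the model constructor.
    Returns a (model, device_used) tuple.
    """
    try:
        return load_model(device, compute_type), device
    except (RuntimeError, ValueError, OSError) as exc:
        # Assumption: backend load failures surface as one of these types.
        if device == "cpu":
            raise  # nothing left to fall back to
        print(f"{device} backend unavailable ({exc}); falling back to cpu/int8")
        return load_model("cpu", "int8"), "cpu"
```

The fallback uses int8 because that is the compute type the config comments recommend for CPU.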
| Model | Size | Speed | Accuracy |
|---|---|---|---|
| tiny.en | ~75MB | Fastest | Basic |
| base.en | ~150MB | Fast | Good |
| small.en | ~500MB | Medium | Better |
| medium.en | ~1.5GB | Slower | Great |
| large-v3 | ~3GB | Slowest | Best |
For dictation, base.en or small.en is usually the sweet spot.