Zero-Cloud Voice-to-Text for Wayland
WayWhisper records your voice with a keyboard shortcut, transcribes it locally using whisper.cpp, and types the result into your active window. No cloud, no network round-trips, no privacy trade-offs.
Works on any modern Wayland desktop: COSMIC, Hyprland, GNOME, Sway, and more.
- Universal Wayland support: uses wtype for direct typing, with wl-clipboard as fallback for long text
- NVIDIA CUDA acceleration: GPU-accelerated transcription for near-instant results
- Multi-language: pass any Whisper language code as an argument, or set a default at install time
- Translation mode: speak in one language, receive output in English (e.g. waywhisper de:en)
- Smart notifications: animated spinner while recording; notification closes instantly via DBus when typing starts
- PipeWire concurrent safety: records cleanly alongside other active audio streams (Zoom, Discord, etc.)
Requires Debian, Ubuntu, or Pop!_OS (or any other apt-based distro). An NVIDIA GPU is optional but recommended for fast transcription.
The included setup.sh handles everything from scratch:
git clone https://github.com/dcwigk/WayWhisper.git
cd WayWhisper
bash setup.sh
It will:
- Install runtime dependencies via apt (wtype, wl-clipboard, libnotify-bin, libglib2.0-bin, pipewire-audio-client-libraries, build tools)
- Detect your GPU: if nvidia-smi is found, CUDA is suggested automatically
- Ask your hardware preference: [1] NVIDIA GPU (CUDA) or [2] CPU only
- Ask your default language: enter any Whisper language code (e.g. en, de, fr); defaults to en
- Clone and build whisper.cpp into ~/.local/opt/whisper.cpp (compilation takes ~3–5 minutes)
- Download the model: medium for CUDA, small for CPU (best fit for each)
- Deploy the waywhisper script to ~/.local/bin/waywhisper with your chosen language and model baked in
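The GPU detection and model selection steps above could be sketched roughly as follows. This is an illustration, not the contents of setup.sh: the function names suggest_backend and model_for_backend are hypothetical, and the real installer also asks you to confirm the choice.

```shell
# Illustrative sketch only; not taken from setup.sh.

# Hypothetical helper: suggest CUDA when nvidia-smi is found on PATH.
suggest_backend() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        echo "cuda"
    else
        echo "cpu"
    fi
}

# Hypothetical helper: pick the model size paired with each backend above.
model_for_backend() {
    case "$1" in
        cuda) echo "medium" ;;
        *)    echo "small" ;;
    esac
}

# Example: model_for_backend "$(suggest_backend)"
```

Keeping detection separate from the final choice mirrors the installer's behaviour of suggesting CUDA while still letting you pick CPU only.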
Note
The installer is idempotent. Running it again skips the clone step if whisper.cpp is already present and rebuilds cleanly to ensure the correct CUDA/CPU configuration.
If you prefer to set things up yourself:
sudo apt update
sudo apt install -y cmake build-essential git \
wtype wl-clipboard libnotify-bin libglib2.0-bin \
pipewire-audio-client-libraries
# NVIDIA GPU only:
sudo apt install -y nvidia-cuda-toolkit
mkdir -p ~/.local/opt && cd ~/.local/opt
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
Pick one, CUDA (NVIDIA GPU) or CPU:
# Option A: CUDA
cmake -B build -DGGML_CUDA=ON
bash ./models/download-ggml-model.sh medium
# Option B: CPU only
cmake -B build
bash ./models/download-ggml-model.sh small
Then build (takes ~3–5 minutes):
cmake --build build -j"$(nproc)" --config Release
git clone https://github.com/dcwigk/WayWhisper.git
mkdir -p ~/.local/bin
cp WayWhisper/waywhisper ~/.local/bin/waywhisper
chmod +x ~/.local/bin/waywhisper
Important
If you chose CPU only (Option B), update the model path in the installed script:
sed -i 's|ggml-medium\.bin|ggml-small.bin|' ~/.local/bin/waywhisper
Make sure ~/.local/bin is in your PATH.
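One way to check the PATH requirement is a small membership test on the colon-separated list. The helper name path_contains is hypothetical and not part of WayWhisper; this is a minimal sketch that compares entries literally, so it will not match a PATH entry written with a trailing slash.

```shell
# Hypothetical helper: succeed if directory $1 is an entry in the
# colon-separated list $2 (defaults to the current PATH).
path_contains() {
    case ":${2-$PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Print a hint when ~/.local/bin is missing from PATH.
if ! path_contains "$HOME/.local/bin"; then
    echo 'Add to your shell profile: export PATH="$HOME/.local/bin:$PATH"'
fi
```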
mkdir -p ~/.config/waywhisper
cat > ~/.config/waywhisper/config <<EOF
# Full path to the whisper model
WHISPER_MODEL="$HOME/.local/opt/whisper.cpp/models/ggml-small.bin"
# Character limit before falling back to clipboard paste (default: 800)
# WTYPE_CHAR_LIMIT=800
EOF
Important
Use ggml-medium.bin here if you built with CUDA, ggml-small.bin for CPU.
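For illustration, a script could read this file by sourcing it and then applying defaults for anything left unset. This is a sketch under the assumption that the config is treated as plain shell variable assignments; load_config is a hypothetical name, and the actual waywhisper script may read its config differently.

```shell
# Hypothetical loader: source the config if it is readable, then fall back
# to the documented default of 800 for WTYPE_CHAR_LIMIT.
load_config() {
    config_file="${1:-$HOME/.config/waywhisper/config}"
    if [ -r "$config_file" ]; then
        # Assumption: the file contains only shell variable assignments.
        . "$config_file"
    fi
    : "${WTYPE_CHAR_LIMIT:=800}"
}
```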
- Press your shortcut — a spinner notification appears, recording begins
- Speak
- Press the shortcut again — transcription runs, text is typed into your active window
If no speech is detected, a brief "No speech detected" notification is shown instead.
waywhisper de # transcribe in German
waywhisper de:en # speak German, receive English text (translation mode)
waywhisper --profile deen # same, using a named profile from config
waywhisper cancel # abort recording without transcribing
waywhisper --help # show all options (lists defined profiles)
Note
Translation mode only supports output in English, as this is a native constraint of whisper.cpp.
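The LANG and LANG:en argument forms could be mapped to whisper-cli flags along these lines. This is a sketch, not the script's actual parser: lang_flags is a hypothetical helper, and it assumes the script uses whisper-cli's -l (language) and --translate options.

```shell
# Hypothetical helper: turn a LANG or LANG:en argument into whisper-cli flags.
lang_flags() {
    local arg="$1"
    local src="${arg%%:*}"   # text before the first colon
    case "$arg" in
        *:en) echo "-l $src --translate" ;;
        *:*)  echo "unsupported target language: ${arg#*:}" >&2
              return 1 ;;
        *)    echo "-l $src" ;;
    esac
}
```

With this sketch, lang_flags de:en prints "-l de --translate", while a target other than en is rejected, matching the English-only constraint noted above.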
Create ~/.config/waywhisper/config to set persistent defaults:
# ~/.config/waywhisper/config
WTYPE_CHAR_LIMIT=800 # character limit before falling back to clipboard
WHISPER_MODEL="/path/to/custom/model.bin" # override the default model
WHISPER_PROMPT="Dr. Smith, JSON, API" # vocabulary/style hint passed to whisper-cli
# Named profiles — invoke with: waywhisper --profile <name>
PROFILE_deen="de:en" # speak German, receive English
PROFILE_code="en" # English, can pair with WHISPER_PROMPT
WHISPER_PROMPT is passed verbatim to whisper.cpp's --prompt flag before each transcription. Use it to nudge the model toward domain-specific spelling, punctuation style, or terminology. It does not filter or rewrite output; it only biases the decoder. Keep prompts short: a handful of key terms or a one-sentence style hint is sufficient.
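As a rough sketch of how the prompt might reach whisper-cli, the helper below assembles an argument list and appends --prompt only when WHISPER_PROMPT is set. The name whisper_args and the exact set of flags the script passes are assumptions; only -m, -f, -l, and --prompt from whisper-cli are used here.

```shell
# Hypothetical helper: print one whisper-cli argument per line.
# $1 = model path, $2 = WAV file, $3 = language code.
whisper_args() {
    set -- -m "$1" -f "$2" -l "$3"
    if [ -n "${WHISPER_PROMPT-}" ]; then
        # Passed as a single argument so spaces and commas stay intact.
        set -- "$@" --prompt "$WHISPER_PROMPT"
    fi
    printf '%s\n' "$@"
}
```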
Profiles map a short name to any lang argument the script already accepts (LANG or LANG:en). They exist solely to give keyboard shortcut bindings a stable, readable command — waywhisper --profile deen is equivalent to waywhisper de:en. Running waywhisper --help lists all profiles currently defined in your config.
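A profile lookup of this kind can be sketched as a variable-name indirection. The helper resolve_profile is hypothetical and assumes the PROFILE_<name> variables are already set in the shell (for example, after the config has been loaded); the real script may resolve profiles differently.

```shell
# Hypothetical helper: print the value of PROFILE_<name>, or fail.
resolve_profile() {
    local value
    # Reject names that are empty or not valid in a variable name.
    case "$1" in
        ''|*[!A-Za-z0-9_]*)
            echo "invalid profile name: $1" >&2
            return 1 ;;
    esac
    eval "value=\${PROFILE_$1-}"
    if [ -z "$value" ]; then
        echo "unknown profile: $1" >&2
        return 1
    fi
    echo "$value"
}
```

The name check runs before eval so that only plain identifiers are ever expanded.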
Map waywhisper (or waywhisper <lang>) to a shortcut in your desktop environment. Examples:
Settings → Keyboard → Custom Shortcuts → Add Shortcut
| Action | Command | Shortcut |
|---|---|---|
| Voice to text (English) | waywhisper en | Super+V |
| Voice to text (German) | waywhisper de | Super+Shift+V |
Add to ~/.config/hypr/hyprland.conf:
bind = SUPER, V, exec, waywhisper en
bind = SUPER SHIFT, V, exec, waywhisper de
Settings → Keyboard → View and Customize Shortcuts → Custom Shortcuts
| Action | Command | Shortcut |
|---|---|---|
| Voice to text (English) | waywhisper en | Super+V |
| Voice to text (German) | waywhisper de | Super+Shift+V |
Tip
You can define as many language shortcuts as you need. WayWhisper accepts any language code supported by Whisper.
WayWhisper is a toggle script. The first invocation starts pw-record (PipeWire) in the background and writes a state file. The second invocation signals pw-record to stop, feeds the audio to whisper-cli, and types the text directly with wtype (or falls back to clipboard paste via wl-copy for longer transcriptions). The notification is recycled rather than re-created, and closed instantly over DBus once typing begins.
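The two decisions described above, start versus stop and wtype versus clipboard, can be sketched as small helpers. The names toggle_action and output_method are hypothetical, and the comparison used for the character limit (strictly longer than the limit goes to the clipboard) is an assumption rather than confirmed behaviour.

```shell
# Hypothetical helper: decide what this invocation should do.
# An existing state file ($1) means a recording is already running.
toggle_action() {
    if [ -e "$1" ]; then
        echo "stop"
    else
        echo "start"
    fi
}

# Hypothetical helper: choose how to deliver the transcription.
# $1 = text, $2 = character limit (defaults to 800).
output_method() {
    local text="$1" limit="${2:-800}"
    if [ "${#text}" -gt "$limit" ]; then
        echo "clipboard"
    else
        echo "wtype"
    fi
}
```

In a full script, the "start" branch would launch pw-record in the background and write the state file, and the "stop" branch would signal it, run whisper-cli, and then call wtype or wl-copy depending on output_method.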
Inspired by the original hyprflow concept — rewritten for universal Wayland support, CUDA acceleration, and multi-language handling.