dcwigk/WayWhisper

WayWhisper

Zero-Cloud Voice-to-Text for Wayland

WayWhisper records your voice with a keyboard shortcut, transcribes it locally using whisper.cpp, and types the result into your active window. No cloud, no network round-trips, no privacy trade-offs.

Works on any modern Wayland desktop: COSMIC, Hyprland, GNOME, Sway, and more.

Features

  • Universal Wayland support — uses wtype for direct typing, with wl-clipboard as fallback for long text
  • NVIDIA CUDA acceleration — GPU-accelerated transcription for near-instant results
  • Multi-language — pass any Whisper language code as an argument, or set a default at install time
  • Translation mode — speak in one language, receive output in English (e.g. waywhisper de:en)
  • Smart notifications — animated spinner while recording; notification closes instantly via DBus when typing starts
  • PipeWire concurrent safety — records cleanly alongside other active audio streams (Zoom, Discord, etc.)

Prerequisites

The installer targets Debian, Ubuntu, Pop!_OS, or any other apt-based distro. An NVIDIA GPU is optional but recommended for fast transcription.

Interactive Installer

The included setup.sh handles everything from scratch:

git clone https://github.com/dcwigk/WayWhisper.git
cd WayWhisper
bash setup.sh

It will:

  1. Install runtime dependencies via apt (wtype, wl-clipboard, libnotify-bin, libglib2.0-bin, pipewire-audio-client-libraries, build tools)
  2. Detect your GPU — if nvidia-smi is found, CUDA is suggested automatically
  3. Ask your hardware preference: [1] NVIDIA GPU (CUDA) or [2] CPU only
  4. Ask your default language — enter any Whisper language code (e.g. en, de, fr); defaults to en
  5. Clone and build whisper.cpp into ~/.local/opt/whisper.cpp (compilation takes ~3–5 minutes)
  6. Download the model: medium for CUDA, small for CPU (best fit for each)
  7. Deploy the waywhisper script to ~/.local/bin/waywhisper with your chosen language and model baked in
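
Steps 2, 3 and 6 boil down to a small decision: probe for nvidia-smi, then map the chosen backend to a model size. The following is a minimal sketch of that logic, not an excerpt from setup.sh; the function names `detect_backend` and `model_for_backend` are hypothetical.

```shell
# Hypothetical helpers; setup.sh may structure this differently.

# Suggest "cuda" when nvidia-smi is on PATH, otherwise "cpu".
detect_backend() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "cuda"
  else
    echo "cpu"
  fi
}

# Map a backend to the model size the installer downloads for it.
model_for_backend() {
  case "$1" in
    cuda) echo "medium" ;;
    cpu)  echo "small" ;;
    *)    echo "unknown backend: $1" >&2; return 1 ;;
  esac
}

backend="$(detect_backend)"
echo "Suggested backend: $backend, model: $(model_for_backend "$backend")"
```

The detection result is only a suggestion: the installer still asks for your preference, so a machine with an NVIDIA card can be set up for CPU-only use.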

Note

The installer is idempotent. Running it again skips the clone step if whisper.cpp is already present and rebuilds cleanly to ensure the correct CUDA/CPU configuration.

Manual Setup

If you prefer to set things up yourself:

1. Install dependencies

sudo apt update
sudo apt install -y cmake build-essential git \
  wtype wl-clipboard libnotify-bin libglib2.0-bin \
  pipewire-audio-client-libraries
# NVIDIA GPU only:
sudo apt install -y nvidia-cuda-toolkit

2. Build whisper.cpp

mkdir -p ~/.local/opt && cd ~/.local/opt
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp

Pick one — CUDA (NVIDIA GPU) or CPU:

# Option A: CUDA
cmake -B build -DGGML_CUDA=ON
bash ./models/download-ggml-model.sh medium

# Option B: CPU only
cmake -B build
bash ./models/download-ggml-model.sh small

Then build (takes ~3–5 minutes):

cmake --build build -j"$(nproc)" --config Release
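
Before moving on, it may be worth confirming that the build produced a binary and that the model download succeeded. The sketch below assumes the usual whisper.cpp layout (build/bin/whisper-cli and models/ggml-<size>.bin); the helper name `check_whisper_build` is hypothetical.

```shell
# Hypothetical helper: verify that a whisper.cpp checkout is ready to use.
#   $1 = checkout directory, $2 = model size (e.g. small, medium)
check_whisper_build() {
  dir="$1"; size="$2"
  if [ ! -x "$dir/build/bin/whisper-cli" ]; then
    echo "missing: $dir/build/bin/whisper-cli" >&2
    return 1
  fi
  if [ ! -f "$dir/models/ggml-$size.bin" ]; then
    echo "missing: $dir/models/ggml-$size.bin" >&2
    return 1
  fi
  echo "ok"
}
```

For example, `check_whisper_build ~/.local/opt/whisper.cpp small` prints ok once both files exist, and names the missing file otherwise.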

3. Clone this repository and install the script

git clone https://github.com/dcwigk/WayWhisper.git
mkdir -p ~/.local/bin
cp WayWhisper/waywhisper ~/.local/bin/waywhisper
chmod +x ~/.local/bin/waywhisper

Important

If you chose CPU only (Option B), update the model path in the installed script:

sed -i 's|ggml-medium\.bin|ggml-small.bin|' ~/.local/bin/waywhisper

Make sure ~/.local/bin is in your PATH.
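
One way to check this from a shell is to look for the directory as an exact PATH entry. This is a generic sketch; `path_contains` is a hypothetical helper, not part of WayWhisper.

```shell
# Return 0 if directory $1 is an entry in the PATH-style string $2
# (defaults to the current PATH).
path_contains() {
  case ":${2-$PATH}:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

if path_contains "$HOME/.local/bin"; then
  echo "$HOME/.local/bin is already in PATH"
else
  # Typical fix: add this line to ~/.profile or ~/.bashrc, then log in again.
  echo 'export PATH="$HOME/.local/bin:$PATH"'
fi
```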

4. Create the config file

mkdir -p ~/.config/waywhisper
cat > ~/.config/waywhisper/config <<EOF
# Full path to the whisper model
WHISPER_MODEL="$HOME/.local/opt/whisper.cpp/models/ggml-small.bin"

# Character limit before falling back to clipboard paste (default: 800)
# WTYPE_CHAR_LIMIT=800
EOF

Important

Use ggml-medium.bin here if you built with CUDA, ggml-small.bin for CPU.

Usage

  1. Press your shortcut — a spinner notification appears, recording begins
  2. Speak
  3. Press the shortcut again — transcription runs, text is typed into your active window

If no speech is detected, a brief "No speech detected" notification is shown instead.
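
How the script decides that nothing was said is not documented here. A plausible check, sketched below, treats a transcript as empty when only whitespace or bracketed markers remain; whisper.cpp commonly emits a marker such as [BLANK_AUDIO] for silence. The helper name and the exact rule are assumptions.

```shell
# Hypothetical helper: succeed if a transcript contains no usable speech.
# Assumption: silence yields empty output or a bracketed marker such as
# "[BLANK_AUDIO]"; the real script may check differently.
is_blank_transcript() {
  # Drop bracketed markers, then all whitespace.
  cleaned="$(printf '%s' "$1" | sed 's/\[[^]]*]//g' | tr -d '[:space:]')"
  [ -z "$cleaned" ]
}

# Example use once the transcript is in $text:
#   if is_blank_transcript "$text"; then
#     notify-send "No speech detected"
#   fi
```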

waywhisper de              # transcribe in German
waywhisper de:en           # speak German, receive English text (translation mode)
waywhisper --profile deen  # same, using a named profile from config
waywhisper cancel          # abort recording without transcribing
waywhisper --help          # show all options (lists defined profiles)

Note

Translation mode can only produce English output. This is a limitation of the Whisper models themselves, which whisper.cpp inherits.
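
The language argument can be pictured as a thin mapping onto whisper-cli's -l (language) and --translate options. The sketch below shows one way to do that mapping; `whisper_lang_flags` is a hypothetical name, the en fallback is an assumption, and the real script may parse its arguments differently.

```shell
# Hypothetical helper: turn a WayWhisper language argument into whisper-cli flags.
#   "de"    -> -l de
#   "de:en" -> -l de --translate
whisper_lang_flags() {
  arg="${1:-en}"   # assumed fallback when no argument is given
  case "$arg" in
    *:en) echo "-l ${arg%%:*} --translate" ;;
    *:*)  echo "unsupported target language in '$arg' (only :en)" >&2; return 1 ;;
    *)    echo "-l $arg" ;;
  esac
}
```

Rejecting targets other than en up front gives a clear error instead of silently producing a transcription in the source language.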

Config File

Create ~/.config/waywhisper/config to set persistent defaults:

# ~/.config/waywhisper/config

WTYPE_CHAR_LIMIT=800                       # character limit before falling back to clipboard
WHISPER_MODEL="/path/to/custom/model.bin"  # override the default model
WHISPER_PROMPT="Dr. Smith, JSON, API"      # vocabulary/style hint passed to whisper-cli

# Named profiles — invoke with: waywhisper --profile <name>
PROFILE_deen="de:en"                       # speak German, receive English
PROFILE_code="en"                          # English, can pair with WHISPER_PROMPT

WHISPER_PROMPT is passed verbatim to whisper.cpp's --prompt flag before each transcription. Use it to nudge the model toward domain-specific spelling, punctuation style, or terminology. It does not filter or rewrite output — it only biases the decoder. Keep prompts short; a handful of key terms or a one-sentence style hint is sufficient.
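
Because the config file uses plain KEY="value" shell syntax, a script can apply defaults and then source the file to override them. The following is a sketch under that assumption; `load_waywhisper_config` is a hypothetical name and the actual loading code may differ.

```shell
# Hypothetical loader: apply defaults, then let the config file override them.
#   $1 = config path (defaults to ~/.config/waywhisper/config)
load_waywhisper_config() {
  config_file="${1:-$HOME/.config/waywhisper/config}"
  WTYPE_CHAR_LIMIT=800   # documented default
  WHISPER_PROMPT=""      # no prompt unless configured
  if [ -r "$config_file" ]; then
    # Sourcing runs the file as shell code, so only use a file you trust.
    . "$config_file"
  fi
}
```

After calling `load_waywhisper_config`, a script would add `--prompt "$WHISPER_PROMPT"` to the whisper-cli command line only when the variable is non-empty.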

Profiles map a short name to any lang argument the script already accepts (LANG or LANG:en). They exist solely to give keyboard shortcut bindings a stable, readable command — waywhisper --profile deen is equivalent to waywhisper de:en. Running waywhisper --help lists all profiles currently defined in your config.
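
Resolving a profile amounts to reading the variable PROFILE_<name> after the config has been loaded. Below is a minimal sketch of that lookup; `resolve_profile` is a hypothetical name, and the name validation is an added safety measure rather than documented behavior.

```shell
# Hypothetical helper: look up PROFILE_<name> and print its language argument.
resolve_profile() {
  name="$1"
  # Allow only characters that are valid in a shell variable name.
  case "$name" in
    ""|*[!A-Za-z0-9_]*) echo "invalid profile name: '$name'" >&2; return 1 ;;
  esac
  eval "value=\${PROFILE_$name-}"
  if [ -z "$value" ]; then
    echo "unknown profile: $name" >&2
    return 1
  fi
  echo "$value"
}

PROFILE_deen="de:en"
PROFILE_code="en"
resolve_profile deen   # prints: de:en
```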

Keyboard Shortcuts

Map waywhisper (or waywhisper <lang>) to a shortcut in your desktop environment. Examples:

COSMIC

Settings → Keyboard → Custom Shortcuts → Add Shortcut

Action                   Command        Shortcut
Voice to text (English)  waywhisper en  Super+V
Voice to text (German)   waywhisper de  Super+Shift+V

Hyprland

Add to ~/.config/hypr/hyprland.conf:

bind = SUPER, V, exec, waywhisper en
bind = SUPER SHIFT, V, exec, waywhisper de

GNOME

Settings → Keyboard → View and Customize Shortcuts → Custom Shortcuts

Action                   Command        Shortcut
Voice to text (English)  waywhisper en  Super+V
Voice to text (German)   waywhisper de  Super+Shift+V

Tip

You can define as many language shortcuts as you need. WayWhisper accepts any language code supported by Whisper.

How It Works

WayWhisper is a toggle script. The first invocation starts pw-record (PipeWire) in the background and writes a state file. The second invocation signals pw-record to stop, feeds the audio to whisper-cli, and types the text directly with wtype (or falls back to clipboard paste via wl-copy for longer transcriptions). The notification is recycled rather than re-created, and closed instantly over DBus once typing begins.
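
The two decisions in that flow, start versus stop and typing versus clipboard, can be sketched as small functions. This is an illustration only: the function names are hypothetical, and it assumes the state file holds the recorder's PID and that text at or below the limit is typed directly.

```shell
# Hypothetical sketch of the toggle logic; names and file layout are assumptions.

# Decide what this invocation should do.
#   $1 = state file assumed to hold the recorder's PID
toggle_action() {
  state_file="$1"
  if [ -f "$state_file" ] && kill -0 "$(cat "$state_file")" 2>/dev/null; then
    echo "stop"    # recorder still running: stop it and transcribe
  else
    echo "start"   # no live recorder: begin a new recording
  fi
}

# Choose how to deliver the text.
#   $1 = transcribed text, $2 = character limit (documented default: 800)
delivery_method() {
  text="$1"; limit="${2:-800}"
  if [ "${#text}" -le "$limit" ]; then
    echo "wtype"     # short enough to type directly
  else
    echo "wl-copy"   # long text: fall back to clipboard paste
  fi
}
```

In a full script, the start branch would launch pw-record in the background and record its PID; the stop branch would signal that PID, run whisper-cli on the recording, and hand the text to wtype or wl-copy. Checking the PID with kill -0 keeps a stale state file from blocking a new recording.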


Inspired by the original hyprflow concept — rewritten for universal Wayland support, CUDA acceleration, and multi-language handling.
