Voice Mode

GarraIA supports end-to-end voice conversation using speech-to-text (STT) and text-to-speech (TTS).

Overview

┌─────────────────────────────────────────────────────────────┐
│                    VOICE PIPELINE                             │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  User Audio → STT → LLM → TTS → Audio Response             │
│                                                              │
│  STT: Whisper (local or API)                               │
│  TTS: Chatterbox, Hibiki, OpenAI TTS                       │
│                                                              │
└─────────────────────────────────────────────────────────────┘
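As a rough illustration of the pipeline above, the three stages can be modeled as functions composed in order. This is a minimal sketch, not GarraIA's implementation: voice_turn and the stub stages are hypothetical names, and real stages would call Whisper, the LLM, and the TTS server.

```python
from typing import Callable

# Hypothetical stage signatures; these are not GarraIA's internals.
SpeechToText = Callable[[bytes], str]
LanguageModel = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


def voice_turn(
    audio: bytes,
    stt: SpeechToText,
    llm: LanguageModel,
    tts: TextToSpeech,
) -> bytes:
    """Run one turn: user audio -> transcript -> reply text -> reply audio."""
    transcript = stt(audio)  # STT stage (e.g. Whisper)
    reply = llm(transcript)  # LLM stage
    return tts(reply)        # TTS stage (e.g. Chatterbox)


# Stub stages so the sketch runs without any servers.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


response_audio = voice_turn(b"hello", fake_stt, fake_llm, fake_tts)
```

Passing the stages in as callables keeps the flow independent of which STT or TTS provider is configured.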

Setup

Prerequisites

FFmpeg installed
A running TTS server (Chatterbox or Hibiki)
Optional: Whisper for local STT

Wizard integration (plan 0126)

When garraia init runs on a machine with an NVIDIA GPU (nvidia-smi detected) and the user opts into voice mode, the wizard pre-fills the configuration block below and prints the install instructions for both servers. It does not install the Python TTS/STT stacks for you. Run the following commands yourself once, then start the gateway with garraia start (the --with-voice flag is not needed when voice.enabled is already true in config.yml):

# TTS — Chatterbox Multilingual on :7860
pip install chatterbox-tts
chatterbox-tts serve --host 127.0.0.1 --port 7860

# STT — faster-whisper-server on :9090
pip install faster-whisper-server
fwsh serve --host 127.0.0.1 --port 9090

The wizard writes voice.tts_endpoint=http://127.0.0.1:7860, voice.stt_endpoint=http://127.0.0.1:9090, voice.tts_provider=chatterbox, and voice.language=pt into the emitted config.yml. CPU-only machines skip the voice prompt entirely.

Configuration

voice:
  enabled: true
  tts_endpoint: "http://127.0.0.1:7860"  # Chatterbox/Hibiki
  stt_provider: whisper  # whisper or openai
  language: "pt"  # pt, en, es, fr, de, it, hi
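The allowed values noted in the comments above can be checked before starting the gateway. The following is a minimal sketch under the assumption that the voice block has already been loaded into a dictionary; validate_voice_config is a hypothetical helper, and GarraIA's own validation may differ.

```python
# Allowed values are taken from the comments in the configuration block above.
SUPPORTED_LANGUAGES = {"pt", "en", "es", "fr", "de", "it", "hi"}
SUPPORTED_STT_PROVIDERS = {"whisper", "openai"}


def validate_voice_config(voice: dict) -> list[str]:
    """Return a list of problems; an empty list means the block looks valid."""
    problems = []
    if not isinstance(voice.get("enabled"), bool):
        problems.append("enabled must be true or false")
    # The remaining keys are only checked when present.
    endpoint = voice.get("tts_endpoint")
    if endpoint is not None and not str(endpoint).startswith(("http://", "https://")):
        problems.append("tts_endpoint must be an http(s) URL")
    provider = voice.get("stt_provider")
    if provider is not None and provider not in SUPPORTED_STT_PROVIDERS:
        problems.append(f"unsupported stt_provider: {provider!r}")
    language = voice.get("language")
    if language is not None and language not in SUPPORTED_LANGUAGES:
        problems.append(f"unsupported language: {language!r}")
    return problems


# The same values as the configuration block above, as a Python dict.
voice_config = {
    "enabled": True,
    "tts_endpoint": "http://127.0.0.1:7860",
    "stt_provider": "whisper",
    "language": "pt",
}
problems = validate_voice_config(voice_config)
```

Returning a list of messages rather than raising on the first error lets a caller report every problem at once.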

TTS Providers

Chatterbox (Recommended)

Docker-based GPU TTS:

docker run -d --gpus all -p 7860:7860 ghcr.io/garraia/chatterbox:latest

Features:

Multilingual (pt, en, es, fr, de, it, hi)
GPU accelerated
Low latency

Hibiki

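Alternative GPU TTS. The command below publishes the server on host port 7861, so point voice.tts_endpoint at http://127.0.0.1:7861 when using it: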

docker run -d --gpus all -p 7861:7860 ghcr.io/garraia/hibiki:latest

OpenAI TTS

Cloud-based TTS:

voice:
  enabled: true
  tts_provider: openai
  tts_model: "tts-1-hd"
  tts_voice: "alloy"

STT Providers

Local Whisper

voice:
  stt_provider: whisper
  whisper_model: "base"  # tiny, base, small, medium, large

OpenAI Whisper API

voice:
  stt_provider: openai
  openai_api_key: "sk-..."

Usage

Starting with Voice

garraia start --with-voice

Voice Commands

/voz or /voice - Toggle voice mode for the current session
While voice mode is enabled, responses are sent as voice automatically

Telegram Voice

Send voice messages and receive voice responses automatically when voice mode is enabled.

API Endpoints

TTS Endpoint

curl -X POST http://127.0.0.1:3888/api/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, how can I help you?", "language": "en"}'

Returns an audio file (WAV or MP3).
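The same call can be made from Python using only the standard library. This is a sketch that mirrors the curl example; build_tts_request and synthesize are hypothetical helper names, and the gateway address is the one shown above.

```python
import json
import urllib.request

# Gateway address taken from the curl example above.
GATEWAY_URL = "http://127.0.0.1:3888"


def build_tts_request(
    text: str, language: str, base_url: str = GATEWAY_URL
) -> urllib.request.Request:
    """Build the POST request for /api/tts without sending it."""
    body = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, language: str, out_path: str) -> None:
    """Send the request and write the returned audio bytes to out_path."""
    request = build_tts_request(text, language)
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()
    with open(out_path, "wb") as out_file:
        out_file.write(audio)
```

With the gateway running, synthesize("Hello, how can I help you?", "en", "reply.wav") would save the response. The file extension here is a guess, since the returned format may be WAV or MP3.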

STT Endpoint

curl -X POST http://127.0.0.1:3888/api/stt \
  -H "Content-Type: audio/wav" \
  --data-binary @audio.wav

Returns the transcribed text.
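A Python equivalent of the upload, sketched with the standard library. build_stt_request and transcribe are hypothetical helper names, and the sketch assumes the response body is UTF-8 text; if the gateway wraps the transcript in JSON, parse it accordingly.

```python
import urllib.request


def build_stt_request(
    wav_bytes: bytes, base_url: str = "http://127.0.0.1:3888"
) -> urllib.request.Request:
    """Build the POST request for /api/stt with raw WAV bytes as the body."""
    return urllib.request.Request(
        f"{base_url}/api/stt",
        data=wav_bytes,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )


def transcribe(path: str) -> str:
    """Upload a WAV file and return the response body as text."""
    with open(path, "rb") as wav_file:
        wav_bytes = wav_file.read()
    with urllib.request.urlopen(build_stt_request(wav_bytes), timeout=60) as response:
        # Assumption: the body is UTF-8 text, not a binary or JSON envelope.
        return response.read().decode("utf-8")
```

With the gateway running, transcribe("audio.wav") corresponds to the curl command above.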

Health Checks

Voice services are checked at startup. To check them on demand, run:

garraia health

The output includes the TTS and STT status.

Troubleshooting

TTS not responding

Check that the TTS server is reachable:

curl http://127.0.0.1:7860/health
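The same probe can be scripted, for example to wait for the TTS server before starting the gateway. This is a minimal sketch: tts_is_up is a hypothetical helper, and it assumes the /health URL from the curl command above answers with a 2xx status when the server is ready.

```python
import urllib.request


def tts_is_up(url: str = "http://127.0.0.1:7860/health", timeout: float = 2.0) -> bool:
    """Return True if the health URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError are OSError subclasses).
        return False
```

A False result means only that the probe failed; check the server logs to find out why.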

Audio quality issues

Increase the TTS quality setting
Check network latency to the TTS server
Use a local TTS server (Chatterbox or Hibiki)

STT errors

Check that FFmpeg is installed
Verify the audio format (16 kHz mono recommended)
Try a different Whisper model
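To verify the audio format, the sample rate and channel count of a WAV file can be read with Python's standard wave module. The sketch below uses hypothetical helper names and also builds an FFmpeg argument list (-ar sets the sample rate, -ac the channel count) for converting a file to 16 kHz mono.

```python
import io
import wave


def wav_format(data: bytes) -> tuple[int, int]:
    """Return (sample_rate_hz, channels) for WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getframerate(), wav.getnchannels()


def is_recommended_format(data: bytes) -> bool:
    """True when the audio is 16 kHz mono, the format recommended above."""
    return wav_format(data) == (16000, 1)


def ffmpeg_resample_args(src: str, dst: str) -> list[str]:
    """Arguments for an FFmpeg call that converts src to 16 kHz mono."""
    return ["ffmpeg", "-i", src, "-ar", "16000", "-ac", "1", dst]


def make_silent_wav(rate: int, channels: int, seconds: float = 0.1) -> bytes:
    """Generate a short silent 16-bit WAV in memory, for trying the checks."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(rate * seconds))
    return buffer.getvalue()


good = is_recommended_format(make_silent_wav(16000, 1))
bad = is_recommended_format(make_silent_wav(44100, 2))
```

The argument list can be passed to subprocess.run when a file needs converting.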
