This project implements an offline Text-to-Speech (TTS) and Speech-to-Text (STT) module designed for industrial public address and alert systems. The module operates completely offline, making it suitable for environments with limited or no internet connectivity, such as warehouses, production facilities, and remote sites.
The solution targets Raspberry Pi / Orange Pi hardware (ARM architecture) but is cross-platform and also runs on Windows x86_64 for development and testing.
Text-to-Speech (TTS):
Russian and English language support
Natural-sounding speech using Piper TTS models
Adjustable speech rate (length_scale parameter)
Real-time playback via PortAudio
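The length_scale parameter listed above scales phoneme durations in Piper models: values above 1.0 slow speech down, and values below 1.0 speed it up. The following is a minimal sketch of mapping a user-facing speed multiplier to length_scale. The helper name rate_to_length_scale and the clamp limits are illustrative assumptions, not part of this project.

```python
def rate_to_length_scale(rate: float,
                         min_scale: float = 0.5,
                         max_scale: float = 2.0) -> float:
    """Convert a speed multiplier (2.0 = twice as fast) to a length_scale value.

    Hypothetical helper: the clamp limits are illustrative assumptions,
    not values taken from this project.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    # Phoneme durations scale inversely with the speaking rate.
    scale = 1.0 / rate
    # Keep the result inside a range that still sounds intelligible.
    return max(min_scale, min(max_scale, scale))
```

A rate of 1.0 leaves the model's default timing unchanged, which makes it a reasonable default for alert messages.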
Speech-to-Text (STT):
Russian and English language recognition using NeMo CTC models
Runtime language switching (no restart required)
Voice command detection via text post-processing (KWS emulation)
Support for hotwords with Transducer models
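Voice command detection via text post-processing means that no dedicated keyword-spotting model is involved: the recognized transcript is scanned for command words. The sketch below shows one possible approach. The function name detect_command and the trigger words (including the Russian ones) are assumptions made for illustration; only the command names listen, stop, and repeat come from this README.

```python
import re
from typing import Optional

# Hypothetical keyword table: the command names come from the README,
# the trigger words (including the Russian ones) are illustrative assumptions.
COMMAND_KEYWORDS = {
    "listen": ("listen", "слушай"),
    "stop": ("stop", "стоп"),
    "repeat": ("repeat", "повтори"),
}


def detect_command(transcript: str) -> Optional[str]:
    """Return the first command whose trigger word appears in the transcript."""
    # Lowercase and drop punctuation so "Stop!" and "stop" compare equal.
    words = re.findall(r"\w+", transcript.lower())
    for command, keywords in COMMAND_KEYWORDS.items():
        if any(word in keywords for word in words):
            return command
    return None
```

Matching whole words rather than substrings avoids false triggers such as "unstoppable" firing the stop command.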
Audio processing:
Silero VAD – Voice Activity Detection for energy-efficient operation
GTCRN – Lightweight noise suppression (23.7K parameters)
PortAudio – Cross-platform audio capture and playback (16 kHz mono PCM)
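Captured 16 kHz mono PCM usually has to be decoded and cut into fixed-size frames before it reaches a VAD or noise-suppression stage. The sketch below illustrates that step with the Python standard library only. The 16-bit little-endian sample format, the 32 ms frame length, and the function names are assumptions for illustration and are not taken from this project.

```python
import struct
from typing import List

SAMPLE_RATE = 16_000  # Hz, mono, as stated in the feature list


def pcm16_to_float(pcm: bytes) -> List[float]:
    """Decode little-endian 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    count = len(pcm) // 2  # two bytes per sample; ignore a trailing odd byte
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return [s / 32768.0 for s in samples]


def split_frames(samples: List[float], frame_ms: int = 32) -> List[List[float]]:
    """Split samples into full frames of frame_ms milliseconds.

    A trailing partial frame is dropped. The 32 ms default is an assumed
    value that yields 512 samples at 16 kHz.
    """
    frame_len = SAMPLE_RATE * frame_ms // 1000
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]
```

In a streaming setup the dropped remainder would normally be kept in a buffer and prepended to the next capture block.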
User interface:
Command-line interface (CLI) with colored prompts
Voice commands (listen, stop, repeat)
Console commands for status, language switching, and result retrieval
For Windows (x86), the executable file is located at source\build\start.exe.
Team FGLπ, Peter the Great Case Championship 2026