Overview

This project implements an offline Text-to-Speech (TTS) and Speech-to-Text (STT) module for industrial public address and alert systems. All processing runs on the device, so the module suits environments with limited or no internet connectivity, such as warehouses, production facilities, and remote sites.

The solution targets Raspberry Pi and Orange Pi boards (ARM architecture). It is cross-platform and also runs on Windows x86_64 for development and testing.

Features

Speech Synthesis (TTS)

Russian and English language support

Natural-sounding speech using Piper TTS models

Adjustable speech rate (length_scale parameter)

Real-time playback via PortAudio
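
The effect of length_scale on speech rate can be sketched as follows. This assumes length_scale behaves as in VITS-style models such as Piper, where it multiplies the predicted phoneme durations (values above 1.0 slow speech down, values below 1.0 speed it up). The helper names are hypothetical and are not part of this project's code.

```python
def length_scale_for_speed(speed: float) -> float:
    """Convert a relative speed factor (2.0 = twice as fast) to a length_scale.

    Assumption: length_scale multiplies phoneme durations, so it is the
    reciprocal of the desired speed factor.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return 1.0 / speed


def estimated_duration(base_seconds: float, length_scale: float) -> float:
    """Estimate utterance duration after scaling.

    base_seconds is the duration at length_scale = 1.0.
    """
    return base_seconds * length_scale


if __name__ == "__main__":
    scale = length_scale_for_speed(0.8)  # 20% slower than the default rate
    print(scale, estimated_duration(4.0, scale))
```

Slower speech (a length_scale above 1.0) may be the more useful direction for announcements in noisy halls, but the right value depends on the voice model and the acoustics of the site.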

Speech Recognition (STT)

Russian and English language recognition using NeMo CTC models

Runtime language switching (no restart required)

Voice command detection via text post-processing (KWS emulation)

Support for hotwords with Transducer models
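
Voice command detection via text post-processing can be illustrated with a minimal sketch: instead of running a dedicated keyword-spotting model, the recognized transcript is normalized and searched for known trigger words. The command table, the Russian trigger words, and the function names below are assumptions made for illustration; the project's actual matching rules may differ.

```python
import re
from typing import Optional

# Hypothetical command table: command name -> trigger words (English, Russian).
COMMANDS = {
    "listen": ("listen", "слушай"),
    "stop": ("stop", "стоп"),
    "repeat": ("repeat", "повтори"),
}


def normalize(text: str) -> list:
    """Lowercase the transcript, strip punctuation, and split into words."""
    text = text.lower().replace("ё", "е")
    return re.sub(r"[^\w\s]", " ", text).split()


def detect_command(transcript: str) -> Optional[str]:
    """Return the first command whose trigger word appears in the transcript."""
    for word in normalize(transcript):
        for command, triggers in COMMANDS.items():
            if word in triggers:
                return command
    return None


if __name__ == "__main__":
    print(detect_command("Please STOP, now!"))
    print(detect_command("повтори пожалуйста"))
```

Matching whole words rather than substrings keeps a word such as "unstoppable" from triggering the stop command. A production version would likely also need to tolerate recognition errors, for example through fuzzy matching.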

Audio Processing

Silero VAD – Voice Activity Detection for energy-efficient operation

GTCRN – Lightweight noise suppression (23.7K parameters)

PortAudio – Cross-platform audio capture and playback (16 kHz mono PCM)
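
To make the 16 kHz mono PCM format concrete, the sketch below converts raw 16-bit little-endian samples to floats and splits them into fixed-size frames, which is the form in which a VAD or noise suppressor typically consumes audio. The 16-bit sample width and the 512-sample frame size are assumptions for illustration, not values taken from this project.

```python
import struct

SAMPLE_RATE = 16000  # Hz, mono, as stated in the feature list


def pcm16_to_float(data: bytes) -> list:
    """Convert 16-bit little-endian PCM bytes to floats in [-1.0, 1.0)."""
    count = len(data) // 2
    samples = struct.unpack(f"<{count}h", data[: count * 2])
    return [s / 32768.0 for s in samples]


def split_frames(samples: list, frame_size: int = 512) -> list:
    """Split samples into full frames; a trailing partial frame is dropped."""
    last_start = len(samples) - frame_size
    return [samples[i:i + frame_size] for i in range(0, last_start + 1, frame_size)]


def frame_duration_ms(frame_size: int = 512) -> float:
    """Duration of one frame in milliseconds at SAMPLE_RATE."""
    return 1000.0 * frame_size / SAMPLE_RATE


if __name__ == "__main__":
    one_second = struct.pack("<16000h", *([0] * 16000))
    frames = split_frames(pcm16_to_float(one_second))
    print(len(frames), frame_duration_ms())
```

In a streaming setup, the dropped partial frame would normally be carried over and prepended to the next captured buffer rather than discarded.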

User Interface

Command-line interface (CLI) with colored prompts

Voice commands (listen, stop, repeat)

Console commands for status, language switching, and result retrieval

Start

On Windows (x86), the executable is located at source\build\start.exe.

Team FGLπ, Peter the Great Case Championship 2026
