RapidSpeech.cpp

On-device speech AI runtime for ASR, TTS, VAD, and voice cloning. Python-simple, C++-native, GGUF-powered.

RapidSpeech.cpp runs speech recognition, text-to-speech, VAD, speaker embedding, and voice cloning on-device. It gives Python developers a simple API while keeping the runtime pure C/C++, backed by ggml and a unified GGUF model format. No cloud API, no speech server, no heavyweight Python model stack.

Python In 60 Seconds

Install

pip install rapidspeech

GPU wheels:

pip install rapidspeech-metal   # macOS / Apple Silicon
pip install rapidspeech-cuda    # Linux / NVIDIA

Text to speech

python python-api-examples/tts/tts-offline.py \
  --model /path/to/omnivoice-f16.gguf \
  --text "Hello, welcome to RapidSpeech." \
  --output output.wav

Speech to text

python python-api-examples/asr/asr-offline.py \
  --model /path/to/funasr-nano-fp16.gguf \
  --audio /path/to/audio.wav

Python API

import rapidspeech

tts = rapidspeech.tts_synthesizer("/path/to/omnivoice-f16.gguf")
tts.set_params(instruct="male, young adult", language="English", seed=42)
pcm = tts.synthesize("Hello from a native speech engine.")
sample_rate = tts.get_sample_rate()
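
To keep the synthesized audio, the returned PCM can be written to a WAV file. The helper below is a minimal sketch using only the Python standard library. `save_pcm_as_wav` is a hypothetical name, not part of the rapidspeech API, and it assumes the PCM is 1-D mono with samples in the range [-1.0, 1.0].

```python
import struct
import wave


def save_pcm_as_wav(path, pcm, sample_rate):
    """Write mono float PCM as a 16-bit WAV file.

    Assumption: samples lie in [-1.0, 1.0]; values outside are clipped.
    """
    frames = bytearray()
    for sample in pcm:
        clipped = max(-1.0, min(1.0, float(sample)))
        # Scale to the signed 16-bit range and pack as little-endian.
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 2 bytes = 16-bit samples
        wav.setframerate(int(sample_rate))
        wav.writeframes(bytes(frames))


# Usage with the variables from the snippet above:
#   save_pcm_as_wav("output.wav", pcm, sample_rate)
```

The per-sample loop keeps the sketch dependency-free. For long outputs, a vectorized NumPy conversion would be faster.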

import rapidspeech

asr = rapidspeech.asr_offline("/path/to/funasr-nano-fp16.gguf")
sample_rate = asr.get_model_meta()["audio_sample_rate"]
pcm = ...  # 1-D float32 mono PCM at sample_rate
asr.push_audio(pcm)
asr.process()
print(asr.get_text())
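
The `pcm = ...` placeholder above can be filled in several ways. One possible sketch, assuming a 16-bit PCM WAV file and that NumPy is installed, is shown below. `load_wav_as_float32` is a hypothetical helper, not part of the rapidspeech API. It does not resample, so it raises an error if the file's rate differs from the rate the model expects.

```python
import wave

import numpy as np


def load_wav_as_float32(path, expected_rate):
    """Load a 16-bit PCM WAV file as 1-D float32 mono samples in [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        if wav.getframerate() != expected_rate:
            raise ValueError(
                f"file is {wav.getframerate()} Hz, expected {expected_rate} Hz"
            )
        channels = wav.getnchannels()
        raw = wav.readframes(wav.getnframes())
    # Interpret bytes as little-endian int16 and scale to float32.
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    if channels > 1:
        # Samples are interleaved per frame; average channels down to mono.
        samples = samples.reshape(-1, channels).mean(axis=1)
    return samples.astype(np.float32)


# Usage with the variables from the snippet above:
#   pcm = load_wav_as_float32("/path/to/audio.wav", sample_rate)
```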

Why RapidSpeech.cpp

Built for the edge: run speech models locally on laptops, servers, browsers, and device-class hardware.
Python-simple, C++-native: write Python, run a C++/ggml engine underneath.
One model format: ASR, TTS, VAD, and speaker models use GGUF.
NumPy in, NumPy out: ASR takes float32 PCM; TTS returns float32 PCM.
Edge-first backends: CPU, Metal, CUDA, Vulkan, CANN, OpenCL, and WebGPU.

Performance Snapshot

Test environment: Apple M1 Pro, funasr-nano-fp16.gguf, 15s audio.

Configuration	RTF	Wall Time	Notes
CPU -t 4	0.465	12.4s	CPU-only inference
GPU -t 4	0.170	5.2s	Metal acceleration
GPU -t 4 Q4_K	0.756	-	Quantized model; GPU dequantization overhead
CPU -t 4 Q4_K	0.530	-	Quantized model, CPU inference; 596 MB (3.3x compression)

RTF is processing time divided by audio duration. Lower is faster; RTF < 1 is faster than real time.
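
The definition above is simple enough to check by hand. The sketch below computes RTF and times an arbitrary callable. The function names and the numbers in the example are hypothetical and are not taken from the benchmark table.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(run, audio_seconds):
    """Time a zero-argument callable and return its RTF for the given clip length."""
    start = time.perf_counter()
    run()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)


# Hypothetical numbers: 3.0 s of processing for a 15 s clip.
print(real_time_factor(3.0, 15.0))  # 0.2
```

In practice `run` would wrap the inference call being measured, for example a lambda around `asr.process()`. Whether model loading is included in the timed region changes the result noticeably.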

Supported Today

Task	Models	Status
ASR	SenseVoice-small, FunASR-nano, X-ASR (Zipformer2, streaming)	Stable
VAD	Silero VAD, FireRedVAD	Stable
TTS	OmniVoice, OpenVoice2, Kokoro, IndexTTS-2	Active
Speaker	CAMPPlus	Stable

X-ASR — Chinese/English Zipformer2 transducer (icefall/k2). One GGUF serves both offline full-context decoding and true chunked streaming (per-layer left-context caches, sub-second partials, --chunk-len 16/32/48/96/192 fbank frames). It outputs punctuation and casing, uses greedy transducer decoding, runs on CPU / Metal / CUDA / Vulkan, and quantizes to q4_k_m (99.5 MB).

IndexTTS-2 — expressive zero-shot voice-cloning TTS (GPT + S2Mel CFM + BigVGAN-v2 vocoder) with 4-mode emotion control (reference audio / vector / text / Qwen). See docs/index2tts.md.

In Progress

CosyVoice3, Qwen3-ASR, Qwen3-TTS.

Documentation

Python examples
Technical Notes: architecture, design tradeoffs, backends, model conversion, and binding surfaces.
Model guides:
- ASR — X-ASR (Zipformer2, streaming) · SenseVoice · FunASR-Nano
- TTS — IndexTTS-2 (voice clone + emotion) · CosyVoice3 · OmniVoice · OpenVoice2 · Kokoro
- VAD — Silero / FireRedVAD
- Speaker — CAMPPlus
Browser / WASM examples
Node.js example

Native C++ CLI

Download Models

Models are available on:

🤗 Hugging Face: https://huggingface.co/RapidAI/RapidSpeech
ModelScope: https://www.modelscope.cn/models/RapidAI/RapidSpeech

Build from Source

git clone https://github.com/RapidAI/RapidSpeech.cpp
cd RapidSpeech.cpp
git submodule sync && git submodule update --init --recursive
cmake -B build
cmake --build build --config Release

Build artifacts are located in the build/ directory:

rs-asr-offline — Offline ASR command-line tool
rs-asr-vad-online — VAD-segmented quasi-streaming ASR command-line tool
rs-asr-online — True chunked streaming ASR (X-ASR; mic or WAV, live partials)
rs-tts-offline — Offline TTS command-line tool
rs-quantize — Model quantization tool

Core Commands

Offline ASR

./build/rs-asr-offline \
  -m /path/to/funasr-nano-fp16.gguf \
  -w /path/to/audio.wav \
  -t 4 \
  --gpu true

VAD-segmented ASR

./build/rs-asr-offline \
  -m /path/to/funasr-nano-fp16.gguf \
  -v /path/to/silero_vad_v6.gguf \
  -w /path/to/audio.wav \
  -t 4 \
  --vad-threshold 0.5 \
  --silence-ms 600
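
To build intuition for `--vad-threshold` and `--silence-ms`, the sketch below shows one common way such parameters turn per-frame speech probabilities into segments. It is an illustration only and not RapidSpeech's implementation. The function name, the `frame_ms` parameter, and the exact closing rule are assumptions.

```python
def segment_speech(probs, frame_ms, threshold=0.5, silence_ms=600):
    """Group per-frame speech probabilities into (start_ms, end_ms) segments.

    A frame counts as speech when its probability is >= threshold.
    An open segment is closed once at least silence_ms of non-speech
    has followed the last speech frame.
    """
    segments = []
    start = None            # start time of the open segment, or None
    last_speech_end = 0     # end time of the most recent speech frame
    for i, p in enumerate(probs):
        t0 = i * frame_ms
        t1 = t0 + frame_ms
        if p >= threshold:
            if start is None:
                start = t0
            last_speech_end = t1
        elif start is not None and t1 - last_speech_end >= silence_ms:
            segments.append((start, last_speech_end))
            start = None
    if start is not None:
        # Audio ended while a segment was still open.
        segments.append((start, last_speech_end))
    return segments


# Hypothetical 100 ms frames: a 200 ms pause is bridged, a 700 ms pause splits.
probs = [0.9] * 3 + [0.1] * 2 + [0.9] * 2 + [0.1] * 7 + [0.9]
print(segment_speech(probs, frame_ms=100))  # [(0, 700), (1400, 1500)]
```

Under this rule, raising the threshold makes the detector stricter about what counts as speech, and raising the silence duration merges utterances separated by short pauses.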

Streaming ASR (X-ASR)

# WAV, real-time paced with live partials (or --fast to run as fast as possible)
./build/rs-asr-online -m /path/to/xasr-q4_k_m.gguf -w /path/to/audio.wav --chunk-len 32
# Microphone
./build/rs-asr-online -m /path/to/xasr-q4_k_m.gguf --mic --chunk-len 16

See docs/x-asr.md for the model, chunk-size / latency tradeoffs, and GGUF conversion.
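
As a rough guide to the chunk-size tradeoff, the audio span covered by one chunk can be estimated from `--chunk-len`. The sketch below assumes a 10 ms fbank frame shift, which is a common default but is not stated in this README. Check docs/x-asr.md for the actual value.

```python
def chunk_duration_ms(chunk_len_frames, frame_shift_ms=10.0):
    """Audio covered by one streaming chunk, in milliseconds.

    frame_shift_ms is an assumption (10 ms is a common fbank default).
    """
    return chunk_len_frames * frame_shift_ms


for chunk_len in (16, 32, 48, 96, 192):
    print(f"--chunk-len {chunk_len}: {chunk_duration_ms(chunk_len):.0f} ms of audio per chunk")
```

This is only the amount of audio that must be buffered before a chunk can be decoded. Compute time and any model lookahead add to the end-to-end latency.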

Text to speech

./build/rs-tts-offline \
  -m /path/to/omnivoice-f16.gguf \
  -t "Hello, welcome to RapidSpeech!" \
  --instruct "male, young adult, moderate pitch" \
  --lang English \
  --n-steps 32 \
  -o output.wav

Quantization

./build/rs-quantize /path/to/input-f16.gguf /path/to/output-q4_k.gguf q4_k

Python

See Python examples for offline ASR, streaming ASR, offline TTS, streaming TTS, VAD, and voice cloning.

🤝 Contributing

We welcome PRs and discussion in the following areas:

Adapting more models to the framework.
Refining and optimizing the project architecture.
Improving inference performance.

Acknowledgements

Fun-ASR
llama.cpp
ggml
cppjieba — Chinese word segmentation
WeText — text normalization (ITN/TN)
miniaudio — single-file audio I/O
X-ASR (streaming-focused automatic speech recognition models)
