Open-source real-time digital-human pipeline: LLM, TTS, WebRTC, character voices, and pluggable model backends
中文 · Documentation · GitHub
Demos · Deployment · Quickstart · Models · Roadmap · Docs & Community
OpenTalking is an open-source orchestration framework for real-time digital-human conversations. It covers the core path of a digital-human conversational product: frontend interaction, session state, LLM replies, STT, TTS and voice selection, interruption control, subtitle events, WebRTC audio/video playback, and calls into local or remote model services.
OpenTalking is designed as a practical digital-human production stack. The WebUI, avatar and voice asset libraries, knowledge bases, memory, multi-session state, LLM / STT / TTS providers, WebRTC playback, and model backends are organized in one project. You can start with the lightweight Mock mode, connect local QuickTalk / Wav2Lip, or use OmniRT for FlashTalk, FasterLivePortrait, and other higher-quality or more complex model workflows.
- Fast trial:
mock / driverless mode, useful for validating the API, TTS, and WebRTC path before downloading video model weights. - Real-time conversation: connect
QuickTalk,Wav2Lip,FlashTalk, and other models for interactive digital-human dialogue. - Video creation and cloning: reuse FasterLivePortrait runtime for audio/text-driven video creation and camera/uploaded-video-driven video clone workflows.
- Private deployment: supports local STT/TTS, OpenAI-compatible LLMs, knowledge bases, memory, OmniRT remote inference, Docker, and distributed deployment.
More documentation:
- Documentation site: https://datascale-ai.github.io/opentalking/latest/en/
- Chinese docs: https://datascale-ai.github.io/opentalking/latest/
OpenTalking provides a Web service interface for managing the digital-human conversation pipeline. You can select or create avatars, configure voices, LLM, TTS, STT, and digital-human driver models, inspect model connection status, and validate real-time conversation, subtitles, and audio/video playback on the same page.
These demos cover three common frontend workflows: real-time conversation, video creation, and video clone.
| A. Real-time Conversation | ||
|---|---|---|
E-commerce livestreameCommerce.mp4 |
Companion charactercompanion.mp4 |
News anchornewscaster.mp4 |
| B. Video Creation | ||
|---|---|---|
Audio drivenaudio_drive.mp4 |
Text driventext_drive.mp4 |
Cloned voice drivencloned_voice_drive.mp4 |
| C. Video Clone | |
|---|---|
Realtime camera imitationcamera_drive.mp4 |
Uploaded video imitationvideo_drive.mp4 |
OpenTalking's orchestration layer (API / Worker / frontend) and digital-human synthesis backend (mock, local, direct_ws, or OmniRT) can be deployed independently. If you are new to the project, start with Mock mode to validate the full path, then switch to a real rendering model based on your GPU, model, and private-deployment requirements.
| Path | Recommended model / backend | Device reference | Best for | Details |
|---|---|---|---|---|
| Fast trial | mock |
CPU / no GPU | Validate API, LLM, TTS, WebRTC, and browser playback without downloading model weights | Quickstart |
| Entry validation | quicktalk / wav2lip |
RTX 3050 Laptop, RTX 3060, RTX 4060 | Run real video rendering for demos and deployment validation; lower the resolution on low-memory devices | QuickTalk / Wav2Lip |
| Consumer-GPU single machine | quicktalk / wav2lip / musetalk |
RTX 3090, RTX 4090 | Closer to real-time local demos, private validation, and lightweight pre-production evaluation | Model deployment |
| Fully local private path | sensevoice + local_cosyvoice + quicktalk |
RTX 3090 / 4090 or similar GPU | Run STT, TTS, and video driving with local models to reduce external dependencies | Local STT/TTS + QuickTalk |
| High-quality remote inference | flashtalk / flashhead / fasterliveportrait + OmniRT |
Multi-GPU, Ascend 910B2, remote GPU service | Multi-card, GPU/NPU, production isolation, higher visual quality, or video clone workflows | FlashTalk / FasterLivePortrait |
| Docker / production deployment | API, Web, Worker, external model services | Single GPU, remote GPU, distributed cluster | Service deployment, remote GPU, distributed runtime, and production validation | Deployment |
Choose one of the two quickstart paths first:
| Path | Use when | What you need | What it validates |
|---|---|---|---|
| Compshare image | You want to try OpenTalking before setting up dependencies or downloading model weights. | A Compshare instance created from the published image, with port 5173 open. |
WebUI, LLM replies, streaming TTS, subtitle events, WebRTC delivery, and the prebuilt image workflow. |
| Self deployment | You want to run the repo on your own machine or server, customize config, or continue into local/remote model deployment. | Python, Node.js, FFmpeg, .env provider config; real models also need GPU/runtime/model weights. |
Mock first-run path, then local QuickTalk or remote OmniRT model paths. |
If you want to try the OpenTalking + OmniRT + QuickTalk real-time digital-human path before setting up everything manually, use the community image we published on Compshare:
- Image URL: https://www.compshare.cn/images/TdDwmKZUZebI
- Exposed port:
5173(WebUI; API traffic is proxied internally) - Guide: Compshare image quick experience
The image includes OpenTalking, OmniRT, the QuickTalk runtime environment, and model files. After deploying an instance, open port 5173 and visit the instance URL provided by the platform. If you need to restart services manually, follow the commands in the guide.
Use this path when you want to run OpenTalking from source. Start with Mock mode if you do not want to download video model weights yet: Mock mode uses the built-in static frame, while LLM replies, streaming TTS, subtitle events, and WebRTC delivery still run through the full product path.
git clone https://github.com/datascale-ai/opentalking.git
cd opentalking
uv sync --extra dev --python 3.11
source .venv/bin/activate
cp .env.example .envEdit .env and configure at least an LLM. The default TTS can use the keyless edge voice. LLM, STT, and TTS are independent providers; see Configuration and LLM / STT.
bash scripts/start_unified.sh --mockThe default frontend URL is http://localhost:5173. To specify ports:
bash scripts/start_unified.sh --mock --api-port 8210 --web-port 5280Stop services:
bash scripts/quickstart/stop_all.shAfter Mock mode works, choose a real model path based on your machine. Weight downloads, directory layout, mirrors, checks, and troubleshooting are maintained in the docs; the README keeps only the startup entrypoints:
# Local QuickTalk: consumer-GPU single-machine path
export OPENTALKING_TORCH_DEVICE=cuda:0
export OPENTALKING_QUICKTALK_ASSET_ROOT="$PWD/models/quicktalk"
export OPENTALKING_QUICKTALK_WORKER_CACHE=1
bash scripts/start_unified.sh --backend local --model quicktalk --api-port 8210 --web-port 5280
# Remote OmniRT / FlashTalk: high-quality or multi-card path
bash scripts/start_unified.sh \
--backend omnirt \
--model flashtalk \
--api-port 8210 \
--web-port 5280 \
--omnirt http://<gpu-server>:9000More entrypoints:
- QuickTalk local deployment
- Wav2Lip local deployment
- FasterLivePortrait / JoyVASA
- Video clone guide
- WebUI guide
- Docker Compose and production deployment
| Model | Input | Recommended backend | Resource guidance |
|---|---|---|---|
mock |
Reference image / static frame | mock |
No GPU required |
quicktalk |
Template video + audio | local |
CUDA GPU, RTX 3090 / 4090 recommended |
wav2lip |
Reference image / frames + audio | local / omnirt |
>= 8 GB GPU / NPU memory |
musetalk |
Full frames + audio | omnirt / local |
>= 12 GB GPU memory |
soulx-flashtalk-14b |
Portrait + audio | omnirt |
Multi-GPU / NPU |
soulx-flashhead-1.3b |
Portrait + audio | omnirt |
Multi-GPU / NPU |
fasterliveportrait |
Portrait / driving video / audio | omnirt |
Single-GPU real-time portrait paste-back, video creation, video clone |
| Model | Hardware | Input | Output | VRAM | Throughput |
|---|---|---|---|---|---|
quicktalk |
RTX 3090 | Template video + audio | 720x900 / 25fps | About 3.8 GiB | About 35 fps |
For weight downloads, Docker, troubleshooting, and model configuration, see Model deployment.
Atlas Cloud is an all-modal AI inference platform. One API gives you access to video generation, image generation, and LLMs, so you do not need to integrate multiple vendors separately. A single integration can route to 300+ curated all-modal models.
OpenTalking uses an OpenAI-compatible interface for LLMs. Point OPENTALKING_LLM_BASE_URL to https://api.atlascloud.ai/v1 to use Atlas-hosted DeepSeek / Qwen models. See LLM and STT. For budget-friendly API options, see Atlas Cloud's coding plan.
-
More natural real-time conversations Improve interruption handling, low-latency response, audio/video sync, long-session recovery, and runtime visibility.
-
Consumer-GPU multi-model path Improve asset checks, prewarm, cache reuse, low-memory parameters, and more RTX 3090 / 4090 / WSL2 benchmarks for QuickTalk / Wav2Lip / MuseTalk local paths, while filling in more FasterLivePortrait video creation and video clone measurements.
-
One-command Windows / WSL2 deployment Continue lowering the barrier for model downloads, runtime installation, environment checks, and diagnostics based on the current Windows docs and test records.
-
High-quality private deployment Improve external OmniRT inference services, multi-model endpoints, capacity scheduling, health checks, production monitoring, and GPU / NPU deployment guidance.
-
More cloud voice and multimodal providers Extend pluggable STT / TTS / LLM providers, unified frontend selection, and provider-level health checks on top of the current OpenAI-compatible, DashScope, and Xiaomi MiMo profiles.
-
Agent, memory, and platform capabilities Productize the asset library, knowledge bases, memory, multi-session scheduling, tool calling, and OpenClaw / external Agent integrations, then fill in observability, safety, licensed voices, and synthetic-content labeling.
-
2026-06-12: QuickTalk local asset fixes and Apple Silicon support Organized QuickTalk local weights, HuBERT, InsightFace paths, missing-asset checks, cache preparation, and health checks. Added Apple Silicon deployment docs for validating
quicktalk-cpuwith MPS / CPU on macOS arm64. -
2026-06-12: IndexTTS, QuickTalk, and FlashTalk video creation improvements Added local IndexTTS and OmniRT IndexTTS providers, system voices, voice preview, and voice labels. Improved the QuickTalk / IndexTTS video creation path, and added FlashTalk reference-video generation with a default reference driver.
-
2026-06-02/10: Persona Package, knowledge retrieval, and character memory Added Persona Package API / CLI / WebUI entrypoints for reusable role settings, knowledge materials, and prompts. Added LightRAG knowledge retrieval, session-level knowledge selection, a character memory panel, and BM25 / mem0 / SQLite memory providers.
-
2026-06-05: Asset library and knowledge-base workflow Extended the WebUI asset library to connect avatar assets, knowledge materials, session selection, and Agent context building. Added audio/video exports so demos, reviews, and reusable materials can stay in the same workspace.
-
2026-06-05/06: OpenAI-compatible audio providers and MuseTalk deployment updates Added OpenAI-compatible STT / TTS adapters, Xiaomi MiMo STT / TTS / voice clone profiles, frontend provider selection, and voice lists. Reworked
.env.exampleinto separate LLM / STT / TTS profile templates. Also improved MuseTalk local / OmniRT deployment docs, asset preparation scripts, and quickstart scripts. -
2026-06-04: FasterLivePortrait video creation and video clone Added the FasterLivePortrait video creation parameter panel, video clone page, custom source-asset upload, camera / uploaded-video driving input, and docs screenshots, reusing the OmniRT + FasterLivePortrait runtime path.
-
2026-06-03: Web recording exports, asset library, and video workflows Added Web recording exports, export storage, video creation entrypoints, and the asset library workspace, connecting real-time conversation, material management, and video generation.
-
2026-06-12/13: Homepage analytics, GitHub traffic, and deployment docs Added the English homepage, deployment-route presentation, site analytics, GitHub traffic statistics, chart style updates, and statistics-interval fixes. Added the WSL2 network-mode selection guide for Windows deployment and continued updating README demo videos and docs-site links.
-
Earlier foundation: real-time conversation path and backend decoupling Built the Web console, LLM conversation, TTS, subtitle events, WebRTC audio/video playback, Avatar prewarm and cache, unified audio2video runner, and pluggable
mock/local/direct_ws/omnirtmodel backends.
- Quickstart
- Models (weight downloads, mirrors, startup, validation)
- Architecture
- Configuration
- Deployment (Docker Compose, distributed deployment)
- Model adapter
- Contributing (dev environment, CLI tools, ruff / mypy / pytest)
Join the QQ community to discuss real-time digital humans, FlashTalk, OmniRT, model deployment, and product scenarios.
AI Digital Human Community · Group ID: 1103327938
OpenTalking references and benefits from excellent projects in the real-time digital-human ecosystem:

