Skip to content

datascale-ai/opentalking

OpenTalking

Open-source real-time digital-human pipeline: LLM, TTS, WebRTC, character voices, and pluggable model backends

中文 · Documentation · GitHub

License Python React FastAPI WebRTC

Visit OpenTalking Website

Demos · Deployment · Quickstart · Models · Roadmap · Docs & Community


Overview

OpenTalking is an open-source orchestration framework for real-time digital-human conversations. It covers the core path of a digital-human conversational product: frontend interaction, session state, LLM replies, STT, TTS and voice selection, interruption control, subtitle events, WebRTC audio/video playback, and calls into local or remote model services.

OpenTalking is designed as a practical digital-human production stack. The WebUI, avatar and voice asset libraries, knowledge bases, memory, multi-session state, LLM / STT / TTS providers, WebRTC playback, and model backends are organized in one project. You can start with the lightweight Mock mode, connect local QuickTalk / Wav2Lip, or use OmniRT for FlashTalk, FasterLivePortrait, and other higher-quality or more complex model workflows.

  • Fast trial: mock / driverless mode, useful for validating the API, TTS, and WebRTC path before downloading video model weights.
  • Real-time conversation: connect QuickTalk, Wav2Lip, FlashTalk, and other models for interactive digital-human dialogue.
  • Video creation and cloning: reuse FasterLivePortrait runtime for audio/text-driven video creation and camera/uploaded-video-driven video clone workflows.
  • Private deployment: supports local STT/TTS, OpenAI-compatible LLMs, knowledge bases, memory, OmniRT remote inference, Docker, and distributed deployment.

More documentation:

WebUI And Demos

OpenTalking provides a Web service interface for managing the digital-human conversation pipeline. You can select or create avatars, configure voices, LLM, TTS, STT, and digital-human driver models, inspect model connection status, and validate real-time conversation, subtitles, and audio/video playback on the same page.

OpenTalking WebUI

Demo Videos

These demos cover three common frontend workflows: real-time conversation, video creation, and video clone.

A. Real-time Conversation
E-commerce livestream
eCommerce.mp4

Companion character
companion.mp4

News anchor
newscaster.mp4

B. Video Creation
Audio driven
audio_drive.mp4

Text driven
text_drive.mp4

Cloned voice driven
cloned_voice_drive.mp4

C. Video Clone
Realtime camera imitation
camera_drive.mp4

Uploaded video imitation
video_drive.mp4

Choose A Deployment Path

OpenTalking's orchestration layer (API / Worker / frontend) and digital-human synthesis backend (mock, local, direct_ws, or OmniRT) can be deployed independently. If you are new to the project, start with Mock mode to validate the full path, then switch to a real rendering model based on your GPU, model, and private-deployment requirements.

Path Recommended model / backend Device reference Best for Details
Fast trial mock CPU / no GPU Validate API, LLM, TTS, WebRTC, and browser playback without downloading model weights Quickstart
Entry validation quicktalk / wav2lip RTX 3050 Laptop, RTX 3060, RTX 4060 Run real video rendering for demos and deployment validation; lower the resolution on low-memory devices QuickTalk / Wav2Lip
Consumer-GPU single machine quicktalk / wav2lip / musetalk RTX 3090, RTX 4090 Closer to real-time local demos, private validation, and lightweight pre-production evaluation Model deployment
Fully local private path sensevoice + local_cosyvoice + quicktalk RTX 3090 / 4090 or similar GPU Run STT, TTS, and video driving with local models to reduce external dependencies Local STT/TTS + QuickTalk
High-quality remote inference flashtalk / flashhead / fasterliveportrait + OmniRT Multi-GPU, Ascend 910B2, remote GPU service Multi-card, GPU/NPU, production isolation, higher visual quality, or video clone workflows FlashTalk / FasterLivePortrait
Docker / production deployment API, Web, Worker, external model services Single GPU, remote GPU, distributed cluster Service deployment, remote GPU, distributed runtime, and production validation Deployment

Quickstart

Choose one of the two quickstart paths first:

Path Use when What you need What it validates
Compshare image You want to try OpenTalking before setting up dependencies or downloading model weights. A Compshare instance created from the published image, with port 5173 open. WebUI, LLM replies, streaming TTS, subtitle events, WebRTC delivery, and the prebuilt image workflow.
Self deployment You want to run the repo on your own machine or server, customize config, or continue into local/remote model deployment. Python, Node.js, FFmpeg, .env provider config; real models also need GPU/runtime/model weights. Mock first-run path, then local QuickTalk or remote OmniRT model paths.

1. Compshare Image

If you want to try the OpenTalking + OmniRT + QuickTalk real-time digital-human path before setting up everything manually, use the community image we published on Compshare:

The image includes OpenTalking, OmniRT, the QuickTalk runtime environment, and model files. After deploying an instance, open port 5173 and visit the instance URL provided by the platform. If you need to restart services manually, follow the commands in the guide.

2. Self Deployment

Use this path when you want to run OpenTalking from source. Start with Mock mode if you do not want to download video model weights yet: Mock mode uses the built-in static frame, while LLM replies, streaming TTS, subtitle events, and WebRTC delivery still run through the full product path.

git clone https://github.com/datascale-ai/opentalking.git
cd opentalking

uv sync --extra dev --python 3.11
source .venv/bin/activate
cp .env.example .env

Edit .env and configure at least an LLM. The default TTS can use the keyless edge voice. LLM, STT, and TTS are independent providers; see Configuration and LLM / STT.

bash scripts/start_unified.sh --mock

The default frontend URL is http://localhost:5173. To specify ports:

bash scripts/start_unified.sh --mock --api-port 8210 --web-port 5280

Stop services:

bash scripts/quickstart/stop_all.sh

Real Model Entrypoints

After Mock mode works, choose a real model path based on your machine. Weight downloads, directory layout, mirrors, checks, and troubleshooting are maintained in the docs; the README keeps only the startup entrypoints:

# Local QuickTalk: consumer-GPU single-machine path
export OPENTALKING_TORCH_DEVICE=cuda:0
export OPENTALKING_QUICKTALK_ASSET_ROOT="$PWD/models/quicktalk"
export OPENTALKING_QUICKTALK_WORKER_CACHE=1
bash scripts/start_unified.sh --backend local --model quicktalk --api-port 8210 --web-port 5280

# Remote OmniRT / FlashTalk: high-quality or multi-card path
bash scripts/start_unified.sh \
  --backend omnirt \
  --model flashtalk \
  --api-port 8210 \
  --web-port 5280 \
  --omnirt http://<gpu-server>:9000

More entrypoints:

Supported Models

Model Input Recommended backend Resource guidance
mock Reference image / static frame mock No GPU required
quicktalk Template video + audio local CUDA GPU, RTX 3090 / 4090 recommended
wav2lip Reference image / frames + audio local / omnirt >= 8 GB GPU / NPU memory
musetalk Full frames + audio omnirt / local >= 12 GB GPU memory
soulx-flashtalk-14b Portrait + audio omnirt Multi-GPU / NPU
soulx-flashhead-1.3b Portrait + audio omnirt Multi-GPU / NPU
fasterliveportrait Portrait / driving video / audio omnirt Single-GPU real-time portrait paste-back, video creation, video clone

Consumer-GPU Reference

Model Hardware Input Output VRAM Throughput
quicktalk RTX 3090 Template video + audio 720x900 / 25fps About 3.8 GiB About 35 fps

For weight downloads, Docker, troubleshooting, and model configuration, see Model deployment.

Cloud Model API: Atlas Cloud

Atlas Cloud

Atlas Cloud is an all-modal AI inference platform. One API gives you access to video generation, image generation, and LLMs, so you do not need to integrate multiple vendors separately. A single integration can route to 300+ curated all-modal models.

OpenTalking uses an OpenAI-compatible interface for LLMs. Point OPENTALKING_LLM_BASE_URL to https://api.atlascloud.ai/v1 to use Atlas-hosted DeepSeek / Qwen models. See LLM and STT. For budget-friendly API options, see Atlas Cloud's coding plan.

Progress And Roadmap

  • More natural real-time conversations Improve interruption handling, low-latency response, audio/video sync, long-session recovery, and runtime visibility.

  • Consumer-GPU multi-model path Improve asset checks, prewarm, cache reuse, low-memory parameters, and more RTX 3090 / 4090 / WSL2 benchmarks for QuickTalk / Wav2Lip / MuseTalk local paths, while filling in more FasterLivePortrait video creation and video clone measurements.

  • One-command Windows / WSL2 deployment Continue lowering the barrier for model downloads, runtime installation, environment checks, and diagnostics based on the current Windows docs and test records.

  • High-quality private deployment Improve external OmniRT inference services, multi-model endpoints, capacity scheduling, health checks, production monitoring, and GPU / NPU deployment guidance.

  • More cloud voice and multimodal providers Extend pluggable STT / TTS / LLM providers, unified frontend selection, and provider-level health checks on top of the current OpenAI-compatible, DashScope, and Xiaomi MiMo profiles.

  • Agent, memory, and platform capabilities Productize the asset library, knowledge bases, memory, multi-session scheduling, tool calling, and OpenClaw / external Agent integrations, then fill in observability, safety, licensed voices, and synthetic-content labeling.

Recent Progress

  • 2026-06-12: QuickTalk local asset fixes and Apple Silicon support Organized QuickTalk local weights, HuBERT, InsightFace paths, missing-asset checks, cache preparation, and health checks. Added Apple Silicon deployment docs for validating quicktalk-cpu with MPS / CPU on macOS arm64.

  • 2026-06-12: IndexTTS, QuickTalk, and FlashTalk video creation improvements Added local IndexTTS and OmniRT IndexTTS providers, system voices, voice preview, and voice labels. Improved the QuickTalk / IndexTTS video creation path, and added FlashTalk reference-video generation with a default reference driver.

  • 2026-06-02/10: Persona Package, knowledge retrieval, and character memory Added Persona Package API / CLI / WebUI entrypoints for reusable role settings, knowledge materials, and prompts. Added LightRAG knowledge retrieval, session-level knowledge selection, a character memory panel, and BM25 / mem0 / SQLite memory providers.

  • 2026-06-05: Asset library and knowledge-base workflow Extended the WebUI asset library to connect avatar assets, knowledge materials, session selection, and Agent context building. Added audio/video exports so demos, reviews, and reusable materials can stay in the same workspace.

  • 2026-06-05/06: OpenAI-compatible audio providers and MuseTalk deployment updates Added OpenAI-compatible STT / TTS adapters, Xiaomi MiMo STT / TTS / voice clone profiles, frontend provider selection, and voice lists. Reworked .env.example into separate LLM / STT / TTS profile templates. Also improved MuseTalk local / OmniRT deployment docs, asset preparation scripts, and quickstart scripts.

  • 2026-06-04: FasterLivePortrait video creation and video clone Added the FasterLivePortrait video creation parameter panel, video clone page, custom source-asset upload, camera / uploaded-video driving input, and docs screenshots, reusing the OmniRT + FasterLivePortrait runtime path.

  • 2026-06-03: Web recording exports, asset library, and video workflows Added Web recording exports, export storage, video creation entrypoints, and the asset library workspace, connecting real-time conversation, material management, and video generation.

  • 2026-06-12/13: Homepage analytics, GitHub traffic, and deployment docs Added the English homepage, deployment-route presentation, site analytics, GitHub traffic statistics, chart style updates, and statistics-interval fixes. Added the WSL2 network-mode selection guide for Windows deployment and continued updating README demo videos and docs-site links.

  • Earlier foundation: real-time conversation path and backend decoupling Built the Web console, LLM conversation, TTS, subtitle events, WebRTC audio/video playback, Avatar prewarm and cache, unified audio2video runner, and pluggable mock / local / direct_ws / omnirt model backends.

Documentation And Community

Join the QQ community to discuss real-time digital humans, FlashTalk, OmniRT, model deployment, and product scenarios.

AI digital human QQ group QR code

AI Digital Human Community · Group ID: 1103327938

Acknowledgements

OpenTalking references and benefits from excellent projects in the real-time digital-human ecosystem:

License

Apache License 2.0

About

OpenTalking: An industrial-grade open-source AI digital human framework that supports real-time conversation, private deployment, and pluggable models.

Resources

License

Code of conduct

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors