System Level Intelligent Router for Mixture-of-Models at Cloud, Data Center and Edge
Documentation | Playground | Blog | Publications | Hugging Face
In the LLM era, the number of models is exploding. Models differ in capability, scale, cost, and privacy boundaries, and choosing and connecting the right ones to build semantic AI infrastructure is a systems problem.
vLLM Semantic Router is a signal-driven intelligent router for that problem. It helps teams build model systems that are more efficient, safer, and more adaptive across cloud, data center, and edge environments.
It delivers three core values:
- Token economics: reduce wasted tokens, increase effective output, and maximize the value of every token.
- LLM safety: detect jailbreaks, sensitive leakage, and hallucinations so agents remain controllable, trustworthy, and auditable.
- Fullmesh intelligence: build personal AI at the edge and intelligent MaaS in the cloud by coordinating local, private, and frontier models across cost, privacy, and capability boundaries.
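The routing idea behind these values can be sketched in a few lines: score each candidate model against a request's signals (task complexity, privacy sensitivity, budget) and pick the cheapest model that satisfies all constraints. This is a minimal, hypothetical illustration, not the project's actual API; all names, fields, and weights here are assumptions.

```python
# Hypothetical sketch of signal-driven routing across cost, privacy, and
# capability boundaries. All model names and thresholds are illustrative.
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    capability: float    # 0..1, rough quality tier
    cost_per_1k: float   # USD per 1k tokens
    local: bool          # runs on-prem or at the edge


@dataclass
class Request:
    complexity: float    # 0..1, estimated task difficulty
    private: bool        # contains sensitive data
    budget_per_1k: float


def route(req: Request, models: list[Model]) -> Model:
    """Return the cheapest model that satisfies the request's signals."""
    eligible = [
        m for m in models
        if m.capability >= req.complexity        # capable enough for the task
        and m.cost_per_1k <= req.budget_per_1k   # within the token budget
        and (m.local or not req.private)         # private data stays local
    ]
    if not eligible:
        raise ValueError("no model satisfies the routing constraints")
    return min(eligible, key=lambda m: m.cost_per_1k)


models = [
    Model("edge-small", capability=0.4, cost_per_1k=0.0, local=True),
    Model("dc-medium", capability=0.7, cost_per_1k=0.3, local=True),
    Model("cloud-frontier", capability=0.95, cost_per_1k=2.0, local=False),
]

# A private, simple request stays on the free edge model.
print(route(Request(complexity=0.3, private=True, budget_per_1k=1.0), models).name)
```

A real router would derive these signals from classifiers and guardrail models rather than hand-set fields, but the selection step has this same shape: filter by hard constraints, then optimize token economics among the survivors.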
To install:

```bash
curl -fsSL https://vllm-semantic-router.com/install.sh | bash
```

For platform notes, detailed setup options, and troubleshooting, see the Installation Guide.
- [2026/03/24] Vision Paper Released: The Workload-Router-Pool Architecture for LLM Inference Optimization
- [2026/03/10] v0.2 Released: vLLM Semantic Router v0.2 Athena Release
- [2026/02/27] White Paper Released: Signal Driven Decision Routing for Mixture-of-Modality Models
- [2026/01/05] Iris v0.1 Released: vLLM Semantic Router v0.1 Iris: The First Major Release
- [2025/12/16] Collaboration: AMD × vLLM Semantic Router: Building the System Intelligence Together
- [2025/11/19] New Blog: Signal-Decision Driven Architecture: Reshaping Semantic Routing at Scale
- [2025/11/03] Paper Published: Category-Aware Semantic Caching for Heterogeneous LLM Workloads
- [2025/10/12] Paper Accepted: When to Reason: Semantic Router for vLLM
Earlier announcements
- [2025/12/15] New Blog: Token-Level Truth: Real-Time Hallucination Detection for Production LLMs
- [2025/10/27] New Blog: Scaling Semantic Routing with Extensible LoRA
- [2025/10/08] Collaboration: vLLM Semantic Router with vLLM Production Stack Team.
- [2025/09/01] Released the project: vLLM Semantic Router: Next Phase in LLM inference.
More announcements are available on the Blog and Publications pages.
For questions, feedback, or to contribute, please join the #semantic-router channel in vLLM Slack.
We host bi-weekly community meetings to sync with contributors across different time zones:
- First Tuesday of the month: 9:00-10:00 AM EST (accommodates US EST, EU, and Asia Pacific contributors)
- Third Tuesday of the month: 1:00-2:00 PM EST (accommodates US EST and California contributors)
- Meeting recordings: YouTube
If you want to contribute, start with CONTRIBUTING.md.
For repository-native development workflow and validation commands, use AGENTS.md as the entrypoint and docs/agent/README.md as the canonical index.
If you find Semantic Router helpful in your research or projects, please consider citing it:
@misc{semanticrouter2025,
  title={vLLM Semantic Router},
  author={vLLM Semantic Router Team},
  year={2025},
  howpublished={\url{https://github.com/vllm-project/semantic-router}},
}
We are grateful to our sponsors who support us:
AMD provides us with GPU resources and ROCm™ software for training and researching frontier router models, enhancing E2E testing, and building the online models playground.
