LLM API Gateway with protocol-transparent proxy, AI-driven intelligent load balancing, and visual management.
LLM Way is a lightweight LLM API gateway built on Pingora. It transparently proxies requests to multiple upstream LLM providers, isolates API keys from clients, and uses an AI-powered agent to dynamically tune node weights based on real-time performance metrics.
    Client (Claude Code / OpenAI SDK / curl)
                   │
                   ▼
    ┌──────────────────────────────────────────┐
    │             LLM Way Gateway              │
    │                                          │
    │  ┌──────────────┐  ┌──────────────────┐  │
    │  │ Transparent  │  │ Agent Tuning     │  │
    │  │ Proxy        │  │                  │  │
    │  │ · Passthrough│  │ · AI analysis    │  │
    │  │ · Key swap   │  │ · Auto weight    │  │
    │  │ · Failover   │  │ · Latency-aware  │  │
    │  └──────┬───────┘  └────────┬─────────┘  │
    │         │                   │            │
    │  ┌──────┴───────────────────┴──────────┐ │
    │  │            Web Dashboard            │ │
    │  │ · Add/remove nodes                  │ │
    │  │ · Real-time weight adjustment       │ │
    │  │ · Stats overview                    │ │
    │  │ · Agent node selection              │ │
    │  └─────────────────────────────────────┘ │
    └──────────────┬───────────────────────────┘
                   │
        ┌──────────┼──────────┬──────────┐
        ▼          ▼          ▼          ▼
    ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
    │DeepSeek│ │Pumpkin │ │ DDSST  │ │ PPChat │
    └────────┘ └────────┘ └────────┘ └────────┘
- Protocol Transparent: whatever format the client sends, the upstream receives. Anthropic `/v1/messages`, OpenAI `/v1/chat/completions`, or any HTTP API.
- API Key Isolation: clients use arbitrary tokens; the gateway swaps them for real upstream keys.
- Weighted Round-Robin: requests distributed by weight, with least-active-requests tie-breaking.
- Failover: retry on 5xx / 429 across other nodes; hide 4xx behind `503` to prevent client-side vendor rejection.
- Health Cooldown: nodes with >3 consecutive failures enter cooldown; auto-recover on success.
- Path Prefix: `base_url` supports path prefixes (e.g. `https://api.deepseek.com/anthropic`), auto-prepended by the gateway.
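
The failover and path-prefix rules above can be sketched in a few lines of Python. This is an illustration, not the gateway's Rust code: the function names `classify_upstream_status` and `upstream_path` are hypothetical, and treating a 4xx as "report 503 to the client" without a further retry is an assumption about behavior the list does not spell out.

```python
from urllib.parse import urlsplit


def classify_upstream_status(status: int) -> str:
    """Map an upstream HTTP status to 'retry', 'mask', or 'pass'."""
    if status == 429 or 500 <= status <= 599:
        return "retry"  # try another node
    if 400 <= status <= 499:
        return "mask"   # assumption: the client sees 503 instead of the 4xx
    return "pass"       # forward the response unchanged


def upstream_path(base_url: str, request_path: str) -> str:
    """Prepend the path prefix of base_url to the client's request path."""
    prefix = urlsplit(base_url).path.rstrip("/")
    return prefix + request_path


print(upstream_path("https://api.deepseek.com/anthropic", "/v1/messages"))
# prints /anthropic/v1/messages
```

A `base_url` without a path component leaves the request path unchanged.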
The gateway uses weighted round-robin combined with least-active-requests tie-breaking:
1. Filter: exclude disabled nodes and nodes in cooldown (>3 consecutive failures)
2. Fallback: if all nodes are excluded, fall back to all enabled nodes (still cooldown-safe)
3. Sort candidates by: active_requests ASC, then weight DESC
4. Pick the best candidate (fewest active requests; if tied, highest weight)
This ensures that:
- Healthy nodes with higher weight get proportionally more traffic
- Nodes with fewer in-flight requests are preferred (avoids overloading a single node)
- Failing nodes are automatically excluded
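
The four selection steps can be expressed compactly. The sketch below is a Python illustration of the described ordering, not the actual implementation; the `Node` class and `pick_node` function are hypothetical names, and the fallback in step 2 is read here as "use every enabled node", which is one possible interpretation of the wording.

```python
from dataclasses import dataclass
from typing import List, Optional

COOLDOWN_THRESHOLD = 3  # more than 3 consecutive failures means cooldown


@dataclass
class Node:
    name: str
    weight: int
    enabled: bool = True
    active_requests: int = 0
    consecutive_failures: int = 0


def pick_node(nodes: List[Node]) -> Optional[Node]:
    # Step 1: drop disabled nodes and nodes in cooldown.
    healthy = [
        n for n in nodes
        if n.enabled and n.consecutive_failures <= COOLDOWN_THRESHOLD
    ]
    # Step 2 (assumed reading): if nothing is left, consider all enabled nodes.
    candidates = healthy or [n for n in nodes if n.enabled]
    if not candidates:
        return None
    # Steps 3 and 4: fewest active requests first, then highest weight.
    return min(candidates, key=lambda n: (n.active_requests, -n.weight))


nodes = [Node("a", weight=5, active_requests=2), Node("b", weight=1), Node("c", weight=3)]
print(pick_node(nodes).name)  # prints c (idle, and heavier than b)
```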
Select a node as the Tuning Agent by clicking the ☆ Agent button in the dashboard. The agent runs every 15 seconds:
    ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
    │ Collect stats│ ──▶ │ Build prompt │ ──▶ │ Call AI      │
    │ succ/fail/   │     │ Send to      │     │ Get weight   │
    │ latency/err  │     │ Agent node   │     │ suggestions  │
    └──────────────┘     └──────────────┘     └──────┬───────┘
           ▲                                         ▼
           │                                  ┌──────────────┐
           │          Every 15s               │ Apply new    │
           └──────────────────────────────────│ weights      │
                                              └──────────────┘
How it works:
- Collect: gather every node's `success_count`, `failure_count`, `http_error_count`, `consecutive_failures`, and `avg_latency_ms` (exponential moving average)
- Build prompt: construct a structured prompt describing each node's current state
- Call AI: use the agent node's own API key to invoke its AI endpoint with the prompt
- Parse response: extract the AI's JSON weight suggestions (supports markdown code blocks)
- Apply: update node weights immediately (capped at 1–10)
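
The parse and apply steps might look roughly like the Python sketch below. The reply schema is an assumption: the source only says the AI returns JSON weight suggestions, so a flat `{node_id: weight}` object is used here for illustration, and `parse_weight_suggestions` is a hypothetical name.

```python
import json
import re

FENCE = "`" * 3  # markdown code fence marker
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_weight_suggestions(reply: str, known_ids) -> dict:
    """Extract {node_id: weight} from an AI reply and clamp weights to 1..10."""
    match = FENCE_RE.search(reply)
    payload = match.group(1) if match else reply  # fenced block or bare JSON
    data = json.loads(payload)
    weights = {}
    for node_id, weight in data.items():
        is_number = isinstance(weight, (int, float)) and not isinstance(weight, bool)
        if node_id in known_ids and is_number:
            weights[node_id] = max(1, min(10, int(weight)))
    return weights


reply = "Suggested weights:\n" + FENCE + 'json\n{"a": 15, "b": 0, "zz": 4}\n' + FENCE
print(parse_weight_suggestions(reply, {"a", "b"}))  # prints {'a': 10, 'b': 1}
```

Ignoring unknown node IDs and clamping every value keeps a malformed or overconfident reply from pushing weights outside the 1–10 range.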
When no agent is set, falls back to rule-based tuning:
| Condition | Action |
|---|---|
| Consecutive failures > 0 | Weight -= consecutive_failures (min 1) |
| Consecutive failures = 0 AND latency < 3s | Weight += 1 (max 10) |
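
As a worked illustration of the table, here is a minimal Python sketch of the fallback rule. The function name `rule_based_weight` is hypothetical, and leaving the weight unchanged when latency is 3 s or more is an assumption, since the table does not list that case.

```python
def rule_based_weight(weight: int, consecutive_failures: int, avg_latency_ms: float) -> int:
    """Apply the rule-based tuning table to a single node."""
    if consecutive_failures > 0:
        # Penalize by the failure streak, never dropping below 1.
        return max(1, weight - consecutive_failures)
    if avg_latency_ms < 3000:
        # Healthy and fast: reward by one step, never exceeding 10.
        return min(10, weight + 1)
    return weight  # assumption: slow but healthy nodes keep their weight


print(rule_based_weight(5, 2, 800))   # prints 3
print(rule_based_weight(4, 0, 500))   # prints 5
print(rule_based_weight(4, 0, 3500))  # prints 4
```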
Latency tracking uses exponential moving average (EMA):
avg_latency = (avg_latency * 7 + current_latency) / 8
This smooths out spikes while still responding to sustained changes.
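
The formula is an EMA with a smoothing factor of 1/8. A short Python sketch (the function name `update_avg_latency` is hypothetical) shows how a node averaging 1000 ms reacts to two consecutive 5000 ms samples:

```python
def update_avg_latency(avg_latency_ms: float, current_latency_ms: float) -> float:
    """One EMA step: the new sample contributes 1/8, the history 7/8."""
    return (avg_latency_ms * 7 + current_latency_ms) / 8


avg = 1000.0
avg = update_avg_latency(avg, 5000.0)
print(avg)  # prints 1500.0
avg = update_avg_latency(avg, 5000.0)
print(avg)  # prints 1937.5
```

A single slow request moves the average by only an eighth of the gap, while a sustained slowdown keeps pulling it toward the new level.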
Open http://127.0.0.1:29697/admin:
| Feature | Description |
|---|---|
| Node list | Real-time stats for all nodes |
| Add node | Fill in base_url, api_key, weight |
| Weight tuning | Inline input, instant save |
| Toggle | Enable/disable with one click |
| Agent select | Click ☆/★ to set AI tuning agent |
| Stats overview | Total nodes, success/failure/active counts |
Build (requires Rust 1.75+):

    # macOS prerequisite
    brew install cmake
    cargo build --release

Run:

    cp nodes.json.example nodes.json
    # Edit nodes.json with real API keys
    ./target/release/llm_way

The proxy listens on 0.0.0.0:29696 and the dashboard on 127.0.0.1:29697.

Add a node:

    curl -X POST http://127.0.0.1:29697/admin/upstreams \
      -H "Content-Type: application/json" \
      -d '{
        "base_url": "https://api.deepseek.com/anthropic",
        "api_key": "sk-your-deepseek-key",
        "weight": 3
      }'

Set a node as the tuning agent:

    curl -X POST http://127.0.0.1:29697/admin/upstreams/<node-id>/set-agent

Test the proxy:

    # Anthropic format
    curl http://127.0.0.1:29696/v1/messages \
      -H "Content-Type: application/json" \
      -H "anthropic-version: 2023-06-01" \
      -d '{"model":"claude-sonnet-4-6","max_tokens":20,"messages":[{"role":"user","content":"hi"}]}'

    # OpenAI format
    curl http://127.0.0.1:29696/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"model":"gpt-4o","messages":[{"role":"user","content":"hello"}]}'

Claude Code settings:

    {
      "env": {
        "ANTHROPIC_AUTH_TOKEN": "any-token",
        "ANTHROPIC_BASE_URL": "http://10.41.7.157:29696"
      },
      "model": "sonnet"
    }

`ANTHROPIC_BASE_URL` should be `http://host:port` without `/v1`.

OpenAI SDK:

    from openai import OpenAI

    client = OpenAI(
        base_url="http://10.41.7.157:29696/v1",
        api_key="any-token",
    )

All requests are transparently forwarded; no format conversion.
| Method | Path | Description |
|---|---|---|
| GET | `/admin` | Web dashboard |
| GET | `/admin/upstreams` | List all nodes |
| POST | `/admin/upstreams` | Add a node |
| DELETE | `/admin/upstreams/{id}` | Remove a node |
| POST | `/admin/upstreams/{id}/toggle` | Toggle enable/disable |
| POST | `/admin/upstreams/{id}/weight` | Set weight, body `{"weight":5}` |
| POST | `/admin/upstreams/{id}/set-agent` | Set/unset as agent |
| GET | `/admin/stats` | Stats summary |
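
For scripting against the admin API, a minimal Python sketch using only the standard library could look like the following. The helper names `set_weight_request` and `send` are hypothetical, the response body format is not documented here so it is returned as raw text, and the address assumes the default local admin bind.

```python
import json
from urllib.request import Request, urlopen

ADMIN = "http://127.0.0.1:29697"  # default admin bind (local only)


def set_weight_request(node_id: str, weight: int) -> Request:
    """Build the POST /admin/upstreams/{id}/weight request."""
    body = json.dumps({"weight": weight}).encode("utf-8")
    return Request(
        f"{ADMIN}/admin/upstreams/{node_id}/weight",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: Request) -> str:
    """Send a prepared request and return the raw response text."""
    with urlopen(request, timeout=5) as response:
        return response.read().decode("utf-8")


req = set_weight_request("node-1", 5)  # "node-1" is a placeholder ID
print(req.get_method(), req.full_url)
```

Calling `send(req)` requires a running gateway; building the request separately makes the payload easy to inspect first.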
| Field | Meaning |
|---|---|
| `success_count` | Successful requests |
| `failure_count` | Failed requests |
| `active_requests` | In-flight requests |
| `total_requests` | Total requests |
| `http_error_count` | Non-2xx responses |
| `consecutive_failures` | Consecutive failures (cooldown at >3) |
| `avg_latency_ms` | Average latency (EMA) |
    [Unit]
    Description=LLM Way Gateway
    After=network.target

    [Service]
    Type=simple
    User=llmway
    WorkingDirectory=/opt/llm_way
    Environment="RUST_LOG=info"
    Environment="LLM_WAY_CONFIG=/etc/llm_way/nodes.json"
    ExecStart=/opt/llm_way/llm_way
    Restart=always
    RestartSec=5

    [Install]
    WantedBy=multi-user.target

| Port | Purpose | Bind |
|---|---|---|
| 29696 | Proxy | 0.0.0.0 (public) |
| 29697 | Admin | 127.0.0.1 (local only) |
- 503 on all nodes: check `base_url` path prefixes and API keys
- Agent not working: verify agent node is set via dashboard; check logs: `grep auto_tune /var/log/llm_way.log`
- Claude Code retries: ensure `ANTHROPIC_BASE_URL` has no `/v1` suffix
- Pingora — HTTP proxy engine
- Axum — admin API
- Tokio — async runtime
- Reqwest — agent AI calls
- Serde — JSON serialization
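LLM Way is a lightweight LLM API gateway built on Pingora. It transparently proxies client requests to multiple upstream LLM providers, isolates API keys, and uses an AI-driven agent to adjust node weights dynamically from real-time performance metrics.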
    Client (Claude Code / OpenAI SDK / curl)
                   │
                   ▼
    ┌──────────────────────────────────────────┐
    │             LLM Way Gateway              │
    │                                          │
    │ ┌────────────────┐  ┌──────────────────┐ │
    │ │ Transparent    │  │ Agent scheduling │ │
    │ │ proxy          │  │                  │ │
    │ │ · Passthrough  │  │ · AI analysis    │ │
    │ │ · Key isolation│  │ · Auto weight    │ │
    │ │ · Failover     │  │ · Latency/errors │ │
    │ └───────┬────────┘  └────────┬─────────┘ │
    │         │                    │           │
    │ ┌───────┴────────────────────┴─────────┐ │
    │ │            Web Dashboard             │ │
    │ │ · Add/remove/enable/disable nodes    │ │
    │ │ · Real-time weight adjustment        │ │
    │ │ · Stats overview                     │ │
    │ │ · Agent node selection               │ │
    │ └──────────────────────────────────────┘ │
    └──────────────┬───────────────────────────┘
                   │
        ┌──────────┼──────────┬──────────┐
        ▼          ▼          ▼          ▼
    ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
    │DeepSeek│ │Pumpkin │ │ DDSST  │ │ PPChat │
    └────────┘ └────────┘ └────────┘ └────────┘
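- Protocol passthrough: the upstream receives exactly the format the client sends. Supports Anthropic `/v1/messages`, OpenAI `/v1/chat/completions`, and any other HTTP API.
- API key isolation: clients may use any token; the gateway replaces it with the real API key of the upstream node.
- Weighted round-robin: requests are distributed by weight, with the fewest active requests as the tie-breaker.
- Failover: when an upstream returns 5xx / 429, the gateway retries on other nodes; 4xx errors are returned as `503` to hide them, so the client does not reject the provider.
- Health cooldown: a node with more than 3 consecutive failures enters cooldown automatically and comes back online once it recovers.
- Path prefix: `base_url` may carry a path prefix (e.g. `https://api.deepseek.com/anthropic`), which the gateway joins to the request path automatically.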
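The gateway schedules on two factors: weighted round-robin plus least active requests:
1. Filter: exclude disabled nodes and nodes in cooldown (more than 3 consecutive failures)
2. Fallback: if every node is excluded, fall back to all enabled nodes (still excluding nodes in cooldown)
3. Sort: by active_requests ASC (fewest active first), then weight DESC (highest weight first)
4. Pick: the node with the fewest active requests; on a tie, the one with the highest weight
This design ensures that:
- Healthy nodes with higher weight receive more traffic
- No single node is overwhelmed by requests (idle nodes are preferred)
- Failing nodes are excluded automatically and rejoin after they recover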
In the dashboard, click a node's ☆ Agent button to make it the scheduling agent. The agent runs automatically every 15 seconds:

    ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
    │ Collect stats│ ──▶ │ Build prompt │ ──▶ │ Call AI      │
    │ succ/fail/   │     │ Send to      │     │ Get weight   │
    │ latency/err  │     │ Agent node   │     │ suggestions  │
    └──────────────┘     └──────────────┘     └──────┬───────┘
           ▲                                         ▼
           │                                  ┌──────────────┐
           │          Loop every 15s          │ Apply weights│
           └──────────────────────────────────│ immediately  │
                                              └──────────────┘
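Workflow:
- Collect: gather every node's `success_count`, `failure_count`, `http_error_count`, `consecutive_failures`, and `avg_latency_ms` (exponential moving average)
- Build prompt: generate a structured prompt describing each node's current state
- Call AI: call the agent node's AI endpoint using the agent node's own API key
- Parse response: extract the JSON weight suggestions returned by the AI (markdown code block format is supported)
- Apply: update node weights immediately (limited to the range 1–10)

When no agent is set, the gateway falls back to rule-based tuning:
| Condition | Action |
|---|---|
| Consecutive failures > 0 | Weight -= consecutive failures (min 1) |
| Consecutive failures = 0 and latency < 3s | Weight += 1 (max 10) |

Latency tracking is smoothed with an exponential moving average (EMA):

    avg_latency = (avg_latency * 7 + current_latency) / 8

This smooths out spikes while still responding quickly to sustained changes.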
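Open http://127.0.0.1:29697/admin:
| Feature | Description |
|---|---|
| Node list | Shows all nodes and their stats in real time |
| Add node | Fill in base_url, api_key, and weight |
| Weight tuning | Edit the weight in an inline input; takes effect immediately |
| Enable/disable node | Toggle with one click |
| Agent selection | Click the ☆/★ button to set the scheduling agent |
| Stats overview | Total nodes and success/failure/active request counts |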
Build (requires Rust 1.75+):

    # Prerequisite (macOS)
    brew install cmake
    # Build
    cargo build --release

Run:

    cp nodes.json.example nodes.json   # edit nodes.json and fill in real API keys
    ./target/release/llm_way

The proxy listens on 0.0.0.0:29696 and the dashboard on 127.0.0.1:29697.

Add a node:

    curl -X POST http://127.0.0.1:29697/admin/upstreams \
      -H "Content-Type: application/json" \
      -d '{
        "base_url": "https://api.deepseek.com/anthropic",
        "api_key": "sk-your-deepseek-key",
        "weight": 3
      }'

Set a node as the agent:

    curl -X POST http://127.0.0.1:29697/admin/upstreams/<node-id>/set-agent

Test the proxy:

    # Anthropic format
    curl http://127.0.0.1:29696/v1/messages \
      -H "Content-Type: application/json" \
      -H "anthropic-version: 2023-06-01" \
      -d '{"model":"claude-sonnet-4-6","max_tokens":20,"messages":[{"role":"user","content":"hi"}]}'

    # OpenAI format
    curl http://127.0.0.1:29696/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"model":"gpt-4o","messages":[{"role":"user","content":"hello"}]}'

Claude Code settings:

    {
      "env": {
        "ANTHROPIC_AUTH_TOKEN": "any-token",
        "ANTHROPIC_BASE_URL": "http://10.41.7.157:29696"
      },
      "model": "sonnet"
    }

`ANTHROPIC_BASE_URL` only needs `http://host:port`; do not append `/v1`.

OpenAI SDK:

    from openai import OpenAI

    client = OpenAI(
        base_url="http://10.41.7.157:29696/v1",
        api_key="any-token",
    )

All requests are passed through to the upstream node unchanged, with no format conversion.
| Method | Path | Description |
|---|---|---|
| GET | `/admin` | Web dashboard |
| GET | `/admin/upstreams` | List all nodes |
| POST | `/admin/upstreams` | Add a node |
| DELETE | `/admin/upstreams/{id}` | Delete a node |
| POST | `/admin/upstreams/{id}/toggle` | Enable/disable a node |
| POST | `/admin/upstreams/{id}/weight` | Update weight, body `{"weight":5}` |
| POST | `/admin/upstreams/{id}/set-agent` | Set/unset as agent node |
| GET | `/admin/stats` | Stats overview |
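| Field | Meaning |
|---|---|
| `success_count` | Successful requests |
| `failure_count` | Failed requests |
| `active_requests` | In-flight requests |
| `total_requests` | Total requests |
| `http_error_count` | HTTP errors (non-2xx) |
| `retry_count` | Number of upstream retries triggered |
| `consecutive_failures` | Consecutive failures; cooldown at >3 |
| `avg_latency_ms` | Average latency (exponential moving average) |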
    [Unit]
    Description=LLM Way Gateway
    After=network.target

    [Service]
    Type=simple
    User=llmway
    WorkingDirectory=/opt/llm_way
    Environment="RUST_LOG=info"
    Environment="LLM_WAY_CONFIG=/etc/llm_way/nodes.json"
    ExecStart=/opt/llm_way/llm_way
    Restart=always
    RestartSec=5

    [Install]
    WantedBy=multi-user.target

| Port | Purpose | Recommended bind |
|---|---|---|
| 29696 | Proxy service | 0.0.0.0 (public) |
| 29697 | Admin API | 127.0.0.1 (local only) |
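- Nodes keep returning 503: check the `base_url` path prefix and that the API key is valid
- Agent not working: confirm an agent node is set in the dashboard; check the logs with `grep auto_tune /var/log/llm_way.log`
- Claude Code retries: confirm `ANTHROPIC_BASE_URL` has no `/v1` suffix; confirm at least one node is enabled and has recorded successes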
Apache-2.0

