LLM Way / LLM Gateway

LLM API Gateway with protocol-transparent proxy, AI-driven intelligent load balancing, and visual management.

An LLM API gateway: protocol-transparent proxy + AI-driven intelligent load scheduling + visual management.

English

Overview

LLM Way is a lightweight LLM API gateway built on Pingora. It transparently proxies requests to multiple upstream LLM providers, isolates API keys from clients, and uses an AI-powered agent to dynamically tune node weights based on real-time performance metrics.

Architecture

Client (Claude Code / OpenAI SDK / curl)
       │
       ▼
┌──────────────────────────────────────────┐
│              LLM Way Gateway             │
│                                          │
│  ┌──────────────┐  ┌──────────────────┐  │
│  │ Transparent  │  │  Agent Tuning    │  │
│  │ Proxy        │  │                  │  │
│  │ · Passthrough│  │  · AI analysis   │  │
│  │ · Key swap   │  │  · Auto weight   │  │
│  │ · Failover   │  │  · Latency-aware │  │
│  └──────┬───────┘  └────────┬─────────┘  │
│         │                   │            │
│  ┌──────┴───────────────────┴──────────┐ │
│  │           Web Dashboard             │ │
│  │  · Add/remove nodes                 │ │
│  │  · Real-time weight adjustment      │ │
│  │  · Stats overview                   │ │
│  │  · Agent node selection             │ │
│  └─────────────────────────────────────┘ │
└──────────────┬───────────────────────────┘
               │
    ┌──────────┼──────────┬──────────┐
    ▼          ▼          ▼          ▼
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│DeepSeek│ │Pumpkin │ │ DDSST  │ │ PPChat │
└────────┘ └────────┘ └────────┘ └────────┘

Features

Gateway Core

Protocol transparency: the upstream receives the request in whatever format the client sent it, whether Anthropic /v1/messages, OpenAI /v1/chat/completions, or any other HTTP API.
API key isolation: clients send arbitrary tokens; the gateway replaces them with the real upstream keys.
Weighted round-robin: node selection combines each node's weight with its number of active requests.
Failover: on 5xx or 429 the gateway retries on other nodes; other 4xx responses are returned to the client as 503 so that the client does not reject the provider.
Health cooldown: a node with more than 3 consecutive failures enters cooldown and recovers automatically on success.
Path prefix: base_url may include a path prefix (e.g. https://api.deepseek.com/anthropic), which the gateway prepends to the request path.
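The path-prefix, key-swap, and failover rules above can be sketched in a few lines of Python. This is an illustration only, not the gateway's Rust code: the function names are hypothetical, and it assumes that the client credential arrives in an `Authorization` or `x-api-key` header and that a non-retried 4xx is the only case mapped to 503.

```python
from urllib.parse import urlsplit, urlunsplit


def upstream_url(base_url: str, request_path: str) -> str:
    """Prepend the path prefix of base_url to the client's request path."""
    parts = urlsplit(base_url)
    prefix = parts.path.rstrip("/")
    path = prefix + "/" + request_path.lstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def swap_key(headers: dict, real_key: str) -> dict:
    """Replace the client's placeholder credential with the node's real key."""
    out = {}
    for name, value in headers.items():
        lowered = name.lower()
        if lowered == "authorization":
            out[name] = f"Bearer {real_key}"  # OpenAI-style header (assumed)
        elif lowered == "x-api-key":
            out[name] = real_key              # Anthropic-style header (assumed)
        else:
            out[name] = value
    return out


def should_retry(status: int) -> bool:
    """5xx and 429 responses are retried on another node."""
    return status == 429 or 500 <= status <= 599


def client_status(status: int) -> int:
    """Hide upstream 4xx responses behind a 503; pass everything else through."""
    return 503 if 400 <= status <= 499 else status
```

For example, `upstream_url("https://api.deepseek.com/anthropic", "/v1/messages")` yields `https://api.deepseek.com/anthropic/v1/messages`. Query strings are ignored here to keep the sketch short.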

Load Balancing Algorithm

The gateway combines weighted round-robin with a least-active-requests rule. As the steps below show, the number of active requests is the primary sort key and weight breaks ties:

1. Filter: exclude disabled nodes and nodes in cooldown (>3 consecutive failures)
2. Fallback: if all nodes are excluded, fall back to all enabled nodes (still cooldown-safe)
3. Sort candidates by: active_requests ASC, then weight DESC
4. Pick the best candidate (fewest active requests; if tied, highest weight)

This ensures that:

Among healthy nodes with the same number of in-flight requests, the higher-weight node is chosen first
Nodes with fewer in-flight requests are preferred, which avoids overloading a single node
Failing nodes are excluded automatically
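The four selection steps can be written as a short Python sketch. The `Node` class and `pick_node` function are hypothetical names, not the gateway's actual types, and step 2 is interpreted here as "use every enabled node, including those in cooldown", which is one possible reading of the fallback rule.

```python
from dataclasses import dataclass
from typing import Optional

COOLDOWN_THRESHOLD = 3  # a node cools down when consecutive_failures > 3


@dataclass
class Node:
    id: str
    weight: int
    enabled: bool = True
    active_requests: int = 0
    consecutive_failures: int = 0


def pick_node(nodes: list) -> Optional[Node]:
    """Return the node that should receive the next request, or None."""
    enabled = [n for n in nodes if n.enabled]
    # Step 1: drop nodes that are in cooldown.
    healthy = [n for n in enabled if n.consecutive_failures <= COOLDOWN_THRESHOLD]
    # Step 2: if nothing is left, fall back to every enabled node (assumption).
    candidates = healthy or enabled
    if not candidates:
        return None
    # Steps 3 and 4: fewest active requests first, then highest weight.
    return min(candidates, key=lambda n: (n.active_requests, -n.weight))
```

Because `active_requests` is compared first, weight only matters between nodes that are equally busy.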

Agent Auto-Tuning

Select a node as the Tuning Agent by clicking the ☆ Agent button in the dashboard. The agent runs every 15 seconds:

┌───────────────┐     ┌──────────────┐     ┌───────────────┐
│ Collect stats │ ──▶ │ Build prompt │ ──▶ │ Call AI       │
│ succ/fail/    │     │ Send to      │     │ Get weight    │
│ latency/err   │     │ Agent node   │     │ suggestions   │
└───────────────┘     └──────────────┘     └───────┬───────┘
        ▲                                          ▼
        │                                  ┌───────────────┐
        │        Every 15s                 │ Apply new     │
        └──────────────────────────────────│ weights       │
                                           └───────────────┘

How it works:

Collect — gather every node's success_count, failure_count, http_error_count, consecutive_failures, and avg_latency_ms (exponential moving average)
Build prompt — construct a structured prompt describing each node's current state
Call AI — use the agent node's own API key to invoke its AI endpoint with the prompt
Parse response — extract the AI's JSON weight suggestions (supports markdown code blocks)
Apply — update node weights immediately (capped at 1–10)
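The prompt-building and parsing steps might look like the following Python sketch. The prompt wording, the function names, and the reply format (a JSON object mapping node id to weight) are assumptions made for illustration; the README does not specify them.

```python
import json
import re

# Matches a markdown code fence, optionally tagged "json" (built without
# writing three backticks literally so this snippet can live in markdown).
FENCE = re.compile("`" * 3 + r"(?:json)?\s*(.*?)" + "`" * 3, re.DOTALL)


def build_prompt(stats: list) -> str:
    """Describe each node's current state, one JSON line per node."""
    lines = [
        "Suggest a weight from 1 to 10 for each node.",
        "Reply with a JSON object mapping node id to weight.",
        "",
    ]
    lines += [json.dumps(node, sort_keys=True) for node in stats]
    return "\n".join(lines)


def parse_weights(reply: str) -> dict:
    """Extract weight suggestions, unwrapping a markdown code block if present."""
    match = FENCE.search(reply)
    payload = match.group(1) if match else reply
    raw = json.loads(payload)  # raises ValueError if the reply is not JSON
    # Clamp every suggestion to the allowed range 1-10.
    return {node_id: max(1, min(10, int(w))) for node_id, w in raw.items()}
```

A caller would likely catch `ValueError` from `parse_weights` and keep the previous weights when the reply cannot be parsed.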

When no agent is set, the gateway falls back to rule-based tuning:

| Condition | Action |
| --- | --- |
| Consecutive failures > 0 | Weight -= consecutive_failures (min 1) |
| Consecutive failures = 0 AND latency < 3s | Weight += 1 (max 10) |
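The two fallback rules translate directly into code. A minimal Python sketch, assuming latency is compared using `avg_latency_ms` and that a node matching neither rule keeps its weight (the function name is hypothetical):

```python
def rule_based_weight(weight: int, consecutive_failures: int, avg_latency_ms: float) -> int:
    """Apply the rule-based fallback to one node and return its new weight."""
    if consecutive_failures > 0:
        # Penalize by the number of consecutive failures, never below 1.
        return max(1, weight - consecutive_failures)
    if avg_latency_ms < 3000:
        # Healthy and fast: reward by one step, never above 10.
        return min(10, weight + 1)
    # Healthy but slow: leave the weight unchanged (assumption).
    return weight
```

For instance, a node with weight 5 and 2 consecutive failures drops to 3, while a healthy node at weight 10 stays at 10.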

Latency tracking uses exponential moving average (EMA):

avg_latency = (avg_latency * 7 + current_latency) / 8

This smooths out spikes while still responding to sustained changes.
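The formula is equivalent to an EMA with smoothing factor 1/8: each new sample moves the average one eighth of the way toward itself. A small Python sketch (the function name is hypothetical) makes the effect of a spike concrete:

```python
def update_latency(avg_latency_ms: float, current_latency_ms: float) -> float:
    """One EMA step: 7/8 of the old average plus 1/8 of the new sample."""
    return (avg_latency_ms * 7 + current_latency_ms) / 8


avg = 1000.0
avg = update_latency(avg, 9000.0)  # single spike: 1000 -> 2000.0
avg = update_latency(avg, 9000.0)  # sustained slowness: 2000 -> 2875.0
```

A single 9-second outlier only doubles a 1-second average, but repeated slow samples keep pulling the average toward the new level.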

Web Dashboard

Open http://127.0.0.1:29697/admin:

| Feature | Description |
| --- | --- |
| Node list | Real-time stats for all nodes |
| Add node | Fill in base_url, api_key, weight |
| Weight tuning | Inline input, instant save |
| Toggle | Enable/disable with one click |
| Agent select | Click ☆/★ to set AI tuning agent |
| Stats overview | Total nodes, success/failure/active counts |

Quick Start

Build

# macOS prerequisite
brew install cmake

cargo build --release

Requires Rust 1.75+.

Run

cp nodes.json.example nodes.json
# Edit nodes.json with real API keys

./target/release/llm_way

The proxy listens on 0.0.0.0:29696 and the dashboard on 127.0.0.1:29697.

Add Upstream Nodes

curl -X POST http://127.0.0.1:29697/admin/upstreams \
  -H "Content-Type: application/json" \
  -d '{
    "base_url": "https://api.deepseek.com/anthropic",
    "api_key": "sk-your-deepseek-key",
    "weight": 3
  }'

Set Tuning Agent

curl -X POST http://127.0.0.1:29697/admin/upstreams/<node-id>/set-agent

Test

# Anthropic format
curl http://127.0.0.1:29696/v1/messages \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model":"claude-sonnet-4-6","max_tokens":20,"messages":[{"role":"user","content":"hi"}]}'

# OpenAI format
curl http://127.0.0.1:29696/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"hello"}]}'

Usage Scenarios

Claude Code

{
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "any-token",
    "ANTHROPIC_BASE_URL": "http://10.41.7.157:29696"
  },
  "model": "sonnet"
}

ANTHROPIC_BASE_URL should be http://host:port without /v1.

OpenAI SDK

from openai import OpenAI

client = OpenAI(
    base_url="http://10.41.7.157:29696/v1",
    api_key="any-token",
)

REST API

Proxy `0.0.0.0:29696`

All requests are transparently forwarded; no format conversion.

Admin API `127.0.0.1:29697`

| Method | Path | Description |
| --- | --- | --- |
| `GET` | `/admin` | Web dashboard |
| `GET` | `/admin/upstreams` | List all nodes |
| `POST` | `/admin/upstreams` | Add a node |
| `DELETE` | `/admin/upstreams/{id}` | Remove a node |
| `POST` | `/admin/upstreams/{id}/toggle` | Toggle enable/disable |
| `POST` | `/admin/upstreams/{id}/weight` | Set weight `{"weight":5}` |
| `POST` | `/admin/upstreams/{id}/set-agent` | Set/unset as agent |
| `GET` | `/admin/stats` | Stats summary |
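The admin endpoints can also be driven from a script. The Python sketch below only builds the requests with the standard library; the helper names are hypothetical and the node id is whatever `GET /admin/upstreams` reports for your node. Response bodies are not documented here, so none are assumed.

```python
import json
from urllib.request import Request

ADMIN = "http://127.0.0.1:29697"  # admin API address from this README
JSON_HEADERS = {"Content-Type": "application/json"}


def add_node(base_url: str, api_key: str, weight: int) -> Request:
    """POST /admin/upstreams with the same body as the curl example."""
    body = json.dumps({"base_url": base_url, "api_key": api_key, "weight": weight})
    return Request(f"{ADMIN}/admin/upstreams", data=body.encode(),
                   headers=JSON_HEADERS, method="POST")


def set_weight(node_id: str, weight: int) -> Request:
    """POST /admin/upstreams/{id}/weight with {"weight": n}."""
    body = json.dumps({"weight": weight})
    return Request(f"{ADMIN}/admin/upstreams/{node_id}/weight", data=body.encode(),
                   headers=JSON_HEADERS, method="POST")


def remove_node(node_id: str) -> Request:
    """DELETE /admin/upstreams/{id}."""
    return Request(f"{ADMIN}/admin/upstreams/{node_id}", method="DELETE")
```

To send a request, pass it to `urllib.request.urlopen`, for example `urlopen(set_weight(node_id, 5))`, on the machine where the gateway runs, since the admin API binds to 127.0.0.1.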

Node Stats

| Field | Meaning |
| --- | --- |
| `success_count` | Successful requests |
| `failure_count` | Failed requests |
| `active_requests` | In-flight requests |
| `total_requests` | Total requests |
| `http_error_count` | Non-2xx responses |
| `retry_count` | Upstream retries triggered |
| `consecutive_failures` | Consecutive failures (cooldown at >3) |
| `avg_latency_ms` | Average latency (EMA) |

Production Deployment

systemd Service

[Unit]
Description=LLM Way Gateway
After=network.target

[Service]
Type=simple
User=llmway
WorkingDirectory=/opt/llm_way
Environment="RUST_LOG=info"
Environment="LLM_WAY_CONFIG=/etc/llm_way/nodes.json"
ExecStart=/opt/llm_way/llm_way
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

| Port | Purpose | Bind |
| --- | --- | --- |
| 29696 | Proxy | `0.0.0.0` (public) |
| 29697 | Admin | `127.0.0.1` (local only) |

Troubleshooting

503 on all nodes: check the base_url path prefixes and the API keys
Agent not working: verify that an agent node is set in the dashboard, then check the logs: grep auto_tune /var/log/llm_way.log
Claude Code retries: ensure ANTHROPIC_BASE_URL has no /v1 suffix, and that at least one node is enabled and has recorded successful requests

Tech Stack

Pingora — HTTP proxy engine
Axum — admin API
Tokio — async runtime
Reqwest — agent AI calls
Serde — JSON serialization

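Chinese

Overview

LLM Way is a lightweight LLM API gateway built on Pingora. It transparently proxies client requests to multiple upstream LLM providers, isolates API keys, and uses an AI-driven agent to adjust node weights dynamically based on real-time performance metrics.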

Architecture

Client (Claude Code / OpenAI SDK / curl)
       │
       ▼
┌──────────────────────────────────────────┐
│              LLM Way Gateway             │
│                                          │
│  ┌──────────────┐  ┌──────────────────┐  │
│  │ Transparent  │  │  Agent Tuning    │  │
│  │ Proxy        │  │                  │  │
│  │ · Passthrough│  │  · AI analysis   │  │
│  │ · Key swap   │  │  · Auto weight   │  │
│  │ · Failover   │  │  · Latency/errors│  │
│  └──────┬───────┘  └────────┬─────────┘  │
│         │                   │            │
│  ┌──────┴───────────────────┴──────────┐ │
│  │           Web Dashboard             │ │
│  │  · Add/remove/toggle nodes          │ │
│  │  · Real-time weight adjustment      │ │
│  │  · Stats overview                   │ │
│  │  · Agent node selection             │ │
│  └─────────────────────────────────────┘ │
└──────────────┬───────────────────────────┘
               │
    ┌──────────┼──────────┬──────────┐
    ▼          ▼          ▼          ▼
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│DeepSeek│ │Pumpkin │ │ DDSST  │ │ PPChat │
└────────┘ └────────┘ └────────┘ └────────┘

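Features

Gateway Core

Protocol transparency: the upstream receives whatever format the client sends. Anthropic /v1/messages, OpenAI /v1/chat/completions, and any other HTTP API are supported.
API key isolation: clients may use any token; the gateway replaces it with the upstream node's real API key.
Weighted round-robin: requests are distributed by weight, with the fewest active requests used for tie-breaking.
Failover: when an upstream returns 5xx or 429, the gateway retries on other nodes; 4xx errors are hidden behind a 503 so that the client does not reject the provider.
Health cooldown: a node with more than 3 consecutive failures enters cooldown automatically and comes back online once it recovers.
Path prefix: base_url may carry a path prefix (e.g. https://api.deepseek.com/anthropic), which the gateway joins to the request path automatically.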

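Load Balancing Algorithm

The gateway schedules on two factors, weighted round-robin and least active requests:

1. Filter: exclude disabled nodes and nodes in cooldown (more than 3 consecutive failures)
2. Fallback: if every node is excluded, fall back to all enabled nodes (still excluding nodes in cooldown)
3. Sort: by active_requests ASC (fewest active first), then weight DESC (highest weight first)
4. Select: the node with the fewest active requests; on a tie, the one with the highest weight

This design ensures that:

Healthy high-weight nodes receive more traffic
No single node is overwhelmed by requests (idle nodes are preferred)
Failing nodes are excluded automatically and rejoin automatically after recovery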

Agent Auto-Tuning

In the dashboard, click a node's ☆ Agent button to designate it as the tuning agent. The agent runs automatically every 15 seconds:

┌───────────────┐     ┌──────────────┐     ┌───────────────┐
│ Collect stats │ ──▶ │ Build prompt │ ──▶ │ Call AI       │
│ succ/fail/    │     │ Send to      │     │ Get weight    │
│ latency/err   │     │ Agent node   │     │ suggestions   │
└───────────────┘     └──────────────┘     └───────┬───────┘
        ▲                                          ▼
        │                                  ┌───────────────┐
        │        Loop every 15s            │ Apply weights │
        └──────────────────────────────────│ immediately   │
                                           └───────────────┘

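How it works:

Collect: gather every node's success_count, failure_count, http_error_count, consecutive_failures, and avg_latency_ms (exponential moving average)
Build prompt: generate a structured prompt describing each node's current state
Call AI: use the agent node's own API key to call its AI endpoint
Parse response: extract the JSON weight suggestions returned by the AI (markdown code blocks are supported)
Apply: update node weights immediately (limited to the range 1–10)

When no agent is set, the gateway falls back to rule-based tuning:

| Condition | Action |
| --- | --- |
| Consecutive failures > 0 | Weight -= consecutive failures (min 1) |
| Consecutive failures = 0 and latency < 3s | Weight += 1 (max 10) |

Latency tracking is smoothed with an exponential moving average (EMA):

avg_latency = (avg_latency * 7 + current_latency) / 8

This smooths out spikes while still responding quickly to sustained changes.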

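Web Dashboard

Open http://127.0.0.1:29697/admin:

| Feature | Description |
| --- | --- |
| Node list | Shows all nodes and their stats in real time |
| Add node | Fill in base_url, api_key, and weight |
| Weight tuning | Edit the weight in an inline input; takes effect immediately |
| Toggle node | Enable or disable with one click |
| Agent select | Click the ☆/★ button to designate the tuning agent |
| Stats overview | Total nodes and success/failure/active request counts |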

Quick Start

Build

# Prerequisite (macOS)
brew install cmake

# Build
cargo build --release

Requires Rust 1.75+.

Run

cp nodes.json.example nodes.json   # Edit nodes.json and fill in real API keys
./target/release/llm_way

The proxy listens on 0.0.0.0:29696 and the dashboard on 127.0.0.1:29697.

Add Upstream Nodes

curl -X POST http://127.0.0.1:29697/admin/upstreams \
  -H "Content-Type: application/json" \
  -d '{
    "base_url": "https://api.deepseek.com/anthropic",
    "api_key": "sk-your-deepseek-key",
    "weight": 3
  }'

Set Tuning Agent

curl -X POST http://127.0.0.1:29697/admin/upstreams/<node-id>/set-agent

Test

# Anthropic format
curl http://127.0.0.1:29696/v1/messages \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model":"claude-sonnet-4-6","max_tokens":20,"messages":[{"role":"user","content":"hi"}]}'

# OpenAI format
curl http://127.0.0.1:29696/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"hello"}]}'

Usage Scenarios

Claude Code

{
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "any-token",
    "ANTHROPIC_BASE_URL": "http://10.41.7.157:29696"
  },
  "model": "sonnet"
}

ANTHROPIC_BASE_URL should be just http://host:port, without /v1.

OpenAI SDK

from openai import OpenAI

client = OpenAI(
    base_url="http://10.41.7.157:29696/v1",
    api_key="any-token",
)

REST API

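Proxy `0.0.0.0:29696`

All requests are passed through to the upstream nodes as-is, with no format conversion.

Admin API `127.0.0.1:29697`

| Method | Path | Description |
| --- | --- | --- |
| `GET` | `/admin` | Web dashboard |
| `GET` | `/admin/upstreams` | List all nodes |
| `POST` | `/admin/upstreams` | Add a node |
| `DELETE` | `/admin/upstreams/{id}` | Remove a node |
| `POST` | `/admin/upstreams/{id}/toggle` | Enable/disable a node |
| `POST` | `/admin/upstreams/{id}/weight` | Update weight `{"weight":5}` |
| `POST` | `/admin/upstreams/{id}/set-agent` | Set/unset as agent node |
| `GET` | `/admin/stats` | Stats overview |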

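Node Stats

| Field | Meaning |
| --- | --- |
| `success_count` | Successful requests |
| `failure_count` | Failed requests |
| `active_requests` | In-flight requests |
| `total_requests` | Total requests |
| `http_error_count` | HTTP errors (non-2xx) |
| `retry_count` | Upstream retries triggered |
| `consecutive_failures` | Consecutive failures; cooldown at >3 |
| `avg_latency_ms` | Average latency (exponential moving average) |

Production Deployment

systemd Service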

[Unit]
Description=LLM Way Gateway
After=network.target

[Service]
Type=simple
User=llmway
WorkingDirectory=/opt/llm_way
Environment="RUST_LOG=info"
Environment="LLM_WAY_CONFIG=/etc/llm_way/nodes.json"
ExecStart=/opt/llm_way/llm_way
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

| Port | Purpose | Recommended bind |
| --- | --- | --- |
| 29696 | Proxy service | `0.0.0.0` (public) |
| 29697 | Admin API | `127.0.0.1` (local only) |

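Troubleshooting

Nodes keep returning 503: check the base_url path prefix and that the API key is valid
Agent not working: confirm that an agent node is set in the dashboard; check the logs with grep auto_tune /var/log/llm_way.log
Claude Code retries: confirm that ANTHROPIC_BASE_URL has no /v1 suffix; confirm that at least one node is enabled and has recorded successful requests

Tech Stack

Pingora: HTTP proxy engine
Axum: admin API
Tokio: async runtime
Reqwest: agent AI calls
Serde: JSON serialization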

License

Apache-2.0
