Arc-AI-Infra/llm_way

LLM Way / LLM Gateway

LLM API Gateway with protocol-transparent proxy, AI-driven intelligent load balancing, and visual management.



English

Overview

LLM Way is a lightweight LLM API gateway built on Pingora. It transparently proxies requests to multiple upstream LLM providers, isolates API keys from clients, and uses an AI-powered agent to dynamically tune node weights based on real-time performance metrics.

Architecture

Client (Claude Code / OpenAI SDK / curl)
       │
       ▼
┌──────────────────────────────────────────┐
│              LLM Way Gateway             │
│                                          │
│  ┌──────────────┐  ┌──────────────────┐  │
│  │ Transparent  │  │  Agent Tuning    │  │
│  │ Proxy        │  │                  │  │
│  │ · Passthrough│  │  · AI analysis   │  │
│  │ · Key swap   │  │  · Auto weight   │  │
│  │ · Failover   │  │  · Latency-aware │  │
│  └──────┬───────┘  └────────┬─────────┘  │
│         │                   │            │
│  ┌──────┴───────────────────┴──────────┐ │
│  │           Web Dashboard             │ │
│  │  · Add/remove nodes                 │ │
│  │  · Real-time weight adjustment      │ │
│  │  · Stats overview                   │ │
│  │  · Agent node selection             │ │
│  └─────────────────────────────────────┘ │
└──────────────┬───────────────────────────┘
               │
    ┌──────────┼──────────┬──────────┐
    ▼          ▼          ▼          ▼
┌────────┐ ┌───────┐ ┌───────┐ ┌──────────┐
│DeepSeek│ │Pumpkin│ │ DDSST │ │ PPChat   │
└────────┘ └───────┘ └───────┘ └──────────┘

Features

Gateway Core

  • Protocol Transparent: the upstream receives whatever format the client sends, whether Anthropic /v1/messages, OpenAI /v1/chat/completions, or any other HTTP API.
  • API Key Isolation: clients use arbitrary tokens; the gateway swaps them for the real upstream keys.
  • Weighted Round-Robin: requests are distributed by weight, with least-active-requests tie-breaking.
  • Failover: on 5xx or 429 the gateway retries the request on other nodes; 4xx responses are hidden behind a 503 so that the client does not reject the provider.
  • Health Cooldown: nodes with more than 3 consecutive failures enter cooldown and recover automatically on success.
  • Path Prefix: base_url may include a path prefix (e.g. https://api.deepseek.com/anthropic), which the gateway prepends automatically.
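
As a rough illustration of the path-prefix and failover bullets above, the sketch below joins a node's base_url with the incoming request path and classifies upstream status codes. The function names (join_upstream_url, should_retry, client_status) are hypothetical and not taken from the LLM Way source, and reading "hidden behind a 503" as "report an upstream 4xx to the client as 503" is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def join_upstream_url(base_url: str, request_path: str) -> str:
    """Prepend the path prefix of base_url to the client's request path."""
    parts = urlsplit(base_url)
    prefix = parts.path.rstrip("/")
    path = "/" + request_path.lstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, prefix + path, "", ""))


def should_retry(status: int) -> bool:
    """Retry on another node for 429 and any 5xx status."""
    return status == 429 or 500 <= status <= 599


def client_status(status: int) -> int:
    """Assumed mapping: an upstream 4xx reaches the client as 503."""
    return 503 if 400 <= status <= 499 else status


print(join_upstream_url("https://api.deepseek.com/anthropic", "/v1/messages"))
```

With the base_url from the example above, a client request to /v1/messages would be sent to https://api.deepseek.com/anthropic/v1/messages.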

Load Balancing Algorithm

The gateway uses weighted round-robin combined with least-active-requests tie-breaking:

1. Filter: exclude disabled nodes and nodes in cooldown (>3 consecutive failures)
2. Fallback: if all nodes are excluded, fall back to all enabled nodes (still cooldown-safe)
3. Sort candidates by: active_requests ASC, then weight DESC
4. Pick the best candidate (fewest active requests; if tied, highest weight)

This ensures that:

  • Healthy nodes with higher weight get proportionally more traffic
  • Nodes with fewer in-flight requests are preferred (avoids overloading a single node)
  • Failing nodes are automatically excluded
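
The four selection steps can be sketched as below. This is a minimal illustration, not the gateway's actual code: the Node class and pick_node function are hypothetical names, and step 2 is read here as "fall back to every enabled node regardless of cooldown", which is an assumption because the wording of that step is ambiguous.

```python
from dataclasses import dataclass
from typing import Optional

COOLDOWN_THRESHOLD = 3  # cooldown applies at more than 3 consecutive failures


@dataclass
class Node:
    name: str
    weight: int
    enabled: bool = True
    active_requests: int = 0
    consecutive_failures: int = 0


def pick_node(nodes: list[Node]) -> Optional[Node]:
    """Return the preferred node, or None when no node is enabled."""
    enabled = [n for n in nodes if n.enabled]
    # Step 1: drop disabled nodes (above) and nodes in cooldown.
    candidates = [n for n in enabled if n.consecutive_failures <= COOLDOWN_THRESHOLD]
    # Step 2 (assumed reading): if nothing is left, use every enabled node.
    if not candidates:
        candidates = enabled
    if not candidates:
        return None
    # Steps 3 and 4: fewest active requests first, then highest weight.
    return min(candidates, key=lambda n: (n.active_requests, -n.weight))
```

In this sketch the weight only decides between nodes whose active-request counts tie, so how closely traffic follows the weights depends on how much concurrency the gateway sees.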

Agent Auto-Tuning

Select a node as the Tuning Agent by clicking the ☆ Agent button in the dashboard. The agent runs every 15 seconds:

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ Collect stats│ ──▶ │  Build prompt│ ──▶ │  Call AI     │
│  succ/fail/  │     │  Send to     │     │  Get weight  │
│  latency/err │     │  Agent node  │     │  suggestions │
└──────────────┘     └──────────────┘     └──────┬───────┘
       ▲                                         ▼
       │                                ┌──────────────┐
       │       Every 15s                │  Apply new   │
       └────────────────────────────────│  weights     │
                                        └──────────────┘

How it works:

  1. Collect — gather every node's success_count, failure_count, http_error_count, consecutive_failures, and avg_latency_ms (exponential moving average)
  2. Build prompt — construct a structured prompt describing each node's current state
  3. Call AI — use the agent node's own API key to invoke its AI endpoint with the prompt
  4. Parse response — extract the AI's JSON weight suggestions (supports markdown code blocks)
  5. Apply — update node weights immediately (capped at 1–10)
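
Steps 4 and 5 could look roughly like the following sketch. The reply format is not specified here, so the code assumes the model returns a flat JSON object mapping node ids to weights; that shape, the function name parse_weight_suggestions, and the node ids in the usage line are all hypothetical.

```python
import json
import re

MIN_WEIGHT, MAX_WEIGHT = 1, 10

# Built from single backticks so the marker can be written inside this example.
FENCE = "`" * 3
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_weight_suggestions(reply: str) -> dict[str, int]:
    """Extract a JSON object of weights from a reply and clamp each to 1-10.

    Raises ValueError (via json.loads) when the payload is not valid JSON.
    """
    match = FENCED_BLOCK.search(reply)
    # Use the fenced block when present, otherwise treat the whole reply as JSON.
    payload = match.group(1) if match else reply
    suggestions = json.loads(payload)
    return {
        str(node_id): max(MIN_WEIGHT, min(MAX_WEIGHT, int(weight)))
        for node_id, weight in suggestions.items()
    }


print(parse_weight_suggestions('{"node-a": 12, "node-b": 0, "node-c": 4}'))
```

Clamping after parsing keeps an out-of-range suggestion from pushing a weight outside the 1 to 10 limits.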

When no agent is set, the gateway falls back to rule-based tuning:

| Condition | Action |
| --- | --- |
| Consecutive failures > 0 | Weight -= consecutive_failures (min 1) |
| Consecutive failures = 0 AND latency < 3s | Weight += 1 (max 10) |
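
The two rules can be written as a small function. This is a sketch under two assumptions: that the latency being compared is avg_latency_ms, and that a healthy node with latency of 3s or more keeps its current weight, since the rules do not cover that case. The name rule_based_weight is hypothetical.

```python
def rule_based_weight(weight: int, consecutive_failures: int, avg_latency_ms: float) -> int:
    """Return the next weight for one node under the fallback rules."""
    if consecutive_failures > 0:
        # Penalize by the number of consecutive failures, never below 1.
        return max(1, weight - consecutive_failures)
    if avg_latency_ms < 3000:
        # Healthy and fast: step up by one, never above 10.
        return min(10, weight + 1)
    # Healthy but slow: not covered by the rules, assumed unchanged.
    return weight


print(rule_based_weight(5, 2, 100.0))  # 5 - 2 failures
```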

Latency tracking uses exponential moving average (EMA):

avg_latency = (avg_latency * 7 + current_latency) / 8

This smooths out spikes while still responding to sustained changes.
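
To make the smoothing concrete, the sketch below applies the formula to a single spike and then to a sustained increase. The formula is equivalent to avg + (current - avg) / 8, that is, an EMA with smoothing factor 1/8. The function name and the use of floats are assumptions; how the gateway initializes the first sample is not covered.

```python
def update_avg_latency(avg_latency_ms: float, current_latency_ms: float) -> float:
    """One EMA step: the new sample contributes 1/8, the history 7/8."""
    return (avg_latency_ms * 7 + current_latency_ms) / 8


avg = 1000.0
avg = update_avg_latency(avg, 5000.0)
print(avg)  # a single 5000 ms spike moves a 1000 ms average to 1500.0

# A sustained change keeps pulling the average toward the new level.
for _ in range(20):
    avg = update_avg_latency(avg, 5000.0)
print(round(avg))
```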

Web Dashboard

Open http://127.0.0.1:29697/admin:

[Screenshot: LLM Way dashboard overview]

[Screenshot: Agent tuning and routing plane]

| Feature | Description |
| --- | --- |
| Node list | Real-time stats for all nodes |
| Add node | Fill in base_url, api_key, weight |
| Weight tuning | Inline input, instant save |
| Toggle | Enable/disable with one click |
| Agent select | Click ☆/★ to set AI tuning agent |
| Stats overview | Total nodes, success/failure/active counts |

Quick Start

Build

# macOS prerequisite
brew install cmake

cargo build --release

Requires Rust 1.75+.

Run

cp nodes.json.example nodes.json
# Edit nodes.json with real API keys

./target/release/llm_way

The proxy listens on 0.0.0.0:29696 and the dashboard on 127.0.0.1:29697.

Add Upstream Nodes

curl -X POST http://127.0.0.1:29697/admin/upstreams \
  -H "Content-Type: application/json" \
  -d '{
    "base_url": "https://api.deepseek.com/anthropic",
    "api_key": "sk-your-deepseek-key",
    "weight": 3
  }'

Set Tuning Agent

curl -X POST http://127.0.0.1:29697/admin/upstreams/<node-id>/set-agent

Test

# Anthropic format
curl http://127.0.0.1:29696/v1/messages \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model":"claude-sonnet-4-6","max_tokens":20,"messages":[{"role":"user","content":"hi"}]}'

# OpenAI format
curl http://127.0.0.1:29696/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"hello"}]}'

Usage Scenarios

Claude Code

{
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "any-token",
    "ANTHROPIC_BASE_URL": "http://10.41.7.157:29696"
  },
  "model": "sonnet"
}

ANTHROPIC_BASE_URL should be http://host:port without /v1.

OpenAI SDK

from openai import OpenAI

client = OpenAI(
    base_url="http://10.41.7.157:29696/v1",
    api_key="any-token",
)

REST API

Proxy 0.0.0.0:29696

All requests are transparently forwarded; no format conversion.

Admin API 127.0.0.1:29697

| Method | Path | Description |
| --- | --- | --- |
| GET | /admin | Web dashboard |
| GET | /admin/upstreams | List all nodes |
| POST | /admin/upstreams | Add a node |
| DELETE | /admin/upstreams/{id} | Remove a node |
| POST | /admin/upstreams/{id}/toggle | Toggle enable/disable |
| POST | /admin/upstreams/{id}/weight | Set weight, body {"weight":5} |
| POST | /admin/upstreams/{id}/set-agent | Set/unset as agent |
| GET | /admin/stats | Stats summary |
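
For scripting against these endpoints without curl, a standard-library sketch might look like this. The helper names (admin_request, send, set_weight) are hypothetical, and because the response format is not documented here, send simply returns the raw body text.

```python
import json
from typing import Optional
from urllib.request import Request, urlopen

ADMIN_BASE = "http://127.0.0.1:29697"


def admin_request(method: str, path: str, payload: Optional[dict] = None) -> Request:
    """Build (but do not send) a request for the admin API."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(ADMIN_BASE + path, data=data, headers=headers, method=method)


def send(request: Request) -> str:
    """Send the request and return the raw response body as text."""
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def set_weight(node_id: str, weight: int) -> Request:
    """Request for POST /admin/upstreams/{id}/weight with a JSON body."""
    return admin_request("POST", f"/admin/upstreams/{node_id}/weight", {"weight": weight})
```

With the gateway running locally, send(admin_request("GET", "/admin/upstreams")) would return the node list, and send(set_weight("<node-id>", 5)) would update one node's weight.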

Node Stats

| Field | Meaning |
| --- | --- |
| success_count | Successful requests |
| failure_count | Failed requests |
| active_requests | In-flight requests |
| total_requests | Total requests |
| http_error_count | Non-2xx responses |
| retry_count | Upstream retries triggered |
| consecutive_failures | Consecutive failures (cooldown at >3) |
| avg_latency_ms | Average latency (EMA) |

Production Deployment

systemd Service

[Unit]
Description=LLM Way Gateway
After=network.target

[Service]
Type=simple
User=llmway
WorkingDirectory=/opt/llm_way
Environment="RUST_LOG=info"
Environment="LLM_WAY_CONFIG=/etc/llm_way/nodes.json"
ExecStart=/opt/llm_way/llm_way
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

| Port | Purpose | Recommended bind |
| --- | --- | --- |
| 29696 | Proxy | 0.0.0.0 (public) |
| 29697 | Admin | 127.0.0.1 (local only) |

Troubleshooting

  • 503 on all nodes: check base_url path prefixes and API keys
  • Agent not working: verify agent node is set via dashboard; check logs: grep auto_tune /var/log/llm_way.log
  • Claude Code retries: ensure ANTHROPIC_BASE_URL has no /v1 suffix, and that at least one node is enabled and has recorded successful requests

Tech Stack



License

Apache-2.0
