
Commit 8af59b0

add interactive gateway demo

Introduces demos/gateway/ with a step-by-step shell demo for the model-cli gateway command. Each step pauses for Enter, types commands character-by-character (as `docker model gateway`), and covers health, auth, chat completions, streaming, embeddings, load balancing, fallbacks, and OpenAI SDK compatibility.

1 parent b48efb3

File tree

4 files changed: +663 −0 lines

demos/gateway/README.md

Lines changed: 114 additions & 0 deletions
# model-cli gateway demo

Demonstrates the `model-cli gateway` command — a lightweight,
OpenAI-compatible LLM proxy that sits in front of Docker Model Runner
(and other providers) and adds routing, load balancing, retries,
fallbacks, and auth.

## Prerequisites

1. Docker Desktop with Model Runner enabled
2. The `model-cli` binary built:

   ```bash
   cd model-cli && cargo build --release
   ```

3. Models pulled:

   ```bash
   docker model pull ai/smollm2
   docker model pull ai/gemma3
   docker model pull ai/qwen3:0.6B-Q4_0
   docker model pull ai/nomic-embed-text-v1.5
   ```

4. Python `openai` package (for step 11):

   ```bash
   pip install openai
   ```

## Run the demo

```bash
./demos/gateway/demo.sh
```

The script starts the gateway on `http://localhost:4000`, runs through
every feature, then shuts the gateway down on exit.
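Per the commit message, the demo types each command character-by-character before running it. A minimal sketch of such a typewriter effect in Python (the actual `demo.sh` is a shell script; the `type_out` helper here is illustrative only):

```python
import sys
import time

def type_out(text: str, delay: float = 0.03) -> str:
    """Echo text one character at a time, simulating live typing."""
    for ch in text:
        sys.stdout.write(ch)
        sys.stdout.flush()
        time.sleep(delay)
    sys.stdout.write("\n")
    return text  # returned so the caller can run the command afterwards

# delay=0 types instantly; a demo would use a small positive delay
type_out("docker model gateway", delay=0)
```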
## Files
37+
38+
| File | Purpose |
39+
|------|---------|
40+
| `config-basic.yaml` | Single-provider config with two models and bearer-token auth |
41+
| `config-advanced.yaml` | Multi-deployment config showing load balancing and fallbacks |
42+
| `demo.sh` | Full end-to-end demo script |
43+
44+
## What is demonstrated
45+
46+
| # | Feature | Config |
47+
|---|---------|--------|
48+
| 1 | Start gateway | basic |
49+
| 2 | `/health` endpoint | basic |
50+
| 3 | `/v1/models` — OpenAI-compatible model list | basic |
51+
| 4 | Auth rejection with wrong key (HTTP 401) | basic |
52+
| 5 | Non-streaming chat completion | basic |
53+
| 6 | Streaming chat completion (SSE) | basic |
54+
| 7 | Embeddings via chat model | basic |
55+
| 8 | Switch to advanced config | advanced |
56+
| 9 | Round-robin load balancing across two deployments | advanced |
57+
| 10 | Dedicated embedding model (`nomic-embed-text`) | advanced |
58+
| 11 | OpenAI Python SDK — zero code changes required | advanced |
59+
60+
## Config anatomy

```yaml
model_list:
  # Alias the client uses → provider/actual model on DMR
  - model_name: fast-model
    params:
      model: docker_model_runner/ai/smollm2

  # Second entry with same alias → round-robin load balancing
  - model_name: fast-model
    params:
      model: docker_model_runner/ai/qwen3:0.6B-Q4_0

  - model_name: big-model
    params:
      model: docker_model_runner/ai/gemma3

general_settings:
  master_key: demo-secret       # Bearer token required on all requests
  num_retries: 2                # retry up to 2 times before fallback
  fallbacks:
    - fast-model: [big-model]   # automatic fallback chain
```
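The routing rule implied by the config (two entries under one alias are served round-robin) can be sketched in a few lines of Python. This is an illustration of the behavior, not the gateway's actual implementation:

```python
from collections import defaultdict
from itertools import cycle

# Deployments as they appear in model_list: the same alias
# may map to several underlying models.
model_list = [
    ("fast-model", "docker_model_runner/ai/smollm2"),
    ("fast-model", "docker_model_runner/ai/qwen3:0.6B-Q4_0"),
    ("big-model", "docker_model_runner/ai/gemma3"),
]

# Group deployments by alias, then cycle within each group.
groups = defaultdict(list)
for alias, model in model_list:
    groups[alias].append(model)
robins = {alias: cycle(models) for alias, models in groups.items()}

def pick_deployment(alias: str) -> str:
    """Return the next deployment for an alias (round-robin)."""
    return next(robins[alias])

picks = [pick_deployment("fast-model") for _ in range(4)]
# picks alternates between the two fast-model deployments
```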
## Manual curl examples

```bash
GW="http://localhost:4000"
KEY="demo-secret"

# Health
curl "${GW}/health"

# List models
curl -H "Authorization: Bearer ${KEY}" "${GW}/v1/models"

# Chat completion
curl -X POST "${GW}/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${KEY}" \
  -d '{"model":"smollm2","messages":[{"role":"user","content":"Hello!"}]}'

# Streaming
curl -N -X POST "${GW}/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${KEY}" \
  -d '{"model":"smollm2","messages":[{"role":"user","content":"Count to 5"}],"stream":true}'

# Embeddings (the "embeddings" alias is defined in config-advanced.yaml)
curl -X POST "${GW}/v1/embeddings" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${KEY}" \
  -d '{"model":"embeddings","input":["hello world"]}'
```
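The streaming request above returns Server-Sent Events: each event is a `data: <json chunk>` line, and the stream ends with `data: [DONE]`. A stdlib-only sketch of how a client might reassemble the text (the sample payload is illustrative, not captured gateway output):

```python
import json

def collect_sse_text(raw: str) -> str:
    """Reassemble assistant text from an OpenAI-style SSE stream."""
    parts = []
    for line in raw.splitlines():
        if not line.startswith("data: "):
            continue  # skip blank separator lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

# Hypothetical two-chunk stream as the gateway might emit it
sample = (
    'data: {"choices":[{"delta":{"content":"1 "}}]}\n'
    '\n'
    'data: {"choices":[{"delta":{"content":"2"}}]}\n'
    '\n'
    'data: [DONE]\n'
)
# collect_sse_text(sample) yields the concatenated content
```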

demos/gateway/config-advanced.yaml

Lines changed: 39 additions & 0 deletions
```yaml
# Advanced gateway config: load balancing, retries, and fallbacks
#
# Demonstrates three key gateway features:
#
# 1. LOAD BALANCING — two deployments under the same model alias.
#    The gateway round-robins across them automatically.
#
# 2. RETRIES — if a provider call fails, it will be retried up to
#    num_retries times before giving up (or falling back).
#
# 3. FALLBACKS — if "fast-model" exhausts all retries, the gateway
#    automatically promotes the request to "big-model".

model_list:
  # Two local DMR models registered under the same alias.
  # Requests for "fast-model" are round-robined across both.
  - model_name: fast-model
    params:
      model: docker_model_runner/ai/smollm2

  - model_name: fast-model
    params:
      model: docker_model_runner/ai/qwen3:0.6B-Q4_0

  # Larger fallback model
  - model_name: big-model
    params:
      model: docker_model_runner/ai/gemma3

  # Embedding model
  - model_name: embeddings
    params:
      model: docker_model_runner/ai/nomic-embed-text-v1.5

general_settings:
  master_key: demo-secret
  num_retries: 2   # retry failing calls twice before fallback
  fallbacks:
    - fast-model: [big-model]   # fast-model falls back to big-model
```
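The retry-then-fallback semantics described in the config comments can be sketched as follows. This is illustrative control flow under the stated assumptions (1 initial try plus `num_retries` retries per model, then the next model in the fallback chain), not the gateway's code:

```python
def call_with_retries_and_fallback(call, alias, fallbacks, num_retries=2):
    """Try `alias` up to 1 + num_retries times, then walk its fallback chain."""
    chain = [alias] + fallbacks.get(alias, [])
    last_error = None
    for model in chain:
        for _attempt in range(1 + num_retries):
            try:
                return model, call(model)
            except Exception as exc:
                last_error = exc
    raise last_error

# Fake provider where fast-model always fails, forcing the fallback.
attempts = []
def flaky_call(model):
    attempts.append(model)
    if model == "fast-model":
        raise RuntimeError("provider error")
    return f"response from {model}"

used, resp = call_with_retries_and_fallback(
    flaky_call, "fast-model", {"fast-model": ["big-model"]}, num_retries=2
)
# fast-model is attempted 3 times (1 try + 2 retries), then big-model answers
```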

demos/gateway/config-basic.yaml

Lines changed: 16 additions & 0 deletions
```yaml
# Basic gateway config: single Docker Model Runner provider
#
# The gateway exposes a unified OpenAI-compatible API on :4000 and
# forwards requests to Docker Model Runner running on the local engine.

model_list:
  - model_name: smollm2   # alias clients use in their requests
    params:
      model: docker_model_runner/ai/smollm2   # provider/model

  - model_name: gemma3
    params:
      model: docker_model_runner/ai/gemma3

general_settings:
  master_key: demo-secret   # clients must send: Authorization: Bearer demo-secret
```
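The `master_key` setting amounts to requiring a matching bearer token on every request; demo step 4 shows the gateway answering HTTP 401 on a wrong key. A minimal sketch of that rejection logic (the `authorize` helper is illustrative, not the gateway's implementation):

```python
def authorize(auth_header, master_key="demo-secret"):
    """Return 200 if the Authorization header carries the right bearer token, else 401."""
    if auth_header != f"Bearer {master_key}":
        return 401
    return 200

# A correct key is accepted; anything else is rejected
authorize("Bearer demo-secret")   # accepted
authorize("Bearer wrong-key")     # rejected with 401
```

A production check would compare tokens with `hmac.compare_digest` rather than `!=` to avoid timing side channels.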
