server: OpenAI API and llama.cpp Web UI compatibility #120

Closed

kirav5 wants to merge 0 commits into tilesprivacy:main from kirav5:feat/openai-llama-webui-compat
Conversation

@kirav5 kirav5 commented Apr 5, 2026

Pull request: OpenAI-compatible HTTP API + llama.cpp Web UI hooks

Summary

Extends the Tiles Python inference server so OpenAI-style clients and the ggml-org llama.cpp SvelteKit Web UI can talk to Tiles (MLX backend) over HTTP, while keeping the existing Tiles CLI flow (POST /start, memory-mode chat) intact.

Motivation

  • Web UI: Developers want to use the stock llama.cpp Web UI against Tiles without running llama-server; the UI expects GET /props, enriched /v1/models rows, optional POST /models/load, OpenAI chat SSE, and timings on the final stream chunk.
  • CLI parity: Session survives server restarts where appropriate; streaming responses still expose metrics for the Rust CLI’s memory-mode benchmark line.
  • Operations: Health endpoints and predictable error bodies for /v1/* improve scripting and UI error dialogs.

What changed

API (server/api.py, server/main.py)

  • Lifespan: On startup, restore last model from ~/.tiles/server_session.json if present; optional TILES_BOOTSTRAP_MODEL + TILES_BOOTSTRAP_MODEL_CACHE_PATH (and optional memory/system prompt env) to load without a prior /start.
  • POST /start: After successful load, persist session; shared _apply_loaded_session used by /start, restore, bootstrap, and /models/load.
  • POST /models/load: For the Web UI’s {"model": "<id>"} body; MLX directory from TILES_MODEL_CACHE_PATH or TILES_BOOTSTRAP_MODEL_CACHE_PATH.
  • GET /props, GET /v1/models, GET /health and GET /v1/health; router endpoints POST /models/load (implemented) and POST /models/unload (stub).
  • POST /v1/chat/completions: Resolved model, optional input memory path, OpenAI-shaped 422/400/500 for /v1/*, CORS for browser dev.
  • Middleware: Request body is not read in logging middleware (avoids consuming stream and logging prompts).
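The OpenAI-shaped error bodies mentioned for /v1/* can be pictured as a small envelope builder. This is a hedged sketch, not the PR's actual code: the helper name is hypothetical, and the `{"error": {...}}` shape follows the public OpenAI API convention.

```python
import json

def openai_error_body(message: str, err_type: str, code: int) -> dict:
    """Build the {"error": {...}} envelope OpenAI clients and UI error dialogs parse."""
    return {"error": {"message": message, "type": err_type, "code": code}}

# e.g. the body of a 422 for a malformed chat request:
print(json.dumps(openai_error_body(
    "messages must be a non-empty list", "invalid_request_error", 422)))
```

Keeping the same envelope for 400, 422, and 500 is what makes scripted clients and the Web UI's error dialog predictable, per the Operations bullet above.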

Compatibility helpers

  • server/llamacpp_webui_compat.py: /props-shaped JSON and /v1/models entries with path, status, in_cache expected by the Web UI.
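As a rough illustration of what the compat module emits: the path, status, and in_cache keys come from the PR text, while the /props keys below are assumptions based on llama-server's typical payload, so treat this as a sketch rather than the module's real output.

```python
def v1_models_entry(model_id: str, loaded: bool, cache_dir: str, in_cache: bool) -> dict:
    """One /v1/models row enriched with the extra fields the Web UI reads."""
    return {
        "id": model_id,
        "object": "model",
        "path": cache_dir,                              # where the MLX weights live
        "status": "loaded" if loaded else "available",  # drives the UI's load action
        "in_cache": in_cache,                           # already downloaded?
    }

def props_payload(model_id: str, n_ctx: int) -> dict:
    """A /props-shaped body; the exact keys here are assumptions, not the PR's code."""
    return {
        "model_path": model_id,
        "default_generation_settings": {"n_ctx": n_ctx},
        "total_slots": 1,
    }
```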

Session persistence

  • server/session_persist.py: JSON file under ~/.tiles/server_session.json (override with TILES_SESSION_FILE); disabled in tests via TILES_SKIP_SESSION_PERSIST=1. Atomic write (temp + replace).
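The temp-file-plus-replace pattern described above can be sketched as follows; the function name and session keys are illustrative, but the atomicity mechanics (write a sibling temp file, then os.replace) match what the bullet describes.

```python
import json
import os
import tempfile
from pathlib import Path

def save_session(session: dict, path: Path) -> None:
    """Persist session JSON atomically so a crash never leaves a truncated file."""
    if os.environ.get("TILES_SKIP_SESSION_PERSIST") == "1":
        return  # tests opt out of persistence entirely
    path.parent.mkdir(parents=True, exist_ok=True)
    # Temp file must live in the same directory so os.replace stays atomic
    # (rename across filesystems is not).
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(session, f)
        os.replace(tmp, path)  # atomic swap on POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise
```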

Streaming / metrics (server/backend/mlx.py)

  • Final SSE chunk and non-streaming completion include:
    • timings (predicted_n, predicted_ms) for the llama.cpp Web UI tokens/s display.
    • metrics (ttft_ms, total_tokens, tokens_per_second, total_latency_s) for Tiles CLI memory mode.
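Putting the two blocks together, the final stream chunk might look like the sketch below. The timings/metrics field names come from the bullets above; the surrounding chunk shape follows the OpenAI streaming format, and the latency arithmetic is an assumption for illustration.

```python
import json

def final_sse_chunk(model: str, n_tokens: int, gen_ms: float, ttft_ms: float) -> str:
    """Serialize the last data: line of a chat stream with both metric blocks."""
    chunk = {
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
        # Read by the llama.cpp Web UI for its tokens/s display:
        "timings": {"predicted_n": n_tokens, "predicted_ms": gen_ms},
        # Read by the Tiles CLI memory-mode benchmark line:
        "metrics": {
            "ttft_ms": ttft_ms,
            "total_tokens": n_tokens,
            "tokens_per_second": n_tokens / (gen_ms / 1000.0),
            "total_latency_s": (ttft_ms + gen_ms) / 1000.0,  # assumption: ttft + generation
        },
    }
    return f"data: {json.dumps(chunk)}\n\n"
```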

Schemas (server/schemas.py)

  • ChatCompletionRequest / ChatMessage adjustments for Web UI (optional model, multimodal content list, extra="ignore", max_tokens: -1 → unlimited).
  • RouterModelsLoadBody for /models/load.
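A minimal Pydantic sketch of these adjustments, assuming Pydantic v2; the field set is abbreviated and any name not mentioned above is illustrative.

```python
from typing import Any, Optional, Union

from pydantic import BaseModel, ConfigDict

class ChatMessage(BaseModel):
    model_config = ConfigDict(extra="ignore")
    role: str
    # The Web UI may send multimodal content as a list of parts, not a plain string.
    content: Union[str, list[dict[str, Any]]]

class ChatCompletionRequest(BaseModel):
    model_config = ConfigDict(extra="ignore")  # tolerate unknown UI fields instead of 422
    model: Optional[str] = None                # resolved server-side when omitted
    messages: list[ChatMessage]
    max_tokens: int = -1                       # -1 means unlimited

    def effective_max_tokens(self) -> Optional[int]:
        return None if self.max_tokens == -1 else self.max_tokens
```

With `extra="ignore"`, UI-specific fields the server does not model are silently dropped instead of triggering validation errors.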

Tests

  • server/tests/test_openai_compat.py, server/tests/conftest.py: Cover health, props, models, chat stream, errors, /models/load with env path.
  • just test-server: uv run --project server --with pytest pytest server/tests/ so pytest is not added to uv.lock (keeps lockfile identical to main).

Dev script

  • scripts/phase2_llamacpp_webui.sh + just webui-llamacpp: Clone/patch llama.cpp Web UI Vite proxy to TILES_BACKEND (default http://127.0.0.1:6969), npm run dev.

How to test

cd /path/to/tiles
just test-server
just serve   # terminal 1
just webui-llamacpp   # terminal 2 — load model per Tiles docs (POST /start, session, bootstrap, or TILES_MODEL_CACHE_PATH + UI load)

Configuration reference

| Variable | Purpose |
| --- | --- |
| TILES_SESSION_FILE | Override path for session JSON |
| TILES_SKIP_SESSION_PERSIST=1 | Disable read/write (tests) |
| TILES_BOOTSTRAP_MODEL / TILES_BOOTSTRAP_MODEL_CACHE_PATH | Auto-load on startup if no session |
| TILES_BOOTSTRAP_MEMORY_PATH / TILES_BOOTSTRAP_SYSTEM_PROMPT | Optional bootstrap extras |
| TILES_MODEL_CACHE_PATH | MLX directory for Web UI POST /models/load |

@coderabbitai

coderabbitai bot commented Apr 5, 2026

Caution: Review failed. The head commit changed during the review from 3f9084d to 1b8be89.

@kirav5 kirav5 closed this Apr 5, 2026
@kirav5 kirav5 force-pushed the feat/openai-llama-webui-compat branch from 3f9084d to 1b8be89 on April 5, 2026 at 20:17
