server: OpenAI API and llama.cpp Web UI compatibility #120

Closed

kirav5 wants to merge 0 commits into tilesprivacy:main from kirav5:feat/openai-llama-webui-compat
Conversation

@kirav5 kirav5 commented Apr 5, 2026

Pull request: OpenAI-compatible HTTP API + llama.cpp Web UI hooks

Summary

Extends the Tiles Python inference server so OpenAI-style clients and the ggml-org llama.cpp SvelteKit Web UI can talk to Tiles (MLX backend) over HTTP, while keeping the existing Tiles CLI flow (POST /start, memory-mode chat) intact.

Motivation

  • Web UI: Developers want to use the stock llama.cpp Web UI against Tiles without running llama-server; the UI expects GET /props, enriched /v1/models rows, optional POST /models/load, OpenAI chat SSE, and timings on the final stream chunk.
  • CLI parity: Session survives server restarts where appropriate; streaming responses still expose metrics for the Rust CLI’s memory-mode benchmark line.
  • Operations: Health endpoints and predictable error bodies for /v1/* improve scripting and UI error dialogs.

What changed

API (server/api.py, server/main.py)

  • Lifespan: On startup, restore last model from ~/.tiles/server_session.json if present; optional TILES_BOOTSTRAP_MODEL + TILES_BOOTSTRAP_MODEL_CACHE_PATH (and optional memory/system prompt env) to load without a prior /start.
  • POST /start: After successful load, persist session; shared _apply_loaded_session used by /start, restore, bootstrap, and /models/load.
  • POST /models/load: For the Web UI’s {"model": "<id>"} body; MLX directory from TILES_MODEL_CACHE_PATH or TILES_BOOTSTRAP_MODEL_CACHE_PATH.
  • GET /props, GET /v1/models, GET /health and GET /v1/health; router endpoints POST /models/load (implemented) and POST /models/unload (stub).
  • POST /v1/chat/completions: Resolved model, optional input memory path, OpenAI-shaped 422/400/500 for /v1/*, CORS for browser dev.
  • Middleware: Request body is not read in logging middleware (avoids consuming stream and logging prompts).
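The OpenAI-shaped error bodies mentioned for /v1/* can be pictured as a small envelope builder. This is a hedged sketch, not the PR's actual code: the helper name is hypothetical, and the `{"error": {...}}` shape follows the public OpenAI API convention.

```python
import json

def openai_error_body(message: str, err_type: str, code: int) -> dict:
    """Build the {"error": {...}} envelope OpenAI clients and UI error dialogs parse."""
    return {"error": {"message": message, "type": err_type, "code": code}}

# e.g. the body of a 422 for a malformed chat request:
print(json.dumps(openai_error_body(
    "messages must be a non-empty list", "invalid_request_error", 422)))
```

Keeping the same envelope for 400, 422, and 500 is what makes scripted clients and the Web UI's error dialog predictable, per the Operations bullet above.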

Compatibility helpers

  • server/llamacpp_webui_compat.py: /props-shaped JSON and /v1/models entries with path, status, in_cache expected by the Web UI.
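As a rough illustration of what the compat module emits: the path, status, and in_cache keys come from the PR text, while the /props keys below are assumptions based on llama-server's typical payload, so treat this as a sketch rather than the module's real output.

```python
def v1_models_entry(model_id: str, loaded: bool, cache_dir: str, in_cache: bool) -> dict:
    """One /v1/models row enriched with the extra fields the Web UI reads."""
    return {
        "id": model_id,
        "object": "model",
        "path": cache_dir,                              # where the MLX weights live
        "status": "loaded" if loaded else "available",  # drives the UI's load action
        "in_cache": in_cache,                           # already downloaded?
    }

def props_payload(model_id: str, n_ctx: int) -> dict:
    """A /props-shaped body; the exact keys here are assumptions, not the PR's code."""
    return {
        "model_path": model_id,
        "default_generation_settings": {"n_ctx": n_ctx},
        "total_slots": 1,
    }
```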

Session persistence

  • server/session_persist.py: JSON file under ~/.tiles/server_session.json (override with TILES_SESSION_FILE); disabled in tests via TILES_SKIP_SESSION_PERSIST=1. Atomic write (temp + replace).
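The temp-file-plus-replace pattern described above can be sketched as follows; the function name and session keys are illustrative, but the atomicity mechanics (write a sibling temp file, then os.replace) match what the bullet describes.

```python
import json
import os
import tempfile
from pathlib import Path

def save_session(session: dict, path: Path) -> None:
    """Persist session JSON atomically so a crash never leaves a truncated file."""
    if os.environ.get("TILES_SKIP_SESSION_PERSIST") == "1":
        return  # tests opt out of persistence entirely
    path.parent.mkdir(parents=True, exist_ok=True)
    # Temp file must live in the same directory so os.replace stays atomic
    # (rename across filesystems is not).
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(session, f)
        os.replace(tmp, path)  # atomic swap on POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise
```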

Streaming / metrics (server/backend/mlx.py)

  • Final SSE chunk and non-streaming completion include:
    • timings (predicted_n, predicted_ms) for the llama.cpp Web UI tokens/s display.
    • metrics (ttft_ms, total_tokens, tokens_per_second, total_latency_s) for Tiles CLI memory mode.
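Putting the two blocks together, the final stream chunk might look like the sketch below. The timings/metrics field names come from the bullets above; the surrounding chunk shape follows the OpenAI streaming format, and the latency arithmetic is an assumption for illustration.

```python
import json

def final_sse_chunk(model: str, n_tokens: int, gen_ms: float, ttft_ms: float) -> str:
    """Serialize the last data: line of a chat stream with both metric blocks."""
    chunk = {
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
        # Read by the llama.cpp Web UI for its tokens/s display:
        "timings": {"predicted_n": n_tokens, "predicted_ms": gen_ms},
        # Read by the Tiles CLI memory-mode benchmark line:
        "metrics": {
            "ttft_ms": ttft_ms,
            "total_tokens": n_tokens,
            "tokens_per_second": n_tokens / (gen_ms / 1000.0),
            "total_latency_s": (ttft_ms + gen_ms) / 1000.0,  # assumption: ttft + generation
        },
    }
    return f"data: {json.dumps(chunk)}\n\n"
```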

Schemas (server/schemas.py)

  • ChatCompletionRequest / ChatMessage adjustments for Web UI (optional model, multimodal content list, extra="ignore", max_tokens: -1 → unlimited).
  • RouterModelsLoadBody for /models/load.
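A minimal Pydantic sketch of these adjustments, assuming Pydantic v2; the field set is abbreviated and any name not mentioned above is illustrative.

```python
from typing import Any, Optional, Union

from pydantic import BaseModel, ConfigDict

class ChatMessage(BaseModel):
    model_config = ConfigDict(extra="ignore")
    role: str
    # The Web UI may send multimodal content as a list of parts, not a plain string.
    content: Union[str, list[dict[str, Any]]]

class ChatCompletionRequest(BaseModel):
    model_config = ConfigDict(extra="ignore")  # tolerate unknown UI fields instead of 422
    model: Optional[str] = None                # resolved server-side when omitted
    messages: list[ChatMessage]
    max_tokens: int = -1                       # -1 means unlimited

    def effective_max_tokens(self) -> Optional[int]:
        return None if self.max_tokens == -1 else self.max_tokens
```

With `extra="ignore"`, UI-specific fields the server does not model are silently dropped instead of triggering validation errors.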

Tests

  • server/tests/test_openai_compat.py, server/tests/conftest.py: Cover health, props, models, chat stream, errors, /models/load with env path.
  • just test-server: uv run --project server --with pytest pytest server/tests/ so pytest is not added to uv.lock (keeps lockfile identical to main).

Dev script

  • scripts/phase2_llamacpp_webui.sh + just webui-llamacpp: Clone/patch llama.cpp Web UI Vite proxy to TILES_BACKEND (default http://127.0.0.1:6969), npm run dev.

How to test

cd /path/to/tiles
just test-server
just serve   # terminal 1
just webui-llamacpp   # terminal 2 — load model per Tiles docs (POST /start, session, bootstrap, or TILES_MODEL_CACHE_PATH + UI load)

Configuration reference

| Variable | Purpose |
| --- | --- |
| TILES_SESSION_FILE | Override path for session JSON |
| TILES_SKIP_SESSION_PERSIST=1 | Disable read/write (tests) |
| TILES_BOOTSTRAP_MODEL / TILES_BOOTSTRAP_MODEL_CACHE_PATH | Auto-load on startup if no session |
| TILES_BOOTSTRAP_MEMORY_PATH / TILES_BOOTSTRAP_SYSTEM_PROMPT | Optional bootstrap extras |
| TILES_MODEL_CACHE_PATH | MLX directory for Web UI POST /models/load |

@coderabbitai

coderabbitai bot commented Apr 5, 2026

Caution: Review failed. The head commit changed during the review from 3f9084d to 1b8be89.

@kirav5 kirav5 closed this Apr 5, 2026
@kirav5 kirav5 force-pushed the feat/openai-llama-webui-compat branch from 3f9084d to 1b8be89 on April 5, 2026 at 20:17
