# server: OpenAI API and llama.cpp Web UI compatibility (#120)
Closed. kirav5 wants to merge into `tilesprivacy:main`.
## Pull request: OpenAI-compatible HTTP API + llama.cpp Web UI hooks

### Summary

Extends the Tiles Python inference server so that OpenAI-style clients and the ggml-org llama.cpp SvelteKit Web UI can talk to Tiles (MLX backend) over HTTP, while keeping the existing Tiles CLI flow (`POST /start`, memory-mode chat) intact.

### Motivation

- The llama.cpp Web UI is written against `llama-server`; the UI expects `GET /props`, enriched `/v1/models` rows, an optional `POST /models/load`, OpenAI chat SSE, and `timings` on the final stream chunk.
- The Rust CLI's memory-mode benchmark line needs `metrics` on the final stream chunk.
- OpenAI-shaped error bodies on `/v1/*` improve scripting and the UI's error dialogs.

### What changed
#### API (`server/api.py`, `server/main.py`)

- Session restore from `~/.tiles/server_session.json` if present; optional `TILES_BOOTSTRAP_MODEL` + `TILES_BOOTSTRAP_MODEL_CACHE_PATH` (plus optional memory/system-prompt env vars) load a model without a prior `/start`.
- `POST /start`: after a successful load, persists the session; a shared `_apply_loaded_session` is used by `/start`, restore, bootstrap, and `/models/load`.
- `POST /models/load`: accepts the Web UI's `{"model": "<id>"}` body; the MLX directory comes from `TILES_MODEL_CACHE_PATH` or `TILES_BOOTSTRAP_MODEL_CACHE_PATH`.
- `GET /props`, `GET /v1/models`, `GET /health` / `GET /v1/health`; router endpoints `POST /models/load` (implemented) and `POST /models/unload` (stub).
- `POST /v1/chat/completions`: resolves `model`, supports an optional `input` memory path, returns OpenAI-shaped 422/400/500 errors on `/v1/*`, and enables CORS for browser dev.
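The OpenAI-shaped error bodies on `/v1/*` can be pictured as a tiny helper. This is a sketch: the `{"error": {...}}` envelope is OpenAI's convention, but the function name and exact field choices here are assumptions, not the PR's actual code.

```python
def openai_error_body(message: str, status: int) -> dict:
    """Shape an error payload the way OpenAI clients expect.

    Hypothetical helper: the envelope follows OpenAI's convention of a
    top-level "error" object; the type/code mapping is illustrative.
    """
    kind = {400: "invalid_request_error", 422: "invalid_request_error"}.get(
        status, "server_error"
    )
    return {"error": {"message": message, "type": kind, "code": status}}
```

Returning this shape (instead of FastAPI's default `{"detail": ...}`) is what lets stock OpenAI SDKs and the Web UI's error dialogs surface a readable message.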
#### Compatibility helpers

- `server/llamacpp_webui_compat.py`: builds the `/props`-shaped JSON and the `/v1/models` entries with the `path`, `status`, and `in_cache` fields the Web UI expects.
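The `/v1/models` enrichment can be sketched as a small row builder. The `path`, `status`, and `in_cache` fields come from the PR description; the cache-directory layout and the concrete `status` values below are assumptions for illustration.

```python
from pathlib import Path


def model_entry(model_id: str, cache_dir: str, loaded: bool) -> dict:
    """Build one enriched /v1/models row for the llama.cpp Web UI (sketch)."""
    # Hypothetical layout: HF-style "org/name" ids flattened to "org--name".
    path = Path(cache_dir).expanduser() / model_id.replace("/", "--")
    return {
        "id": model_id,       # standard OpenAI field
        "object": "model",    # standard OpenAI field
        "path": str(path),    # extra field the Web UI reads
        "status": "loaded" if loaded else "available",  # assumed values
        "in_cache": path.exists(),
    }
```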
#### Session persistence

- `server/session_persist.py`: JSON file under `~/.tiles/server_session.json` (override with `TILES_SESSION_FILE`); disabled in tests via `TILES_SKIP_SESSION_PERSIST=1`. Writes are atomic (temp file + replace).
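The atomic temp-file + replace pattern looks roughly like this. It is a sketch of the approach described above, not the code in `server/session_persist.py` itself:

```python
import json
import os
import tempfile
from pathlib import Path


def persist_session(state: dict, path: Path) -> None:
    """Atomically persist session JSON (sketch of the temp + replace pattern)."""
    # Honor the test escape hatch from the PR description.
    if os.environ.get("TILES_SKIP_SESSION_PERSIST") == "1":
        return
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory so os.replace() stays on
    # one filesystem and is atomic on POSIX: readers see either the old
    # file or the new one, never a half-written session.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        os.replace(tmp, path)
    finally:
        if os.path.exists(tmp):
            os.remove(tmp)
```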
#### Streaming / metrics (`server/backend/mlx.py`)

- `timings` (`predicted_n`, `predicted_ms`) on the final stream chunk for the llama.cpp Web UI's token/s display.
- `metrics` (`ttft_ms`, `total_tokens`, `tokens_per_second`, `total_latency_s`) for the Tiles CLI memory-mode benchmark line.
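The extra fields on the final SSE chunk can be sketched as below. The field names are the ones listed above; the arithmetic and function name are illustrative, not the backend's real code.

```python
def final_chunk_extras(prompt_ms: float, gen_ms: float, n_tokens: int) -> dict:
    """Build the `timings` and `metrics` objects attached to the last
    stream chunk (sketch; field names from the PR, math illustrative)."""
    gen_s = gen_ms / 1000.0
    return {
        # Consumed by the llama.cpp Web UI for its tok/s display.
        "timings": {"predicted_n": n_tokens, "predicted_ms": gen_ms},
        # Consumed by the Tiles CLI memory-mode benchmark line.
        "metrics": {
            "ttft_ms": prompt_ms,
            "total_tokens": n_tokens,
            "tokens_per_second": n_tokens / gen_s if gen_s else 0.0,
            "total_latency_s": (prompt_ms + gen_ms) / 1000.0,
        },
    }
```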
#### Schemas (`server/schemas.py`)

- `ChatCompletionRequest`/`ChatMessage` adjustments for the Web UI (optional `model`, multimodal `content` list, `extra="ignore"`, `max_tokens: -1` → unlimited).
- `RouterModelsLoadBody` for `/models/load`.
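The `max_tokens: -1` → unlimited adjustment boils down to one normalization step. This is a sketch of that rule in plain Python; in the PR it would live inside the pydantic schema, and the helper name is hypothetical.

```python
from typing import Optional


def normalize_max_tokens(value: Optional[int]) -> Optional[int]:
    """Map the Web UI's max_tokens convention onto the backend:
    -1 (or an absent value) means "no limit" (sketch of the schema rule)."""
    if value is None or value == -1:
        return None  # unlimited generation
    return value
```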
#### Tests

- `server/tests/test_openai_compat.py`, `server/tests/conftest.py`: cover health, props, models, chat streaming, errors, and `/models/load` with an env-provided path.
- `just test-server` runs `uv run --project server --with pytest pytest server/tests/` so pytest is not added to `uv.lock` (keeps the lockfile identical to `main`).

#### Dev script
- `scripts/phase2_llamacpp_webui.sh` + `just webui-llamacpp`: clone/patch the llama.cpp Web UI's Vite proxy to point at `TILES_BACKEND` (default `http://127.0.0.1:6969`), then `npm run dev`.

### How to test
### Configuration reference
- `TILES_SESSION_FILE`: override the session file location (default `~/.tiles/server_session.json`).
- `TILES_SKIP_SESSION_PERSIST=1`: disable session persistence (used in tests).
- `TILES_BOOTSTRAP_MODEL` / `TILES_BOOTSTRAP_MODEL_CACHE_PATH`: load a model at startup without a prior `POST /start`.
- `TILES_BOOTSTRAP_MEMORY_PATH` / `TILES_BOOTSTRAP_SYSTEM_PROMPT`: optional memory file and system prompt for the bootstrap load.
- `TILES_MODEL_CACHE_PATH`: MLX model directory used by `POST /models/load`.
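The cache-path precedence described in "What changed" (`TILES_MODEL_CACHE_PATH` first, then the bootstrap path) can be sketched as a one-liner; the helper name and its `env` parameter are illustrative, not the PR's API.

```python
import os
from typing import Mapping, Optional


def resolve_model_cache_path(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Pick the MLX model directory for POST /models/load (sketch):
    TILES_MODEL_CACHE_PATH wins, TILES_BOOTSTRAP_MODEL_CACHE_PATH is the
    fallback, None if neither is set."""
    e = os.environ if env is None else env
    return e.get("TILES_MODEL_CACHE_PATH") or e.get("TILES_BOOTSTRAP_MODEL_CACHE_PATH")
```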