refactor: extract mistralrs-serve crate, add lazy loading, model discovery, and UI improvements #2066

Draft
camalolo wants to merge 17 commits into EricLBuehler:master from camalolo:webui-improvements

@camalolo camalolo commented Apr 6, 2026

Summary

  • Extract mistralrs-serve crate: Move HTTP server logic, web UI, and static assets out of mistralrs-cli into a dedicated library crate. The CLI now delegates to mistralrs_serve::run_server() via plain Rust config structs (no clap/serde in the public API).
  • Idle timeout + pending models: Add configurable idle auto-unload (--idle-timeout-secs) and Pending model status. Models show as pending until first request triggers lazy loading.
  • Model discovery + watcher: Add filesystem-based model scanning (--models-dir) with notify-based hot-reload watcher. New models appear automatically without restart.
  • UI redesign + disk override: Redesigned web UI with dark theme, split-pane layout, model selector with status badges, and disk-based UI override (<exe_dir>/ui/) for live customization without recompilation.
  • Model list + status + refresh: /v1/models returns all discovered models with pending/loaded status. GET /ui/api/refresh_models triggers on-demand rescan.

Test plan

  • cargo check --workspace passes on every commit
  • cargo build --release -p mistralrs-cli --features cuda succeeds
  • Service runs via NSSM with mistralrs serve --ui --idle-timeout-secs 1800 --models-dir <dir> -p 1234
  • GET /v1/models returns discovered models with status
  • POST /v1/chat/completions triggers lazy load and model transitions to loaded
  • POST /ui/api/stop returns 200
  • GET /ui/api/refresh_models triggers rescan
  • Web UI loads and renders correctly at /
  • Rebased onto latest origin/master (includes upstream tool dispatch + docs changes)

camalolo added 10 commits April 7, 2026 00:25
Move HTTP server logic, web UI, and static assets from mistralrs-cli
into a new mistralrs-serve library crate. This isolates UI compilation
from the CLI binary which depends on the large mistralrs-core crate.

- Add mistralrs-serve crate with config structs, serve function, and UI
- mistralrs-cli now delegates to mistralrs_serve::run_server()
- Helper functions (convert_to_model_selected, extract_*_settings) stay
  in CLI as pub(crate) since they're shared with run/tune/config commands
- Update deploy.ps1 to remove UI file copy step (now embedded via include_dir)
- Add comfy-table to workspace dependencies

Add automatic model unloading after a configurable idle period, and
support for pending models that can be lazily loaded on first request.

- Add idle_timeout_secs to EngineConfig and MistralRsBuilder
- Implement PendingModelRequest, register_pending_model, do_load_pending_model
- Add build_empty() to mistralrs-server-core for lazy initialization
- Wire idle timeout and pending model loading through the server
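
The lazy-load and idle-unload lifecycle described in this commit can be sketched as a small state machine. This is a minimal illustration only, not the code in this PR: `ModelSlot`, `on_request`, and `unload_if_idle` are hypothetical names, and the assumption that an idle model moves to an `Unloaded` state (rather than back to `Pending`) is mine.

```rust
use std::time::{Duration, Instant};

/// Hypothetical status values, modelled on the statuses the PR mentions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ModelStatus {
    Pending,
    Loaded,
    Unloaded,
}

/// Hypothetical bookkeeping for a single model.
struct ModelSlot {
    status: ModelStatus,
    last_used: Instant,
}

impl ModelSlot {
    /// A newly discovered model starts out pending (nothing is loaded yet).
    fn new_pending(now: Instant) -> Self {
        Self { status: ModelStatus::Pending, last_used: now }
    }

    /// Called for every request: load on first use, then reset the idle clock.
    fn on_request(&mut self, now: Instant) {
        if self.status != ModelStatus::Loaded {
            // A real server would load the weights here.
            self.status = ModelStatus::Loaded;
        }
        self.last_used = now;
    }

    /// Called periodically: unload a loaded model that has been idle for at
    /// least `idle_timeout`. Returns true if the model was unloaded.
    fn unload_if_idle(&mut self, now: Instant, idle_timeout: Duration) -> bool {
        let idle = now.saturating_duration_since(self.last_used);
        if self.status == ModelStatus::Loaded && idle >= idle_timeout {
            self.status = ModelStatus::Unloaded;
            true
        } else {
            false
        }
    }
}
```

Passing `now` in explicitly, instead of calling `Instant::now()` inside the methods, keeps the timeout logic testable without sleeping.
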
Automatically scan configured model directories for GGUF/Safetensors files,
watch for filesystem changes, and lazily load models on first request.

- Add model_scanner.rs and model_watcher.rs for filesystem-based discovery
- Implement run_server_lazy() for deferred model loading
- Add register_pending_model() to MistralRs for lazy loading support
- Pass paged attention and ISQ config to pending models
- Fix obsolete tool_callbacks_with_tools field references
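
To illustrate the scanning half of model discovery, here is a minimal standard-library sketch. It is not the contents of `model_scanner.rs`: the function names are hypothetical, and it assumes a non-recursive scan that matches files by extension only. The actual scanner may recurse or treat a Safetensors model as a directory.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// True if the path has a `.gguf` or `.safetensors` extension (case-insensitive).
fn is_model_file(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| {
            let ext = ext.to_ascii_lowercase();
            ext == "gguf" || ext == "safetensors"
        })
        .unwrap_or(false)
}

/// List model files directly inside `dir`, sorted for stable output.
/// Assumption: subdirectories are not searched.
fn scan_models(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_file() && is_model_file(&path) {
            found.push(path);
        }
    }
    found.sort();
    Ok(found)
}
```

A watcher would typically re-run a scan like this when the filesystem reports a change and register any new paths as pending models.
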
Complete UI overhaul with modern design and deployment flexibility.

- Redesign web UI with improved chat interface, syntax highlighting, and Enter key handling
- Add disk-based UI override: serve static files from a configurable UI directory
- Add stop generation endpoint for interrupting in-progress responses
- Add per-chat delete functionality
- Add straggler message cleanup on reconnection
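
The disk-based UI override amounts to a "disk first, embedded fallback" lookup. The sketch below shows one way that decision could look; `AssetSource` and `resolve_asset` are hypothetical names, and the path-traversal check is an assumption about what a safe implementation needs rather than something this PR describes.

```rust
use std::path::{Component, Path, PathBuf};

/// Where a requested static asset should be served from.
#[derive(Debug, PartialEq, Eq)]
enum AssetSource {
    /// A file found in the on-disk UI directory.
    Disk(PathBuf),
    /// Fall back to the assets compiled into the binary.
    Embedded,
}

/// Prefer a file under `ui_dir`; otherwise use the embedded copy.
fn resolve_asset(ui_dir: &Path, rel: &str) -> AssetSource {
    let rel_path = Path::new(rel);
    // Accept only plain path segments, so `..` and absolute paths
    // cannot escape the UI directory.
    let is_safe = rel_path
        .components()
        .all(|c| matches!(c, Component::Normal(_)));
    if is_safe {
        let candidate = ui_dir.join(rel_path);
        if candidate.is_file() {
            return AssetSource::Disk(candidate);
        }
    }
    AssetSource::Embedded
}
```

With the layout from the summary, `ui_dir` would be the `ui/` directory next to the executable, for example derived from `std::env::current_exe()`.
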
Display discovered models in the sidebar with real-time status indicators.

- Show all models (loaded, unloaded, pending) in the model list
- Add loading indicator for pending models being lazily loaded
- Add refresh_models endpoint to rescan model directories
- Fix paged attention config passing for pending models
- Remove duplicate Pending match arm in get_model_status handler

build.bat sets up the MSVC toolchain and invokes cargo with CUDA features.
nvcc_wrapper.bat injects --compiler-options /Zc:preprocessor to work around
CCCL's fatal error with MSVC's traditional preprocessor (required for
CUDA 13.2 + VS 2022 v1950).

The disk-based UI override (ui/ directory next to the executable) was
undocumented. Added notes in CLI.md, GETTING_STARTED.md, and CLI_CONFIG.md.

Add --models-dir and --idle-timeout-secs to CLI.md, CLI_CONFIG.md, and
GETTING_STARTED.md with usage examples and a dedicated section explaining
the auto-discovery and lazy loading workflow.

Add the `pending` model status value to HTTP.md and multi_model/overview.md,
and document the idle timeout auto-unload behavior in both files.

@camalolo camalolo marked this pull request as draft April 7, 2026 07:29
camalolo added 7 commits April 7, 2026 16:55
Document the CCCL + MSVC preprocessor issue on Windows with CUDA 13.2
and the nvcc wrapper approach as the proper fix. Add missing crates to
repository structure. Ignore local build tooling (build.bat, nvcc_wrapper.bat,
deploy.ps1) in .gitignore.

Assisted-by: GLM-5 Turbo via Crush <crush@charm.land>

Correct service manager from Servy to NSSM, add UAC elevation commands,
fix UI URL note (no trailing slash), and remove stale log/config references.

💕 Generated with Crush

Assisted-by: GLM-5 Turbo via Crush <crush@charm.land>

Complete UI overhaul: sidebar layout, theme toggle, syntax highlighting,
HTML sanitization, lazy model loading via refresh_models API endpoint
returning full model objects with status and generation defaults.

💕 Generated with Crush

Assisted-by: GLM-5 Turbo via Crush <crush@charm.land>

…n status

- Add try/catch to initApp() with visible error banner and retry logic
- Add exponential backoff WebSocket reconnection (1s-30s, 10 max attempts, jitter)
- Add connection status indicator (connected/reconnecting/disconnected) in topbar
- Add res.ok check in refreshModels() before parsing JSON
- Fix findBlankChat() N+1 requests by including message_count in list_chats API
- Click-to-arm delete: first click shows red ✓, second click confirms
- Auto-reverts after 3s if not confirmed
- Applies to both per-chat X buttons and sidebar Delete button
- Fix chat list overflow eating into Generation Settings
- Start Generation Settings collapsed by default
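
The reconnection schedule above (1s to 30s, at most 10 attempts, with jitter) can be written as a pure function. The sketch below is in Rust for consistency with the rest of the repository; it is not the web UI's JavaScript. The doubling factor and the "add up to 25%" jitter scheme are assumptions, since the commit message gives only the bounds.

```rust
use std::time::Duration;

const BASE_DELAY_MS: u64 = 1_000; // first retry after 1s
const MAX_DELAY_MS: u64 = 30_000; // never wait longer than 30s
const MAX_ATTEMPTS: u32 = 10; // give up after 10 attempts

/// Delay before reconnect attempt `attempt` (0-based), or `None` once the
/// attempt budget is exhausted. `jitter` is a caller-supplied random value
/// in [0.0, 1.0]; taking it as a parameter keeps the function deterministic.
fn reconnect_delay(attempt: u32, jitter: f64) -> Option<Duration> {
    if attempt >= MAX_ATTEMPTS {
        return None;
    }
    // Double the delay on every attempt, then cap it.
    let exponential = BASE_DELAY_MS.saturating_mul(1u64 << attempt.min(20));
    let capped = exponential.min(MAX_DELAY_MS);
    // Assumed jitter scheme: add up to 25% of the delay, then cap again.
    let jittered = capped as f64 * (1.0 + 0.25 * jitter.clamp(0.0, 1.0));
    Some(Duration::from_millis((jittered as u64).min(MAX_DELAY_MS)))
}
```

Jitter spreads reconnect attempts from many clients so they do not all hit the server at the same moment after an outage.
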
…t disconnects

- content.rs: propagate unknown architecture errors via ? instead of .unwrap()
- utils/mod.rs: replace 4 .unwrap() calls in handle_pipeline_forward_error! macro with graceful error handling (log warning on send failure or cache eviction failure)
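
The `content.rs` change follows a common Rust pattern: return the error to the caller with `?` instead of panicking with `.unwrap()`. The example below shows the pattern in isolation; every type and function in it is hypothetical and none of it is taken from the mistral.rs code.

```rust
use std::fmt;

/// Hypothetical error for an unrecognised architecture name.
#[derive(Debug, PartialEq, Eq)]
struct UnknownArchitecture(String);

impl fmt::Display for UnknownArchitecture {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unknown architecture: {}", self.0)
    }
}

impl std::error::Error for UnknownArchitecture {}

/// Hypothetical set of supported architectures.
#[derive(Debug, PartialEq, Eq)]
enum Architecture {
    Llama,
    Mistral,
}

fn parse_architecture(name: &str) -> Result<Architecture, UnknownArchitecture> {
    match name {
        "llama" => Ok(Architecture::Llama),
        "mistral" => Ok(Architecture::Mistral),
        other => Err(UnknownArchitecture(other.to_string())),
    }
}

/// With `.unwrap()`, an unknown name would panic here.
/// With `?`, the error is handed back to the caller instead.
fn describe(name: &str) -> Result<String, UnknownArchitecture> {
    let arch = parse_architecture(name)?;
    Ok(format!("architecture: {:?}", arch))
}
```

A request handler that calls a function like `describe` can then turn the error into an error response, rather than letting a panic take down the worker.
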