Auto integration#286
Closed
easel wants to merge 386 commits into
Closed
Conversation
Update the integration manifest after merging the latest PR Luce-Org#274 head (adaptive anchor radius and PFLASH_COMPRESS env rename). Record a fresh PR Luce-Org#266 worktree conflict attempt and current blocked classifications.
Merge latest origin/main (d947c70) into the integration stack and record the current PR classification. PR Luce-Org#266 was attempted again in an isolated worktree and remains blocked pending selective harness/server porting; Codex/Claude delegated resolution is unavailable due auth.
…nch in-tree Collapses 134 commits of `integration/props-uv-squared-clean` onto current main as one reviewable change. Most of the underlying server-side work already landed via separate PRs: thinking-budget v2 + multi-dialect reasoning + sidecars (Luce-Org#269), dflash→server rename + optimizations/ grouping (Luce-Org#281), qwen35moe hybrid CPU/CUDA expert split (Luce-Org#262), and a stream of smaller fixes from bragi over the last week. What remained in integration is everything *above* the server: the host-side runner, the container image, the benchmark/profile evidence pipeline, the harness for driving real clients, and the luce-bench framework itself. ## What changed ### Docker + host wrapper - Dockerfile (CUDA 12.8 base; copies server/, lucebox/, harness/, luce-bench/ into one image; wires `python -m lucebench.cli` as the `benchmark` entrypoint subcommand). - `lucebox.sh` (~470 lines of host bash, zero deps beyond docker + nvidia-smi): `check`, `configure`, `pull`, `download-models`, `serve`, `install`/`start`/ `status`/`logs` (user-systemd), `print-run`, `benchmark`, `profile`. - `.github/workflows/docker.yml` builds + pushes `ghcr.io/luce-org/lucebox-hub` tags (`:cuda12`, `:vX.Y.Z-cuda12`, `:X.Y-cuda12`, `:sha-<short>-cuda12`). - `server/scripts/entrypoint.sh` resolves draft GGUF by target architecture (gemma4 → gemma drafter, qwen3.6 → dflash-draft-3.6); warns when multiple targets are present in models/. ### lucebox Python package (in-container CLI) - `lucebox/` workspace member: `cli.py`, `autotune.py` (VRAM-tiered tier selection + per-host config writeback), `config.py` (typed TOML), `download.py`, `docker_run.py`, `host_check.py`, `host_facts.py`, `profile.py` (profile sweep across DFLASH_MAX_CTX × DFLASH_BUDGET, KV cache types, pFlash modes, lazy-draft, prefix-cache slots), `smoke.py`, `types.py`. - `lucebox/tests/` for the typed surfaces. - Level1/Level2/Level3 profile gates; sweep results merged back into `~/.lucebox/config.toml` only after capability + ds4-eval/agentic-tools/ agentic-session validation gates pass. ### luce-bench in monorepo - `luce-bench/` workspace member at v0.2.4 — the standalone bench framework (areas: ds4-eval, code, longctx, agent, forge; sweep + per-host snapshot output; v0.2.4 includes the forge area's EvalConfig + run_scenario signature realignment). - `[tool.uv.sources] luce-bench = { workspace = true }` replaces the prior git tag pin. - `.github/workflows/release-luce-bench.yml` publishes to PyPI from the monorepo on `luce-bench-v*` tags (trusted publisher, `pypi` environment). ### harness workspace - `harness/` workspace member: client adapters (`claude_code`, `codex`, `opencode`, `hermes`, `pi`, `openclaw`), `client_test_runner.py`, `benchmarks/run_lucebox_vs_llamacpp.sh`, prompts. `lucebox profile` delegates the actual bench runs to harness. ### Bench + profile evidence - `server/docs/BENCHMARK_SNAPSHOT_SPEC.md` — schema for tuning/profile artifacts. Snapshots themselves live in the standalone `luce-bench-baselines` repo (out of this tree). ### Misc - Updated CI workflow path filters for `server/` (post-rename). - README's "Quick start" section, hardware coverage table, env var reference table; minor edits to optimizations READMEs. - model card sidecar updates landed alongside Luce-Org#269 but kept here at current values (qwen3.6, gemma-4-26b-a4b, gemma-4-31b, laguna-xs.2, `_schema.json`). ## Out of scope / follow-ups - 31b backend wiring beyond what `share/model_cards/gemma-4-31b-it.json` shipped (working empirically @ 24GB on sindri AR-only; 26b spec-decode path already proven). - gemma4 MoE expert split (howard0su's PR Luce-Org#262 territory; merged but not applied to gemma4 yet). - Multi-Token Prediction (upstream PR #23398, draft). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ck to </assistant>
…ness + luce-bench in-tree
Integrates PR Luce-Org#266 into the auto-integration stack over easel/auto-integration. Resolves server/ layout conflicts by keeping the current server tree, retaining existing harness adapters, and packaging the PR's metrics parser/session proxy/tests under harness/src/harness for uv workspace imports.\n\nVerification:\n- python3 -m py_compile harness/client_test_runner.py harness/clients/session_inject_proxy.py harness/src/harness/metrics_parser.py harness/src/harness/tests/*.py\n- uv run --extra dev --package harness pytest harness/src/harness/tests -q (56 passed)\n- git diff --check\n- conflict marker scan (no conflict markers)\n- C++ target build skipped: server/build missing in worktree
…d dep (v0.2.6) UX fixes to `lucebench`: * Preflight (liveness → /v1/models → /props) runs before any cases. Aborts with a clean one-line error if the server is unreachable — no more 92 timeouts on a typo'd URL. * New `smoke` area: three tiny prompts that complete in seconds. * `--areas` replaces `--area` / `--sweep`. Default: `smoke`. `--areas all` runs every stdlib area (old `--sweep`). `--areas ds4-eval,forge` runs a subset. Old flags still accepted with deprecation notice. * `anthropic` SDK promoted from `[forge]` extra to a regular runtime dep. The forge area is no longer a special install path; future Anthropic-shape tests (smoke / longctx hitting `/v1/messages`) need the SDK too. `[forge]` extra preserved as an empty no-op alias for back-compat. Bumps luce-bench to v0.2.6. Drops `luce-bench[forge]` suffix from lucebox-hub root + harness deps — plain `luce-bench` resolves the SDK now. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before this, the version (`[lucebench] v0.2.6 ...`) was only on the area-header line which appears AFTER preflight + model resolution. That made stale-cache debugging painful — by the time the user sees which version is actually running, they've already scanned past four lines of output. New first line: `[lucebench] v0.2.6`. Stale uvx caches surface at a glance. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Port PR Luce-Org#62 onto the server/ layout in auto-integration.\n\nThe original PR split target/draft transient StepGraph cleanup to avoid gallocr reallocation churn after daemon resets. Current auto-integration already has a separate draft_sg, so apply the reset/migration cleanup and add the regression test under server/test.
…ea headers The new first-line banner (b46ed55) already shows `[lucebench] v0.2.6` at the top of every run. The sweep header and area header were each prefixing their lines with `v{__version__} ...` too — two copies of the version in any single-area run, three in a sweep. Banner is the single source. Sweep / area headers now lead with their actual context (`sweep name=...`, `area=...`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace hardcoded `version = "X.Y.Z"` strings with a `dynamic = ["version"]` declaration backed by hatch-vcs. The release version is now computed from the matching git tag (`luce-bench-v*` or `lucebox-v*`) at build time, with a build hook writing `src/<pkg>/_version.py` so `__init__.py` can re-export `__version__`. Untagged checkouts resolve to a `*.dev0` fallback, and commits past a tag get a `.devN+g<sha>` suffix so dev installs are visibly distinct from releases. Single source of truth for each package's version is now the git tag — drops the tag-vs-pyproject assertion step from the luce-bench release workflow (now redundant: hatch-vcs derives the version from the tag itself, so they can't disagree). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Quick start leads with the smoke default (no Status section gating it) - Document the `smoke` area (3-case sanity check, runs by default) - Add OpenRouter smoke recipe - Add uvx-from-branch recipe for pre-release validation - Note the `[lucebench] vX.Y.Z` version banner - Replace deprecated `--sweep` with `--areas all` throughout - Drop the stale "Sister of luce-dflash" framing; point Contributing at the lucebox-hub monorepo instead of the soon-to-be-archived easel/luce-bench repo Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The preflight grid now renders the full model list on the `/v1/models` line, with a `*` prefix on the model that will actually be sent. Default selection is the first model; an explicit `--model X` that matches a listed id moves the `*`. When the list overflows the 80-char preflight width, layout is: the first model, then the selected model (if different), then sequential fillers until the budget is hit, ending in `… (+N more)`. Long lists (e.g. OpenRouter) stay readable instead of dumping hundreds of ids. Examples: /v1/models ✓ *deepseek-v4-flash, deepseek-v4-pro /v1/models ✓ qwen/qwen3.6-27b, *anthropic/claude-opus-4-7, … (+2 more) Collapses the now-redundant post-preflight resolution print loop into a single `--model default → 'X'` line; the preflight grid already shows what was on offer and which was picked. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User-facing preflight line and surrounding docstrings/comments now call the server 'lucebox' instead of 'dflash'. Examples: /props ✓ absent (non-lucebox server) — skipped Scope is intentionally narrow (CLI module only): DFLASH_* env vars, the DflashRuntime class + cfg.dflash TOML key, and binary names stay as-is — those are interface renames for a separate change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…decode split, token-count breakdown) When the server doesn't surface `usage.timings.decode_tokens_per_sec` (e.g. OpenRouter, vLLM), fall back to `completion_tokens / wall_seconds` and mark it with a trailing `*` so the "prefill rolled in" caveat stays visible. When lucebox-server provides `prefill_ms` + `decode_ms`, show them as `prefill=210ms decode=3.5s` instead of just the wall time. Capture `usage.completion_tokens_details.reasoning_tokens` (OpenAI/OR shape) plus the deprecated top-level `usage.reasoning_tokens` so the per-case line can render `in=42 think=120 out=8` — no tokenizer dependency, just what the server reports. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Docker image versioning now mirrors the Python packages' hatch-vcs
scheme: `lucebox-v<X.Y.Z>` git tags are the single source of truth.
docker-bake.hcl
Accept a `VERSION` env var. When set, `cuda12-local` emits both the
moving `lucebox-hub:cuda12` and the pinned `lucebox-hub:<version>-cuda12`.
Unset → just the moving tag (zero-config local builds still work).
Legacy `TAG=` override still honored for back-compat.
scripts/build_image.sh
Thin wrapper that runs `git describe --tags --match 'lucebox-v*'`
(same regex hatch-vcs uses), strips the `lucebox-v` prefix, exports
VERSION, and delegates to `docker buildx bake cuda12-local`. Handles
dirty trees + untagged checkouts gracefully.
.github/workflows/docker.yml
- Bare `cuda12` tag (was release-only) now also fires on pushes to
main and on workflow_dispatch with push:true. Branch builds + PRs
still get only branch-suffixed tags so they can't clobber `:cuda12`.
- New `type=match,pattern=lucebox-v(\d+\.\d+\.\d+)` rule auto-emits
`<version>-cuda12` when a `lucebox-v*` tag is pushed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merge latest origin/main README/GPU asset updates into the integration stack and record this run's PR 237 conflict-resolution attempts.\n\nVerification: git diff --check; PR 237 merge attempted in isolated worktree with Claude Code and Codex assistance, conflicts retained for manual inspection.
The previous `$(git rev-parse --show-toplevel)/..` resolved to the parent directory of the repo, so `docker buildx bake` failed with "couldn't find a bake definition". Strip the stray `/..` and add a proper fallback for when the script is run outside a git checkout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ble-fire `a || b && c` left-associates: when `a` (git rev-parse) succeeds, the short-circuit skips `b` but `&& c` still runs, printing pwd a second time and corrupting the captured path. Wrap the fallback in a subshell so `&& pwd` only fires when git can't resolve the repo root. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Record the 2026-05-28 20:23 cron revalidation: upstream and carried PR heads remain current, draft Luce-Org#304 is excluded, fresh conflicted-PR probes were retained, and a tmux-driven Codex inspection keeps Luce-Org#135 as a designed current-layout port instead of a mechanical merge.
Record the 2026-05-28 21:23 unattended refresh, including fresh open-PR fetches, ancestor checks, and conflict-probe worktrees.
Revalidates open PR classification after fetching origin/easel, records fresh isolated conflict probes for the unchanged blocked PR set, and updates validation details for the current unattended run.
Record the latest open PR classification, note that Luce-Org#297 is non-draft and already carried, and capture refreshed conflict probes plus Luce-Org#237 delegated feasibility results.
The post-push PR recheck showed Luce-Org#297 is draft again, so keep it as an already-carried draft dependency rather than a current non-draft target.
Revalidate open PR classifications and retained probe worktrees for the 2026-05-28 22:30 cron run. No source stack rewrite was needed.
Revalidate open PR heads against origin/main and easel/auto-integration. Record repeated conflict probes for remaining non-draft PRs and the tmux-driven Luce-Org#237 Codex salvage-port report.
Record the 2026-05-28 23:14 cron pass, repeated conflict probes, and the Codex salvage assessment for PR Luce-Org#221. No source stack rewrite was needed because origin/main and carried non-draft PR heads were already current.
Record the 2026-05-28 23:28 cron pass, repeated conflict probes, and the Codex salvage assessment for PR Luce-Org#135. No source stack rewrite was needed because origin/main and carried non-draft PR heads were already current.
Record the 2026-05-29 cron preflight, open PR classification, repeated conflicted merge probes, and retained worktree paths. No source stack rewrite was needed because origin/main and all carried mergeable non-draft PR heads were already included.
Note the overlapping manifest refresh observed during the cron run and correct the recorded parent commit for this manifest-only update.
Record the 2026-05-29 cron preflight, current PR classification, repeated conflicted merge probes, and fresh delegated feasibility results for the remaining selective-port candidates.
Record the 2026-05-29 00:41 cron preflight, open PR classification, repeat worktree merge probes, and fresh delegated PR Luce-Org#135 attempts. No source stack rewrite was needed because origin/main and carried mergeable PR heads were already included.
Record the 2026-05-29 01:00 EDT unattended integration run, fresh merge probes, and Codex feasibility output for PR Luce-Org#135.
…nch in-tree Squashes 78 commits from feat/lucebox-docker (PR Luce-Org#285) onto origin/main. Net: 189 files changed. Major workstreams folded in: * Docker prebuild stack: ghcr.io/easel/lucebox-hub:cuda12 image, multi-stage Dockerfile, docker-bake.hcl, .github/workflows/docker.yml with GHA cache, build identity baked into /opt/lucebox-hub/IMAGE_INFO + /opt/lucebox-hub/HOST_INFO. * Host wrapper (lucebox.sh): probe_host, smart cmd_serve (INVOCATION_ID guard, container-state preflight), cmd_systemctl_passthrough (already- active short-circuit, restart-loop detection), cmd_update (bootstrap- installer pattern), cmd_completion (bash/zsh/fish), config.toml reader (env > toml > default precedence), shellcheck-clean. * Bootstrap installer (install.sh): bakes LUCEBOX_INSTALLED_FROM into the installed copy so lucebox update keeps tracking the channel; refuses SHA-pinned URLs without LUCEBOX_INSTALL_CHANNEL. * In-container Python CLI (lucebox/): sparse config.toml persistence, config get/set/unset sub-app, models list/download sub-app (replaces download-models), autotune with --apply / --json / --sweep, profile collapsed onto luce-bench snapshot (1701 → 183 lines). * luce-bench: snapshot subcommand + canonical HostInfo schema v2 + levels (level0/1/2/3) + report subcommand + submit-baseline + regrade. * Server (C++): /props.host block + props_schema=4 + host_info read at startup, /props.build identity, GGUF metadata + sha256 sidecars, model card sidecars. * Harness: client implementations for claude/codex/opencode/hermes/pi. * Strict 11-field config.toml allowlist for dflash.* runtime tunables. Deleted (rolled into new structure): * server/scripts/bench_agent.py, bench_he.py, bench_llm.py — replaced by luce-bench snapshot + areas. * lucebox configure, lucebox download-models, lucebox benchmark — replaced by config sub-app, models sub-app, autotune --sweep. * luce-bench --sweep flag — moved to argv-sniff subcommand dispatch. Conflict resolution: * server/scripts/bench_{agent,he,llm}.py — modify/delete kept the deletion (feat/lucebox-docker moved bench machinery into luce-bench). * README.md — took feat-branch version. origin/main had 19 commits worth of minor README tweaks since the branch base; those need to be folded back in as a follow-up PR. * docs/specs/openapi-props.yaml + docs/specs/props-endpoint.md — took feat-branch version. origin/main had 1 link-fix commit; feat-branch has the schema-4 + host-block additions that strictly supersede. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`_load_or_build()` returned `config_mod.load()`'s result verbatim when config.toml existed, ignoring `LUCEBOX_*` env vars entirely. That contradicted the precedence lucebox.sh documents (env > toml > default) and bit sindri in production: its config.toml had `[image]` without a `registry` line, so the dataclass default `ghcr.io/luce-org/lucebox-hub` beat the systemd unit's `Environment=LUCEBOX_IMAGE=ghcr.io/easel/...`. Symptom: `lucebox start` brought up the wrong (stale luce-org) image even after explicit `lucebox install` + `lucebox pull` against easel. Fix: overlay env on top of whatever `load()` returns (or `live_config()` falls back to). Only the five top-level scalars have env hooks (LUCEBOX_VARIANT/IMAGE/PORT/CONTAINER/MODELS) — dflash/host/model intentionally don't. Adds two regression tests: - env beats config.toml when toml has no explicit value for that key, - env still wins when toml is absent (covers the live_config fallback). 102 lucebox tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…g#285 CI CI's "Lint Python surfaces touched by lucebox tooling" job ran `ruff check .` and found 11 errors across surfaces this branch touches. Ruff --fix handled 6 (import sorting, unused imports); 5 needed hand-edits: luce-bench/src/lucebench/report.py:172 E741 rename `for l in` → `for lineup in` lucebox/tests/test_check.py:39, 95 E731 lambda → def stub() for the two HostFacts stubs lucebox/tests/test_cli.py:95 E501 wrap the LUCEBOX_HOST_GPU_LIST_CSV setenv lucebox/tests/test_sweep.py:174, 177 E501 wrap two CellResult constructors 22 lucebox tests touched still pass; ruff is clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Record the 2026-05-29 01:18 cron refresh, repeated conflict probes, and the fresh tmux-driven PR Luce-Org#237 Claude/Codex delegation results.
Merge PR Luce-Org#285 after it changed from draft to open during the cron run. Resolve refreshed Docker/lucebox/luce-bench conflicts by taking the PR head for feature files while preserving the server include required by the existing integration stack.\n\nValidation:\n- git diff --check\n- python3 -m compileall -q lucebox/src lucebox/tests luce-bench/src luce-bench/tests harness/src\n- uv run --with pytest python -m pytest lucebox/tests luce-bench/tests/test_report.py luce-bench/tests/test_smoke_area.py luce-bench/tests/test_runner.py -q
Keep the primary checkout clean after integrating PR Luce-Org#285 by ignoring the generated .docker-build/ CMake scratch directory. Update the auto-integration manifest with the final PR Luce-Org#285 merge and validation details.
easel
pushed a commit
to easel/lucebox-hub
that referenced
this pull request
May 29, 2026
Record that PR Luce-Org#286 is closed and report integration branch status directly from easel/auto-integration.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.