feat: unified ModelRegistry keyed on (backend, provider) by beardedeagle · Pull Request #34 · beardedeagle/MonkeyClaw

beardedeagle · 2026-04-05T23:31:57Z

Summary

Drop-and-replace cutover of ModelRegistry from provider-keyed to (backend, provider)-keyed
Adds Baseline runtime-config loader for cold-start model availability
Adds probe tick handler with per-backend scheduling, in-flight dedup, and exponential backoff
Adds authenticated session hook for fresh model lists on every session start
Adds ETS heir for crash-restart continuity
Redacts Provider log sites through Vault.SecretScanner (spec I8)
Wires vault_live.ex stubs to real ModelRegistry API
Zero references to the old list_models/1 / list_all_models/0 API remain (grep-verified)

Spec: docs/superpowers/specs/2026-04-05-list-models-per-backend-design.md
Plan: docs/superpowers/plans/2026-04-05-list-models-per-backend.md

Test plan

Unit tests for CachedModel changeset (required fields, length caps, charset, embed list cap, per-model validations)
Unit tests for Baseline (config read, structural validation, log on drop)
Integration tests for ModelRegistry (boot, upsert funnel, precedence, read projections, tick, probe results, refresh, configure)
Integration tests for Session hook (authenticated cast, unregistered pid rejection)
Integration test for Provider log redaction
E2E test for full tree boot + crash-restart continuity via ETS heir
Zero-reference grep audit passes
All five quality gates clean (compile, format, credo --strict, dialyzer, 1945 tests/0 failures)

Spec covers unified (backend, provider) model registry with three-writer funnel (baseline, probe, session) and ETS-first reads. Plan decomposes implementation into 23 bite-sized TDD tasks enforcing a full drop-and- replace cutover (greenfield, zero legacy code preserved).

Drops the provider-keyed single-model-per-row table and creates the (backend, provider)-keyed shape with embedded JSON model list, source tag, and monotonic tiebreaker column. Greenfield cutover, zero data preserved. Unique index on (backend, provider) plus individual indexes on backend and provider for the by-backend and by-provider read paths.

Replaces the old flat model row with a (backend, provider)-keyed row holding an embedded list of models. Adds source tag and refreshed_mono columns for write arbitration. Deletes the old model_registry_test.exs (exercised the old API); replaced by cached_model_test.exs here and model_registry_test.exs rewritten in subsequent tasks. Changeset stub only — full validation chain lands in task 3. Also replaces lib/monkey_claw/model_registry.ex with a minimal cutover stub that preserves the public API surface consumed by vault_live.ex (list_all_models/0, refresh_all/0) so the project compiles through Tasks 3-10. Task 11 deletes the stub and writes the real GenServer.

Renames "has no legacy top-level model fields" to "rejects top-level model fields from the prior shape". The CLAUDE.md project standard forbids the word "legacy" in code, comments, and docs — this is a greenfield project with zero users, nothing is legacy. The banned word slipped in via the plan's verbatim test text; fixed in-place so future Task 3 changeset validation tests don't pattern-copy the violation.

@type

Replaces the Task 2 cast-only stub with full validate_required on all five top-level fields (backend, provider, source, refreshed_at, refreshed_mono), required: true on the embedded models list, and validate_required(:model_id, :display_name) in model_changeset/2. Also fixes a pre-existing dialyzer unknown_type error in the @type t definition where Model.t() was unresolved — corrected to __MODULE__.Model.t() to reference the inline embedded schema module. 10 tests, 0 failures (2 schema shape + 8 required-field).

Add trust-boundary invariants from spec §Schema to CachedModel.changeset/2: - validate_format on :backend and :provider enforcing ^[a-z][a-z0-9_]*$ - validate_length on :backend and :provider capping at 64 bytes - validate_inclusion on :source restricting to baseline/probe/session Exposes allowed_sources/0 as a typed public helper so callers can enumerate valid source values without duplicating the enum. Module attributes @identifier_pattern, @max_identifier_length, and @allowed_sources centralise the constraint values. Tests: 10 new value-constraint tests (6 rejection, 3 source acceptance, 1 empty backend) added to a dedicated describe block using @constraint_attrs to avoid redefining the existing @valid_attrs module attribute.

Caps models list at 500, enforces 256-byte length and UTF-8 charset on model_id/display_name, requires allowed punctuation pattern, caps capabilities serialized size at 8 KiB. Closes out the trust-boundary invariant set from spec section Schema.

The existing "accepts model_id with unicode letters and allowed punctuation" test actually used ASCII only ("claude-sonnet-4.5:preview"), which exercised the :.- punctuation branch of @model_field_pattern but not the \p{L} (unicode letter) category. A regression that narrowed the pattern to [a-zA-Z0-9...] would have gone undetected. Splits into two tests: - "accepts ASCII allowed punctuation" — documents the punctuation intent - "accepts non-ASCII unicode letters" — uses Japanese (日本) + Greek (α) + Cyrillic (модель) to actually exercise \p{L} Keeps the trust-boundary spec §Schema promise that non-ASCII model identifiers are supported.

@type

- Add test covering the non-map fallback clause of valid_entry?/1 (strings, tuples, atoms, nil). Previously the clause was implemented but had zero test coverage. - Add comment explaining @type entry is deliberately loose at this layer because CachedModel.changeset/2 is the real trust boundary. Deferred: load!/0 bang convention concern. Plan explicitly dictates {:ok, _} return shape for parallelism with other registry calls, and future Tasks 11/13/14 will cascade from this name.

New callback takes opts map (not session pid) so the registry can call it without a live session. Each model_attrs includes :provider so multi-provider backends can fan out per spec D2/D3. Adapters implemented in follow-up commits — compile will be red until then.

Mirrors the pattern from baseline.ex:33 — the capabilities field is deliberately `map()` at the adapter boundary because CachedModel.changeset/2 is the real trust boundary that validates and caps capability content. Adding this comment prevents a future maintainer from tightening the type without understanding the layering. Follow-up to review feedback on 3f141e9. The separate concern — name collision with ModelRegistry.Provider.model_attrs — is deferred to Task 14 where the upsert funnel will reconcile both shapes.

Fix two gaps flagged in Task 9 code review: - Add `assert models != []` so the test fails if a regression reduces the default canned list to an empty list (Enum.all?/2 returns true on [], which would otherwise silently accept the regression). - Extend the match pattern to include `:capabilities`, covering all four required fields of Backend.model_attrs() so the test enforces the full contract instead of three of four fields. Follow-up to 8d27efa. The related Low-priority note about timing the deadline short-circuit is deferred to Task 16 where probe latency handling is exercised end-to-end.

…reads Full drop-and-replace of the old provider-keyed GenServer. New state struct per spec (backends, intervals, in-flight, backoff), new read API (list_for_backend, list_for_provider, list_all_by_*). Upsert funnel, probe logic, and supervision integration land in follow-up commits. Existing vault_live.ex consumer is still on the old API and will be cut over in a later task.

EtsHeir owns the :monkey_claw_model_registry ETS table with itself as heir and gives ownership to ModelRegistry on start via give_away/3. On registry crash, ownership returns to the heir; supervisor restarts the registry, which re-claims the table. Reads observe continuity across the restart.

- Test now exercises full crash-and-reclaim round-trip (I-1) using :permanent restart and ETS owner assertion - Log ETS-TRANSFER events on both heir and registry sides (I-2) - EtsHeir.init/1 verifies ownership on re-init (I-3) - EtsHeir.handle_call({:claim,_}) ownership pre-check (M-1) - @claim_timeout_ms module attribute for the claim wait (M-2) - Comment on shared :start_model_registry lifecycle flag (M-3) - EtsHeir catch-all handle_info logs Logger.debug (M-4) - Add @SPEC to EtsHeir.init/1 (M-5)

Init claims ETS via heir, loads rows from SQLite, seeds any baseline entries not already present (warm-start delta). On SQLite failure, falls back to baseline-only ETS and enters degraded state.

- load_sqlite_rows rescue narrowed to environmental DB failures only (I-1, house rule compliance) - seed_baseline_delta returns :ok; caller keeps state reference (I-3) - Remove dead Sandbox.allow calls from boot sequence tests; DataCase provides shared-mode (I-4)

Single write funnel for all three writers (baseline/probe/session). Validates every write through CachedModel.changeset, drops invalid entries with a log, runs the valid set in one transaction with (refreshed_at, refreshed_mono) precedence, and updates ETS for winning rows only. Multi-provider backends fan out naturally because callers pass one write per (backend, provider) group.

…anups Stage-2 review fixes for Task 14 (b8f98ce). IMPORTANT: Move :ets.insert out of Repo.transaction closure. ETS is not transactional — if a later row raised mid-batch, SQLite would roll back while ETS kept the phantom rows. ETS now only writes after the transaction commits, guaranteeing ETS rows always correspond to SQLite rows. MINOR: Narrow @SPEC upsert/1 to {:ok, [CachedModel.t()]} — the {:error, _} branch was unreachable since apply_upserts never calls Repo.rollback. Strict {:ok, applied} match now documents actual control flow. MINOR: Simplify validate_writes/1 — direct destructure instead of then/2 indirection. MINOR: Use Map.get/2 in validate_writes error log — survives malformed write maps without a second crash point. MINOR: Replace System.monotonic_time() + 1 in upsert test with Process.sleep/1 + re-sample. Monotonic integers are opaque per Erlang docs; arithmetic on them is discouraged.

…light dedup First tick fires at startup_delay_ms (default 5s), subsequent ticks at default_interval_ms. Each tick iterates configured backends, skipping those with an in-flight probe task or that aren't due per their personal interval. Probe tasks dispatch via MonkeyClaw.TaskSupervisor with async_nolink. Task result handling lands in the next task.

The State struct declared startup_delay_ms: 5_000 as a literal default, duplicating the @startup_delay_ms_default 5_000 module attribute. Since init/1 always sets startup_delay_ms via Keyword.get(opts, :startup_delay_ms, @startup_delay_ms_default), the literal in defstruct was dead and a DRY violation. Promote startup_delay_ms to @enforce_keys so the State invariant is enforced structurally, and drop the duplicate default. init/1 remains the single source of truth for the default value.

…koff Successful probe results group by provider, fan into per-(backend, provider) writes, and hit the upsert funnel. Reset backoff and mark the backend as probed on success. Error results and task crashes trigger exponential backoff (5s initial, doubling to 5m cap) via apply_backoff/2, which bumps last_probe_at forward so the next tick skips the backend for the backoff duration. Registry stays alive on any backend failure — stale cache preserved. Process.demonitor(ref, [:flush]) on successful task results prevents the trailing :DOWN message from being mishandled. Also fix a pre-existing race in the ETS heir crash test: the assertion now uses wait_for_ets_owner/3 to poll until give_away completes before asserting ownership, eliminating the flake.

Address four IMPORTANT findings from Stage-2 code quality review of the ModelRegistry probe result handling: - Add catch-all clause to handle_probe_result/3 so malformed adapter results log and apply backoff instead of crashing the GenServer with FunctionClauseError. Runtime-injected backends are a trust boundary per CLAUDE.md. - Add explicit {:ok, []} clause that short-circuits empty probe results without opening a pointless SQLite transaction. Makes "empty = healthy" semantics visible to readers. - Validate default_interval_ms and backend_intervals at init time. Non-positive values now raise ArgumentError with a clear message (fail loudly per CLAUDE.md). - Replace :timer.sleep with a wait_until/3 polling helper in the three new probe result tests, mirroring the wait_for_ets_owner/3 pattern introduced in the same commit. Tests are now deterministic and fail fast with meaningful messages instead of depending on 100-200ms wall clock budgets. Also add the missing Baseline-clearing setup to the new describe block so tests are independent of describe execution order.

refresh/1 dispatches a synchronous probe via Task.Supervisor.async_nolink and blocks the caller until the probe completes or times out. The GenServer.call timeout is @per_backend_refresh_timeout_ms + 1_000 so the task deadline is always hit before the call times out. refresh_all/0 iterates all configured backends sequentially with a timeout scaled to the backend count (per spec I5), preventing spurious call timeouts on large backend sets. Both operations delegate to handle_probe_result/3 for successful and error paths so backoff, cache, and probed-at state stay consistent with the async tick path. Task.yield + Task.shutdown(:brutal_kill) cleans up the monitor and flushes any trailing :DOWN message.

The rescue clause only caught DBConnection.ConnectionError and Exqlite.Error. During supervisor restart after Process.exit(:kill), the Ecto sandbox can raise other exception types (e.g. DBConnection.OwnershipError) when the killed process's checkout is reclaimed. The unhandled exception caused the GenServer to restart- loop until the supervisor gave up, failing the ETS heir crash survival test on the Elixir 1.18 + OTP 27 CI matrix. Adds a fallback rescue clause with a warning log. The caller (load_existing_and_seed_baseline) already handles {:error, _} by falling back to baseline-only ETS in degraded mode, which is strictly better than a restart loop.

Copilot

Pull request overview

This PR rewrites MonkeyClaw.ModelRegistry from a provider-keyed cache into a unified (backend, provider)-keyed registry with multiple writers (baseline/probe/session/refresh), ETS crash continuity, and provider-log redaction, and updates consumers/tests to the new read APIs.

Changes:

Replaces the cached model schema with one row per (backend, provider) containing an embedded model list + precedence metadata (refreshed_at/refreshed_mono).
Adds baseline loading, tick-driven probing with backoff/in-flight tracking, session-start model hook, and ETS heir ownership transfer.
Redacts provider error logging via Vault.SecretScanner and wires VaultLive to the new registry read/refresh APIs.

Reviewed changes

Copilot reviewed 22 out of 23 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
test/support/test_backend.ex	Adds `list_models/1` to the test backend adapter with programmable responses/delay/crash behavior.
test/monkey_claw/model_registry/provider_log_redaction_test.exs	New tests asserting Provider log sanitization redacts secret-shaped content.
test/monkey_claw/model_registry/cached_model_test.exs	New changeset/schema unit tests for the rewritten embedded-model CachedModel shape.
test/monkey_claw/model_registry/baseline_test.exs	New tests for baseline runtime config loading and structural validation/drop behavior.
test/monkey_claw/model_registry_test.exs	Rewritten integration suite for the new registry lifecycle, upsert funnel, probing, refresh, configure, and ETS heir behavior.
test/monkey_claw/model_registry_e2e_test.exs	New E2E test covering baseline availability, refresh, projections, and crash-restart continuity.
test/monkey_claw/agent_bridge/session_model_hook_test.exs	New integration tests for authenticated session hook writes into ModelRegistry.
test/monkey_claw/agent_bridge/backend/test_models_test.exs	New unit tests for the test backend’s `list_models/1` contract.
test/monkey_claw/agent_bridge/backend/beam_agent_list_models_test.exs	New integration tests for BeamAgent backend `list_models/1` error paths.
priv/repo/migrations/20260407000000_rewrite_cached_models.exs	Drop-and-replace migration for the new `(backend, provider)` cached_models table shape.
lib/monkey_claw/model_registry/provider.ex	Routes provider error logging through `sanitize_for_log/1` (SecretScanner redaction).
lib/monkey_claw/model_registry/ets_heir.ex	New long-lived GenServer owning the ETS table and transferring ownership to/from ModelRegistry.
lib/monkey_claw/model_registry/cached_model.ex	Rewrites schema into embedded models per `(backend, provider)` row + strict trust-boundary validations.
lib/monkey_claw/model_registry/baseline.ex	New baseline runtime-config reader with structural validation + logging.
lib/monkey_claw/model_registry.ex	Full rewrite: new read APIs, upsert funnel, boot seeding, tick probes with backoff, refresh/configure, ETS claiming.
lib/monkey_claw/application.ex	Starts `ModelRegistry.EtsHeir` under the same lifecycle flag as ModelRegistry.
lib/monkey_claw/agent_bridge/session.ex	Adds async authenticated session-start hook to push fresh model lists into ModelRegistry.
lib/monkey_claw/agent_bridge/backend/beam_agent.ex	Implements `list_models/1` by mapping backend→provider and delegating to Provider.fetch_models/2.
lib/monkey_claw/agent_bridge/backend.ex	Extends backend behaviour contract with `list_models/1` types + callback documentation.
lib/monkey_claw_web/live/vault_live.ex	Switches UI model listing/refresh to `list_all_by_provider/0` and `refresh_all/0` with rescue.
docs/superpowers/specs/2026-04-05-list-models-per-backend-design.md	Adds the design spec describing the new registry shape, writers/readers, and invariants.
config/runtime.exs	Adds baseline model entries seeded at boot.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

The wait_for_new_registry and wait_for_ets_owner helpers used 100 × 10ms = 1 second timeout. On slow CI runners the full restart cycle (supervisor restart → init → handle_continue → load_sqlite_rows → degraded fallback → ETS claim) exceeds 1s. Increases to 300 × 10ms = 3 seconds. Purely a test change — no production code impact.

… error handling - Extract probe_opts/2 helper that injects :backend and :workspace_id into adapter opts via Map.put_new, removing the requirement for backend_configs entries to redundantly carry these keys - Guard probe_deadline_ms with is_integer check before min/2 - Replace crash-on-error pattern-match in do_upsert/2 with graceful case handling — log warning and return {:error, reason} instead of crashing the GenServer when Repo.transaction fails - Update both callers (session_hook, handle_probe_result) to handle {:error, _} from do_upsert without crashing

Planning docs (specs, plans) are local-only and should never be committed. Updated .gitignore pattern from /doc/ to /doc*/ to cover both ExDoc output and superpowers planning documents.

@doc

Security: - ETS table access mode :public → :protected in registry and heir (H1) - All Logger calls with dynamic provider content routed through Provider.sanitize_for_log/1 → SecretScanner (M1) - Narrowed rescue in session_registered?/1 to ArgumentError (M3) - Google API key URL-param risk documented in provider.ex (L1) - backend_configs key allowlist validation prevents arbitrary keys (M2) Correctness: - config/config.exs: replaced dead keys with correct registry config (H2) - schedule_tick/2: cancel existing timer before scheduling new one (H3) - VaultLive: rescue clause now logs via Logger.warning (M4) - Restored refreshed_at in enrich/4 — VaultLive needs it for display - configure/1 :backends clause preserves existing last_probe_at (L5) - Degraded-mode retry on first tick via maybe_retry_sqlite_load/1 (L4) - in_flight? O(N) → O(1) via in_flight_backends reverse index (L7) - Session fire_model_hook: Task.start → Task.Supervisor.start_child for OTP crash visibility - resolve_backend_name/2 replaces infer_backend_name/1 — no hardcoded BeamAgent→"claude" mapping; requires explicit :backend_name opt - Removed Backend.Test clause from infer_backend_name (L3) Documentation: - EtsHeir.start_link/1 @doc added (M5) - README + index.html Model Registry sections updated (M6) - newer?/2 cross-restart mono tiebreaker limitation documented (L6) Tests: - session_model_hook_test: explicit backend_name: "test" in opts - Dialyzer clean: suppress cancel_timer return, replace MapSet.subset? with Enum.all? to avoid opaque type mismatch

Copilot

Pull request overview

Copilot reviewed 24 out of 25 changed files in this pull request and generated 5 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…spect, match_object - Validate probe result entries have :provider key before group_by; malformed entries trigger backoff instead of crashing the GenServer - sanitize_for_log inspect limit: :infinity → 200 to bound allocation - ets_scan_by_backend/provider: tab2list+filter → match_object for targeted key lookup instead of full table copy

Add a hard @max_in_flight cap (10) to maybe_dispatch_probe/2 as an explicit concurrency bound. The per-backend in_flight? check already prevents duplicate probes, making the natural cap = length(backends), but this makes the invariant explicit and guards against any future code path that might bypass the per-backend check. Add a clarifying comment on the handle_info ETS-TRANSFER clause explaining that reconciliation happens during init via load_existing_and_seed_baseline/1, and this clause is a defensive catch-all for unexpected runtime transfers.

checkpoint_save/2 and checkpoint_rewind/2 in the BeamAgent adapter used a function_exported?/3 + apply pattern that silently returned {:error, :not_supported} — a compatibility shim banned by project standards. BeamAgent.Checkpoint does not yet export these functions. Replace both with explicit raises so the failure is loud. Add rescue blocks at the two Runner call sites (prepare_opts/1 and try_checkpoint_rewind/3) since the Runner → Backend adapter boundary is a trust boundary where rescue is appropriate. The Runner degrades gracefully: nil checkpoint_id skips rewind via the existing guard.

Reverts commit 4ff560a which incorrectly replaced the function_exported?/3 + apply/3 runtime feature detection with loud raises. The checkpoint system (BeamAgent.Checkpoint) is a planned feature waiting on an upstream dependency — the existing pattern correctly returns {:error, :not_supported} which callers handle gracefully (nil checkpoint_id skips rewind in the experiment Runner). This is runtime feature detection, not a stub. Also removes the unnecessary rescue blocks added to runner.ex prepare_opts/1 and try_checkpoint_rewind/2 — the backend adapter already returns error tuples, not raises.

The BeamAgent adapter was using function_exported?/3 guards to check for BeamAgent.Checkpoint.save/2 — a function that never existed. The real upstream API is BeamAgent.Checkpoint.snapshot/3 (session_id, uuid, file_paths) and BeamAgent.Checkpoint.rewind/2 (session_id, uuid). Wire the adapter to the actual API: - checkpoint_save/3 now calls BeamAgent.Checkpoint.snapshot/3 with the session_id (resolved via session_info), a generated UUID, and the scoped file paths from the Runner's mutation_scope - checkpoint_rewind/2 now calls BeamAgent.Checkpoint.rewind/2 with the session_id and checkpoint UUID - Backend behaviour updated: checkpoint_save/2 → checkpoint_save/3 to accept file_paths (the files to snapshot) - Runner passes state.mutation_scope (populated at init from strategy.mutation_scope/1) to checkpoint_save - Test backend updated to accept the new arity

Copilot

Pull request overview

Copilot reviewed 25 out of 26 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (1)

lib/monkey_claw/agent_bridge/backend.ex:227

The checkpoint rewind docs still reference checkpoint_save/2, but the behaviour callback was updated to checkpoint_save/3 (with file_paths). Update the doc text so it matches the new contract and doesn’t mislead implementers/callers.

  @doc """
  Rewind the session to a previously saved checkpoint.

  Restores the session state to the point captured by
  `checkpoint_save/2`. Used by the experiment Runner to
  rollback rejected iterations.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Two bugs identified by external review: 1. Exponential backoff was dead — apply_backoff/2 correctly set due?/2 timing, but handle_info(:tick) always rescheduled at default_interval (24h). A backend needing retry in 5s would sleep for 24h. Fix: schedule_next_tick/1 computes the earliest due time across all backends and wakes the tick accordingly. Reschedule also fires after probe completions and crashes. 2. resolve_backend_name/2 only checked :backend_name, which no caller ever sets. Scope.session_opts/1 produces :backend (atom). Every session-hook write landed under "unknown", breaking per-backend cache correctness. Fix: read :backend directly and convert atom to string.

The session hook cast was forgeable — any process on the node could call GenServer.cast with an arbitrary registered session pid and poison the model registry. In an agentic coder context, the agent has bash/code execution and could send arbitrary BEAM messages. Fix: capability token handshake. Session init generates an opaque make_ref() token, registers it with ModelRegistry via a synchronous call, and passes it to the hook task. The cast now carries the token as {:session_hook, pid, token, writes}. ModelRegistry validates the token matches what was registered for that pid. Forged casts with unknown tokens are rejected. Token lifecycle: - Created: Session.init generates make_ref() - Registered: GenServer.call to ModelRegistry stores {pid, token} - Validated: handle_cast checks token matches on every write - Cleaned up: Process.monitor on session pid, DOWN clears token Guard: Process.whereis check before registration so sessions start normally when ModelRegistry isn't running (most tests).

ETS heir transfer can take longer on OTP 28 under CI load. Bump poll bounds from 300×10ms to 500×10ms to prevent flaky failures.

…ledoc, init validation - Add atom catch-all clause to backend_to_provider/1 so session_opts atoms (:claude, :codex, etc.) resolve correctly instead of crashing with FunctionClauseError - Update upsert/1 @SPEC to include {:error, term()} return — do_upsert already handles and returns error tuples from Repo.transaction - Clarify moduledoc spec file reference as local-only (not in repo) - Add >= default_interval enforcement to backend_intervals init validation, matching configure/1 runtime validation

Exercises the new atom catch-all clause added in the previous commit. Passes backend: :claude (atom) to BeamAgent.list_models/1 and asserts no FunctionClauseError — the atom is normalized to "claude" before provider mapping. HTTP call still fails (unreachable localhost), but the normalization path is now covered.

The models table was grouped by provider but also showed a provider column per row — redundant. Replace with backend column to surface the new (backend, provider) architecture. Uses badge-secondary to visually distinguish backend tags from provider group headings.

…backend-grouped selector ChatLive now pulls available models from ModelRegistry.list_all_by_backend() instead of a hardcoded 3-model list. The select element uses <optgroup> per backend so users can pick any model from any configured backend. Falls back to the original Anthropic defaults when the registry is empty (e.g. before first refresh or when no secrets are configured). - available_models/0 reads ETS via ModelRegistry (no GenServer call) - maybe_select_backend/2 does direct map lookup by backend key - humanize_backend/1 formats backend names for optgroup labels - default_available_models/0 preserves config override escape hatch

…tion Document the new model selection flow: models sourced from ModelRegistry.list_all_by_backend/0 with fallback to config defaults, backend-grouped optgroups, and AgentBridge.set_model/2 integration.

beardedeagle added 30 commits April 5, 2026 10:34

feat: add ModelRegistry.Baseline runtime-config reader

893614b

feat: ship default baseline entries for claude/codex/gemini backends

afe6209

test: implement list_models/1 in TestBackend with programmable responses

8d27efa

feat: implement list_models/1 in BeamAgent backend via Provider

5bbfea9

test: restore plan-specified test name for missing_workspace_id case

13fb54a

chore: stub vault_live.ex model helpers pending Task 21 cutover

35454b1

feat: ModelRegistry boot sequence with baseline delta seed

4de0b31

Init claims ETS via heir, loads rows from SQLite, seeds any baseline entries not already present (warm-start delta). On SQLite failure, falls back to baseline-only ETS and enters degraded state.

Copilot AI review requested due to automatic review settings April 5, 2026 23:32

Copilot started reviewing on behalf of beardedeagle April 5, 2026 23:32 View session

Copilot AI reviewed Apr 5, 2026

View reviewed changes

Comment thread lib/monkey_claw/model_registry.ex Outdated

Comment thread lib/monkey_claw/model_registry.ex

Comment thread lib/monkey_claw/model_registry.ex Outdated

beardedeagle added 4 commits April 5, 2026 18:39

fix: remove docs/ from tracking and update .gitignore

79ee080

Planning docs (specs, plans) are local-only and should never be committed. Updated .gitignore pattern from /doc/ to /doc*/ to cover both ExDoc output and superpowers planning documents.

beardedeagle requested a review from Copilot April 6, 2026 01:48

Copilot started reviewing on behalf of beardedeagle April 6, 2026 01:48 View session

Copilot AI reviewed Apr 6, 2026

View reviewed changes

Comment thread test/monkey_claw/model_registry_test.exs

Comment thread test/monkey_claw/model_registry_test.exs

Comment thread lib/monkey_claw/model_registry.ex

Comment thread lib/monkey_claw/model_registry/provider.ex

Comment thread lib/monkey_claw/model_registry.ex Outdated

beardedeagle added 5 commits April 5, 2026 21:09

beardedeagle requested a review from Copilot April 6, 2026 05:42

Copilot started reviewing on behalf of beardedeagle April 6, 2026 05:42 View session

Copilot AI reviewed Apr 6, 2026

View reviewed changes

beardedeagle added 8 commits April 6, 2026 06:07

fix: increase crash survival test timeouts from 3s to 5s for OTP 28 CI

fa62fdd

ETS heir transfer can take longer on OTP 28 under CI load. Bump poll bounds from 300×10ms to 500×10ms to prevent flaky failures.

docs: update ChatLive moduledoc for ModelRegistry-powered model selec…

1f048d6

…tion Document the new model selection flow: models sourced from ModelRegistry.list_all_by_backend/0 with fallback to config defaults, backend-grouped optgroups, and AgentBridge.set_model/2 integration.

beardedeagle merged commit b363c4c into main Apr 6, 2026
7 checks passed

beardedeagle deleted the feat/MonkeyClaw-list-models-per-backend branch April 6, 2026 13:40

beardedeagle mentioned this pull request Apr 6, 2026

fix: replace Provider HTTP module with beam_agent session-based model listing #35

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: unified ModelRegistry keyed on (backend, provider)#34

feat: unified ModelRegistry keyed on (backend, provider)#34
beardedeagle merged 57 commits intomainfrom
feat/MonkeyClaw-list-models-per-backend

beardedeagle commented Apr 5, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

beardedeagle commented Apr 5, 2026

Summary

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants