Skip to content

feat: unified ModelRegistry keyed on (backend, provider)#34

Merged
beardedeagle merged 57 commits intomainfrom
feat/MonkeyClaw-list-models-per-backend
Apr 6, 2026
Merged

feat: unified ModelRegistry keyed on (backend, provider)#34
beardedeagle merged 57 commits intomainfrom
feat/MonkeyClaw-list-models-per-backend

Conversation

@beardedeagle
Copy link
Copy Markdown
Owner

Summary

  • Drop-and-replace cutover of ModelRegistry from provider-keyed to (backend, provider)-keyed
  • Adds Baseline runtime-config loader for cold-start model availability
  • Adds probe tick handler with per-backend scheduling, in-flight dedup, and exponential backoff
  • Adds authenticated session hook for fresh model lists on every session start
  • Adds ETS heir for crash-restart continuity
  • Redacts Provider log sites through Vault.SecretScanner (spec I8)
  • Wires vault_live.ex stubs to real ModelRegistry API
  • Zero references to the old list_models/1 / list_all_models/0 API remain (grep-verified)

Spec: docs/superpowers/specs/2026-04-05-list-models-per-backend-design.md
Plan: docs/superpowers/plans/2026-04-05-list-models-per-backend.md

Test plan

  • Unit tests for CachedModel changeset (required fields, length caps, charset, embed list cap, per-model validations)
  • Unit tests for Baseline (config read, structural validation, log on drop)
  • Integration tests for ModelRegistry (boot, upsert funnel, precedence, read projections, tick, probe results, refresh, configure)
  • Integration tests for Session hook (authenticated cast, unregistered pid rejection)
  • Integration test for Provider log redaction
  • E2E test for full tree boot + crash-restart continuity via ETS heir
  • Zero-reference grep audit passes
  • All five quality gates clean (compile, format, credo --strict, dialyzer, 1945 tests/0 failures)

Spec covers unified (backend, provider) model registry with three-writer
funnel (baseline, probe, session) and ETS-first reads. Plan decomposes
implementation into 23 bite-sized TDD tasks enforcing a full drop-and-
replace cutover (greenfield, zero legacy code preserved).
Drops the provider-keyed single-model-per-row table and creates the
(backend, provider)-keyed shape with embedded JSON model list, source
tag, and monotonic tiebreaker column. Greenfield cutover, zero data
preserved. Unique index on (backend, provider) plus individual indexes
on backend and provider for the by-backend and by-provider read paths.
Replaces the old flat model row with a (backend, provider)-keyed row
holding an embedded list of models. Adds source tag and refreshed_mono
columns for write arbitration. Deletes the old model_registry_test.exs
(exercised the old API); replaced by cached_model_test.exs here and
model_registry_test.exs rewritten in subsequent tasks. Changeset stub
only — full validation chain lands in task 3.

Also replaces lib/monkey_claw/model_registry.ex with a minimal cutover
stub that preserves the public API surface consumed by vault_live.ex
(list_all_models/0, refresh_all/0) so the project compiles through
Tasks 3-10. Task 11 deletes the stub and writes the real GenServer.
Renames "has no legacy top-level model fields" to
"rejects top-level model fields from the prior shape".

The CLAUDE.md project standard forbids the word "legacy" in
code, comments, and docs — this is a greenfield project with
zero users, nothing is legacy. The banned word slipped in via
the plan's verbatim test text; fixed in-place so future Task 3
changeset validation tests don't pattern-copy the violation.
Replaces the Task 2 cast-only stub with full validate_required on all
five top-level fields (backend, provider, source, refreshed_at,
refreshed_mono), required: true on the embedded models list, and
validate_required(:model_id, :display_name) in model_changeset/2.

Also fixes a pre-existing dialyzer unknown_type error in the @type t
definition where Model.t() was unresolved — corrected to
__MODULE__.Model.t() to reference the inline embedded schema module.

10 tests, 0 failures (2 schema shape + 8 required-field).
Add trust-boundary invariants from spec §Schema to CachedModel.changeset/2:
- validate_format on :backend and :provider enforcing ^[a-z][a-z0-9_]*$
- validate_length on :backend and :provider capping at 64 bytes
- validate_inclusion on :source restricting to baseline/probe/session

Exposes allowed_sources/0 as a typed public helper so callers can
enumerate valid source values without duplicating the enum.

Module attributes @identifier_pattern, @max_identifier_length, and
@allowed_sources centralise the constraint values.

Tests: 10 new value-constraint tests (6 rejection, 3 source acceptance,
1 empty backend) added to a dedicated describe block using @constraint_attrs
to avoid redefining the existing @valid_attrs module attribute.
Caps models list at 500, enforces 256-byte length and UTF-8 charset on
model_id/display_name, requires allowed punctuation pattern, caps
capabilities serialized size at 8 KiB. Closes out the trust-boundary
invariant set from spec section Schema.
The existing "accepts model_id with unicode letters and allowed
punctuation" test actually used ASCII only ("claude-sonnet-4.5:preview"),
which exercised the :.- punctuation branch of @model_field_pattern but
not the \p{L} (unicode letter) category. A regression that narrowed the
pattern to [a-zA-Z0-9...] would have gone undetected.

Splits into two tests:
- "accepts ASCII allowed punctuation" — documents the punctuation intent
- "accepts non-ASCII unicode letters" — uses Japanese (日本) + Greek (α)
  + Cyrillic (модель) to actually exercise \p{L}

Keeps the trust-boundary spec §Schema promise that non-ASCII model
identifiers are supported.
- Add test covering the non-map fallback clause of valid_entry?/1
  (strings, tuples, atoms, nil). Previously the clause was implemented
  but had zero test coverage.
- Add comment explaining @type entry is deliberately loose at this
  layer because CachedModel.changeset/2 is the real trust boundary.

Deferred: load!/0 bang convention concern. Plan explicitly dictates
{:ok, _} return shape for parallelism with other registry calls,
and future Tasks 11/13/14 will cascade from this name.
New callback takes opts map (not session pid) so the registry can
call it without a live session. Each model_attrs includes :provider
so multi-provider backends can fan out per spec D2/D3. Adapters
implemented in follow-up commits — compile will be red until then.
Mirrors the pattern from baseline.ex:33 — the capabilities field is
deliberately `map()` at the adapter boundary because CachedModel.changeset/2
is the real trust boundary that validates and caps capability content.
Adding this comment prevents a future maintainer from tightening the type
without understanding the layering.

Follow-up to review feedback on 3f141e9. The separate concern — name
collision with ModelRegistry.Provider.model_attrs — is deferred to Task 14
where the upsert funnel will reconcile both shapes.
Fix two gaps flagged in Task 9 code review:
- Add `assert models != []` so the test fails if a regression reduces
  the default canned list to an empty list (Enum.all?/2 returns true
  on [], which would otherwise silently accept the regression).
- Extend the match pattern to include `:capabilities`, covering all
  four required fields of Backend.model_attrs() so the test enforces
  the full contract instead of three of four fields.

Follow-up to 8d27efa. The related Low-priority note about timing the
deadline short-circuit is deferred to Task 16 where probe latency
handling is exercised end-to-end.
…reads

Full drop-and-replace of the old provider-keyed GenServer. New state
struct per spec (backends, intervals, in-flight, backoff), new read
API (list_for_backend, list_for_provider, list_all_by_*). Upsert
funnel, probe logic, and supervision integration land in follow-up
commits. Existing vault_live.ex consumer is still on the old API and
will be cut over in a later task.
EtsHeir owns the :monkey_claw_model_registry ETS table with itself
as heir and gives ownership to ModelRegistry on start via
give_away/3. On registry crash, ownership returns to the heir;
supervisor restarts the registry, which re-claims the table. Reads
observe continuity across the restart.
- Test now exercises full crash-and-reclaim round-trip (I-1)
  using :permanent restart and ETS owner assertion
- Log ETS-TRANSFER events on both heir and registry sides (I-2)
- EtsHeir.init/1 verifies ownership on re-init (I-3)
- EtsHeir.handle_call({:claim,_}) ownership pre-check (M-1)
- @claim_timeout_ms module attribute for the claim wait (M-2)
- Comment on shared :start_model_registry lifecycle flag (M-3)
- EtsHeir catch-all handle_info logs Logger.debug (M-4)
- Add @SPEC to EtsHeir.init/1 (M-5)
Init claims ETS via heir, loads rows from SQLite, seeds any baseline
entries not already present (warm-start delta). On SQLite failure,
falls back to baseline-only ETS and enters degraded state.
- load_sqlite_rows rescue narrowed to environmental
  DB failures only (I-1, house rule compliance)
- seed_baseline_delta returns :ok; caller keeps state
  reference (I-3)
- Remove dead Sandbox.allow calls from boot sequence
  tests; DataCase provides shared-mode (I-4)
Single write funnel for all three writers (baseline/probe/session).
Validates every write through CachedModel.changeset, drops invalid
entries with a log, runs the valid set in one transaction with
(refreshed_at, refreshed_mono) precedence, and updates ETS for
winning rows only. Multi-provider backends fan out naturally
because callers pass one write per (backend, provider) group.
…anups

Stage-2 review fixes for Task 14 (b8f98ce).

IMPORTANT: Move :ets.insert out of Repo.transaction closure. ETS is
not transactional — if a later row raised mid-batch, SQLite would
roll back while ETS kept the phantom rows. ETS now only writes after
the transaction commits, guaranteeing ETS rows always correspond to
SQLite rows.

MINOR: Narrow @SPEC upsert/1 to {:ok, [CachedModel.t()]} — the
{:error, _} branch was unreachable since apply_upserts never calls
Repo.rollback. Strict {:ok, applied} match now documents actual
control flow.

MINOR: Simplify validate_writes/1 — direct destructure instead of
then/2 indirection.

MINOR: Use Map.get/2 in validate_writes error log — survives malformed
write maps without a second crash point.

MINOR: Replace System.monotonic_time() + 1 in upsert test with
Process.sleep/1 + re-sample. Monotonic integers are opaque per Erlang
docs; arithmetic on them is discouraged.
…light dedup

First tick fires at startup_delay_ms (default 5s), subsequent ticks at
default_interval_ms. Each tick iterates configured backends, skipping
those with an in-flight probe task or that aren't due per their
personal interval. Probe tasks dispatch via MonkeyClaw.TaskSupervisor
with async_nolink. Task result handling lands in the next task.
The State struct declared startup_delay_ms: 5_000 as a literal default,
duplicating the @startup_delay_ms_default 5_000 module attribute. Since
init/1 always sets startup_delay_ms via Keyword.get(opts, :startup_delay_ms,
@startup_delay_ms_default), the literal in defstruct was dead and a DRY
violation.

Promote startup_delay_ms to @enforce_keys so the State invariant is
enforced structurally, and drop the duplicate default. init/1 remains
the single source of truth for the default value.
…koff

Successful probe results group by provider, fan into per-(backend,
provider) writes, and hit the upsert funnel. Reset backoff and mark
the backend as probed on success. Error results and task crashes
trigger exponential backoff (5s initial, doubling to 5m cap) via
apply_backoff/2, which bumps last_probe_at forward so the next tick
skips the backend for the backoff duration. Registry stays alive on
any backend failure — stale cache preserved.

Process.demonitor(ref, [:flush]) on successful task results prevents
the trailing :DOWN message from being mishandled.

Also fix a pre-existing race in the ETS heir crash test: the assertion
now uses wait_for_ets_owner/3 to poll until give_away completes
before asserting ownership, eliminating the flake.
Address four IMPORTANT findings from Stage-2 code quality review of the
ModelRegistry probe result handling:

- Add catch-all clause to handle_probe_result/3 so malformed adapter
  results log and apply backoff instead of crashing the GenServer with
  FunctionClauseError. Runtime-injected backends are a trust boundary
  per CLAUDE.md.
- Add explicit {:ok, []} clause that short-circuits empty probe results
  without opening a pointless SQLite transaction. Makes "empty = healthy"
  semantics visible to readers.
- Validate default_interval_ms and backend_intervals at init time.
  Non-positive values now raise ArgumentError with a clear message
  (fail loudly per CLAUDE.md).
- Replace :timer.sleep with a wait_until/3 polling helper in the three
  new probe result tests, mirroring the wait_for_ets_owner/3 pattern
  introduced in the same commit. Tests are now deterministic and fail
  fast with meaningful messages instead of depending on 100-200ms wall
  clock budgets.

Also add the missing Baseline-clearing setup to the new describe block
so tests are independent of describe execution order.
refresh/1 dispatches a synchronous probe via Task.Supervisor.async_nolink
and blocks the caller until the probe completes or times out. The
GenServer.call timeout is @per_backend_refresh_timeout_ms + 1_000 so the
task deadline is always hit before the call times out.

refresh_all/0 iterates all configured backends sequentially with a
timeout scaled to the backend count (per spec I5), preventing spurious
call timeouts on large backend sets.

Both operations delegate to handle_probe_result/3 for successful and
error paths so backoff, cache, and probed-at state stay consistent with
the async tick path. Task.yield + Task.shutdown(:brutal_kill) cleans
up the monitor and flushes any trailing :DOWN message.
Copilot AI review requested due to automatic review settings April 5, 2026 23:32
The rescue clause only caught DBConnection.ConnectionError and
Exqlite.Error. During supervisor restart after Process.exit(:kill),
the Ecto sandbox can raise other exception types (e.g.
DBConnection.OwnershipError) when the killed process's checkout is
reclaimed. The unhandled exception caused the GenServer to restart-
loop until the supervisor gave up, failing the ETS heir crash
survival test on the Elixir 1.18 + OTP 27 CI matrix.

Adds a fallback rescue clause with a warning log. The caller
(load_existing_and_seed_baseline) already handles {:error, _} by
falling back to baseline-only ETS in degraded mode, which is
strictly better than a restart loop.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR rewrites MonkeyClaw.ModelRegistry from a provider-keyed cache into a unified (backend, provider)-keyed registry with multiple writers (baseline/probe/session/refresh), ETS crash continuity, and provider-log redaction, and updates consumers/tests to the new read APIs.

Changes:

  • Replaces the cached model schema with one row per (backend, provider) containing an embedded model list + precedence metadata (refreshed_at/refreshed_mono).
  • Adds baseline loading, tick-driven probing with backoff/in-flight tracking, session-start model hook, and ETS heir ownership transfer.
  • Redacts provider error logging via Vault.SecretScanner and wires VaultLive to the new registry read/refresh APIs.

Reviewed changes

Copilot reviewed 22 out of 23 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
test/support/test_backend.ex Adds list_models/1 to the test backend adapter with programmable responses/delay/crash behavior.
test/monkey_claw/model_registry/provider_log_redaction_test.exs New tests asserting Provider log sanitization redacts secret-shaped content.
test/monkey_claw/model_registry/cached_model_test.exs New changeset/schema unit tests for the rewritten embedded-model CachedModel shape.
test/monkey_claw/model_registry/baseline_test.exs New tests for baseline runtime config loading and structural validation/drop behavior.
test/monkey_claw/model_registry_test.exs Rewritten integration suite for the new registry lifecycle, upsert funnel, probing, refresh, configure, and ETS heir behavior.
test/monkey_claw/model_registry_e2e_test.exs New E2E test covering baseline availability, refresh, projections, and crash-restart continuity.
test/monkey_claw/agent_bridge/session_model_hook_test.exs New integration tests for authenticated session hook writes into ModelRegistry.
test/monkey_claw/agent_bridge/backend/test_models_test.exs New unit tests for the test backend’s list_models/1 contract.
test/monkey_claw/agent_bridge/backend/beam_agent_list_models_test.exs New integration tests for BeamAgent backend list_models/1 error paths.
priv/repo/migrations/20260407000000_rewrite_cached_models.exs Drop-and-replace migration for the new (backend, provider) cached_models table shape.
lib/monkey_claw/model_registry/provider.ex Routes provider error logging through sanitize_for_log/1 (SecretScanner redaction).
lib/monkey_claw/model_registry/ets_heir.ex New long-lived GenServer owning the ETS table and transferring ownership to/from ModelRegistry.
lib/monkey_claw/model_registry/cached_model.ex Rewrites schema into embedded models per (backend, provider) row + strict trust-boundary validations.
lib/monkey_claw/model_registry/baseline.ex New baseline runtime-config reader with structural validation + logging.
lib/monkey_claw/model_registry.ex Full rewrite: new read APIs, upsert funnel, boot seeding, tick probes with backoff, refresh/configure, ETS claiming.
lib/monkey_claw/application.ex Starts ModelRegistry.EtsHeir under the same lifecycle flag as ModelRegistry.
lib/monkey_claw/agent_bridge/session.ex Adds async authenticated session-start hook to push fresh model lists into ModelRegistry.
lib/monkey_claw/agent_bridge/backend/beam_agent.ex Implements list_models/1 by mapping backend→provider and delegating to Provider.fetch_models/2.
lib/monkey_claw/agent_bridge/backend.ex Extends backend behaviour contract with list_models/1 types + callback documentation.
lib/monkey_claw_web/live/vault_live.ex Switches UI model listing/refresh to list_all_by_provider/0 and refresh_all/0 with rescue.
docs/superpowers/specs/2026-04-05-list-models-per-backend-design.md Adds the design spec describing the new registry shape, writers/readers, and invariants.
config/runtime.exs Adds baseline model entries seeded at boot.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread lib/monkey_claw/model_registry.ex Outdated
Comment thread lib/monkey_claw/model_registry.ex
Comment thread lib/monkey_claw/model_registry.ex Outdated
The wait_for_new_registry and wait_for_ets_owner helpers used
100 × 10ms = 1 second timeout. On slow CI runners the full
restart cycle (supervisor restart → init → handle_continue →
load_sqlite_rows → degraded fallback → ETS claim) exceeds 1s.

Increases to 300 × 10ms = 3 seconds. Purely a test change —
no production code impact.
… error handling

- Extract probe_opts/2 helper that injects :backend and :workspace_id
  into adapter opts via Map.put_new, removing the requirement for
  backend_configs entries to redundantly carry these keys
- Guard probe_deadline_ms with is_integer check before min/2
- Replace crash-on-error pattern-match in do_upsert/2 with graceful
  case handling — log warning and return {:error, reason} instead of
  crashing the GenServer when Repo.transaction fails
- Update both callers (session_hook, handle_probe_result) to handle
  {:error, _} from do_upsert without crashing
Planning docs (specs, plans) are local-only and should never be
committed. Updated .gitignore pattern from /doc/ to /doc*/ to cover
both ExDoc output and superpowers planning documents.
Security:
- ETS table access mode :public → :protected in registry and heir (H1)
- All Logger calls with dynamic provider content routed through
  Provider.sanitize_for_log/1 → SecretScanner (M1)
- Narrowed rescue in session_registered?/1 to ArgumentError (M3)
- Google API key URL-param risk documented in provider.ex (L1)
- backend_configs key allowlist validation prevents arbitrary keys (M2)

Correctness:
- config/config.exs: replaced dead keys with correct registry config (H2)
- schedule_tick/2: cancel existing timer before scheduling new one (H3)
- VaultLive: rescue clause now logs via Logger.warning (M4)
- Restored refreshed_at in enrich/4 — VaultLive needs it for display
- configure/1 :backends clause preserves existing last_probe_at (L5)
- Degraded-mode retry on first tick via maybe_retry_sqlite_load/1 (L4)
- in_flight? O(N) → O(1) via in_flight_backends reverse index (L7)
- Session fire_model_hook: Task.start → Task.Supervisor.start_child
  for OTP crash visibility
- resolve_backend_name/2 replaces infer_backend_name/1 — no hardcoded
  BeamAgent→"claude" mapping; requires explicit :backend_name opt
- Removed Backend.Test clause from infer_backend_name (L3)

Documentation:
- EtsHeir.start_link/1 @doc added (M5)
- README + index.html Model Registry sections updated (M6)
- newer?/2 cross-restart mono tiebreaker limitation documented (L6)

Tests:
- session_model_hook_test: explicit backend_name: "test" in opts
- Dialyzer clean: suppress cancel_timer return, replace MapSet.subset?
  with Enum.all? to avoid opaque type mismatch
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 24 out of 25 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread test/monkey_claw/model_registry_test.exs
Comment thread test/monkey_claw/model_registry_test.exs
Comment thread lib/monkey_claw/model_registry.ex
Comment thread lib/monkey_claw/model_registry/provider.ex
Comment thread lib/monkey_claw/model_registry.ex Outdated
…spect, match_object

- Validate probe result entries have :provider key before group_by;
  malformed entries trigger backoff instead of crashing the GenServer
- sanitize_for_log inspect limit: :infinity → 200 to bound allocation
- ets_scan_by_backend/provider: tab2list+filter → match_object for
  targeted key lookup instead of full table copy
Add a hard @max_in_flight cap (10) to maybe_dispatch_probe/2 as an
explicit concurrency bound. The per-backend in_flight? check already
prevents duplicate probes, making the natural cap = length(backends),
but this makes the invariant explicit and guards against any future
code path that might bypass the per-backend check.

Add a clarifying comment on the handle_info ETS-TRANSFER clause
explaining that reconciliation happens during init via
load_existing_and_seed_baseline/1, and this clause is a defensive
catch-all for unexpected runtime transfers.
checkpoint_save/2 and checkpoint_rewind/2 in the BeamAgent adapter
used a function_exported?/3 + apply pattern that silently returned
{:error, :not_supported} — a compatibility shim banned by project
standards. BeamAgent.Checkpoint does not yet export these functions.

Replace both with explicit raises so the failure is loud. Add rescue
blocks at the two Runner call sites (prepare_opts/1 and
try_checkpoint_rewind/3) since the Runner → Backend adapter boundary
is a trust boundary where rescue is appropriate. The Runner degrades
gracefully: nil checkpoint_id skips rewind via the existing guard.
Reverts commit 4ff560a which incorrectly replaced the
function_exported?/3 + apply/3 runtime feature detection with
loud raises. The checkpoint system (BeamAgent.Checkpoint) is a
planned feature waiting on an upstream dependency — the existing
pattern correctly returns {:error, :not_supported} which callers
handle gracefully (nil checkpoint_id skips rewind in the
experiment Runner). This is runtime feature detection, not a stub.

Also removes the unnecessary rescue blocks added to runner.ex
prepare_opts/1 and try_checkpoint_rewind/2 — the backend adapter
already returns error tuples, not raises.
The BeamAgent adapter was using function_exported?/3 guards to check
for BeamAgent.Checkpoint.save/2 — a function that never existed. The
real upstream API is BeamAgent.Checkpoint.snapshot/3 (session_id, uuid,
file_paths) and BeamAgent.Checkpoint.rewind/2 (session_id, uuid).

Wire the adapter to the actual API:

- checkpoint_save/3 now calls BeamAgent.Checkpoint.snapshot/3 with
  the session_id (resolved via session_info), a generated UUID, and
  the scoped file paths from the Runner's mutation_scope
- checkpoint_rewind/2 now calls BeamAgent.Checkpoint.rewind/2 with
  the session_id and checkpoint UUID
- Backend behaviour updated: checkpoint_save/2 → checkpoint_save/3
  to accept file_paths (the files to snapshot)
- Runner passes state.mutation_scope (populated at init from
  strategy.mutation_scope/1) to checkpoint_save
- Test backend updated to accept the new arity
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 26 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (1)

lib/monkey_claw/agent_bridge/backend.ex:227

  • The checkpoint rewind docs still reference checkpoint_save/2, but the behaviour callback was updated to checkpoint_save/3 (with file_paths). Update the doc text so it matches the new contract and doesn’t mislead implementers/callers.
  @doc """
  Rewind the session to a previously saved checkpoint.

  Restores the session state to the point captured by
  `checkpoint_save/2`. Used by the experiment Runner to
  rollback rejected iterations.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread lib/monkey_claw/agent_bridge/backend/beam_agent.ex
Comment thread lib/monkey_claw/agent_bridge/session.ex
Comment thread lib/monkey_claw/agent_bridge/session.ex Outdated
Comment thread lib/monkey_claw/model_registry.ex
Comment thread lib/monkey_claw/model_registry.ex Outdated
Comment thread lib/monkey_claw/model_registry.ex Outdated
Two bugs identified by external review:

1. Exponential backoff was dead — apply_backoff/2 correctly set
   due?/2 timing, but handle_info(:tick) always rescheduled at
   default_interval (24h). A backend needing retry in 5s would
   sleep for 24h. Fix: schedule_next_tick/1 computes the earliest
   due time across all backends and wakes the tick accordingly.
   Reschedule also fires after probe completions and crashes.

2. resolve_backend_name/2 only checked :backend_name, which no
   caller ever sets. Scope.session_opts/1 produces :backend (atom).
   Every session-hook write landed under "unknown", breaking
   per-backend cache correctness. Fix: read :backend directly and
   convert atom to string.
The session hook cast was forgeable — any process on the node could
call GenServer.cast with an arbitrary registered session pid and
poison the model registry. In an agentic coder context, the agent
has bash/code execution and could send arbitrary BEAM messages.

Fix: capability token handshake. Session init generates an opaque
make_ref() token, registers it with ModelRegistry via a synchronous
call, and passes it to the hook task. The cast now carries the token
as {:session_hook, pid, token, writes}. ModelRegistry validates the
token matches what was registered for that pid. Forged casts with
unknown tokens are rejected.

Token lifecycle:
- Created: Session.init generates make_ref()
- Registered: GenServer.call to ModelRegistry stores {pid, token}
- Validated: handle_cast checks token matches on every write
- Cleaned up: Process.monitor on session pid, DOWN clears token

Guard: Process.whereis check before registration so sessions start
normally when ModelRegistry isn't running (most tests).
ETS heir transfer can take longer on OTP 28 under CI load. Bump
poll bounds from 300×10ms to 500×10ms to prevent flaky failures.
…ledoc, init validation

- Add atom catch-all clause to backend_to_provider/1 so session_opts
  atoms (:claude, :codex, etc.) resolve correctly instead of crashing
  with FunctionClauseError
- Update upsert/1 @SPEC to include {:error, term()} return — do_upsert
  already handles and returns error tuples from Repo.transaction
- Clarify moduledoc spec file reference as local-only (not in repo)
- Add >= default_interval enforcement to backend_intervals init
  validation, matching configure/1 runtime validation
Exercises the new atom catch-all clause added in the previous commit.
Passes backend: :claude (atom) to BeamAgent.list_models/1 and asserts
no FunctionClauseError — the atom is normalized to "claude" before
provider mapping. HTTP call still fails (unreachable localhost), but
the normalization path is now covered.
The models table was grouped by provider but also showed a provider
column per row — redundant. Replace with backend column to surface
the new (backend, provider) architecture. Uses badge-secondary to
visually distinguish backend tags from provider group headings.
…backend-grouped selector

ChatLive now pulls available models from ModelRegistry.list_all_by_backend()
instead of a hardcoded 3-model list. The select element uses <optgroup> per
backend so users can pick any model from any configured backend. Falls back
to the original Anthropic defaults when the registry is empty (e.g. before
first refresh or when no secrets are configured).

- available_models/0 reads ETS via ModelRegistry (no GenServer call)
- maybe_select_backend/2 does direct map lookup by backend key
- humanize_backend/1 formats backend names for optgroup labels
- default_available_models/0 preserves config override escape hatch
…tion

Document the new model selection flow: models sourced from
ModelRegistry.list_all_by_backend/0 with fallback to config defaults,
backend-grouped optgroups, and AgentBridge.set_model/2 integration.
@beardedeagle beardedeagle merged commit b363c4c into main Apr 6, 2026
7 checks passed
@beardedeagle beardedeagle deleted the feat/MonkeyClaw-list-models-per-backend branch April 6, 2026 13:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants