Merged
Changes from 49 commits
57 commits
a6525fb
docs: add spec and plan for list models per backend feature
beardedeagle Apr 5, 2026
bcd879a
feat: rewrite cached_models schema for (backend, provider) keying
beardedeagle Apr 5, 2026
4af2669
feat: rewrite CachedModel schema for (backend, provider) keying
beardedeagle Apr 5, 2026
097f486
test: rename CachedModel shape test to avoid banned term
beardedeagle Apr 5, 2026
aa9cc83
feat: add required-field validation to CachedModel changeset
beardedeagle Apr 5, 2026
4cc913c
feat: add identifier charset + source enum validation to CachedModel
beardedeagle Apr 5, 2026
938b3df
feat: add embedded model validations to CachedModel changeset
beardedeagle Apr 5, 2026
325fe04
test: add non-ASCII unicode test for CachedModel model_id charset
beardedeagle Apr 5, 2026
893614b
feat: add ModelRegistry.Baseline runtime-config reader
beardedeagle Apr 5, 2026
86f962c
fix: address Task 6 code review — add non-map fallback test + type doc
beardedeagle Apr 5, 2026
afe6209
feat: ship default baseline entries for claude/codex/gemini backends
beardedeagle Apr 5, 2026
3f141e9
feat: add list_models/1 callback to AgentBridge.Backend behaviour
beardedeagle Apr 5, 2026
5401cac
docs: mark Backend.model_attrs capabilities as loose by design
beardedeagle Apr 5, 2026
8d27efa
test: implement list_models/1 in TestBackend with programmable responses
beardedeagle Apr 5, 2026
290922f
test: tighten TestBackend default-response assertion
beardedeagle Apr 5, 2026
5bbfea9
feat: implement list_models/1 in BeamAgent backend via Provider
beardedeagle Apr 5, 2026
13fb54a
test: restore plan-specified test name for missing_workspace_id case
beardedeagle Apr 5, 2026
35454b1
chore: stub vault_live.ex model helpers pending Task 21 cutover
beardedeagle Apr 5, 2026
5cfa402
feat: rewrite ModelRegistry scaffold with new state struct and empty …
beardedeagle Apr 5, 2026
f77385f
feat: add ETS heir for ModelRegistry crash survival
beardedeagle Apr 5, 2026
b522544
fix: Task 12 quality review findings
beardedeagle Apr 5, 2026
4de0b31
feat: ModelRegistry boot sequence with baseline delta seed
beardedeagle Apr 5, 2026
15b26d6
fix: Task 13 quality review findings
beardedeagle Apr 5, 2026
b8f98ce
feat: ModelRegistry upsert funnel with precedence and per-row fan-out
beardedeagle Apr 5, 2026
513f268
fix: ModelRegistry upsert — ETS writes outside transaction, minor cle…
beardedeagle Apr 5, 2026
380b596
feat: ModelRegistry tick handler with per-backend scheduling and in-f…
beardedeagle Apr 5, 2026
9d14dc3
chore: promote startup_delay_ms to @enforce_keys, remove literal default
beardedeagle Apr 5, 2026
5c02689
feat: ModelRegistry probe result + DOWN handlers with exponential bac…
beardedeagle Apr 5, 2026
4adc908
fix: Task 16 Stage-2 review — harden trust boundaries and deflake tests
beardedeagle Apr 5, 2026
0141955
feat: ModelRegistry refresh/1 and refresh_all/0 on-demand probing
beardedeagle Apr 5, 2026
49e3102
fix: Task 17 Stage-2 review — harden refresh/1 trust boundaries and t…
beardedeagle Apr 5, 2026
8d8a1f2
fix: reformat handle_probe_result guard clause to satisfy format gate
beardedeagle Apr 5, 2026
47f4a1b
feat: validated configure/1 on ModelRegistry (spec I1)
beardedeagle Apr 5, 2026
6449b47
test: close configure/1 coverage gaps flagged in Stage-2 review
beardedeagle Apr 5, 2026
a899168
feat: authenticated session hook cast to ModelRegistry (spec C3)
beardedeagle Apr 5, 2026
2aee0e7
polish: address Task 19 Stage-2 Medium items
beardedeagle Apr 5, 2026
3c605da
feat: redact Provider log sites through SecretScanner (spec I8)
beardedeagle Apr 5, 2026
81aac3b
feat: wire vault_live.ex stubs to real ModelRegistry API (spec Cutove…
beardedeagle Apr 5, 2026
9c7bc70
test: add E2E integration test for ModelRegistry cutover
beardedeagle Apr 5, 2026
d323497
fix: widen load_sqlite_rows rescue to prevent restart loop on CI
beardedeagle Apr 5, 2026
7407aee
fix: increase crash survival test timeouts from 1s to 3s for CI
beardedeagle Apr 5, 2026
1edcac4
fix: address 3 Copilot review comments — probe opts injection, upsert…
beardedeagle Apr 5, 2026
79ee080
fix: remove docs/ from tracking and update .gitignore
beardedeagle Apr 5, 2026
225e0cc
fix: harden ModelRegistry — 18 review findings + 3 follow-up fixes
beardedeagle Apr 6, 2026
8e42eb0
fix: address 3 valid PR review comments — defensive probe, bounded in…
beardedeagle Apr 6, 2026
9f398e8
fix: add explicit @max_in_flight guard and clarify ETS-TRANSFER handler
beardedeagle Apr 6, 2026
4ff560a
fix: replace checkpoint stubs with loud raises, rescue at trust boundary
beardedeagle Apr 6, 2026
3d41389
revert: restore checkpoint runtime feature detection pattern
beardedeagle Apr 6, 2026
96230de
fix: implement real checkpoint save/rewind via BeamAgent.Checkpoint
beardedeagle Apr 6, 2026
67b32ea
fix: backoff-aware tick scheduling and session hook backend resolution
beardedeagle Apr 6, 2026
706fa93
fix: capability token auth for session-hook model registry writes
beardedeagle Apr 6, 2026
fa62fdd
fix: increase crash survival test timeouts from 3s to 5s for OTP 28 CI
beardedeagle Apr 6, 2026
04caf57
fix: address 4 Copilot review findings — atom backend, typespec, modu…
beardedeagle Apr 6, 2026
5da6402
test: add atom backend test for backend_to_provider/1 normalization
beardedeagle Apr 6, 2026
d9edfd8
fix: show backend column instead of redundant provider in models table
beardedeagle Apr 6, 2026
08d95cc
feat: replace hardcoded chat model picker with ModelRegistry-powered …
beardedeagle Apr 6, 2026
1f048d6
docs: update ChatLive moduledoc for ModelRegistry-powered model selec…
beardedeagle Apr 6, 2026
2 changes: 1 addition & 1 deletion .github/pages/index.html
@@ -330,7 +330,7 @@ <h3>Vault &amp; Secret Management</h3>
<div class="feature">
<div class="feature-icon">&#129302;</div>
<h3>Model Registry</h3>
<p>Periodic provider model list cache with SQLite persistence and ETS write-through for low-latency reads. GenServer-managed refresh cycles for Anthropic, OpenAI, and Google APIs with graceful degradation on failures.</p>
<p>Backend-keyed model registry with per-backend probes, SQLite persistence, and ETS read-through cache. Four write sources (baseline, probe, session hook, on-demand), ETS heir crash survival, and exponential backoff with graceful degradation on failures.</p>
</div>
</div>
</section>
2 changes: 1 addition & 1 deletion .gitignore
@@ -8,7 +8,7 @@
/deps/

# Where 3rd-party dependencies like ExDoc output generated docs.
/doc/
/doc*/

# Ignore .fetch files in case you like to edit your project deps locally.
/.fetch
41 changes: 23 additions & 18 deletions README.md
@@ -81,7 +81,7 @@ Clean separation of concerns, connected through a public Elixir API.
| **Notifications** | `MonkeyClaw.Notifications` | Event-driven notification system — routes telemetry events to user-facing alerts via PubSub (real-time) and email (async), with workspace-scoped rules, severity thresholds, and ETS-cached routing |
| **Channels** | `MonkeyClaw.Channels` | Bi-directional platform adapters — Slack, Discord, Telegram, WhatsApp, Web — with adapter behaviour, message recording, webhook verification, and async agent dispatch |
| **Vault** | `MonkeyClaw.Vault` | Encrypted secret and OAuth token storage with `@secret:name` opaque references — model never sees plaintext; AES-256-GCM encryption at rest with HKDF-derived keys |
| **ModelRegistry** | `MonkeyClaw.ModelRegistry` | Periodic provider model list cache — GenServer with SQLite persistence, ETS write-through, and configurable refresh intervals |
| **ModelRegistry** | `MonkeyClaw.ModelRegistry` | Backend-keyed model registry — GenServer with per-backend probes, SQLite persistence, ETS read-through cache, and configurable refresh intervals per `(backend, provider)` pair |

Contexts (`MonkeyClaw.Assistants`, `MonkeyClaw.Workspaces`, `MonkeyClaw.Webhooks`, `MonkeyClaw.Notifications`, `MonkeyClaw.Channels`, `MonkeyClaw.Vault`) provide the
public CRUD API. `MonkeyClaw.AgentBridge` translates domain objects into
@@ -529,30 +529,35 @@ cached models grouped by provider, trigger refresh).

### Model Registry

Periodic refresh of available AI models from provider APIs, with
SQLite persistence and ETS write-through cache for low-latency reads.
Unified model cache keyed on `(backend, provider)` with per-backend
probes, SQLite persistence, and ETS read-through for low-latency reads.

Two-layer architecture:
Five-layer architecture:

| Layer | Module | Owns |
|-------|--------|------|
| **ModelRegistry** | `MonkeyClaw.ModelRegistry` | GenServer — ETS table lifecycle, periodic refresh timer, serialized writes |
| **Provider** | `MonkeyClaw.ModelRegistry.Provider` | HTTP fetching via Req for Anthropic, OpenAI, and Google APIs |

The GenServer is justified because it manages concurrent state (ETS
table ownership), periodic work (configurable refresh interval), and
serialized writes (preventing concurrent refresh races). Reads bypass
the GenServer entirely — ETS with `:read_concurrency` enabled.

Graceful degradation: provider API failures log warnings and preserve
stale cache. Vault resolution failures skip that provider. The
GenServer never crashes on refresh failure. The LiveView handles
a missing ModelRegistry process (disabled in test config) by showing
| **ModelRegistry** | `MonkeyClaw.ModelRegistry` | GenServer — ETS table lifecycle, tick scheduler, per-backend probe dispatch, serialized writes via single upsert funnel |
| **CachedModel** | `MonkeyClaw.ModelRegistry.CachedModel` | Ecto schema — `(backend, provider)` unique key, embedded model list, trust-boundary changeset validation |
| **Baseline** | `MonkeyClaw.ModelRegistry.Baseline` | Boot seed loader — reads baseline model entries from `runtime.exs`, cold-start availability |
| **EtsHeir** | `MonkeyClaw.ModelRegistry.EtsHeir` | ETS crash survival — heir process reclaims the table when the registry crashes and re-transfers on restart |
| **Provider** | `MonkeyClaw.ModelRegistry.Provider` | HTTP fetching via Req for Anthropic, OpenAI, and Google APIs; called by the BeamAgent backend adapter |

Four independent writers populate the cache: **Baseline** (boot seed),
**Probe** (periodic per-backend tasks via `TaskSupervisor`),
**Session hook** (authenticated cast from `AgentBridge.Session`), and
**on-demand refresh** (`refresh/1`, `refresh_all/0`). All four funnel
through a single validated upsert path with conditional precedence on
`(refreshed_at, refreshed_mono)`.

Graceful degradation: probe failures trigger exponential backoff
(5s initial, 5min cap) and preserve stale cache. Baseline guarantees
a floor of known models at boot even if SQLite is unavailable. The
GenServer never crashes on refresh failure. The LiveView handles a
missing ModelRegistry process (disabled in test config) by showing
an empty state.

Runtime reconfiguration via `ModelRegistry.configure/1` allows changing
the workspace ID and provider secret mappings without restarting the
process.
backends, intervals, and backend configs without restarting the process.

### Dashboard

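The Model Registry text in the README hunk above describes probe failures backing off exponentially from a 5s initial delay to a 5min cap. A minimal sketch of that schedule follows, assuming the delay doubles on each consecutive failure (the diff states only "exponential"). `BackoffSketch` and `delay_ms/1` are hypothetical names for illustration and are not taken from the PR.

```elixir
defmodule BackoffSketch do
  @moduledoc false
  # Hypothetical helper, not the PR's implementation.
  # The 5s initial delay and 5min cap come from the README text;
  # the doubling factor is an assumption.
  @initial_ms 5_000
  @cap_ms 300_000

  # Delay before the next probe after `failures` consecutive failures (1-based).
  def delay_ms(failures) when is_integer(failures) and failures >= 1 do
    # Clamp the exponent so the intermediate product stays small.
    exponent = min(failures - 1, 16)
    min(@initial_ms * Integer.pow(2, exponent), @cap_ms)
  end
end

# Print the schedule for the first eight consecutive failures.
Enum.each(1..8, fn n -> IO.puts("failure #{n}: #{BackoffSketch.delay_ms(n)} ms") end)
```

Under the doubling assumption the cap is reached on the seventh consecutive failure, after which the delay stays at 5 minutes until a probe succeeds.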
13 changes: 8 additions & 5 deletions config/config.exs
@@ -96,12 +96,15 @@ config :monkey_claw, MonkeyClaw.Extensions,
]
}

# Model registry — periodic refresh of available models from provider APIs.
# Disabled by default; configure workspace_id and provider_secrets to enable.
# Model registry — per-backend probe of available models from provider APIs.
# Configure :backends and :backend_configs to enable periodic probes.
# See MonkeyClaw.ModelRegistry moduledoc for full option descriptions.
config :monkey_claw, MonkeyClaw.ModelRegistry,
refresh_interval_ms: 3_600_000,
workspace_id: nil,
provider_secrets: %{}
backends: [],
default_interval_ms: 3_600_000,
backend_intervals: %{},
backend_configs: %{},
workspace_id: nil

# Configure Elixir's Logger
config :logger, :default_formatter,
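The new config keys in the hunk above suggest a per-backend override with a global fallback. One way this lookup could work is sketched below, assuming `:backend_intervals` is a map keyed by backend name and that a missing entry falls back to `:default_interval_ms`. `IntervalSketch`, `interval_ms/2`, and the example values are hypothetical; the registry's actual resolution logic is not shown in this diff.

```elixir
defmodule IntervalSketch do
  @moduledoc false
  # Hypothetical helper: resolve the probe interval for one backend.
  # Assumes :backend_intervals maps backend name => interval in ms.
  def interval_ms(config, backend) when is_list(config) and is_binary(backend) do
    overrides = Keyword.get(config, :backend_intervals, %{})
    Map.get(overrides, backend, Keyword.fetch!(config, :default_interval_ms))
  end
end

# Example values are illustrative only.
config = [
  backends: ["claude", "codex"],
  default_interval_ms: 3_600_000,
  backend_intervals: %{"codex" => 600_000},
  backend_configs: %{},
  workspace_id: nil
]

IO.inspect(IntervalSketch.interval_ms(config, "claude"), label: "claude")
IO.inspect(IntervalSketch.interval_ms(config, "codex"), label: "codex")
```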
39 changes: 39 additions & 0 deletions config/runtime.exs
@@ -103,3 +103,42 @@ if config_env() == :prod do
#
# See https://hexdocs.pm/swoosh/Swoosh.html#module-installation for details.
end

# ── MonkeyClaw.ModelRegistry Baseline ────────────────────────
#
# Baseline entries seed the registry at boot so the agent has a
# floor of known models before any probe runs. Entries are
# structurally validated by MonkeyClaw.ModelRegistry.Baseline.load!/0
# and then trust-boundary validated by CachedModel.changeset/2 inside
# the registry's upsert funnel. Users can override or extend this
# list in their own runtime.exs without rebuilding the release.
config :monkey_claw, MonkeyClaw.ModelRegistry.Baseline,
entries: [
%{
backend: "claude",
provider: "anthropic",
models: [
%{model_id: "claude-opus-4-6", display_name: "Claude Opus 4.6", capabilities: %{}},
%{model_id: "claude-sonnet-4-6", display_name: "Claude Sonnet 4.6", capabilities: %{}},
%{
model_id: "claude-haiku-4-5-20251001",
display_name: "Claude Haiku 4.5",
capabilities: %{}
}
]
},
%{
backend: "codex",
provider: "openai",
models: [
%{model_id: "gpt-5", display_name: "GPT-5", capabilities: %{}}
]
},
%{
backend: "gemini",
provider: "google",
models: [
%{model_id: "gemini-2.5-pro", display_name: "Gemini 2.5 Pro", capabilities: %{}}
]
}
]
63 changes: 58 additions & 5 deletions lib/monkey_claw/agent_bridge/backend.ex
@@ -44,6 +44,37 @@ defmodule MonkeyClaw.AgentBridge.Backend do
@type thread_info :: map()
@type permission_mode :: :default | :accept_edits | :bypass_permissions | :plan | :dont_ask

@typedoc """
Options for listing models. Adapter-specific keys are permitted.

Common keys used by MonkeyClaw adapters:

* `:workspace_id` — Vault workspace for secret resolution
* `:secret_name` — Vault secret name for the backend's API key
* `:probe_deadline_ms` — Hard wall-clock deadline for the probe
"""
@type list_models_opts :: %{
optional(:workspace_id) => Ecto.UUID.t(),
optional(:secret_name) => String.t(),
optional(:probe_deadline_ms) => pos_integer(),
optional(atom()) => term()
}

@typedoc """
Single model descriptor returned by `list_models/1`.

The `:provider` field MUST be present on every entry so the
registry can fan multi-provider backends out into one row per
`(backend, provider)` pair.
"""
# Loose shape by design — CachedModel.changeset/2 performs full trust-boundary validation.
@type model_attrs :: %{
provider: String.t(),
model_id: String.t(),
display_name: String.t(),
capabilities: map()
}

@doc """
Start a new agent session.

@@ -52,6 +83,27 @@ defmodule MonkeyClaw.AgentBridge.Backend do
"""
@callback start_session(opts :: map()) :: {:ok, session_pid()} | {:error, term()}

@doc """
List the models this backend currently supports.

Called by `MonkeyClaw.ModelRegistry` during boot (baseline delta),
periodic probes, and on-demand refreshes. Does NOT require a live
session — adapters decide internally how to satisfy the request
(HTTP API call, transient CLI init handshake, local manifest
read, etc.).

Implementations should respect their own deadline; the registry
also enforces a hard outer deadline via `Task.shutdown/2` as a
safety net.

Returns a flat list of `model_attrs` maps. A single adapter may
return models from multiple providers in one list (e.g., Copilot
routing both OpenAI and Anthropic); the registry groups by
`:provider` at write time.
"""
@callback list_models(opts :: list_models_opts()) ::
{:ok, [model_attrs()]} | {:error, term()}

@doc """
Stop an agent session.

@@ -155,15 +207,16 @@ defmodule MonkeyClaw.AgentBridge.Backend do
# ── Checkpoint Operations (Experiment Support) ───────────────

@doc """
Save a checkpoint of the current session state.
Snapshot the given files for later rollback.

Returns a checkpoint identifier that can be used with
`checkpoint_rewind/2` to restore the session to this point.
Captures the content, permissions, and existence of each file in
`file_paths` so that `checkpoint_rewind/2` can restore them.
Returns a checkpoint identifier (UUID) for the snapshot.

Used by the experiment Runner to snapshot state before each
Used by the experiment Runner to snapshot scoped files before each
iteration, enabling rollback on rejection.
"""
@callback checkpoint_save(session_pid(), label :: String.t()) ::
@callback checkpoint_save(session_pid(), label :: String.t(), file_paths :: [String.t()]) ::
{:ok, checkpoint_id :: String.t()} | {:error, term()}

@doc """
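The `list_models/1` doc in this file says a single adapter may return models from several providers in one flat list, and that the registry groups by `:provider` at write time. A minimal sketch of that fan-out follows, assuming the grouped rows take the same `backend` / `provider` / `models` shape as the baseline entries in `config/runtime.exs`. `FanOutSketch` and `group_by_provider/2` are hypothetical names; the registry's real upsert funnel is not part of this diff excerpt.

```elixir
defmodule FanOutSketch do
  @moduledoc false
  # Hypothetical helper: turn a flat list_models/1 result into one
  # row per (backend, provider) pair.
  def group_by_provider(backend, models) when is_binary(backend) and is_list(models) do
    models
    # Key by provider; drop :provider from each model since the row carries it.
    |> Enum.group_by(& &1.provider, &Map.delete(&1, :provider))
    |> Enum.map(fn {provider, provider_models} ->
      %{backend: backend, provider: provider, models: provider_models}
    end)
    |> Enum.sort_by(& &1.provider)
  end
end

# Illustrative input: a multi-provider backend returning two providers.
models = [
  %{provider: "openai", model_id: "gpt-5", display_name: "GPT-5", capabilities: %{}},
  %{
    provider: "anthropic",
    model_id: "claude-opus-4-6",
    display_name: "Claude Opus 4.6",
    capabilities: %{}
  }
]

IO.inspect(FanOutSketch.group_by_provider("copilot", models))
```

Sorting by provider is only there to make the output deterministic for the example.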
65 changes: 50 additions & 15 deletions lib/monkey_claw/agent_bridge/backend/beam_agent.ex
@@ -14,6 +14,8 @@ defmodule MonkeyClaw.AgentBridge.Backend.BeamAgent do

@behaviour MonkeyClaw.AgentBridge.Backend

alias MonkeyClaw.ModelRegistry.Provider

@impl true
def start_session(opts), do: BeamAgent.start_session(opts)

@@ -80,29 +82,62 @@ defmodule MonkeyClaw.AgentBridge.Backend.BeamAgent do
@impl true
def thread_list(pid), do: BeamAgent.Threads.thread_list(pid)

# ── Checkpoint Operations ────────────────────────────────────
@impl true
def list_models(opts) when is_map(opts) do
backend = Map.get(opts, :backend)
provider = backend_to_provider(backend)

provider_opts =
opts
|> Map.to_list()
|> Keyword.take([:workspace_id, :secret_name, :api_key, :base_url])

# BeamAgent.Checkpoint may not yet export these functions.
# Suppress Dialyzer warnings; function_exported?/3 guard
# ensures runtime safety until the API is available.
case Provider.fetch_models(provider, provider_opts) do
{:ok, models} ->
{:ok, Enum.map(models, &annotate_provider(&1, provider))}

{:error, _} = error ->
error
end
end

# Map the MonkeyClaw backend identifier to the upstream provider name.
# Static table — future SDK and local backends extend this.
defp backend_to_provider("claude"), do: "anthropic"
defp backend_to_provider("codex"), do: "openai"
defp backend_to_provider("gemini"), do: "google"
defp backend_to_provider("opencode"), do: "anthropic"
defp backend_to_provider("copilot"), do: "github_copilot"
defp backend_to_provider(nil), do: "anthropic"
defp backend_to_provider(other) when is_binary(other), do: other

defp annotate_provider(%{model_id: id, display_name: name, capabilities: caps}, provider) do
%{
provider: provider,
model_id: id,
display_name: name,
capabilities: caps
}
end

# ── Checkpoint Operations ────────────────────────────────────

@impl true
def checkpoint_save(pid, label) do
if function_exported?(BeamAgent.Checkpoint, :save, 2) do
# credo:disable-for-next-line Credo.Check.Refactor.Apply
apply(BeamAgent.Checkpoint, :save, [pid, label])
else
{:error, :not_supported}
def checkpoint_save(pid, label, file_paths) do
with {:ok, info} <- BeamAgent.session_info(pid) do
uuid = "#{label}-#{:erlang.unique_integer([:positive, :monotonic])}"

case BeamAgent.Checkpoint.snapshot(info.session_id, uuid, file_paths) do
{:ok, _cp} -> {:ok, uuid}
{:error, _} = error -> error
end
end
end

@impl true
def checkpoint_rewind(pid, checkpoint_id) do
if function_exported?(BeamAgent.Checkpoint, :rewind, 2) do
# credo:disable-for-next-line Credo.Check.Refactor.Apply
apply(BeamAgent.Checkpoint, :rewind, [pid, checkpoint_id])
else
{:error, :not_supported}
with {:ok, info} <- BeamAgent.session_info(pid) do
BeamAgent.Checkpoint.rewind(info.session_id, checkpoint_id)
end
end
end
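The `list_models/1` implementation above narrows the caller's option map to the keys `Provider.fetch_models/2` receives by piping it through `Map.to_list/1` and `Keyword.take/2`. The sketch below isolates that step so its effect is easy to see. `OptsSketch` and the example option values are hypothetical; only the pipeline and the key list mirror the diff.

```elixir
defmodule OptsSketch do
  @moduledoc false
  # Same key list as the diff; any other keys are dropped before the
  # options reach the provider layer.
  @allowed [:workspace_id, :secret_name, :api_key, :base_url]

  def provider_opts(opts) when is_map(opts) do
    opts
    |> Map.to_list()
    |> Keyword.take(@allowed)
  end
end

# Illustrative option map; values are made up for the example.
opts = %{
  backend: "claude",
  workspace_id: "ws-1",
  secret_name: "anthropic_api_key",
  probe_deadline_ms: 5_000
}

IO.inspect(OptsSketch.provider_opts(opts))
```

Here `:backend` and `:probe_deadline_ms` are dropped, so only the two allowed keys present in the map survive.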