Releases: planetarium/vicoop-bridge

@vicoop-bridge/client@0.35.4

@github-actions github-actions released this 29 Jun 07:59
7d5134d

Patch Changes

  • ab945bb: upgrade: resolve the latest release from the public GitHub Atom feed
    (github.com/<repo>/releases.atom) instead of the api.github.com REST
    API. The REST API caps unauthenticated requests at 60/hr per IP, so
    vicoop-client upgrade / upgrade --check would fail with a hard
    403 rate limit exceeded on shared provider egress IPs — exactly when
    an operator is rolling out a release (#405). The web feed has its own,
    far more generous anonymous limit and needs no GITHUB_TOKEN.
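
As a rough illustration of the feed-based lookup described above, the sketch below pulls the newest tag out of a releases.atom document. It is not the client's actual code: `latestTagFromAtom` and `sampleFeed` are hypothetical names, the sample uses a placeholder host, and the sketch assumes entries are listed newest-first with a `<link href=".../releases/tag/<tag>"/>` element in each entry.

```typescript
// Hypothetical helper: extract the newest release tag from a releases.atom document.
// Assumption: entries are ordered newest-first and each entry carries a
// <link ... href=".../releases/tag/<tag>"/> element.
function latestTagFromAtom(xml: string): string | null {
  const entry = /<entry\b[\s\S]*?<\/entry>/.exec(xml);
  if (!entry) return null;
  const href = /<link\b[^>]*\bhref="([^"]+)"/.exec(entry[0]);
  if (!href) return null;
  const marker = "/releases/tag/";
  const at = href[1].indexOf(marker);
  if (at < 0) return null;
  // Tags such as "@scope/pkg@1.2.3" are percent-encoded inside the URL.
  return decodeURIComponent(href[1].slice(at + marker.length));
}

// Made-up sample feed with a placeholder host, for illustration only.
const sampleFeed = `<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Release notes</title>
  <entry>
    <link rel="alternate" type="text/html" href="https://example.invalid/o/r/releases/tag/%40vicoop-bridge%2Fclient%400.35.4"/>
    <title>@vicoop-bridge/client@0.35.4</title>
  </entry>
  <entry>
    <link rel="alternate" type="text/html" href="https://example.invalid/o/r/releases/tag/%40vicoop-bridge%2Fclient%400.35.3"/>
  </entry>
</feed>`;

console.log(latestTagFromAtom(sampleFeed));
```

A regex scan keeps the sketch dependency-free; a real implementation may well prefer a proper XML parser.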

@vicoop-bridge/client@0.35.3

@github-actions github-actions released this 29 Jun 05:58
0669aa8

Patch Changes

  • daebd59: fix(claude): classify claude's terminal reason into a structured failure code

    When claude exits non-zero with zero usage it first emits a terminal result
    event whose result field carries the human-readable cause — "You've hit your
    session limit · resets 3pm (UTC)", "... · Rate limited", "529 Overloaded",
    "Prompt is too long". The bridge captured this in finalText but dropped it,
    so the only thing reaching the router was an opaque "claude exited with
    code 1 [stdout: <raw JSON tail>]", forcing the router to scrape keywords
    out of a truncated stdout dump.

    Now that reason is used as the failure message verbatim (when present) and
    run through normalizeTaskFailError, so it maps onto a structured terminal
    code — quota_exceeded / rate_limited / upstream_error / login_required
    / … — which the router consumes directly via reasonForTerminalCode (it
    prefers the code over message-pattern matching). The cause travels as
    structured data in the terminal_error.code channel, not a string baked into
    the diagnostic. When claude emits no such reason (a real crash) the message
    falls back to the exit/stdout diagnostic dump so triage data is preserved
    (#119).

    Also teaches normalizeTaskFailError the claude subscription "session limit"
    cap → quota_exceeded (server-side "529 Overloaded" already classifies as
    upstream_error via the existing numeric-status match).
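
The mapping from a human-readable reason to a structured code could look roughly like the sketch below. The real normalizeTaskFailError rules are not shown in these notes, so `classifyTerminalReason`, `buildFailure`, and the match patterns are assumptions made for illustration; only the code names and the session-limit and 529 examples come from the text above.

```typescript
type TerminalCode = "quota_exceeded" | "rate_limited" | "upstream_error" | "login_required";

// Hypothetical rules, not the bridge's actual patterns.
function classifyTerminalReason(reason: string): TerminalCode | undefined {
  const text = reason.toLowerCase();
  if (text.includes("session limit")) return "quota_exceeded";
  if (text.includes("rate limited")) return "rate_limited";
  // A leading 5xx status (e.g. "529 Overloaded") is treated as an upstream fault.
  if (/^\s*5\d\d\b/.test(text)) return "upstream_error";
  return undefined; // unclassified reasons keep their message but get no code
}

// Use the reason verbatim when present; otherwise fall back to the diagnostic dump.
function buildFailure(reason: string | undefined, diagnostic: string) {
  const hasReason = reason !== undefined && reason.trim() !== "";
  return {
    message: hasReason ? (reason as string) : diagnostic,
    code: hasReason ? classifyTerminalReason(reason as string) : undefined,
  };
}

console.log(buildFailure("You've hit your session limit · resets 3pm (UTC)", "claude exited with code 1"));
```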

@vicoop-bridge/client@0.35.2

@github-actions github-actions released this 26 Jun 04:57
16a7e55

Patch Changes

  • b763c48: fix(vicoop-codex): surface in-band SSE error frames as task failures

    vicoop-codex serve relays an upstream /responses error (e.g. "input
    exceeds the context window") as a {"error":{...}} frame on an otherwise-200
    SSE stream. The stream consumer only understood choices-bearing chunks, so
    it silently dropped the error frame, accumulated nothing, and synthesized an
    empty finish_reason:"stop" completion with no usage — the silent
    "Response Generated" with an empty body (and a $0-billed turn).

    Detect an in-band error frame and fail the task with upstream_error
    carrying the upstream message, instead of completing empty.
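
A minimal sketch of the detection step, assuming each SSE `data:` payload is either a JSON chunk or the `[DONE]` sentinel. `interpretSseData` and the `FrameResult` type are hypothetical names, and the chunk fields read here (`choices[0].delta.content`, `error.message`) follow the common OpenAI-style layout rather than anything confirmed for vicoop-codex.

```typescript
type FrameResult =
  | { kind: "delta"; text: string }
  | { kind: "error"; code: "upstream_error"; message: string }
  | { kind: "ignored" };

// Hypothetical interpreter for one SSE data payload.
function interpretSseData(data: string): FrameResult {
  if (data === "[DONE]") return { kind: "ignored" };
  let parsed: unknown;
  try {
    parsed = JSON.parse(data);
  } catch {
    return { kind: "ignored" };
  }
  if (typeof parsed !== "object" || parsed === null) return { kind: "ignored" };
  const frame = parsed as {
    error?: { message?: unknown } | null;
    choices?: Array<{ delta?: { content?: unknown } }>;
  };
  // Check for an in-band error before looking at choices, so it is never dropped.
  if (frame.error !== undefined && frame.error !== null) {
    const message = typeof frame.error.message === "string" ? frame.error.message : "upstream error";
    return { kind: "error", code: "upstream_error", message };
  }
  const content = frame.choices?.[0]?.delta?.content;
  return typeof content === "string" ? { kind: "delta", text: content } : { kind: "ignored" };
}

console.log(interpretSseData('{"error":{"message":"input exceeds the context window"}}'));
```

The point of the ordering is that a frame without `choices` is no longer silently skipped: it either carries an error or is explicitly ignored.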

@vicoop-bridge/client@0.35.1

@github-actions github-actions released this 24 Jun 13:04
fdfb126

Patch Changes

  • 5b47bf5: fix(client): fixed-segment splitting for openai-compat history cache

    The default-on history cache (#372) emitted the frozen prefix as a single
    growing block with one cache_control breakpoint. When the conversation
    advances a turn the block's bytes change, and because Anthropic's cache matches
    at block boundaries (read-lookback walks back up to 20 blocks), there is no
    boundary at the previous turn's freeze point. The lookback therefore only
    re-matches at the stable system+tools boundary, and the entire history is
    re-created every forward turn.

    A controlled fresh-data A/B (no pre-warm) confirmed this and corrected the #372
    "caching already works" reading, which had measured pre-warmed cache from
    repeated deterministic runs:

    rollover turn (200→210 entries)   non_cached (≈creation)   cache_read
    single growing block (before)                    220,929            0
    fixed segments (after)                            16,013      180,468

    formatChatHistoryBlocks now serializes the frozen prefix as one block per
    FREEZE_STEP_ENTRIES entries at absolute boundaries (so older segments never
    re-flow), with cache_control on only the last complete segment. On a
    rollover the new breakpoint's read-lookback finds the prior turn's entry one
    block back at the previous segment boundary and reads the whole frozen prefix,
    re-creating only the new segment + tail — i.e. creation becomes O(step) per turn
    instead of O(full history). Validated at production depth (20 segments): the
    lookback match is always one block back, well within the 20-block window.

    This is what the rolling-anchor approach was reaching for, but it uses one
    breakpoint (reads ride the lookback, not a second explicit anchor) → claude's
    system+tools+1 plus this one = 4, fitting Anthropic's budget. That is why the
    rolling-anchor patch tripped 400 maximum of 4 blocks and this does not. The
    concatenated text the model reads is byte-identical
    (serialize(a) + ",\n" + serialize(b) == serialize(a ++ b)); the existing latch
    still falls back to the unsplit block if a breakpoint is ever rejected.
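
The segmenting idea can be sketched as follows. This is not the real formatChatHistoryBlocks: `splitHistory`, `serialize`, the string entry type, and the step value of 10 are assumptions for illustration. The sketch puts the ",\n" separator at the start of each later block, so an already-emitted segment never changes when a new one appears after it.

```typescript
// Assumed step size; the real FREEZE_STEP_ENTRIES value is not stated in these notes.
const FREEZE_STEP_ENTRIES = 10;

type Block = { type: "text"; text: string; cache_control?: { type: "ephemeral" } };

// Hypothetical serializer: one JSON string per entry, joined with ",\n".
const serialize = (entries: string[]): string =>
  entries.map((e) => JSON.stringify(e)).join(",\n");

function splitHistory(entries: string[], step: number = FREEZE_STEP_ENTRIES): Block[] {
  // Only complete segments are frozen; boundaries sit at absolute multiples of step.
  const frozenCount = Math.floor(entries.length / step) * step;
  const blocks: Block[] = [];
  for (let start = 0; start < frozenCount; start += step) {
    const text = serialize(entries.slice(start, start + step));
    blocks.push({ type: "text", text: start === 0 ? text : ",\n" + text });
  }
  // A single breakpoint, on the last complete segment only.
  if (blocks.length > 0) {
    blocks[blocks.length - 1].cache_control = { type: "ephemeral" };
  }
  const tail = entries.slice(frozenCount);
  if (tail.length > 0) {
    const text = serialize(tail);
    blocks.push({ type: "text", text: frozenCount === 0 ? text : ",\n" + text });
  }
  return blocks;
}

const history25 = Array.from({ length: 25 }, (_, i) => `entry-${i}`);
console.log(splitHistory(history25).map((b) => [b.text.length, b.cache_control !== undefined]));
```

Concatenating the block texts reproduces `serialize(entries)` exactly, and growing the history from 25 to 35 entries leaves the first two segments byte-for-byte unchanged, which is the property the lookback relies on.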

@vicoop-bridge/client@0.35.0

@github-actions github-actions released this 22 Jun 04:17
5988023

Minor Changes

  • 29b701d: The bridge usage API now reports a canonical, backend-agnostic shape (BridgeUsage) for both backends, so a single consumer can read remaining quota regardless of backend:

    • claude (new): reports the operator's Claude subscription quota. Reads the Claude Code OAuth token from the host (macOS Keychain, or ~/.claude/.credentials.json on Linux/Windows), calls the authenticated api/oauth/usage endpoint (5-hour / weekly / Sonnet windows + monetary extra-usage), cached ~5 min to respect the endpoint's self-rate-limit. When the read fails it serves the last successful snapshot (stale, annotated) or an explicit source: 'none'.
    • vicoop-codex: its serve /usage payload is now normalised into the same shape (was forwarded verbatim).

    Canonical shape: { backend, source, fetchedAt, accounts: [{ id, label?, plan?, windows: [{ id, label, usedPercent, resetsAt, severity }], spend? }], note?, raw }. Conventions are fixed — usedPercent is 0–100 percent used (remaining = 100 − usedPercent), resetsAt is ISO 8601 — and the verbatim upstream payload is preserved under raw.

    The claude OAuth read path also gained, to match the reference monitor's robustness: $CLAUDE_CONFIG_DIR support for the credentials file; an official-client User-Agent: claude-code/<version> (discovered from the CLI) + Content-Type; Retry-After-aware backoff on 429; serving the last successful snapshot (stale, annotated) on a transient failure; best-effort CLI-delegated token refresh on auth expiry/401; and a retry-storm guard that won't re-send a known-dead token until it rotates.

    The stream's rate_limit_event is captured only to enrich spend.resetsAt (the monthly overage reset the oauth extra_usage block omits); it is deliberately NOT used as a usage fallback, because it reports only the single most-constrained window (e.g. a near-cap overage meter) and would misrepresent the subscription quota.
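
For consumers, the canonical shape might be typed and read roughly as below. The field names come from the shape listed above, but the field types, the `remainingByWindow` helper, and the sample values are assumptions; `spend` and `raw` are left as `unknown` because their structure is not described here.

```typescript
// Field names follow the canonical shape above; the types are assumed.
interface UsageWindow {
  id: string;
  label: string;
  usedPercent: number; // 0 to 100, percent used
  resetsAt: string; // ISO 8601
  severity: string;
}
interface UsageAccount {
  id: string;
  label?: string;
  plan?: string;
  windows: UsageWindow[];
  spend?: unknown;
}
interface BridgeUsage {
  backend: string;
  source: string;
  fetchedAt: string;
  accounts: UsageAccount[];
  note?: string;
  raw: unknown;
}

// Hypothetical consumer helper: remaining = 100 - usedPercent for every window.
function remainingByWindow(usage: BridgeUsage) {
  return usage.accounts.flatMap((account) =>
    account.windows.map((w) => ({
      account: account.id,
      window: w.id,
      remainingPercent: 100 - w.usedPercent,
      resetsAt: new Date(w.resetsAt),
    })),
  );
}

// Made-up sample payload, for illustration only.
const sampleUsage: BridgeUsage = {
  backend: "claude",
  source: "example",
  fetchedAt: "2030-01-01T00:00:00.000Z",
  accounts: [
    {
      id: "acct-1",
      windows: [
        { id: "five_hour", label: "5-hour", usedPercent: 42, resetsAt: "2030-01-01T05:00:00.000Z", severity: "ok" },
      ],
    },
  ],
  raw: {},
};

console.log(remainingByWindow(sampleUsage));
```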

@vicoop-bridge/client@0.34.0

@github-actions github-actions released this 18 Jun 07:51
11157b1

Minor Changes

  • 314a0a9: feat(client): openai-compat/v1 reasoning channel (claude + vicoop-codex) +
    shared liveness heartbeat (all backends).

    Reasoning channel. The claude and vicoop-codex backends now forward the
    model's reasoning on a dedicated openai-compat/v1 reasoning channel — a
    separate artifact (claude-reasoning / vicoop-codex-reasoning) carrying
    metadata[openai-compat/v1] = { channel: "reasoning" }, kept on a distinct
    artifact id so reasoning never co-mingles with the answer. The claude side
    surfaces thinking_delta stream events and injects a MAX_THINKING_TOKENS
    budget on openai-compat spawns so Claude Code emits thinking on the wire
    (budget defaults to 8000, configurable via --claude-thinking-budget /
    backends.claude.thinking_budget); the vicoop-codex side surfaces
    delta.reasoning_content chunks (already enabled serve-side via
    summary:"auto", no thinking-enablement injection needed). This lets the
    a2x-internal-router treat a long silent reasoning turn as alive instead of
    false-failing-over it (planetarium/a2x-internal-router#95, #375, #376).

    Each reasoning channel is ON by default and individually disablable:
    --no-claude-reasoning / backends.claude.reasoning: false and
    --no-vicoop-codex-reasoning / backends['vicoop-codex'].reasoning: false.
    Disable the reasoning channel when the deployed oai2a2a codec predates
    0.6.0: an old codec doesn't understand the channel marker and would fold the
    reasoning artifact into the answer (the #95 rollout-order hazard). Claude
    redacted-thinking blocks are never forwarded.

    Liveness heartbeat. Every backend's shared task loop (claude, codex,
    openclaw, and now vicoop-codex) emits a tagged liveness heartbeat: the idle
    working task.status beat now fires every 10s of silence (was a per-backend
    30s beat) and carries metadata[openai-compat/v1] = { heartbeat: true }. The
    bridge server maps this onto the A2A TaskStatusUpdateEvent.metadata, where the
    oai2a2a codec (≥0.6.0) translates it to a : a2a-heartbeat SSE comment that
    re-arms the router's first-content / stall watchdog. This keeps a backend that
    is alive but byte-silent (long reasoning, tool runs) observably alive so it
    isn't false-failed-over, while a backend that errors (task.fail) ends the loop
    and stops heartbeating so failover still works
    (planetarium/a2x-internal-router#95). The 10s cadence sits at or below half the
    router's tightened 25–30s window; heartbeats carry no content and are safe to
    emit unconditionally.
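
To make the two metadata markers concrete, here is a small consumer-side sketch that tells heartbeat, reasoning, and ordinary content events apart, plus the idle check a task loop could use to decide when a beat is due. `kindOf` and `heartbeatDue` are hypothetical helpers; only the `openai-compat/v1` key, the `channel` and `heartbeat` fields, and the 10s interval come from the notes above.

```typescript
const OAI_COMPAT_KEY = "openai-compat/v1";
const HEARTBEAT_IDLE_MS = 10_000; // 10s of silence, per the notes above

type EventKind = "heartbeat" | "reasoning" | "content";

// Hypothetical classifier over an event's or artifact's metadata map.
function kindOf(metadata: Record<string, unknown> | undefined): EventKind {
  const compat = metadata?.[OAI_COMPAT_KEY];
  if (typeof compat === "object" && compat !== null) {
    const marker = compat as { channel?: unknown; heartbeat?: unknown };
    if (marker.heartbeat === true) return "heartbeat";
    if (marker.channel === "reasoning") return "reasoning";
  }
  // No marker: treat as ordinary answer content.
  return "content";
}

// Hypothetical idle check: a beat is due once nothing has been emitted for idleMs.
function heartbeatDue(lastEmitMs: number, nowMs: number, idleMs: number = HEARTBEAT_IDLE_MS): boolean {
  return nowMs - lastEmitMs >= idleMs;
}

console.log(kindOf({ [OAI_COMPAT_KEY]: { channel: "reasoning" } }), heartbeatDue(0, 10_000));
```

A consumer that does not recognise the marker falls through to "content", which mirrors the hazard described above for codecs older than 0.6.0.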

@vicoop-bridge/client@0.33.1

@github-actions github-actions released this 17 Jun 08:30
cb5e04b

Patch Changes

  • d153dd8: Cache openai-compat chat_history on the claude backend (on by default).
    Splits the replayed <chat_history> into a frozen prefix carrying a
    cache_control breakpoint plus a small tail, so stable conversation history
    reads from Anthropic's prompt cache instead of re-billing at full price every
    turn. The split is byte-identical to the previous single block, so the model
    reads the same history.

    It relies on claude's stream-json input forwarding caller cache_control
    (undocumented) and shares the API's 4-breakpoint budget with claude's own
    system/tools markers. If claude ever rejects the breakpoint (e.g. a future CLI
    build whose own markers exhaust the budget), a process-wide latch auto-disables
    the split — that task fails, every later task falls back to the unsplit block,
    and a daemon restart re-arms it. Hard-disable with
    VICOOP_DISABLE_OAI_HISTORY_CACHE=1.

  • e136d46: vicoop-codex backend: emit a per-task timing breadcrumb (debug-gated) that
    stamps serveReady / firstByte / firstDelta / total milestones, so operators
    can split model-wait from streaming time on a slow turn. Opt in with
    VICOOP_CLIENT_LOG_LEVEL=debug; no new output at the default info level.
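
The latch behaviour described in the history-cache item above (d153dd8) can be pictured with the sketch below. `HistoryCacheLatch` and `latchFromEnv` are hypothetical names, and how the client detects a rejected breakpoint is not covered; the sketch only shows the state machine: enabled by default, tripped once on rejection, re-armed only by constructing a fresh latch (a daemon restart), and never enabled when VICOOP_DISABLE_OAI_HISTORY_CACHE=1.

```typescript
// Hypothetical process-wide latch guarding the split history cache.
class HistoryCacheLatch {
  private tripped = false;
  private readonly hardDisabled: boolean;

  constructor(hardDisabled: boolean) {
    this.hardDisabled = hardDisabled;
  }

  // True while later tasks may still use the split (cached) form.
  get splitEnabled(): boolean {
    return !this.hardDisabled && !this.tripped;
  }

  // Called when a breakpoint is rejected; stays tripped for the process lifetime.
  trip(): void {
    this.tripped = true;
  }
}

// Takes the environment as a parameter so the sketch does not depend on a runtime global.
function latchFromEnv(env: Record<string, string | undefined>): HistoryCacheLatch {
  return new HistoryCacheLatch(env["VICOOP_DISABLE_OAI_HISTORY_CACHE"] === "1");
}

const latch = latchFromEnv({});
latch.trip();
console.log(latch.splitEnabled, latchFromEnv({}).splitEnabled);
```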

@vicoop-bridge/client@0.33.0

@github-actions github-actions released this 12 Jun 10:04
a8f8bab

Minor Changes

  • 9627f0e: Add opt-in crash telemetry. Off by default — the client only loads or
    initializes the Sentry SDK when config.json has "telemetry": "on". Opt in
    with vicoop-client agent register --enable-telemetry (persists the field) or
    by hand-editing config.json; disable by removing the field. When on, only
    crash reports are sent: exception class + stack trace with the operator's home
    path redacted. Tracing is disabled, breadcrumbs/console capture are suppressed,
    and sendDefaultPii is off — so prompts, code, agent output, tokens, and logs
    are never transmitted. The daemon prints a one-line disclosure at registration
    and at startup. DSN is configurable via VICOOP_CLIENT_SENTRY_DSN.
  • 8b91fe3: vicoop-codex backend: report per-account Codex usage to the bridge on request. The client answers the new usage.request frame by querying its local vicoop-codex serve /usage endpoint, which backs the server's admin/owner-only GET /admin-api/agents/:id/usage API.
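
For the telemetry item above (9627f0e), the opt-in gate and the home-path redaction might look something like this sketch. `telemetryEnabled` and `redactHome` are hypothetical helpers, and the "~" replacement token is an assumption; the notes only state that the field must be "on" and that the operator's home path is redacted.

```typescript
// Hypothetical gate: anything other than the exact string "on" leaves telemetry off.
function telemetryEnabled(config: { telemetry?: unknown }): boolean {
  return config.telemetry === "on";
}

// Hypothetical redaction: replace every occurrence of the home directory in a stack trace.
// The "~" placeholder is an assumption, not the client's documented output.
function redactHome(text: string, homeDir: string): string {
  if (homeDir === "") return text;
  return text.split(homeDir).join("~");
}

console.log(redactHome("at run (/home/alice/app/index.js:1:1)", "/home/alice"));
```

Using split and join avoids having to escape the path for a regular expression.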

Patch Changes

  • 7b92dbf: claude backend: report the actual response model in the OpenAI-compatible
    envelope. The envelope's top-level model (and its embedded usage.model)
    now resolve from model ids claude itself reports — the model named on the
    assistant turn, falling back to the system/init resolved model — instead
    of the result.modelUsage largest-output-share heuristic, which on short
    responses could be dominated by an internal sub-model (e.g.
    claude-haiku-4-5-* used for title generation) and mislabel the envelope even
    when the requested override model handled the request. The requested
    envelope.model is deliberately not used as a fallback, since it may be a
    routing slug or an A2A card url rather than a real model id. modelUsage is
    now used only to sum token counts (#348).
  • 00c9a6a: codex backend: recover real token usage on OpenAI-compatible tool-call
    turns by deferring turn/interrupt until codex's
    thread/tokenUsage/updated lands (#351). Previously the bridge
    interrupted the turn the moment the model invoked a caller tool, which
    raced ahead of codex app-server's token accounting — the accounting only
    runs after the bridge answers the item/tool/call request, and an
    interrupt in flight at that point drops the turn's usage everywhere (no
    notification, no turn/completed payload, info: null even in codex's
    own rollout record), so the router billed the request as
    total_tokens=0. The interrupt is now held until the usage notification
    for the turn arrives (measured at 15–40ms on codex 0.139, well ahead of
    the ~500ms a next model iteration needs to start) with a 1s backstop
    timer, configurable via toolCallUsageWaitMs. When codex still reports
    nothing, the {0,0,0} placeholder remains, and the bridge now logs a
    tokenUsage unavailable diagnostic so zero-usage records are
    explainable without --openai-compat-trace.
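
The model-resolution order from the claude backend item above (7b92dbf) reduces to a short fallback chain. The sketch below is illustrative only: `resolveResponseModel`, `sumTokens`, and the `inputTokens` / `outputTokens` field names are assumptions, and the requested envelope model is deliberately absent from the inputs.

```typescript
interface ModelSources {
  assistantModel?: string; // model named on the assistant turn
  initModel?: string; // model resolved at system/init
}

// Prefer the assistant turn's model, then the init model; never the requested one.
function resolveResponseModel(sources: ModelSources): string | undefined {
  return sources.assistantModel ?? sources.initModel;
}

// Assumed per-model usage entry; modelUsage is used only to sum token counts.
interface ModelUsageEntry {
  inputTokens: number;
  outputTokens: number;
}

function sumTokens(modelUsage: Record<string, ModelUsageEntry>) {
  let inputTokens = 0;
  let outputTokens = 0;
  for (const entry of Object.values(modelUsage)) {
    inputTokens += entry.inputTokens;
    outputTokens += entry.outputTokens;
  }
  return { inputTokens, outputTokens, totalTokens: inputTokens + outputTokens };
}

console.log(
  resolveResponseModel({ assistantModel: "model-a", initModel: "model-b" }),
  sumTokens({ "model-a": { inputTokens: 100, outputTokens: 20 }, "model-b": { inputTokens: 5, outputTokens: 30 } }),
);
```

In the sample, the internal sub-model has the larger output share, yet it affects only the token totals and not the reported model.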

@vicoop-bridge/client@0.32.0

@github-actions github-actions released this 10 Jun 06:47
f737b81

Minor Changes

  • 2dab8db: feat(client): multi-model support on the claude backend via --claude-supported-models

    Claude Code has no headless "list models" interface, so the claude backend
    used to advertise — and accept per-request openai-compat model overrides
    for — only a single model (the --claude-model pin or the startup-probed
    default). Operators can now declare additional models their install can
    serve with --claude-supported-models claude-sonnet-4-6,claude-haiku-4-5
    (comma-separated) or backends.claude.supported_models in config.json. Declared ids
    are advertised on the openai-compat params.models[] block after the
    default, and a matching per-request model rides to the spawned claude as
    --model <id>. The envelope.model gate now also matches on the normalized
    (tier-suffix-stripped) form, so a caller selecting e.g.
    claude-opus-4-8[1m] against an advertised claude-opus-4-8 passes through
    with the tier selection intact.
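
A sketch of the matching rule, assuming the tier suffix is a trailing bracketed tag such as `[1m]`. `parseSupportedModels`, `stripTierSuffix`, and `isAdvertised` are hypothetical names, and the real gate may normalize differently.

```typescript
// Split the comma-separated flag value, ignoring blanks and surrounding whitespace.
function parseSupportedModels(flagValue: string): string[] {
  return flagValue
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id !== "");
}

// Assumed normalization: drop one trailing "[...]" tier suffix, e.g. "[1m]".
function stripTierSuffix(id: string): string {
  return id.replace(/\[[^\]]*\]$/, "");
}

// Accept an exact match, or a match on the suffix-stripped form.
function isAdvertised(requested: string, advertised: string[]): boolean {
  return advertised.includes(requested) || advertised.includes(stripTierSuffix(requested));
}

const advertisedModels = ["claude-opus-4-8", ...parseSupportedModels("claude-sonnet-4-6,claude-haiku-4-5")];
console.log(isAdvertised("claude-opus-4-8[1m]", advertisedModels));
```

The requested id is only normalized for the comparison; the original string, suffix included, is what would be passed on as `--model <id>`.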

@vicoop-bridge/client@0.31.0

@github-actions github-actions released this 09 Jun 10:15
377440c

Minor Changes

  • d2d4be4: Add short aliases for the two most-typed daemon flags: -c for --config and -d for --detach (e.g. vicoop-client start -d -c ./config.json). The detached child is now kept in the foreground daemon path by the VICOOP_DETACHED env guard rather than by argv stripping, so the re-exec stays correct even for optique's bundled short flags (-dc value parses as -d -c value).