Merged
22 changes: 21 additions & 1 deletion README.md
@@ -725,7 +725,7 @@ The canonical list lives in [`models.json`](./models.json) — the single source
|----------|--------|-------------|
| `/v1/models` | GET | List available models |
| `/v1/chat/completions` | POST | Chat completion (streaming + non-streaming) |
| `/health` | GET | Comprehensive health check |
| `/health` | GET | Comprehensive health check (includes a `tui` block for TUI-mode drift/concurrency monitoring) |
| `/usage` | GET | Plan usage limits + per-model stats |
| `/status` | GET | Combined overview (usage + health) |
| `/settings` | GET/PATCH | View or update settings at runtime |
@@ -898,6 +898,7 @@ Future `ocp update` invocations sync automatically.
| `OCP_TUI_CWD` | `$HOME/.ocp-tui/work` | (TUI-mode) Scratch working directory where interactive claude sessions run. Transcripts land under `<HOME>/.claude/projects/<encoded-cwd>/`. Created automatically. |
| `OCP_TUI_HOME` | `$HOME` (real home) | (TUI-mode) `HOME` claude runs under. Default is the operator's real home (shared credentials, existing onboarding). Set to a separate path for scratch-home isolation — see ADR 0007 for the credential-fork caveat. |
| `OCP_TUI_ENTRYPOINT` | `cli` | (TUI-mode) Billing-classifier labeling: `cli` (default) pins `cc_entrypoint=cli` deterministically; `auto` lets claude self-classify via TTY detection; `off` leaves the inherited env untouched. Honest only when the spawn is a genuine interactive PTY — see ADR 0007. |
| `OCP_TUI_MAX_CONCURRENT` | `2` | (TUI-mode) Max concurrent interactive TUI turns. **Independent** of `CLAUDE_MAX_CONCURRENT` (which bounds the `-p`/stream-json path; TUI never uses it). A TUI turn is heavy (per-request cold-boot of tmux+claude + up to `CLAUDE_TUI_WALLCLOCK_MS` wallclock), so the default is low to keep small hosts (e.g. a Pi 4) alive under a burst. Excess turns **queue** (bounded); a full queue yields a 503. See ADR 0007 PR-B amendment. |
| `OCP_SKIP_AUTH_TEST` | *(unset)* | When `=1`, skip the `claude -p` auth probe during `setup.mjs`. After 2026-06-15 this probe draws from the Agent SDK credit pool; set this to avoid burning a metered credit on re-installs or `ocp update` runs. Auth is validated at the first real request. |
| `OCP_TUI_FULL_TOOLS` | *(unset)* | (TUI-mode, **single-user only**) When `=1`, grant the interactive session the **same tool surface as the `-p` path** — `--allowedTools` (+ optional `--mcp-config` / `--dangerously-skip-permissions`, read from `CLAUDE_ALLOWED_TOOLS` / `CLAUDE_MCP_CONFIG` / `CLAUDE_SKIP_PERMISSIONS`) — instead of the default MCP-walled, built-in-tools-only set. Lets a trusted single-operator TUI deployment run a **tool-using / MCP agent** (e.g. an OpenClaw assistant) on the subscription pool. Safe because TUI **refuses to boot under `AUTH_MODE=multi`** (hard exit) — no guest key can ever reach the TUI path, so this gate cannot expose tools to an untrusted caller. (Under `AUTH_MODE=shared` + `OCP_TUI_ALLOW_LAN=1`, anyone holding the single shared key reaches it — that is the existing TUI trust model, unchanged.) See [Subscription-pool (TUI) mode](#subscription-pool-tui-mode) and ADR 0007. |

@@ -965,6 +966,25 @@ Then restart OCP. At boot you will see:
- **Cache and singleflight work normally.** TUI-mode writes the buffered response to the cache on success; cache-hits skip the interactive turn entirely.
- **The host's `CLAUDE.md` / auto-memory is never injected.** OCP is a proxy — the proxied client (OpenClaw / your IDE) owns its own context and memory. TUI-mode always runs `claude` with `CLAUDE_CODE_DISABLE_CLAUDE_MDS` + `CLAUDE_CODE_DISABLE_AUTO_MEMORY`, so a `CLAUDE.md` on the OCP host can never leak into proxied turns (verified live; see #4). Built-in tool schemas + the interactive system prompt remain (the inherent ~20–35K context floor of interactive mode); MCP is hard-disabled.
- **Default path unchanged.** Unset `CLAUDE_TUI_MODE` and restart → `callClaude` / `callClaudeStreaming` are used again, byte-for-byte identical to today.
- **Concurrency is bounded separately.** TUI turns are heavy (per-request cold-boot + long wallclock), so the TUI path has its own limiter — `OCP_TUI_MAX_CONCURRENT` (default `2`), independent of `CLAUDE_MAX_CONCURRENT`. Excess turns queue; a full queue returns a 503. Tune it up only on a host that can run more interactive `claude` sessions at once.

### Monitoring drift via `/health`

`GET /health` includes a `tui` block so you can poll for silent billing-pool drift. This is the top risk after the 2026-06-15 billing flip: a lost TTY that flips `cc_entrypoint` from `cli` to the metered `sdk-cli` pool would still return answers while burning metered credits. The block is **always present** (with `enabled:false` when TUI-mode is off):

```jsonc
"tui": {
  "enabled": true,            // CLAUDE_TUI_MODE === "true"
  "entrypointMode": "cli",    // OCP_TUI_ENTRYPOINT (cli | auto | off)
  "lastEntrypoint": "cli",    // last cc_entrypoint observed in a transcript, or null
  "entrypointMismatches": 0,  // count of cli-expected-but-got-other turns; ALERT if this climbs
  "inflight": 1,              // TUI turns running right now
  "queued": 0,                // TUI turns waiting for a concurrency slot
  "maxConcurrent": 2          // OCP_TUI_MAX_CONCURRENT
}
```

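Alert on `entrypointMismatches > 0` (or on a non-null `lastEntrypoint` other than `"cli"`): either one means a turn drew from the metered Agent SDK pool instead of the subscription. `lastEntrypoint` is `null` until a transcript has been observed, so a bare `lastEntrypoint !== "cli"` check would fire on a freshly booted server. `inflight` / `queued` show how close the TUI path is to its concurrency cap.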
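
A monitoring script could evaluate the block roughly as follows. This is a minimal sketch, not part of OCP: `tuiAlerts` is a hypothetical helper name, and the queue-saturation warning is an assumed convenience on top of the two drift checks described above.

```javascript
// Hypothetical helper (not part of OCP): turn a parsed /health body into alert strings.
function tuiAlerts(health) {
  const tui = health.tui;
  const alerts = [];
  if (!tui || !tui.enabled) return alerts; // TUI-mode off: nothing to watch

  if (tui.entrypointMismatches > 0) {
    alerts.push(`drift: ${tui.entrypointMismatches} turn(s) did not report cc_entrypoint=cli`);
  }
  // lastEntrypoint is null until a transcript has been observed, so only flag non-null values.
  if (tui.lastEntrypoint !== null && tui.lastEntrypoint !== "cli") {
    alerts.push(`drift: last turn reported cc_entrypoint=${tui.lastEntrypoint}`);
  }
  // Assumed convenience check: turns are waiting, so the TUI path is at its concurrency cap.
  if (tui.queued > 0 && tui.inflight >= tui.maxConcurrent) {
    alerts.push(`saturated: ${tui.queued} turn(s) queued behind ${tui.maxConcurrent} slot(s)`);
  }
  return alerts;
}
```

Feed it the parsed JSON body of `GET /health` on whatever polling interval your monitoring already uses; an empty array means nothing to report.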

### Kill-switch

95 changes: 95 additions & 0 deletions docs/adr/0007-tui-interactive-mode.md
@@ -139,6 +139,101 @@ B-path is **deferred** and is not implemented in this ADR. Until B-path lands, T

---

## Observability and concurrency (PR-B amendment)

**Date:** 2026-06-10
**Status:** Accepted — amends ADR 0007.
**Motivation:** the post-PR-A code audit, findings C-4 (P1) and C-5 (P1).

### C-4 — independent concurrency bound for the TUI path

The global `MAX_CONCURRENT` gate lives in `spawnClaudeProcess()` (the `-p` / stream-json
path). `callClaudeTui()` never calls `spawnClaudeProcess` — it calls `runTuiTurn()`, which
cold-boots a full interactive `claude` inside a fresh tmux session. So the TUI path had **no**
concurrency bound: N concurrent TUI requests spawned N simultaneous cold-boot tmux+claude
processes. On a small host (e.g. a Pi 4 serving a family) a burst of ~5 is an OOM risk and
also multiplies subscription rate-limit pressure.

PR-B adds an **independent** limiter for the TUI path (`lib/tui/semaphore.mjs`,
`TuiSemaphore`):

- **`OCP_TUI_MAX_CONCURRENT`, default `2`.** Rationale: a TUI turn is heavy — a per-request
cold-boot of tmux+claude plus up to `CLAUDE_TUI_WALLCLOCK_MS` (120 s) of wallclock — so a
small host cannot run many at once. `2` is the conservative default that keeps a Pi-class
host alive under a family burst while still allowing some overlap. It is deliberately **not**
the same knob as `MAX_CONCURRENT` (default 8): the two pools have different shapes (a
stream-json spawn is cheap and fast; a TUI turn is a heavy cold-boot + long wallclock), so
coupling them would mis-size one of the two paths.
- **Queue, don't reject.** The limiter **queues** (awaits a slot), mirroring the spirit of
`MAX_CONCURRENT` — requests are not dropped on contention. To bound memory against a runaway
client, the wait queue itself is capped (`maxQueue`, default 32× the limit); when the queue
is full `run()` rejects with `tui_queue_full`, surfaced as a 503 — deterministic backpressure
rather than silent OOM.
- **Slot released in a `finally`.** `TuiSemaphore.run(fn)` releases the slot in a `finally`, so
any throw — PR-A's honesty gates (`tui_wallclock_truncated`, `tui_upstream_error`), a
`tui_paste_not_landed`, or a `tui_spawn_failed` — can never leak a slot.

This limiter has **zero effect when `TUI_MODE` is off**: `callClaudeTui` is never reached, so
the semaphore is never entered. The default stream-json path is untouched.
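
The sizing rules above reduce to a little arithmetic. The sketch below mirrors the constructor defaults of `TuiSemaphore`; `tuiCapacity` is a hypothetical name used only for illustration and does not exist in the codebase.

```javascript
// Hypothetical helper: how many turns can run or wait before run() rejects with tui_queue_full.
// The two expressions are the same ones the TuiSemaphore constructor uses.
function tuiCapacity(rawLimit, maxQueue) {
  const limit = Math.max(1, parseInt(rawLimit, 10) || 1);
  const queue = Number.isFinite(maxQueue) ? maxQueue : limit * 32;
  return { limit, queue, acceptedBeforeReject: limit + queue };
}

console.log(tuiCapacity("2"));    // { limit: 2, queue: 64, acceptedBeforeReject: 66 }
// Unparsable input falls back to a limit of 1 inside the semaphore itself.
console.log(tuiCapacity("abc"));  // { limit: 1, queue: 32, acceptedBeforeReject: 33 }
```

With the default limit of `2`, two turns run, up to 64 more wait, and the next concurrent request is the first to receive a 503.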

### C-5 — operator-visible drift surface on `/health` (additive)

The `tui_entrypoint_mismatch` warning only reached journald. After the 2026-06-15 flip, a
silent `sdk-cli` drift (the documented top risk in this ADR — a lost TTY flipping the
self-classification to the metered Agent SDK pool) would drain metered credits **invisibly**.
PR-B adds a `tui` block to the `/health` JSON response so an operator can poll it:

```
tui: {
  enabled: <TUI_MODE>,
  entrypointMode: <OCP_TUI_ENTRYPOINT>,   // cli | auto | off
  lastEntrypoint: <last observed cc_entrypoint, e.g. "cli", or null>,
  entrypointMismatches: <count of cli-expected-but-got-other turns>,
  inflight: <current concurrent TUI turns>,
  queued: <turns waiting for a slot>,
  maxConcurrent: <OCP_TUI_MAX_CONCURRENT>
}
```

`lastEntrypoint` is recorded and `entrypointMismatches` incremented inside `callClaudeTui` in
the same mismatch branch that already emits the journald warning (via `recordTuiEntrypoint`).
`inflight` / `queued` / `maxConcurrent` come from the C-4 semaphore. When `TUI_MODE` is off the
block still appears with `enabled:false` (cheap, harmless) so the response shape is stable for
consumers regardless of mode.
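
To illustrate the counting rule, the snippet below restates `recordTuiEntrypoint` from `lib/tui/semaphore.mjs` so it runs on its own, then feeds it a made-up sequence of observations. The sequence is an assumption for illustration, not captured data, and the `console.warn` line only stands in for the real journald warning.

```javascript
// Same logic as recordTuiEntrypoint in lib/tui/semaphore.mjs, repeated so the snippet runs alone.
function recordTuiEntrypoint(tuiStats, observed, expectedMode = "cli") {
  tuiStats.lastEntrypoint = observed ?? null;
  const mismatch = expectedMode === "cli" && observed !== "cli";
  if (mismatch) tuiStats.entrypointMismatches++;
  return mismatch;
}

const tuiStats = { lastEntrypoint: null, entrypointMismatches: 0 };

// Made-up observations: two healthy turns, then one that drifted to the metered pool.
for (const observed of ["cli", "cli", "sdk-cli"]) {
  if (recordTuiEntrypoint(tuiStats, observed, "cli")) {
    console.warn(`tui_entrypoint_mismatch: observed ${observed}`); // stand-in for the journald log
  }
}

console.log(tuiStats); // { lastEntrypoint: 'sdk-cli', entrypointMismatches: 1 }
```

Note that the helper only counts a mismatch when the expected mode is `cli`. If the caller passes `auto` or `off` as the expected mode, the counter stays flat and `lastEntrypoint` is the only drift signal.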

### ALIGNMENT authorization for the `/health` change

`/health` is a **grandfathered B.2 endpoint** under ADR 0006, frozen at its v3.16.4 behaviour.
`ALIGNMENT.md`'s grandfather provision states: *"Any change to the contract (request shape,
response shape, semantics) of a grandfathered B.2 endpoint is treated as a new authorization
request and requires either a behaviour-preserving refactor PR or its own ADR."*

This amendment **is** that authorization. The argument:

- The change is **additive**: it adds one new top-level field (`tui`) containing only new
sub-fields. **No existing `/health` field is changed, renamed, removed, or re-typed**, and no
existing semantics change. Existing `/health` consumers (the dashboard, `ocp-connect`,
monitoring) read the fields they already read and are unaffected — the change is
**behaviour-preserving** for them, which is exactly the bar the grandfather provision sets for
a non-ADR contract change.
- The TUI observability surface is an **intrinsic part of the TUI feature** whose authorizing
authority is **this ADR (0007)**, not a brand-new B.2 endpoint. We are not adding a new B.2
endpoint or a new method (which would each require their own fresh ADR under the New Class B
endpoint procedure) — we are extending the response of an existing grandfathered endpoint with
fields that report state owned by an ADR-0007 feature. ADR 0007 is the natural home for that
authority, and this amendment records it explicitly.
- `cli.js` does not perform this operation — `/health` is OCP-owned (Class B), so no `cli.js`
citation applies; the citation is this ADR + ADR 0006 (grandfathered B.2) per
`ALIGNMENT.md`'s Class B citation requirement.

### `OCP_TUI_MAX_CONCURRENT` summary

| Env var | Default | Meaning |
|---|---|---|
| `OCP_TUI_MAX_CONCURRENT` | `2` | Max concurrent interactive TUI turns. Independent of `CLAUDE_MAX_CONCURRENT` (the stream-json path). Excess turns queue (bounded); a full queue yields a 503. |

---

## Consequences

### Positive
102 changes: 102 additions & 0 deletions lib/tui/semaphore.mjs
@@ -0,0 +1,102 @@
// TUI-path concurrency limiter (audit finding C-4).
//
// WHY THIS EXISTS, SEPARATE FROM server.mjs's MAX_CONCURRENT:
// The global MAX_CONCURRENT gate lives in spawnClaudeProcess() (the -p / stream-json
// path). callClaudeTui() NEVER calls spawnClaudeProcess — it calls runTuiTurn(), which
// boots a full interactive `claude` inside a fresh tmux session. So nothing bounded the
// TUI path: N concurrent TUI requests spawned N simultaneous cold-boot tmux+claude
// processes. On a small host (a Pi 4 serving a family) a burst of ~5 is an OOM risk, and
// it also multiplies subscription rate-limit pressure. This is an INDEPENDENT limiter for
// the TUI path that mirrors MAX_CONCURRENT's intent without coupling to it (the two pools
// are different shapes: a stream-json spawn is cheap and fast; a TUI turn is a heavy
// cold-boot + up to 120s wallclock).
//
// QUEUE vs REJECT: we QUEUE (await a slot), mirroring the spirit of MAX_CONCURRENT's
// intent not to drop requests, rather than rejecting immediately. To avoid unbounded
// memory growth from a runaway client, the wait queue itself is bounded by maxQueue
// (default: a generous multiple of the concurrency limit). When the queue is full, run()
// rejects with a tui_queue_full error (the caller surfaces it as a 503) — a deterministic
// backpressure signal rather than silent OOM.
//
// Pure + importable so test-features.mjs can assert the bound directly (no server boot).

export class TuiSemaphore {
  // limit: max concurrent slots. maxQueue: max waiters before run() rejects with backpressure.
  constructor(limit, { maxQueue } = {}) {
    this.limit = Math.max(1, parseInt(limit, 10) || 1);
    // Default queue cap: 32× the limit. Large enough that real family-burst traffic never
    // hits it, small enough that a pathological flood can't grow the queue without bound.
    this.maxQueue = Number.isFinite(maxQueue) ? maxQueue : this.limit * 32;
    this._inflight = 0;
    this._waiters = []; // FIFO queue of resolve callbacks waiting for a slot
  }

  get inflight() { return this._inflight; }
  get queued() { return this._waiters.length; }

  // Acquire a slot. Resolves once a slot is free (immediately if under the limit, otherwise
  // when an in-flight task releases). Returns an already-rejected promise if the wait queue
  // is full, so the caller sees the failure on its first await.
  acquire() {
    if (this._inflight < this.limit) {
      this._inflight++;
      return Promise.resolve();
    }
    if (this._waiters.length >= this.maxQueue) {
      return Promise.reject(new Error(
        `tui_queue_full: TUI concurrency limit (${this.limit}) reached and wait queue ` +
        `(${this.maxQueue}) is full`));
    }
    return new Promise((resolve) => { this._waiters.push(resolve); });
  }

  // Release a slot. If a waiter is queued, hand the slot directly to it (inflight stays
  // constant across the handoff); otherwise decrement.
  release() {
    const next = this._waiters.shift();
    if (next) {
      next(); // the woken waiter already "owns" the slot, so inflight is unchanged
    } else if (this._inflight > 0) {
      this._inflight--;
    }
  }

  // Run fn() under one slot. Releases in a finally so a throw (PR-A's honesty gates,
  // wallclock truncation, paste-not-landed, tmux spawn failure) NEVER leaks a slot.
  async run(fn) {
    await this.acquire();
    try {
      return await fn();
    } finally {
      this.release();
    }
  }
}

// ── TUI drift observability (audit C-5) — pure helpers, importable for testing ──

// Record an observed cc_entrypoint into the (mutable) tuiStats counter. Sets lastEntrypoint
// unconditionally and increments entrypointMismatches when the spawn was supposed to be
// subscription-pool ("cli") but the transcript reported something else (a silent drift to
// the metered Agent SDK pool, the audit's top risk after the 2026-06-15 billing flip).
// A missing observation (null/undefined) also counts as a mismatch when "cli" was expected.
// Returns true iff this observation was a mismatch (so the caller can also emit a log).
export function recordTuiEntrypoint(tuiStats, observed, expectedMode = "cli") {
  tuiStats.lastEntrypoint = observed ?? null;
  const mismatch = expectedMode === "cli" && observed !== "cli";
  if (mismatch) tuiStats.entrypointMismatches++;
  return mismatch;
}

// Build the additive /health `tui` block (ADR 0007 PR-B amendment). Pure: given the
// config + live counters, returns the exact object embedded in /health. New fields only —
// behaviour-preserving for existing /health consumers (grandfathered B.2 under ADR 0006).
export function buildTuiHealthBlock({ enabled, entrypointMode, maxConcurrent }, tuiStats, semaphore) {
  return {
    enabled,
    entrypointMode,                                      // cli | auto | off
    lastEntrypoint: tuiStats.lastEntrypoint,             // last observed cc_entrypoint, or null
    entrypointMismatches: tuiStats.entrypointMismatches,
    inflight: semaphore.inflight,                        // current concurrent TUI turns
    queued: semaphore.queued,                            // turns waiting for a slot
    maxConcurrent,
  };
}