Summary
Surface a session's token usage directly in the chat toolbar, so you can eyeball context-window occupancy and cost without opening the hidden SessionDebugPanel.
Problem
Token usage is already captured per run (RunEntry.usage) for every backend (Claude, Codex, OpenCode, Cursor), but it's only visible inside the debug panel. While working a session you have no quick signal for:
- How full the model's context window is right now.
- How many input/output tokens the session has burned so far.
Proposal
Add a compact usage chip to ChatToolbar (next to the Send button):
- Chip body: current context-window size, i.e. the last turn's full prompt (input + cache_read + cache_creation), e.g. 180k ctx.
- Tooltip: last-turn context size plus session totals (input, output, cache read, cache creation).
- Hidden when the session has no usage yet (new/archived sessions) and on narrow viewports.
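As a rough illustration of the chip body and the hide-when-empty rule above, here is a minimal TypeScript sketch. The `TokenUsage` shape, its field names, and the helper names (`contextTokens`, `formatChipLabel`, `chipLabel`) are assumptions made for this example and may not match the actual Session schema or the SessionUsageChip implementation. The narrow-viewport rule is left to CSS and is not shown.

```typescript
// Hypothetical usage shape; field names are assumptions, not the real schema.
interface TokenUsage {
  input: number;         // uncached input tokens (Anthropic-style)
  output: number;
  cacheRead: number;
  cacheCreation: number;
}

// Context size = everything sent to the model on the last turn.
function contextTokens(u: TokenUsage): number {
  return u.input + u.cacheRead + u.cacheCreation;
}

// Compact label for the chip body, e.g. 180000 -> "180k ctx".
function formatChipLabel(tokens: number): string {
  if (tokens < 1000) return `${tokens} ctx`;
  const k = Math.round(tokens / 1000);
  if (k < 1000) return `${k}k ctx`;
  return `${(tokens / 1_000_000).toFixed(1)}M ctx`;
}

// Returns null when the chip should be hidden (no usage recorded yet).
function chipLabel(latest: TokenUsage | null): string | null {
  if (latest === null) return null;
  const ctx = contextTokens(latest);
  return ctx > 0 ? formatChipLabel(ctx) : null;
}
```

Output tokens are deliberately left out of the context figure here, since the proposal defines it as the last turn's prompt only.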
Cross-backend correctness
Usage shape differs per backend, so the implementation normalizes everything to one convention (Anthropic-style: input_tokens excludes cached tokens, cache counters are additive):
- Claude: result.usage is already Anthropic-style. Context must sum all input-side counters (bare input_tokens is just the uncached slice and badly understates real size).
- Codex: the thread/tokenUsage/updated payload is ThreadTokenUsage { last, total, … }; the per-turn breakdown lives under .last. inputTokens is OpenAI-style (includes cached), so it's normalized to input = inputTokens − cachedInputTokens.
- Cursor: already Anthropic-style field names; no change.
- OpenCode: step-finish.tokens separates cache.read/cache.write from input; Anthropic-style, no change.
This is backend-specific, not model-specific, so it works for all LLMs each backend exposes.
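The Codex normalization described above can be sketched as follows. This is illustrative TypeScript only: the real normalization may live on the Rust side, and `NormalizedUsage`, `CodexTurnUsage`, `normalizeCodex`, and `normalizeClaude` are hypothetical names. The `outputTokens` field, the zero `cacheCreation` for Codex, and the clamp at zero are assumptions not stated in this description.

```typescript
// Normalized, Anthropic-style usage: `input` excludes cached tokens,
// and the cache counters are additive on top of it.
interface NormalizedUsage {
  input: number;
  output: number;
  cacheRead: number;
  cacheCreation: number;
}

// Assumed subset of the Codex per-turn breakdown (ThreadTokenUsage.last).
interface CodexTurnUsage {
  inputTokens: number;       // OpenAI-style: includes cached tokens
  cachedInputTokens: number;
  outputTokens: number;      // assumed field name
}

function normalizeCodex(last: CodexTurnUsage): NormalizedUsage {
  return {
    // Subtract the cached slice so it is not counted twice.
    // The clamp is a defensive assumption against inconsistent payloads.
    input: Math.max(0, last.inputTokens - last.cachedInputTokens),
    output: last.outputTokens,
    cacheRead: last.cachedInputTokens,
    cacheCreation: 0, // assumption: no separate cache-write counter
  };
}

// Claude's usage already follows the target convention, so this is a rename.
interface ClaudeUsage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

function normalizeClaude(u: ClaudeUsage): NormalizedUsage {
  return {
    input: u.input_tokens,
    output: u.output_tokens,
    cacheRead: u.cache_read_input_tokens ?? 0,
    cacheCreation: u.cache_creation_input_tokens ?? 0,
  };
}
```

With this convention, input + cacheRead + cacheCreation yields the same context size for both backends, which is what keeps the chip comparable across them.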
Implementation notes
- Rust: Session gains total_usage (session billing total) and latest_usage (last run's breakdown), both derived in SessionMetadata::to_session() from runs[].usage. No new write path and no migration, since existing runs already carry usage.
- Frontend: new SessionUsageChip component wired through ChatToolbar ← ChatWindow.
- No new Tauri commands; only schema additions on the existing Session payload.
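For readers who want the shape of the derivation, the fold over runs[].usage might look roughly like this. The actual logic lives in Rust inside SessionMetadata::to_session(); this TypeScript version is only an illustration, and `Usage`, `Run`, and `deriveUsage` are hypothetical names. It also assumes that "latest" means the most recent run that actually carries usage, which this description does not specify.

```typescript
// Hypothetical normalized usage shape, used only for this sketch.
interface Usage {
  input: number;
  output: number;
  cacheRead: number;
  cacheCreation: number;
}

// Minimal stand-in for a run entry; real entries carry more fields.
interface Run {
  usage?: Usage | null;
}

function deriveUsage(runs: Run[]): { totalUsage: Usage | null; latestUsage: Usage | null } {
  let total: Usage | null = null;
  let latest: Usage | null = null;
  for (const run of runs) {
    const u = run.usage;
    if (!u) continue; // runs without usage contribute nothing
    total = total === null
      ? { ...u }
      : {
          input: total.input + u.input,
          output: total.output + u.output,
          cacheRead: total.cacheRead + u.cacheRead,
          cacheCreation: total.cacheCreation + u.cacheCreation,
        };
    latest = u; // assumption: last run with usage wins
  }
  return { totalUsage: total, latestUsage: latest };
}
```

Both values come back as null for a session with no recorded usage, which lines up with the chip being hidden in that case.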
Screenshots
Acceptance criteria