Skip to content

Releases: DFKHelper/token-goat

v1.5.2

09 Jun 02:29

Choose a tag to compare

Three fixes: Codex hook wire-format compatibility, and two Windows coarse-mtime correctness issues in the cache and session layers.

Codex hook responses now pass schema validation

Codex 0.137.0 validates every hook response against embedded JSON schemas with additionalProperties: false, so any unrecognised key causes "hook returned invalid … JSON output" for the entire response — including SessionStart, PreToolUse, and PostToolUse. The root cause was _tg_elapsed_ms (and sibling _tg_handler/_tg_error fields) added by the internal dispatch() function and then emitted verbatim. The denormalize_response Codex branch now strips all _tg_* keys before output. The same path also injects the required hookEventName const field into hookSpecificOutput — Codex requires it on every hookSpecificOutput shape and token-goat was not emitting it because Claude Code does not require it. A _codex_hook_event_name() helper resolves the correct value (e.g. "pre-read""PreToolUse") from the hook registry. The old camelCase→snake_case key conversion (_translate_hso_to_codex) is no longer applied — Codex 0.137.0+ uses camelCase throughout hookSpecificOutput.

Freshest cache entry survives its own store call's eviction

evict_cache_dir sorts eviction candidates oldest-first by float(st_mtime) with a stable sort. When the just-written (MRU) entry shares a coarse st_mtime with older siblings, the stable sort falls back to arbitrary iterdir order, which on NTFS can place the newest file first and evict it — so a store_output call could delete the very entry it had just written. evict_cache_dir now accepts a protect_ids set that is excluded from the victim list regardless of timestamp, and skill_cache.store_output passes the id it just wrote.

save() refreshes the process-local load cache

session.load() caches (object, mtime) per session and serves the cached object whenever cached_mtime == current_mtime. When a later save()'s post-write timestamp aliased the mtime a previous load() had cached, the proc-cache kept serving the stale pre-save object on the next in-process load() even though the on-disk JSON was correct. save() now overwrites an existing proc-cache entry with the object it just persisted on every successful write.

v1.5.1

08 Jun 17:36

Choose a tag to compare

Correctness fixes for cache size accounting (compressed .gz bodies), surgical reads (oversized-docstring cap, signature-boundary fix), path normalization (uppercase WSL drives), mixed-case skill-compact invalidation, and the Gemini hook bridge (preserve systemMessage, route additionalContext natively), plus two documentation corrections.

See the CHANGELOG for full details.

v1.5.0

08 Jun 04:33

Choose a tag to compare

Context-pressure awareness: one source of truth for how full the window is, and hints that get terser as it fills. Ships alongside three install fixes that restore hook forwarding under editable installs and silence a recurring doctor warning.

Centralized context-pressure model

get_context_pressure(session_id) in compact.py is now the single place that answers how close a session is to autocompaction. It returns a frozen ContextPressure — a fill_fraction paired with a tier of cool, warm, hot, or critical. The estimate sums the known context contributors (loaded skill bodies, the ~10,800-token skills catalog, and per-event costs for bash history, web history, and read files) and divides by the fixed 660,000-token autocompact budget rather than the model's raw window, so the fraction carries the same meaning no matter which model is driving the session. The old _estimate_context_fill helper and the inline calculation in the session hook both defer to it, retiring the copies of the 660 K constant that had spread across half a dozen call sites in favor of one shared CONTEXT_AUTOCOMPACT_TOKENS.

Named tier boundaries

The fraction-to-tier mapping lives in tier_for_fraction(), backed by three named constants: CONTEXT_TIER_WARM (0.50), CONTEXT_TIER_HOT (0.70), and CONTEXT_TIER_CRITICAL (0.85). The bands are cool below 0.50, warm up to 0.70, hot up to 0.85, and critical at or above it. With the magic numbers pulled out of the band checks, the boundaries are defined once and the tests pin them directly.

Pressure-aware surgical-read hints

The pre-read hook tightens its large-file threshold as the window fills. A file earns a surgical-read suggestion past 500 lines while the session is cool, 350 when warm, 200 when hot, and 50 when critical. It also folds a single per-tier note into the read's additional context: "Context warming" at warm, "Context pressure" at hot, "CONTEXT CRITICAL" at critical. The note is fingerprinted by tier, so it fires once per band rather than on every read. Cool sessions get no note.

Smaller manifests under pressure

compute_adaptive_budget now weighs context pressure when it sizes the compaction manifest. Once the window runs hot the budget is capped at 500 tokens, and at critical it drops to 300, so the manifest stops adding to the very problem it exists to summarize.

Install robustness

Hooks no longer silently disable themselves under an editable install. The tg-hook wrapper carries an if not exist "<sentinel>" gate that short-circuits to a bare {"continue":true} during the uv tool install --reinstall race, when the venv's token_goat module is briefly absent. The sentinel used to be a hardcoded site-packages/token_goat/__init__.py path, which never exists under an editable install (uv sync, the project .venv), so the gate stayed permanently true and every hook no-op'd — the whole tool went dark with no error. The wrapper now resolves the sentinel through importlib.util.find_spec("token_goat").origin, which points at src/token_goat/__init__.py for editable installs and site-packages/... for regular ones, and falls back to an ungated wrapper when no sentinel resolves. A live handler emits {"continue": true, "_tg_elapsed_ms": N}; the _tg_elapsed_ms field is the tell that forwarding actually ran.

Re-install purges orphaned tokenwise entries. After the tokenwisetoken-goat rename, a re-install left the old hook and permission lines stranded in settings.json and the Codex config.toml, so both harnesses kept invoking a binary that no longer existed. patch_settings_json and patch_codex_config now strip any pre-rename tokenwise command and permission entry before writing the current ones.

Hook wrapper is written as bytes to stop CRLF doubling. hook_wrapper_content() hand-bakes platform-correct line endings — \r\n on Windows — then was written through atomic_write_text, whose text-mode handle translated every \n to \r\n a second time, doubling each line ending to \r\r\n on disk. cmd.exe tolerated the stray carriage return so forwarding still worked, but doctor does a byte-exact compare of the on-disk wrapper against the regenerated content and warned differs from expected — run token-goat install to refresh on every run, a nag that reinstalling could never clear because it rewrote the same doubled bytes. The wrapper now goes through atomic_write_bytes, preserving the authored endings verbatim.

Session-cache integrity

Concurrent session saves no longer drop an edit. The save() fast path skipped its compare-and-swap re-read and merge whenever the on-disk (st_mtime, st_size) fingerprint still matched the one captured at load. That fingerprint aliases: two caches whose keys are the same length serialize to byte-identical JSON sizes, and a float st_mtime rounds two sub-microsecond writes to the same value. When two writers collided on both fields the second skipped the merge and overwrote the first, losing exactly one edit — the 200-edit concurrency stress test intermittently saw 199. The fast path now consults an in-process version registry so a same-process writer that already advanced the version forces the stale save back through the merge, and the fingerprint is taken from integer st_mtime_ns instead of the rounded float, so a cross-process skip now requires a true nanosecond-and-size collision rather than a rounding coincidence.

v1.3.0

06 Jun 03:44

Choose a tag to compare

[1.3.0] - 2026-06-05

Context growth audit — four changes that cut session context size and make overhead visible.

Context footprint in doctor

token-goat doctor --context now prints a Context footprint section measuring every token source that pads the context window each turn: the skills catalog (~10,800 tokens/turn for a typical install), loaded skill bodies accumulated in system-reminder injections, CLAUDE.md + MEMORY.md meta-files, and the rolling conversation estimate. The section shows fill % against the 660,000-token autocompact threshold, an ETA in turns at the current growth rate, and an Actions block naming the exact commands to run when any loaded skill above 2,000 tokens is missing a compact.

Auto-shown when estimated fill exceeds 40 % or any loaded skill > 2 K tokens lacks a compact; always shown with --context.

Compact pre-generation at install time

token-goat install now runs skill-compact --all as a final step, so compacts are ready before the first session — no post-install warm-up turn required. A sentinel file (skill_pregen_sentinel.json) records the catalog count; the doctor section uses it to detect skills added after the last pre-gen pass.

Per-skill compact advisory in post_skill

When a skill body lands in context, the post_skill hook now reports the compact's token savings inline (pre-generated compacts, sync-generated compacts for bodies < 40 KB, background-generated for larger bodies, info-only when no worker is running). Advisory fires only for bodies above 8 KB to stay silent for tiny skills.

Threshold-crossing context advisory in user_prompt_submit

A lightweight ETA advisory fires the first time estimated context fill crosses 50 % and again at 70 %. The message is appended to the existing status line (bracket-joined, not a separate injection) and references /compact now at 70 %. Resets after each compact. Configurable via hints.context_threshold_advisory = false.

v1.2.0

05 Jun 20:46

Choose a tag to compare

[1.2.0] - 2026-06-05

14 commits since v1.1.0. Output overflow guard, cross-platform path normalization fixes, and a reliability pass.

Output Overflow Guard

Surgical-read commands (symbol, read, section, bash-output, web-output, and the rest) now cap oversized output before it reaches the model. When estimated tokens exceed the cap, the output is head-truncated on a line boundary. A marker line is appended naming the cap, the truncation ratio, and the narrowing action — symbol users get directed toward file::Class.method lookups, section users toward sub-headings, cached-output users toward --grep/--tail.

Default cap: 25,000 tokens. Configure via [overflow_guard] max_tokens in config.toml, override with TOKEN_GOAT_OVERFLOW_MAX_TOKENS=<n>, or disable with TOKEN_GOAT_OVERFLOW_GUARD=0 / [overflow_guard] enabled = false.

The estimator is deliberately conservative — 3 chars/token, same rate as the compaction manifest — so the cap is never under-applied. ANSI escapes are stripped before estimation since color codes inflate length without adding model-visible tokens. A single-line blob (no internal newlines) is sliced at the char budget so it cannot pass through whole.

Cross-Platform Path Normalization

Two fixes that make path-keyed caches work correctly across Windows, WSL, and Linux:

normalize_path / paths.normalize_key — Drive-letter lowercasing (C:c:) is now unconditional. The previous guard sys.platform == "win32" meant a WSL process that emits a Windows-format path (C:/Users/…) produced a different cache key than a native Windows process reading the same file. Both now produce c:/users/….

hooks_skill.post_skill — Windows-style backslash paths like C:\Users\user\.claude\skills\ralph were not stripped on Linux because the inline guard used _os.sep (/ on Linux) instead of the string literal "\\". The inline block is now a call to _normalize_skill_name, which hardcodes "\\" and handles both separator styles on every platform.

Reliability

  • Worker dirty-queue torn writes. Concurrent _append_dirty calls could produce truncated or concatenated JSON lines under write contention. An OS-level file lock (fcntl on POSIX, msvcrt on Windows) now serializes appends, same as the session cache.
  • SQLite WAL checkpoint mode. Changed from RESTART to PASSIVE on connection open. RESTART waited for all readers to drain, blocking hook subprocesses for hundreds of milliseconds during active indexing. PASSIVE checkpoints cooperatively and does not wait.

v1.1.0

05 Jun 01:43

Choose a tag to compare

57 commits since v1.0.1. Six new language indexers, twenty-plus CLI commands and flags, a pre-skill hook that cuts repeat skill loads from 40–65k tokens to ~400, pnpm/yarn/bun compress filters, rg/grep dedup hints, double-daemon prevention, and a reliability pass with 400+ new tests.

Highlights

  • Skill re-load prevention. A new PreToolUse(Skill) hook fires before every Skill invocation. When a skill was already loaded in the current session, the reload is blocked and the cached compact (~400 tokens) is served instead. A repeat /ralph or /superman invocation in the same session now costs ~400 tokens, not 40–65k.
  • New language indexers. CSS/SCSS, SQL, GraphQL, Protobuf, .env, and Makefile. All participate in token-goat symbol, read, outline, scope, and dedup hints.
  • New CLI flags. symbol --context N, symbol --json, outline --min-lines, outline --max-depth, web-output --list, map --filter, stats --since, token-goat recent, bash history exit codes.
  • Package manager filters. pnpm, yarn, and bun compress filters. pnpm run/yarn run route through their own filter.
  • rg/grep dedup. Bash rg/grep invocations now fire dedup hints the same way the native Grep tool does.
  • Top-5 file guarantee. The five most-accessed files always appear in the compaction manifest.
  • Double-daemon prevention. JSON PID files, cross-interpreter startup guard, worker --kill-duplicate, worker --status, install --check.

Full changelog: https://github.com/DFKHelper/token-goat/blob/main/CHANGELOG.md

v1.0.1

02 Jun 21:43

Choose a tag to compare

Bundles two 50-commit improvement runs: a skill-cache / context-savings accuracy loop and a general quality loop.

Highlights:

  • Skill cache: source_sha stale-compact detection, separate compact/body eviction buckets, sidecar schema v2, lazy skill injection, gzip compression
  • Stats accounting fixed for bash_output_cached, skill_cached, web_output_cached, and surgical-read lookup savings (were always 0)
  • Serve-diff-on-reread, session-hint cooldown, unified token formula, stats category grouping
  • RuffFilter and MypyFilter bash-compress support
  • Type safety, error handling, performance (hoisted regex), security (0o600 lock files), DRY helpers, debug log coverage
  • 55 new tests

See CHANGELOG for full details.

v1.0.0

29 May 14:53

Choose a tag to compare

First stable release. Promotes the [Unreleased] block covering the 35-iter /improve run: compaction hardening, doctor visibility, opt-in observability, four new bash-compress filters, surgical-read hints, and reliability fixes.

See CHANGELOG.md for the full change list.

v0.9.0

25 May 21:15

Choose a tag to compare

Highlights

Bundles three improvement loops landed since 0.8.0 (37-iter context/compaction on 2026-05-25, 68-iter reliability/perf on 2026-05-24, 55-iter context-savings baseline).

Security

  • DNS-rebinding window closed in SSRF guard: webfetch.py resolves once and pins the connection to that IP via a custom transport, so a hostile DNS server can't return a public IP to validation and a private IP (e.g. 169.254.169.254 IMDS) to the reconnect.
  • paths.safe_join() promoted as the canonical fragment joiner; two raw joins that took user-controlled session_ids now flow through it.
  • dispatch() ensures continue=true so handlers returning {} can't become harness-blocking responses.
  • webfetch sidecar path-traversal fix: validates that shrunk_path resolves inside the cache roots before writing.
  • PIL decode-bomb cap: MAX_IMAGE_PIXELS set to prevent multi-gigapixel decompression crashes.

Reliability

  • Hook registry consolidated to single source of truth (hook_registry.py); a startup _assert_hook_registry_aligned() raises ImportError if any event lacks a @hook_app.command decorator. Eliminates the recurring registry drift bug class.
  • Persistent hook wrapper at data_dir/bin/tg-hook.cmd survives uv tool install --reinstall; emits {"continue":true} and exits 0 when the venv is briefly absent.
  • paths.ensure_dir() retry helper for the Windows mkdir(parents=True, exist_ok=True) race where is_dir() returns stale False after a concurrent create.
  • Surrogate-escape crash fix in post_bash (1,311 crashes/week in production).
  • Orphaned project GC: reclaims 2.3 GB on the audited install with a 30-min safety window.
  • Session lock hardened: 5 s timeout, jittered poll backoff, fsynced PID write, rejects invalid PIDs.
  • Worker SIGTERM-aware drain loop via _GRACEFUL_SHUTDOWN event.
  • Hook crash log at hooks-stderr.log (100 KB cap, .prev rotation).
  • Concurrent dirty-queue write protected by fcntl/msvcrt locks.
  • Session CAS re-applies size caps after merge so a race can't inflate the JSON beyond limits.
  • Hooks stderr log test isolation: 230 KB / 316 crash blocks of test garbage were polluting the production sink.

Token savings — manifest, hints, hot path

  • Manifest format shortening: L:X-Y instead of lines X-Y, bash entries drop id=/shorten exit= to e=, lower _MAX_TODO_SUBJECT_CHARS (~71 tokens/manifest).
  • Active-skills section collapsed to a single line (**Skills:** name1, name2 — recall via ...) (~160 tokens/6-skill manifest).
  • Manifest Delta line: +sections that grew since last compact.
  • Bash entries grouped by exit class (Failed/Slow/Ok); suppresses TODOs that reference edited files.
  • Manifest bold-label bundle: ### Edited:**Edited:**, **Syms:** for inline labels.
  • Manifest SHA sidecar cache: pre_compact writes a sentinel and rebuilds only when the session SHA differs.
  • Cross-session grep dedup via global.db::grep_patterns.
  • Adaptive _MAX_BASH_ENTRIES scales with bash_history length.
  • Clean-repo session brief collapses to <branch> (clean) one-liner.
  • status_lines cap (50 + (+N more files)).
  • Single rev-list + adaptive git-log entry count; in-sync repos skip the git-log section.
  • WebFetch HTML strip before caching (60-90% byte reduction).
  • Process-local LRU on session.load() (cap 4, mtime-keyed).
  • Pre-Read structured-file hint (CSV headers, JSON top-level keys, log entry count) ~70% smaller.
  • Pre-Read index-only file suppression (lockfiles, source maps, build artifacts).
  • AVIF image-shrink support (~15% smaller than WebP when libaom available).
  • Hint fingerprint includes file path to prevent false positives across files.
  • Session-level hint budget caps (files 5, bash 3, web 2, skills 4).

Performance

  • Test suite 22% faster: eviction tests use patch.object(session, "save") instead of 200-500 real disk writes.
  • Hook cold-start: 190 ms → 110 ms (~42% faster) via lazy imports in hooks_session.py.
  • Compact-skip sentinel: <1 ms exit on fresh sessions when no edits since the last manifest.
  • Skip git ops entirely when cwd is not a repo (saves 60-100 ms per hook fire in non-repo dirs).
  • DB contention metric surfaced in doctor.
  • pytest-xdist --dist=loadscope for module-scoped fixture reuse across tests.

DRY

  • 16 git subprocess sites consolidated through util.run_git() (always --no-optional-locks + UTF-8 with errors="replace").
  • cache_common.safe_cache_op context manager + store_blob for atomic blob writes + short_content_hash() shared across bash/web/skill caches.
  • paths.safe_join(), paths.hook_wrapper_path(), paths.normalize_key() promoted as canonical helpers.
  • util.sanitize_surrogates, util.humanize_bytes, util.ellipsize cross-module helpers.
  • Session 6-item helper bundle (safe_load, _merge_lists, _cap_dict, _bump_read_count, _session_path, _atomic_write).

Codex / opencode / openclaw compatibility

  • Wire-format round-trip tests for every hook event across all four harnesses (Claude / Codex / opencode / openclaw); bridge TS event-table alignment regression test.
  • PowerShell bash_parser coverage: Get-Content, gc, type ambiguity guard, equals-form flags.
  • Bash dispatch + golden-output tests (+151) across all 17 compression filters.

CI

  • Workflow split into gating fast tier (-m "not slow") + opt-in slow tier with continue-on-error: true so racy multi-process stress tests don't block the build.

Suite at end of loop: 5051 pass / 28 skipped (started this release cycle at 4598; +453 tests added).

Full details in CHANGELOG.md.

v0.8.0

24 May 01:25

Choose a tag to compare

Added

  • Skill preservation through compaction. Every PostToolUse(Skill) invocation captures the loaded skill body to a persistent on-disk cache (data_dir()/skills, 5 MB LRU-evicted) keyed by (session, skill_name, content_sha). The compaction manifest gains an ### Active Skills section listing every loaded skill with a token-goat skill-body <name> recall hint, and the post-compact recovery hint surfaces the same list under **Skills**:. Solves the "I forgot parts of the skill after compaction" problem — load-bearing prose (Ralph's DoD gates, /improve's iteration sequence, any multi-thousand-token protocol skill) is recoverable without re-invoking the skill, which would replay any side effects and pollute the conversation with a fresh tool-result block. Configurable via config.toml [skill_preservation] (enabled, max_cache_bytes) or disabled at runtime via TOKEN_GOAT_SKILL_PRESERVATION=0. Default-on.

  • token-goat skill-body <name> — retrieve a cached skill body by name. Defaults to a head+tail view for large bodies; pass --full for everything, or narrow with --head N, --tail N, --grep PATTERN. Falls back to reading the original ~/.claude/skills/<name>/SKILL.md (or plugin-path equivalent) when the cache entry has been evicted but the source path was recorded.

  • token-goat skill-history — list cached skill bodies (newest first) with their IDs, byte sizes, ages, and skill names.

  • Skill marker (🧠) in the compaction manifest legend — joins edited=✎, read=→, stale=⚠, cold=❄ so the compaction LLM has a stable glyph vocabulary for every section type.

  • 4-section recovery hint allocator. _allocate_recovery_slots now distributes 18 total slots across Files / Bash / Web / Skills with skill loads taking priority in the greedy expansion pass.