Releases: DFKHelper/token-goat
v1.5.2
Three fixes: Codex hook wire-format compatibility, and two Windows coarse-mtime correctness issues in the cache and session layers.
Codex hook responses now pass schema validation
Codex 0.137.0 validates every hook response against embedded JSON schemas with additionalProperties: false, so any unrecognised key causes "hook returned invalid … JSON output" for the entire response — including SessionStart, PreToolUse, and PostToolUse. The root cause was _tg_elapsed_ms (and sibling _tg_handler/_tg_error fields) added by the internal dispatch() function and then emitted verbatim. The denormalize_response Codex branch now strips all _tg_* keys before output. The same path also injects the required hookEventName const field into hookSpecificOutput — Codex requires it on every hookSpecificOutput shape and token-goat was not emitting it because Claude Code does not require it. A _codex_hook_event_name() helper resolves the correct value (e.g. "pre-read" → "PreToolUse") from the hook registry. The old camelCase→snake_case key conversion (_translate_hso_to_codex) is no longer applied — Codex 0.137.0+ uses camelCase throughout hookSpecificOutput.
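The Codex output path described above can be sketched roughly as follows. This is a minimal illustration, not the project's denormalize_response: the function names and the ASSUMED_EVENT_NAMES table are hypothetical stand-ins, and the only mapping shown is the "pre-read" → "PreToolUse" example from the notes.

```python
from typing import Any

# Hypothetical lookup for illustration only. The notes say the real value
# is resolved from the hook registry by _codex_hook_event_name().
ASSUMED_EVENT_NAMES = {"pre-read": "PreToolUse"}


def strip_internal_keys(response: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the response without top-level _tg_* keys."""
    return {k: v for k, v in response.items() if not k.startswith("_tg_")}


def to_codex(response: dict[str, Any], hook_name: str) -> dict[str, Any]:
    """Drop internal keys, then add hookEventName to hookSpecificOutput."""
    out = strip_internal_keys(response)
    hso = out.get("hookSpecificOutput")
    if isinstance(hso, dict):
        # Keys stay camelCase; no snake_case conversion is applied.
        out["hookSpecificOutput"] = {
            **hso,
            "hookEventName": ASSUMED_EVENT_NAMES[hook_name],
        }
    return out
```

Because the schema rejects unknown keys, stripping happens on the way out rather than at dispatch time, so the timing fields remain available internally.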
Freshest cache entry survives its own store call's eviction
evict_cache_dir sorts eviction candidates oldest-first by float(st_mtime) with a stable sort. When the just-written (MRU) entry shares a coarse st_mtime with older siblings, the stable sort falls back to arbitrary iterdir order, which on NTFS can place the newest file first and evict it — so a store_output call could delete the very entry it had just written. evict_cache_dir now accepts a protect_ids set that is excluded from the victim list regardless of timestamp, and skill_cache.store_output passes the id it just wrote.
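A small sketch of why a stable sort on a coarse timestamp can pick the freshest entry, and how a protected-id set avoids it. The Entry type and pick_victims function are hypothetical; the real evict_cache_dir works on files in a directory rather than in-memory records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    entry_id: str
    mtime: float  # coarse timestamp, may tie with siblings
    size: int


def pick_victims(entries, max_bytes, protect_ids=frozenset()):
    """Choose ids to evict, oldest first, until the total fits max_bytes."""
    total = sum(e.size for e in entries)
    # Protected ids never become candidates, whatever their timestamp.
    candidates = [e for e in entries if e.entry_id not in protect_ids]
    # sorted/sort is stable: entries with equal mtime keep their input
    # order, which stands in for arbitrary directory iteration order.
    candidates.sort(key=lambda e: e.mtime)
    victims = []
    for entry in candidates:
        if total <= max_bytes:
            break
        victims.append(entry.entry_id)
        total -= entry.size
    return victims
```

With two entries sharing one mtime and the new one listed first, the unprotected call evicts the new entry; passing its id in protect_ids shifts the eviction to the older sibling.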
save() refreshes the process-local load cache
session.load() caches (object, mtime) per session and serves the cached object whenever cached_mtime == current_mtime. When a later save()'s post-write timestamp aliased the mtime a previous load() had cached, the proc-cache kept serving the stale pre-save object on the next in-process load() even though the on-disk JSON was correct. save() now overwrites an existing proc-cache entry with the object it just persisted on every successful write.
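The aliasing can be reproduced with a toy store in which the timestamp is supplied by hand. Everything here is a hypothetical model (the dict standing in for the JSON file, the refresh flag used to show the old behavior); it only illustrates the cache logic the paragraph describes.

```python
class SessionStore:
    """Toy model of an mtime-keyed process-local load cache."""

    def __init__(self):
        self.disk = {}        # session_id -> (obj, mtime), stands in for the file
        self.proc_cache = {}  # session_id -> (obj, mtime)

    def load(self, session_id):
        obj, mtime = self.disk[session_id]
        cached = self.proc_cache.get(session_id)
        if cached is not None and cached[1] == mtime:
            return cached[0]  # served from cache when the mtime matches
        self.proc_cache[session_id] = (obj, mtime)
        return obj

    def save(self, session_id, obj, mtime, refresh=True):
        self.disk[session_id] = (obj, mtime)
        # refresh=False models the old behavior; refresh=True models the fix:
        # an existing cache entry is overwritten with the object just written.
        if refresh and session_id in self.proc_cache:
            self.proc_cache[session_id] = (obj, mtime)
```

When a save lands on the same coarse mtime that load() cached, the unrefreshed cache keeps returning the old object even though the stored data is newer.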
v1.5.1
Correctness fixes for cache size accounting (compressed .gz bodies), surgical reads (oversized-docstring cap, signature-boundary fix), path normalization (uppercase WSL drives), mixed-case skill-compact invalidation, and the Gemini hook bridge (preserve systemMessage, route additionalContext natively), plus two documentation corrections.
See the CHANGELOG for full details.
v1.5.0
Context-pressure awareness: one source of truth for how full the window is, and hints that get terser as it fills. Ships alongside three install fixes that restore hook forwarding under editable installs and silence a recurring doctor warning.
Centralized context-pressure model
get_context_pressure(session_id) in compact.py is now the single place that answers how close a session is to autocompaction. It returns a frozen ContextPressure — a fill_fraction paired with a tier of cool, warm, hot, or critical. The estimate sums the known context contributors (loaded skill bodies, the ~10,800-token skills catalog, and per-event costs for bash history, web history, and read files) and divides by the fixed 660,000-token autocompact budget rather than the model's raw window, so the fraction carries the same meaning no matter which model is driving the session. The old _estimate_context_fill helper and the inline calculation in the session hook both defer to it, retiring the copies of the 660 K constant that had spread across half a dozen call sites in favor of one shared CONTEXT_AUTOCOMPACT_TOKENS.
Named tier boundaries
The fraction-to-tier mapping lives in tier_for_fraction(), backed by three named constants: CONTEXT_TIER_WARM (0.50), CONTEXT_TIER_HOT (0.70), and CONTEXT_TIER_CRITICAL (0.85). The bands are cool below 0.50, warm up to 0.70, hot up to 0.85, and critical at or above it. With the magic numbers pulled out of the band checks, the boundaries are defined once and the tests pin them directly.
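As a rough sketch, the mapping could look like the following. The constant names and values come from the notes; pressure_for_tokens is a hypothetical helper, and treating each boundary as lower-inclusive (exactly 0.70 counts as hot) is an assumption, since the notes only state that explicitly for critical.

```python
from dataclasses import dataclass

CONTEXT_AUTOCOMPACT_TOKENS = 660_000
CONTEXT_TIER_WARM = 0.50
CONTEXT_TIER_HOT = 0.70
CONTEXT_TIER_CRITICAL = 0.85


@dataclass(frozen=True)
class ContextPressure:
    fill_fraction: float
    tier: str


def tier_for_fraction(fill_fraction: float) -> str:
    """Map a fill fraction to cool, warm, hot, or critical."""
    if fill_fraction >= CONTEXT_TIER_CRITICAL:
        return "critical"
    if fill_fraction >= CONTEXT_TIER_HOT:  # assumed lower-inclusive
        return "hot"
    if fill_fraction >= CONTEXT_TIER_WARM:  # assumed lower-inclusive
        return "warm"
    return "cool"


def pressure_for_tokens(estimated_tokens: int) -> ContextPressure:
    """Hypothetical helper: divide by the fixed budget, then pick a tier."""
    fraction = estimated_tokens / CONTEXT_AUTOCOMPACT_TOKENS
    return ContextPressure(fill_fraction=fraction, tier=tier_for_fraction(fraction))
```

Dividing by the fixed budget instead of a model-specific window is what keeps a given fraction comparable across models.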
Pressure-aware surgical-read hints
The pre-read hook tightens its large-file threshold as the window fills. A file earns a surgical-read suggestion past 500 lines while the session is cool, 350 when warm, 200 when hot, and 50 when critical. It also folds a single per-tier note into the read's additional context: "Context warming" at warm, "Context pressure" at hot, "CONTEXT CRITICAL" at critical. The note is fingerprinted by tier, so it fires once per band rather than on every read. Cool sessions get no note.
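The per-tier behavior amounts to two lookup tables plus a once-per-tier guard. The sketch below uses the thresholds and note texts from the paragraph; the function name, the table names, and the use of a plain set as the fingerprint store are illustrative assumptions.

```python
SURGICAL_READ_THRESHOLDS = {"cool": 500, "warm": 350, "hot": 200, "critical": 50}
TIER_NOTES = {
    "warm": "Context warming",
    "hot": "Context pressure",
    "critical": "CONTEXT CRITICAL",
}


def pre_read_hint(line_count, tier, seen_tiers):
    """Return (suggest_surgical_read, note_or_None) for one read.

    seen_tiers is a mutable set recording which tier notes already fired,
    a stand-in for the per-tier fingerprint mentioned in the notes.
    """
    suggest = line_count > SURGICAL_READ_THRESHOLDS[tier]
    note = TIER_NOTES.get(tier)  # cool has no entry, so no note
    if note is not None:
        if tier in seen_tiers:
            note = None
        else:
            seen_tiers.add(tier)
    return suggest, note
```

A 400-line file is left alone in a cool session but draws a suggestion once the session is warm, and the warm note accompanies only the first such read.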
Smaller manifests under pressure
compute_adaptive_budget now weighs context pressure when it sizes the compaction manifest. Once the window runs hot the budget is capped at 500 tokens, and at critical it drops to 300, so the manifest stops adding to the very problem it exists to summarize.
Install robustness
Hooks no longer silently disable themselves under an editable install. The tg-hook wrapper carries an if not exist "<sentinel>" gate that short-circuits to a bare {"continue":true} during the uv tool install --reinstall race, when the venv's token_goat module is briefly absent. The sentinel used to be a hardcoded site-packages/token_goat/__init__.py path, which never exists under an editable install (uv sync, the project .venv), so the gate stayed permanently true and every hook no-op'd — the whole tool went dark with no error. The wrapper now resolves the sentinel through importlib.util.find_spec("token_goat").origin, which points at src/token_goat/__init__.py for editable installs and site-packages/... for regular ones, and falls back to an ungated wrapper when no sentinel resolves. A live handler emits {"continue": true, "_tg_elapsed_ms": N}; the _tg_elapsed_ms field is the tell that forwarding actually ran.
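The sentinel lookup can be approximated with the standard library alone. This is a sketch under the assumption that a missing or non-file origin means "no sentinel"; resolve_sentinel is a hypothetical name, and the real wrapper is a generated script rather than this function.

```python
import importlib.util
from typing import Optional


def resolve_sentinel(package: str) -> Optional[str]:
    """Return the file path of a package's __init__ module, or None.

    find_spec follows whatever is importable in the current environment,
    so editable and regular installs both resolve to their real location.
    """
    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        return None
    if spec is None or spec.origin in (None, "built-in", "frozen"):
        return None
    return spec.origin
```

A caller would emit the gated wrapper when a path comes back and the ungated one when the result is None.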
Re-install purges orphaned tokenwise entries. After the tokenwise → token-goat rename, a re-install left the old hook and permission lines stranded in settings.json and the Codex config.toml, so both harnesses kept invoking a binary that no longer existed. patch_settings_json and patch_codex_config now strip any pre-rename tokenwise command and permission entry before writing the current ones.
Hook wrapper is written as bytes to stop CRLF doubling. hook_wrapper_content() hand-bakes platform-correct line endings (\r\n on Windows), but its output was then written through atomic_write_text, whose text-mode handle translated every \n to \r\n a second time, doubling each line ending to \r\r\n on disk. cmd.exe tolerated the stray carriage return so forwarding still worked, but doctor does a byte-exact compare of the on-disk wrapper against the regenerated content and warned "differs from expected — run token-goat install to refresh" on every run, a nag that reinstalling could never clear because it rewrote the same doubled bytes. The wrapper now goes through atomic_write_bytes, preserving the authored endings verbatim.
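The doubling is easy to reproduce on any platform by forcing the newline translation that Windows text mode applies by default. The helper name and file names below are made up for the demonstration.

```python
import os
import tempfile


def write_both(content: str) -> tuple:
    """Write content in translating text mode and in binary mode.

    Returns (text_mode_bytes, binary_mode_bytes) as read back from disk.
    """
    with tempfile.TemporaryDirectory() as tmp:
        text_path = os.path.join(tmp, "text.cmd")
        bytes_path = os.path.join(tmp, "bytes.cmd")
        # newline="\r\n" turns every "\n" written into "\r\n", which is
        # what the default text mode does on Windows.
        with open(text_path, "w", encoding="utf-8", newline="\r\n") as fh:
            fh.write(content)
        # Binary mode writes the authored bytes untouched.
        with open(bytes_path, "wb") as fh:
            fh.write(content.encode("utf-8"))
        with open(text_path, "rb") as fh:
            text_bytes = fh.read()
        with open(bytes_path, "rb") as fh:
            raw_bytes = fh.read()
    return text_bytes, raw_bytes
```

Content that already ends its lines with \r\n comes back with \r\r\n from the text-mode file and unchanged from the binary one.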
Session-cache integrity
Concurrent session saves no longer drop an edit. The save() fast path skipped its compare-and-swap re-read and merge whenever the on-disk (st_mtime, st_size) fingerprint still matched the one captured at load. That fingerprint aliases: two caches whose keys are the same length serialize to byte-identical JSON sizes, and a float st_mtime rounds two sub-microsecond writes to the same value. When two writers collided on both fields the second skipped the merge and overwrote the first, losing exactly one edit — the 200-edit concurrency stress test intermittently saw 199. The fast path now consults an in-process version registry so a same-process writer that already advanced the version forces the stale save back through the merge, and the fingerprint is taken from integer st_mtime_ns instead of the rounded float, so a cross-process skip now requires a true nanosecond-and-size collision rather than a rounding coincidence.
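The rounding half of the problem can be shown with plain arithmetic: at current epoch magnitudes a float cannot tell adjacent nanoseconds apart. The fingerprint helper below is a sketch of the integer-based approach; its name and the demo function are assumptions, and the in-process version registry is not modeled.

```python
import os
import tempfile


def fingerprint(path: str) -> tuple:
    """Return (st_mtime_ns, st_size), both integers, for change detection."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


# Two timestamps one nanosecond apart become equal as float seconds,
# because a double has roughly 16 significant decimal digits.
NS_A = 1_700_000_000_000_000_001
NS_B = 1_700_000_000_000_000_002
FLOATS_COLLIDE = (NS_A / 1e9) == (NS_B / 1e9)


def demo() -> tuple:
    """Fingerprint a freshly written two-byte file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "session.json")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("{}")
        return fingerprint(path)
```

The integer pair still depends on the filesystem's timestamp granularity, which is why the same-process case is handled separately by the version registry.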
v1.3.0
[1.3.0] - 2026-06-05
Context growth audit — four changes that cut session context size and make overhead visible.
Context footprint in doctor
token-goat doctor --context now prints a Context footprint section measuring every token source that pads the context window each turn: the skills catalog (~10,800 tokens/turn for a typical install), loaded skill bodies accumulated in system-reminder injections, CLAUDE.md + MEMORY.md meta-files, and the rolling conversation estimate. The section shows fill % against the 660,000-token autocompact threshold, an ETA in turns at the current growth rate, and an Actions block naming the exact commands to run when any loaded skill above 2,000 tokens is missing a compact.
Auto-shown when estimated fill exceeds 40 % or any loaded skill > 2 K tokens lacks a compact; always shown with --context.
Compact pre-generation at install time
token-goat install now runs skill-compact --all as a final step, so compacts are ready before the first session — no post-install warm-up turn required. A sentinel file (skill_pregen_sentinel.json) records the catalog count; the doctor section uses it to detect skills added after the last pre-gen pass.
Per-skill compact advisory in post_skill
When a skill body lands in context, the post_skill hook now reports the compact's token savings inline (pre-generated compacts, sync-generated compacts for bodies < 40 KB, background-generated for larger bodies, info-only when no worker is running). Advisory fires only for bodies above 8 KB to stay silent for tiny skills.
Threshold-crossing context advisory in user_prompt_submit
A lightweight ETA advisory fires the first time estimated context fill crosses 50 % and again at 70 %. The message is appended to the existing status line (bracket-joined, not a separate injection) and references /compact now at 70 %. Resets after each compact. Configurable via hints.context_threshold_advisory = false.
v1.2.0
[1.2.0] - 2026-06-05
14 commits since v1.1.0. Output overflow guard, cross-platform path normalization fixes, and a reliability pass.
Output Overflow Guard
Surgical-read commands (symbol, read, section, bash-output, web-output, and the rest) now cap oversized output before it reaches the model. When estimated tokens exceed the cap, the output is head-truncated on a line boundary. A marker line is appended naming the cap, the truncation ratio, and the narrowing action — symbol users get directed toward file::Class.method lookups, section users toward sub-headings, cached-output users toward --grep/--tail.
Default cap: 25,000 tokens. Configure via [overflow_guard] max_tokens in config.toml, override with TOKEN_GOAT_OVERFLOW_MAX_TOKENS=<n>, or disable with TOKEN_GOAT_OVERFLOW_GUARD=0 / [overflow_guard] enabled = false.
The estimator is deliberately conservative — 3 chars/token, same rate as the compaction manifest — so the cap is never under-applied. ANSI escapes are stripped before estimation since color codes inflate length without adding model-visible tokens. A single-line blob (no internal newlines) is sliced at the char budget so it cannot pass through whole.
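A compact sketch of the guard as described: strip ANSI sequences for the estimate, divide by 3 characters per token, then cut at the last line boundary inside the character budget. The function names, the regular expression, and the marker wording are all assumptions; the real marker also names a narrowing action.

```python
import re

# Matches CSI escape sequences such as color codes.
ANSI_RE = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")
CHARS_PER_TOKEN = 3
DEFAULT_MAX_TOKENS = 25_000


def estimate_tokens(text: str) -> int:
    """Conservative estimate: visible characters / 3, rounded up."""
    visible = ANSI_RE.sub("", text)
    return -(-len(visible) // CHARS_PER_TOKEN)


def cap_output(text: str, max_tokens: int = DEFAULT_MAX_TOKENS) -> str:
    """Head-truncate text on a line boundary when it exceeds the cap."""
    if estimate_tokens(text) <= max_tokens:
        return text
    head = text[: max_tokens * CHARS_PER_TOKEN]
    cut = head.rfind("\n")
    if cut > 0:
        head = head[:cut]  # drop the partial last line
    # With no newline in range, the raw character slice is kept, so a
    # single-line blob cannot pass through whole.
    marker = (
        f"[output truncated: cap {max_tokens} tokens, "
        f"kept {len(head)} of {len(text)} chars]"
    )
    return f"{head}\n{marker}"
```

Rounding the estimate up and slicing by characters both err toward truncating slightly early.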
Cross-Platform Path Normalization
Two fixes that make path-keyed caches work correctly across Windows, WSL, and Linux:
normalize_path / paths.normalize_key — Drive-letter lowercasing (C: → c:) is now unconditional. The previous guard sys.platform == "win32" meant a WSL process that emits a Windows-format path (C:/Users/…) produced a different cache key than a native Windows process reading the same file. Both now produce c:/users/….
hooks_skill.post_skill — Windows-style backslash paths like C:\Users\user\.claude\skills\ralph were not stripped on Linux because the inline guard used _os.sep (/ on Linux) instead of the string literal "\\". The inline block is now a call to _normalize_skill_name, which hardcodes "\\" and handles both separator styles on every platform.
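The separator fix comes down to naming both characters explicitly instead of consulting os.sep. The sketch below assumes that normalizing a skill path means reducing it to its last component; the function is a hypothetical stand-in for _normalize_skill_name.

```python
def normalize_skill_name(raw: str) -> str:
    """Reduce a skill path in either separator style to its final component.

    Both "/" and "\\" are hardcoded, so a Windows-style path is handled
    the same way on Linux, where os.sep is "/".
    """
    trimmed = raw.rstrip("/\\")
    return trimmed.replace("\\", "/").rsplit("/", 1)[-1]
```

A bare name passes through unchanged, which keeps the helper safe to call on input that was never a path.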
Reliability
- Worker dirty-queue torn writes. Concurrent _append_dirty calls could produce truncated or concatenated JSON lines under write contention. An OS-level file lock (fcntl on POSIX, msvcrt on Windows) now serializes appends, same as the session cache.
- SQLite WAL checkpoint mode. Changed from RESTART to PASSIVE on connection open. RESTART waited for all readers to drain, blocking hook subprocesses for hundreds of milliseconds during active indexing. PASSIVE checkpoints cooperatively and does not wait.
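For the checkpoint change, a passive checkpoint on connection open looks like this with the standard sqlite3 module. The function names and the database file name are illustrative, and the sketch does not claim to match how the project opens its connections.

```python
import os
import sqlite3
import tempfile


def open_with_passive_checkpoint(path: str):
    """Open a WAL-mode database and run a non-blocking checkpoint."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    # PASSIVE checkpoints as many frames as it can without waiting on
    # readers or writers. The result row is
    # (busy, wal_frames, checkpointed_frames).
    result = conn.execute("PRAGMA wal_checkpoint(PASSIVE)").fetchone()
    return conn, result


def demo() -> tuple:
    """Run the checkpoint against a throwaway database file."""
    with tempfile.TemporaryDirectory() as tmp:
        conn, result = open_with_passive_checkpoint(os.path.join(tmp, "index.db"))
        try:
            return tuple(result)
        finally:
            conn.close()
```

The trade-off is that a passive checkpoint may leave frames in the WAL when readers are active, relying on a later checkpoint to finish the job.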
v1.1.0
57 commits since v1.0.1. Six new language indexers, twenty-plus CLI commands and flags, a pre-skill hook that cuts repeat skill loads from 40–65k tokens to ~400, pnpm/yarn/bun compress filters, rg/grep dedup hints, double-daemon prevention, and a reliability pass with 400+ new tests.
Highlights
- Skill re-load prevention. A new PreToolUse(Skill) hook fires before every Skill invocation. When a skill was already loaded in the current session, the reload is blocked and the cached compact (~400 tokens) is served instead. A repeat /ralph or /superman invocation in the same session now costs ~400 tokens, not 40–65k.
- New language indexers. CSS/SCSS, SQL, GraphQL, Protobuf, .env, and Makefile. All participate in token-goat symbol, read, outline, scope, and dedup hints.
- New CLI flags. symbol --context N, symbol --json, outline --min-lines, outline --max-depth, web-output --list, map --filter, stats --since, token-goat recent, bash history exit codes.
- Package manager filters. pnpm, yarn, and bun compress filters. pnpm run / yarn run route through their own filter.
- rg/grep dedup. Bash rg/grep invocations now fire dedup hints the same way the native Grep tool does.
- Top-5 file guarantee. The five most-accessed files always appear in the compaction manifest.
- Double-daemon prevention. JSON PID files, cross-interpreter startup guard, worker --kill-duplicate, worker --status, install --check.
Full changelog: https://github.com/DFKHelper/token-goat/blob/main/CHANGELOG.md
v1.0.1
Bundles two 50-commit improvement runs: a skill-cache / context-savings accuracy loop and a general quality loop.
Highlights:
- Skill cache: source_sha stale-compact detection, separate compact/body eviction buckets, sidecar schema v2, lazy skill injection, gzip compression
- Stats accounting fixed for bash_output_cached, skill_cached, web_output_cached, and surgical-read lookup savings (were always 0)
- Serve-diff-on-reread, session-hint cooldown, unified token formula, stats category grouping
- RuffFilter and MypyFilter bash-compress support
- Type safety, error handling, performance (hoisted regex), security (0o600 lock files), DRY helpers, debug log coverage
- 55 new tests
See CHANGELOG for full details.
v1.0.0
First stable release. Promotes the [Unreleased] block covering the 35-iter /improve run: compaction hardening, doctor visibility, opt-in observability, four new bash-compress filters, surgical-read hints, and reliability fixes.
See CHANGELOG.md for the full change list.
v0.9.0
Highlights
Bundles three improvement loops landed since 0.8.0 (37-iter context/compaction on 2026-05-25, 68-iter reliability/perf on 2026-05-24, 55-iter context-savings baseline).
Security
- DNS-rebinding window closed in SSRF guard: webfetch.py resolves once and pins the connection to that IP via a custom transport, so a hostile DNS server can't return a public IP to validation and a private IP (e.g. 169.254.169.254 IMDS) to the reconnect.
- paths.safe_join() promoted as the canonical fragment joiner; two raw joins that took user-controlled session_ids now flow through it.
- dispatch() ensures continue=true so handlers returning {} can't become harness-blocking responses.
- webfetch sidecar path-traversal fix: validates that shrunk_path resolves inside the cache roots before writing.
- PIL decode-bomb cap: MAX_IMAGE_PIXELS set to prevent multi-gigapixel decompression crashes.
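The containment check behind a fragment joiner like the one listed above can be sketched with pathlib. This is a generic illustration of the technique, not the project's paths.safe_join(); the signature and error type are assumptions.

```python
from pathlib import Path


def safe_join(root: str, fragment: str) -> Path:
    """Join fragment onto root and refuse results that escape root.

    Both sides are resolved first, so ".." segments and absolute
    fragments are judged by where they actually land.
    """
    root_resolved = Path(root).resolve()
    candidate = (root_resolved / fragment).resolve()
    try:
        candidate.relative_to(root_resolved)
    except ValueError:
        raise ValueError(f"fragment escapes root: {fragment!r}") from None
    return candidate
```

Routing user-controlled values such as session ids through one such helper keeps the check in a single place instead of at every join site.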
Reliability
- Hook registry consolidated to single source of truth (hook_registry.py); a startup _assert_hook_registry_aligned() raises ImportError if any event lacks a @hook_app.command decorator. Eliminates the recurring registry drift bug class.
- Persistent hook wrapper at data_dir/bin/tg-hook.cmd survives uv tool install --reinstall; emits {"continue":true} and exits 0 when the venv is briefly absent.
- paths.ensure_dir() retry helper for the Windows mkdir(parents=True, exist_ok=True) race where is_dir() returns stale False after a concurrent create.
- Surrogate-escape crash fix in post_bash (1,311 crashes/week in production).
- Orphaned project GC: reclaims 2.3 GB on the audited install with a 30-min safety window.
- Session lock hardened: 5 s timeout, jittered poll backoff, fsynced PID write, rejects invalid PIDs.
- Worker SIGTERM-aware drain loop via _GRACEFUL_SHUTDOWN event.
- Hook crash log at hooks-stderr.log (100 KB cap, .prev rotation).
- Concurrent dirty-queue write protected by fcntl/msvcrt locks.
- Session CAS re-applies size caps after merge so a race can't inflate the JSON beyond limits.
- Hooks stderr log test isolation: 230 KB / 316 crash blocks of test garbage were polluting the production sink.
Token savings — manifest, hints, hot path
- Manifest format shortening: L:X-Y instead of lines X-Y, bash entries drop id= / shorten exit= to e=, lower _MAX_TODO_SUBJECT_CHARS (~71 tokens/manifest).
- Active-skills section collapsed to a single line (**Skills:** name1, name2 — recall via ...) (~160 tokens/6-skill manifest).
- Manifest Delta line: + sections that grew since last compact.
- Bash entries grouped by exit class (Failed/Slow/Ok); suppresses TODOs that reference edited files.
- Manifest bold-label bundle: ### Edited: → **Edited:**, **Syms:** for inline labels.
- Manifest SHA sidecar cache: pre_compact writes a sentinel and rebuilds only when the session SHA differs.
- Cross-session grep dedup via global.db::grep_patterns.
- Adaptive _MAX_BASH_ENTRIES scales with bash_history length.
- Clean-repo session brief collapses to <branch> (clean) one-liner.
- status_lines cap (50 + (+N more files)).
- Single rev-list + adaptive git-log entry count; in-sync repos skip the git-log section.
- WebFetch HTML strip before caching (60-90% byte reduction).
- Process-local LRU on session.load() (cap 4, mtime-keyed).
- Pre-Read structured-file hint (CSV headers, JSON top-level keys, log entry count) ~70% smaller.
- Pre-Read index-only file suppression (lockfiles, source maps, build artifacts).
- AVIF image-shrink support (~15% smaller than WebP when libaom available).
- Hint fingerprint includes file path to prevent false positives across files.
- Session-level hint budget caps (files 5, bash 3, web 2, skills 4).
Performance
- Test suite 22% faster: eviction tests use patch.object(session, "save") instead of 200-500 real disk writes.
- Hook cold-start: 190 ms → 110 ms (~42% faster) via lazy imports in hooks_session.py.
- Compact-skip sentinel: <1 ms exit on fresh sessions when no edits since the last manifest.
- Skip git ops entirely when cwd is not a repo (saves 60-100 ms per hook fire in non-repo dirs).
- DB contention metric surfaced in doctor.
- pytest-xdist --dist=loadscope for module-scoped fixture reuse across tests.
DRY
- 16 git subprocess sites consolidated through util.run_git() (always --no-optional-locks + UTF-8 with errors="replace").
- cache_common.safe_cache_op context manager + store_blob for atomic blob writes + short_content_hash() shared across bash/web/skill caches.
- paths.safe_join(), paths.hook_wrapper_path(), paths.normalize_key() promoted as canonical helpers.
- util.sanitize_surrogates, util.humanize_bytes, util.ellipsize cross-module helpers.
- Session 6-item helper bundle (safe_load, _merge_lists, _cap_dict, _bump_read_count, _session_path, _atomic_write).
Codex / opencode / openclaw compatibility
- Wire-format round-trip tests for every hook event across all four harnesses (Claude / Codex / opencode / openclaw); bridge TS event-table alignment regression test.
- PowerShell bash_parser coverage: Get-Content, gc, type ambiguity guard, equals-form flags.
- Bash dispatch + golden-output tests (+151) across all 17 compression filters.
CI
- Workflow split into gating fast tier (-m "not slow") + opt-in slow tier with continue-on-error: true so racy multi-process stress tests don't block the build.
Suite at end of loop: 5051 pass / 28 skipped (started this release cycle at 4598; +453 tests added).
Full details in CHANGELOG.md.
v0.8.0
Added
- Skill preservation through compaction. Every PostToolUse(Skill) invocation captures the loaded skill body to a persistent on-disk cache (data_dir()/skills, 5 MB LRU-evicted) keyed by (session, skill_name, content_sha). The compaction manifest gains an ### Active Skills section listing every loaded skill with a token-goat skill-body <name> recall hint, and the post-compact recovery hint surfaces the same list under **Skills**:. Solves the "I forgot parts of the skill after compaction" problem: load-bearing prose (Ralph's DoD gates, /improve's iteration sequence, any multi-thousand-token protocol skill) is recoverable without re-invoking the skill, which would replay any side effects and pollute the conversation with a fresh tool-result block. Configurable via config.toml [skill_preservation] (enabled, max_cache_bytes) or disabled at runtime via TOKEN_GOAT_SKILL_PRESERVATION=0. Default-on.
- token-goat skill-body <name>: retrieve a cached skill body by name. Defaults to a head+tail view for large bodies; pass --full for everything, or narrow with --head N, --tail N, --grep PATTERN. Falls back to reading the original ~/.claude/skills/<name>/SKILL.md (or plugin-path equivalent) when the cache entry has been evicted but the source path was recorded.
- token-goat skill-history: list cached skill bodies (newest first) with their IDs, byte sizes, ages, and skill names.
- Skill marker (🧠) in the compaction manifest legend: joins edited=✎, read=→, stale=⚠, cold=❄ so the compaction LLM has a stable glyph vocabulary for every section type.
- 4-section recovery hint allocator. _allocate_recovery_slots now distributes 18 total slots across Files / Bash / Web / Skills with skill loads taking priority in the greedy expansion pass.