Skip to content

v1.5.0

Choose a tag to compare

@Zelys-DFKH Zelys-DFKH released this 08 Jun 04:33
· 37 commits to main since this release

Context-pressure awareness: one source of truth for how full the window is, and hints that get terser as it fills. Ships alongside three install fixes that restore hook forwarding under editable installs and silence a recurring doctor warning.

Centralized context-pressure model

get_context_pressure(session_id) in compact.py is now the single place that answers how close a session is to autocompaction. It returns a frozen ContextPressure — a fill_fraction paired with a tier of cool, warm, hot, or critical. The estimate sums the known context contributors (loaded skill bodies, the ~10,800-token skills catalog, and per-event costs for bash history, web history, and read files) and divides by the fixed 660,000-token autocompact budget rather than the model's raw window, so the fraction carries the same meaning no matter which model is driving the session. The old _estimate_context_fill helper and the inline calculation in the session hook both defer to it, retiring the copies of the 660 K constant that had spread across half a dozen call sites in favor of one shared CONTEXT_AUTOCOMPACT_TOKENS.

Named tier boundaries

The fraction-to-tier mapping lives in tier_for_fraction(), backed by three named constants: CONTEXT_TIER_WARM (0.50), CONTEXT_TIER_HOT (0.70), and CONTEXT_TIER_CRITICAL (0.85). The bands are cool below 0.50, warm up to 0.70, hot up to 0.85, and critical at or above it. With the magic numbers pulled out of the band checks, the boundaries are defined once and the tests pin them directly.

Pressure-aware surgical-read hints

The pre-read hook tightens its large-file threshold as the window fills. A file earns a surgical-read suggestion past 500 lines while the session is cool, 350 when warm, 200 when hot, and 50 when critical. It also folds a single per-tier note into the read's additional context: "Context warming" at warm, "Context pressure" at hot, "CONTEXT CRITICAL" at critical. The note is fingerprinted by tier, so it fires once per band rather than on every read. Cool sessions get no note.

Smaller manifests under pressure

compute_adaptive_budget now weighs context pressure when it sizes the compaction manifest. Once the window runs hot the budget is capped at 500 tokens, and at critical it drops to 300, so the manifest stops adding to the very problem it exists to summarize.

Install robustness

Hooks no longer silently disable themselves under an editable install. The tg-hook wrapper carries an if not exist "<sentinel>" gate that short-circuits to a bare {"continue":true} during the uv tool install --reinstall race, when the venv's token_goat module is briefly absent. The sentinel used to be a hardcoded site-packages/token_goat/__init__.py path, which never exists under an editable install (uv sync, the project .venv), so the gate stayed permanently true and every hook no-op'd — the whole tool went dark with no error. The wrapper now resolves the sentinel through importlib.util.find_spec("token_goat").origin, which points at src/token_goat/__init__.py for editable installs and site-packages/... for regular ones, and falls back to an ungated wrapper when no sentinel resolves. A live handler emits {"continue": true, "_tg_elapsed_ms": N}; the _tg_elapsed_ms field is the tell that forwarding actually ran.

Re-install purges orphaned tokenwise entries. After the tokenwisetoken-goat rename, a re-install left the old hook and permission lines stranded in settings.json and the Codex config.toml, so both harnesses kept invoking a binary that no longer existed. patch_settings_json and patch_codex_config now strip any pre-rename tokenwise command and permission entry before writing the current ones.

Hook wrapper is written as bytes to stop CRLF doubling. hook_wrapper_content() hand-bakes platform-correct line endings — \r\n on Windows — then was written through atomic_write_text, whose text-mode handle translated every \n to \r\n a second time, doubling each line ending to \r\r\n on disk. cmd.exe tolerated the stray carriage return so forwarding still worked, but doctor does a byte-exact compare of the on-disk wrapper against the regenerated content and warned differs from expected — run token-goat install to refresh on every run, a nag that reinstalling could never clear because it rewrote the same doubled bytes. The wrapper now goes through atomic_write_bytes, preserving the authored endings verbatim.

Session-cache integrity

Concurrent session saves no longer drop an edit. The save() fast path skipped its compare-and-swap re-read and merge whenever the on-disk (st_mtime, st_size) fingerprint still matched the one captured at load. That fingerprint aliases: two caches whose keys are the same length serialize to byte-identical JSON sizes, and a float st_mtime rounds two sub-microsecond writes to the same value. When two writers collided on both fields the second skipped the merge and overwrote the first, losing exactly one edit — the 200-edit concurrency stress test intermittently saw 199. The fast path now consults an in-process version registry so a same-process writer that already advanced the version forces the stale save back through the merge, and the fingerprint is taken from integer st_mtime_ns instead of the rounded float, so a cross-process skip now requires a true nanosecond-and-size collision rather than a rounding coincidence.