GBrain is a personal knowledge brain and GStack mod for agent platforms. Pluggable engines: PGLite (embedded Postgres via WASM, zero-config default) or Postgres + pgvector
- hybrid search in a managed Supabase instance.
gbrain initdefaults to PGLite; suggests Supabase for 1000+ files. GStack teaches agents how to code. GBrain teaches agents everything else: brain ops, signal detection, content ingestion, enrichment, cron scheduling, reports, identity, and access control.
gbrain aims to be the next Postgres for memory: the most well-tested, widest-coverage, best-for-the-most-at-the-least retrieval + agent memory system for company brains and personal AI, built to serve a billion people. Every feature and every eval is judged against this bar. "gbrain is best" is a WHOLE-SYSTEM claim — proven across the full BrainBench suite (retrieval, longmemeval, calibration, …) — not by any single feature. When scoping an eval, prove the FEATURE delivers value to gbrain users; do not waste it proving that gbrain's particular algorithm beats some other algorithm (a research bake-off, off-mission).
GBrain knowledge is organized along two orthogonal axes. Users AND agents must understand both, or queries misroute silently.
- Brain — WHICH DATABASE. Your personal brain is
host. You can mount additional brains (team-published, each with their own DB and access policy) viagbrain mounts add(v0.19+). Routing:--brain,GBRAIN_BRAIN_ID,.gbrain-mountdotfile. - Source — WHICH REPO INSIDE THE DATABASE. A brain can hold many sources
(wiki, gstack, openclaw, essays). Slugs scope per source. Routing:
--source,GBRAIN_SOURCE,.gbrain-sourcedotfile.
Both axes follow the same 6-tier resolution pattern. Read
docs/architecture/brains-and-sources.md for topology diagrams (personal, team
mount, CEO-class with multiple team brains) and
skills/conventions/brain-routing.md for the agent-facing decision table.
Contract-first: src/core/operations.ts defines ~47 shared operations (v0.29 adds get_recent_salience, find_anomalies, get_recent_transcripts). CLI and MCP
server are both generated from this single source. Engine factory (src/core/engine-factory.ts)
dynamically imports the configured engine ('pglite' or 'postgres'). Skills are fat
markdown files (tool-agnostic, work with both CLI and plugin contexts).
Trust boundary: OperationContext.remote distinguishes trusted local CLI callers
(remote: false set by src/cli.ts) from untrusted agent-facing callers
(remote: true set by src/mcp/server.ts). Security-sensitive operations like
file_upload tighten filesystem confinement when remote=true and default to
strict behavior when unset.
Cross-cutting invariants (must-never-violate, regardless of which file you touch).
These used to be buried across the per-file index; they live here so they always load.
Per-file detail is in docs/architecture/KEY_FILES.md.
- Trust is fail-closed.
OperationContext.remoteis REQUIRED on the type. Anything not strictlyfalseis treated as remote/untrusted (ctx.remote === falsefor trusted-only sites;ctx.remote !== falsefor untrust-unless-explicit-false). Don't default it falsy. - Source isolation. Every read-side op routes through
sourceScopeOpts(ctx); precedence is federated array (ctx.auth.allowedSources) > scalar (ctx.sourceId) > nothing. Don't hand-roll source filtering — a missed thread is a cross-source data leak. - JSONB: never
JSON.stringifyinto a::jsonbcast. postgres.js double-encodes it; PGLite hides the bug. Pass raw objects toengine.executeRaw, or useexecuteRawJsonb. Guarded byscripts/check-jsonb-pattern.sh. - Engine parity.
src/core/postgres-engine.tsandsrc/core/pglite-engine.tsmove in lockstep — a new method/SQL shape lands in BOTH, pinned bytest/e2e/engine-parity.test.ts. Forward-referenced columns/indexes go in the bootstrap probe set (guarded bytest/schema-bootstrap-coverage.test.ts). - Contract-first.
src/core/operations.tsis the single source; CLI + MCP are generated from it. Every op carriesscope: 'read'|'write'|'admin'+ optionallocalOnly. HTTP dispatch enforces scope/localOnly before the handler runs. - Migrations. Schema DDL lives in the
MIGRATIONSarray insrc/core/migrate.ts.CREATE INDEX CONCURRENTLYneedstransaction: false(pre-drop invalid remnants on Postgres; plainCREATE INDEXon PGLite viasqlFor.pglite). - Multi-source. Slug uniqueness is
(source_id, slug), not slug. Key batch ops and reverse-writes on the composite key;validateSourceIdbefore anysource_idpath join. - One canonical chat-pricing table. All paid-cloud chat/completion prices live ONCE in
src/core/model-pricing.ts(CANONICAL_PRICING+canonicalLookup). Every other table (anthropic-pricing.ts'sANTHROPIC_PRICING,takes-quality-eval/pricing.ts'sMODEL_PRICING, the contradictions/cross-modal/skillopt cost views) is a DERIVED view, never a hand-copied duplicate — so cross-table price drift is structurally impossible. Update a price inmodel-pricing.tsonly; each consumer keeps its own key allowlist + miss policy (fail-closed vs warn-only vs null), not its own numbers. Pinned bytest/model-pricing.test.ts(drift guard asserts each view equals canonical). Embeddings price separately inembedding-pricing.ts(different unit).
CLAUDE.md is the always-loaded orientation + dispatcher. Detailed reference loads
on demand — read the linked doc before working in that area. (Same two-layer
pattern gbrain ships for its own skills: thin router in skills/RESOLVER.md, fat
detail on demand.)
| When you're working on... | Read first |
|---|---|
any file in src/ (what it does + its invariants) |
docs/architecture/KEY_FILES.md — find the file's entry |
| search / ranking / hybrid / retrieval | docs/architecture/RETRIEVAL.md + the search/* entries in KEY_FILES.md |
| search modes / cost knobs | docs/guides/search-modes.md |
| schema packs / page types / extraction | docs/architecture/schema-packs.md, type-taxonomy.md, lens-packs.md |
| thin-client / remote MCP / cross-modal | docs/architecture/thin-client.md |
| the CLI surface (commands + flags) | gbrain --help / gbrain --tools-json, plus the relevant KEY_FILES.md entry |
| running or writing tests | docs/TESTING.md |
| bulk-command progress wiring | docs/progress-events.md |
| eval methodology / metrics | docs/eval/ |
| brains vs sources / topology | docs/architecture/brains-and-sources.md, topologies.md |
| skill routing | skills/RESOLVER.md |
| shipping a release / CHANGELOG / PR conventions | docs/RELEASING.md (ship IRON RULES stay inline below) |
The per-file index (## Key files), the thin-client routing seam, and the testing
discipline used to live inline here. They moved to the docs above so this file
stays small enough to load every session. Nothing was lost — the pre-move content
is in git, and the docs carry every load-bearing invariant (compressed to
current-state).
CLAUDE.md grew to ~592KB / ~147k tokens once the per-file index became append-only
(one **vX.Y.Z:** clause per release per file). That is the exact anti-pattern
gbrain exists to fix. The rules that keep it from recurring:
- CLAUDE.md is orientation, not the implementation spec. It carries the North Star, the two axes, architecture + cross-cutting invariants, the resolver, and the inline IRON RULES. Per-file/per-command/per-test detail lives in the reference docs and loads on demand.
- Reference docs (
KEY_FILES.md,thin-client.md,TESTING.md) describe CURRENT behavior only. Release history goes inCHANGELOG.md+ git. Do NOT append**vX.Y.Z (#NNN):**clauses, codex/review tags, or "pre-fix/then/was-now" narration. When a file's behavior changes, UPDATE its entry to the new truth. - CI is the enforcement, not this prose.
scripts/check-key-files-current-state.sh(inbun run verify) fails on the bolded-release-clause marker in the reference docs AND on a CLAUDE.md size cap. A written rule caused this disease; a guard cures it. - After any CLAUDE.md or reference-doc edit, run
bun run build:llms— the llms bundle inlines/links these (config inscripts/llms-config.ts); the freshness + budget test (bun test test/build-llms.test.ts) fails CI otherwise.
GBrain ships three named search modes that bundle the search-lite knobs from
PR #897 into a single config key. Pick one at install time; the rest of the
project resolves through src/core/search/mode.ts.
| Knob | conservative |
balanced |
tokenmax |
|---|---|---|---|
cache.enabled |
true | true | true |
cache.similarity_threshold |
0.92 | 0.92 | 0.92 |
cache.ttl_seconds |
3600 | 3600 | 3600 |
intentWeighting |
true | true | true |
tokenBudget |
4000 | 12000 | off |
expansion (LLM multi-query) |
false | false | true |
searchLimit default |
10 | 25 | 50 |
Cost anchors (downstream agent input cost — gbrain itself is rounding error). The corner-to-corner spread is 25x once you pair mode with downstream model. Chunks ~400 tokens avg. Per-query cost @ 10K queries/month (typical single-user volume), full search payload, no cache savings:
| Mode \ Downstream | Haiku 4.5 ($1/M) | Sonnet 4.6 ($3/M) | Opus 4.7 ($5/M) |
|---|---|---|---|
| conservative (~4K) | $40/mo | $120/mo | $200/mo |
| balanced (~10K) | $100/mo | $300/mo | $500/mo |
| tokenmax (~20K) | $200/mo | $600/mo | $1,000/mo |
Scales linearly: multiply by 10 for 100K/mo (heavy power user / multi-user fleet); divide by 10 for 1K/mo (light usage). Natural pairings span ~4x. Mismatches (tokenmax+Haiku, conservative+Opus) waste capacity differently — too-big payload overwhelms a cheap model; too-small payload starves an expensive one.
tokenmax adds ~$1.50 per 1K queries in Haiku expansion calls on top of
the matrix ($15/mo @ 10K). Cache hits cut all numbers ~50%. The cost
picker copy in gbrain init carries the same matrix verbatim — update
both when refreshing.
Per-query math vs real-world spend. The matrix above is what an
isolated benchmark would measure. Real agent loops with disciplined
Anthropic prompt caching see 50-80% discount on top (cache hits skip
downstream entirely). The realistic-scale anchor in
docs/eval/SEARCH_MODE_METHODOLOGY.md walks the natural pairings at
single-power-user volume (~860 turns/mo): tokenmax+Opus ~$700/mo,
balanced+Sonnet ~$430/mo, conservative+Haiku ~$170/mo. Setups WITHOUT
cache-aware prompt layout (frequent prefix churn) see the per-query
matrix dominate — mode + model choice matters more there.
Resolution chain (matches the v0.31.12 model-tier pattern at
src/core/model-config.ts:resolveModel):
per-call SearchOpts → per-key config (search.cache.enabled, …) →
MODE_BUNDLES[search.mode] → MODE_BUNDLES.balanced (fallback)
Mode resolution lives in bare hybridSearch (NOT just the cached wrapper)
per [CDX-5+6] in ~/.claude/plans/lets-take-a-look-validated-parrot.md — so
gbrain eval replay and gbrain eval longmemeval test the same mode-affected
behavior as the production query op.
Cache-key contamination hotfix [CDX-4]: migration v56 added a
knobs_hash column to query_cache. The lookup filter is now
WHERE source_id = $ AND knobs_hash = $ AND embedding similarity < $ so a
tokenmax write (expansion=on, limit=50) can't be served to a conservative
read.
v0.36.3.0 knobs_hash v=2 → v=3. The hash now folds the active
embedding column name + provider into the cache key, so a query routed
through embedding_voyage (1024d Voyage) can't be served a cache row
written against embedding (1536d OpenAI). Existing v=2 rows become
unreachable on first re-query (one-time miss spike on upgrade);
mode.ts:KNOBS_HASH_VERSION is the single source of truth.
Three CLI surfaces:
gbrain search modes # what is running, with per-knob attribution
gbrain search modes --reset # clear search.* overrides (mode bundle wins)
gbrain search stats [--days N] # cache hit rate, intent mix, budget drops
gbrain search tune [--apply] # data-driven recommendations
The install picker fires inside gbrain init AFTER engine.initSchema()
(non-TTY auto-selects). The upgrade banner fires once via runPostUpgrade
in src/commands/upgrade.ts, gated by search.mode_upgrade_notice_shown.
Every metric printed by any gbrain eval * or gbrain search stats command
resolves through src/core/eval/metric-glossary.ts so industry terms
(P@k, nDCG@k, MRR, Jaccard@k) carry a plain-English line in human
output and a _meta.metric_glossary block in JSON output (one block per
response per [CDX-25], NOT sibling _gloss fields).
The full methodology — datasets, sample selection, pre-registered
expectations, threats to validity, paired-bootstrap + Bonferroni p-value
discipline [CDX-14] — lives in docs/eval/SEARCH_MODE_METHODOLOGY.md.
Auto-regenerated docs/eval/METRIC_GLOSSARY.md is CI-guarded against
drift (scripts/check-eval-glossary-fresh.sh).
Per-run records land at <repo>/.gbrain-evals/eval-results.jsonl per
[CDX-23]. The user's personal ~/.gbrain brain is NEVER touched —
audit trail lives in the source repo's git history.
Read the skill files in skills/ before doing brain operations. GBrain ships 29 skills
organized by skills/RESOLVER.md (AGENTS.md is also accepted as of v0.19):
Original 8 (conformance-migrated): ingest (thin router), query, maintain, enrich, briefing, migrate, setup, publish.
Brain skills (ported from an upstream agent fork): signal-detector, brain-ops, idea-ingest, media-ingest, meeting-ingestion, citation-fixer, repo-architecture, skill-creator, daily-task-manager.
Operational + identity: daily-task-prep, cross-modal-review, cron-scheduler, reports,
testing, soul-audit, webhook-transforms, data-research, minion-orchestrator. As of
v0.20.4, minion-orchestrator is the single unified skill for both lanes of background
work (shell jobs via gbrain jobs submit shell, LLM subagents via gbrain agent run) ...
the prior gbrain-jobs skill was merged in, Preconditions are shared, and trigger
routing is narrowed to what the skill actually covers.
Skillify loop (v0.19): skillify (the markdown orchestration), skillpack-check (agent-readable health report).
Routing-table compression (v0.32.3.0): skills/functional-area-resolver/ —
two-layer dispatch pattern for shrinking large AGENTS.md / RESOLVER.md files
(>=12KB) without losing routing accuracy. Replaces one row per skill with one
entry per functional area, where each area declares its sub-skills in a
(dispatcher for: ...) clause. The static-prompt analog of hierarchical agent
routing (AnyTool arXiv:2402.04253, RAG-MCP
arXiv:2505.03275, Anthropic Agent Skills
progressive disclosure). Empirically validated across Opus 4.7 / Sonnet 4.6 /
Haiku 4.5: +13 to +17pp over the verbose baseline at 48% the size (25KB → 13KB
on a real fork). The (dispatcher for: ...) clause is the load-bearing signal
— strip it and lenient accuracy collapses to 41.7% on Sonnet (the
resolver-of-resolvers ablation case). A/B eval surface lives at
evals/functional-area-resolver/ (outside skills/ deliberately so the
skillpack bundler doesn't ship eval infrastructure to downstream installs):
gateway-routed TypeScript harness, 20 training + 5 held-out fixtures, strict +
lenient scoring, three committed cross-model receipts in baseline-runs/.
Receipt header binds (model, prompt_template_hash, fixtures_hash, harness_sha,
ts) so future contributors can verify reproduction. Companion rescore.mjs
re-scores existing JSONL with lenient tolerance for zero API cost. Reproduce
with cd evals/functional-area-resolver && node harness.mjs --model {opus|sonnet|haiku} (~$0.30–1.70 per model). Nine v0.33.x follow-up TODOs
filed for held-out corpus growth, cross-vendor verification, hierarchical
area-of-areas, embedding-based pre-router, and the run-1 vs run-2
prompt-design ablation methodology.
Operational health (v0.19.1): smoke-test (8 post-restart health checks with auto-fix
for Bun, CLI, DB, worker, Zod CJS, gateway, API key, brain repo; user-extensible via
~/.gbrain/smoke-tests.d/*.sh).
Conventions: skills/conventions/ has cross-cutting rules (quality, brain-first,
model-routing, test-before-bulk, cross-modal). skills/_brain-filing-rules.md and
skills/_output-rules.md are shared references.
All bulk commands (doctor, embed, import, export, sync, extract, migrate,
repair-jsonb, orphans, check-backlinks, lint, integrity auto, eval, files
sync, and apply-migrations) stream progress through the shared reporter
at src/core/progress.ts. Agents get heartbeats within 1 second of every
iteration regardless of how slow the underlying work is.
Rules:
- Progress always writes to stderr. Stdout stays clean for data output
(
--jsonpayloads, final summaries, JSON action events fromextract). - Non-TTY default: plain one-line-per-event human text. JSON requires the
explicit
--progress-jsonflag. - Global flags (
--quiet,--progress-json,--progress-interval=<ms>) are parsed bysrc/core/cli-options.tsBEFORE command dispatch. - Phase names are machine-stable
snake_case.dot.path(e.g.doctor.db_checks,sync.imports). Documented indocs/progress-events.md; additive changes only. scripts/check-progress-to-stdout.shis a CI guard that fails the build if any new code writes\rprogress to stdout. Wired intobun run test.- Minion handlers pass
job.updateProgressas theonProgresscallback to core functions (DB-backed primary progress channel); stderr fromjobs workstays coarse for daemon liveness only.
When wiring a new bulk command: import { createProgress } from '../core/progress.ts'
and import { getCliOptions, cliOptsToProgressOptions } from '../core/cli-options.ts'.
Create a reporter with createProgress(cliOptsToProgressOptions(getCliOptions())),
start(phase, total?) before the loop, tick() inside it, finish() after.
For single long-running queries, use startHeartbeat(reporter, note) with a
try/finally to guarantee cleanup. Never call process.stdout.write('\r...')
in bulk paths, the CI guard will fail the build.
Iron rule: when running bun test, bun run test:e2e, bun run typecheck,
or any other test/check command, redirect to a file FIRST, then tail the file
separately:
# RIGHT — full output preserved, real exit code visible
bun test > /tmp/ship_units.txt 2>&1
echo "EXIT=$?"
tail -50 /tmp/ship_units.txt
grep -E '(fail\)|✗|error:' /tmp/ship_units.txt | head -30# WRONG — exit code is `tail`'s (always 0), failures truncated, ship gates fail open
bun test 2>&1 | tail -10The pipe form silently breaks /ship Step T1 (test failure ownership triage) and the test verification gate (Step 16) because:
$?after a pipe is the LAST command's exit code (tail→ 0), not bun's- bun prints failure details before the summary line, so
tail -Ndrops them - Step T1 needs the full failure list to classify in-branch vs pre-existing
This bit us during v0.26.2 ship: bun test 2>&1 | tail -10 reported "3911 pass / 23 fail"
but no failure details survived, forcing a 23-minute re-run to triage.
Apply the same pattern to any long-running command whose exit code matters:
bun run typecheck, bun run ci:local, migration runs, eval suites, etc.
For background tasks (run_in_background: true), the harness captures the exit
file separately — use it via the bg task's <id>.exit file, not the streamed
output.
bun build --compile --outfile bin/gbrain src/cli.ts
Every release advances the version in five files at once. Keep these in
sync. /ship enforces this via Step 12's idempotency check (VERSION vs
package.json drift), but the canonical list lives here so future runs and
the auto-update agent know where to look.
Version format is mandatory: MAJOR.MINOR.PATCH.MICRO (four numeric
segments, dot-separated, no leading v). Every new release MUST use the
4-segment form. The .MICRO slot is the dot-suffix follow-up channel: when
a release ships its commit subject ahead of its VERSION bump (e.g. PR #795
landing as v0.31.4 without bumping the file), the corrective ship lands
as 0.31.4.1 rather than churning the patch number to 0.31.5. Suffixes
like -fixwave are still allowed as needed (0.31.1.1-fixwave), but the
four numeric segments are required first. Historical 3-segment versions
(0.31.3, 0.22.1) remain valid in git log and migration filenames
(skills/migrations/v0.21.0.md); do NOT rewrite them. Going forward only.
Required (every release must update all five):
| File | What lives there | Format |
|---|---|---|
VERSION |
The single source of truth. Read first by /ship, the binary, and CI version-gate. |
Bare 4-segment string MAJOR.MINOR.PATCH.MICRO (e.g. 0.31.4.1), no leading v. |
package.json |
Bun/npm package version. gbrain --version reads it via the compiled binary's bundled package metadata. CI version-gate cross-checks this against VERSION and fails if they drift. |
"version": "0.31.4.1" |
CHANGELOG.md |
Top entry header ## [0.31.4.1] - YYYY-MM-DD plus the "To take advantage of v0.31.4.1" block. |
Standard Keep-a-Changelog header. |
TODOS.md |
Any TODO entries that mention "follow-up from vX.Y.Z.W" use the version of the release that filed them. Update only when filing NEW follow-up TODOs. | Inline vX.Y.Z.W references in TODO bodies. |
CLAUDE.md |
The Key Files section's per-file annotations carry vX.Y.Z.W (#NNN) tags noting which release introduced a behavior. Update whenever a wave's annotations get folded in. |
Inline vX.Y.Z.W (#NNN, contributed by @user) references. |
Auto-derived (no manual edit; refreshed by their own commands):
bun.lock— root-package version is auto-pinned frompackage.json. After bumpingpackage.json, runbun installto refresh the lockfile.llms-full.txt/llms.txt— auto-generated documentation bundles. Any CLAUDE.md edit MUST be followed bybun run build:llmsin the same commit (or a follow-up commit before push). The committed bundles are checked against fresh generator output bytest/build-llms.test.ts, which runs in CI shard 1. If you edited CLAUDE.md and didn't regenerate, CI will fail. This has bitten the wave 3 times — every CLAUDE.md edit gets abun run build:llmschaser, no exceptions. (Theverifygate doesn't run this test; only the full unit suite does. Sobun run typecheckclean is NOT enough to know you can push after a CLAUDE.md edit.)
Historical (DO NOT bump on release):
skills/migrations/v0.21.0.md— migration files use the version they shipped FROM as their filename. v0.21.0's migration always says v0.21.0.src/commands/migrations/v0_21_0.ts— same: migration code references the schema version it migrates to.test/migrations-v0_21_0.test.ts,test/migration-orchestrator-v0_21_0.test.ts,test/migrate.test.ts— migration tests reference historical migration versions; these are correct as-is and should not move.src/core/db.ts,src/core/migrate.ts,src/core/import-file.ts,src/commands/reindex-code.ts— code comments cite the release that introduced a feature. Once written, these are historical record.README.md— references the latest published feature names by version (e.g. "v0.21.0 Code Cathedral"); update only when the README's marketing copy is intentionally being refreshed, NOT on every micro/patch bump.
The /ship workflow's version idempotency check: Step 12 reads
VERSION and package.json, classifies as FRESH / ALREADY_BUMPED /
DRIFT_STALE_PKG / DRIFT_UNEXPECTED, and refuses to proceed on
DRIFT_UNEXPECTED. This is why the two must move together.
The CI version-gate rejects pushes where VERSION and
package.json disagree, OR where VERSION is not strictly greater
than master's VERSION. If a queue collision claims your version on
master before yours lands, /ship's queue-aware allocator (Step 12)
will detect drift and re-bump on the next run.
Mandatory version-consistency audit (run after EVERY merge or commit that touches VERSION, package.json, or CHANGELOG)
The trio MUST agree. Every merge from master will hit conflicts on VERSION + package.json + CHANGELOG.md because master ships its own version bumps. Auto-merge sometimes resolves these silently in unexpected ways. After any merge, branch update, or version-related edit, run this audit. It's three lines and never lies:
echo "VERSION: $(cat VERSION)"
echo "package.json: $(node -e 'process.stdout.write(require("./package.json").version)')"
grep -E "^## \[" CHANGELOG.md | head -1All three MUST show the same MAJOR.MINOR.PATCH.MICRO. If any one
disagrees, you have not finished the merge. Fix it before pushing or
shipping. There is no situation in which "I'll fix it next push" is OK,
because:
- A green local test run with mismatched VERSION/package.json still fails the CI version-gate.
- A green CHANGELOG entry under the wrong version header silently lies to release-notes consumers.
- /ship's Step 12 idempotency check classifies a mismatch as
DRIFT_UNEXPECTEDand HALTS — but only if you remember to run /ship before pushing. Manualgit pushskips the check.
When git merge origin/master reports conflicts on VERSION,
package.json, or CHANGELOG.md, resolve in this exact order:
- VERSION — overwrite with the wave's version (`echo -n "X.Y.Z.W"
VERSION`). Highest semver wins; do NOT take master's lower version.
- package.json — strip the conflict markers, keep the wave's
version line. Sed pattern:
sed -i.bak '/^<<<<<<< HEAD$/d; /^=======$/,/^>>>>>>> /d' package.json && rm package.json.bak(assumes ours is above the=======). - CHANGELOG.md — strip ALL three conflict markers; both your entry
and master's entry stay. Sed pattern:
sed -i.bak '/^<<<<<<< HEAD$/d; /^=======$/d; /^>>>>>>> origin\/master$/d' CHANGELOG.md && rm CHANGELOG.md.bakThen verify your entry is the topmost## [X.Y.Z.W]and master's newer-than-yours entries (if any) sit below. - Run the 3-line audit above. If it doesn't show your version on all three lines, you missed a marker.
- Run
bun installto refreshbun.lockagainst the resolvedpackage.json. Stage and commit if it changed. - Run
bun run typecheckbefore committing the merge. - Only THEN run
git commitfor the merge.
If the audit shows drift after step 4, do NOT proceed to step 5. Re-run steps 1-3 against the actual file content; you missed a marker or resolved one in the wrong direction.
Anti-pattern to avoid: Resolving via git checkout --ours package.json
and git checkout --theirs scripts/test-shard.sh mixed in the same
commit. The selective directional resolution is fine, but on
VERSION/package.json/CHANGELOG specifically, ALWAYS use the explicit
echo > VERSION + sed-strip-markers pattern above. The directional
checkout flags have bitten us when the conflict shape was unexpected
(e.g. master stripped a section we expected to keep).
Before any git push of a merge commit, run the audit one more time:
echo "VERSION: $(cat VERSION)"
echo "package.json: $(node -e 'process.stdout.write(require("./package.json").version)')"
grep -E "^## \[" CHANGELOG.md | head -1If you've been editing the branch via /ship you can rely on Step 12's
idempotency check. If you've been editing manually (merge resolution,
conflict fix, version bump), the audit is the last line of defense
before CI yells at you.
Conductor workspaces expect the git branch name to match the workspace directory name. When they disagree, Conductor silently fails to render the PR view + show ship state, leading to "did you actually push?" confusion.
Check this FIRST on every ship and BEFORE creating any PR:
WORKSPACE=$(basename "$PWD") # e.g. puebla-v4
BRANCH=$(git branch --show-current) # e.g. garrytan/gstack-requests
case "$BRANCH" in
*/"$WORKSPACE") echo "OK: branch tail matches workspace" ;;
"$WORKSPACE") echo "OK: branch == workspace" ;;
*) echo "MISMATCH: branch=$BRANCH workspace=$WORKSPACE — RENAME BEFORE SHIPPING" ;;
esacIf MISMATCH (branch is garrytan/foo but workspace is puebla-v4):
# Rename local, push under new name, delete old remote (and old PR if it
# was already created — github auto-closes it when head ref dies).
git branch -m garrytan/<workspace-name>
git push -u origin garrytan/<workspace-name>
git push origin --delete <old-branch-name>
# If a PR existed against the old branch:
# gh pr comment <old-pr> --body "Superseded by #<new>: branch renamed to match Conductor workspace."
# gh pr create --base master --title "..." --body "..." # recreate from renamed branchCaught the hard way on v0.41.9.0 ship: workspace puebla-v4 but branch
garrytan/gstack-requests produced PR #1439 that Conductor wouldn't
display. Renamed to garrytan/puebla-v4; recreated as #1440.
The /ship workflow's Step 1 should be augmented to run the mismatch
check; until that lands upstream, ALWAYS run the check above before
/ship invokes its first push or PR-create step.
Before any ship, read docs/RELEASING.md in full. It carries the
full release + contributor process: pre-ship test requirements (bun run ci:local / the
E2E lifecycle), the CHANGELOG voice + release-summary template, the "To take advantage of
vX" self-repair block, version migrations, the GitHub Actions SHA refresh, PR conventions,
and the community-PR-wave process. Use /ship — never hand-roll a release.
The ship-critical IRON RULES stay inline in this file (do NOT relocate them): the
Version-locations table above (the 5-file sync + the 3-line VERSION/package.json/CHANGELOG
audit), the Conductor branch=workspace rule (above), Post-ship /document-release (below),
the Privacy + Responsible-disclosure rules (below), and the PR-title-version-first rule
(below).
After EVERY /ship, you MUST run /document-release. This is NOT optional. Do NOT skip it. Do NOT say "docs look fine" without running it. The skill reads every .md file in the project, cross-references the diff, and updates anything that drifted.
If /ship's Step 8.5 triggers document-release automatically, that counts. But if it gets skipped for ANY reason (timeout, error, oversight), you MUST run it manually before considering the ship complete.
Files that MUST be checked on every ship:
- README.md — does it reflect new features, commands, or setup steps?
- CLAUDE.md — does it reflect new files, test files, or architecture changes?
- CHANGELOG.md — does it cover every commit?
- TODOS.md — are completed items marked done?
- docs/ — do any guides need updating?
A ship without updated docs is an incomplete ship. Period.
Never reference real people, companies, funds, or private agent names in any
public-facing artifact. Public artifacts include: CHANGELOG.md, README.md,
docs/, skills/, PR titles + bodies, commit messages, and comments in checked-in
code. Query examples, benchmark stories, and migration guides MUST use generic
placeholders.
Why: gbrain runs a personal knowledge brain containing notes on real people and
real companies (YC founders, portfolio companies, funds, investors, meeting
attendees). When a doc copies a query like gbrain graph diana-hu --depth 2 or
names a specific agent fork like Wintermute, that real name gets indexed by
search engines, surfaced in cross-references, and distributed with every release.
Name mapping to use in examples:
- Agent forks →
your agent fork,a downstream agent, oragent-fork - Example person →
alice-example,charlie-example, ora-founder - Example company →
acme-example,widget-co, ora-company - Example fund →
fund-a,fund-b,fund-c - Example deal →
acme-seed,widget-series-a - Example meeting →
meetings/2026-04-03(generic date is fine) - Example user →
youorthe user, never a proper name
Specific rule: never say Wintermute in any CHANGELOG, README, doc, PR, or
commit message. When the temptation is to illustrate with the real fork name:
- Reader-facing copy →
your OpenClaw(covers Wintermute, Hermes, AlphaClaw, and any other downstream OpenClaw deployment in one term the reader already recognizes). - First-person / origin-story copy →
Garry's OpenClaw(honest that this is the production deployment driving the feature, without exposing the private agent's name).
Wintermute may appear in private artifacts (scratch plans under
~/.gstack/projects/…, memory files, conversation transcripts, CEO-review
plans) — those aren't distributed. Anything checked into this repo or shipped
in a release must use the OpenClaw phrasing above. Sweeping a stale reference
is a small clean-up PR, not a debate.
When in doubt, ask yourself: "Would this query reveal private information about the user's contacts, investments, or portfolio if it were read by a stranger?" If yes, replace with generic placeholders.
Illustrative API examples with household-brand companies (Stripe, Brex, OpenAI, GitHub, etc.) are fine — they're public entities, not contacts in anyone's brain. Do not confuse illustrative API examples with queries that reveal real relationships.
When a release fixes a security gap or a user-impacting bug, describe the fix functionally. Do not enumerate the attack surface, quantify the exposure window, or highlight the most sensitive records by name in public-facing artifacts.
Public-facing artifacts include: CHANGELOG.md, README.md, docs/, PR titles
and bodies, commit messages, GitHub issue titles and comments, release pages,
tweets, blog posts.
Don't write:
- "10 tables were publicly readable by the anon key for months, including X, Y, Z"
- "X and Y are the most sensitive ones"
- "N tables exposed. Fix: enable RLS on these specific tables: ..."
Do write:
- "Security hardening pass. Fresh installs secure by default. Existing brains brought to the same bar automatically on upgrade."
- "If
gbrain doctorstill flags anything after upgrade, the message names each table and gives the exact fix."
Why: anyone reading the release page before they've upgraded now has a directed
probe list for unpatched installs. The source code ships the specifics anyway
(src/schema.sql, src/core/migrate.ts, test fixtures) — reverse engineers can
get them. But the release page is a broadcast channel. Don't hand attackers a
curated list with a banner.
The test: if a reader with no prior context could read the release note and walk away knowing "gbrain at version X has table Y readable by anon key until they patch," the note is too specific. Rewrite until that's no longer possible.
What IS fine in public artifacts:
- The mechanism of the fix ("the check now scans every public table instead of a hardcoded allowlist").
- User-facing operator ergonomics (the escape-hatch SQL template, the upgrade commands, the breaking-change flag).
- Credit to contributors.
- Generic framing of severity ("security posture tightening pass") without quantification.
What stays in private artifacts (plan files, private memories, internal docs):
- Specific table names, record counts, exposure duration.
- Which records stand out as highest-risk.
- Detailed before/after tables in the "numbers that matter" format.
If the CEO/Eng review of a plan produces a detailed exposure table, keep it in
the plan file under ~/.claude/plans/ or ~/.gstack/projects/. Don't copy it
into the CHANGELOG or PR body.
Applies retroactively: if you see a prior CHANGELOG entry naming attack-surface specifics, scrub it as a small cleanup commit, the same way a stale Wintermute reference gets swept.
Every PR title MUST start with the version, then the conventional-commit subject:
vMAJOR.MINOR.PATCH.MICRO <type>(<scope>): <summary> (#issue or wave ref)
Example (correct): v0.42.3.0 feat(search): autocut — score-discontinuity result-sizing (#1663 wave 1)
The version goes at the BEGINNING, never the end. This matches the repo's
commit-subject convention (git log shows v0.41.38.0 fix: ...,
v0.42.1.0 feat: ...) so the PR list, the merge commit, and the changelog all
read version-first. A title with the version parenthesized at the end
(feat(search): autocut ... (v0.42.3.0)) is WRONG — fix it with
gh pr edit <N> --title "vX.Y.Z.W <type>: <summary>".
This applies to gh pr create and every gh pr edit --title. When /ship
(or any flow) sets a PR title, the version is the first token. Same rule for the
final commit subject that carries the version bump.
When the user's request matches an available skill, ALWAYS invoke it using the Skill tool as your FIRST action. Do NOT answer directly, do NOT use other tools first. The skill has specialized workflows that produce better results than ad-hoc answers.
NEVER hand-roll ship operations. Do not manually run git commit + push + gh pr
create when /ship is available. /ship handles VERSION bump, CHANGELOG, document-release,
pre-landing review, test coverage audit, and adversarial review. Manually creating a PR
skips all of these. If the user says "commit and ship", "push and ship", "bisect and
ship", or any combination that ends with shipping — invoke /ship and let it handle
everything including the commits. If the branch name contains a version (e.g.
v0.5-live-sync), /ship should use that version for the bump.
Key routing rules:
- Product ideas, "is this worth building", brainstorming → invoke office-hours
- Bugs, errors, "why is this broken", 500 errors → invoke investigate
- Ship, deploy, push, create PR, "commit and ship", "push and ship" → invoke ship
- QA, test the site, find bugs → invoke qa
- Code review, check my diff → invoke review
- Update docs after shipping → invoke document-release
- Weekly retro → invoke retro
- Design system, brand → invoke design-consultation
- Visual audit, design polish → invoke design-review
- Architecture review → invoke plan-eng-review
- Save progress, checkpoint, resume → invoke checkpoint
- Code quality, health check → invoke health