Commit ce9bb1b

feat: initial commit
0 parents  commit ce9bb1b

426 files changed

Lines changed: 85433 additions & 0 deletions

.agents/skills/diagnose/SKILL.md

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
---
name: diagnose
description: Disciplined diagnosis loop for hard bugs and performance regressions. Reproduce → minimise → hypothesise → instrument → fix → regression-test. Use when user says "diagnose this" / "debug this", reports a bug, says something is broken/throwing/failing, or describes a performance regression.
---

# Diagnose

A discipline for hard bugs. Skip phases only when explicitly justified.

When exploring the codebase, use the project's domain glossary to get a clear mental model of the relevant modules, and check ADRs in the area you're touching.

## Phase 1 — Build a feedback loop

**This is the skill.** Everything else is mechanical. If you have a fast, deterministic, agent-runnable pass/fail signal for the bug, you will find the cause — bisection, hypothesis-testing, and instrumentation all just consume that signal. If you don't have one, no amount of staring at code will save you.

Spend disproportionate effort here. **Be aggressive. Be creative. Refuse to give up.**

### Ways to construct one — try them in roughly this order

1. **Failing test** at whatever seam reaches the bug — unit, integration, e2e.
2. **Curl / HTTP script** against a running dev server.
3. **CLI invocation** with a fixture input, diffing stdout against a known-good snapshot.
4. **Headless browser script** (Playwright / Puppeteer) — drives the UI, asserts on DOM/console/network.
5. **Replay a captured trace.** Save a real network request / payload / event log to disk; replay it through the code path in isolation.
6. **Throwaway harness.** Spin up a minimal subset of the system (one service, mocked deps) that exercises the bug code path with a single function call.
7. **Property / fuzz loop.** If the bug is "sometimes wrong output", run 1000 random inputs and look for the failure mode.
8. **Bisection harness.** If the bug appeared between two known states (commit, dataset, version), automate "boot at state X, check, repeat" so you can `git bisect run` it.
9. **Differential loop.** Run the same input through old-version vs new-version (or two configs) and diff outputs.
10. **HITL bash script.** Last resort. If a human must click, drive _them_ with `scripts/hitl-loop.template.sh` so the loop is still structured. Captured output feeds back to you.

Build the right feedback loop, and the bug is 90% fixed.
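As a rough sketch of option 7, the loop below checks a property over seeded random inputs and returns the first failing seed so the failure can be replayed exactly. The names `mulberry32`, `fuzz`, and `buggyClamp` are illustrative and not part of this skill; the clamp is a made-up system under test.

```ts
// Small deterministic PRNG (mulberry32 variant) so every input is reproducible from its seed.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0
  return () => {
    state = (state + 0x6d2b79f5) >>> 0
    let t = state
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

type FuzzFailure<T> = { seed: number; input: T }

// Generates one input per seed and returns the first input that violates the property.
function fuzz<T>(
  generate: (rand: () => number) => T,
  holds: (input: T) => boolean,
  runs: number,
  baseSeed = 1,
): FuzzFailure<T> | null {
  for (let i = 0; i < runs; i++) {
    const seed = baseSeed + i
    const input = generate(mulberry32(seed))
    if (!holds(input)) return { seed, input }
  }
  return null
}

// Hypothetical system under test: a clamp to [0, 100] that forgets the lower bound.
const buggyClamp = (x: number): number => Math.min(x, 100)

const failure = fuzz(
  (rand) => Math.floor(rand() * 400) - 200, // integers in [-200, 199]
  (x) => {
    const y = buggyClamp(x)
    return y >= 0 && y <= 100
  },
  1000,
)
```

Because each input is derived from its seed alone, `failure.seed` is enough to rebuild the failing input later, which keeps the loop deterministic.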

### Iterate on the loop itself

Treat the loop as a product. Once you have _a_ loop, ask:

- Can I make it faster? (Cache setup, skip unrelated init, narrow the test scope.)
- Can I make the signal sharper? (Assert on the specific symptom, not "didn't crash".)
- Can I make it more deterministic? (Pin time, seed RNG, isolate filesystem, freeze network.)

A 30-second flaky loop is barely better than no loop. A 2-second deterministic loop is a debugging superpower.

### Non-deterministic bugs

The goal is not a clean repro but a **higher reproduction rate**. Loop the trigger 100×, parallelise, add stress, narrow timing windows, inject sleeps. A 50%-flake bug is debuggable; 1% is not — keep raising the rate until it's debuggable.
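One way to make "raising the rate" measurable is to count how often the trigger reproduces over a fixed number of runs. This is a minimal sketch: `reproductionRate` and `reproducesEvery` are hypothetical names, and the fake trigger stands in for a real (usually asynchronous) repro step.

```ts
// Runs a trigger repeatedly and reports the fraction of runs that reproduced the bug.
function reproductionRate(trigger: () => boolean, runs: number): number {
  if (runs <= 0) throw new RangeError("runs must be positive")
  let reproduced = 0
  for (let i = 0; i < runs; i++) {
    if (trigger()) reproduced++
  }
  return reproduced / runs
}

// Hypothetical stand-in for a flaky bug: reproduces on every nth attempt.
const reproducesEvery = (n: number): (() => boolean) => {
  let attempt = 0
  return () => ++attempt % n === 0
}

const rateBefore = reproductionRate(reproducesEvery(100), 200) // 0.01
const rateAfter = reproductionRate(reproducesEvery(2), 200) // 0.5
```

Re-measuring after each change (added stress, a narrower timing window) shows whether the change actually moved the rate.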

### When you genuinely cannot build a loop

Stop and say so explicitly. List what you tried. Ask the user for: (a) access to whatever environment reproduces it, (b) a captured artifact (HAR file, log dump, core dump, screen recording with timestamps), or (c) permission to add temporary production instrumentation. Do **not** proceed to hypothesise without a loop.

Do not proceed to Phase 2 until you have a loop you believe in.

## Phase 2 — Reproduce

Run the loop. Watch the bug appear.

Confirm:

- [ ] The loop produces the failure mode the **user** described — not a different failure that happens to be nearby. Wrong bug = wrong fix.
- [ ] The failure is reproducible across multiple runs (or, for non-deterministic bugs, reproducible at a high enough rate to debug against).
- [ ] You have captured the exact symptom (error message, wrong output, slow timing) so later phases can verify the fix actually addresses it.

Do not proceed until you reproduce the bug.

## Phase 3 — Hypothesise

Generate **3–5 ranked hypotheses** before testing any of them. Single-hypothesis generation anchors on the first plausible idea.

Each hypothesis must be **falsifiable**: state the prediction it makes.

> Format: "If <X> is the cause, then <changing Y> will make the bug disappear / <changing Z> will make it worse."

If you cannot state the prediction, the hypothesis is a vibe — discard or sharpen it.
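If it helps to keep the list honest, the format can be held as data. The sketch below is only an illustration: the `Hypothesis` shape, the `likelihood` score, and the sample entries are assumptions, not something this skill prescribes.

```ts
type Hypothesis = {
  cause: string
  prediction: string // what changing Y or Z should do to the bug
  likelihood: number // subjective 0..1 score, used only for ranking
}

// Drops hypotheses that state no prediction, then ranks the rest by likelihood.
function rankHypotheses(candidates: Hypothesis[]): Hypothesis[] {
  return candidates
    .filter((h) => h.prediction.trim().length > 0)
    .sort((a, b) => b.likelihood - a.likelihood)
}

const ranked = rankHypotheses([
  { cause: "stale cache entry", prediction: "disabling the cache makes the bug disappear", likelihood: 0.5 },
  { cause: "something with timezones", prediction: "", likelihood: 0.9 },
  { cause: "race between two writers", prediction: "serialising writes makes the bug disappear", likelihood: 0.3 },
])
```

The timezone entry is dropped despite its high score, because it makes no prediction that a probe could confirm or refute.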

**Show the ranked list to the user before testing.** They often have domain knowledge that re-ranks instantly ("we just deployed a change to #3"), or know hypotheses they've already ruled out. Cheap checkpoint, big time saver. Don't block on it — proceed with your ranking if the user is AFK.

## Phase 4 — Instrument

Each probe must map to a specific prediction from Phase 3. **Change one variable at a time.**

Tool preference:

1. **Debugger / REPL inspection** if the env supports it. One breakpoint beats ten logs.
2. **Targeted logs** at the boundaries that distinguish hypotheses.
3. Never "log everything and grep".

**Tag every debug log** with a unique prefix, e.g. `[DEBUG-a4f2]`. Cleanup at the end becomes a single grep. Untagged logs survive; tagged logs die.
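A small helper can make the tag hard to forget. This is a sketch under assumptions: `createDebugLog` is a hypothetical name, and the sink is injected so the example stays self-contained (in practice it would usually be the project's logger).

```ts
// Returns a logger that prefixes every line with a unique, greppable tag.
function createDebugLog(tag: string, sink: (line: string) => void) {
  const prefix = `[DEBUG-${tag}]`
  return (message: string, data?: unknown): void => {
    const suffix = data === undefined ? "" : ` ${JSON.stringify(data)}`
    sink(`${prefix} ${message}${suffix}`)
  }
}

const lines: string[] = []
const debug = createDebugLog("a4f2", (line) => lines.push(line))

debug("cache miss", { key: "user:42" })
// lines[0] is: [DEBUG-a4f2] cache miss {"key":"user:42"}
```

Both the helper's call sites and the lines it emits carry the same tag, so one search finds everything that has to be removed.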

**Perf branch.** For performance regressions, logs are usually the wrong tool. Instead: establish a baseline measurement (timing harness, `performance.now()`, profiler, query plan), then bisect. Measure first, fix second.
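A timing harness for the baseline can be very small. The sketch below takes the median of several samples to dampen noise; `measureMedianMs` is a hypothetical helper, and the clock is injected (defaulting to `Date.now`) so a higher-resolution source such as `performance.now()` can be passed in where the environment provides it.

```ts
// Times `work` several times and returns the median duration in milliseconds.
function measureMedianMs(
  work: () => void,
  samples: number,
  now: () => number = () => Date.now(),
): number {
  if (samples < 1) throw new RangeError("samples must be at least 1")
  const durations: number[] = []
  for (let i = 0; i < samples; i++) {
    const start = now()
    work()
    durations.push(now() - start)
  }
  durations.sort((a, b) => a - b)
  const mid = Math.floor(durations.length / 2)
  return durations.length % 2 === 1
    ? durations[mid]
    : (durations[mid - 1] + durations[mid]) / 2
}

// Hypothetical workload; replace with the code path under suspicion.
const sortSmallArray = (): void => {
  [5, 3, 8, 1].slice().sort((a, b) => a - b)
}

const baselineMs = measureMedianMs(sortSmallArray, 5)
```

Recording `baselineMs` on the known-good state gives the bisection a number to compare against at each step.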

## Phase 5 — Fix + regression test

Write the regression test **before the fix** — but only if there is a **correct seam** for it.

A correct seam is one where the test exercises the **real bug pattern** as it occurs at the call site. If the only available seam is too shallow (single-caller test when the bug needs multiple callers, unit test that can't replicate the chain that triggered the bug), a regression test there gives false confidence.

**If no correct seam exists, that itself is the finding.** Note it. The codebase architecture is preventing the bug from being locked down. Flag this for the next phase.

If a correct seam exists:

1. Turn the minimised repro into a failing test at that seam.
2. Watch it fail.
3. Apply the fix.
4. Watch it pass.
5. Re-run the Phase 1 feedback loop against the original (un-minimised) scenario.

## Phase 6 — Cleanup + post-mortem

Required before declaring done:

- [ ] Original repro no longer reproduces (re-run the Phase 1 loop)
- [ ] Regression test passes (or absence of seam is documented)
- [ ] All `[DEBUG-...]` instrumentation removed (`grep` the prefix)
- [ ] Throwaway prototypes deleted (or moved to a clearly-marked debug location)
- [ ] The hypothesis that turned out correct is stated in the commit / PR message — so the next debugger learns

**Then ask: what would have prevented this bug?** If the answer involves architectural change (no good test seam, tangled callers, hidden coupling) hand off to the `/improve-codebase-architecture` skill with the specifics. Make the recommendation **after** the fix is in, not before — you have more information now than when you started.
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
#!/usr/bin/env bash
# Human-in-the-loop reproduction loop.
# Copy this file, edit the steps below, and run it.
# The agent runs the script; the user follows prompts in their terminal.
#
# Usage:
#   bash hitl-loop.template.sh
#
# Two helpers:
#   step "<instruction>"       → show instruction, wait for Enter
#   capture VAR "<question>"   → show question, read response into VAR
#
# At the end, captured values are printed as KEY=VALUE for the agent to parse.

set -euo pipefail

step() {
  printf '\n>>> %s\n' "$1"
  read -r -p " [Enter when done] " _
}

capture() {
  local var="$1" question="$2" answer
  printf '\n>>> %s\n' "$question"
  read -r -p " > " answer
  printf -v "$var" '%s' "$answer"
}

# --- edit below ---------------------------------------------------------

step "Open the app at http://localhost:3000 and sign in."

capture ERRORED "Click the 'Export' button. Did it throw an error? (y/n)"

capture ERROR_MSG "Paste the error message (or 'none'):"

# --- edit above ---------------------------------------------------------

printf '\n--- Captured ---\n'
printf 'ERRORED=%s\n' "$ERRORED"
printf 'ERROR_MSG=%s\n' "$ERROR_MSG"
Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
# Docs to Types Reference

Use this only when `SKILL.md` needs concrete examples or scope boundaries.

## Allowed depth

Allowed:

- domain types, schemas/parsers, brands
- discriminated unions/state models
- smart constructors for validated values
- service/interface seams
- typed errors/results
- composition topology: modules, layers, factories, providers, registries, etc.
- memory/no-op adapters needed for typecheck or architecture tests
- production stubs that fail with typed `NotImplemented` errors
- architecture tests, lint rules, or import-boundary checks
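As an illustration of the stub and memory-adapter items above, here is a minimal sketch. The `Result` shape, the `InvoiceStore` seam, and every name in it are assumptions for this example (and the seam is synchronous only to keep it short); a real project would use its own result type and seams.

```ts
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E }
type NotImplemented = { kind: "NotImplemented"; operation: string }

const notImplemented = (operation: string): Result<never, NotImplemented> => ({
  ok: false,
  error: { kind: "NotImplemented", operation },
})

// Hypothetical seam, not taken from any real codebase.
type Invoice = { id: string; totalCents: number }
interface InvoiceStore {
  findInvoice(id: string): Result<Invoice | null, NotImplemented>
}

// Production slot: typechecks and wires up, but fails loudly with a typed error.
const productionInvoiceStore: InvoiceStore = {
  findInvoice: () => notImplemented("InvoiceStore.findInvoice"),
}

// Memory adapter: just enough for typecheck and architecture tests.
const createMemoryInvoiceStore = (invoices: Invoice[]): InvoiceStore => ({
  findInvoice: (id) => ({ ok: true, value: invoices.find((i) => i.id === id) ?? null }),
})
```

The stub keeps the production slot visible in the composition without pretending that persistence exists.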

Not allowed unless the user expands scope:

- feature/business behavior
- real persistence queries or network calls
- real HTTP handlers beyond contracts/boundaries
- after-the-fact broad refactors unrelated to codifying approved context
- speculative helper libraries or generic frameworks without concrete use cases

## Fact buckets

Look for facts in these categories:

1. **Language** — canonical terms and forbidden aliases.
2. **Shape** — variants, states, invariants, cardinality, allowed combinations, impossible states.
3. **Boundary** — which module owns which responsibility.
4. **Dependency** — which modules may import/call which others.
5. **Runtime** — services/interfaces, composition, config, time, absence/nullability, error policy.
6. **Adapter** — production/test adapters and external infrastructure boundaries.
7. **Call stack** — production and test paths through the system.
8. **Error** — expected failures and where they are translated.
9. **Edge** — where raw external input becomes validated domain data.
10. **Test surface** — public seams future behavior tests should target.

## Example: call stack

```txt
Production:
HTTP handlers
-> LinkCatalog
-> LinkCatalogDurableObjectAdapter
-> Durable Object RPC/fetch boundary
-> LinkCatalogCoordinator
-> LinkCatalogStore
-> LinkCatalogSqlExecutor
-> PublicRedirectIndex

Tests:
HTTP handlers
-> LinkCatalog
-> LinkCatalogMemoryAdapter
-> LinkCatalogCoordinator
-> LinkCatalogStoreMemoryAdapter
-> PublicRedirectIndexMemoryAdapter
```

## Example: prose to types

Prose:

```md
- Customer can order drip coffee or espresso-based coffee.
- Drip coffee is served with or without milk.
- Espresso can be one or two shots.
- Cappuccino and latte are espresso plus milk.
```

Types:

```ts
type Milk = "whole" | "oat" | "almond"
type Coffee = { beans: "arabica" | "robusta" }
type Espresso = { coffee: Coffee; shots: 1 | 2 }

type DripCoffee = { kind: "drip"; coffee: Coffee; milk?: Milk }
type StraightEspresso = { kind: "espresso"; espresso: Espresso }
type EspressoWithMilk = { kind: "cappuccino" | "latte"; espresso: Espresso; milk: Milk }

type Beverage = DripCoffee | StraightEspresso | EspressoWithMilk
```

If the type model needs awkward optional fields or permits impossible combinations, return to the ambiguity gate.
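To see what the union buys, the sketch below repeats the types from the example and adds a hypothetical `describeBeverage` function. It exists only to show exhaustiveness checking; it is not meant as business behavior to ship.

```ts
type Milk = "whole" | "oat" | "almond"
type Coffee = { beans: "arabica" | "robusta" }
type Espresso = { coffee: Coffee; shots: 1 | 2 }

type DripCoffee = { kind: "drip"; coffee: Coffee; milk?: Milk }
type StraightEspresso = { kind: "espresso"; espresso: Espresso }
type EspressoWithMilk = { kind: "cappuccino" | "latte"; espresso: Espresso; milk: Milk }

type Beverage = DripCoffee | StraightEspresso | EspressoWithMilk

// Narrowing on `kind` gives each branch only the fields that variant has.
function describeBeverage(b: Beverage): string {
  switch (b.kind) {
    case "drip":
      return b.milk ? `drip coffee with ${b.milk} milk` : "black drip coffee"
    case "espresso":
      return `${b.espresso.shots === 1 ? "single" : "double"} espresso`
    case "cappuccino":
    case "latte":
      return `${b.kind} with ${b.milk} milk`
    default: {
      // If a new variant is added to Beverage, this assignment stops compiling.
      const unreachable: never = b
      return unreachable
    }
  }
}

// Rejected by the compiler, because a latte requires milk:
// const invalid: Beverage = { kind: "latte", espresso: { coffee: { beans: "arabica" }, shots: 1 } }
```

A latte without milk or an espresso with three shots cannot be constructed, which is the "impossible states" property the fact buckets ask about.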

## Example: service seam

Use project conventions. Plain TypeScript shape:

```ts
export interface LinkCatalog {
  createAlias(input: CreateAliasInput): Promise<Result<AliasLink, LinkCatalogCreateError>>
  getLink(slug: LinkSlug): Promise<Result<Link | null, LinkCatalogReadError>>
}

export const createLinkCatalog = (deps: LinkCatalogDeps): LinkCatalog => ({
  // Wrap the calls rather than passing method references, so `this` stays bound if the deps are class instances.
  createAlias: (input) => deps.coordinator.createAlias(input),
  getLink: (slug) => deps.store.getLinkBySlug(slug),
})
```

If the project uses DI, Context services, modules, providers, or layers, express the same seam that way.
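For the test side of the seam, a memory adapter can satisfy the same interface. This is a sketch only: the `Result` shape, the field layouts of `Link` and `CreateAliasInput`, the error variants, and `createMemoryLinkCatalog` are all assumptions made so the example is self-contained.

```ts
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E }

// Every shape below is assumed for illustration.
type LinkSlug = string & { readonly __brand: "LinkSlug" }
type Link = { slug: LinkSlug; target: string }
type AliasLink = Link
type CreateAliasInput = { slug: LinkSlug; target: string }
type LinkCatalogCreateError = { kind: "SlugTaken"; slug: LinkSlug }
type LinkCatalogReadError = { kind: "CatalogUnavailable" }

interface LinkCatalog {
  createAlias(input: CreateAliasInput): Promise<Result<AliasLink, LinkCatalogCreateError>>
  getLink(slug: LinkSlug): Promise<Result<Link | null, LinkCatalogReadError>>
}

// In-memory adapter for tests: same seam, no storage or RPC behind it.
const createMemoryLinkCatalog = (): LinkCatalog => {
  const links = new Map<LinkSlug, Link>()
  return {
    createAlias: async (input): Promise<Result<AliasLink, LinkCatalogCreateError>> => {
      if (links.has(input.slug)) {
        return { ok: false, error: { kind: "SlugTaken", slug: input.slug } }
      }
      const link: AliasLink = { slug: input.slug, target: input.target }
      links.set(link.slug, link)
      return { ok: true, value: link }
    },
    getLink: async (slug): Promise<Result<Link | null, LinkCatalogReadError>> => ({
      ok: true,
      value: links.get(slug) ?? null,
    }),
  }
}
```

Callers that depend only on `LinkCatalog` cannot tell which adapter they were given, which is what lets tests and production share one call stack above the seam.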

## Example: edge type safety

Raw data from HTTP, queues, DBs, files, SDKs, and AI/model output is not domain data yet.

```ts
type EmailAddress = string & { readonly __brand: "EmailAddress" }
type InvalidEmailAddress = { kind: "InvalidEmailAddress"; input: string }

const createEmailAddress = (input: string): Result<EmailAddress, InvalidEmailAddress> =>
  input.includes("@") ? ok(input as EmailAddress) : err({ kind: "InvalidEmailAddress", input })
```

Use the project's actual schema/result/brand conventions.
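For a project with no result library yet, a self-contained version of the example might look like the sketch below. `Result`, `ok`, and `err` are minimal stand-ins defined here as assumptions, and `parseSignupBody` is a hypothetical edge function showing where `unknown` input becomes domain data.

```ts
// Minimal stand-ins for a project's result helpers (assumed, not a real library).
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E }
const ok = <T>(value: T): Result<T, never> => ({ ok: true, value })
const err = <E>(error: E): Result<never, E> => ({ ok: false, error })

type EmailAddress = string & { readonly __brand: "EmailAddress" }
type InvalidEmailAddress = { kind: "InvalidEmailAddress"; input: string }
type MissingEmail = { kind: "MissingEmail" }

const createEmailAddress = (input: string): Result<EmailAddress, InvalidEmailAddress> =>
  input.includes("@")
    ? ok(input as EmailAddress)
    : err<InvalidEmailAddress>({ kind: "InvalidEmailAddress", input })

// Hypothetical edge: raw request body in, validated domain value or typed error out.
const parseSignupBody = (
  raw: unknown,
): Result<{ email: EmailAddress }, MissingEmail | InvalidEmailAddress> => {
  const email = (raw as { email?: unknown } | null | undefined)?.email
  if (typeof email !== "string") return err<MissingEmail>({ kind: "MissingEmail" })
  const parsed = createEmailAddress(email)
  return parsed.ok ? ok({ email: parsed.value }) : parsed
}
```

Code past this edge can require `EmailAddress` in its signatures, so an unvalidated string cannot reach it without an explicit cast.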

## Example: dependency checks

Use the lightest existing mechanism: import-boundary lint, dependency-cruiser, custom rule, focused import-scan test, or documented rule in `AGENTS.md` if mechanical checking is too expensive.

Example facts to enforce:

- HTTP handlers may depend on `LinkCatalog`, not `LinkCatalogStore`.
- Domain modules may not import SQL/Cloud/SDK clients.
- Test adapters must satisfy the same seam as production adapters.
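A focused import-scan test can be as small as the sketch below. It is a heuristic under stated assumptions: file contents are passed in as strings (a real test would read them from disk), the regex only sees `from "..."` specifiers, and the paths and rule are invented for illustration.

```ts
type ImportRule = { from: RegExp; forbidden: RegExp; reason: string }
type Violation = { file: string; specifier: string; reason: string }

// Scans source text for `from "..."` specifiers and reports rule violations.
// Rule patterns must not use the `g` flag, since `test` would then keep state between calls.
function findImportViolations(
  sources: Record<string, string>,
  rules: ImportRule[],
): Violation[] {
  const violations: Violation[] = []
  for (const file of Object.keys(sources)) {
    const pattern = /\bfrom\s+["']([^"']+)["']/g
    let match: RegExpExecArray | null
    while ((match = pattern.exec(sources[file])) !== null) {
      const specifier = match[1]
      for (const rule of rules) {
        if (rule.from.test(file) && rule.forbidden.test(specifier)) {
          violations.push({ file, specifier, reason: rule.reason })
        }
      }
    }
  }
  return violations
}

// Hypothetical project layout and rule.
const violations = findImportViolations(
  {
    "src/http/handlers.ts": 'import { LinkCatalogStore } from "../store/LinkCatalogStore"',
    "src/http/routes.ts": 'import { LinkCatalog } from "../catalog/LinkCatalog"',
  },
  [
    {
      from: /^src\/http\//,
      forbidden: /LinkCatalogStore$/,
      reason: "HTTP handlers may depend on LinkCatalog, not LinkCatalogStore",
    },
  ],
)
```

If the project already runs an import-boundary lint or dependency-cruiser, a rule there is likely a better fit than a hand-rolled scan like this one.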
Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
---
name: docs-to-types
description: >-
  Converts grill-with-docs output — CONTEXT.md glossaries, ADRs, and approved domain decisions — into typed architecture. Use after grill-with-docs when the user wants domain types, seams, adapters, errors, call stacks, and dependency rules expressed in code before business behavior.
---

# Docs to Types

Upfront architecture skill for converting clarified prose into typed architecture. If the team knows a durable domain or architecture fact, the type system and module graph should know it too.

Not for general grilling, refactoring review, or the first TDD slice. `CONTEXT.md`, ADRs, and grill notes are source material, not the final harness.

## Prime directive

Do **not** implement business behavior. Compress approved context into the whole intended typed architecture:

- canonical domain types, schemas, brands, and invariants
- discriminated unions/state models that rule out invalid states
- smart constructors/parsers for values entering from system edges
- service/interface seams using project conventions
- typed result/error families and error translation boundaries
- production/test adapter slots
- composition/layer/module topology using project conventions
- production and test call stacks
- dependency-direction checks where practical

Do **not** create business workflows, real persistence/network logic, product behavior, after-the-fact refactors, generic mutation/outcome frameworks, or fake production logic pretending to be complete.

## Read first

1. `AGENTS.md` and project coding rules
2. `CONTEXT-MAP.md`, if present
3. relevant `CONTEXT.md` files
4. relevant `docs/adr/*`
5. approved `grill-with-docs` notes/specs
6. existing source near the target area

If there are no context docs/ADRs and no approved `grill-with-docs` output, stop and recommend `grill-with-docs`.

## Workflow

### 1. Extract architecture facts

Before editing, produce a fact table:

```md
| Fact                                        | Source     | Code artifact         | Confidence |
| ------------------------------------------- | ---------- | --------------------- | ---------- |
| Link Catalog is the application-facing seam | CONTEXT.md | `LinkCatalog` service | high       |
```

Include domain names, ownership boundaries, dependency direction, call stacks, adapter choices, runtime constraints, typed errors, and infrastructure that must stay behind adapters.

### 2. Ambiguity gate

If docs, code, or user plan conflict, ask one concrete question and wait.

If a prose fact cannot be represented cleanly as a type, seam, adapter, state, error, or dependency rule, treat the domain language as still ambiguous.

Example: `CONTEXT.md` says “Operator,” but code says “User” and “Actor.” Which is canonical?

### 3. Propose the typed structure

Before editing, unless the user requested direct implementation, list:

- files to create/update
- domain types/schemas/brands/invariants/smart constructors
- state models/discriminated unions
- service/interface seams
- typed result/error families
- production and test call stacks
- adapter slots/stubs
- architecture checks
- business logic intentionally excluded

Ask for approval.

### 4. Codify only the typed structure

When approved, write compiling architecture code for the full intended typed structure.

Rules:

- Use project domain names exactly.
- Prefer deep modules: small interfaces, complexity behind the seam.
- Follow project conventions for services, interfaces, DI, composition, layers, factories, providers, or registries.
- Model absence, validation, variants, states, and expected failures with project-native typed patterns.
- Keep HTTP/UI/CLI transport details at their boundaries.
- Keep storage, SQL, queues, RPC clients, SDKs, and third-party APIs behind adapters.
- Add architecture tests/lint/import rules when dependency direction can be checked mechanically.

See [REFERENCE.md](REFERENCE.md) for allowed code depth and examples.

### 5. Validate and report

Run narrow checks: typecheck, lint/static checks, and architecture tests if added. Final response: facts codified, files changed, production/test call stacks, adapter slots, checks run, and business logic left unimplemented.

.agents/skills/grill-me/SKILL.md

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
---
name: grill-me
description: Interview the user relentlessly about a plan or design until reaching shared understanding, resolving each branch of the decision tree. Use when user wants to stress-test a plan, get grilled on their design, or mentions "grill me".
---

Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer.

Ask the questions one at a time.

If a question can be answered by exploring the codebase, explore the codebase instead.
