modernweb-dev
diff --git a/.agents/skills/diagnose/SKILL.md b/.agents/skills/diagnose/SKILL.md
Lines changed: 117 additions & 0 deletions
diff --git a/.agents/skills/diagnose/scripts/hitl-loop.template.sh b/.agents/skills/diagnose/scripts/hitl-loop.template.sh
Lines changed: 41 additions & 0 deletions
diff --git a/.agents/skills/docs-to-types/REFERENCE.md b/.agents/skills/docs-to-types/REFERENCE.md
Lines changed: 131 additions & 0 deletions
diff --git a/.agents/skills/docs-to-types/SKILL.md b/.agents/skills/docs-to-types/SKILL.md
Lines changed: 96 additions & 0 deletions
diff --git a/.agents/skills/grill-me/SKILL.md b/.agents/skills/grill-me/SKILL.md
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,117 @@
+---
+name: diagnose
+description: Disciplined diagnosis loop for hard bugs and performance regressions. Reproduce → minimise → hypothesise → instrument → fix → regression-test. Use when user says "diagnose this" / "debug this", reports a bug, says something is broken/throwing/failing, or describes a performance regression.
+---
+
+# Diagnose
+
+A discipline for hard bugs. Skip phases only when explicitly justified.
+
+When exploring the codebase, use the project's domain glossary to get a clear mental model of the relevant modules, and check ADRs in the area you're touching.
+
+## Phase 1 — Build a feedback loop
+
+**This is the skill.** Everything else is mechanical. If you have a fast, deterministic, agent-runnable pass/fail signal for the bug, you will find the cause — bisection, hypothesis-testing, and instrumentation all just consume that signal. If you don't have one, no amount of staring at code will save you.
+
+Spend disproportionate effort here. **Be aggressive. Be creative. Refuse to give up.**
+
+### Ways to construct one — try them in roughly this order
+
+1. **Failing test** at whatever seam reaches the bug — unit, integration, e2e.
+2. **Curl / HTTP script** against a running dev server.
+3. **CLI invocation** with a fixture input, diffing stdout against a known-good snapshot.
+4. **Headless browser script** (Playwright / Puppeteer) — drives the UI, asserts on DOM/console/network.
+5. **Replay a captured trace.** Save a real network request / payload / event log to disk; replay it through the code path in isolation.
+6. **Throwaway harness.** Spin up a minimal subset of the system (one service, mocked deps) that exercises the bug code path with a single function call.
+7. **Property / fuzz loop.** If the bug is "sometimes wrong output", run 1000 random inputs and look for the failure mode.
+8. **Bisection harness.** If the bug appeared between two known states (commit, dataset, version), automate "boot at state X, check, repeat" so you can `git bisect run` it.
+9. **Differential loop.** Run the same input through old-version vs new-version (or two configs) and diff outputs.
+10. **HITL bash script.** Last resort. If a human must click, drive _them_ with `scripts/hitl-loop.template.sh` so the loop is still structured. Captured output feeds back to you.
+
+Build the right feedback loop, and the bug is 90% fixed.
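Items 7 and 9 in the list above can share one loop. A minimal sketch, assuming an in-process function under test and a trusted reference to compare it against (an older version, a naive implementation, or a second config); `differentialFuzz`, `oracle`, and `candidate` are hypothetical names. The RNG is seeded (the widely used mulberry32 generator) so any failing input can be replayed exactly.

```typescript
// mulberry32: a tiny seeded PRNG returning floats in [0, 1).
// Same seed, same sequence, so a failing run can be replayed exactly.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

type Divergence<I, O> = { iteration: number; input: I; expected: O; actual: O };

// Feed seeded random inputs to both implementations; return the first divergence, or null.
function differentialFuzz<I, O>(
  seed: number,
  iterations: number,
  generate: (rand: () => number) => I,
  oracle: (input: I) => O,
  candidate: (input: I) => O,
): Divergence<I, O> | null {
  const rand = mulberry32(seed);
  for (let iteration = 0; iteration < iterations; iteration++) {
    const input = generate(rand);
    const expected = oracle(input);
    const actual = candidate(input);
    // JSON comparison keeps the sketch dependency-free; it is sensitive to object key order.
    if (JSON.stringify(expected) !== JSON.stringify(actual)) {
      return { iteration, input, expected, actual };
    }
  }
  return null;
}

// Example pair: the default Array#sort compares numbers as strings, so it diverges from a numeric sort.
const randomInts = (rand: () => number): number[] =>
  Array.from({ length: Math.floor(rand() * 10) }, () => Math.floor(rand() * 100));
const numericSort = (xs: number[]): number[] => [...xs].sort((a, b) => a - b);
const buggySort = (xs: number[]): number[] => [...xs].sort();
```

Calling `differentialFuzz(42, 1000, randomInts, numericSort, buggySort)` returns the first divergent input along with its seed-determined iteration, which is exactly the artifact to hand to a minimised failing test.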
+
+### Iterate on the loop itself
+
+Treat the loop as a product. Once you have _a_ loop, ask:
+
+- Can I make it faster? (Cache setup, skip unrelated init, narrow the test scope.)
+- Can I make the signal sharper? (Assert on the specific symptom, not "didn't crash".)
+- Can I make it more deterministic? (Pin time, seed RNG, isolate filesystem, freeze network.)
+
+A 30-second flaky loop is barely better than no loop. A 2-second deterministic loop is a debugging superpower.
+
+### Non-deterministic bugs
+
+The goal is not a clean repro but a **higher reproduction rate**. Loop the trigger 100×, parallelise, add stress, narrow timing windows, inject sleeps. A 50%-flake bug is debuggable; 1% is not — keep raising the rate until it's debuggable.
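Raising the reproduction rate is easier when the rate is measured rather than eyeballed. A minimal sketch, assuming the trigger can run in-process and reports whether this run showed the user's symptom; `reproductionRate` and `trigger` are hypothetical names. Thrown errors are counted separately, since a different failure is a different bug.

```typescript
type ReproStats = { runs: number; reproduced: number; errors: number; rate: number };

// Run `trigger` `runs` times with up to `concurrency` calls in flight, tallying reproductions.
async function reproductionRate(
  trigger: () => Promise<boolean>,
  runs = 100,
  concurrency = 10,
): Promise<ReproStats> {
  let started = 0;
  let reproduced = 0;
  let errors = 0;
  // Each worker claims the next run synchronously, so no run is started twice.
  const worker = async (): Promise<void> => {
    while (started < runs) {
      started++;
      try {
        if (await trigger()) reproduced++;
      } catch {
        errors++; // not the symptom being chased; inspect these separately
      }
    }
  };
  const workers = Array.from({ length: Math.max(1, Math.min(concurrency, runs)) }, worker);
  await Promise.all(workers);
  return { runs, reproduced, errors, rate: runs === 0 ? 0 : reproduced / runs };
}
```

Raising `concurrency` is one stress lever; sleeps or contention injected inside `trigger` are others. Re-run after each change and keep the change only if `rate` goes up.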
+
+### When you genuinely cannot build a loop
+
+Stop and say so explicitly. List what you tried. Ask the user for: (a) access to whatever environment reproduces it, (b) a captured artifact (HAR file, log dump, core dump, screen recording with timestamps), or (c) permission to add temporary production instrumentation. Do **not** proceed to hypothesise without a loop.
+
+Do not proceed to Phase 2 until you have a loop you believe in.
+
+## Phase 2 — Reproduce
+
+Run the loop. Watch the bug appear.
+
+Confirm:
+
+- [ ] The loop produces the failure mode the **user** described — not a different failure that happens to be nearby. Wrong bug = wrong fix.
+- [ ] The failure is reproducible across multiple runs (or, for non-deterministic bugs, reproducible at a high enough rate to debug against).
+- [ ] You have captured the exact symptom (error message, wrong output, slow timing) so later phases can verify the fix actually addresses it.
+
+Do not proceed until you reproduce the bug.
+
+## Phase 3 — Hypothesise
+
+Generate **3–5 ranked hypotheses** before testing any of them. Single-hypothesis generation anchors on the first plausible idea.
+
+Each hypothesis must be **falsifiable**: state the prediction it makes.
+
+> Format: "If <X> is the cause, then <changing Y> will make the bug disappear / <changing Z> will make it worse."
+
+If you cannot state the prediction, the hypothesis is a vibe — discard or sharpen it.
+
+**Show the ranked list to the user before testing.** They often have domain knowledge that re-ranks instantly ("we just deployed a change to #3"), or know hypotheses they've already ruled out. Cheap checkpoint, big time saver. Don't block on it — proceed with your ranking if the user is AFK.
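One way to keep hypotheses honest is to record them as data in which the prediction is a required field, so a "vibe" cannot even be written down. A sketch; the `Hypothesis` shape and `formatHypotheses` helper are hypothetical, and the rendered sentence follows the format line above.

```typescript
type Hypothesis = {
  rank: number; // 1 = most likely
  cause: string; // the X in "If <X> is the cause"
  change: string; // the change whose effect is predicted
  prediction: "disappear" | "worse"; // required: no prediction, no hypothesis
};

// Render the ranked list for the user checkpoint; enforces the 3-5 range.
function formatHypotheses(hypotheses: Hypothesis[]): string {
  if (hypotheses.length < 3 || hypotheses.length > 5) {
    throw new Error(`expected 3-5 hypotheses, got ${hypotheses.length}`);
  }
  return [...hypotheses]
    .sort((a, b) => a.rank - b.rank)
    .map((h) => `${h.rank}. If ${h.cause} is the cause, then ${h.change} will make the bug ${h.prediction}.`)
    .join("\n");
}
```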
+
+## Phase 4 — Instrument
+
+Each probe must map to a specific prediction from Phase 3. **Change one variable at a time.**
+
+Tool preference:
+
+1. **Debugger / REPL inspection** if the env supports it. One breakpoint beats ten logs.
+2. **Targeted logs** at the boundaries that distinguish hypotheses.
+3. Never "log everything and grep".
+
+**Tag every debug log** with a unique prefix, e.g. `[DEBUG-a4f2]`. Cleanup at the end becomes a single grep. Untagged logs survive; tagged logs die.
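A sketch of a tagged probe helper; `dbg` is a hypothetical name and `[DEBUG-a4f2]` is the example tag from above. Writing to stderr keeps probes out of any stdout snapshot the feedback loop diffs against.

```typescript
// One tag per debugging session; pick a fresh random suffix each time.
const DEBUG_TAG = "[DEBUG-a4f2]";

// Emit a tagged probe on stderr and return the line (handy for asserting on it).
function dbg(label: string, value: unknown): string {
  const line = `${DEBUG_TAG} ${label}: ${JSON.stringify(value)}`;
  console.error(line);
  return line;
}
```

Cleanup is then `grep -rn 'DEBUG-a4f2' .`; the helper definition itself carries the tag, so it shows up in that search and gets deleted along with the probes.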
+
+**Perf branch.** For performance regressions, logs are usually the wrong tool. Instead, establish a baseline measurement (timing harness, `performance.now()`, profiler, query plan), then bisect. Measure first, fix second.
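A minimal timing harness along these lines produces the baseline number; `measureMedianMs` is a hypothetical helper. It warms up first and reports the median, which is less sensitive to GC pauses and scheduler noise than the mean. For anything finer-grained, a profiler remains the better tool.

```typescript
import { performance } from "node:perf_hooks";

// Run `fn` a few times untimed (JIT warm-up), then time `samples` runs and return the median in ms.
function measureMedianMs(fn: () => void, samples = 30, warmup = 5): number {
  if (samples < 1) throw new Error("samples must be >= 1");
  for (let i = 0; i < warmup; i++) fn();
  const times: number[] = [];
  for (let i = 0; i < samples; i++) {
    const start = performance.now();
    fn();
    times.push(performance.now() - start);
  }
  times.sort((a, b) => a - b);
  const mid = Math.floor(times.length / 2);
  // Even count: average the two middle samples; odd count: take the middle one.
  return times.length % 2 === 0 ? (times[mid - 1] + times[mid]) / 2 : times[mid];
}
```

Record the number at the known-good state, then compare each bisection step and each candidate fix against it under the same conditions.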
+
+## Phase 5 — Fix + regression test
+
+Write the regression test **before the fix** — but only if there is a **correct seam** for it.
+
+A correct seam is one where the test exercises the **real bug pattern** as it occurs at the call site. If the only available seam is too shallow (single-caller test when the bug needs multiple callers, unit test that can't replicate the chain that triggered the bug), a regression test there gives false confidence.
+
+**If no correct seam exists, that itself is the finding.** Note it. The codebase architecture is preventing the bug from being locked down. Flag this for the next phase.
+
+If a correct seam exists:
+
+1. Turn the minimised repro into a failing test at that seam.
+2. Watch it fail.
+3. Apply the fix.
+4. Watch it pass.
+5. Re-run the Phase 1 feedback loop against the original (un-minimised) scenario.
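A sketch of steps 1 to 4 for a hypothetical pagination bug in which the last partial page was dropped. `pageCount`, `getPage`, and the test name are illustrative; the test asserts the exact reported symptom rather than "didn't crash", and it fails against the original `Math.floor` version.

```typescript
import { strict as assert } from "node:assert";

// Fixed version. The buggy version used Math.floor, silently dropping a partial last page.
function pageCount(total: number, pageSize: number): number {
  return Math.ceil(total / pageSize);
}

// Public seam the callers use; the regression test targets this, not an internal helper.
function getPage<T>(items: T[], page: number, pageSize: number): T[] {
  if (page < 1 || page > pageCount(items.length, pageSize)) return [];
  return items.slice((page - 1) * pageSize, page * pageSize);
}

// Regression test: 7 items at 3 per page must yield a third page containing item 7.
function testLastPartialPageIsReturned(): void {
  const items = [1, 2, 3, 4, 5, 6, 7];
  assert.equal(pageCount(items.length, 3), 3);
  assert.deepEqual(getPage(items, 3, 3), [7]);
}

testLastPartialPageIsReturned();
```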
+
+## Phase 6 — Cleanup + post-mortem
+
+Required before declaring done:
+
+- [ ] Original repro no longer reproduces (re-run the Phase 1 loop)
+- [ ] Regression test passes (or absence of seam is documented)
+- [ ] All `[DEBUG-...]` instrumentation removed (`grep` the prefix)
+- [ ] Throwaway prototypes deleted (or moved to a clearly-marked debug location)
+- [ ] The hypothesis that turned out correct is stated in the commit / PR message — so the next debugger learns
+
+**Then ask: what would have prevented this bug?** If the answer involves architectural change (no good test seam, tangled callers, hidden coupling), hand off to the `/improve-codebase-architecture` skill with the specifics. Make the recommendation **after** the fix is in, not before — you have more information now than when you started.
@@ -0,0 +1,41 @@
+#!/usr/bin/env bash
+# Human-in-the-loop reproduction loop.
+# Copy this file, edit the steps below, and run it.
+# The agent runs the script; the user follows prompts in their terminal.
+#
+# Usage:
+#   bash hitl-loop.template.sh
+#
+# Two helpers:
+#   step "<instruction>"          → show instruction, wait for Enter
+#   capture VAR "<question>"      → show question, read response into VAR
+#
+# At the end, captured values are printed as KEY=VALUE for the agent to parse.
+
+set -euo pipefail
+
+step() {
+  printf '\n>>> %s\n' "$1"
+  read -r -p "    [Enter when done] " _
+}
+
+capture() {
+  local var="$1" question="$2" answer
+  printf '\n>>> %s\n' "$question"
+  read -r -p "    > " answer
+  printf -v "$var" '%s' "$answer"
+}
+
+# --- edit below ---------------------------------------------------------
+
+step "Open the app at http://localhost:3000 and sign in."
+
+capture ERRORED "Click the 'Export' button. Did it throw an error? (y/n)"
+
+capture ERROR_MSG "Paste the error message (or 'none'):"
+
+# --- edit above ---------------------------------------------------------
+
+printf '\n--- Captured ---\n'
+printf 'ERRORED=%s\n' "$ERRORED"
+printf 'ERROR_MSG=%s\n' "$ERROR_MSG"
@@ -0,0 +1,131 @@
+# Docs to Types Reference
+
+Use this only when `SKILL.md` needs concrete examples or scope boundaries.
+
+## Allowed depth
+
+Allowed:
+
+- domain types, schemas/parsers, brands
+- discriminated unions/state models
+- smart constructors for validated values
+- service/interface seams
+- typed errors/results
+- composition topology: modules, layers, factories, providers, registries, etc.
+- memory/no-op adapters needed for typecheck or architecture tests
+- production stubs that fail with typed `NotImplemented` errors
+- architecture tests, lint rules, or import-boundary checks
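A sketch of a production stub that compiles against its seam but fails with a typed `NotImplemented` error. The hand-rolled `Result` type and the `LinkCatalogStore` shape here are assumptions standing in for the project's own conventions.

```typescript
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

type LinkSlug = string & { readonly __brand: "LinkSlug" };
type Link = { slug: LinkSlug; target: string };

type NotImplemented = { kind: "NotImplemented"; operation: string };
type LinkCatalogReadError = { kind: "StorageUnavailable" } | NotImplemented;

interface LinkCatalogStore {
  getLinkBySlug(slug: LinkSlug): Promise<Result<Link | null, LinkCatalogReadError>>;
}

// Production slot: wired into composition so the topology typechecks,
// but every call fails with a typed error instead of faking behavior.
const linkCatalogStoreStub: LinkCatalogStore = {
  getLinkBySlug: async () => ({
    ok: false,
    error: { kind: "NotImplemented", operation: "LinkCatalogStore.getLinkBySlug" },
  }),
};
```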
+
+Not allowed unless the user expands scope:
+
+- feature/business behavior
+- real persistence queries or network calls
+- real HTTP handlers beyond contracts/boundaries
+- after-the-fact broad refactors unrelated to codifying approved context
+- speculative helper libraries or generic frameworks without concrete use cases
+
+## Fact buckets
+
+Look for facts in these categories:
+
+1. **Language** — canonical terms and forbidden aliases.
+2. **Shape** — variants, states, invariants, cardinality, allowed combinations, impossible states.
+3. **Boundary** — which module owns which responsibility.
+4. **Dependency** — which modules may import/call which others.
+5. **Runtime** — services/interfaces, composition, config, time, absence/nullability, error policy.
+6. **Adapter** — production/test adapters and external infrastructure boundaries.
+7. **Call stack** — production and test paths through the system.
+8. **Error** — expected failures and where they are translated.
+9. **Edge** — where raw external input becomes validated domain data.
+10. **Test surface** — public seams future behavior tests should target.
+
+## Example: call stack
+
+```txt
+Production:
+HTTP handlers
+  -> LinkCatalog
+    -> LinkCatalogDurableObjectAdapter
+      -> Durable Object RPC/fetch boundary
+        -> LinkCatalogCoordinator
+          -> LinkCatalogStore
+            -> LinkCatalogSqlExecutor
+          -> PublicRedirectIndex
+
+Tests:
+HTTP handlers
+  -> LinkCatalog
+    -> LinkCatalogMemoryAdapter
+      -> LinkCatalogCoordinator
+        -> LinkCatalogStoreMemoryAdapter
+        -> PublicRedirectIndexMemoryAdapter
+```
+
+## Example: prose to types
+
+Prose:
+
+```md
+- Customer can order drip coffee or espresso-based coffee.
+- Drip coffee is served with or without milk.
+- Espresso can be one or two shots.
+- Cappuccino and latte are espresso plus milk.
+```
+
+Types:
+
+```ts
+type Milk = "whole" | "oat" | "almond"
+type Coffee = { beans: "arabica" | "robusta" }
+type Espresso = { coffee: Coffee; shots: 1 | 2 }
+
+type DripCoffee = { kind: "drip"; coffee: Coffee; milk?: Milk }
+type StraightEspresso = { kind: "espresso"; espresso: Espresso }
+type EspressoWithMilk = { kind: "cappuccino" | "latte"; espresso: Espresso; milk: Milk }
+
+type Beverage = DripCoffee | StraightEspresso | EspressoWithMilk
+```
+
+If the type model needs awkward optional fields or permits impossible combinations, return to the ambiguity gate.
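Consuming the union shows what the model buys: an exhaustive `switch` that stops compiling when a new variant is added, and a compile error for an impossible state. A sketch; `describeBeverage` and `assertNever` are illustrative names, and the types are repeated so the block stands alone.

```typescript
type Milk = "whole" | "oat" | "almond";
type Coffee = { beans: "arabica" | "robusta" };
type Espresso = { coffee: Coffee; shots: 1 | 2 };

type DripCoffee = { kind: "drip"; coffee: Coffee; milk?: Milk };
type StraightEspresso = { kind: "espresso"; espresso: Espresso };
type EspressoWithMilk = { kind: "cappuccino" | "latte"; espresso: Espresso; milk: Milk };
type Beverage = DripCoffee | StraightEspresso | EspressoWithMilk;

// Exhaustiveness helper: an unhandled variant makes `x` non-never, which fails to compile.
function assertNever(x: never): never {
  throw new Error(`Unhandled variant: ${JSON.stringify(x)}`);
}

function describeBeverage(b: Beverage): string {
  switch (b.kind) {
    case "drip":
      return b.milk ? `drip with ${b.milk} milk` : "black drip";
    case "espresso":
      return `${b.espresso.shots}-shot espresso`;
    case "cappuccino":
    case "latte":
      return `${b.kind} with ${b.milk} milk`;
    default:
      return assertNever(b);
  }
}

// Impossible state: a straight espresso cannot carry milk, so this line does not typecheck.
// @ts-expect-error 'milk' does not exist on StraightEspresso
const impossible: Beverage = { kind: "espresso", espresso: { coffee: { beans: "arabica" }, shots: 1 }, milk: "oat" };
```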
+
+## Example: service seam
+
+Use project conventions. Plain TypeScript shape:
+
+```ts
+export interface LinkCatalog {
+  createAlias(input: CreateAliasInput): Promise<Result<AliasLink, LinkCatalogCreateError>>
+  getLink(slug: LinkSlug): Promise<Result<Link | null, LinkCatalogReadError>>
+}
+
+export const createLinkCatalog = (deps: LinkCatalogDeps): LinkCatalog => ({
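+  // Wrap calls instead of passing bare method references, so class-based deps keep their `this`.
+  createAlias: (input) => deps.coordinator.createAlias(input),
+  getLink: (slug) => deps.store.getLinkBySlug(slug),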
+})
+```
+
+If the project uses DI, Context services, modules, providers, or layers, express the same seam that way.
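A test adapter satisfies the same seam as production. A minimal sketch backed by a `Map`, with a hand-rolled `Result`/`ok`/`err` standing in for the project's own; `createLinkCatalogMemoryAdapter`, the `SlugTaken` error, and the simplified type definitions are assumptions.

```typescript
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };
const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

type LinkSlug = string & { readonly __brand: "LinkSlug" };
type Link = { slug: LinkSlug; target: string };
type AliasLink = Link;
type CreateAliasInput = { slug: LinkSlug; target: string };
type LinkCatalogCreateError = { kind: "SlugTaken"; slug: LinkSlug };
type LinkCatalogReadError = { kind: "StorageUnavailable" };

interface LinkCatalog {
  createAlias(input: CreateAliasInput): Promise<Result<AliasLink, LinkCatalogCreateError>>;
  getLink(slug: LinkSlug): Promise<Result<Link | null, LinkCatalogReadError>>;
}

// In-memory adapter for tests: same interface, no storage, no network.
const createLinkCatalogMemoryAdapter = (): LinkCatalog => {
  const links = new Map<LinkSlug, Link>();
  return {
    createAlias: async (input) => {
      if (links.has(input.slug)) {
        return err<LinkCatalogCreateError>({ kind: "SlugTaken", slug: input.slug });
      }
      const link: Link = { slug: input.slug, target: input.target };
      links.set(input.slug, link);
      return ok(link);
    },
    getLink: async (slug) => ok(links.get(slug) ?? null),
  };
};
```

Because the adapter is typed as `LinkCatalog`, any drift between it and the production seam surfaces as a compile error rather than a passing-but-wrong test.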
+
+## Example: edge type safety
+
+Raw data from HTTP, queues, DBs, files, SDKs, and AI/model output is not domain data yet.
+
+```ts
+type EmailAddress = string & { readonly __brand: "EmailAddress" }
+type InvalidEmailAddress = { kind: "InvalidEmailAddress"; input: string }
+
+const createEmailAddress = (input: string): Result<EmailAddress, InvalidEmailAddress> =>
+  input.includes("@") ? ok(input as EmailAddress) : err({ kind: "InvalidEmailAddress", input })
+```
+
+Use the project's actual schema/result/brand conventions.
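For a standalone version of the example above, a sketch with a hand-rolled `Result`/`ok`/`err` in place of a library; these helpers and `welcomeSubject` are assumptions, and the email rule (exactly one "@" with text on both sides) is deliberately loose.

```typescript
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };
const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

type EmailAddress = string & { readonly __brand: "EmailAddress" };
type InvalidEmailAddress = { kind: "InvalidEmailAddress"; input: string };

// Smart constructor: the only sanctioned way to obtain an EmailAddress.
const createEmailAddress = (input: string): Result<EmailAddress, InvalidEmailAddress> => {
  const at = input.indexOf("@");
  const valid = at > 0 && at === input.lastIndexOf("@") && at < input.length - 1;
  return valid
    ? ok(input as EmailAddress)
    : err<InvalidEmailAddress>({ kind: "InvalidEmailAddress", input });
};

// Domain code accepts only the brand, so a raw string cannot slip past the edge.
const welcomeSubject = (to: EmailAddress): string => `Welcome, ${to}`;
```

Calling `welcomeSubject("ada@example.com")` with a plain string does not typecheck; the value has to come through `createEmailAddress` first.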
+
+## Example: dependency checks
+
+Use the lightest existing mechanism: import-boundary lint, dependency-cruiser, custom rule, focused import-scan test, or documented rule in `AGENTS.md` if mechanical checking is too expensive.
+
+Example facts to enforce:
+
+- HTTP handlers may depend on `LinkCatalog`, not `LinkCatalogStore`.
+- Domain modules may not import SQL/Cloud/SDK clients.
+- Test adapters must satisfy the same seam as production adapters.
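Where no import-boundary tooling exists, a focused import-scan test is a cheap stand-in. A sketch, assuming domain code lives under one directory and the forbidden specifiers can be listed as patterns; `findForbiddenImports`, `forbiddenSpecifiers`, and the patterns are hypothetical. It is regex-based, so it ignores dynamic `import()` and can be fooled by import-like text in strings or comments; dependency-cruiser or a lint rule is the sturdier choice when available.

```typescript
import { readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

// Example patterns for "domain modules may not import SQL/Cloud/SDK clients".
const FORBIDDEN: RegExp[] = [/^pg$/, /^@aws-sdk\//, /\/sql\//];

// Static `import ... from "x"`, `export ... from "x"`, and side-effect `import "x"`.
const IMPORT_RE = /\b(?:import|export)\s+(?:[^'"]*?\sfrom\s+)?["']([^"']+)["']/g;

// Pure core: return the forbidden module specifiers imported by one source file.
function forbiddenSpecifiers(source: string): string[] {
  return [...source.matchAll(IMPORT_RE)]
    .map((match) => match[1])
    .filter((specifier) => FORBIDDEN.some((pattern) => pattern.test(specifier)));
}

// Walk a directory tree and report every violation as "path: specifier".
function findForbiddenImports(dir: string): string[] {
  const violations: string[] = [];
  for (const entry of readdirSync(dir)) {
    const path = join(dir, entry);
    if (statSync(path).isDirectory()) {
      violations.push(...findForbiddenImports(path));
    } else if (/\.(?:ts|tsx|js|mjs)$/.test(entry)) {
      for (const specifier of forbiddenSpecifiers(readFileSync(path, "utf8"))) {
        violations.push(`${path}: ${specifier}`);
      }
    }
  }
  return violations;
}
```

In the project's test runner, assert that `findForbiddenImports("src/domain")` returns an empty list; the directory is an example path.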
@@ -0,0 +1,96 @@
+---
+name: docs-to-types
+description: >-
+  Converts grill-with-docs output — CONTEXT.md glossaries, ADRs, and approved domain decisions — into typed architecture. Use after grill-with-docs when the user wants domain types, seams, adapters, errors, call stacks, and dependency rules expressed in code before business behavior.
+---
+
+# Docs to Types
+
+Upfront architecture skill for converting clarified prose into typed architecture. If the team knows a durable domain or architecture fact, the type system and module graph should know it too.
+
+Not for general grilling, refactoring review, or the first TDD slice. `CONTEXT.md`, ADRs, and grill notes are source material, not the final harness.
+
+## Prime directive
+
+Do **not** implement business behavior. Compress approved context into the whole intended typed architecture:
+
+- canonical domain types, schemas, brands, and invariants
+- discriminated unions/state models that rule out invalid states
+- smart constructors/parsers for values entering from system edges
+- service/interface seams using project conventions
+- typed result/error families and error translation boundaries
+- production/test adapter slots
+- composition/layer/module topology using project conventions
+- production and test call stacks
+- dependency-direction checks where practical
+
+Do **not** create business workflows, real persistence/network logic, product behavior, after-the-fact refactors, generic mutation/outcome frameworks, or fake production logic pretending to be complete.
+
+## Read first
+
+1. `AGENTS.md` and project coding rules
+2. `CONTEXT-MAP.md`, if present
+3. relevant `CONTEXT.md` files
+4. relevant `docs/adr/*`
+5. approved `grill-with-docs` notes/specs
+6. existing source near the target area
+
+If there are no context docs/ADRs and no approved `grill-with-docs` output, stop and recommend `grill-with-docs`.
+
+## Workflow
+
+### 1. Extract architecture facts
+
+Before editing, produce a fact table:
+
+```md
+| Fact                                        | Source     | Code artifact         | Confidence |
+| ------------------------------------------- | ---------- | --------------------- | ---------- |
+| Link Catalog is the application-facing seam | CONTEXT.md | `LinkCatalog` service | high       |
+```
+
+Include domain names, ownership boundaries, dependency direction, call stacks, adapter choices, runtime constraints, typed errors, and infrastructure that must stay behind adapters.
+
+### 2. Ambiguity gate
+
+If docs, code, or user plan conflict, ask one concrete question and wait.
+
+If a prose fact cannot be represented cleanly as a type, seam, adapter, state, error, or dependency rule, treat the domain language as still ambiguous.
+
+Example: `CONTEXT.md` says “Operator,” but code says “User” and “Actor.” Which is canonical?
+
+### 3. Propose the typed structure
+
+Before editing, unless the user requested direct implementation, list:
+
+- files to create/update
+- domain types/schemas/brands/invariants/smart constructors
+- state models/discriminated unions
+- service/interface seams
+- typed result/error families
+- production and test call stacks
+- adapter slots/stubs
+- architecture checks
+- business logic intentionally excluded
+
+Ask for approval.
+
+### 4. Codify only the typed structure
+
+When approved, write compiling architecture code for the full intended typed structure.
+
+Rules:
+
+- Use project domain names exactly.
+- Prefer deep modules: small interfaces, complexity behind the seam.
+- Follow project conventions for services, interfaces, DI, composition, layers, factories, providers, or registries.
+- Model absence, validation, variants, states, and expected failures with project-native typed patterns.
+- Keep HTTP/UI/CLI transport details at their boundaries.
+- Keep storage, SQL, queues, RPC clients, SDKs, and third-party APIs behind adapters.
+- Add architecture tests/lint/import rules when dependency direction can be checked mechanically.
+
+See [REFERENCE.md](REFERENCE.md) for allowed code depth and examples.
+
+### 5. Validate and report
+
+Run narrow checks: typecheck, lint/static checks, and architecture tests if added. Final response: facts codified, files changed, production/test call stacks, adapter slots, checks run, and business logic left unimplemented.
@@ -0,0 +1,10 @@
+---
+name: grill-me
+description: Interview the user relentlessly about a plan or design until reaching shared understanding, resolving each branch of the decision tree. Use when user wants to stress-test a plan, get grilled on their design, or mentions "grill me".
+---
+
+Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer.
+
+Ask the questions one at a time.
+
+If a question can be answered by exploring the codebase, explore the codebase instead.