Clarify sub-skill progressive disclosure in token/cost optimization guidance (#39227)

Copilot · web-flow · commit 7a4c2e429cf4 · 2026-06-14T08:16:43.000-07:00
diff --git a/.github/aw/token-optimization.md b/.github/aw/token-optimization.md
@@ -14,6 +14,7 @@ Apply these in order — each check can halve costs:
 - [ ] **gh-proxy**: Set `tools.github.mode: gh-proxy` — skips Docker MCP server startup and extra tool definitions
 - [ ] **cli-proxy**: Mount additional MCP servers as CLIs via `cli-proxy: true` — agent pipes output through `jq` before it enters context
 - [ ] **Sub-agents**: Delegate repetitive per-item tasks to `model: small` sub-agents (~10–20× cheaper)
+- [ ] **Sub-skills**: Keep the main prompt as a short execution plan; move detailed playbooks/output layouts into `## skill:` blocks the agent invokes only when needed
 - [ ] **Prompt size**: Strip redundant instructions, examples, and pleasantries from the prompt body
 - [ ] **Dynamic context**: Inject only required fields — `${{ github.event.issue.number }}` not the full event payload
 - [ ] **Pull context on demand**: query logs/data only after a hypothesis forms; avoid preloading large raw dumps into the initial prompt
@@ -264,6 +265,16 @@ Nothing else.
 
 **Why this saves tokens:** sub-agents run on the cheap `small` model; the main agent only reads compact `{"number":…, "category":…}` JSON; sub-agent dispatches can run in parallel.
 
+### Pair sub-agents with sub-skills (progressive disclosure)
+
+Use sub-skills as progressive disclosure for instruction-heavy tasks:
+
+- Keep the main prompt short and plan-like (what to do, in what order).
+- Put verbose instructions (report layout, rubric details, long formatting constraints) into `## skill:` blocks.
+- Invoke those skills only at the moment they are needed (for example, when producing final output), so early planning/execution turns stay lean.
+
+This pattern lowers ambient context and usually improves both latency and AIC by delaying expensive instruction payloads until the final phase.
+
 **Sub-agent model aliases:**
 
 | Alias | Use when |
diff --git a/docs/src/content/docs/reference/cost-management.md b/docs/src/content/docs/reference/cost-management.md
@@ -476,7 +476,15 @@ See [Inline Sub-Agents](/gh-aw/reference/inline-sub-agents/) for the full syntax
 
 ### Use Inline Skills to Reduce Context
 
-Move large instruction blocks out of the main prompt body using inline skills. At runtime, each `## skill:` block is extracted and written to engine-specific skill locations — the agent can invoke the skill on demand instead of receiving the guidance upfront, keeping the ambient context slim:
+Move large instruction blocks out of the main prompt body using inline skills. At runtime, each `## skill:` block is extracted and written to engine-specific skill locations — the agent can invoke the skill on demand instead of receiving the guidance upfront, keeping the ambient context slim.
+
+Treat the main prompt as an execution plan and sub-skills as deferred detail:
+
+- Main prompt: concise plan, sequencing, and decision points.
+- Sub-skills: verbose checklists, report templates/layout rules, and domain rubrics.
+- Invoke sub-skills only when needed (for example, at final report generation), not at startup.
+
+This progressive-disclosure pattern keeps early turns focused and reduces per-run token overhead:
 
 ```aw wrap
 engine: