Skills that ground AI agents in strategic context: frame the problem before solving it, capture learning that persists, ship outcomes not just outputs.
Full framework docs or just install and start.
npx skills add open-horizon-labs/skills -g -a claude-code -yRe-run the install command to pull the latest version.
10 commands forming the language of strategic execution:
| Skill | Description |
|---|---|
/teach-oh |
Project setup. Explore codebase, ask about strategy/aims/reviewers/practices, write context to AGENTS.md |
| Skill | Description |
|---|---|
/aim |
Clarify the outcome you want—a change in behavior, not a feature |
/problem-space |
Map what we're optimizing and what constraints we treat as real |
/problem-statement |
Define the framing. Change the statement, change the solution space |
| Skill | Description |
|---|---|
/solution-space |
Explore candidate solutions. Band-aid, optimize, reframe, or redesign? |
/execute |
Do the work. Pre-flight, build, detect drift, salvage if needed |
/ship |
Deliver to users. Optimize the path from code to working install |
| Skill | Description |
|---|---|
/review |
Check work and detect drift. A second opinion before committing, with code, writing, and learning lenses |
/dissent |
Devil's advocate. Seek contrary evidence before locking in |
/salvage |
Extract learning before restarting. Code is a draft; learning is the asset |
/distill |
Curate accumulated metis across sessions. Surface patterns, promote to guardrails, compact noise |
Intent (aim, problem-space, problem-statement)
│
▼
Execution (solution-space, execute, ship)
│
▼
Review (review, dissent)
│
▼ (if drift detected)
SALVAGE ──► back to Problem Space with new understanding
Periodically (between sessions):
DISTILL ──► curate accumulated metis, promote patterns to guardrails
The workflow is strongest when each phase preserves one kind of shared meaning and leaves explicit fields the next phase can reuse rather than re-inferring them from prose. In practice, the carry-forward contract is:
/teach-ohpreserves project-specific reorientation norms across sessions/aimpreserves intent: behavior change, why it matters, mechanism-as-hypothesis, assumptions, feedback signal, guardrails, and misunderstanding evidence/problem-spacepreserves the situation map: terrain, real constraints, hidden assumptions, frame-stress signals, and open questions/problem-statementpreserves the frame: plausible working story, constraints, assumptions, pressure test, and invalidation signal/solution-spacepreserves optionality: materially different candidates, interpretive variety, accepted trade-offs, and adversarial risk-retirement checks/executepreserves execution boundaries under pressure: success criteria, constraints, drift handling, adversarial checks that would fail the tempting local patch, and decisions that need review, dissent, or human judgment/shippreserves contact with reality: target-environment evidence and learning routed back to aim, problem-space, guardrails, or metis/reviewpreserves independent judgment: result vs. aim, contract drift, incomplete risk retirement, and frame collapse routed separately from ordinary implementation drift/dissentpreserves contested meaning: contrary evidence, stress-tested story, and a shared decision point/salvagepreserves learning from failure: frame shifts, ownership/coordination breakdowns, and durable metis/distillpreserves metis quality: patterns promoted by evidence, mythology compacted or discarded
Outcome alignment: strategic grounding and productive friction come from preserving intent, constraints, assumptions, optionality, and dissent before momentum hardens; redesign over recurring patches comes from frame-stress, frame-collapse, and salvage routes that expose broken interpretations instead of patching symptoms; persistent learning across sessions comes from teach-oh, salvage, and distill carrying situated metis, role boundaries, and guardrails forward. Across the tree, each artifact should satisfy necessity, viability, sufficiency, and connectedness: why this exists, why this path can work, why it is enough for now, and how it connects to the level above and below.
Every step should pair a strategy (what and why) with a tactic (how). Tactics that do not ladder back to a parent objective are busy work; objectives without tactics are wishful thinking.
Phase shifts should be treated as commitment decisions, not momentum. Moving from exploration to execution means stating why you're ready to deepen, what would invalidate the current direction, and what should trigger a stop or pivot.
Context should privilege situated metis over generic advice. Persist the non-obvious lessons, constraints, and local patterns your experience adds; foundational-model knowledge that adds no local leverage is noise and pollutes the context space.
Skill prompts should stay hot-path sized. Move conditional detail into JIT references and load it only when it changes the decision or output quality. More context is not automatically more clarity.
Contract shape is executable. When changing skill output templates, session handoff text, or generated agent copies, run:
python3 scripts/validate_skill_contracts.pyThe validator guards against the CodeRabbit class of failures: emitted section anchors drifting from persisted session anchors, named obligations disappearing between phases, and generated agent prompts losing the source skill contract.
We test this workflow against a companion sensemaking benchmark: small implementation tasks where visible tests pass easily, but hidden checks catch the model stopping at the first plausible fix. The point is not to prove universal intelligence. The point is to ask a practical question: does the same model/runner behave better when the harness forces aim, framing, solution comparison, and adversarial risk retirement?
The companion bench is open-horizon-labs/sensemaking-bench; the calibrated comparison used dashboard-cache-trap, notification-third-flag, and timeout-symptom-trap.
On the calibrated baseline-failure set, the answer was yes. In the initial same-runner/same-model comparison, hidden pass rate moved from 1/9 without skills to 7/9 with the current adversarial risk-retirement flow.
| Condition | Hidden pass | What changed |
|---|---|---|
| No skills | 1/9 |
Agent usually made visible tests pass and stopped |
| Risk-routing skills | 2/9 |
Agent named risks, but could still retire them with weak evidence |
| Adversarial risk-retirement skills | 7/9 |
Agent had to name the tempting wrong patch and add checks that would fail it |
Behaviorally, a skilled harness outperformed the unskilled one by changing the stopping condition: not "visible tests pass," but "visible tests pass, model-checkable risks are retired by checks that would fail the tempting shortcut, and remaining risks are explicitly out of model-checkable scope."
The strongest gains came from tasks where the user request pointed at a symptom: duplicate notifications and timeout increases. The remaining weakness was narrower: dashboard payload-shape tasks still failed in some runs when agents removed obvious internal fields but did not derive the exact public field allowlist.
Scope of the claim: the benchmark result is evidence for the high-reasoning model/runner used in the trials. The result JSON did not record a public model label, so this README intentionally does not claim universal performance across every model or harness.
This workflow improves reasoning discipline: it makes aims explicit, surfaces assumptions, adds friction before commitment, and preserves context across phases. It does not make model reasoning inspectable, force logical soundness, or eliminate the gap between rhetorical and logical completeness. External verification, human judgment, and independent challenge remain necessary. The workflow makes those things easier to practice — it does not replace them.
Each skill works at multiple levels:
- Base - Works with just the prompt (no dependencies)
- With .oh/ session - Reads/writes
.oh/<session>.mdfor context handoff between skills - With RNA MCP - Semantic code search, graph traversal, outcome-to-code joins, and business context. Skills use RNA tools automatically when available.
- With OH MCP - Full integration with Open Horizons organizational graph
Skills share context via session files. Provide a name explicitly or accept the suggested one (derived from git branch):
/aim auth-refactor # Creates .oh/auth-refactor.md
/problem-statement auth-refactor # Reads aim, writes problem statement
/solution-space auth-refactor # Reads aim + problem statement, writes solutionFor oh-my-pi users, an optional hook detects where you are in the development cycle and suggests the right skill.
How it works:
- Reads your
.oh/session files to detect completed phases (state — primary signal) - Scans your prompt for intent signals (enrichment — used when state is ambiguous)
- Injects a
<oh-phase-context>block recommending the next skill - Deduplicates — only injects when the recommendation changes
Install manually:
mkdir -p .omp/hooks
cp hooks-omp/oh-skills-phase.ts .omp/hooks/Or via /teach-oh: offered during project setup, including .oh/skills-config.json customization.
Configuration (.oh/skills-config.json, optional):
{
"projectSkills": ["aim", "solution-space", "execute", "review", "dissent"],
"disabledSkills": [],
"phaseOverrides": {
"execute": ["dissent"]
}
}projectSkills— limit which skills get suggested (default: all)disabledSkills— skills to never suggestphaseOverrides— extra skills to suggest during specific phases
Config is loaded once at session start. Changes require restarting OMP.
The hook is entirely optional. Skills work the same with or without it.
Note: This hook requires the OMP hook API (
@oh-my-pi/pi-coding-agent). It is a TypeScript module that OMP loads at runtime — it cannot be compiled or tested independently within this repo. It lives here alongside the skills it serves, but its runtime home is.omp/hooks/.
Pre-built agent wrappers in agents-omp/ give each phase its own isolated context and scoped tools. Install via /teach-oh or manually copy to .omp/agents/:
mkdir -p .omp/agents
cp agents-omp/oh-*.md .omp/agents/When agents are installed, the phase-aware hook automatically switches from suggesting /skill commands to suggesting agent dispatch.
MIT