A from-first-principles curriculum in harness engineering for agentic systems. Build every pattern by hand before adopting frameworks. The repo is the artifact of the journey — modules in order, each with reference code and notes.
| Doc | Purpose |
|---|---|
docs/treatise.md |
The conceptual map — eight parts, from "why understanding the bare model call is non-negotiable" through state, modality, planes, frameworks, and the two-axis (modality × cognition) view of the curriculum. Read this to know why the modules are in the order they are. |
docs/curriculum.md |
The 15-module path — what every module teaches, the new lesson at each seam, the three-layer code organization (modules / strings / agents), the eight deeper territories (multi-modal seams, MCP, etc.), and a status table for what's built vs. on deck. Read this to know what's next. |
.
├── level_1_modules/ per-module reference implementations
│ └── module_01_bare_call/ (read this first)
├── level_2_strings/ cross-layer integrations (added when needed)
├── level_3_agents/ purpose-built agents (added when needed)
└── common/ shared building blocks (added when patterns stabilize)
Per-module isolation makes lessons legible — diff Module 4 against Module 3 and the change is the lesson. Strings prove composition; purpose-built agents prove the muscle holds when the problem is yours.
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env # then fill in ANTHROPIC_API_KEYCode is authored on branch claude/plan-task-dxiRu, pushed to GitHub, and
synced to the execution environment via git. To run a module:
git fetch origin
git checkout claude/plan-task-dxiRu
git pull
python level_1_modules/module_01_bare_call/bare_call.py "your question"| # | Module | Concept |
|---|---|---|
| 1 | module_01_bare_call |
The single model invocation. Stateless. No tools. |
| 1b | module_01b_bilateral |
Split read from act — separate parser and composer models, two tiers each (Anthropic only). Prototype for the bilateral piece of LIMBIC. |
| 1c | module_01c_bilateral_x |
Bilateral across three providers (Anthropic, OpenAI, Google). Adapter layer makes provider differences visible — the API-drift tax LIMBIC will amortize. |
| 1d | module_01d_modality |
Bilateral with image input. Parser sees the image once, composer reads only the text IR. S3-backed asset fetch via assets.py. |
| 1e | module_01e_audio |
Bilateral with audio input. Anthropic excluded from parser slots (capability matrix). First place where modality routing is forced, not optional. |
| 1f | module_01f_video |
Bilateral with video input. Single-provider parser (Gemini-only); composer is unconstrained. Modality forces parser; bilateral keeps composer free. |
| 1g | module_01g_audio_out |
Bilateral with audio output. Composer must be OpenAI; capability matrix splits into input/output halves. Audio bytes uploaded to s3://harness-eng/outputs/. |
| 1h | module_01h_modality_matrix |
Closes the modality matrix — any input modality (text/image/audio/video) × either output (text/audio). Dynamic --all generates valid configs per cell. |
| 1i | module_01i_image_out |
Bilateral with image output. Composer is a diffusion-transformer (gpt-image-1, gemini-2.5-flash-image); Anthropic excluded. IR becomes modality-shaped (subject, composition, style, lighting, aspect, negatives, safety). Cost arithmetic inverts — parser is a rounding error against per-image cost. |
| 1j | module_01j_video_out |
Bilateral with video output, async. Composer is Sora or Veo (submit/poll/fetch); Anthropic excluded. Refusal is a typed terminal state (completed / failed / rejected). Cost moves from cents to dollars per clip. |
| 1k | module_01k_image_edit |
Asset-conditioned image output. Closes (image, image) (true edit via images.edit / Gemini in-context edit) plus (audio, image) and (video, image) (parser translates asset to image brief). Two-path dispatch — edit when you can, translate when you must. |
| 1l | module_01l_video_edit |
Asset-conditioned video output. Closes (image, video) (image conditioning via Sora input_reference / Veo image=) plus (audio, video) and (video, video) (parser translates asset to video brief). Reuses 1j's async state machine. Closes the matrix at all 16 cells with no deferrals. |
Up next: Module 2 — The Conversation Loop. See docs/curriculum.md for the full 15-module path and the status table.
Subsequent modules are built on demand, in order.
The bilateral split is not free. A parser stage adds latency, input tokens, and coordination cost. Across modules 1–1g, recurring patterns have emerged that any future LIMBIC router must encode — including the patterns where bilateral loses and the right move is no modulation at all.
Where bilateral pays:
- Hard-to-parse input — image, audio, video, or ambiguous text where the parser does real interpretive work.
- Cheap-but-modality-capable parser → deep-prose composer — for
example
google-fast → anthropic-deep. Asymmetric advantage at near-baseline cost. - Pre-distilling Gemini Pro composer work — bilateral with a non-Gemini parser routinely cuts cost vs. Gemini Pro baseline, because the IR pre-distills work Pro would otherwise burn thinking tokens on (observed in 1c, 1d, 1f).
Where bilateral is overhead — LIMBIC must decline to modulate:
- Trivial prompts (e.g., "what is 2+2") — baseline single-call always wins; the bilateral overhead exceeds any quality lift.
gpt-4o-minibaseline on simple factual extracts — so cheap that almost no bilateral can compete on cost (e.g., 1d's $0.000049 baseline vs. $0.003 for the cheapest bilateral on the same prompt).- Parser slot = composer slot — pure overhead, no asymmetric advantage. Just route as baseline.
The discipline: LIMBIC's first decision is whether to modulate at all. "Decline to modulate, dispatch single-call baseline" must be a first-class routing outcome — not a fallback after a failed bilateral.
The bilateral seam maps cleanly to a 3-faculty cognitive model that's
sharper for routing than the 5-faculty taxonomy in docs/benchmark-lens.md
(which is sharper for grading, but conflates routing concerns):
| Faculty | Pipeline location | Tier choice driven by |
|---|---|---|
| Perception | parser stage | how hard the input is to understand (modality, noise, ambiguity, length) |
| Intellect | wherever reasoning happens (often composer) | depth of reasoning required (factual recall vs. novel inference) |
| Expression | composer stage | output modality and prose quality required |
Each faculty has its own (modality × tier) Goldilocks zone:
- Perception: fast tier when input is clean (terse text, simple chart); deep tier when input is noisy/ambiguous (long context, hard-to-read audio, multi-scene video).
- Intellect: fast tier on retrieval-shaped questions; deep tier on novel reasoning, multi-step inference, or any task where a thinking model's hidden tokens earn their cost.
- Expression: fast tier on terse factual replies; deep tier on long-form prose, persuasive arguments, or spoken delivery (audio output where pacing and word choice matter).
LIMBIC's per-call routing decomposes into three orthogonal Goldilocks lookups against this triad. Deeper Territory 6 (evaluation frameworks) is what turns the lookups from heuristic to data-driven.
Discrete dispatchable configurations across the bilateral × tier × modality space:
OUTPUT
text audio image video
┌────────────────────────────────┐
text │ ✅ ✅ ✅ ✅ │
INPUT image │ ✅ ✅ ✅ ✅ │
audio │ ✅ ✅ ✅ ✅ │
video │ ✅ ✅ ✅ ✅ │
└────────────────────────────────┘
16 modality cells (4 input × 4 output) × full bilateral expansion
(capability-filtered)
Modules 1 → 1h closed the text and audio output halves (8 cells).
Module 1i closes (text → image).
Module 1j closes (text → video) and proves the async-job primitive.
Module 1k closes (image|audio|video → image) — edit when you can,
translate when you must.
Module 1l closes (image|audio|video → video) — image conditioning on
Sora/Veo, parser-translate for non-image asset inputs.
All 16 cells are now covered with no deferrals.
After 1l the modality plane is solved. The remaining LIMBIC work is on axes orthogonal to modality: evaluation frameworks (Deeper Territory 6), telemetry/observability (Module 11), rule-based router (LIMBIC L2.1 forward-design), LIMBIC v0 (L3.1).
See docs/limbic-design.md for the full LIMBIC sketch and
docs/limbic-image-video-generative.md for the design grounding behind
1i and 1j (why image and video are not the same machine as text on
the other side of the wire).
| Doc | Purpose |
|---|---|
docs/treatise.md |
Conceptual map of the curriculum — eight parts. Read first to understand why the modules are ordered this way. |
docs/curriculum.md |
The 15-module path, three-layer code organization, deeper territories, and the build status table. |
docs/limbic-design.md |
Future-project sketch — multi-axis dynamic router (direction × faculty × modality × cost). Module 1b is the prototype of its bilateral axis. |
docs/measurement-seam.md |
Philosophical lens — frozen-weight LLM inference as a quantum-mechanical measurement act. The structural fact that makes the harness/researcher/inference role split inevitable. |
docs/limbic-image-video-generative.md |
Design grounding for 1i and 1j — why generative image/video output is a different operator family, why control flow goes async, and why cost units shift from micro-cents to dollars per clip. |
docs/seam-parameters.md |
Architectural lock — seam parameters (temperature, seed, voice, CFG, reasoning_effort, …) are dispatcher growth, not new curriculum modules. |