HalfSeed separates orchestration from model access. This document is the high-level tour; the precise data and execution model lives in graph-workflow-spec.md.
┌────────────────────────────────────────────────────────────┐
│ Orchestration (this repo) │
│ • ProjectArchitecture: a DAG of agents and edges │
│ • Director: plans iterations and decides when to stop │
│ • Workflow: turns the graph into LLM calls per iteration │
└─────────────────────────┬──────────────────────────────────┘
│ ChatBackend protocol
┌─────────────────────────▼──────────────────────────────────┐
│ Model access │
│ • DeepSeekChatBackend (live) │
│ • OpenAI / Anthropic via ProviderRouter (live) │
│ • RuleBasedBackend (offline, deterministic for tests) │
└────────────────────────────────────────────────────────────┘
The boundary is the ChatBackend protocol — anything below it is
"how to talk to a model", anything above it is "what to ask it".
A project's workflow is a graph of agents edited in the web UI. The graph is a DAG; the system enforces this at validation time.
The default architecture looks like:
Director
├─→ Analytical (specialist) ─┐
├─→ Numerical (specialist) ─┼─→ Skeptic ─→ Curator ─→ Writer ─→ Referee
└─→ Literature (specialist) ─┘ │
└─→ Presentation
Director is the only required role: it plans each iteration, issues
the brief into the graph, decides when to stop, and finalises the
user-facing prose (the legacy single-shot Principal role was merged
into Director). Post-Writer the runtime fans out Referee and
Presentation in parallel.
A → B means exactly: "A's output flows into B's input on this iteration."
Edges have no labels, no payload fields, no "purpose" — wires are wires;
all behavior is on the nodes (each node's mission and output_contract
free-text fields). See graph-workflow-spec.md §3
for the rationale and §9 for why edge-level prompts were rejected.
When the user removes an edge in the architecture editor, the runtime
genuinely stops flowing those artifacts to the downstream node — see
runtime/edge_filter.py.
Some behaviors are runtime mechanisms, not graph-expressible:
- Iteration loop. The graph is a DAG; multi-iteration loops are
the Director re-running the whole graph for another round. The
Director's
should_stopdecision sits outside the graph. - Attack-vector market. When ≥2 specialists share an upstream, the runtime auto-runs a "pick distinct angles" protocol before their layer executes.
- Cross-iteration context.
prior_iteration_summary,open_threads,rejected_claims, the artifact pool — auto-injected into every node's prompt, not drawn as edges. - Side effects. Numerical sandbox runs and Writer's edits to
paper.texhappen as post-processing hooks attached to the relevant role kinds.
Spec §5 covers each of these in detail.
The design avoids unbounded group chat. Every internal message is
converted into a ResearchArtifact, and the curator has a hard
max_items budget. Output is reviewable, testable, and visualizable.
Validation rules (cycles, duplicate singletons, unreachable specialists,
etc.) run server-side and surface in the architecture editor; the
"Run iteration" button is disabled when the graph has errors. See
architecture_validate.py and spec §6 for the full rule list.
HalfSeed currently has two non-visual tools:
ArxivSearchClientretrieves arXiv candidates and feeds them to whichever specialist matches the literature role kind when--literature-searchis enabled.NumericalSandboxruns project-owned numerical checks. The first built-in routine performs an RK4 smoke check for harmonic-oscillator stability questions. It does not run arbitrary model-written code; the sandbox that does run code artifacts is inqa/sandbox.pyand is invoked as a Numerical-role side effect.
Live LLM calls are not trusted to always return perfect JSON. The workflow:
- requests JSON mode from the backend;
- retries malformed JSON once with a compact regeneration prompt;
- normalizes common schema variations (e.g. list-style specialist questions);
- falls back to low-confidence artifacts when an internal specialist still fails;
- retries transient HTTP errors such as 429 and 5xx.
ProjectStore writes structured JSON for each iteration plus a
rendered Markdown briefing. The CLI also exposes RunStore for the
older one-shot run() path with --save-run.
All agents depend on ChatBackend:
class ChatBackend(Protocol):
def complete(self, request: ChatRequest) -> ChatResponse:
...The current live backends are DeepSeekChatBackend and the OpenAI/
Anthropic adapters routed via ProviderRouter. Tests and offline
demos use RuleBasedBackend. New backends only need to implement
the protocol.
| topic | start here |
|---|---|
| precise graph model + execution algorithm | graph-workflow-spec.md |
| validator rules | architecture_validate.py + spec §6 |
| edge-driven artifact filtering | runtime/edge_filter.py |
| graph executor (topo sort, layered run) | runtime/graph_executor.py |
| role definitions and singleton constraints | agents/roles.py |
| Director (iteration planner) | agents/director.py |
| paper rendering | publish/paper.py and publish/paper_edits.py |