
Architecture

HalfSeed separates orchestration from model access. This document is the high-level tour; the precise data and execution model lives in graph-workflow-spec.md.

Two layers

┌────────────────────────────────────────────────────────────┐
│ Orchestration (this repo)                                   │
│   • ProjectArchitecture: a DAG of agents and edges          │
│   • Director: plans iterations and decides when to stop     │
│   • Workflow: turns the graph into LLM calls per iteration  │
└─────────────────────────┬──────────────────────────────────┘
                          │ ChatBackend protocol
┌─────────────────────────▼──────────────────────────────────┐
│ Model access                                                │
│   • DeepSeekChatBackend (live)                              │
│   • OpenAI / Anthropic via ProviderRouter (live)            │
│   • RuleBasedBackend (offline, deterministic for tests)     │
└────────────────────────────────────────────────────────────┘

The boundary is the ChatBackend protocol — anything below it is "how to talk to a model", anything above it is "what to ask it".

Workflow shape

A project's workflow is a graph of agents edited in the web UI. The graph is a DAG; the system enforces this at validation time.
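
As a rough illustration of what "enforced at validation time" can mean, the sketch below groups a graph into execution layers with Kahn's algorithm and raises an error when a cycle prevents some node from being placed. The function name `topo_layers` and the edge-list representation are assumptions made for this example; the actual validator and executor may be structured differently.

```python
from collections import defaultdict

def topo_layers(nodes, edges):
    """Group nodes into layers so every edge points to a later layer.

    Hypothetical helper: raises ValueError if the graph has a cycle.
    """
    indegree = {name: 0 for name in nodes}
    downstream = defaultdict(list)
    for src, dst in edges:
        downstream[src].append(dst)
        indegree[dst] += 1

    # Start with the nodes that have no incoming edges.
    layer = sorted(name for name, degree in indegree.items() if degree == 0)
    layers = []
    placed = 0
    while layer:
        layers.append(layer)
        placed += len(layer)
        next_layer = []
        for name in layer:
            for child in downstream[name]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_layer.append(child)
        layer = sorted(next_layer)

    # Nodes on a cycle never reach indegree 0, so they are never placed.
    if placed != len(nodes):
        raise ValueError("graph contains a cycle")
    return layers

# The default architecture from this document, written as (source, target) pairs.
DEFAULT_NODES = ["Director", "Analytical", "Numerical", "Literature",
                 "Skeptic", "Curator", "Writer", "Referee", "Presentation"]
DEFAULT_EDGES = [
    ("Director", "Analytical"), ("Director", "Numerical"), ("Director", "Literature"),
    ("Analytical", "Skeptic"), ("Numerical", "Skeptic"), ("Literature", "Skeptic"),
    ("Skeptic", "Curator"), ("Curator", "Writer"),
    ("Writer", "Referee"), ("Writer", "Presentation"),
]
```

Nodes that land in the same layer have no edges between them, which is why a layer can be run in parallel.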

The default architecture looks like:

Director
  ├─→ Analytical (specialist)  ─┐
  ├─→ Numerical  (specialist)  ─┼─→ Skeptic ─→ Curator ─→ Writer ─→ Referee
  └─→ Literature (specialist)  ─┘                                  │
                                                                   └─→ Presentation

Director is the only required role. It plans each iteration, issues the brief into the graph, decides when to stop, and finalises the user-facing prose (the legacy single-shot Principal role was merged into Director). Once Writer finishes, the runtime runs Referee and Presentation in parallel.

What edges mean

A → B means exactly: "A's output flows into B's input on this iteration." Edges have no labels, no payload fields, no "purpose" — wires are wires; all behavior is on the nodes (each node's mission and output_contract free-text fields). See graph-workflow-spec.md §3 for the rationale and §9 for why edge-level prompts were rejected.

When the user removes an edge in the architecture editor, the runtime genuinely stops flowing those artifacts to the downstream node — see runtime/edge_filter.py.
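
The idea of edge-driven filtering can be sketched as follows: a node sees only artifacts whose producer has an edge into it. The `Artifact` class, its `producer` field, and `visible_artifacts` are hypothetical names for this illustration and are not taken from runtime/edge_filter.py.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Artifact:
    producer: str  # hypothetical field: the node that emitted this artifact
    text: str

def visible_artifacts(node, edges, artifacts):
    """Return the artifacts that flow into `node` along existing edges."""
    upstream = {src for src, dst in edges if dst == node}
    return [a for a in artifacts if a.producer in upstream]

pool = [
    Artifact("Analytical", "closed-form bound"),
    Artifact("Numerical", "simulation result"),
]
edges = [("Analytical", "Skeptic"), ("Numerical", "Skeptic")]

# Removing the Numerical -> Skeptic edge hides Numerical's output from Skeptic.
edges_after_removal = [e for e in edges if e != ("Numerical", "Skeptic")]
```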

What's outside the graph

Some behaviors are runtime mechanisms, not graph-expressible:

  • Iteration loop. The graph is a DAG, so a multi-iteration run is the Director re-running the whole graph for another round. The Director's should_stop decision sits outside the graph.
  • Attack-vector market. When ≥2 specialists share an upstream, the runtime auto-runs a "pick distinct angles" protocol before their layer executes.
  • Cross-iteration context. prior_iteration_summary, open_threads, rejected_claims, the artifact pool — auto-injected into every node's prompt, not drawn as edges.
  • Side effects. Numerical sandbox runs and Writer's edits to paper.tex happen as post-processing hooks attached to the relevant role kinds.

Spec §5 covers each of these in detail.
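
The iteration loop described in the first bullet can be pictured as a plain loop wrapped around one graph pass. This is a minimal sketch under assumptions: `run_project`, the `run_graph_once` callback, and the `max_iterations` cap are hypothetical and only illustrate that the stop decision lives outside the graph.

```python
from typing import Callable, List

def run_project(
    run_graph_once: Callable[[int, List[str]], str],
    should_stop: Callable[[List[str]], bool],
    max_iterations: int = 5,  # assumed safety cap, not from the spec
) -> List[str]:
    """Re-run the whole graph until the stop check says so or the cap is hit."""
    summaries: List[str] = []
    for iteration in range(1, max_iterations + 1):
        # Each pass sees the summaries of earlier passes as cross-iteration context.
        summary = run_graph_once(iteration, list(summaries))
        summaries.append(summary)
        # The stop decision is evaluated after the pass, outside the graph.
        if should_stop(summaries):
            break
    return summaries
```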

Why this shape

The design avoids unbounded group chat. Every internal message is converted into a ResearchArtifact, and the curator has a hard max_items budget. Output is reviewable, testable, and visualizable.
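
A hard item budget might look like the sketch below. The fields on `ResearchArtifact` and the choice to rank by confidence are assumptions for illustration; the document only states that the curator has a max_items budget.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ResearchArtifact:
    claim: str          # hypothetical field
    confidence: float   # hypothetical field, higher means more trusted

def curate(artifacts: List[ResearchArtifact], max_items: int) -> List[ResearchArtifact]:
    """Keep at most `max_items` artifacts, preferring higher confidence."""
    if max_items < 0:
        raise ValueError("max_items must be non-negative")
    # sorted() is stable, so ties keep their original relative order.
    ranked = sorted(artifacts, key=lambda a: a.confidence, reverse=True)
    return ranked[:max_items]
```

Whatever the ranking rule, the key property is that the output size is bounded regardless of how many artifacts the upstream nodes produce.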

Validation rules (cycles, duplicate singletons, unreachable specialists, etc.) run server-side and surface in the architecture editor; the "Run iteration" button is disabled when the graph has errors. See architecture_validate.py and spec §6 for the full rule list.

Tools

HalfSeed currently has two non-visual tools:

  • ArxivSearchClient retrieves arXiv candidates and feeds them to whichever specialist matches the literature role kind when --literature-search is enabled.
  • NumericalSandbox runs project-owned numerical checks. The first built-in routine performs an RK4 smoke check for harmonic-oscillator stability questions. NumericalSandbox itself does not run arbitrary model-written code; code artifacts run in a separate sandbox, qa/sandbox.py, which is invoked as a Numerical-role side effect.
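
To give a feel for what an RK4 smoke check on a harmonic oscillator involves, here is a self-contained sketch that integrates x'' = -omega^2 x with the classical fourth-order Runge-Kutta method and reports the relative energy drift. The function names and default parameters are assumptions; this is not the built-in routine.

```python
def rk4_step(x, v, dt, omega):
    """Advance position x and velocity v by one classical RK4 step."""
    def accel(pos):
        return -omega * omega * pos

    k1x, k1v = v, accel(x)
    k2x, k2v = v + 0.5 * dt * k1v, accel(x + 0.5 * dt * k1x)
    k3x, k3v = v + 0.5 * dt * k2v, accel(x + 0.5 * dt * k2x)
    k4x, k4v = v + dt * k3v, accel(x + dt * k3x)
    x_new = x + dt / 6.0 * (k1x + 2.0 * k2x + 2.0 * k3x + k4x)
    v_new = v + dt / 6.0 * (k1v + 2.0 * k2v + 2.0 * k3v + k4v)
    return x_new, v_new

def energy(x, v, omega):
    """Total energy per unit mass of the oscillator."""
    return 0.5 * v * v + 0.5 * omega * omega * x * x

def energy_drift(omega=1.0, dt=0.01, steps=1000, x0=1.0, v0=0.0):
    """Relative change in energy after `steps` RK4 steps."""
    x, v = x0, v0
    e0 = energy(x, v, omega)
    for _ in range(steps):
        x, v = rk4_step(x, v, dt, omega)
    return abs(energy(x, v, omega) - e0) / e0
```

The exact oscillator conserves energy, so a small drift suggests the step size is adequate, while a large drift signals that omega * dt is outside the method's stability range.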

Stability

Live LLM calls are not trusted to always return perfect JSON. The workflow:

  • requests JSON mode from the backend;
  • retries malformed JSON once with a compact regeneration prompt;
  • normalizes common schema variations (e.g. list-style specialist questions);
  • falls back to low-confidence artifacts when an internal specialist still fails;
  • retries transient HTTP errors such as 429 and 5xx.
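
The retry-once-then-fall-back part of the list above can be sketched as follows. The `ask` callback, the regeneration prefix, and the shape of the fallback dictionary are hypothetical; schema normalization and HTTP retries are left out to keep the example short.

```python
import json
from typing import Callable, Optional

# Assumed wording for the compact regeneration prompt.
RETRY_PREFIX = "Return only a valid JSON object. "

def _try_parse(raw: str) -> Optional[dict]:
    """Return the parsed object, or None if `raw` is not a JSON object."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None

def parse_with_retry(ask: Callable[[str], str], prompt: str) -> dict:
    """Ask once, retry malformed JSON once, then fall back to a low-confidence stub."""
    parsed = _try_parse(ask(prompt))
    if parsed is None:
        parsed = _try_parse(ask(RETRY_PREFIX + prompt))
    if parsed is None:
        # Hypothetical fallback artifact; the real fields may differ.
        return {"claims": [], "confidence": 0.0, "fallback": True}
    return parsed
```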

Persistence

ProjectStore writes structured JSON for each iteration plus a rendered Markdown briefing. The CLI also exposes RunStore for the older one-shot run() path with --save-run.

Backend boundary

All agents depend on ChatBackend:

from typing import Protocol

class ChatBackend(Protocol):
    def complete(self, request: ChatRequest) -> ChatResponse:
        """Send one chat request to a model and return its response."""
        ...

The current live backends are DeepSeekChatBackend and the OpenAI/Anthropic adapters routed via ProviderRouter. Tests and offline demos use RuleBasedBackend. New backends only need to implement the protocol.
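
As an illustration of how little a new backend needs, the sketch below defines a toy backend that satisfies the protocol structurally, without inheriting from it. The fields on `ChatRequest` and `ChatResponse`, the `EchoBackend` class, and the `ask` helper are assumptions made up for this example; the real request and response types are likely richer.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class ChatRequest:
    prompt: str  # hypothetical field

@dataclass
class ChatResponse:
    text: str  # hypothetical field

class ChatBackend(Protocol):
    def complete(self, request: ChatRequest) -> ChatResponse:
        ...

class EchoBackend:
    """Toy offline backend: no subclassing needed, only a matching method."""

    def complete(self, request: ChatRequest) -> ChatResponse:
        return ChatResponse(text=f"echo: {request.prompt}")

def ask(backend: ChatBackend, prompt: str) -> str:
    """Orchestration-side code depends only on the protocol, not on a provider."""
    return backend.complete(ChatRequest(prompt=prompt)).text
```

Because `Protocol` uses structural typing, swapping `EchoBackend` for a live provider adapter requires no change to the calling code.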

Pointers

| topic | start here |
| --- | --- |
| precise graph model + execution algorithm | graph-workflow-spec.md |
| validator rules | architecture_validate.py + spec §6 |
| edge-driven artifact filtering | runtime/edge_filter.py |
| graph executor (topo sort, layered run) | runtime/graph_executor.py |
| role definitions and singleton constraints | agents/roles.py |
| Director (iteration planner) | agents/director.py |
| paper rendering | publish/paper.py and publish/paper_edits.py |