Thanks for your interest! This monorepo hosts several agent families. The buildable packages each build, test, and ship on their own:
- Self-contained: acp/mini-swe-acp and acp/mini-swe-code.
- The ai-sdk/ group (acp, harness-pi, harness-codex, harness-claude-code, harness-mimo): each pairs a pure-JS ACP server (server.mjs) with a small Python register.py that wires the agent into BenchFlow.
- omnigent/: its own Python package.
The acp-registry/ family is different: the acp-registry/ pip package
catalogs every agent, and most adapting agents ship a declarative
acp/<id>/manifest.toml instead — 38 of them, with no server.mjs of their own
(a couple, like mimo-acp, wrap a thin shim package).
They are classified in
acp-registry/src/acp_registry/catalog.py
(the live per-agent table is generated into
acp-registry/AGENTS.md) and validated by contract/.
New agents arrive in one of two ways:
- Declarative (now primary): write acp/<id>/manifest.toml, classify it in acp-registry/src/acp_registry/catalog.py, and let contract/ (manifest_schema.json + the contract tests) validate it. See docs/tiers.md for the tier the classification picks.
- ai-sdk server: scaffold a server.mjs + register.py, then parity-check with the adaptation-parity skill (docs/adaptation.md, docs/parity.md).
cd acp/mini-swe-acp
uv venv .venv && source .venv/bin/activate
uv pip install --prerelease=allow -e ".[dev]" # benchflow pins an rc litellm
pytest -q # 12 tests, no API keys needed
ruff check src tests && ruff format --check src tests
cd acp/mini-swe-code
uv venv .venv && source .venv/bin/activate
uv pip install -e ".[dev,opencode]"
# Key-free test subset (what CI runs):
MSWEA_SILENT_STARTUP=1 pytest -q \
tests/models tests/agents tests/config tests/utils \
tests/run/test_batch_progress.py tests/run/test_inspector.py \
tests/run/test_run_hello_world.py tests/run/test_local.py \
tests/run/test_save.py
ruff check src tests
The full upstream suite additionally needs provider API keys and container runtimes (podman, bubblewrap, apptainer), which most changes do not require.
cd ai-sdk/acp # or harness-pi / harness-codex / harness-claude-code / harness-mimo
uv venv .venv && source .venv/bin/activate
uv pip install --prerelease=allow -e ".[dev]" # benchflow pins an rc litellm
pytest -q # key-free; no sandbox/model needed
ruff check src tests
node --check src/*/server.mjs # JS server syntax
The JS server.mjs is base64-deployed into the benchflow sandbox by
register.py's install command (its npm deps are installed there). Running it
against a live benchmark needs Node, a sandbox (docker/daytona), and a model.
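
To picture what "base64-deployed" means here: the file's bytes are encoded into a single shell-safe token, and a command run inside the sandbox decodes them back into a file. The following is a generic sketch of that idea, not register.py's actual code. build_install_command, extract_payload, and the target path are hypothetical, and the sketch assumes a base64 binary that accepts -d (GNU coreutils) in the sandbox. It leaves out the npm install step.

```python
import base64
import shlex


def build_install_command(source: bytes, target: str = "server.mjs") -> str:
    """Build a shell command that recreates `source` at `target` (hypothetical helper)."""
    payload = base64.b64encode(source).decode("ascii")
    # The standard base64 alphabet contains no shell metacharacters; quoting is a safeguard.
    return f"echo {shlex.quote(payload)} | base64 -d > {shlex.quote(target)}"


def extract_payload(command: str) -> bytes:
    """Recover the original bytes from a command built above, for a local round-trip check."""
    tokens = shlex.split(command)
    return base64.b64decode(tokens[1])  # tokens[0] is "echo", tokens[1] the payload
```

The round trip (extract_payload(build_install_command(data)) == data) is a cheap way to confirm that nothing is lost in the encoding step.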
cd acp-registry
uv venv .venv && source .venv/bin/activate
uv pip install --prerelease=allow -e ".[dev]" # benchflow pins an rc litellm
pytest -q
AGENTS.md is generated by scripts/gen_agents_md.py from catalog.py and
registry.snapshot.json. After any catalog.py or snapshot change, regenerate it
(python scripts/gen_agents_md.py > AGENTS.md) or the acp-registry CI job's
freshness check fails.
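
A local equivalent of the freshness check can be sketched as "regenerate, then diff against the committed file". The CI job's actual implementation is not shown in this guide, so treat the code below as an assumption about how such a check could look. stale_diff and regenerate are hypothetical names; the only facts taken from above are that the generator lives at scripts/gen_agents_md.py and writes to stdout.

```python
import difflib
import subprocess
import sys
from pathlib import Path


def stale_diff(committed: str, generated: str) -> list[str]:
    """Return unified-diff lines between committed and regenerated text (empty if fresh)."""
    return list(
        difflib.unified_diff(
            committed.splitlines(),
            generated.splitlines(),
            fromfile="AGENTS.md (committed)",
            tofile="AGENTS.md (regenerated)",
            lineterm="",
        )
    )


def regenerate(registry_dir: Path) -> str:
    """Run the generator from registry_dir and capture what it prints to stdout."""
    result = subprocess.run(
        [sys.executable, "scripts/gen_agents_md.py"],
        cwd=registry_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

From acp-registry/, stale_diff(Path("AGENTS.md").read_text(), regenerate(Path.cwd())) returns an empty list when the committed file is current, and the diff lines to review otherwise.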
contract/ is a manifest loader + tests (not a pip-installable package) that
validate every acp/<id>/manifest.toml against manifest_schema.json:
cd contract
pip install jsonschema pytest
PYTHONPATH=. pytest -q
cd omnigent
uv venv .venv && source .venv/bin/activate
uv pip install --prerelease=allow -e ".[dev]" # benchflow pins an rc litellm
pytest -q
Root .github/workflows/ runs per-package tests (path-filtered), including the
acp-registry, parity, omnigent, mimo-acp, and ai-sdk (+ harness-mimo)
jobs, plus ruff (pinned to the .pre-commit-config.yaml version) and a markdown
link check on PRs. The acp-registry job also fails the build when AGENTS.md is
stale, so regenerate it after any catalog.py (or snapshot) edit (see the
acp-registry dev-setup above). The contract/ tests are not yet a CI job — run
them locally. Please make sure the relevant package's tests and lint pass locally
before opening a PR.
pre-commit install at the repo root enables the same hooks (ruff,
ruff-format, typos) locally.
- Keep changes scoped to one package per PR when possible.
- acp/mini-swe-code/src/minisweagent/ tracks upstream mini-swe-agent; prefer upstream-compatible changes there (the opencode integration lives in src/minisweagent/run/opencode/ and is ours).
- Style follows the existing code: type annotations, pathlib, minimal comments (see .github/copilot-instructions.md).
- acp/mini-swe-code/.gitignore contains *.traj.json, but tests/test_data/{local,github_issue}.traj.json are tracked test fixtures (force-added upstream). If you ever re-add the tree wholesale, use git add -f for those two or two end-to-end tests fail with FileNotFoundError.
- The bundled opencode TUI binary (acp/mini-swe-code/src/minisweagent/run/opencode/bin/opencode, ~82 MB, macOS arm64) is committed as a normal blob. Rebuild recipe: docs/usage/opencode_tui.md.
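
After re-adding the mini-swe-code tree, a small check along these lines can catch the ignored-fixture problem before the end-to-end tests do. It is a sketch under the assumptions stated in the note above: the two fixture paths come from that note, missing_fixtures is a hypothetical helper, and it only checks that the files exist on disk, not that git tracks them.

```python
from pathlib import Path

# Fixture paths from the note above, relative to acp/mini-swe-code/.
FIXTURES = (
    Path("tests/test_data/local.traj.json"),
    Path("tests/test_data/github_issue.traj.json"),
)


def missing_fixtures(package_root: Path) -> list[Path]:
    """Return the trajectory fixtures that are absent under package_root."""
    return [rel for rel in FIXTURES if not (package_root / rel).is_file()]
```

An empty result from missing_fixtures(Path("acp/mini-swe-code")) means both files are present. Anything else lists the paths to restore and force-add.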