Python 3.10+ async AI benchmarking tool for measuring LLM inference server performance. 10 services communicate via ZMQ message bus.
Reference documentation:
- `docs/architecture.md` - Three-plane architecture, core components, credit system, data flow, communication patterns
- `docs/dev/patterns.md` - Code examples for CLI commands, services, models, messages, plugins, error handling, logging, testing
- `docs/cli-options.md` - Complete CLI command and option reference
- `docs/environment-variables.md` - All `AIPERF_*` environment variables by subsystem
- `docs/metrics-reference.md` - Metric definitions, formulas, and requirements
- `docs/plugins/plugin-system.md` - Plugin architecture, categories, creation guide
- `CONTRIBUTING.md` - Development setup, available commands, pre-commit hooks, DCO
- async/await for ALL I/O - no `time.sleep`, no blocking calls.
- `Field(description="...")` on EVERY Pydantic field. Docstrings on dataclass fields.
- Type hints on ALL functions (params and return).
- KISS + DRY: minimal code, optimize for reader.
- `AIPerfBaseModel` for data, `BaseConfig` for configuration.
- `@dataclass(slots=True)` for hot-path inner models created at high volume (e.g. SSE chunks, parsed responses) where Pydantic overhead matters. Use `__pydantic_config__ = ConfigDict(extra="forbid")` on dataclasses that participate in Pydantic union discrimination.
- `BaseComponentService` for services, `BaseService` for SystemController only.
- Message bus for inter-service communication - no shared mutable state.
- CLI commands: one file per command in `cli_commands/`, lazily loaded via import strings in `cli.py`. See `docs/dev/patterns.md`.
- YAML plugin registry for extensible features (`src/aiperf/plugin/plugins.yaml`).
- Lambda for expensive logs: `self.debug(lambda: f"{self._x()}")`. Direct string for cheap ones.
- Always `orjson.loads(s)`, `orjson.dumps(d)` for JSON.
- Reading filesystem paths: use `aiperf.common.path_safety.safe_read_template_path` instead of inline `Path().read_text()` / `open().read()`. Returns `None` on safety-check failure; caller picks the fallback semantic. See `docs/dev/patterns.md` § "Safe Filesystem Reads Pattern".
- No `Optional[X]` or `Union[X, Y]` - use `X | Y`.
- Comments only for "why?" not "what".
- Enums are string-based - use `MessageType.X` directly, never `.value`.
- Dependencies: always use `uv` (never pip) - `uv add package`, `uv run pytest`.
- Use mermaid diagrams instead of ASCII art in markdown files.
- Do not create markdown files to document code changes or decisions.
- Do not over-comment code. Removing code is fine without adding comments to explain why.
- No emojis in code or comments.
- Hide a metric from the console table with `console_group = MetricConsoleGroup.NONE`; group it into a separate section with `MetricConsoleGroup.{USAGE,CACHE,PREDICTION,AUDIO,REASONING}`. Default is `DEFAULT`. See `docs/metrics-reference.md` "Metric Console Group Reference".
Numeric metric values crossing a serialization boundary or feeding a numerical algorithm must be finite or explicitly None. Use aiperf.common.finite (FiniteFloat, scrub_non_finite, nan_safe_mean/std, is_finite_value). Mechanical CI invariants in tests/unit/property/test_finite_invariants.py reject new violations and ratchet existing debt to zero via baseline files. See docs/dev/patterns.md § "NaN/Inf Discipline Pattern" and docs/dev/global-invariants.md for the full contract.
make first-time-setup # Initial environment setup
make install # Install project + mock server
uv run pytest tests/unit/ -n auto # Unit tests (fast, isolated)
uv run pytest -m integration -n auto # Integration tests (real services, multiprocess)
uv run pytest -m component_integration -n auto # Component integration tests (single process)
ruff format . && ruff check --fix . # Format and lint
make validate-plugin-schemas # Validate plugin registry
pre-commit run # Pre-commit on staged files
pre-commit run --all-files # Pre-commit on all files
make generate-all-docs # Regenerate CLI + env var docs
make generate-all-plugin-files # Regenerate plugin enums, overloads, schemas

Run pre-commit after every code change, even before creating commits:
pre-commit run # Staged files only
pre-commit run --all-files # All files (recommended after significant changes)

Hooks: check-ast, debug-statements, detect-private-key, check-added-large-files, check-case-conflict, check-executables-have-shebangs, check-merge-conflict, check-json, check-toml, check-yaml, check-shebang-scripts-are-executable, end-of-file-fixer, mixed-line-ending, no-commit-to-branch, requirements-txt-fixer, trailing-whitespace, codespell, add-license, generate-cli-docs, generate-env-vars-docs, generate-plugin-artifacts, validate-plugin-schemas, test-imports, check-agent-files-sync, check-ergonomics, check-ruff-baselined, ruff, ruff-format.
- Create class extending `BaseComponentService` with `@on_message` handlers
- Register in `src/aiperf/plugin/plugins.yaml` under `service` category with `class`, `description`, `metadata`
- Add message type to `common/enums/enums.py` if new messages needed
- Create message class in `messages/` with `message_type` field
- Validate with `aiperf plugins --validate`
- Add enum value to `MessageType` in `common/enums/enums.py`
- Create message class in `messages/` inheriting from `Message` with `message_type` field set
- Add `@on_message(MessageType.X)` handler in the receiving service
- Auto-subscription happens during `@on_init` phase
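
The general shape of decorator-based handler registration, as described in the steps above, can be sketched with a self-contained toy. Everything here (`ToyService`, the enum members, the `on_message` decorator, the `dispatch` method) is hypothetical and is not AIPerf's implementation; it only shows handlers being tagged with a message type and discovered when the service is initialized.

```python
import asyncio
from collections.abc import Awaitable, Callable
from enum import Enum


class MessageType(str, Enum):
    # Hypothetical members, for illustration only.
    STATUS = "status"
    HEARTBEAT = "heartbeat"


Handler = Callable[..., Awaitable[None]]


def on_message(message_type: MessageType) -> Callable[[Handler], Handler]:
    """Tag a coroutine method with the message type it handles (toy decorator)."""

    def decorator(func: Handler) -> Handler:
        func._message_type = message_type  # type: ignore[attr-defined]
        return func

    return decorator


class ToyService:
    """Toy service that collects tagged handlers during construction."""

    def __init__(self) -> None:
        self.received: list[str] = []
        self._handlers: dict[MessageType, Handler] = {}
        # Scan attributes for the tag left by the decorator and register them.
        for name in dir(self):
            attr = getattr(self, name)
            message_type = getattr(attr, "_message_type", None)
            if message_type is not None:
                self._handlers[message_type] = attr

    @on_message(MessageType.STATUS)
    async def _handle_status(self, payload: str) -> None:
        self.received.append(f"status:{payload}")

    async def dispatch(self, message_type: MessageType, payload: str) -> None:
        # Messages without a registered handler are ignored in this toy.
        handler = self._handlers.get(message_type)
        if handler is not None:
            await handler(payload)


service = ToyService()
asyncio.run(service.dispatch(MessageType.STATUS, "ready"))
asyncio.run(service.dispatch(MessageType.HEARTBEAT, "ignored"))
```

In the real codebase the subscription goes through the message bus; see `docs/dev/patterns.md` for the authoritative examples.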
- Create plugin class implementing the appropriate base
- Add entry to `src/aiperf/plugin/plugins.yaml` with `class`, `description`, `metadata`
- Validate with `make validate-plugin-schemas`
- Use via `plugins.get_class(PluginType.X, 'name')`
See docs/dev/patterns.md § "Adding a New CLI Flag". CLIConfig is flat; never add a nested config class.
- `@pytest.mark.asyncio` for async tests, `@pytest.mark.parametrize` for data-driven
- `from tests.harness import mock_plugin` for plugin mocking
- Name: `test_<function>_<scenario>_<expected>` e.g. `test_parse_config_missing_field_raises_error`
- Imports at file top, fixtures for setup, one focus per test
- Use `from pytest import param` and put `# fmt: skip` on the `)` line:

      @pytest.mark.parametrize(
          "arg",
          [
              param(..., id="case1"),
              param(..., id="case2"),
          ],
      )  # fmt: skip

- Auto-fixtures (always active): asyncio.sleep runs instantly, RNG=42, singletons reset between tests
Feature branches use <username>/feature-name format, forked from main. One PR = one concern.
- SystemController uses `BaseService` (not `BaseComponentService`) - it's the orchestrator.
- Worker/TimingManager disable GC for latency - see `service_metadata.disable_gc`.
- macOS child processes close terminal FDs to prevent Textual UI corruption.
- Plugin priority resolves conflicts: higher wins, external beats built-in at equal priority.
- Decorators: `@on_init`, `@on_start`, `@on_stop`, `@on_message`, `@on_command`, `@background_task`, `@on_pull_message`, `@on_request`.
- Communication: `publish()` for broadcast, `@on_message` to subscribe, `send_command_and_wait_for_response()` for sync.
- `AIPerfLifecycleMixin` for standalone components: `CREATED` -> `INITIALIZING` -> `INITIALIZED` -> `STARTING` -> `RUNNING` -> `STOPPING` -> `STOPPED`; `FAILED` terminal.
- `dag_jsonl` input type: conversation DAG benchmarks (fork + spawn modes). See `docs/benchmark-modes/dag.md` for abstractions and authoring.
- Validator gate convention: unsupported constructs raise `NotImplementedError` with a leading `"<loc>: <reason>"` prefix where `<loc>` identifies the conversation/turn (e.g. `"conversation 'foo' turn 3: ..."`). New validators must follow this shape.
- Per-turn payload contract: `extra_body` / `max_tokens` / `model` are dispatch-turn only; `raw_tools` is the lone field that walks history (system-prompt-like). Dataset rows author `extra`, not `extra_body`. See `docs/dev/patterns.md` "Per-turn dataset `extra`".
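
The validator gate convention above might look like the following minimal sketch. The function name `validate_turn` and the `parallel_branches` key are made up for illustration and do not come from the AIPerf codebase; only the `"<loc>: <reason>"` message shape is taken from the convention.

```python
def validate_turn(conversation_id: str, turn_index: int, turn: dict[str, object]) -> None:
    """Raise NotImplementedError with a '<loc>: <reason>' prefix for unsupported input."""
    # <loc> identifies the conversation and turn, matching the documented shape.
    loc = f"conversation {conversation_id!r} turn {turn_index}"
    # 'parallel_branches' is a made-up unsupported key used only for this sketch.
    if "parallel_branches" in turn:
        raise NotImplementedError(f"{loc}: parallel_branches is not supported")


try:
    validate_turn("foo", 3, {"parallel_branches": 2})
except NotImplementedError as exc:
    error_text = str(exc)
else:
    error_text = ""
```

Keeping the location first lets a caller report which conversation and turn to fix without parsing the reason.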
AIPerfConfig.plot: PlotEnvelopeConfig | None lets a single AIPerf YAML own its
visualization. Two forms: bare-string path (plot: ./plots/baseline.yaml,
resolved relative to the AIPerf YAML's dir) or inline dict mirroring
src/aiperf/plot/default_plot_config.yaml. When set, ~/.aiperf/plot_config.yaml
is ignored and artifacts.auto_plot flips to True (unless explicitly False).
The auto-plot callback materializes the resolved envelope to
<artifact_dir>/.aiperf-plot-config.yaml so aiperf plot <dir> reproduces.
See src/aiperf/config/plot.py for the Pydantic models.
- Review diff: all lines required?
- `ruff format . && ruff check --fix .`
- `uv run pytest tests/unit/ -n auto`
- `uv run pytest tests/unit/property/ -n auto` (mechanical NaN/inf and field-validator invariants)
- Type hints on all functions
- `Field(description=...)` on all Pydantic fields
- `git commit -s`
AGENTS.md, CLAUDE.md, .github/copilot-instructions.md, and .cursor/rules/python.mdc must contain identical content (only headers/frontmatter differ). When updating one, update all four. Run make check-agent-files-sync after editing to confirm sync — pre-commit enforces this on every commit that touches one of these files.
DOCUMENTATION IS REQUIRED, NOT OPTIONAL. Any PR that adds or changes a feature, CLI option, env var, plugin, message type, or service without updating the relevant docs is incomplete and will not be merged.
When making changes, update the appropriate documentation files using the table below. When adding a new tutorial, also add it to README.md's tutorial index. Any new file under docs/ must also be added to docs/index.yml (the Fern site index) — tools/check_docs_index.py enforces this in CI. If the change is internal-only and not user-facing (e.g. developer reference, internal mechanics, debugging notes), put the doc under docs/reference/ rather than skipping documentation.
| Change type | Files to update |
|---|---|
| Architecture, components, data flow, communication | docs/architecture.md |
| Coding standards, build commands, new patterns | AGENTS.md + CLAUDE.md + .github/copilot-instructions.md + .cursor/rules/python.mdc |
| Code patterns, examples, base classes | docs/dev/patterns.md |
| CLI arguments or commands | docs/cli-options.md (auto-generated via make generate-cli-docs) |
| Environment variables | docs/environment-variables.md (auto-generated via make generate-env-vars-docs) |
| Metrics definitions or formulas | docs/metrics-reference.md |
| Plugin system, categories, creation | docs/plugins/plugin-system.md |
| Accuracy benchmarks, graders | docs/accuracy/ |
| Server metrics, schemas | docs/server-metrics/ |
| Benchmark modes, timing, traces | docs/benchmark-modes/ |
| Tokenizer, reference docs | docs/reference/ |
| Dataset synthesis API | docs/api/synthesis.md |
| Dev setup, make targets, pre-commit | CONTRIBUTING.md |
| Contribution process, DCO | CONTRIBUTING.md |
| New services, message types, plugin types | docs/architecture.md + docs/dev/patterns.md |
| Tutorials and feature guides | docs/tutorials/ + README.md tutorial index |
A feature is incomplete until documentation is updated.