AIPerf

Python 3.10+ async AI benchmarking tool for measuring LLM inference server performance. 10 services communicate via ZMQ message bus.

Reference documentation:

docs/architecture.md - Three-plane architecture, core components, credit system, data flow, communication patterns
docs/dev/patterns.md - Code examples for CLI commands, services, models, messages, plugins, error handling, logging, testing
docs/cli-options.md - Complete CLI command and option reference
docs/environment-variables.md - All AIPERF_* environment variables by subsystem
docs/metrics-reference.md - Metric definitions, formulas, and requirements
docs/plugins/plugin-system.md - Plugin architecture, categories, creation guide
CONTRIBUTING.md - Development setup, available commands, pre-commit hooks, DCO

Coding Standards

async/await for ALL I/O - no time.sleep, no blocking calls.
Field(description="...") on EVERY Pydantic field. Docstrings on dataclass fields.
Type hints on ALL functions (params and return).
KISS + DRY: minimal code, optimize for reader.
AIPerfBaseModel for data, BaseConfig for configuration. @dataclass(slots=True) for hot-path inner models created at high volume (e.g. SSE chunks, parsed responses) where Pydantic overhead matters. Use __pydantic_config__ = ConfigDict(extra="forbid") on dataclasses that participate in Pydantic union discrimination.
BaseComponentService for services, BaseService for SystemController only.
Message bus for inter-service communication - no shared mutable state.
CLI commands: one file per command in cli_commands/, lazily loaded via import strings in cli.py. See docs/dev/patterns.md.
YAML plugin registry for extensible features (src/aiperf/plugin/plugins.yaml).
Lambda for expensive logs: self.debug(lambda: f"{self._x()}"). Direct string for cheap ones.
Always orjson.loads(s), orjson.dumps(d) for JSON.
Reading filesystem paths: use aiperf.common.path_safety.safe_read_template_path instead of inline Path().read_text() / open().read(). Returns None on safety-check failure; caller picks the fallback semantic. See docs/dev/patterns.md § "Safe Filesystem Reads Pattern".
No Optional[X] or Union[X, Y] - use X | Y.
Comments only for "why?" not "what".
Enums are string-based - use MessageType.X directly, never .value.
Dependencies: always use uv (never pip) - uv add package, uv run pytest.
Use mermaid diagrams instead of ASCII art in markdown files.
Do not create markdown files to document code changes or decisions.
Do not over-comment code. Removing code is fine without adding comments to explain why.
No emojis in code or comments.
Hide a metric from the console table with console_group = MetricConsoleGroup.NONE; group it into a separate section with MetricConsoleGroup.{USAGE,CACHE,PREDICTION,AUDIO,REASONING}. Default is DEFAULT. See docs/metrics-reference.md "Metric Console Group Reference".

NaN/Inf Discipline

Numeric metric values crossing a serialization boundary or feeding a numerical algorithm must be finite or explicitly None. Use aiperf.common.finite (FiniteFloat, scrub_non_finite, nan_safe_mean/std, is_finite_value). Mechanical CI invariants in tests/unit/property/test_finite_invariants.py reject new violations and ratchet existing debt to zero via baseline files. See docs/dev/patterns.md § "NaN/Inf Discipline Pattern" and docs/dev/global-invariants.md for the full contract.

Build and Test Commands

make first-time-setup                                      # Initial environment setup
make install                                               # Install project + mock server
uv run pytest tests/unit/ -n auto                          # Unit tests (fast, isolated)
uv run pytest -m integration -n auto                       # Integration tests (real services, multiprocess)
uv run pytest -m component_integration -n auto             # Component integration tests (single process)
ruff format . && ruff check --fix .                        # Format and lint
make validate-plugin-schemas                               # Validate plugin registry
pre-commit run                                             # Pre-commit on staged files
pre-commit run --all-files                                 # Pre-commit on all files
make generate-all-docs                                     # Regenerate CLI + env var docs
make generate-all-plugin-files                             # Regenerate plugin enums, overloads, schemas

Pre-Commit Hooks

Run pre-commit after every code change, even before creating commits:

pre-commit run              # Staged files only
pre-commit run --all-files  # All files (recommended after significant changes)

Hooks: check-ast, debug-statements, detect-private-key, check-added-large-files, check-case-conflict, check-executables-have-shebangs, check-merge-conflict, check-json, check-toml, check-yaml, check-shebang-scripts-are-executable, end-of-file-fixer, mixed-line-ending, no-commit-to-branch, requirements-txt-fixer, trailing-whitespace, codespell, add-license, generate-cli-docs, generate-env-vars-docs, generate-plugin-artifacts, validate-plugin-schemas, test-imports, check-agent-files-sync, check-ergonomics, check-ruff-baselined, ruff, ruff-format.

Adding a New Service

Create class extending BaseComponentService with @on_message handlers
Register in src/aiperf/plugin/plugins.yaml under service category with class, description, metadata
Add message type to common/enums/enums.py if new messages needed
Create message class in messages/ with message_type field
Validate with aiperf plugins --validate

Adding a New Message

Add enum value to MessageType in common/enums/enums.py
Create message class in messages/ inheriting from Message with message_type field set
Add @on_message(MessageType.X) handler in the receiving service
Auto-subscription happens during @on_init phase

Adding a New Plugin

Create plugin class implementing the appropriate base
Add entry to src/aiperf/plugin/plugins.yaml with class, description, metadata
Validate with make validate-plugin-schemas
Use via plugins.get_class(PluginType.X, 'name')

Adding a New CLI Flag

See docs/dev/patterns.md § "Adding a New CLI Flag". CLIConfig is flat; never add a nested config class.

Testing Conventions

@pytest.mark.asyncio for async tests, @pytest.mark.parametrize for data-driven
from tests.harness import mock_plugin for plugin mocking
Name: test_<function>_<scenario>_<expected> e.g. test_parse_config_missing_field_raises_error
Imports at file top, fixtures for setup, one focus per test

Use from pytest import param and put # fmt: skip on the ) line:

@pytest.mark.parametrize(
    "arg",
    [
        param(..., id="case1"),
        param(..., id="case2"),
    ],
)  # fmt: skip

Auto-fixtures (always active): asyncio.sleep runs instantly, RNG=42, singletons reset between tests

Git Workflow

Feature branches use <username>/feature-name format, forked from main. One PR = one concern.

Tips

SystemController uses BaseService (not BaseComponentService) - it's the orchestrator.
Worker/TimingManager disable GC for latency - see service_metadata.disable_gc.
macOS child processes close terminal FDs to prevent Textual UI corruption.
Plugin priority resolves conflicts: higher wins, external beats built-in at equal priority.
Decorators: @on_init, @on_start, @on_stop, @on_message, @on_command, @background_task, @on_pull_message, @on_request.
Communication: publish() for broadcast, @on_message to subscribe, send_command_and_wait_for_response() for sync.
AIPerfLifecycleMixin for standalone components: CREATED -> INITIALIZING -> INITIALIZED -> STARTING -> RUNNING -> STOPPING -> STOPPED; FAILED terminal.
dag_jsonl input type: conversation DAG benchmarks (fork + spawn modes). See docs/benchmark-modes/dag.md for abstractions and authoring.
Validator gate convention: unsupported constructs raise NotImplementedError with a leading "<loc>: <reason>" prefix where <loc> identifies the conversation/turn (e.g. "conversation 'foo' turn 3: ..."). New validators must follow this shape.
Per-turn payload contract: extra_body / max_tokens / model are dispatch-turn only; raw_tools is the lone field that walks history (system-prompt-like). Dataset rows author extra, not extra_body. See docs/dev/patterns.md "Per-turn dataset extra".

Plot Envelope Section

AIPerfConfig.plot: PlotEnvelopeConfig | None lets a single AIPerf YAML own its visualization. Two forms: bare-string path (plot: ./plots/baseline.yaml, resolved relative to the AIPerf YAML's dir) or inline dict mirroring src/aiperf/plot/default_plot_config.yaml. When set, ~/.aiperf/plot_config.yaml is ignored and artifacts.auto_plot flips to True (unless explicitly False). The auto-plot callback materializes the resolved envelope to <artifact_dir>/.aiperf-plot-config.yaml so aiperf plot <dir> reproduces. See src/aiperf/config/plot.py for the Pydantic models.

Pre-Commit Checklist

Review diff: all lines required?
ruff format . && ruff check --fix .
uv run pytest tests/unit/ -n auto
uv run pytest tests/unit/property/ -n auto (mechanical NaN/inf and field-validator invariants)
Type hints on all functions
Field(description=...) on all Pydantic fields
git commit -s

Four-File Sync Rule

AGENTS.md, CLAUDE.md, .github/copilot-instructions.md, and .cursor/rules/python.mdc must contain identical content (only headers/frontmatter differ). When updating one, update all four. Run make check-agent-files-sync after editing to confirm sync — pre-commit enforces this on every commit that touches one of these files.

Documentation Updates

DOCUMENTATION IS REQUIRED, NOT OPTIONAL. Any PR that adds or changes a feature, CLI option, env var, plugin, message type, or service without updating the relevant docs is incomplete and will not be merged.

When making changes, update the appropriate documentation files using the table below. When adding a new tutorial, also add it to README.md's tutorial index. Any new file under docs/ must also be added to docs/index.yml (the Fern site index) — tools/check_docs_index.py enforces this in CI. If the change is internal-only and not user-facing (e.g. developer reference, internal mechanics, debugging notes), put the doc under docs/reference/ rather than skipping documentation.

Change type	Files to update
Architecture, components, data flow, communication	`docs/architecture.md`
Coding standards, build commands, new patterns	`AGENTS.md` + `CLAUDE.md` + `.github/copilot-instructions.md` + `.cursor/rules/python.mdc`
Code patterns, examples, base classes	`docs/dev/patterns.md`
CLI arguments or commands	`docs/cli-options.md` (auto-generated via `make generate-cli-docs`)
Environment variables	`docs/environment-variables.md` (auto-generated via `make generate-env-vars-docs`)
Metrics definitions or formulas	`docs/metrics-reference.md`
Plugin system, categories, creation	`docs/plugins/plugin-system.md`
Accuracy benchmarks, graders	`docs/accuracy/`
Server metrics, schemas	`docs/server-metrics/`
Benchmark modes, timing, traces	`docs/benchmark-modes/`
Tokenizer, reference docs	`docs/reference/`
Dataset synthesis API	`docs/api/synthesis.md`
Dev setup, make targets, pre-commit	`CONTRIBUTING.md`
Contribution process, DCO	`CONTRIBUTING.md`
New services, message types, plugin types	`docs/architecture.md` + `docs/dev/patterns.md`
Tutorials and feature guides	`docs/tutorials/` + `README.md` tutorial index

A feature is incomplete until documentation is updated.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

AIPerf

Coding Standards

NaN/Inf Discipline

Build and Test Commands

Pre-Commit Hooks

Adding a New Service

Adding a New Message

Adding a New Plugin

Adding a New CLI Flag

Testing Conventions

Git Workflow

Tips

Plot Envelope Section

Pre-Commit Checklist

Four-File Sync Rule

Documentation Updates

Uh oh!

FilesExpand file tree

AGENTS.md

Latest commit

History

AGENTS.md

File metadata and controls

AIPerf

Coding Standards

NaN/Inf Discipline

Build and Test Commands

Pre-Commit Hooks

Adding a New Service

Adding a New Message

Adding a New Plugin

Adding a New CLI Flag

Testing Conventions

Git Workflow

Tips

Plot Envelope Section

Pre-Commit Checklist

Four-File Sync Rule

Documentation Updates