Skip to content

Latest commit

 

History

History
184 lines (147 loc) · 12.2 KB

File metadata and controls

184 lines (147 loc) · 12.2 KB

AIPerf

Python 3.10+ async AI benchmarking tool for measuring LLM inference server performance. 10 services communicate via ZMQ message bus.

Reference documentation:

Coding Standards

  • async/await for ALL I/O - no time.sleep, no blocking calls.
  • Field(description="...") on EVERY Pydantic field. Docstrings on dataclass fields.
  • Type hints on ALL functions (params and return).
  • KISS + DRY: minimal code, optimize for reader.
  • AIPerfBaseModel for data, BaseConfig for configuration. @dataclass(slots=True) for hot-path inner models created at high volume (e.g. SSE chunks, parsed responses) where Pydantic overhead matters. Use __pydantic_config__ = ConfigDict(extra="forbid") on dataclasses that participate in Pydantic union discrimination.
  • BaseComponentService for services, BaseService for SystemController only.
  • Message bus for inter-service communication - no shared mutable state.
  • CLI commands: one file per command in cli_commands/, lazily loaded via import strings in cli.py. See docs/dev/patterns.md.
  • YAML plugin registry for extensible features (src/aiperf/plugin/plugins.yaml).
  • Lambda for expensive logs: self.debug(lambda: f"{self._x()}"). Direct string for cheap ones.
  • Always orjson.loads(s), orjson.dumps(d) for JSON.
  • Reading filesystem paths: use aiperf.common.path_safety.safe_read_template_path instead of inline Path().read_text() / open().read(). Returns None on safety-check failure; caller picks the fallback semantic. See docs/dev/patterns.md § "Safe Filesystem Reads Pattern".
  • No Optional[X] or Union[X, Y] - use X | Y.
  • Comments only for "why?" not "what".
  • Enums are string-based - use MessageType.X directly, never .value.
  • Dependencies: always use uv (never pip) - uv add package, uv run pytest.
  • Use mermaid diagrams instead of ASCII art in markdown files.
  • Do not create markdown files to document code changes or decisions.
  • Do not over-comment code. Removing code is fine without adding comments to explain why.
  • No emojis in code or comments.
  • Hide a metric from the console table with console_group = MetricConsoleGroup.NONE; group it into a separate section with MetricConsoleGroup.{USAGE,CACHE,PREDICTION,AUDIO,REASONING}. Default is DEFAULT. See docs/metrics-reference.md "Metric Console Group Reference".

NaN/Inf Discipline

Numeric metric values crossing a serialization boundary or feeding a numerical algorithm must be finite or explicitly None. Use aiperf.common.finite (FiniteFloat, scrub_non_finite, nan_safe_mean/std, is_finite_value). Mechanical CI invariants in tests/unit/property/test_finite_invariants.py reject new violations and ratchet existing debt to zero via baseline files. See docs/dev/patterns.md § "NaN/Inf Discipline Pattern" and docs/dev/global-invariants.md for the full contract.

Build and Test Commands

make first-time-setup                                      # Initial environment setup
make install                                               # Install project + mock server
uv run pytest tests/unit/ -n auto                          # Unit tests (fast, isolated)
uv run pytest -m integration -n auto                       # Integration tests (real services, multiprocess)
uv run pytest -m component_integration -n auto             # Component integration tests (single process)
ruff format . && ruff check --fix .                        # Format and lint
make validate-plugin-schemas                               # Validate plugin registry
pre-commit run                                             # Pre-commit on staged files
pre-commit run --all-files                                 # Pre-commit on all files
make generate-all-docs                                     # Regenerate CLI + env var docs
make generate-all-plugin-files                             # Regenerate plugin enums, overloads, schemas

Pre-Commit Hooks

Run pre-commit after every code change, even before creating commits:

pre-commit run              # Staged files only
pre-commit run --all-files  # All files (recommended after significant changes)

Hooks: check-ast, debug-statements, detect-private-key, check-added-large-files, check-case-conflict, check-executables-have-shebangs, check-merge-conflict, check-json, check-toml, check-yaml, check-shebang-scripts-are-executable, end-of-file-fixer, mixed-line-ending, no-commit-to-branch, requirements-txt-fixer, trailing-whitespace, codespell, add-license, generate-cli-docs, generate-env-vars-docs, generate-plugin-artifacts, validate-plugin-schemas, test-imports, check-agent-files-sync, check-ergonomics, check-ruff-baselined, ruff, ruff-format.

Adding a New Service

  1. Create class extending BaseComponentService with @on_message handlers
  2. Register in src/aiperf/plugin/plugins.yaml under service category with class, description, metadata
  3. Add message type to common/enums/enums.py if new messages needed
  4. Create message class in messages/ with message_type field
  5. Validate with aiperf plugins --validate

Adding a New Message

  1. Add enum value to MessageType in common/enums/enums.py
  2. Create message class in messages/ inheriting from Message with message_type field set
  3. Add @on_message(MessageType.X) handler in the receiving service
  4. Auto-subscription happens during @on_init phase

Adding a New Plugin

  1. Create plugin class implementing the appropriate base
  2. Add entry to src/aiperf/plugin/plugins.yaml with class, description, metadata
  3. Validate with make validate-plugin-schemas
  4. Use via plugins.get_class(PluginType.X, 'name')

Adding a New CLI Flag

See docs/dev/patterns.md § "Adding a New CLI Flag". CLIConfig is flat; never add a nested config class.

Testing Conventions

  • @pytest.mark.asyncio for async tests, @pytest.mark.parametrize for data-driven
  • from tests.harness import mock_plugin for plugin mocking
  • Name: test_<function>_<scenario>_<expected> e.g. test_parse_config_missing_field_raises_error
  • Imports at file top, fixtures for setup, one focus per test
  • Use from pytest import param and put # fmt: skip on the ) line:
    @pytest.mark.parametrize(
        "arg",
        [
            param(..., id="case1"),
            param(..., id="case2"),
        ],
    )  # fmt: skip
  • Auto-fixtures (always active): asyncio.sleep runs instantly, RNG=42, singletons reset between tests

Git Workflow

Feature branches use <username>/feature-name format, forked from main. One PR = one concern.

Tips

  • SystemController uses BaseService (not BaseComponentService) - it's the orchestrator.
  • Worker/TimingManager disable GC for latency - see service_metadata.disable_gc.
  • macOS child processes close terminal FDs to prevent Textual UI corruption.
  • Plugin priority resolves conflicts: higher wins, external beats built-in at equal priority.
  • Decorators: @on_init, @on_start, @on_stop, @on_message, @on_command, @background_task, @on_pull_message, @on_request.
  • Communication: publish() for broadcast, @on_message to subscribe, send_command_and_wait_for_response() for sync.
  • AIPerfLifecycleMixin for standalone components: CREATED -> INITIALIZING -> INITIALIZED -> STARTING -> RUNNING -> STOPPING -> STOPPED; FAILED terminal.
  • dag_jsonl input type: conversation DAG benchmarks (fork + spawn modes). See docs/benchmark-modes/dag.md for abstractions and authoring.
  • Validator gate convention: unsupported constructs raise NotImplementedError with a leading "<loc>: <reason>" prefix where <loc> identifies the conversation/turn (e.g. "conversation 'foo' turn 3: ..."). New validators must follow this shape.
  • Per-turn payload contract: extra_body / max_tokens / model are dispatch-turn only; raw_tools is the lone field that walks history (system-prompt-like). Dataset rows author extra, not extra_body. See docs/dev/patterns.md "Per-turn dataset extra".

Plot Envelope Section

AIPerfConfig.plot: PlotEnvelopeConfig | None lets a single AIPerf YAML own its visualization. Two forms: bare-string path (plot: ./plots/baseline.yaml, resolved relative to the AIPerf YAML's dir) or inline dict mirroring src/aiperf/plot/default_plot_config.yaml. When set, ~/.aiperf/plot_config.yaml is ignored and artifacts.auto_plot flips to True (unless explicitly False). The auto-plot callback materializes the resolved envelope to <artifact_dir>/.aiperf-plot-config.yaml so aiperf plot <dir> reproduces. See src/aiperf/config/plot.py for the Pydantic models.

Pre-Commit Checklist

  1. Review diff: all lines required?
  2. ruff format . && ruff check --fix .
  3. uv run pytest tests/unit/ -n auto
  4. uv run pytest tests/unit/property/ -n auto (mechanical NaN/inf and field-validator invariants)
  5. Type hints on all functions
  6. Field(description=...) on all Pydantic fields
  7. git commit -s

Four-File Sync Rule

AGENTS.md, CLAUDE.md, .github/copilot-instructions.md, and .cursor/rules/python.mdc must contain identical content (only headers/frontmatter differ). When updating one, update all four. Run make check-agent-files-sync after editing to confirm sync — pre-commit enforces this on every commit that touches one of these files.

Documentation Updates

DOCUMENTATION IS REQUIRED, NOT OPTIONAL. Any PR that adds or changes a feature, CLI option, env var, plugin, message type, or service without updating the relevant docs is incomplete and will not be merged.

When making changes, update the appropriate documentation files using the table below. When adding a new tutorial, also add it to README.md's tutorial index. Any new file under docs/ must also be added to docs/index.yml (the Fern site index) — tools/check_docs_index.py enforces this in CI. If the change is internal-only and not user-facing (e.g. developer reference, internal mechanics, debugging notes), put the doc under docs/reference/ rather than skipping documentation.

Change type Files to update
Architecture, components, data flow, communication docs/architecture.md
Coding standards, build commands, new patterns AGENTS.md + CLAUDE.md + .github/copilot-instructions.md + .cursor/rules/python.mdc
Code patterns, examples, base classes docs/dev/patterns.md
CLI arguments or commands docs/cli-options.md (auto-generated via make generate-cli-docs)
Environment variables docs/environment-variables.md (auto-generated via make generate-env-vars-docs)
Metrics definitions or formulas docs/metrics-reference.md
Plugin system, categories, creation docs/plugins/plugin-system.md
Accuracy benchmarks, graders docs/accuracy/
Server metrics, schemas docs/server-metrics/
Benchmark modes, timing, traces docs/benchmark-modes/
Tokenizer, reference docs docs/reference/
Dataset synthesis API docs/api/synthesis.md
Dev setup, make targets, pre-commit CONTRIBUTING.md
Contribution process, DCO CONTRIBUTING.md
New services, message types, plugin types docs/architecture.md + docs/dev/patterns.md
Tutorials and feature guides docs/tutorials/ + README.md tutorial index

A feature is incomplete until documentation is updated.