refactor(agent): Pin container memory to NUMA-local node via CpusetMems #11222
Conversation
Set `CpusetMems` in Docker HostConfig when all allocated CPU cores reside on a single NUMA node, so container memory is pinned to the same node as its CPUs. The key is omitted when the allocation spans multiple nodes or when NUMA info is unavailable on the host.

Closes #11217

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
Adds NUMA-aware memory pinning for CPU-only Docker containers by setting HostConfig.CpusetMems when the allocated CPU cores are all on the same NUMA node, and omitting it otherwise.
Changes:
- Build `HostConfig` in `CPUPlugin.generate_docker_args()` and conditionally set `CpusetMems` when the allocation is NUMA-local.
- Add a changelog entry for the enhancement.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/ai/backend/agent/docker/intrinsic.py` | Detects allocated cores' NUMA node(s) and sets `HostConfig.CpusetMems` only for single-node allocations. |
| `changes/11217.enhance.md` | Documents the NUMA-local `CpusetMems` behavior change. |
```python
devices = await self.list_devices()
core_to_node = {int(dev.device_id): dev.numa_node for dev in devices}
allocated_nodes: set[int] = set()
for core in cores:
    node = core_to_node.get(core)
```
There was a problem hiding this comment.
generate_docker_args() now calls await self.list_devices() just to map allocated core IDs to NUMA nodes. list_devices() iterates all available cores and calls libnuma.node_of_cpu() for each, so this adds per-container startup overhead that scales with host CPU count. Consider computing the NUMA node only for the allocated cores (e.g., call libnuma.node_of_cpu(core) inside the loop), or cache a core_id -> numa_node mapping on the plugin instance and reuse it.
Suggested change:

```diff
-devices = await self.list_devices()
-core_to_node = {int(dev.device_id): dev.numa_node for dev in devices}
 allocated_nodes: set[int] = set()
 for core in cores:
-    node = core_to_node.get(core)
+    node = libnuma.node_of_cpu(core)
```
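The cached-mapping alternative mentioned in the comment above could look roughly like this. The class and method names, the 8-core/4-per-node topology, and the `libnuma` stand-in are all illustrative assumptions, not code from the PR:

```python
import functools

class _FakeLibnuma:
    """Stand-in for the agent's libnuma wrapper (assumption: 4 cores per node)."""
    @staticmethod
    def node_of_cpu(core: int) -> int:
        return core // 4

libnuma = _FakeLibnuma()

class CPUPluginSketch:
    @functools.cached_property
    def _core_to_node(self) -> dict[int, int]:
        # Built lazily once per plugin instance and reused across container
        # starts, instead of re-scanning every host core on each start.
        return {core: libnuma.node_of_cpu(core) for core in range(8)}

    def numa_nodes_for(self, cores: set[int]) -> set[int]:
        # O(allocated_cores) lookup against the cached mapping.
        return {self._core_to_node[c] for c in cores}
```

Either approach (per-core `node_of_cpu` calls or an instance-level cache) drops the per-start cost from O(host_cores) to O(allocated_cores).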
```python
# Pin memory to the NUMA node only when the allocation is fully node-local;
# Docker does not expose a multi-node cpuset.mems equivalent via HostConfig.
```
The comment says Docker does not expose a multi-node cpuset.mems equivalent via HostConfig, but HostConfig.CpusetMems accepts the same cpuset list/range syntax as cpuset.mems (e.g., "0,1"). If the intent is to only pin when node-local, reword this to reflect that it’s a deliberate policy choice (and why), not a Docker API limitation.
Suggested change:

```diff
-# Pin memory to the NUMA node only when the allocation is fully node-local;
-# Docker does not expose a multi-node cpuset.mems equivalent via HostConfig.
+# Pin memory only when the CPU allocation is fully node-local.
+# For multi-node CPU allocations, intentionally leave CpusetMems unset
+# so Docker/kernel default NUMA memory placement policy can apply.
```
```diff
+host_config: dict[str, Any] = {
+    "Cpus": len(cores),
+    "CpusetCpus": ",".join(sorted_core_ids),
+}
+devices = await self.list_devices()
+core_to_node = {int(dev.device_id): dev.numa_node for dev in devices}
+allocated_nodes: set[int] = set()
+for core in cores:
+    node = core_to_node.get(core)
+    if node is None or node < 0:
+        allocated_nodes.clear()
+        break
+    allocated_nodes.add(node)
+# Pin memory to the NUMA node only when the allocation is fully node-local;
+# Docker does not expose a multi-node cpuset.mems equivalent via HostConfig.
+if len(allocated_nodes) == 1:
+    host_config["CpusetMems"] = str(next(iter(allocated_nodes)))
 return {
-    "HostConfig": {
-        "Cpus": len(cores),
-        "CpusetCpus": ",".join(sorted_core_ids),
-        # 'CpusetMems': f'{resource_spec.numa_node}',
-    },
+    "HostConfig": host_config,
 }
```
This introduces new branching behavior for Docker HostConfig generation (set/omit CpusetMems based on NUMA locality), but there are no unit tests covering it. Add tests that mock CPUPlugin.list_devices() to verify: (1) single-node allocation sets HostConfig.CpusetMems to the node ID, (2) multi-node allocation omits it, and (3) unknown/negative node IDs omit it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds unit coverage for `generate_docker_args()` setting `HostConfig.CpusetMems` only when the CPU allocation is fully within a single NUMA node, and omitting the key for multi-node or unknown/negative NUMA mappings.

Refs #11217 Refs #11222
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Query `libnuma.node_of_cpu()` only for allocated cores instead of calling `self.list_devices()`, which iterates every core on the host on each container start. Drops per-start overhead from O(host_cores) to O(allocated_cores).
- Reword the `CpusetMems` comment to reflect that omission for multi-node allocations is a deliberate policy choice (defer to kernel default NUMA placement) rather than a Docker HostConfig API limitation.
- Update the NUMA-locality unit tests to patch `libnuma.node_of_cpu` directly, matching the new implementation seam.

Refs #11217 Refs #11222
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mbstone

Addresses review feedback on PR #11222:
- Skip `CpusetMems` entirely when libnuma reports NUMA unsupported or the host has a single node, so non-NUMA Linux hosts and macOS/WSL dev environments do not get `CpusetMems="0"` pinned on every container.
- Add a unit test covering the non-NUMA short-circuit branch.
- Remove the twin commented-out `CpusetMems` tombstone from `src/ai/backend/agent/dummy/intrinsic.py`.

Refs #11217 Refs #11222
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review summary from an independent pass + resolutions pushed to this branch. Verdict: ship with changes — addressed.

Scope stayed tight.
- Remove unused `local_config` + `_docker` mock from the NUMA-locality `cpu_plugin` fixture: `CPUPlugin.generate_docker_args` doesn't read them.
- Replace the `case_id` parametrize column with pytest's `ids=` kwarg; pytest already labels failures via the parametrize id.

Refs #11217 Refs #11222
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
```python
# Skip CpusetMems entirely when NUMA is unsupported (non-Linux hosts,
# Linux without libnuma.so, Docker Desktop, WSL, etc.) or when the host
# exposes a single node; libnuma.node_of_cpu would otherwise fall back
# to 0 and cause every container to be pinned to "CpusetMems": "0".
if libnuma.num_nodes() > 1:
    allocated_nodes: set[int] = set()
    for core in cores:
        node = libnuma.node_of_cpu(core)
        if node < 0:
            allocated_nodes.clear()
            break
        allocated_nodes.add(node)
    # Pin memory only when the CPU allocation is fully node-local.
    # For multi-node CPU allocations, intentionally leave CpusetMems unset
    # so Docker/kernel default NUMA memory placement policy can apply.
    if len(allocated_nodes) == 1:
        host_config["CpusetMems"] = str(next(iter(allocated_nodes)))
```
How about extracting this logic into a separate private method?
```python
@staticmethod
@contextmanager
def _patch_node_of_cpu(
```
Since this method is intended to reproduce specific test scenarios, it would be more readable to inject fixtures for each scenario and apply parametrization.
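A parametrized layout along those lines might look like this. It tests a hypothetical `_cpuset_mems()` stand-in rather than the real plugin, purely to show the `parametrize`/`ids=` shape:

```python
import pytest
from typing import Optional

def _cpuset_mems(core_nodes: list[int]) -> Optional[str]:
    # Hypothetical stand-in for the behavior under test: per-core NUMA
    # node IDs in, CpusetMems value (or None for "omit the key") out.
    nodes = set(core_nodes)
    if any(n < 0 for n in nodes) or len(nodes) != 1:
        return None
    return str(nodes.pop())

@pytest.mark.parametrize(
    "core_nodes, expected",
    [
        ([0, 0], "0"),
        ([0, 1], None),
        ([0, -1], None),
    ],
    ids=["single-node", "multi-node", "unknown-node"],
)
def test_cpuset_mems(core_nodes, expected):
    assert _cpuset_mems(core_nodes) == expected
```

Each scenario then reads as one row, and pytest's failure output carries the `ids=` label instead of a hand-rolled `case_id` column.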
Closes #11217
Refs #11216
Summary
- Set `HostConfig.CpusetMems` to the NUMA node ID when all allocated CPU cores are on the same node.
- Omit `CpusetMems` for multi-node allocations so the kernel's default NUMA memory placement policy applies.
- Skip pinning entirely when `libnuma.num_nodes() <= 1` so non-NUMA Linux hosts and macOS / WSL / Docker Desktop don't get `CpusetMems="0"` pinned on every container.
- Query `libnuma.node_of_cpu()` only for allocated cores instead of a host-wide `list_devices()` scan.

Why
Before this PR, the NUMA memory pinning code was commented out, so CPU-only workloads on NUMA hosts could pay 2-3× remote-memory access latency. Part of the agent-runtime performance epic #11216.
Test plan
- `pants check src/ai/backend/agent/docker::` and `pants test tests/unit/agent::` pass
- Single-node allocation: `HostConfig.CpusetMems` reflects the allocation node
- Multi-node or non-NUMA allocation: `CpusetMems` is absent