Update DPC log parser to consume the centralized "HTTP call done" log line #1547

Description

@rubambiza

Component: Harnesses

Desired use case or feature

The FMA timing harness parses Dual-Pods Controller (DPC) logs to report per-request actuation latency (hot / warm / cold-launcher). This parser shipped to main in #1504. It derives those timings by reading an httpCallStartTime field off three specific DPC milestone log messages: the wake call, the instance-create call, and the readiness relay.

Upstream PR llm-d-incubation/llm-d-fast-model-actuation#586 ("Centralize HTTP call logging and metrics") changes how the DPC logs HTTP calls: instead of stamping a start time onto each individual milestone message, it emits a single centralized "HTTP call done" log line for every HTTP call, carrying the start time, a purpose discriminator, latency, and status. The per-message start-time fields the parser relies on are removed.

As a benchmark maintainer, I want the DPC log parser updated to consume this centralized log line, because once #586 (or its successor) lands, the parser on main will stop producing DPC-derived timings — silently falling back to the coarser Kubernetes-timestamp upper bounds, with the sub-second wake fidelity quietly going blank. This is a compatibility break against a feature that is already shipped, not a hypothetical one.

Proposed solution

Update the parser to treat the centralized "HTTP call done" line as the source of HTTP-call timing, correlating each line to a request via the requester identity that the DPC already attaches to its log context, and distinguishing the actuation paths via the line's purpose value:

  • the wake call provides the hot-start anchor,
  • the instance-create call provides the warm-start anchor,
  • the readiness-relay call provides the shared end time used by all paths.

The cold-launcher anchor is unaffected (it comes from a Kubernetes API call, not an HTTP call, and is not part of the centralization), so that path should continue to work unchanged.
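
A minimal sketch of the correlation described above, not the harness's actual code. It assumes the DPC emits JSON-structured log lines and that the centralized line exposes keys named `msg`, `requester`, `purpose`, and `startTime`. Those key names and the three purpose strings are placeholders chosen for illustration and would need to be checked against the DPC once #586 lands. It also assumes, as with the current per-message fields, that the start time of the readiness-relay call serves as the shared end time.

```python
import json
from datetime import datetime

# Hypothetical purpose strings; the real discriminator values must come
# from the DPC source once the centralized logging lands.
PURPOSE_WAKE = "wake"
PURPOSE_CREATE = "create-instance"
PURPOSE_READY = "relay-readiness"


def parse_http_call_done(lines):
    """Map requester -> {purpose: start time} from 'HTTP call done' lines.

    Key names (msg, requester, purpose, startTime) are assumptions.
    """
    calls = {}
    for raw in lines:
        try:
            rec = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not a JSON log line
        if not isinstance(rec, dict) or rec.get("msg") != "HTTP call done":
            continue
        requester = rec.get("requester")
        purpose = rec.get("purpose")
        start = rec.get("startTime")
        if not (requester and purpose and start):
            continue  # cannot correlate or time this line
        # fromisoformat accepts a trailing 'Z' only on Python 3.11+.
        calls.setdefault(requester, {})[purpose] = datetime.fromisoformat(start)
    return calls


def actuation_latencies(calls_for_requester):
    """Return {path: seconds} for each anchor that is present.

    The readiness-relay start time is the shared end; wake anchors the
    hot path and instance-create anchors the warm path.
    """
    end = calls_for_requester.get(PURPOSE_READY)
    if end is None:
        return {}
    anchors = (("hot", PURPOSE_WAKE), ("warm", PURPOSE_CREATE))
    return {
        path: (end - calls_for_requester[purpose]).total_seconds()
        for path, purpose in anchors
        if purpose in calls_for_requester
    }
```

Returning every path whose anchor is present, rather than picking one, leaves the precedence decision to the caller, since this issue does not say how a request that logs both a wake and an instance-create call should be classified.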

Alongside the parser change: refresh the module's documented assumptions (the relevant log verbosity and the fact that timing now comes from the centralized line rather than per-message fields), and add test coverage that locks in the new contract — including that the requester identity is present on the centralized line, since that correlation key is the linchpin and is currently provided implicitly by the logging context.
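
One way the requester-identity contract could be locked in is a test that fails whenever a centralized line in a captured log fixture lacks the correlation key. The sketch below is illustrative only: the helper name, the JSON log format, the `msg` and `requester` key names, and the fixture contents are all assumptions, not the DPC's confirmed output.

```python
import json


def http_call_lines_missing_requester(lines, requester_key="requester"):
    """Return the raw 'HTTP call done' lines that carry no requester identity.

    Assumes JSON log lines; 'msg' and the requester key name are hypothetical.
    """
    missing = []
    for raw in lines:
        try:
            rec = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore non-JSON lines
        if not isinstance(rec, dict) or rec.get("msg") != "HTTP call done":
            continue
        if not rec.get(requester_key):
            missing.append(raw)
    return missing


def test_requester_present_on_centralized_line():
    # In a real test this fixture would be a captured DPC log excerpt.
    fixture = [
        json.dumps({"msg": "HTTP call done", "requester": "ns/req-a",
                    "purpose": "wake"}),
        json.dumps({"msg": "unrelated controller message"}),
    ]
    assert http_call_lines_missing_requester(fixture) == []
```

Because the requester identity comes from the logging context rather than from the call site, a check of this kind would surface an upstream refactor that drops the key before it silently breaks correlation.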

Alternatives

  1. Ask upstream to preserve the per-message start-time fields (raised via review comments on PR #586). Rejected as a solution to this issue: even if accepted, it only buys time. Centralizing HTTP-call logging is the direction the DPC is moving, so the parser will eventually have to consume the centralized line regardless.
  2. Keep relying on Kubernetes-timestamp upper bounds only (drop DPC-derived timing). Rejected: the whole value of the DPC parser is that it removes the multi-second inflation introduced by the kubelet readiness-probe interval; falling back loses sub-second wake fidelity.
  3. Pin the benchmark to a pre-#586 DPC build. Rejected: not sustainable; it blocks picking up other DPC improvements and only defers the work.

Additional context

(AI-assisted issue draft.)
