feat(datalayer): add metricsPort parameter to metrics-data-source plugin #2762

Open
janghyukjin wants to merge 1 commit into kubernetes-sigs:main from janghyukjin:feat/metrics-datasource-port

Conversation


@janghyukjin janghyukjin commented Apr 2, 2026

What type of PR is this?
/kind feature

What this PR does / why we need it:

Adds metricsPort to the metrics-data-source plugin parameters so that EPP can scrape metrics from a port different from the inference port.

The metrics-data-source plugin currently derives the scrape target from MetricsHost, which encodes the inference port. This breaks in deployments where the model server exposes metrics on a dedicated port (separate from inference), or where network/mesh policy makes the inference port unsuitable for direct scraping.

--model-server-metrics-port previously covered this use case but was removed in #2441 without an equivalent EndpointPickerConfig parameter, leaving a gap. This PR fills that gap.

When metricsPort is 0 (default), behavior is unchanged — the port from MetricsHost is used as-is.
Invalid metricsPort values (outside 1–65535) are rejected at construction time.
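The override and range check described above can be sketched as follows. This is a minimal illustration, not the actual plugin code: the function name `metricsTarget` and the assumption that the scrape target is a `host:port` string are mine.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// metricsTarget returns the scrape address for an endpoint. When
// metricsPort is non-zero it replaces the port encoded in metricsHost;
// zero preserves the existing behavior. Illustrative sketch only.
func metricsTarget(metricsHost string, metricsPort int) (string, error) {
	if metricsPort == 0 {
		// Default: use the port from MetricsHost as-is.
		return metricsHost, nil
	}
	if metricsPort < 1 || metricsPort > 65535 {
		// Rejected at construction time in the PR's description.
		return "", fmt.Errorf("metricsPort %d outside valid range 1-65535", metricsPort)
	}
	host, _, err := net.SplitHostPort(metricsHost)
	if err != nil {
		return "", err
	}
	return net.JoinHostPort(host, strconv.Itoa(metricsPort)), nil
}

func main() {
	addr, _ := metricsTarget("10.0.0.5:8000", 9090)
	fmt.Println(addr) // 10.0.0.5:9090

	addr, _ = metricsTarget("10.0.0.5:8000", 0)
	fmt.Println(addr) // 10.0.0.5:8000

	_, err := metricsTarget("10.0.0.5:8000", 70000)
	fmt.Println(err != nil) // true
}
```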

Motivating example: In an Istio mTLS STRICT environment, EPP's direct pod-IP scraping hits PassthroughCluster (no mTLS upgrade → plain HTTP), and the receiving sidecar rejects it with 503. A model server serving metrics on a separate port excluded from sidecar interception resolves this without weakening mTLS on the inference port.

Which issue(s) this PR fixes:

Related to #1396

Does this PR introduce a user-facing change?:

  Add `metricsPort` parameter to the `metrics-data-source` plugin. When set, EPP scrapes metrics from the specified port instead of the inference port encoded in `MetricsHost`. Defaults to 0 (existing behavior unchanged). 
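Usage might look like the excerpt below. This is a sketch based on the existing plugin-configuration shape as I understand it; the surrounding field names and the apiVersion may differ from the actual config, and port 9090 is illustrative.

```yaml
# Hypothetical EndpointPickerConfig excerpt showing the new parameter.
apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: EndpointPickerConfig
plugins:
  - type: metrics-data-source
    parameters:
      metricsPort: 9090   # scrape this port instead of the inference port
```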


netlify bot commented Apr 2, 2026

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: ba507e6
🔍 Latest deploy log: https://app.netlify.com/projects/gateway-api-inference-extension/deploys/69ce4f5f6263db00079382e9
😎 Deploy Preview: https://deploy-preview-2762--gateway-api-inference-extension.netlify.app


linux-foundation-easycla bot commented Apr 2, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: janghyukjin / name: hyukjin.jang (ba507e6)

@k8s-ci-robot k8s-ci-robot requested a review from liu-cong April 2, 2026 09:18
@k8s-ci-robot
Contributor

Welcome @janghyukjin!

It looks like this is your first PR to kubernetes-sigs/gateway-api-inference-extension 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/gateway-api-inference-extension has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: janghyukjin
Once this PR has been reviewed and has the lgtm label, please assign ahg-g for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 2, 2026
@k8s-ci-robot
Contributor

Hi @janghyukjin. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 2, 2026
@janghyukjin janghyukjin force-pushed the feat/metrics-datasource-port branch 3 times, most recently from a88f1c3 to c99f3f8, on April 2, 2026 at 10:57
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Apr 2, 2026
Adds MetricsPort to metricsDatasourceParams so the metrics-data-source
plugin can scrape a port different from the inference port. When metricsPort
is non-zero in EndpointPickerConfig plugin parameters, it overrides the port
derived from the endpoint's MetricsHost; zero (default) preserves existing
behavior.

This restores functionality that was available via --model-server-metrics-port
but lost when that flag was removed in kubernetes-sigs#2441 without an equivalent plugin
config parameter.
@janghyukjin janghyukjin force-pushed the feat/metrics-datasource-port branch from c99f3f8 to ba507e6 on April 2, 2026 at 11:13
@elevran
Contributor

elevran commented Apr 2, 2026

@janghyukjin thanks for the PR. The metrics port was intentionally dropped from CLI and configuration. See #1886.
Is there a specific use case you're trying to address where inference and metrics can not be run on the same port?

@janghyukjin
Author

@elevran Thanks for the context! There are practical scenarios where separating inference and metrics ports is necessary.

A common case is model servers that expose metrics on a dedicated port, where metrics are intentionally decoupled from inference.

In Istio mTLS STRICT mode, EPP's direct pod-IP scraping bypasses Service-level configuration and results in passthrough traffic without mTLS upgrade, leading to failed scrapes. Using a separate metrics port allows operators to control interception or policy independently without affecting inference.

This PR does not change default behavior and only restores the ability to configure a separate metrics port, which was previously supported via --model-server-metrics-port but is currently not expressible via EndpointPickerConfig.

Keeping the configuration scoped to the metrics-data-source plugin rather than InferencePool.spec, as this is an EPP scraping concern rather than a pool-level attribute. Open to feedback if the preferred direction differs.
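For the Istio case mentioned above, one way to exclude a dedicated metrics port from sidecar interception is Istio's `traffic.sidecar.istio.io/excludeInboundPorts` pod annotation. The port number is illustrative; this shows the operator-side half of the setup, not anything this PR configures.

```yaml
# Model-server pod template: keep inference under mTLS STRICT, but let
# EPP's plain-HTTP pod-IP scrapes reach the metrics port directly by
# excluding it from sidecar inbound interception.
template:
  metadata:
    annotations:
      traffic.sidecar.istio.io/excludeInboundPorts: "9090"
```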

@elevran
Contributor

elevran commented Apr 2, 2026

The main concern I have is how this would work for multi-port inference pool. I presume the use case and reasoning would include those as well. How should that be handled?

@janghyukjin
Author

@elevran Thanks for raising this. My assumption in this PR is that the metrics endpoint is pod-level rather than serving-port-level.

Even with a multi-port InferencePool, the common case is that a backend pod exposes a single /metrics endpoint for the whole process, regardless of how many serving ports it has. In that case, a single plugin-level metricsPort remains sufficient, because it overrides the scrape port for the pod's metrics endpoint rather than mapping to individual serving ports.

If metricsPort is not set, the existing behavior remains unchanged and the port derived from MetricsHost is used as before.

If there are use cases where metrics need to be differentiated per serving port, that would likely be a separate discussion at a different scope, potentially in InferencePool.spec. I was aiming to keep this PR scoped to the EPP scraping configuration, where a single process-level metrics port is the common case.

Thanks!

@elevran
Contributor

elevran commented Apr 2, 2026

> Even with a multi-port InferencePool, the common case is that a backend pod exposes a single /metrics endpoint for the whole process, regardless of how many serving ports it has. In that case, a single plugin-level metricsPort remains sufficient, because it overrides the scrape port for the pod's metrics endpoint rather than mapping to individual serving ports.

I don't think this is true of vLLM with data parallel > 1.
Note that the existing code explicitly schedules to a specific serving port based on its metrics. There is no code that takes a single /metrics output and can split it by serving port for subsequent scheduling decisions.
EPP creates a different "pod" (or more correctly an Endpoint) for each port defined in the inference pool. See datastore.podUpdateOrAddIfNotExist

I understand (and agree) with the use cases. But I'd request that we address this across the board and not as a special case.
I would argue that the InferencePool should be extended to support additional port groups

  • serving over HTTP
  • serving over gRPC, if different from the above (AFAIK, the model servers do not currently support the OpenAI HTTP API and gRPC on the same port; for example, vLLM has vllm.entrypoints.openai.api_server, typically port 8000, and vllm.entrypoints.grpc_server, default port 50051)
  • metrics (if different than serving ports)

@ahg-g @kfswain @nirrozenbaum @danehans
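The three port groups proposed above could hypothetically be expressed along these lines. This is purely a design sketch of the idea in this comment; none of these fields exist in the InferencePool API today, and the names are invented for illustration.

```yaml
# Sketch of an extended InferencePool with per-role port groups.
spec:
  targetPorts:
    - number: 8000
      protocol: HTTP      # OpenAI-compatible serving
    - number: 50051
      protocol: GRPC      # gRPC serving, if different from HTTP
  metricsPort: 9090       # scraped instead of the serving ports, if set
```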

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 3, 2026
@k8s-ci-robot
Contributor

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@janghyukjin
Author

@elevran Thanks for the detailed explanation — that makes sense. :)

You're right that EPP already treats each serving port as a distinct endpoint and makes scheduling decisions based on per-port metrics. In that context, a single metrics port at the pod level is not always sufficient, especially for vLLM with data parallelism.

The use case I had in mind (Istio mTLS STRICT where EPP's pod-IP scraping bypasses mTLS) still exists, but I agree it's better addressed as part of the InferencePool port group design rather than a plugin-level workaround.

Happy to contribute toward that direction if useful — whether that's drafting a design proposal or working on an incremental PR.

Thanks again!

@elevran
Contributor

elevran commented Apr 6, 2026

@janghyukjin perhaps resurrect #1396? There was no interest at the time hence closed due to lifecycle. Maybe your use case will resurrect the issue/discussion?

@janghyukjin
Author

@elevran That makes sense, thanks for the suggestion!

Happy to revive #1396 and continue the discussion there. I can summarize the use case (separate metrics port, e.g. for environments like Istio mTLS STRICT) and how it relates to the current gaps in InferencePool port role specification.

I'll go ahead and add context to #1396 to restart the discussion, unless you'd prefer a different approach.


Labels

  • cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
  • needs-ok-to-test - Indicates a PR that requires an org member to verify it is safe to test.
  • needs-rebase - Indicates a PR cannot be merged because it has merge conflicts with HEAD.
  • size/L - Denotes a PR that changes 100-499 lines, ignoring generated files.
