feat(datalayer): add metricsPort parameter to metrics-data-source plugin #2762

Open
janghyukjin wants to merge 1 commit into kubernetes-sigs:main from janghyukjin:feat/metrics-datasource-port

Conversation


@janghyukjin janghyukjin commented Apr 2, 2026

What type of PR is this?
/kind feature

What this PR does / why we need it:

Adds metricsPort to the metrics-data-source plugin parameters so that EPP can scrape metrics from a port different from the inference port.

The metrics-data-source plugin currently derives the scrape target from MetricsHost, which encodes the inference port. This breaks in deployments where the model server exposes metrics on a dedicated port (separate from inference), or where network/mesh policy makes the inference port unsuitable for direct scraping.

--model-server-metrics-port previously covered this use case but was removed in #2441 without an equivalent EndpointPickerConfig parameter, leaving a gap. This PR fills that gap.

When metricsPort is 0 (default), behavior is unchanged — the port from MetricsHost is used as-is.
Invalid metricsPort values (outside 1–65535) are rejected at construction time.
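The override and range check described above can be sketched as follows. This is a minimal illustration, not the actual plugin code: the function name `metricsTarget` and the assumption that the scrape target is a `host:port` string are mine.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// metricsTarget returns the scrape address for an endpoint. When
// metricsPort is non-zero it replaces the port encoded in metricsHost;
// zero preserves the existing behavior. Illustrative sketch only.
func metricsTarget(metricsHost string, metricsPort int) (string, error) {
	if metricsPort == 0 {
		// Default: use the port from MetricsHost as-is.
		return metricsHost, nil
	}
	if metricsPort < 1 || metricsPort > 65535 {
		// Rejected at construction time in the PR's description.
		return "", fmt.Errorf("metricsPort %d outside valid range 1-65535", metricsPort)
	}
	host, _, err := net.SplitHostPort(metricsHost)
	if err != nil {
		return "", err
	}
	return net.JoinHostPort(host, strconv.Itoa(metricsPort)), nil
}

func main() {
	addr, _ := metricsTarget("10.0.0.5:8000", 9090)
	fmt.Println(addr) // 10.0.0.5:9090

	addr, _ = metricsTarget("10.0.0.5:8000", 0)
	fmt.Println(addr) // 10.0.0.5:8000

	_, err := metricsTarget("10.0.0.5:8000", 70000)
	fmt.Println(err != nil) // true
}
```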

Motivating example: In an Istio mTLS STRICT environment, EPP's direct pod-IP scraping hits PassthroughCluster (no mTLS upgrade → plain HTTP), and the receiving sidecar rejects it with 503. A model server serving metrics on a separate port excluded from sidecar interception resolves this without weakening mTLS on the inference port.

Which issue(s) this PR fixes:

Related to #1396

Does this PR introduce a user-facing change?:

  Add `metricsPort` parameter to the `metrics-data-source` plugin. When set, EPP scrapes metrics from the specified port instead of the inference port encoded in `MetricsHost`. Defaults to 0 (existing behavior unchanged). 
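Usage might look like the excerpt below. This is a sketch based on the existing plugin-configuration shape as I understand it; the surrounding field names and the apiVersion may differ from the actual config, and port 9090 is illustrative.

```yaml
# Hypothetical EndpointPickerConfig excerpt showing the new parameter.
apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: EndpointPickerConfig
plugins:
  - type: metrics-data-source
    parameters:
      metricsPort: 9090   # scrape this port instead of the inference port
```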


netlify bot commented Apr 2, 2026

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: ba507e6
🔍 Latest deploy log: https://app.netlify.com/projects/gateway-api-inference-extension/deploys/69ce4f5f6263db00079382e9
😎 Deploy Preview: https://deploy-preview-2762--gateway-api-inference-extension.netlify.app


linux-foundation-easycla bot commented Apr 2, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: janghyukjin / name: hyukjin.jang (ba507e6)

@k8s-ci-robot k8s-ci-robot requested a review from liu-cong April 2, 2026 09:18
@k8s-ci-robot
Contributor

Welcome @janghyukjin!

It looks like this is your first PR to kubernetes-sigs/gateway-api-inference-extension 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/gateway-api-inference-extension has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: janghyukjin
Once this PR has been reviewed and has the lgtm label, please assign ahg-g for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 2, 2026
@k8s-ci-robot
Contributor

Hi @janghyukjin. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 2, 2026
@janghyukjin janghyukjin force-pushed the feat/metrics-datasource-port branch 3 times, most recently from a88f1c3 to c99f3f8, on April 2, 2026 at 10:57
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Apr 2, 2026
Adds MetricsPort to metricsDatasourceParams so the metrics-data-source
plugin can scrape a port different from the inference port. When metricsPort
is non-zero in EndpointPickerConfig plugin parameters, it overrides the port
derived from the endpoint's MetricsHost; zero (default) preserves existing
behavior.

This restores functionality that was available via --model-server-metrics-port
but lost when that flag was removed in kubernetes-sigs#2441 without an equivalent plugin
config parameter.
@janghyukjin janghyukjin force-pushed the feat/metrics-datasource-port branch from c99f3f8 to ba507e6 on April 2, 2026 at 11:13
@elevran
Contributor

elevran commented Apr 2, 2026

@janghyukjin thanks for the PR. The metrics port was intentionally dropped from CLI and configuration. See #1886.
Is there a specific use case you're trying to address where inference and metrics can not be run on the same port?

@janghyukjin
Author

@elevran Thanks for the context! There are practical scenarios where separating inference and metrics ports is necessary.

A common case is model servers that expose metrics on a dedicated port, where metrics are intentionally decoupled from inference.

In Istio mTLS STRICT mode, EPP's direct pod-IP scraping bypasses Service-level configuration and results in passthrough traffic without mTLS upgrade, leading to failed scrapes. Using a separate metrics port allows operators to control interception or policy independently without affecting inference.

This PR does not change default behavior and only restores the ability to configure a separate metrics port, which was previously supported via --model-server-metrics-port but is currently not expressible via EndpointPickerConfig.

Keeping the configuration scoped to the metrics-data-source plugin rather than InferencePool.spec, as this is an EPP scraping concern rather than a pool-level attribute. Open to feedback if the preferred direction differs.
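For the Istio case mentioned above, one way to exclude a dedicated metrics port from sidecar interception is Istio's `traffic.sidecar.istio.io/excludeInboundPorts` pod annotation. The port number is illustrative; this shows the operator-side half of the setup, not anything this PR configures.

```yaml
# Model-server pod template: keep inference under mTLS STRICT, but let
# EPP's plain-HTTP pod-IP scrapes reach the metrics port directly by
# excluding it from sidecar inbound interception.
template:
  metadata:
    annotations:
      traffic.sidecar.istio.io/excludeInboundPorts: "9090"
```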

@elevran
Contributor

elevran commented Apr 2, 2026

The main concern I have is how this would work for multi-port inference pool. I presume the use case and reasoning would include those as well. How should that be handled?

@janghyukjin
Author

@elevran Thanks for raising this. My assumption in this PR is that the metrics endpoint is pod-level rather than serving-port-level.

Even with a multi-port InferencePool, the common case is that a backend pod exposes a single /metrics endpoint for the whole process, regardless of how many serving ports it has. In that case, a single plugin-level metricsPort remains sufficient, because it overrides the scrape port for the pod's metrics endpoint rather than mapping to individual serving ports.

If metricsPort is not set, the existing behavior remains unchanged and the port derived from MetricsHost is used as before.

If there are use cases where metrics need to be differentiated per serving port, that would likely be a separate discussion at a different scope, potentially in InferencePool.spec. I was aiming to keep this PR scoped to the EPP scraping configuration, where a single process-level metrics port is the common case.

Thanks!

@elevran
Contributor

elevran commented Apr 2, 2026

> Even with a multi-port InferencePool, the common case is that a backend pod exposes a single /metrics endpoint for the whole process, regardless of how many serving ports it has. In that case, a single plugin-level metricsPort remains sufficient, because it overrides the scrape port for the pod's metrics endpoint rather than mapping to individual serving ports.

I don't think this is true of vLLM with data parallel > 1.
Note that the existing code explicitly schedules to a specific serving port based on its metrics. There is no code that takes a single /metrics output and can split it by serving port for subsequent scheduling decisions.
EPP creates a different "pod" (or more correctly an Endpoint) for each port defined in the inference pool. See datastore.podUpdateOrAddIfNotExist

I understand (and agree) with the use cases. But I'd request that we address this across the board and not as a special case.
I would argue that the InferencePool should be extended to support additional port groups

  • serving over HTTP
  • serving over gRPC, if different from the above (AFAIK, the model servers do not currently support the OpenAI HTTP API and gRPC on the same port; for example, vLLM has vllm.entrypoints.openai.api_server, typically port 8000, and vllm.entrypoints.grpc_server, default port 50051)
  • metrics (if different than serving ports)

@ahg-g @kfswain @nirrozenbaum @danehans
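The three port groups proposed above could hypothetically be expressed along these lines. This is purely a design sketch of the idea in this comment; none of these fields exist in the InferencePool API today, and the names are invented for illustration.

```yaml
# Sketch of an extended InferencePool with per-role port groups.
spec:
  targetPorts:
    - number: 8000
      protocol: HTTP      # OpenAI-compatible serving
    - number: 50051
      protocol: GRPC      # gRPC serving, if different from HTTP
  metricsPort: 9090       # scraped instead of the serving ports, if set
```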

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 3, 2026
@k8s-ci-robot
Contributor

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@janghyukjin
Author

@elevran Thanks for the detailed explanation — that makes sense. :)

You're right that EPP already treats each serving port as a distinct endpoint and makes scheduling decisions based on per-port metrics. In that context, a single metrics port at the pod level is not always sufficient, especially for vLLM with data parallelism.

The use case I had in mind (Istio mTLS STRICT where EPP's pod-IP scraping bypasses mTLS) still exists, but I agree it's better addressed as part of the InferencePool port group design rather than a plugin-level workaround.

Happy to contribute toward that direction if useful — whether that's drafting a design proposal or working on an incremental PR.

Thanks again!

@elevran
Contributor

elevran commented Apr 6, 2026

@janghyukjin perhaps resurrect #1396? There was no interest at the time hence closed due to lifecycle. Maybe your use case will resurrect the issue/discussion?

@janghyukjin
Author

@elevran That makes sense, thanks for the suggestion!

Happy to revive #1396 and continue the discussion there. I can summarize the use case (separate metrics port, e.g. for environments like Istio mTLS STRICT) and how it relates to the current gaps in InferencePool port role specification.

I'll go ahead and add context to #1396 to restart the discussion, unless you'd prefer a different approach.


Labels

  • cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
  • needs-ok-to-test - Indicates a PR that requires an org member to verify it is safe to test.
  • needs-rebase - Indicates a PR cannot be merged because it has merge conflicts with HEAD.
  • size/L - Denotes a PR that changes 100-499 lines, ignoring generated files.
