Generalize the request type passed down the framework plugins: rename LLM->Inference#2673
RyanRosario wants to merge 1 commit into kubernetes-sigs:main from
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: RyanRosario. The full list of commands accepted by this bot can be found here. Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
✅ Deploy Preview for gateway-api-inference-extension ready!
Hi @RyanRosario. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with

Tip: We noticed you've done this a few times! Consider joining the org to skip this step and gain

Once the patch is verified, the new status will be reflected by the

I understand the commands that are listed here. Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
1cce76e to f6b5ec6
@zetxqx For now, I've had to commit some other files into this PR to get Prow to pass. I'm not sure what the issue is, but I want to keep the ball rolling.
.github/workflows/kal.yml
  uses: actions/setup-go@d35c59abb061a4a6fb18e82ac0862c26744d6ab5 # tag=v5.5.0
  with:
-   go-version-file: 'go.work'
+   go-version-file: 'go.mod'
Correct. I committed this to start the review process and get Prow to pass. This needs to be removed, but I need to fix my local environment first. Thanks.
Can we also rename the whole package from handlers to requesthandling? We can do it in a separate PR.
pkg/epp/framework/plugins/scheduling/scorer/predictedlatency/requestcontrol_hooks_test.go
pkg/epp/framework/plugins/scheduling/scorer/predictedlatency/latencypredictor_helper.go
pkg/epp/framework/plugins/scheduling/scorer/predictedlatency/scorer_test.go
pkg/epp/framework/plugins/scheduling/scorer/runningrequests/running_test.go
@zetxqx I still have a few comments to address but wanted to handle the rest in this PR.
+1 to @kaushikmitr. Can we do a rebase, @RyanRosario?
/retest
0846f3e to 863a894
/retest
@zetxqx @kaushikmitr Rebase complete. All tests pass. Ready for review.
// buildTrainingEntry constructs a training entry from actual latency measurements.
// If endpointRoleLabel is configured, it extracts the role from the endpoint's labels and
// populates the PodType field, enabling role-specific model training.
func buildTrainingEntry(
+1, we should focus the refactoring on using the new type. Since the PR is already large, any other refactoring, even if reasonable, can be very confusing for reviewers.
) {
	logger := log.FromContext(ctx)
	targetName := predictedLatencyCtx.targetMetadata.NamespacedName.Name
	if m := predictedLatencyCtx.prefillTargetMetadata; m != nil {
Same here: this logic is needed to support disaggregated serving.
	ctx context.Context,
	predictor latencypredictor.PredictorInterface,
	streamingMode bool,
	endpointRoleLabel string,
	prefixCacheScore float64,
) {
	logger := log.FromContext(ctx)
	entry := buildTrainingEntry(
Yes, this method is also needed.
	logger := log.FromContext(ctx)
	targetName := predictedLatencyCtx.targetMetadata.NamespacedName.Name
	if storedPred, ok := predictedLatencyCtx.predictionsForScheduling[targetName]; ok {
		logger.V(logutil.DEBUG).Info("first TPOT from stored prediction", "value_ms", storedPred.TPOT)
Same; I think some of this code was removed inadvertently?
)
in := latencypredictor.PredictionRequest{
	KVCachePercentage: m.KVCacheUsagePercent,
	InputTokenLength:  len(strings.Fields(predictedLatencyCtx.schedulingRequest.Body.Completions.Prompt.PlainText())),
This won't work for other APIs like chat completions. Use the same approach as before.
@kaushikmitr Thanks Kaushik. These will all be restored. Some of this was removed inadvertently when I made other changes.
@RyanRosario I think the rebase resolved conflicts by keeping the PR's old code and discarding main's improvements. Just focusing on scorer/predictedlatency/: I think we should redo the rebase, accepting main's versions of all functions in scorer/predictedlatency/ and then applying only the rename (LLMRequest → InferenceRequest) on top.
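For illustration, here is a self-contained sketch of that redo strategy in a scratch repo (file and branch names are hypothetical; note the common gotcha that during a rebase, git's `--ours` refers to the upstream/main side, not the PR's side):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b main
git config user.email demo@example.com && git config user.name demo

echo 'func Score(r *LLMRequest) int { return 0 }' > scorer.go
git add scorer.go && git commit -qm base

git checkout -qb pr                    # the PR branch edits the same line
echo 'func Score(r *LLMRequest) int { return 1 }' > scorer.go
git commit -qam "pr edit"

git checkout -q main                   # main also improved the same line
echo 'func Score(r *LLMRequest) int { return 2 } // main improvement' > scorer.go
git commit -qam "main improvement"

git checkout -q pr
git rebase main || true                # stops on the expected conflict
git checkout --ours -- scorer.go       # --ours == main's side during a rebase
git add scorer.go
GIT_EDITOR=true git rebase --continue || git rebase --skip

# Re-apply only the rename on top of main's version.
sed -i 's/LLMRequest/InferenceRequest/g' scorer.go
cat scorer.go
```

The same idea scales to the real PR: resolve conflicts in scorer/predictedlatency/ toward main, then run the mechanical rename as a final commit.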
// ParsedBody contains the unmarshaled request payload.
// Note: Because this handles multiple protocols, this field is strictly expected
// to be either a map[string]any (for HTTP/JSON) or a proto.Message (for gRPC).
ParsedBody any `json:"-"`
The ParsedBody is strongly typed and renamed to Payload now; can you move the strong type here as well?
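For context, a hedged sketch of what moving the strong type here could look like (the interface and variant names below are assumptions; the real PayloadMap/PayloadProto/RawPayload types live in the framework scheduling package):

```go
package main

import "fmt"

// Payload is a hypothetical sealed interface standing in for the strongly
// typed field; the real variants live in the framework package.
type Payload interface{ isPayload() }

// PayloadMap holds an unmarshaled HTTP/JSON body.
type PayloadMap map[string]any

func (PayloadMap) isPayload() {}

// RawPayload holds bytes that were not (or could not be) parsed.
type RawPayload []byte

func (RawPayload) isPayload() {}

// RequestBody carries the typed Payload instead of an untyped `any`,
// so consumers can switch on the known variants.
type RequestBody struct {
	Payload Payload
}

func describe(rb RequestBody) string {
	switch p := rb.Payload.(type) {
	case PayloadMap:
		return fmt.Sprintf("map with %d key(s)", len(p))
	case RawPayload:
		return fmt.Sprintf("raw %d byte(s)", len(p))
	default:
		return "unsupported payload"
	}
}

func main() {
	fmt.Println(describe(RequestBody{Payload: PayloadMap{"model": "m", "prompt": "p"}}))
	fmt.Println(describe(RequestBody{Payload: RawPayload([]byte("{}"))}))
}
```

Compared with `ParsedBody any`, the sealed interface lets the compiler reject payload kinds the switch does not handle.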
It seems a unit test, TestPrompt_PlainText, was dropped. Can we make sure all the unit tests (https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/eaaa9469efdf847656b0c0ce7ecb5c5928d84e2f/pkg/epp/framework/interface/scheduling/types_test.go) are moved here?
// It is populated by external tokenization plugins (e.g., via a PrepareData plugin)
// and consumed by scheduling plugins that benefit from actual token data
// (e.g., prefix cache scoring, latency prediction).
type TokenizedPrompt struct {
These fields should not be dropped.
}
if plugin == nil {
	t.Fatalf("New() returned nil plugin without error")
	return
Super nit: the return is not needed since Fatalf already ends the test.
}

// ParseResponse extracts usage metadata from the provider's response.
pkg/epp/handlers/server.go
	body = []byte{}

-	reqCtx, err = s.director.HandleRequest(ctx, reqCtx)
+	parsedBody, processErr := s.director.ProcessRequestBody(ctx, reqCtx, s.parser)
Here we can just use s.parser to parse the request instead of using the director's newly added method.
pkg/epp/requestcontrol/director.go
	// TODO: to extend fallback functionality, handle cases where target pod is unavailable
	// https://github.com/kubernetes-sigs/gateway-api-inference-extension/issues/1224
-	d.runResponseHeaderPlugins(ctx, reqCtx.SchedulingRequest, response, reqCtx.TargetPod)
+	d.runResponseReceivedPlugins(ctx, reqCtx.SchedulingRequest, response, reqCtx.TargetPod)
We should change it back to runResponseHeaderPlugins.
pkg/epp/requestcontrol/director.go
func (d *Director) ProcessRequestBody(ctx context.Context, reqCtx *handlers.RequestContext, parser fwkrh.Parser) (*fwkrh.RequestBody, error) {
	requestBody, err := parser.ParseRequest(ctx, reqCtx.Request.RawBody, reqCtx.Request.Headers)
	if err != nil {
		return nil, errcommon.Error{Code: errcommon.BadRequest, Msg: err.Error()}
	}

-	switch v := llmRequestBody.Payload.(type) {
-	case fwksched.PayloadProto:
-		// Protos are not currently mutated, return as-is.
-		reqCtx.RequestSize = len(reqCtx.Request.RawBody)
-	case fwksched.PayloadMap:
+	switch v := requestBody.ParsedBody.(type) {
+	case map[string]any:
		if err := d.mutateAndRepackage(ctx, reqCtx, v); err != nil {
			return nil, err
		}
-	case fwksched.RawPayload:
-		reqCtx.RequestSize = len(reqCtx.Request.RawBody)
	default:
-		return nil, errcommon.Error{Code: errcommon.BadRequest, Msg: "Unsupported llmRequest parsedBody"}
+		// For other types (like gRPC, custom structs) or nil, we just set the request size.
+		reqCtx.RequestSize = len(reqCtx.Request.RawBody)
	}
-	return llmRequestBody, nil
+	return requestBody, nil
}
Instead of exposing ProcessRequestBody, we can just move the logic into handlers/server.go.
The concurrency detector has been moved out of here, so we can drop this.
Can you rebase again? I don't think this file needs to be modified.

Thank you for your patience. That seems to be the issue.
@RyanRosario: The following tests failed, say

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Enables the framework plugins to work with various GenAI request formats, not only the OpenAI format, without rewriting the core admission, mutation, or scheduling flows. Pluggable parsers can now intercept raw request bytes and construct a generic InferenceRequest up front, giving the EPP the flexibility to route, process, and score payloads transparently regardless of the original protocol.
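As an illustrative sketch of that flow (all type and method names here are assumptions, not the PR's actual API), a pluggable parser turns raw bytes into a protocol-agnostic request:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// InferenceRequest is a hypothetical stand-in for the generic request
// type; the real field set in the PR differs.
type InferenceRequest struct {
	Model   string
	Payload map[string]any
}

// Parser converts raw request bytes into an InferenceRequest. Concrete
// implementations could target OpenAI-style JSON, gRPC protos, etc.
type Parser interface {
	Parse(raw []byte) (*InferenceRequest, error)
}

// openAIJSONParser is one pluggable implementation for JSON bodies.
type openAIJSONParser struct{}

func (openAIJSONParser) Parse(raw []byte) (*InferenceRequest, error) {
	var m map[string]any
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, err
	}
	model, _ := m["model"].(string)
	return &InferenceRequest{Model: model, Payload: m}, nil
}

func main() {
	var p Parser = openAIJSONParser{}
	req, err := p.Parse([]byte(`{"model":"llama-3","prompt":"hi"}`))
	if err != nil {
		panic(err)
	}
	// Scheduling and scoring can now operate on InferenceRequest
	// regardless of the wire protocol that produced it.
	fmt.Println(req.Model)
}
```

Supporting a new protocol then means registering another Parser implementation rather than touching the scheduling core.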
Which issue(s) this PR fixes:
Fixes #2447
Does this PR introduce a user-facing change?: