Generalize the request type passed down the framework plugins: rename LLM->Inference #2673

Open
RyanRosario wants to merge 1 commit into kubernetes-sigs:main from RyanRosario:issue-2447

Conversation

@RyanRosario
Contributor

@RyanRosario RyanRosario commented Mar 23, 2026

What type of PR is this?

/kind feature

What this PR does / why we need it:

Generalizes the request type so that the EPP can serve GenAI protocols beyond the OpenAI format without rewriting the core admission, mutation, or scheduling flows. Pluggable parsers can now intercept the raw request bytes and construct a generic InferenceRequest up front, giving the EPP the flexibility to route, process, and score payloads regardless of the original protocol.
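
As a rough illustration of the parser idea described above, a pluggable parser could be modeled as an interface that turns raw bytes and headers into a protocol-agnostic request. Every name in this sketch (`Parser`, `InferenceRequest`, `jsonParser`, `TargetModel`) is hypothetical and chosen for illustration only; it is not the API introduced by this PR.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// InferenceRequest is a hypothetical protocol-agnostic request (illustration only).
type InferenceRequest struct {
	TargetModel string
	Payload     any // parsed body; a real design might also allow raw bytes
}

// Parser is a hypothetical plugin interface: raw bytes in, generic request out.
type Parser interface {
	ParseRequest(body []byte, headers map[string]string) (*InferenceRequest, error)
}

// jsonParser handles JSON bodies that carry a top-level "model" field.
type jsonParser struct{}

func (jsonParser) ParseRequest(body []byte, _ map[string]string) (*InferenceRequest, error) {
	var m map[string]any
	if err := json.Unmarshal(body, &m); err != nil {
		return nil, fmt.Errorf("invalid JSON body: %w", err)
	}
	model, ok := m["model"].(string)
	if !ok || model == "" {
		return nil, errors.New("missing model field")
	}
	return &InferenceRequest{TargetModel: model, Payload: m}, nil
}

func main() {
	// Callers depend only on the Parser interface, not on the wire format.
	var p Parser = jsonParser{}
	req, err := p.ParseRequest([]byte(`{"model":"m1","prompt":"hi"}`), nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.TargetModel)
}
```

Under this assumption, supporting another protocol would mean adding another `Parser` implementation rather than touching the scheduling code.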

Which issue(s) this PR fixes:
Fixes #2447

Does this PR introduce a user-facing change?:

Enables the user to use protocols other than OpenAI via the generic InferenceRequest interface.

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 23, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: RyanRosario
Once this PR has been reviewed and has the lgtm label, please assign danehans for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@netlify

netlify bot commented Mar 23, 2026

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 014fd18
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/69d5769a52ef490008288b0b
😎 Deploy Preview https://deploy-preview-2673--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot requested review from ahg-g and liu-cong March 23, 2026 22:06
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 23, 2026
@k8s-ci-robot
Contributor

Hi @RyanRosario. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 23, 2026
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 25, 2026
@RyanRosario RyanRosario force-pushed the issue-2447 branch 2 times, most recently from 1cce76e to f6b5ec6 Compare March 25, 2026 17:58
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 25, 2026
@RyanRosario
Contributor Author

@zetxqx For now, I've had to commit some other files into this PR to get PROW to pass. I am not sure what the issue is here, but I want to keep the ball rolling. go.mod, go.sum, Makefile and kal.yaml should/will not be part of the final PR.

uses: actions/setup-go@d35c59abb061a4a6fb18e82ac0862c26744d6ab5 # tag=v5.5.0
with:
-go-version-file: 'go.work'
+go-version-file: 'go.mod'

Contributor

This should not be changed

Contributor Author

Correct. I committed this to start the review process and get PROW to pass. This needs to be removed, but I need to fix my local environment first. Thanks.

Contributor

Can we also rename the whole package from handlers to requesthandling? We can do it in a separate PR.

Contributor Author

Yes.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 26, 2026
@RyanRosario
Contributor Author

@zetxqx I still have a few comments to address but wanted to address the rest in this PR.

@RyanRosario RyanRosario changed the title Generalize the request type passed down the framework plugins Generalize the request type passed down the framework plugins: move parser out of director Mar 27, 2026
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 28, 2026
@RyanRosario RyanRosario changed the title Generalize the request type passed down the framework plugins: move parser out of director [WIP] Generalize the request type passed down the framework plugins: move parser out of director Mar 28, 2026
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 28, 2026
@zetxqx
Contributor

zetxqx commented Apr 3, 2026

+1 to @kaushikmitr. Can we do a rebase, @RyanRosario?

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 3, 2026
@RyanRosario RyanRosario changed the title [WIP] Generalize the request type passed down the framework plugins: move parser out of director Generalize the request type passed down the framework plugins: move parser out of director Apr 3, 2026
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 3, 2026
@RyanRosario
Contributor Author

/retest

@RyanRosario RyanRosario force-pushed the issue-2447 branch 2 times, most recently from 0846f3e to 863a894 Compare April 4, 2026 05:13
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 4, 2026
@RyanRosario
Contributor Author

/retest

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Apr 6, 2026
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 6, 2026
@RyanRosario
Contributor Author

@zetxqx @kaushikmitr Rebase complete. All tests pass. Ready for review.

// buildTrainingEntry constructs a training entry from actual latency measurements.
// If endpointRoleLabel is configured, it extracts the role from the endpoint's labels and
// populates the PodType field, enabling role-specific model training.
func buildTrainingEntry(

Contributor

why was this removed?

Contributor

+1, we should focus this refactoring on using the new type. Since the PR is already large, any other refactoring, even if reasonable, can be very confusing for reviewers.

) {
logger := log.FromContext(ctx)
targetName := predictedLatencyCtx.targetMetadata.NamespacedName.Name
if m := predictedLatencyCtx.prefillTargetMetadata; m != nil {

Contributor

Same here: this logic is needed to support disaggregated serving.

ctx context.Context,
predictor latencypredictor.PredictorInterface,
streamingMode bool,
endpointRoleLabel string,

Contributor

This parameter is also needed.

prefixCacheScore float64,
) {
logger := log.FromContext(ctx)
entry := buildTrainingEntry(

Contributor

Yes, this method is also needed.

logger := log.FromContext(ctx)
targetName := predictedLatencyCtx.targetMetadata.NamespacedName.Name
if storedPred, ok := predictedLatencyCtx.predictionsForScheduling[targetName]; ok {
logger.V(logutil.DEBUG).Info("first TPOT from stored prediction", "value_ms", storedPred.TPOT)

Contributor

Same here. I think some of this code was removed inadvertently?

)
in := latencypredictor.PredictionRequest{
KVCachePercentage: m.KVCacheUsagePercent,
InputTokenLength: len(strings.Fields(predictedLatencyCtx.schedulingRequest.Body.Completions.Prompt.PlainText())),

Contributor

This won't work for other APIs such as chat completions. Use the same approach as before.

@RyanRosario
Contributor Author

@kaushikmitr Thanks, Kaushik. These will all be restored. Some of this code was removed inadvertently as a side effect of other changes I made.

@kaushikmitr
Contributor

@RyanRosario I think the rebase resolved conflicts by keeping the PR's old code and discarding main's improvements. Focusing just on scorer/predictedlatency/, I think we should redo the rebase, accepting main's versions of all functions in scorer/predictedlatency/ and then applying only the rename (LLMRequest → InferenceRequest) on top.

// ParsedBody contains the unmarshaled request payload.
// Note: Because this handles multiple protocols, this field is strictly expected
// to be either a map[string]any (for HTTP/JSON) or a proto.Message (for gRPC).
ParsedBody any `json:"-"`

Contributor

ParsedBody is now strongly typed and has been renamed to Payload. Can you move the strong type here as well?

// Payload contains the unmarshaled request payload or raw bytes.
// If the payload is unmarshaled, we can perform advanced processing (like prefix cache aware routing).
// If it remains as raw bytes, such processing may not be supported.
Payload RequestPayload `json:"-"`
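
To make the suggestion above concrete, here is a minimal sketch of how a strongly typed payload can be consumed with a type switch. The types below are simplified stand-ins written for this example (the method `isPayload` and the helper `describe` are hypothetical); the real `RequestPayload` definition in the repository may differ.

```go
package main

import "fmt"

// RequestPayload is a simplified stand-in: a sealed interface that only
// the payload types in this file implement.
type RequestPayload interface{ isPayload() }

// PayloadMap is an unmarshaled JSON body.
type PayloadMap map[string]any

// RawPayload is an opaque body that was not unmarshaled.
type RawPayload []byte

func (PayloadMap) isPayload() {}
func (RawPayload) isPayload() {}

// describe reports which kind of payload it received. Structured payloads
// could support advanced processing; raw bytes generally could not.
func describe(p RequestPayload) string {
	switch v := p.(type) {
	case PayloadMap:
		return fmt.Sprintf("structured payload with %d fields", len(v))
	case RawPayload:
		return fmt.Sprintf("raw payload of %d bytes", len(v))
	default:
		// Covers nil and any payload type this function does not know about.
		return "unsupported payload"
	}
}

func main() {
	fmt.Println(describe(PayloadMap{"model": "m1"}))
	fmt.Println(describe(RawPayload("abc")))
}
```

Compared with a field of type `any`, an interface like this lets the compiler reject values that are not payloads at all, while the type switch keeps per-protocol handling explicit.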

// It is populated by external tokenization plugins (e.g., via a PrepareData plugin)
// and consumed by scheduling plugins that benefit from actual token data
// (e.g., prefix cache scoring, latency prediction).
type TokenizedPrompt struct {

Contributor

type TokenizedPrompt struct {
// TokenIDs are the token IDs for the prompt, including multimodal placeholder tokens.
TokenIDs []uint32
// MultiModalFeatures holds one entry per multimodal item in prompt order.
// Nil if the prompt contains no multimodal content.
MultiModalFeatures []MultiModalFeature
}
// MultiModalFeature holds all data needed for precise prefix-cache scoring of a single
// multimodal item. Items are ordered by token position within the prompt.
// Currently only ModalityImage is supported.
type MultiModalFeature struct {
// Modality identifies the type of content.
Modality Modality
// Hash is the content hash of the item, used for KV-cache reuse across requests.
Hash string
// Offset is the index of the first placeholder token for this item in TokenIDs.
Offset int
// Length is the number of placeholder tokens this item occupies in TokenIDs.
Length int
}

These fields should not be dropped.
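
For readers unfamiliar with these fields, the sketch below shows how `Offset` and `Length` could be used to locate a multimodal item's placeholder tokens inside `TokenIDs`, following the field comments above. The helper `placeholderTokens` and the string value of `ModalityImage` are assumptions made for this example, not code from the repository.

```go
package main

import "fmt"

// Modality identifies the type of content. The string value below is an assumption.
type Modality string

const ModalityImage Modality = "image"

// MultiModalFeature mirrors the struct quoted above.
type MultiModalFeature struct {
	Modality Modality
	Hash     string
	Offset   int
	Length   int
}

// TokenizedPrompt mirrors the struct quoted above.
type TokenizedPrompt struct {
	TokenIDs           []uint32
	MultiModalFeatures []MultiModalFeature
}

// placeholderTokens is a hypothetical helper. It returns the sub-slice of
// TokenIDs occupied by feature f, or an error if the span is out of range.
func placeholderTokens(p TokenizedPrompt, f MultiModalFeature) ([]uint32, error) {
	end := f.Offset + f.Length
	if f.Offset < 0 || f.Length < 0 || end > len(p.TokenIDs) {
		return nil, fmt.Errorf("feature span [%d,%d) out of range for %d tokens",
			f.Offset, end, len(p.TokenIDs))
	}
	return p.TokenIDs[f.Offset:end], nil
}

func main() {
	p := TokenizedPrompt{
		TokenIDs: []uint32{1, 2, 900, 900, 900, 3},
		MultiModalFeatures: []MultiModalFeature{
			{Modality: ModalityImage, Hash: "abc", Offset: 2, Length: 3},
		},
	}
	span, err := placeholderTokens(p, p.MultiModalFeatures[0])
	fmt.Println(span, err)
}
```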

}
if plugin == nil {
t.Fatalf("New() returned nil plugin without error")
return

Contributor

super nit: this return is not needed, since Fatalf stops the test function immediately.
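
The nit follows from the documented behavior of `testing.T.Fatalf`: it is equivalent to `Logf` followed by `FailNow`, and `FailNow` stops the calling goroutine via `runtime.Goexit`. The sketch below imitates that behavior with a hypothetical `fakeT` type (it is not the real `testing.T`) so that it can run outside of `go test`.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// fakeT is a hypothetical stand-in that mimics testing.T.Fatalf.
type fakeT struct{ failed bool }

func (t *fakeT) Fatalf(format string, args ...any) {
	fmt.Printf(format+"\n", args...)
	t.failed = true
	runtime.Goexit() // testing.T.FailNow stops the goroutine the same way
}

// reachedAfterFatal reports whether code placed after Fatalf ran,
// and whether the fake test was marked as failed.
func reachedAfterFatal() (reached, failed bool) {
	t := &fakeT{}
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done() // deferred calls still run when the goroutine exits
		t.Fatalf("New() returned nil plugin without error")
		reached = true // never executed, just like a trailing return
	}()
	wg.Wait()
	return reached, t.failed
}

func main() {
	reached, failed := reachedAfterFatal()
	fmt.Println(reached, failed)
}
```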

}

-// ParseResponse extracts usage metadata from the provider's response.
+// ParseResponse extracts usage metada ta from the provider's response.

Contributor

super nit: metadata

body = []byte{}

-reqCtx, err = s.director.HandleRequest(ctx, reqCtx)
+parsedBody, processErr := s.director.ProcessRequestBody(ctx, reqCtx, s.parser)

Contributor

Here we can just use s.parser to parse the request instead of using the director's newly added method.

// TODO: to extend fallback functionality, handle cases where target pod is unavailable
// https://github.com/kubernetes-sigs/gateway-api-inference-extension/issues/1224
-d.runResponseHeaderPlugins(ctx, reqCtx.SchedulingRequest, response, reqCtx.TargetPod)
+d.runResponseReceivedPlugins(ctx, reqCtx.SchedulingRequest, response, reqCtx.TargetPod)

Contributor

We should change this back to runResponseHeaderPlugins.

Comment on lines 202 to 218
func (d *Director) ProcessRequestBody(ctx context.Context, reqCtx *handlers.RequestContext, parser fwkrh.Parser) (*fwkrh.RequestBody, error) {
requestBody, err := parser.ParseRequest(ctx, reqCtx.Request.RawBody, reqCtx.Request.Headers)
if err != nil {
return nil, errcommon.Error{Code: errcommon.BadRequest, Msg: err.Error()}
}

switch v := llmRequestBody.Payload.(type) {
case fwksched.PayloadProto:
// Protos are not currently mutated, return as-is.
reqCtx.RequestSize = len(reqCtx.Request.RawBody)
case fwksched.PayloadMap:
switch v := requestBody.ParsedBody.(type) {
case map[string]any:
if err := d.mutateAndRepackage(ctx, reqCtx, v); err != nil {
return nil, err
}
case fwksched.RawPayload:
reqCtx.RequestSize = len(reqCtx.Request.RawBody)
default:
return nil, errcommon.Error{Code: errcommon.BadRequest, Msg: "Unsupported llmRequest parsedBody"}
// For other types (like gRPC, custom structs) or nil, we just set the request size.
reqCtx.RequestSize = len(reqCtx.Request.RawBody)
}
return llmRequestBody, nil
return requestBody, nil
}

Contributor

Instead of exposing ProcessRequestBody, we can just move the logic to handlers/server.go.

Contributor

The concurrencydetector has been moved out of here, so we can drop this.

Contributor

Can you rebase again? I don't think this file needs to be modified.

@RyanRosario
Contributor Author

> @RyanRosario I think the rebase resolved conflicts by keeping the PR's old code and discarding main's improvements. Focusing just on scorer/predictedlatency/, I think we should redo the rebase, accepting main's versions of all functions in scorer/predictedlatency/ and then applying only the rename (LLMRequest → InferenceRequest) on top.

Thank you for your patience. That seems to be the issue.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 7, 2026
@k8s-ci-robot
Contributor

k8s-ci-robot commented Apr 7, 2026

@RyanRosario: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-gateway-api-inference-extension-verify-main | 014fd18 | link | true | /test pull-gateway-api-inference-extension-verify-main |
| pull-gateway-api-inference-extension-test-unit-main | 014fd18 | link | true | /test pull-gateway-api-inference-extension-test-unit-main |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@RyanRosario RyanRosario changed the title Generalize the request type passed down the framework plugins: move parser out of director Generalize the request type passed down the framework plugins: rename LLM->Inference Apr 8, 2026
Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Generalize the request type passed down the framework plugins

5 participants